History log of /freebsd-10-stable/sys/amd64/amd64/
Revision Date Author Comments
338756 18-Sep-2018 dim

MFC r309748 (by glebius):

Treat R_X86_64_PLT32 relocs as R_X86_64_PC32.

If we load a binary that is designed to be a library, it produces
relocatable code via assembler directives in the assembly itself
(rather than compiler options). This emits R_X86_64_PLT32 relocations,
which are not handled by the kernel linker.

Submitted by: gallatin
Reviewed by: kib
PR: 231451
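
To illustrate the change above, here is a minimal sketch of a loader computing a PC-relative relocation and handling R_X86_64_PLT32 the same way as R_X86_64_PC32. It is not the kernel linker's code: the helper name apply_pc32_like() and its parameters are hypothetical, and only the relocation numbers (taken from the x86-64 psABI) and the S + A - P formula are standard.

```c
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the x86-64 psABI. */
#define R_X86_64_PC32	2
#define R_X86_64_PLT32	4

/*
 * Hypothetical helper: patch the 32-bit field at 'where' with S + A - P,
 * where 'place' is the run-time address P of that field.  Returns 0 on
 * success, -1 for an unhandled type or a value that does not fit.
 */
static int
apply_pc32_like(int rtype, uint8_t *where, uint64_t place,
    uint64_t symval, int64_t addend)
{
	int64_t val;
	int32_t val32;

	switch (rtype) {
	case R_X86_64_PC32:
	case R_X86_64_PLT32:	/* treated exactly like PC32 */
		val = (int64_t)(symval + (uint64_t)addend - place);
		if (val < INT32_MIN || val > INT32_MAX)
			return (-1);
		val32 = (int32_t)val;
		memcpy(where, &val32, sizeof(val32));
		return (0);
	default:
		return (-1);
	}
}
```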

337245 03-Aug-2018 kib

MFC r336683:
Extend ranges of the critical sections to ensure that context switch
code never sees FPU pcb flags not consistent with the hardware state.

335455 20-Jun-2018 kib

MFC r335072, r335089, r335131, r335132:
Enable eager FPU context switch on i386 and amd64.

CVE: CVE-2018-3665
Tested by: emaste (smoke boot)

333370 08-May-2018 emaste

MFC r333368: Prepare DB# handler for deferred trigger of watchpoints.

Since pop %ss/mov %ss instructions defer all interrupts and exceptions
for the next instruction, it is possible that the userspace watchpoint
trap executes on the first instruction of the kernel entry for
syscall/bpt.

In this case, DB# should be treated similarly to NMI: on amd64 we must
always load GSBASE even if the trap comes from kernel mode, and load
the kernel page table root into %cr3. Moreover, the trap must
use the dedicated stack, because we are still on the user stack when
trapped on syscall entry.

For i386, we must reload %cr3. The syscall instruction is not configured,
so there is no issue with executing on user stack when trapping.

Due to some CPU errata it is not always possible to detect that the
userspace watchpoint triggered by inspecting %dr6. In trap(), compare the
trap %rip with the known unsafe entry points and if matched pretend that
the watchpoint did not fire at all.

Thank you to the MSRC Incident Response Team, and in particular Greg
Lenti and Nate Warfield, for coordinating the response to this issue
across multiple vendors.

Thanks to Computer Recycling at The Working Center of Kitchener for
making hardware available to allow us to test the patch on additional
CPU families.

Reviewed by: jhb
Discussed with: Matthew Dillon
Tested by: emaste
Security: CVE-2018-8897
Security: FreeBSD-SA-18:06.debugreg
Sponsored by: The FreeBSD Foundation

333205 03-May-2018 avg

MFC r332752: set kdb_why to "trap" when calling kdb_trap from trap_fatal

This allows hooking a ddb script to the "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.

Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred. So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.

I have added the event only on architectures that have trap_fatal()
function defined. I haven't looked at other architectures. Their
maintainers can add support for the event later.

Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20

Note: changes to powerpc/aim/trap.c and powerpc/booke/trap.c are direct
changes.

Sponsored by: Panzura

333201 03-May-2018 avg

MFC r332730: don't check for kdb reentry in trap_fatal(), it's impossible

Sponsored by: Panzura

332759 19-Apr-2018 avg

MFC r331875: x86 cpu_reset: if failed to switch to BSP proceed to cpu_reset_real

332757 19-Apr-2018 avg

MFC r331874: x86 cpu_reset_proxy: no need to stop_cpus() the original processor

332069 05-Apr-2018 kib

MFC r331374:
Fixes for ptrace(PT_GETXSTATE_INFO) related to the padding in struct
ptrace_xstate_info.

331910 03-Apr-2018 avg

MFC r327056: Use resume_cpus() instead of restart_cpus() to resume from ACPI suspension.

The merge needed fixing because common x86 code changed by the original
commit is still split between i386 and amd64 in this branch.

330132 28-Feb-2018 jhb

MFC 328610: Ensure 'name' is not NULL before passing to strcmp().

This avoids a nested page fault when obtaining a stack trace in DDB if
the address from the first frame does not resolve to a known symbol.
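
The guard described above follows a common pattern, sketched here under the assumption that a failed symbol lookup yields a NULL name. The helper sym_name_is() is hypothetical and is not the function touched by the commit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a resolved symbol name equals
 * 'target'.  'name' may be NULL when the address did not resolve to a
 * known symbol, so it is tested before strcmp() dereferences it.
 */
static int
sym_name_is(const char *name, const char *target)
{
	return (name != NULL && strcmp(name, target) == 0);
}
```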

325543 08-Nov-2017 kib

MFC r325270:
Consistently ensure that we do not load MXCSR with reserved bits set.
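
As a rough sketch of what "no reserved bits set" means, a value destined for MXCSR can be ANDed with the mask the CPU reports. The helper name sanitize_mxcsr() is hypothetical; the fallback mask 0xFFBF is the default the Intel SDM specifies when the FXSAVE image reports an MXCSR_MASK of zero.

```c
#include <stdint.h>

/* Default mask per the Intel SDM when FXSAVE reports MXCSR_MASK as 0. */
#define MXCSR_DEFAULT_MASK	0x0000ffbfu

/*
 * Hypothetical helper: clear every bit of 'mxcsr' that the CPU treats as
 * reserved, so that loading the result cannot raise #GP.
 */
static uint32_t
sanitize_mxcsr(uint32_t mxcsr, uint32_t cpu_mxcsr_mask)
{
	if (cpu_mxcsr_mask == 0)
		cpu_mxcsr_mask = MXCSR_DEFAULT_MASK;
	return (mxcsr & cpu_mxcsr_mask);
}
```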

322523 14-Aug-2017 jkim

MFC: r322323

Split identify_cpu() into two functions for amd64 as we do for i386. This
fixes a regression introduced in r322205.

Approved by: re (marius)

322303 09-Aug-2017 kib

MFC r321919:
Do not call trapsignal() after handling usermode fault or interrupt,
when a signal is not intended to be sent.

322205 07-Aug-2017 jkim

MFC: r322076

Detect hypervisor early so that we set lower hz on it.

322063 04-Aug-2017 marius

MFC: r290156, r318354

pmap_change_attr: Only fixup DMAP for DMAPed ranges

321363 22-Jul-2017 alc

MFC r320546
When "force" is specified to pmap_invalidate_cache_range(), the given
start address is not required to be page aligned. However, the loop
within pmap_invalidate_cache_range() that performs the actual cache
line invalidations requires that the starting address be truncated to
a multiple of the cache line size. This change corrects an error in
that truncation.
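
The truncation in question can be sketched as follows, assuming the cache line size is a power of two. Both function names are hypothetical and the loop only counts lines rather than flushing them; it is not the pmap code itself.

```c
#include <stdint.h>

/* Round 'addr' down to a multiple of 'line_size' (a power of two). */
static uintptr_t
trunc_cacheline(uintptr_t addr, uintptr_t line_size)
{
	return (addr & ~(line_size - 1));
}

/*
 * Count the cache lines touched by [start, end).  Starting the walk at
 * the truncated address ensures the line holding 'start' is included
 * even when 'start' is not line aligned.
 */
static unsigned
lines_in_range(uintptr_t start, uintptr_t end, uintptr_t line_size)
{
	unsigned n = 0;

	for (uintptr_t p = trunc_cacheline(start, line_size); p < end;
	    p += line_size)
		n++;
	return (n);
}
```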

320432 28-Jun-2017 alc

MFC r314310
Refine the fix from r312954. Specifically, add a new PDE-only flag,
PG_PROMOTED, that indicates whether lingering 4KB page mappings might
need to be flushed on a PDE change that restricts or destroys a 2MB
page mapping. This flag allows the pmap to avoid range invalidations
that are both unnecessary and costly.

314845 07-Mar-2017 kib

MFC r314429:
Initialize pcb_save for thread0.

314667 04-Mar-2017 avg

MFC r283291: don't use CALLOUT_MPSAFE with callout_init()

The main purpose of this MFC is to reduce conflicts for other merges.
Parts of the original change have already "trickled down" via individual MFCs.


mp_watchdog.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
/freebsd-10-stable/sys/cddl/dev/profile/profile.c
/freebsd-10-stable/sys/compat/ndis/subr_ntoskrnl.c
/freebsd-10-stable/sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
/freebsd-10-stable/sys/dev/altera/jtag_uart/altera_jtag_uart_tty.c
/freebsd-10-stable/sys/dev/ath/if_ath.c
/freebsd-10-stable/sys/dev/ce/if_ce.c
/freebsd-10-stable/sys/dev/cp/if_cp.c
/freebsd-10-stable/sys/dev/ctau/if_ct.c
/freebsd-10-stable/sys/dev/cx/if_cx.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_main.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/dcons/dcons_os.c
/freebsd-10-stable/sys/dev/drm2/drm_irq.c
/freebsd-10-stable/sys/dev/drm2/i915/intel_display.c
/freebsd-10-stable/sys/dev/glxsb/glxsb.c
/freebsd-10-stable/sys/dev/gxemul/cons/gxemul_cons.c
/freebsd-10-stable/sys/dev/hifn/hifn7751.c
/freebsd-10-stable/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
/freebsd-10-stable/sys/dev/if_ndis/if_ndis.c
/freebsd-10-stable/sys/dev/isci/isci_io_request.c
/freebsd-10-stable/sys/dev/mfi/mfi.c
/freebsd-10-stable/sys/dev/mwl/if_mwl.c
/freebsd-10-stable/sys/dev/nand/nandsim_chip.c
/freebsd-10-stable/sys/dev/ntb/ntb_hw/ntb_hw.c
/freebsd-10-stable/sys/dev/nxge/if_nxge.c
/freebsd-10-stable/sys/dev/oce/oce_if.c
/freebsd-10-stable/sys/dev/patm/if_patm_attach.c
/freebsd-10-stable/sys/dev/rndtest/rndtest.c
/freebsd-10-stable/sys/dev/safe/safe.c
/freebsd-10-stable/sys/dev/sound/midi/mpu401.c
/freebsd-10-stable/sys/dev/sound/pci/atiixp.c
/freebsd-10-stable/sys/dev/sound/pci/es137x.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdac.c
/freebsd-10-stable/sys/dev/sound/pci/via8233.c
/freebsd-10-stable/sys/dev/twa/tw_osl_freebsd.c
/freebsd-10-stable/sys/dev/tws/tws.c
/freebsd-10-stable/sys/dev/ubsec/ubsec.c
/freebsd-10-stable/sys/dev/virtio/random/virtio_random.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/nfs/nfs_commonport.c
/freebsd-10-stable/sys/gdb/gdb_cons.c
/freebsd-10-stable/sys/geom/gate/g_gate.c
/freebsd-10-stable/sys/geom/journal/g_journal.c
/freebsd-10-stable/sys/geom/mirror/g_mirror.c
/freebsd-10-stable/sys/geom/raid3/g_raid3.c
/freebsd-10-stable/sys/geom/sched/gs_rr.c
/freebsd-10-stable/sys/i386/i386/mp_watchdog.c
/freebsd-10-stable/sys/kern/init_main.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/kern_thread.c
/freebsd-10-stable/sys/kern/subr_vmem.c
/freebsd-10-stable/sys/kern/uipc_domain.c
/freebsd-10-stable/sys/mips/cavium/octe/ethernet.c
/freebsd-10-stable/sys/mips/cavium/octeon_rnd.c
/freebsd-10-stable/sys/mips/nlm/dev/net/xlpge.c
/freebsd-10-stable/sys/mips/rmi/dev/xlr/rge.c
/freebsd-10-stable/sys/net/if_spppsubr.c
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_node.c
/freebsd-10-stable/sys/net80211/ieee80211_proto.c
/freebsd-10-stable/sys/netgraph/netflow/ng_netflow.c
/freebsd-10-stable/sys/netgraph/netgraph.h
/freebsd-10-stable/sys/netinet/in_pcb.c
/freebsd-10-stable/sys/netinet/ip_mroute.c
/freebsd-10-stable/sys/netinet/tcp_hostcache.c
/freebsd-10-stable/sys/netinet/tcp_subr.c
/freebsd-10-stable/sys/netinet6/in6_rmx.c
/freebsd-10-stable/sys/netpfil/ipfw/ip_dummynet.c
/freebsd-10-stable/sys/netpfil/ipfw/ip_fw_dynamic.c
/freebsd-10-stable/sys/netpfil/pf/if_pfsync.c
/freebsd-10-stable/sys/ofed/include/linux/timer.h
/freebsd-10-stable/sys/ofed/include/linux/workqueue.h
/freebsd-10-stable/sys/powerpc/mambo/mambo_console.c
/freebsd-10-stable/sys/powerpc/pseries/phyp_console.c
/freebsd-10-stable/sys/sys/callout.h
/freebsd-10-stable/sys/vm/uma_core.c
/freebsd-10-stable/sys/x86/x86/mca.c
313256 05-Feb-2017 kib

MFC r312954:
Do not leave stale 4K TLB entries on pde (superpage) removal or
protection change.

313150 03-Feb-2017 kib

MFC r289894:
CLFLUSH does not need barriers, the instruction is ordered WRT other writes.
Use CLFLUSHOPT when available.

MFC r312555:
Use SFENCE for ordering CLFLUSHOPT.

310975 31-Dec-2016 mjg

MFC r303583:

amd64: implement pagezero using rep stos

The current implementation uses non-temporal writes. This turns out to
be detrimental to performance if the page is used shortly after, which
is the typical case with page faults.

Switch to rep stos.

310485 23-Dec-2016 jhb

MFC 308820,308821: Fixes for fatal page faults on x86.

308820:
Report page faults due to reserved bits in PTEs as a separate fault type.

Rather than reporting a page fault due to a bad PTE as a protection
violation with the "rsv" flag, treat these faults as a separate type of
fault altogether.

308821:
MFamd64: Various fatal page fault fixes.

- If a page fault is triggered due to reserved bits in a PTE, treat it
as a fatal fault and panic.
- If PG_NX is in use, report whether a fatal page fault is due to an
instruction fetch or a data access.
- If a fatal page fault is due to reserved bits in a PTE, report that as
the page fault type rather than a protection violation.

310364 21-Dec-2016 kib

MFC r310205:
Fix typo.

309426 02-Dec-2016 jhb

MFC 303753,308004: Add bounds checking on addresses used with /dev/mem.

303753:
Don't permit mappings of invalid physical addresses on amd64 via /dev/mem.

308004:
MFamd64: Add bounds checks on addresses used with /dev/mem.

Reject attempts to read from or memory map offsets in /dev/mem that are
beyond the maximum-supported physical address of the current CPU.
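
A bounds check of this kind might look like the sketch below, assuming the CPU's physical address width in bits is known. The helper pa_out_of_range() and its parameter names are hypothetical, not the names used in the /dev/mem driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nonzero if physical address 'pa' lies beyond what
 * a CPU with 'maxphyaddr_bits' physical address bits can address.
 */
static int
pa_out_of_range(uint64_t pa, unsigned maxphyaddr_bits)
{
	if (maxphyaddr_bits >= 64)
		return (0);	/* every 64-bit value is addressable */
	return (pa >= ((uint64_t)1 << maxphyaddr_bits));
}
```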

307998 27-Oct-2016 avg

MFC r305539: work around AMD erratum 793 for family 16h, models 00h-0Fh

307940 25-Oct-2016 glebius

Merge r307936:
The argument validation in r296956 was not enough to close all possible
overflows in sysarch(2).

Submitted by: Kun Yang <kun.yang chaitin.com>
Patch by: kib
Security: SA-16:15

306080 21-Sep-2016 kib

MFC r305939:
Remove trailing space.

304262 17-Aug-2016 kib

MFC r303913:
Unconditionally perform checks that FPU region was entered, when #NM
exception is caught in kernel mode.

304131 15-Aug-2016 avg

MFC r302835: fix-up for configuration of AMD Family 10h processors
borrowed from Linux

302041 21-Jun-2016 sephe

MFC 297931,298022

297931
Expose doreti as a global symbol on amd64 and i386.

doreti provides the common code path for returning from interrupt
handlers on x86. Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.

Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: kib

298022
hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus

Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910

302028 20-Jun-2016 kib

MFC r301853:
Do not access pv_table array for fictitious pages.

301835 12-Jun-2016 kib

MFC r301457:
Avoid spurious EINVAL in amd64 pmap_change_attr().

301428 05-Jun-2016 dchagin

MFC r300415:

Add macro to convert errno and use it when appropriate.

300441 23-May-2016 kib

MFC r300305, r300332:
Check for overflow and return EINVAL if detected. Use unsigned index.

300038 17-May-2016 avg

MFC r298737: fix up r300036

300036 17-May-2016 avg

MFC r298736: ensure that initial local apic id is sane on AMD 10h systems

299062 04-May-2016 avg

MFC r297857: re-enable AMD Topology extension on certain models if
disabled by BIOS

298653 26-Apr-2016 pfg

MFC r298482:
Clean up redundant parentheses from existing howmany()/roundup() macro uses.

Requested by: dchagin

298621 26-Apr-2016 avg

MFC r297846: [amd64] dtrace_invop handler is to be called only for
kernel exceptions

296957 16-Mar-2016 glebius

Merge r296956:

Due to invalid use of a signed intermediate value in the bounds checking
during argument validity verification, unbounded zeroing of the process LDT
and adjacent memory can be initiated from usermode.

Submitted by: CORE Security
Patch by: kib
Security: SA-16:15
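
The class of bug described above can be sketched as follows. Both functions are hypothetical illustrations, not the sysarch(2) code: the first shows how a sum stored in a signed int can turn negative and slip past an upper-bound test (the narrowing conversion is implementation-defined, and wraps on common compilers), and the second keeps the arithmetic unsigned and avoids forming the sum at all.

```c
/*
 * Flawed pattern: 'start + num' is narrowed to a signed int, so a large
 * sum may become negative and compare as "within bounds".
 */
static int
range_ok_signed(unsigned int start, unsigned int num, unsigned int max)
{
	int largest = (int)(start + num);	/* signed intermediate */

	return (largest <= (int)max);
}

/*
 * Safer pattern: stay unsigned and compare 'num' against the remaining
 * room, so no intermediate value can overflow.
 */
static int
range_ok_unsigned(unsigned int start, unsigned int num, unsigned int max)
{
	return (start <= max && num <= max - start);
}
```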

296941 16-Mar-2016 kib

MFC r296908:
Force the desired alignment of the user save area.

296562 09-Mar-2016 kib

MFC r295966:
Return dst as the result from memcpy(9) on amd64.

PR: 207422

295148 02-Feb-2016 kib

MFC r294311:
Clear whole XMM register file instead of only XMM0. Also clear x87
registers. This brings amd64 on par with i386, providing consistent
initial FPU state.

PR: 206370

MFC r294312:
Use ANSI definitions. Wrap long line.

MFC r294313:
Adjust i386 comment to match amd64 one after r294311.

Approved by: re (gjb)

294683 24-Jan-2016 ian

MFC r293045, r293046:

Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly. Most startup code wasn't
doing so. Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values. Now if the size is zero, the provided buffer
full of existing env data is installed. A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp. A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data. Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.

Also, revert accidental change that snuck into r293045.

294136 16-Jan-2016 dchagin

MFC r293613:

Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

293581 09-Jan-2016 dchagin

MFC r283479:

The kernel sends signals to processes via the ABI-specific sv_sendsig method.
The native ABI does not need signal conversion; only emulators may want it.
Emulators usually implement their own sv_sendsig method. For now, only the
ibcs2 emulator lacks its own sv_sendsig implementation and depends on the
native sendsig() method. So, remove any extra attempts to convert signal
numbers from the native sendsig() methods, except on i386, where ibcs2 lives.

293490 09-Jan-2016 dchagin

MFC r283382:

In preparation for switching the linuxulator to native 1:1 threads,
add a hook for cleaning up thread resources before the thread dies.

292551 21-Dec-2015 dim

MFC r277735 (by royger):

amd64: allow base memory segment to start at address different than 0

Current code requires that the first physical memory segment starts at 0,
but this is not really needed. We only need to make sure the bootstrap code
and page tables for APs are allocated below 4GB.

This patch removes this requirement and allows booting a Dell R710 from
UEFI, where the first physical memory segment starts at 0x10000.

Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D1417

292182 14-Dec-2015 kib

MFC r291948:
Use ANSI C definition.

290734 13-Nov-2015 jhb

MFC 284325:
Report the values of x86 segment registers to remote debuggers.

While here, also report %eflags from the i386 trapframe.

290731 13-Nov-2015 jhb

MFC 285783:
Various changes to the registers displayed in DDB for x86.
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.

290730 12-Nov-2015 jhb

MFC 285773,285775,285776:
Various fixes for stack unwinding in DDB on x86.

285773:
Remove some dead code from DDB's amd64 stack unwinder.

The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.

285775:
Improve stack unwinding on i386 and amd64 after an IP fault.

If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed instruction was a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.

285776:
Let the unwinder handle faults during function prologues or epilogues.

The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.

Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.

288287 27-Sep-2015 kib

MFC r288000:
Add support for weak symbols to the kernel linkers.

287945 17-Sep-2015 rstone

MFC r280957

Fix integer truncation bug in malloc(9)

A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int. This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc. zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.
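
The 2GB threshold mentioned above is simply the first size that no longer fits in a 32-bit signed int. A minimal sketch, with the hypothetical helper truncates_as_int() standing in for the kind of check a caller could make before narrowing:

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: nonzero if 'size' would change value when passed
 * through an int parameter, which is what happened to large requests.
 */
static int
truncates_as_int(size_t size)
{
	return (size > (size_t)INT_MAX);
}
```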

287128 25-Aug-2015 marcel

MFC r286808, r286809, r286867, r286868

- Improve support for Macs that have a stride not equal to the
horizontal resolution (width).
- Support frame buffers that are larger than the default screen
size.
- Support large frame buffers: add 24 more page table pages we
allocate on boot-up.

PR: 193745

287126 25-Aug-2015 marcel

MFC r286667 & r286723

Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

PR: 191564, 194952, 202276

286852 17-Aug-2015 kib

MFC r286228:
Clear the IA32_MISC_ENABLE MSR bit on APs.

286396 07-Aug-2015 kib

MFC r285643:
When checking for the valid value of the frame pointer, verify that it
belongs to the kernel stack address range for the thread.
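
One way to express such a range check is sketched below, assuming each frame holds a saved frame pointer followed by a return address. The function name and parameters are hypothetical and do not come from the commit.

```c
#include <stdint.h>

/*
 * Hypothetical check: accept 'fp' only if it is word aligned and the
 * two-word frame it points at lies wholly inside the thread's stack,
 * [stack_base, stack_top).
 */
static int
frame_in_stack(uintptr_t fp, uintptr_t stack_base, uintptr_t stack_top)
{
	const uintptr_t frame_size = 2 * sizeof(uintptr_t);

	if (stack_top < stack_base || stack_top - stack_base < frame_size)
		return (0);
	if (fp % sizeof(uintptr_t) != 0)
		return (0);
	return (fp >= stack_base && fp <= stack_top - frame_size);
}
```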

286311 05-Aug-2015 kib

Implement x86 ptrace(2) requests PT_{GET,SET}{FS,GS}BASE.

MFC r284918:
Add helper fill_based_sd(9).

MFC r284919:
Add x86 PT_GETFSBASE, PT_GETGSBASE machine-dependent ptrace requests to
obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers. The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE), override
the corresponding segment registers.

MFC r284965:
Document x86 machine-specific ptrace(2) requests.

MFC r285011:
Disallow a debugger on 64bit system to set fs/gs bases of the 32bit
process beyond the end of the process address space.

MFC r285104:
Grammar and language fixes.

286308 05-Aug-2015 kib

MFC r284921:
pcb_gs32sd has been unused for a long time; remove it. Keep the padding in pcb.

284338 13-Jun-2015 kib

MFC r284104:
Updates from SDM rev. 55.

284164 08-Jun-2015 dim

MFC r283870:

Remove unneeded NULL checks in amd64's trap_fatal().

Since td_name is an array member of struct thread, it can never be NULL,
so the check can be removed. In addition, curproc can never be NULL,
so remove the if statement, and splice the two printfs() together.

While here, remove the u_long cast, and use the correct printf format
specifier for curproc->p_pid.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D2695

284021 05-Jun-2015 kib

MFC r283735:
Remove several write-only variables.

283280 22-May-2015 whu

MFC r282212:

Microsoft vmbus, storage and other related driver enhancements for HyperV.
- Vmbus multi channel support.
- Vector interrupt support.
- Signal optimization.
- Storvsc driver performance improvement.
- Scatter and gather support for storvsc driver.
- Minor bug fix for KVP driver.
Thanks to royger, jhb and delphij from the FreeBSD community for the reviews
and comments. Also thanks to Hovy Xu from NetApp for the contributions to
the storvsc driver.

PR: 195238
Submitted by: whu
Reviewed by: royger
Approved by: royger
Relnotes: yes
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D2575

283262 21-May-2015 emaste

MFC r258431: Disable amd64 boot time memory test by default

The page presence memory test takes a long time on large memory systems
and has little value on contemporary amd64 hardware.

Relnotes: Yes
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1544

282811 12-May-2015 kib

MFC r282680:
Remove unused define.

282066 27-Apr-2015 kib

MFC r281762:
Remove duplicate definitions of MWAIT_CX hints. Identical defines in
specialreg.h are enough.

281560 15-Apr-2015 jhb

MFC 278325,280866:
Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4.

278325:
Revert the IPI startup sequence to match what is described in the
Intel Multiprocessor Specification v1.4. The Intel SDM claims that
the INIT IPIs here are invalid, but other systems follow the MP
spec instead.

While here, fix the IPI wait routine to accept a timeout in microseconds
instead of a raw spin count, and don't spin forever during AP startup.
Instead, panic if a STARTUP IPI is not delivered after 20 us.

280866:
Wait 100 microseconds for a local APIC to dispatch each startup-related IPI
rather than 20. The MP 1.4 specification states in Appendix B.2:

"A period of 20 microseconds should be sufficient for IPI dispatch to
complete under normal operating conditions".

(Note that this appears to be separate from the 10 millisecond (INIT) and
200 microsecond (STARTUP) waits after the IPIs are dispatched.) The
Intel SDM is silent on this issue as far as I can tell.

At least some hardware requires 60 microseconds as noted in the PR, so
bump this to 100 to be on the safe side.

PR: 196542, 197756
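
The difference between a 20 and a 100 microsecond budget can be illustrated with a small model of a bounded wait. Everything here is hypothetical: struct fake_apic stands in for the hardware, each poll is taken to represent one microsecond, and the real routine would read the local APIC and call a delay function instead.

```c
#include <stdbool.h>

/*
 * Stand-in for the hardware: reports "delivery pending" for the first
 * 'busy_us' polls and idle afterwards.
 */
struct fake_apic {
	int busy_us;
};

static bool
fake_ipi_pending(struct fake_apic *a)
{
	if (a->busy_us > 0) {
		a->busy_us--;	/* each poll models one microsecond */
		return (true);
	}
	return (false);
}

/*
 * Wait up to 'timeout_us' microseconds for IPI dispatch to complete.
 * Returns true if the IPI was dispatched within the budget.
 */
static bool
ipi_wait_us(struct fake_apic *a, int timeout_us)
{
	for (int us = 0; us <= timeout_us; us++) {
		if (!fake_ipi_pending(a))
			return (true);
		/* A real implementation would delay for 1 us here. */
	}
	return (false);
}
```

In this model, a device that needs 60 microseconds fails the 20 microsecond wait but passes the 100 microsecond one; a caller would panic or report an error on a false return rather than spin forever.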

280973 02-Apr-2015 jhb

MFC 276724:
On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
may also halt in C2 and not just C3 (it seems that in some cases the BIOS
advertises its C3 state as a C2 state in _CST). Just play it safe and
disable both C2 and C3 states if a user forces the use of the TSC as the
timecounter on such CPUs.

PR: 192316

280876 31-Mar-2015 kib

MFC r280781:
Make it possible for the signal handler to act on #ss. Load the
canonical user data segment selector into %ss when calling the
handler.

280875 31-Mar-2015 kib

MFC r280780:
The #ss fault handler erroneously does not check whether the fault
originated from the return to usermode. #ss must be handled the same as
#np.

280272 19-Mar-2015 markj

MFC r278655:
Add support for decoding multibyte NOPs.

280258 19-Mar-2015 rwatson

Merge r263233 from HEAD to stable/10:

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

Sponsored by: Google, Inc.


sys_machdep.c
/freebsd-10-stable/sys/amd64/linux32/linux32_machdep.c
/freebsd-10-stable/sys/arm/arm/sys_machdep.c
/freebsd-10-stable/sys/cam/ctl/ctl_frontend_iscsi.c
/freebsd-10-stable/sys/cddl/compat/opensolaris/sys/file.h
/freebsd-10-stable/sys/compat/freebsd32/freebsd32_capability.c
/freebsd-10-stable/sys/compat/freebsd32/freebsd32_ioctl.c
/freebsd-10-stable/sys/compat/freebsd32/freebsd32_misc.c
/freebsd-10-stable/sys/compat/linux/linux_file.c
/freebsd-10-stable/sys/compat/linux/linux_ioctl.c
/freebsd-10-stable/sys/compat/linux/linux_socket.c
/freebsd-10-stable/sys/compat/svr4/svr4_fcntl.c
/freebsd-10-stable/sys/compat/svr4/svr4_filio.c
/freebsd-10-stable/sys/compat/svr4/svr4_ioctl.c
/freebsd-10-stable/sys/compat/svr4/svr4_misc.c
/freebsd-10-stable/sys/compat/svr4/svr4_stream.c
/freebsd-10-stable/sys/dev/aac/aac_linux.c
/freebsd-10-stable/sys/dev/aacraid/aacraid_linux.c
/freebsd-10-stable/sys/dev/amr/amr_linux.c
/freebsd-10-stable/sys/dev/filemon/filemon.c
/freebsd-10-stable/sys/dev/hwpmc/hwpmc_logging.c
/freebsd-10-stable/sys/dev/ipmi/ipmi_linux.c
/freebsd-10-stable/sys/dev/iscsi/icl.c
/freebsd-10-stable/sys/dev/iscsi/icl_proxy.c
/freebsd-10-stable/sys/dev/iscsi_initiator/iscsi.c
/freebsd-10-stable/sys/dev/mfi/mfi_linux.c
/freebsd-10-stable/sys/dev/tdfx/tdfx_linux.c
/freebsd-10-stable/sys/fs/fdescfs/fdesc_vnops.c
/freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c
/freebsd-10-stable/sys/fs/nfsclient/nfs_clport.c
/freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdport.c
/freebsd-10-stable/sys/i386/i386/sys_machdep.c
/freebsd-10-stable/sys/i386/ibcs2/ibcs2_fcntl.c
/freebsd-10-stable/sys/i386/ibcs2/ibcs2_ioctl.c
/freebsd-10-stable/sys/i386/ibcs2/ibcs2_misc.c
/freebsd-10-stable/sys/i386/linux/linux_machdep.c
/freebsd-10-stable/sys/kern/imgact_elf.c
/freebsd-10-stable/sys/kern/kern_descrip.c
/freebsd-10-stable/sys/kern/kern_event.c
/freebsd-10-stable/sys/kern/kern_exec.c
/freebsd-10-stable/sys/kern/kern_exit.c
/freebsd-10-stable/sys/kern/kern_ktrace.c
/freebsd-10-stable/sys/kern/kern_sig.c
/freebsd-10-stable/sys/kern/kern_sysctl.c
/freebsd-10-stable/sys/kern/subr_capability.c
/freebsd-10-stable/sys/kern/subr_syscall.c
/freebsd-10-stable/sys/kern/subr_trap.c
/freebsd-10-stable/sys/kern/sys_capability.c
/freebsd-10-stable/sys/kern/sys_generic.c
/freebsd-10-stable/sys/kern/sys_procdesc.c
/freebsd-10-stable/sys/kern/tty.c
/freebsd-10-stable/sys/kern/uipc_mqueue.c
/freebsd-10-stable/sys/kern/uipc_sem.c
/freebsd-10-stable/sys/kern/uipc_shm.c
/freebsd-10-stable/sys/kern/uipc_syscalls.c
/freebsd-10-stable/sys/kern/uipc_usrreq.c
/freebsd-10-stable/sys/kern/vfs_acl.c
/freebsd-10-stable/sys/kern/vfs_aio.c
/freebsd-10-stable/sys/kern/vfs_extattr.c
/freebsd-10-stable/sys/kern/vfs_lookup.c
/freebsd-10-stable/sys/kern/vfs_syscalls.c
/freebsd-10-stable/sys/netsmb/smb_dev.c
/freebsd-10-stable/sys/nfsserver/nfs_srvkrpc.c
/freebsd-10-stable/sys/security/mac/mac_syscalls.c
/freebsd-10-stable/sys/sparc64/sparc64/sys_machdep.c
/freebsd-10-stable/sys/ufs/ffs/ffs_alloc.c
/freebsd-10-stable/sys/vm/vm_mmap.c
279921 12-Mar-2015 jhb

MFC 277713:
If the boot-time memory test is enabled, output a dot ('.') for
each GB of RAM tested so people watching the console can see that
the machine is making progress and not hung.

PR: 196650

279211 23-Feb-2015 jhb

MFC 274817,274878,276801,276840,278976:
Improve support for XSAVE with debuggers.
- Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed
to match what Linux does in that 1) it dumps the entire XSAVE area
including the fxsave state, and 2) it stashes a copy of the current
xsave mask in the unused padding between the fxsave state and the
xstate header at the same location used by Linux.
- Teach readelf() to recognize NT_X86_XSTATE notes.
- Change PT_GET/SETXSTATE to take the entire XSAVE state instead of
only the extra portion. This avoids having to always make two
ptrace() calls to get or set the full XSAVE state.
- Add a PT_GET_XSTATE_INFO which returns the length of the current
XSTATE save area (so the size of the buffer needed for PT_GETXSTATE)
and the current XSAVE mask (%xcr0).

278347 07-Feb-2015 kib

MFC r278001:
Do not qualify the mcontext_t *mcp argument for set_mcontext(9) as const.

277379 19-Jan-2015 kib

MFC r277055:
Revert r263475: TDP_DEVMEMIO no longer needed.

277377 19-Jan-2015 kib

MFC r277051:
Fix several issues with /dev/mem and /dev/kmem devices on amd64.

277374 19-Jan-2015 kib

MFC r277047:
For x86, read MAXPHYADDR into variable cpu_maxphyaddr.

276871 09-Jan-2015 kib

MFC r276523:
Restore access to the page at zero through /dev/mem after r263475.

276870 09-Jan-2015 kib

MFC r276522:
Actually remove GIANT_REQUIRED, declared but not done in r263475.
Style.

276553 02-Jan-2015 alc

MFC r270961
Update a comment to reflect the changes in r213408.

276546 02-Jan-2015 alc

MFC r273701, r274556
By the time that pmap_init() runs, vm_phys_segs[] has been initialized.
Obtaining the end of memory address from vm_phys_segs[] is a little
easier than obtaining it from phys_avail[].

Enable the use of VM_PHYSSEG_SPARSE on amd64 and i386, making it the
default on i386 PAE. (The use of VM_PHYSSEG_SPARSE on i386 PAE saves
us some precious kernel virtual address space that would have been
wasted on unused vm_page structures.)

276386 30-Dec-2014 neel

MFC 261321
Rename the AMD MSR_PERFCTR[0-3] so the Pentium Pro MSR_PERFCTR[0-1] aren't
redefined.

MFC r273214
Fix build to not bogusly always rebuild vmm.ko.

MFC r273338
Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
type PT_EPT or PT_RVI.

Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.

Add defines for various AMD-specific MSRs.

Discussed with: kib (r261321)

276084 22-Dec-2014 jhb

MFC 273988,273989,273995,274057:
MFamd64: Add support for extended FPU states on i386. This includes
support for AVX on i386.

276076 22-Dec-2014 jhb

MFC 271405,271408,271409,272658:
MFamd64: Use initializecpu() to set various model-specific registers on
AP startup and AP resume (it was already used for BSP startup and BSP
resume).

276070 22-Dec-2014 jhb

MFC 260557,271076,271077,271082,271083,271098:
- Remove spaces from boot messages when we print the CPU ID/Family/Stepping
- Move prototypes for various functions out of C files and into
<machine/md_var.h>.
- Reduce diffs between i386 and amd64 initcpu.c and identcpu.c files.
- Move blacklists of broken TSCs out of the printcpuinfo() function
and into the TSC probe routine.
- Merge the amd64 and i386 identcpu.c into a single x86 implementation.

275933 19-Dec-2014 kib

MFC r275833:
The iret instruction may generate #np and #ss faults, besides #gp.
When returning to usermode, the handlers for those exceptions are also
executed with the wrong gs base. Handle all three possible faults in the
same way, checking for iret fault, and performing full iret.

274844 22-Nov-2014 kib

MFC r274555:
Fix END()s for fueword and fueword64, match the name in END() with
entry.

274833 22-Nov-2014 scottl

MFC r274489:

Add frame pointers to ASM functions in support.S

Obtained from: Netflix

274648 18-Nov-2014 kib

Merge the fueword(9) and casueword(9). In particular,

MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.

MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.

MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used. The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.

MFC r273788 (by jkim):
Actually install casuword(9) to fix build.

MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).
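
The motivation for the fueword() conversion can be sketched in user space: with a fuword()-style interface, a fetched value of -1 cannot be told apart from a fault, while a fueword()-style interface returns the status separately from the value. The functions below are hypothetical stand-ins in which a NULL pointer plays the role of a faulting user address; they are not the kernel implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Old style: the fetched value and the error share one return channel. */
long
fuword_sketch(const long *uaddr)
{
	if (uaddr == NULL)
		return (-1);		/* simulated fault */
	return (*uaddr);		/* may legitimately be -1 as well */
}

/* New style: the status is returned, the value is stored through *val. */
int
fueword_sketch(const long *uaddr, long *val)
{
	assert(val != NULL);
	if (uaddr == NULL)
		return (-1);		/* simulated fault, *val untouched */
	*val = *uaddr;
	return (0);
}
```

A caller of the second form can check the return value first and only then trust the fetched word, which is the "proper error checking" the merged revisions refer to.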

273736 27-Oct-2014 hselasky

MFC r263710, r273377, r273378, r273423 and r273455:

- De-vnet hash sizes and hash masks.
- Fix multiple issues related to arguments passed to SYSCTL macros.

Sponsored by: Mellanox Technologies


fpu.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep-v6.c
/freebsd-10-stable/sys/arm/arm/busdma_machdep.c
/freebsd-10-stable/sys/cam/scsi/scsi_sa.c
/freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
/freebsd-10-stable/sys/cddl/dev/dtrace/dtrace_sysctl.c
/freebsd-10-stable/sys/compat/ndis/kern_ndis.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_asus_wmi.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_hp.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_ibm.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_rapidstart.c
/freebsd-10-stable/sys/dev/acpi_support/acpi_sony.c
/freebsd-10-stable/sys/dev/bxe/bxe.c
/freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c
/freebsd-10-stable/sys/dev/cxgbe/t4_main.c
/freebsd-10-stable/sys/dev/e1000/if_em.c
/freebsd-10-stable/sys/dev/e1000/if_igb.c
/freebsd-10-stable/sys/dev/e1000/if_lem.c
/freebsd-10-stable/sys/dev/hatm/if_hatm.c
/freebsd-10-stable/sys/dev/ixgbe/ixgbe.c
/freebsd-10-stable/sys/dev/ixgbe/ixv.c
/freebsd-10-stable/sys/dev/ixl/if_ixl.c
/freebsd-10-stable/sys/dev/mpr/mpr.c
/freebsd-10-stable/sys/dev/mps/mps.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.c
/freebsd-10-stable/sys/dev/mrsas/mrsas.h
/freebsd-10-stable/sys/dev/mxge/if_mxge.c
/freebsd-10-stable/sys/dev/oce/oce_sysctl.c
/freebsd-10-stable/sys/dev/qlxgb/qla_os.c
/freebsd-10-stable/sys/dev/qlxgbe/ql_os.c
/freebsd-10-stable/sys/dev/rt/if_rt.c
/freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c
/freebsd-10-stable/sys/dev/vxge/vxge.c
/freebsd-10-stable/sys/dev/xen/netfront/netfront.c
/freebsd-10-stable/sys/fs/devfs/devfs_devs.c
/freebsd-10-stable/sys/fs/fuse/fuse_main.c
/freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c
/freebsd-10-stable/sys/fs/nfsserver/nfs_nfsdkrpc.c
/freebsd-10-stable/sys/geom/geom_kern.c
/freebsd-10-stable/sys/kern/kern_cpuset.c
/freebsd-10-stable/sys/kern/kern_descrip.c
/freebsd-10-stable/sys/kern/kern_mib.c
/freebsd-10-stable/sys/kern/kern_synch.c
/freebsd-10-stable/sys/kern/subr_devstat.c
/freebsd-10-stable/sys/kern/subr_kdb.c
/freebsd-10-stable/sys/kern/subr_uio.c
/freebsd-10-stable/sys/kern/vfs_cache.c
/freebsd-10-stable/sys/mips/mips/busdma_machdep.c
/freebsd-10-stable/sys/net/if_lagg.c
/freebsd-10-stable/sys/net/pfvar.h
/freebsd-10-stable/sys/net80211/ieee80211_ht.c
/freebsd-10-stable/sys/net80211/ieee80211_hwmp.c
/freebsd-10-stable/sys/net80211/ieee80211_mesh.c
/freebsd-10-stable/sys/net80211/ieee80211_superg.c
/freebsd-10-stable/sys/netgraph/bluetooth/common/ng_bluetooth.c
/freebsd-10-stable/sys/netgraph/ng_base.c
/freebsd-10-stable/sys/netgraph/ng_socket.c
/freebsd-10-stable/sys/netinet/cc/cc_chd.c
/freebsd-10-stable/sys/netinet/tcp_reass.c
/freebsd-10-stable/sys/netipsec/ipsec.h
/freebsd-10-stable/sys/netipx/ipx_proto.c
/freebsd-10-stable/sys/netpfil/pf/if_pfsync.c
/freebsd-10-stable/sys/netpfil/pf/pf.c
/freebsd-10-stable/sys/netpfil/pf/pf_ioctl.c
/freebsd-10-stable/sys/ofed/drivers/net/mlx4/mlx4_en.h
/freebsd-10-stable/sys/powerpc/powermac/fcu.c
/freebsd-10-stable/sys/powerpc/powermac/smu.c
/freebsd-10-stable/sys/powerpc/powerpc/busdma_machdep.c
/freebsd-10-stable/sys/powerpc/powerpc/cpu.c
/freebsd-10-stable/sys/sys/sysctl.h
/freebsd-10-stable/sys/vm/memguard.c
/freebsd-10-stable/sys/vm/vm_kern.c
/freebsd-10-stable/sys/x86/x86/busdma_bounce.c
273573 24-Oct-2014 neel

MFC r273356:
Fix a race in pmap_emulate_accessed_dirty() that could trigger an EPT
misconfiguration VM-exit.

273136 15-Oct-2014 kib

MFC r272761:
Add an argument to the x86 pmap_invalidate_cache_range() to request
forced invalidation of the cache range regardless of the presence of
self-snoop feature.

MFC r272943:
MFi386 r272761.

272913 10-Oct-2014 jhb

MFC 270828,271487,271495:
Add sysctls to export the BIOS SMAP and EFI memory maps along with
handlers in the sysctl(8) binary to format them.

272540 04-Oct-2014 kib

MFC r271747:
- Use NULL instead of 0 for fpcurthread.
- Note the quirk with the interrupt enabled state of the dna handler.
- Use just panic() instead of printf() and panic(). Print tid instead
of pid, the fpu state is per-thread.

MFC r271924:
Update and clarify comments. Remove the useless counter for a situation
that should be impossible but has been seen in the wild (on buggy hypervisors).

271999 22-Sep-2014 jhb

MFC 270850,271053,271192,271717:
Save and restore FPU state across suspend and resume on i386.
- Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.
- Store the FPU state for suspend and resume in the new structure
(for amd64, this moves it out of the PCB)
- On both i386 and amd64, all of the FPU suspend/resume handling is now
done in C.

Approved by: re (hrs)

271905 20-Sep-2014 kib

MFC r271716:
Presence of any VM_PROT bits in the permission argument on x86 implies
that the entry is readable and valid.

Approved by: re (gjb)

271541 13-Sep-2014 pfg

MFC r271149:
Apply known workarounds for less modern MacBooks.

The legacy USB circuit tends to give trouble on older MacBooks.
While the original report covered MacBook4, extend the fix
preemptively for the newer MacBookPro4 too.

PR: 191693
Reviewed by: emaste
Approved by: re

271289 08-Sep-2014 emaste

MFC r265014: Report boot method (BIOS/UEFI) via sysctl machdep.bootmethod

Approved by: re
Sponsored by: The FreeBSD Foundation

271071 04-Sep-2014 pfg

MFC r270844:
Minor space/tab cleanups.

Most of them were ripped from the GSoC 2014
SMAP + kpatch project (but unrelated).
Only cosmetic changes.

Taken from: Oliver Pinter (op@)

270988 02-Sep-2014 emaste

MFC automatic vt(4) selection for UEFI boot

r268158: Prefer vt(4) for UEFI boot

The UEFI framebuffer driver vt_efifb requires vt(4), so add a
mechanism for the startup routine to set the preferred console.
This change is ugly because console init happens very early in the
boot, making a cleaner interface difficult. This change is intended
only to facilitate the sc(4) / vt(4) transition, and can be reverted
once vt(4) is the default.

r268160: Fix typos in VTY constant names from r268158

r268982: Don't pass null kmdp to preload_search_info

On Xen PVH guests kmdp == NULL.

Sponsored by: The FreeBSD Foundation

270920 01-Sep-2014 kib

Fix a leak of the wired pages when unwiring a PROT_NONE-mapped
wired region. Rework the handling of unwire to do it in batch,
both at pmap and object level.

All commits below are by alc.

MFC r268327:
Introduce pmap_unwire().

MFC r268591:
Implement pmap_unwire() for powerpc.

MFC r268776:
Implement pmap_unwire() for arm.

MFC r268806:
pmap_unwire(9) man page.

MFC r269134:
When unwiring a region of an address space, do not assume that the
underlying physical pages are mapped by the pmap. This fixes a leak
of the wired pages on the unwiring of the region mapped with no access
allowed.

MFC r269339:
In the implementation of the new function pmap_unwire(), the call to
MOEA64_PVO_TO_PTE() must be performed before any changes are made to the
PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.

MFC r269365:
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)

MFC r269433:
Handle wiring failures in vm_map_wire() with the new functions
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.

MFC r269438:
Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable
"rv" is uninitialized.

MFC r269485:
Retire pmap_change_wiring().

Reviewed by: alc

270439 24-Aug-2014 kib

Merge the changes to pmap_enter(9) for sleep-less operation (requested
by flag). The ia64 pmap.c changes are direct commit, since ia64 is
removed on head.

MFC r269368 (by alc):
Retire PVO_EXECUTABLE.

MFC r269728:
Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused).

MFC r269759 (by alc):
Update the text of a KASSERT() to reflect the changes in r269728.

MFC r269822 (by alc):
Change {_,}pmap_allocpte() so that they look for the flag
PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether
to sleep on page table page allocation.

MFC r270151 (by alc):
Replace KASSERT that no PV list locks are held with a conditional
unlock.

Reviewed by: alc
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation

270344 22-Aug-2014 emaste

MFC r263822: amd64: Parse the EFI memory map if present

With this change (and loader.efi from [HEAD]) we can now boot under
qemu using the OVMF UEFI firmware image with the limitation that a
serial console is required.

Sponsored by: The FreeBSD Foundation

270159 19-Aug-2014 grehan

MFC r267921, r267934, r267949, r267959, r267966, r268202, r268276,
r268427, r268428, r268521, r268638, r268639, r268701, r268777,
r268889, r268922, r269008, r269042, r269043, r269080, r269094,
r269108, r269109, r269281, r269317, r269700, r269896, r269962,
r269989.

Catch bhyve up to CURRENT.

Lightly tested with FreeBSD i386/amd64, Linux i386/amd64, and
OpenBSD/amd64. Still resolving an issue with OpenBSD/i386.

Many thanks to jhb@ for all the hard work on the prior MFCs !

r267921 - support the "mov r/m8, imm8" instruction
r267934 - document options
r267949 - set DMI vers/date to fixed values
r267959 - doc: sort cmd flags
r267966 - EPT misconf post-mortem info
r268202 - use correct flag for event index
r268276 - 64-bit virtio capability api
r268427 - invalidate guest TLB when cr3 is updated, needed for TSS
r268428 - identify vcpu's operating mode
r268521 - use correct offset in guest logical-to-linear translation
r268638 - chs value
r268639 - chs fake values
r268701 - instr emul operand/address size override prefix support
r268777 - emulation for legacy x86 task switching
r268889 - nested exception support
r268922 - fix INVARIANTS build
r269008 - emulate instructions found in the OpenBSD/i386 5.5 kernel
r269042 - fix fault injection
r269043 - Reduce VMEXIT_RESTARTs in task_switch.c
r269080 - fix issues in PUSH emulation
r269094 - simplify return values from the inout handlers
r269108 - don't return -1 from the push emulation handler
r269109 - avoid permanent sleep in vm_handle_hlt()
r269281 - list VT-x features in base kernel dmesg
r269317 - Mark AHCI fatal errors as not completed
r269700 - Support PCI extended config space in bhyve
r269896 - Minor cleanup
r269962 - use max guest memory when creating IOMMU domain
r269989 - fix interrupt mode names


/freebsd-10-stable/lib/libvmmapi/vmmapi.c
/freebsd-10-stable/lib/libvmmapi/vmmapi.h
identcpu.c
/freebsd-10-stable/sys/amd64/include/vmm.h
/freebsd-10-stable/sys/amd64/include/vmm_dev.h
/freebsd-10-stable/sys/amd64/include/vmm_instruction_emul.h
/freebsd-10-stable/sys/amd64/vmm/intel/vmcs.c
/freebsd-10-stable/sys/amd64/vmm/intel/vmcs.h
/freebsd-10-stable/sys/amd64/vmm/intel/vmx.c
/freebsd-10-stable/sys/amd64/vmm/intel/vmx_msr.c
/freebsd-10-stable/sys/amd64/vmm/intel/vmx_msr.h
/freebsd-10-stable/sys/amd64/vmm/intel/vtd.c
/freebsd-10-stable/sys/amd64/vmm/io/vatpic.c
/freebsd-10-stable/sys/amd64/vmm/vmm.c
/freebsd-10-stable/sys/amd64/vmm/vmm_dev.c
/freebsd-10-stable/sys/amd64/vmm/vmm_instruction_emul.c
/freebsd-10-stable/sys/x86/include/specialreg.h
/freebsd-10-stable/usr.sbin/bhyve/Makefile
/freebsd-10-stable/usr.sbin/bhyve/acpi.c
/freebsd-10-stable/usr.sbin/bhyve/atkbdc.c
/freebsd-10-stable/usr.sbin/bhyve/bhyve.8
/freebsd-10-stable/usr.sbin/bhyve/bhyverun.c
/freebsd-10-stable/usr.sbin/bhyve/bhyverun.h
/freebsd-10-stable/usr.sbin/bhyve/block_if.c
/freebsd-10-stable/usr.sbin/bhyve/block_if.h
/freebsd-10-stable/usr.sbin/bhyve/inout.c
/freebsd-10-stable/usr.sbin/bhyve/inout.h
/freebsd-10-stable/usr.sbin/bhyve/mem.c
/freebsd-10-stable/usr.sbin/bhyve/mem.h
/freebsd-10-stable/usr.sbin/bhyve/pci_ahci.c
/freebsd-10-stable/usr.sbin/bhyve/pci_emul.c
/freebsd-10-stable/usr.sbin/bhyve/pci_emul.h
/freebsd-10-stable/usr.sbin/bhyve/pci_irq.c
/freebsd-10-stable/usr.sbin/bhyve/pm.c
/freebsd-10-stable/usr.sbin/bhyve/smbiostbl.c
/freebsd-10-stable/usr.sbin/bhyve/task_switch.c
/freebsd-10-stable/usr.sbin/bhyve/virtio.c
/freebsd-10-stable/usr.sbin/bhyve/virtio.h
/freebsd-10-stable/usr.sbin/bhyvectl/bhyvectl.c
/freebsd-10-stable/usr.sbin/bhyveload/bhyveload.8
/freebsd-10-stable/usr.sbin/bhyveload/bhyveload.c
269752 09-Aug-2014 markj

MFC r266826, r266827
Move some duplicated hook definitions from machine-dependent files to
kern_dtrace.c.

269402 01-Aug-2014 emaste

MFC r258436: Refactor amd64 startup SMAP parsing

Extracted from the projects/uefi branch, this change is a reasonable
cleanup and will reduce the diffs to review when bringing in the
UEFI work.

269253 29-Jul-2014 markj

MFC r263329:
Only invoke fasttrap hooks for traps from user mode, and ensure that they're
called with interrupts enabled. Calling fasttrap_pid_probe() with interrupts
disabled can lead to deadlock if fasttrap writes to the process' address
space.

269238 29-Jul-2014 marius

MFC: r269051

Copying pages via temporary mappings in the !DMAP case of pmap_copy_pages()
involves updating the corresponding page tables followed by accesses to the
pages in question. This sequence is subject to the situation exactly described
in the "AMD64 Architecture Programmer's Manual Volume 2: System Programming"
rev. 3.23, "7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore,
issuing the INVLPG right after modifying the PTE bits is crucial (see also
r269050, MFCed to stable/10 in r269235).
For the amd64 PMAP code, the order of instructions was already correct. The
above fact still is worth documenting, though.

1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf

Reviewed by: alc
Sponsored by: Bally Wulff Games & Entertainment GmbH

269072 24-Jul-2014 kib

MFC r267213 (by alc):
Add a page size field to struct vm_page.

Approved by: alc

269060 24-Jul-2014 emaste

MFC r258471: Don't abort SMAP processing after an entry of length 0

Length 0 is not special and should just be skipped. This is the same
behaviour as i386.
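
The behaviour change can be sketched as a loop that skips a zero-length entry with `continue` instead of ending the walk. The structure layout, the type constant, and the function name below are assumptions for illustration, not the definitions used in machdep.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SMAP entry. */
struct smap_entry_sketch {
	uint64_t base;
	uint64_t length;
	uint32_t type;
};

#define SMAP_TYPE_MEMORY_SKETCH 1	/* assumed value for usable RAM */

/* Sum the usable memory described by the map. */
uint64_t
smap_usable_bytes_sketch(const struct smap_entry_sketch *map, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (map[i].length == 0)
			continue;	/* skip the entry, do not stop */
		if (map[i].type != SMAP_TYPE_MEMORY_SKETCH)
			continue;	/* not usable RAM */
		total += map[i].length;
	}
	return (total);
}
```

With a `break` in place of the first `continue`, every entry after the empty one would be silently ignored.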

Sponsored by: The FreeBSD Foundation

269056 24-Jul-2014 kib

MFC r268660:
Make amd64 pmap_copy_pages() functional for pages not mapped by DMAP.

268742 16-Jul-2014 kib

MFC r268471:
For safety, ensure that any consumer of set_regs() and
ptrace_set_pc() uses the correct return to userspace using iret.

268661 15-Jul-2014 kib

MFC r268383:
Correct si_code for the SIGBUS signal generated by the alignment trap.

268033 30-Jun-2014 kib

MFC r267767:
Add FPU_KERN_KTHR flag to fpu_kern_enter(9).
Apply the flag to padlock(4) and aesni(4).
In aesni_cipher_process(), do not leak FPU context state on error.

267964 27-Jun-2014 jhb

MFC 261781:
Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1. For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.

267598 17-Jun-2014 neel

MFC r266901
Allocate a zeroed LDT.
Failing to do this might result in the LDT appearing to run out of free
descriptors because of random junk in the descriptor's 'sd_type' field.

267418 12-Jun-2014 jhb

MFC 266263,266551,266552:
- Add definitions for more structured extended features as well as
XSAVE Extended Features for AVX512 and MPX (Memory Protection Extensions).
- Don't permit users to request a subset of the AVX512 or MPX xsave masks.

267083 05-Jun-2014 kib

MFC r266846:
When usermode has loaded a non-default segment selector into %gs,
correctly prepare KGSBASE msr to restore the user descriptor base on
the last swapgs during return to usermode.

266339 17-May-2014 jhb

MFC 259641,259863,259924,259937,259961,259978,260380,260383,260410,260466,
260531,260532,260550,260619,261170,261453,261621,263280,263290,264516:
Add support for local APIC hardware-assist.
- Restructure vlapic access and register handling to support hardware-assist
for the local APIC.
- Use the 'Virtual Interrupt Delivery' and 'Posted Interrupt Processing'
feature of Intel VT-x if supported by hardware.
- Add an API to rendezvous all active vcpus in a virtual machine and use
it to support level triggered interrupts with VT-x 'Virtual Interrupt
Delivery'.
- Use a cheaper IPI handler than IPI_AST for nested page table shootdowns
and avoid doing unnecessary nested TLB invalidations.

Reviewed by: neel

266312 17-May-2014 ian

MFC 263036, 263059: delete advertising clause in licenses, renumber.

265606 07-May-2014 scottl

Merge r264984

Retire smp_active. It was racy and caused demonstrated problems with
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Obtained from: Netflix, Inc.

265476 07-May-2014 alc

MFC r262338
When the kernel is running in a virtual machine, it cannot rely upon the
processor family to determine if the workaround for AMD Family 10h Erratum
383 should be enabled. To enable virtual machine migration among a
heterogeneous collection of physical machines, the hypervisor may have
been configured to report an older processor family with a reduced feature
set. Effectively, the reported processor family and its features are like
a "least common denominator" for the collection of machines.

Therefore, when the kernel is running in a virtual machine, instead of
relying upon the processor family, we now test for features that prove
that the underlying processor is not affected by the erratum. (The
features that we test for are unlikely to ever be emulated in software
on an affected physical processor.)

PR: 186061

265312 04-May-2014 kib

MFC r265004:
Same as it was done in r263878 for invlrng_handler(), fix order of
checks for special pcid values in invlpg_pcid_handler().

264147 05-Apr-2014 kib

MFC r263912:
Clear the kernel grab of the FPU state on fork.

264134 04-Apr-2014 kib

MFC r263878:
Several fixes for the PCID implementation.

264118 04-Apr-2014 royger

MFC r263001

Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Approved by: gibbs
Sponsored by: Citrix Systems R&D

263875 28-Mar-2014 kib

MFC r263475:
Fix two issues with /dev/mem access on amd64, both causing kernel page
faults.

First, accesses to the direct map region should check for the limit by
which the direct map is instantiated.

Second, for accesses to the kernel map, use a new thread private flag
TDP_DEVMEMIO, which instructs vm_fault() to return an error when a fault
happens on the MAP_ENTRY_NOFAULT entry, instead of panicking.

MFC r263498:
Add change forgotten in r263475. Make dmaplimit accessible outside
amd64/pmap.c.

262981 10-Mar-2014 jkim

MFC: r262746, r262748, r262750, r262752

Move fpusave() wrapper for suspend handler to sys/amd64/amd64/fpu.c.

262753 04-Mar-2014 emaste

Disable amd64 TLB Context ID (pcid) by default for now

There are a number of reports of userspace application crashes that
are "solved" by setting vm.pmap.pcid_enabled=0, including Java and the
x11/mate-terminal port (PR ports/184362).

For now apply a band-aid of disabling pcid by default in stable/10.

Sponsored by: The FreeBSD Foundation

262042 17-Feb-2014 avg

MFC r257417: Remove references to an unused fasttrap probe hook

261275 29-Jan-2014 jhb

MFC 259782:
Add a resume hook for bhyve that runs a function on all CPUs during
resume. For Intel CPUs, invoke vmxon for CPUs that were in VMX mode
at the time of suspend.

260467 09-Jan-2014 kib

MFC r260205:
Update the description for pmap_remove_pages() to match the modern
times. Assert that the pmap passed to pmap_remove_pages() is only
active on current CPU.

260464 09-Jan-2014 kib

MFC r260204:
Assert that accounting for the pmap resident pages does not underflow.

260289 04-Jan-2014 dim

MFC r260103:

In sys/amd64/amd64/pmap.c, remove static function pmap_is_current(),
which has been unused since r189415.

Reviewed by: alc

258996 05-Dec-2013 royger

MFC 258176:

Fix accounting for hw.realmem on the i386 and amd64 platforms.

sys/i386/i386/machdep.c:
sys/amd64/amd64/machdep.c:
The value reported by FreeBSD as "real memory" when booting
doesn't match what is later reported by sysctl as hw.realmem.
This is due to the fact that the value printed during the
boot process is fetched from smbios data (when possible),
and accounts for holes in physical memory. On the other
hand, the value of hw.realmem is unconditionally set to be
one larger than the highest page of the physical address
space.

Fix this by setting hw.realmem to the same value printed
during boot; this makes hw.realmem honour its name and
account properly for physical memory present in the system.
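
The discrepancy between the two accounting methods can be sketched with a hypothetical list of physical segments: "one past the highest address" counts the holes between segments, whereas summing segment lengths does not. The structure and function names are assumptions for illustration, and the sketch does not model the smbios lookup mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical segment covering [start, end). */
struct phys_seg_sketch {
	uint64_t start;
	uint64_t end;
};

/* Old behaviour as described: one past the highest physical address. */
uint64_t
highest_end_sketch(const struct phys_seg_sketch *segs, size_t n)
{
	uint64_t max = 0;
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (segs[i].end > max)
			max = segs[i].end;
	return (max);
}

/* Hole-aware accounting: only bytes that are actually present. */
uint64_t
present_bytes_sketch(const struct phys_seg_sketch *segs, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += segs[i].end - segs[i].start;
	return (total);
}
```

For a map with a hole between 0xA0000 and 0x100000, the first function reports more memory than is installed, which is the kind of mismatch the change removes.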

Submitted by: Roger Pau Monné
Reviewed by: gibbs
Approved by: gibbs (mentor)
Approved by: re (gjb)

258559 25-Nov-2013 emaste

MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)

Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could
be modified via sigreturn, so this change should not provide any new
ability to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by: jhb, kib

Sponsored by: DARPA, AFRL
Approved by: re (gjb)

258159 15-Nov-2013 kib

MFC r257856:
Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Approved by: re (gjb)

257575 03-Nov-2013 kib

MFC r257216:
Several small fixes for the amd64 minidump code.

Approved by: re (gjb)

256869 22-Oct-2013 neel

MFC r256645.

Add a new capability, VM_CAP_ENABLE_INVPCID, that can be enabled to expose
'invpcid' instruction to the guest. Currently bhyve will try to enable this
capability unconditionally if it is available.

Consolidate code in bhyve to set the capabilities so it is no longer
duplicated in BSP and AP bringup.

Add a sysctl 'vm.pmap.invpcid_works' to display whether the 'invpcid'
instruction is available.

Approved by: re (hrs)

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256166 08-Oct-2013 dim

In sys/amd64/amd64/pmap.c, fix several gcc warnings about uninitialized
variables in reclaim_pv_chunk().

Approved by: re (marius)
Reviewed by: neel, kib
X-MFC-With: r256072


256072 05-Oct-2013 neel

Merge projects/bhyve_npt_pmap into head.

Make the amd64/pmap code aware of nested page table mappings used by bhyve
guests. This allows bhyve to associate each guest with its own vmspace and
deal with nested page faults in the context of that vmspace. This also
enables features like accessed/dirty bit tracking, swapping to disk and
transparent superpage promotions of guest memory.

Guest vmspace:
Each bhyve guest has a unique vmspace to represent the physical memory
allocated to the guest. Each memory segment allocated by the guest is
mapped into the guest's address space via the 'vmspace->vm_map' and is
backed by an object of type OBJT_DEFAULT.

pmap types:
The amd64/pmap now understands two types of pmaps: PT_X86 and PT_EPT.

The PT_X86 pmap type is used by the vmspace associated with the host kernel
as well as user processes executing on the host. The PT_EPT pmap is used by
the vmspace associated with a bhyve guest.

Page Table Entries:
The EPT page table entries are mostly similar in functionality to regular
page table entries although there are some differences in terms of what
bits are used to express that functionality. For example, the dirty bit is
represented by bit 9 in the nested PTE as opposed to bit 6 in the regular
x86 PTE. Therefore the bitmask representing the dirty bit is now computed
at runtime based on the type of the pmap. Thus PG_M that was previously a
macro now becomes a local variable that is initialized at runtime using
'pmap_modified_bit(pmap)'.
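
A minimal sketch of the idea, using only the bit positions stated above (bit 6 for a regular x86 PTE, bit 9 for a nested PTE): the dirty-bit mask is looked up from the pmap type at run time and held in a local variable instead of a macro. All identifiers are hypothetical, and the sketch ignores the A/D emulation mapping described further below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the text. */
enum pmap_type_sketch { PT_X86_SKETCH, PT_EPT_SKETCH };

#define X86_DIRTY_SKETCH (UINT64_C(1) << 6)	/* regular x86 PTE */
#define EPT_DIRTY_SKETCH (UINT64_C(1) << 9)	/* nested (EPT) PTE */

/* Return the dirty-bit mask that applies to this type of pmap. */
uint64_t
modified_bit_sketch(enum pmap_type_sketch type)
{
	switch (type) {
	case PT_X86_SKETCH:
		return (X86_DIRTY_SKETCH);
	case PT_EPT_SKETCH:
		return (EPT_DIRTY_SKETCH);
	}
	assert(0 && "unknown pmap type");
	return (0);
}

/* Test a PTE for the dirty bit using the per-type mask. */
int
pte_is_dirty_sketch(enum pmap_type_sketch type, uint64_t pte)
{
	uint64_t pg_m = modified_bit_sketch(type);	/* local, not a macro */

	return ((pte & pg_m) != 0);
}
```

Keeping the mask in a local named like the former macro lets the rest of a function body stay textually close to the pre-EPT code.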

An additional wrinkle associated with EPT mappings is that older Intel
processors don't have hardware support for tracking accessed/dirty bits in
the PTE. This means that the amd64/pmap code needs to emulate these bits to
provide proper accounting to the VM subsystem. This is achieved by using
the following mapping for EPT entries that need emulation of A/D bits:
	Bit	Position	Interpreted By
	PG_V	52		software (accessed bit emulation handler)
	PG_RW	53		software (dirty bit emulation handler)
	PG_A	0		hardware (aka EPT_PG_RD)
	PG_M	1		hardware (aka EPT_PG_WR)

The idea to use the mapping listed above for A/D bit emulation came from
Alan Cox (alc@).

The final difference with respect to x86 PTEs is that some EPT implementations
do not support superpage mappings. This is recorded in the 'pm_flags' field
of the pmap.

TLB invalidation:
The amd64/pmap code has a number of ways to do invalidation of mappings
that may be cached in the TLB: single page, multiple pages in a range or the
entire TLB. All of these funnel into a single EPT invalidation routine called
'pmap_invalidate_ept()'. This routine bumps up the EPT generation number and
sends an IPI to the host cpus that are executing the guest's vcpus. On a
subsequent entry into the guest it will detect that the EPT has changed and
invalidate the mappings from the TLB.
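
The generation-number scheme can be sketched as follows: every invalidation request bumps a counter, and each vcpu compares the counter with the value it last saw when it next enters the guest. This is a single-threaded illustration with hypothetical structures and function names; the IPI and any atomic operations used by the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

struct ept_sketch {
	unsigned long gen;		/* bumped on every invalidation */
};

struct vcpu_sketch {
	unsigned long seen_gen;		/* generation at last guest entry */
	unsigned long flushes;		/* number of flushes performed */
};

/* All invalidation flavours funnel into one generation bump. */
void
invalidate_ept_sketch(struct ept_sketch *ept)
{
	assert(ept != NULL);
	ept->gen++;
	/* the real code would also IPI the host CPUs running the vcpus */
}

/* On guest entry, flush only if the generation moved. Returns 1 if so. */
int
guest_enter_sketch(struct vcpu_sketch *vcpu, const struct ept_sketch *ept)
{
	assert(vcpu != NULL && ept != NULL);
	if (vcpu->seen_gen == ept->gen)
		return (0);		/* cached mappings are still valid */
	vcpu->seen_gen = ept->gen;
	vcpu->flushes++;		/* stand-in for the TLB invalidation */
	return (1);
}
```

Several invalidations between two guest entries collapse into a single flush, since only the difference in generation matters.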

Guest memory access:
Since the guest memory is no longer wired we need to hold the host physical
page that backs the guest physical page before we can access it. The helper
functions 'vm_gpa_hold()/vm_gpa_release()' are available for this purpose.

PCI passthru:
Guests with PCI passthru devices will wire the entire guest physical address
space. The MMIO BAR associated with the passthru device is backed by a
vm_object of type OBJT_SG. An IOMMU domain is created only for guests that
have one or more PCI passthru devices attached to them.

Limitations:
There isn't a way to map a guest physical page without execute permissions.
This is because the amd64/pmap code interprets the guest physical mappings as
user mappings since they are numerically below VM_MAXUSER_ADDRESS. Since PG_U
shares the same bit position as EPT_PG_EXECUTE all guest mappings become
automatically executable.

Thanks to Alan Cox and Konstantin Belousov for their rigorous code reviews
as well as their support and encouragement.

Thanks to John Baldwin for reviewing the use of OBJT_SG as the backing
object for pci passthru mmio regions.

Special thanks to Peter Holm for testing the patch on short notice.

Approved by: re
Discussed with: grehan
Reviewed by: alc, kib
Tested by: pho


255849 24-Sep-2013 kib

In pmap_clear_modify(), initialize pvh even for fictitious managed
page, otherwise the small mappings loop would use uninitialized value.
Note that currently pmap_clear_modify() is not called for fictitious
pages.

Sponsored by: The FreeBSD Foundation
Approved by: re (glebius)


255845 24-Sep-2013 kib

Use the pv lists generation count to read-lock the pvh_global_lock in
pmap_clear_modify().

Noted and reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)


255844 24-Sep-2013 kib

Ensure that the ERESTART return from the syscall reloads the
registers, to make the restarted syscall instruction pass the correct
arguments.

PR: kern/182161
Reported by: Russ Cox <rsc@swtch.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Approved by: re (marius)


255827 23-Sep-2013 kib

Free both KVA and backing pages when freeing TSS memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)


255744 20-Sep-2013 gibbs

Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
- Introduce two new CPU hooks for initialization and resume
purposes. This allows us to get rid of the XENHVM ifdefs in
mp_machdep, and also sets some hooks into common code that can be
used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
- Remove these configs now that GENERIC has builtin support for Xen
HVM.

sys/kern/subr_smp.c:
- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
- Add cpu init and resume vectors that are called from mp_machdep
using the new hooks.
- Only clear the vcpu_info mapping data on resume. It is already
clear for the BSP on a cold boot and is set correctly as APs
are started.
- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
- Gate the setup of event channels only to systems running under Xen.


255726 20-Sep-2013 gibbs

Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that there are no MMU related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.

sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255677 18-Sep-2013 pjd

Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by: Daniel Peyrolon
Tested by: Daniel Peyrolon
Approved by: re (marius)


255607 16-Sep-2013 kib

In pmap_copy(), when the copied region is mapped with superpage but does
not cover entire superpage, avoid copying. Doing partial copy would
require demotion, which is incompatible with the already held locks.

Reported by: cperciva
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (delphij)


255409 08-Sep-2013 alc

Prior to r254304, we only began scanning the active page queue when the
amount of free memory was close to the point at which we would begin
reclaiming pages. Now, we continuously scan the active page queue,
regardless of the amount of free memory. Consequently, we are continuously
calling pmap_ts_referenced() on active pages.

Prior to this change, pmap_ts_referenced() would always demote superpage
mappings in order to obtain finer-grained reference information. This made
sense because we were coming under memory pressure and would soon have to
begin reclaiming pages. Now, however, with continuous scanning of the
active page queue, these demotions are taking a toll on performance. For
example, on one of my test machines, the running time for the HPCC Random
Access benchmark (also known as GUPS) has increased by 54%. To address this
problem, I have replaced the demotion with a heuristic for periodically
clearing the reference flag on superpage mappings.

Reviewed by: kib
Approved by: re (glebius)
Sponsored by: EMC / Isilon Storage Division


255331 06-Sep-2013 gibbs

Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by: gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
Move invltlb_globpcid() into cpufunc.h so that it can be
used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
and remove the ipi vector parameter. This api allocates
an event channel port that can be used for ipi services,
but knows nothing of the actual ipi for which that port
will be used. Removing the unused argument and cleaning
up the comments surrounding its declaration helps clarify
its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
Implement a generic framework for amd64 and i386 that allows
the implementation of certain CPU management functions to
be selected at runtime. Currently this is only used for
the ipi send function, which we optimize for Xen when running
on a Xen hypervisor, but can easily be expanded to support
more operations.
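
The runtime-selected CPU management functions described above can be
pictured as a small table of function pointers that is filled in once
at startup. The following is only a minimal sketch of that idea: the
structure, function, and counter names are hypothetical and are not
taken from the commit.

```c
#include <stdbool.h>

/* Hypothetical table of CPU management hooks (names are illustrative). */
struct cpu_ops_sketch {
	void (*ipi_send)(unsigned vector, int dest);
};

/* Counters stand in for the real side effects of sending an IPI. */
static int native_sent;
static int xen_sent;

static void
native_ipi_send(unsigned vector, int dest)
{
	(void)vector;
	(void)dest;
	native_sent++;
}

static void
xen_ipi_send(unsigned vector, int dest)
{
	(void)vector;
	(void)dest;
	xen_sent++;
}

/* Default to the native implementation. */
static struct cpu_ops_sketch cpu_ops_sketch = {
	.ipi_send = native_ipi_send
};

/* Called once during startup, after the hypervisor has been detected. */
static void
select_cpu_ops(bool running_on_xen)
{
	if (running_on_xen)
		cpu_ops_sketch.ipi_send = xen_ipi_send;
}

/* Callers always go through the table and never test for Xen. */
static void
ipi_send(unsigned vector, int dest)
{
	cpu_ops_sketch.ipi_send(vector, dest);
}
```

Because callers only ever invoke ipi_send(), further hooks can be added
to the table without touching them.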

sys/x86/xen/hvm.c:
Implement Xen PV IPI handlers and operations, replacing native
send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS
is defined already for us in the xen interface files.
NR_IPIS is only needed in one file per Xen platform and is
easily inferred by the IPI vector table that is defined in
those files.

sys/i386/xen/mp_machdep.c:
Restructure to more closely match the HVM implementation by
performing table driven IPI setup.


255312 06-Sep-2013 kib

Only lock pvh_global_lock read-only for pmap_page_wired_mappings(),
pmap_is_modified() and pmap_is_referenced(), same as it was done for
pmap_ts_referenced().

Consolidate identical code for pmap_is_modified() and
pmap_is_referenced() into helper pmap_page_test_mappings().

Reviewed by: alc
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation


255311 06-Sep-2013 kib

In pmap_ts_referenced(), when restarting the loop due to pv list
generation changed, do not drop and immediately relock the pv list.

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


255289 06-Sep-2013 glebius

On those machines, where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to absolutely
empty functions.

Reviewed by: alc, kib, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


255217 04-Sep-2013 kib

Tidy up some loose ends in the PCID code:

- Restore the pre-PCID TLB shootdown handlers for whole address space
and single page invalidation asm code, and assign the IPI handler to
them when PCID is not supported or disabled. Old handlers have
linear control flow. But, still use the common return sequence.

- Stop using pcpu for INVPCID descriptors in the invlrng handler. It
is enough to allocate descriptors on the stack. As a result, two
SWAPGS instructions are shaved off from the code for Haswell+.

- Fix the reverted condition in invlrng for checking of the PCID
support [1], also in invlrng check that pmap is kernel pmap before
performing other tests. For the kernel pmap, which provides global
mappings, the INVLPG must be used for invalidation always.

- Save the pre-computed pmap's %CR3 register in the struct pmap. This
allows removing several checks for pm_pcid validity when %CR3 is
reloaded [2].

Noted by: gibbs [1]
Discussed with: alc [2]
Tested by: pho, flo
Sponsored by: The FreeBSD Foundation


255192 03-Sep-2013 jhb

Add support for the 'invpcid' instruction to binutils and DDB's
disassembler on amd64.

MFC after: 1 month


255106 31-Aug-2013 kib

Fix two build failures for non-tb configurations, UP [2] and when using gas [1].

Reported by: andreast [1], bf [2]
Sponsored by: The FreeBSD Foundation


255079 30-Aug-2013 kib

The pm_save should be cleared on the pmap initialization, and not on
the activation.

Noted by: alc


255060 30-Aug-2013 kib

Implement support for the process-context identifiers ('PCID') on
Intel CPUs. The feature tags TLB entries with the Id of the address
space and allows to avoid TLB invalidation on the context switch, it
is available only in the long mode. In the microbenchmarks, using the
PCID decreased latency of the context switches by ~30% on SandyBridge
class desktop CPUs, measured with the lat_ctx program from lmbench.

If available, use INVPCID instruction when a TLB entry in non-current
address space needs to be invalidated. The instruction is typically
available on the Haswell.

If needed, the use of PCID can be turned off with the
vm.pmap.pcid_enabled loader tunable set to 0. The state of the
feature is reported by the vm.pmap.pcid_enabled sysctl. The sysctl
vm.pmap.pcid_save_cnt reports the number of context switches which
avoided invalidating the TLB; compare with the total number of context
switches, available as sysctl vm.stats.sys.v_swtch.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


255040 29-Aug-2013 gibbs

Implement vector callback for PVHVM and unify event channel implementations

Re-structure Xen HVM support so that:
- Xen is detected and hypercalls can be performed very
early in system startup.
- Xen interrupt services are implemented using FreeBSD's native
interrupt delivery infrastructure.
- the Xen interrupt service implementation is shared between PV
and HVM guests.
- Xen interrupt handlers can optionally use a filter handler
in order to avoid the overhead of dispatch to an interrupt
thread.
- interrupt load can be distributed among all available CPUs.
- the overhead of accessing the emulated local and I/O apics
on HVM is removed for event channel port events.
- a similar optimization can eventually, and fairly easily,
be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
Reserve IDT vector 0x93 for the Xen event channel upcall
interrupt handler. On Hypervisors that support the direct
vector callback feature, we can request that this vector be
called directly by an injected HVM interrupt event, instead
of a simulated PCI interrupt on the Xen platform PCI device.
This avoids all of the overhead of dealing with the emulated
I/O APIC and local APIC. It also means that the Hypervisor
can inject these events on any CPU, allowing upcalls for
different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
Increase the FreeBSD IRQ vector table to include space
for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
Remove Xen HVM per-cpu variable data. These fields are now
allocated via the dynamic per-cpu scheme. See xen_intr.c
for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
Prefer FreeBSD primitives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
Remove constants, macros, and functions unused in FreeBSD's Xen
support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
Introduce new functions xen_domain(), xen_pv_domain(), and
xen_hvm_domain(). These are used in favor of #ifdefs so that
FreeBSD can dynamically detect and adapt to the presence of
a hypervisor. The goal is to have an HVM optimized GENERIC,
but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
Refactor magic ioport, Hypercall table and Hypervisor shared
information page setup, and move it to a dedicated HVM support
module.

HVM mode initialization is now triggered during the
SI_SUB_HYPERVISOR phase of system startup. This currently
occurs just after the kernel VM is fully setup which is
just enough infrastructure to allow the hypercall table
and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
Add definitions and a method for configuring Hypervisor event
delivery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
Adjust kernel build to reflect the refactoring of early
Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c:
Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
Since blkback defers all event handling to a taskqueue,
convert this task queue to a "fast" taskqueue, and schedule
it via an interrupt filter. This avoids an unnecessary
ithread context switch.

sys/xen/xenstore/xenstore.c:
The xenstore driver is MPSAFE. Indicate as much when
registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
Remove unused event channel APIs.

sys/xen/evtchn.h:
Remove all kernel Xen interrupt service API definitions
from this file. It is now only used for structure and
ioctl definitions related to the event channel userland
device driver.

Update the definitions in this file to match those from
NetBSD. Implementing this interface will be necessary for
Dom0 support.

sys/xen/evtchn/evtchnvar.h:
Add a header file for implementation internal APIs related
to managing event channels event delivery. This is used
to allow, for example, the event channel userland device
driver to access low-level routines that typical kernel
consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
Standardize on the evtchn_port_t type for referring to
an event channel port id. In order to prevent low-level
event channel APIs from leaking to kernel consumers who
should not have access to this data, the type is defined
twice: Once in the Xen provided event_channel.h, and again
in xen/xen_intr.h. The double declaration is protected by
__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
twice within a given compilation unit.
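
The guarded double declaration described above can be sketched as two
header fragments pasted into one compilation unit. This is only an
illustration: the uint32_t underlying type is an assumption based on
the Xen public interface headers, and the two blocks stand in for the
real contents of event_channel.h and xen_intr.h.

```c
#include <stdint.h>

/* Fragment standing in for sys/xen/interface/event_channel.h. */
#ifndef __XEN_EVTCHN_PORT_DEFINED__
typedef uint32_t evtchn_port_t;		/* assumed underlying type */
#define __XEN_EVTCHN_PORT_DEFINED__ 1
#endif

/* Fragment standing in for sys/xen/xen_intr.h. */
#ifndef __XEN_EVTCHN_PORT_DEFINED__
typedef uint32_t evtchn_port_t;		/* skipped: the guard is already set */
#define __XEN_EVTCHN_PORT_DEFINED__ 1
#endif

/* Either header alone is enough to use the type. */
static evtchn_port_t
next_port(evtchn_port_t port)
{
	return (port + 1);
}
```

Whichever fragment is seen first defines the type, so a kernel consumer
can include xen_intr.h without pulling in the low-level event channel
interface.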

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
New implementation of Xen interrupt services. This is
similar in many respects to the i386 PV implementation with
the exception that events bound to event channel ports
(i.e. not IPI, virtual IRQ, or physical IRQ) are further
optimized to avoid mask/unmask operations that aren't
necessary for these edge triggered events.

Stubs exist for supporting physical IRQ binding, but will
need additional work before this implementation can be
fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c:
sys/x86/xen/hvm.c:
Add support for placing vcpu_info into an arbitrary memory
page instead of using HYPERVISOR_shared_info->vcpu_info.
This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
Add support for new event channel implementation.


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measurable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254666 22-Aug-2013 kib

Use the generation count of the pv list to work around LOR between
pmap lock and pv list lock, and use the shared locking on
pvh_global_lock in pmap_remove_write(), same as it was done for
pmap_ts_referenced().

Noted and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254547 20-Aug-2013 neel

Fix breakage caused by r254466 in minidumpsys().

r254466 increased the KVA from 512GB to 2TB which requires 4 PDP pages as
opposed to a single one before the change. This broke minidumpsys() since
it assumed that the entire KVA could be addressed via a single PDP page.

Fix this by obtaining the address of the PDP page from the PML4 entry
associated with the KVA being dumped.

Reported by: pho
Submitted by: kib
Pointy hat to: neel
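
The arithmetic behind r254547 can be sketched as follows, assuming the
usual amd64 4-level layout in which one PDP page holds 512 entries of
1GB each. The helper names are hypothetical, and the virtual address
used in the usage note is illustrative only.

```c
#include <stdint.h>

#define NPDPEPG		512ULL			/* entries per PDP page */
#define NBPDP		(1ULL << 30)		/* bytes mapped by one PDP entry */
#define NBPML4		(NPDPEPG * NBPDP)	/* bytes mapped by one PDP page */
#define PML4SHIFT	39

/* Number of PDP pages needed to map a KVA range of the given size. */
static uint64_t
pdp_pages_for_kva(uint64_t kva_size)
{
	return ((kva_size + NBPML4 - 1) / NBPML4);
}

/* Index of the PML4 entry, and thus of the PDP page, covering va. */
static unsigned
pml4_index(uint64_t va)
{
	return ((unsigned)((va >> PML4SHIFT) & (NPDPEPG - 1)));
}
```

With these definitions a 512GB KVA needs one PDP page and a 2TB KVA
needs four. Addresses 512GB apart, such as 0xfffffe0000000000 and
0xfffffe8000000000, fall under different PML4 entries (508 and 509),
which is why the PDP page has to be looked up per address.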


254501 18-Aug-2013 kib

When code from r254064 in pmap_ts_referenced() drops pv lock and
blocks on a pmap lock, pmap_release() might proceed in parallel and
destroy the pmap mutex, since unlocked pv lock allows to remove pv
entry owned by the pmap.

For now, gate the pmap_release() on write-locked pvh_global_lock.
Since pmap_ts_referenced() does not unlock the global lock,
pmap_release() would not destroy pmap mutex until the
pmap_ts_referenced() finished. We cannot enter pmap_ts_referenced()
and encounter a pv entry for the destroyed pmap if pmap_release()
passed the global lock gate, since pmap_remove_pages() would finish
earlier.

Reported by: jeff, pho
Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254466 17-Aug-2013 neel

Bump up the maximum addressable memory on amd64 systems from 1TB to 4TB.
Bump up the KVA size proportionally from 512GB to 2TB.

The number of page table pages used by the direct map is now calculated at
run time based on 'Maxmem'. This means the small memory systems will not
see any additional tax in terms of page table pages for the direct map.

However all amd64 systems, regardless of the memory size, will use 3 more
pages to accommodate the bump in the KVA size.

More details available here:
http://lists.freebsd.org/pipermail/freebsd-hackers/2013-June/043015.html
http://lists.freebsd.org/pipermail/freebsd-current/2013-July/043143.html

Tested with the following configurations:
- Sandybridge server with 64GB of memory.
- bhyve VM with 64MB of memory.
- bhyve VM with 8GB of memory, with the memory segment above 4GB cuddling
right up against the 4TB maximum memory limit.

Discussed on: hackers@, current@
Submitted by: Chris Torek (torek@torek.net)


254374 15-Aug-2013 brooks

Use an ANSI C definition of initializecpucache() to match the declaration
and the rest of the file.


254182 10-Aug-2013 kib

Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
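
The idea of replacing an overloaded queue linkage with a union of
explicitly typed members can be sketched as below. This is a simplified
model and not the actual struct vm_page: the structure, its member
names, and the list helpers are all illustrative.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a page structure. */
struct page_sketch {
	union {
		struct {			/* page is on a paging queue */
			struct page_sketch *next;
			struct page_sketch **prevp;
		} q;
		struct {			/* page is on a private list */
			struct page_sketch *next;
		} s;
	} plinks;
	void *object;				/* owner; not used for linkage */
};

/* Push onto a private singly-linked list without any type casts. */
static void
private_list_push(struct page_sketch **head, struct page_sketch *m)
{
	m->plinks.s.next = *head;
	*head = m;
}

/* Pop from the private list; returns NULL when it is empty. */
static struct page_sketch *
private_list_pop(struct page_sketch **head)
{
	struct page_sketch *m = *head;

	if (m != NULL)
		*head = m->plinks.s.next;
	return (m);
}
```

A consumer that keeps pages on its own list names the member it uses,
so neither the queue linkage nor the object pointer needs to be
reinterpreted.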


254141 09-Aug-2013 attilio

On all the architectures, avoid preallocating the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid pre-allocating
the KVA for such nodes.

In order to do so, allow the operations derived from vm_radix_insert()
to fail and handle all the resulting failures.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl


254138 09-Aug-2013 attilio

The soft and hard busy mechanisms rely on the vm object lock to work.
Unify the two concepts into a real, minimal sxlock, where the shared
acquisition represents the soft busy and the exclusive acquisition
represents the hard busy.
The old VPO_WANTED mechanism becomes the hard path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becomes an interlock for this functionality:
it can be held in either read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavily changed, which is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254081 08-Aug-2013 neel

Use local variables with the appropriate types and eliminate a bunch of casts.

This is a cosmetic change but it does help with a proposed change to increase
the maximum size of physical memory supported on amd64 platforms.

Submitted by: Chris Torek (torek@torek.net)


254065 07-Aug-2013 kib

Split the pagequeues per NUMA domains, and split pagedaemon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread's inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages is OOM
started by a single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254064 07-Aug-2013 kib

Change the pmap_ts_referenced() method of amd64 pmap to use shared
pvh_global_lock. This allows the method to be executed in parallel,
avoiding undue contention on the pvh_global_lock for the multithreaded
pagedaemon.

The pmap_ts_referenced() function has to inspect the page mappings for
several pmaps, which need to be locked while pv list lock is owned.
This contradicts the lock order, where pmap lock is before pv list
lock. Introduce the generation count for the pv list of the page or
superpage, which indicates any change in the pv list, and, as usual,
restart the iteration if the generation changed while the pv lock
was dropped for blocking acquire of a pmap lock.

Reported and tested by: pho
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
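
The generation-count restart described in r254064 can be modelled with
a small single-threaded sketch. It uses no real locks: a hypothetical
hook stands in for "another thread modified the pv list while the pv
list lock was dropped", and all names are illustrative rather than
taken from pmap.c.

```c
#include <stddef.h>

/* Hypothetical model of a pv list with a generation counter. */
struct pv_list_sketch {
	int gen;	/* bumped on every insertion or removal */
	int nentries;	/* number of mappings on the list */
};

/* Models a concurrent change made while the pv list lock was dropped. */
typedef void (*unlocked_hook_t)(struct pv_list_sketch *);

static void
pv_list_remove_one(struct pv_list_sketch *pvl)
{
	if (pvl->nentries > 0) {
		pvl->nentries--;
		pvl->gen++;
	}
}

/*
 * Visit every entry.  Returns the number of restarts and stores the
 * number of entries seen by the final, consistent pass in *visited.
 */
static int
visit_all(struct pv_list_sketch *pvl, unlocked_hook_t hook, int *visited)
{
	int restarts = 0;

restart:
	*visited = 0;
	for (int i = 0; i < pvl->nentries; i++) {
		int gen = pvl->gen;

		/*
		 * Real code would drop the pv list lock here, block on
		 * the pmap lock, and then retake the pv list lock.
		 */
		if (hook != NULL) {
			hook(pvl);
			hook = NULL;	/* interfere only once */
		}
		if (gen != pvl->gen) {
			/* The list changed under us: start over. */
			restarts++;
			goto restart;
		}
		(*visited)++;
	}
	return (restarts);
}
```

If nothing changes the list while the lock is dropped, the loop
completes in one pass; a single interfering removal costs exactly one
restart in this model.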


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253949 05-Aug-2013 jeff

- Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space. This works around several
asserts from pmap_demote_pde_locked that did not apply and gave false
warnings.

Discovered by: pho
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division


253747 28-Jul-2013 avg

x86: detect mwait capabilities and extensions, when present

Reviewed by: kib (earlier amd64-only version)
MFC after: 2 weeks


253685 26-Jul-2013 jeff

- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
uses of kmem_malloc().

Discussed with: kib/alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253582 23-Jul-2013 neel

Fix a bug introduced in r252646 that causes a page with the PG_PTE_PAT bit set
to be interpreted as a superpage. This is because PG_PTE_PAT is at the same
bit position in PTE as PG_PS is in a PDE.

This caused a number of regressions on amd64 systems: panic when starting
X applications, freeze during shutdown etc.

Pointy hat to: me
Tested by: gperez@entel.upc.edu, joel, dumbbell
Reviewed by: kib
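
The bit aliasing behind this regression can be shown with a short
sketch. The two flag values are the amd64 definitions as recalled from
the pmap headers and should be treated as assumptions; the helper
functions are hypothetical and do not reproduce the actual fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed amd64 values: both flags occupy bit 7. */
#define PG_PS		0x080ULL	/* page size bit, meaningful in a PDE */
#define PG_PTE_PAT	0x080ULL	/* PAT bit, meaningful in a 4KB PTE */

enum pt_level { PT_LEVEL_PTE, PT_LEVEL_PDE };

/* Level-blind test: misreads a 4KB PTE with PAT set as a superpage. */
static bool
is_superpage_level_blind(uint64_t entry)
{
	return ((entry & PG_PS) != 0);
}

/* Level-aware test: only a PDE can describe a 2MB superpage. */
static bool
is_superpage(uint64_t entry, enum pt_level level)
{
	return (level == PT_LEVEL_PDE && (entry & PG_PS) != 0);
}
```

A 4KB mapping with PG_PTE_PAT set passes the level-blind test even
though it is not a superpage, which is the kind of misinterpretation
the commit message describes.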


253352 15-Jul-2013 kib

MFi386: add ddb "show sysregs" command.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253140 10-Jul-2013 kib

Clear m->object for the page taken from the delayed free list for
reuse as the pv chunk page in reclaim_pv_chunk(). Having non-NULL
m->object is wrong for page not owned by an object and confuses both
vm_page_free_toq() and vm_page_remove() when the page is freed later.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


252646 03-Jul-2013 neel

If a superpage mapping is being removed then we need to ignore the PG_PDE_PAT
bit when looking up the vm_page associated with the superpage's physical
address.

If the caching attribute for the mapping is write combining or write protected
then the PG_PDE_PAT bit will be set and thus cause an 'off-by-one' error
when looking up the vm_page.

Fix this by using the PG_PS_FRAME mask to compute the physical address for
a superpage mapping instead of PG_FRAME.

This is a theoretical issue at this point since non-writeback attributes are
currently used only for fictitious mappings and fictitious mappings are not
subject to promotion.

Discussed with: alc, kib
MFC after: 2 weeks
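
The 'off-by-one' described in r252646 can be reproduced with plain
mask arithmetic. The constants below are the amd64 values as recalled
from sys/amd64/include/pmap.h and should be treated as assumptions;
the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed amd64 values; verify against sys/amd64/include/pmap.h. */
#define PG_V		0x001ULL		/* valid */
#define PG_PS		0x080ULL		/* 2MB page (in a PDE) */
#define PG_PDE_PAT	0x1000ULL		/* PAT bit of a 2MB PDE, bit 12 */
#define PG_FRAME	0x000ffffffffff000ULL	/* 4KB frame mask */
#define PG_PS_FRAME	0x000fffffffe00000ULL	/* 2MB frame mask */

/* Pre-fix computation: PG_FRAME keeps bit 12, i.e. the PAT bit. */
static uint64_t
superpage_pa_with_pg_frame(uint64_t pde)
{
	return (pde & PG_FRAME);
}

/* Post-fix computation: PG_PS_FRAME drops everything below 2MB. */
static uint64_t
superpage_pa_with_pg_ps_frame(uint64_t pde)
{
	return (pde & PG_PS_FRAME);
}
```

For a PDE built as 0x40000000 | PG_PS | PG_PDE_PAT | PG_V, the first
helper yields 0x40001000 and the second 0x40000000. The difference is
exactly one 4KB page, so a vm_page lookup based on the first value
lands on the neighbouring page.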


251988 19-Jun-2013 kib

Some clarifications and updates for the comments, mostly retrieved
from Bruce Evans. Trim the trailing spaces.

MFC after: 1 week


251720 14-Jun-2013 neel

Remove unused macros PTESHIFT, PDESHIFT, PDPESHIFT and PML4ESHIFT.

Reviewed by: alc


251703 13-Jun-2013 jeff

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


251324 03-Jun-2013 kib

Assert that interrupts are enabled in the trap handlers on x86 before
calling generic code to deliver signals.

Discussed with: bde
Tested by: pho
MFC after: 1 week


251039 27-May-2013 kib

Use slightly more idiomatic expression to get the address of array.

Tested by: dim, pgj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251035 27-May-2013 kib

When reporting the fault details, also print %rsp.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251033 27-May-2013 kib

When handling an exception from the attempt to load the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64. The trap return path has to disable interrupts since
the sequence of loading the machine state is not atomic. The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear. Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault. For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() work
most of the time with read locks only.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


250851 21-May-2013 kib

Fix the hardware watchpoints on SMP amd64. Load the updated %dr
registers also on other CPUs, besides the CPU which happens to execute
the ddb. The debugging registers are stored in the pcpu area,
together with the command which is executed by the IPI stop handler
upon resume.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250850 21-May-2013 kib

Add amd64-specific ddb command 'show phys2dmap', which calculates the
address in the direct map corresponding to the given physical address.

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
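
A rough sketch of the calculation such a command performs, assuming the amd64
direct map is a linear window starting at a fixed base. The base value and the
helper name below are illustrative assumptions, not taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the direct map; the real value comes from <machine/vmparam.h>. */
#define EX_DMAP_MIN_ADDRESS 0xfffffe0000000000ULL

/* Hypothetical helper: direct map address for a physical address. */
static uint64_t
ex_phys_to_dmap(uint64_t pa)
{
	return (EX_DMAP_MIN_ADDRESS | pa);
}
```

The OR is equivalent to an addition as long as the physical address fits below
the alignment of the base, which holds for the illustrative value used here.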


250840 21-May-2013 marcel

Add basic support for FDT to i386 & amd64. This change includes:
1. Common headers for fdt.h and ofw_machdep.h under x86/include
with indirections under i386/include and amd64/include.
2. New modinfo for loader provided FDT blob.
3. Common x86_init_fdt() called from hammer_time() on amd64 and
init386() on i386.
4. Split-off FDT specific low-level console functions from FDT
bus methods for the uart(4) driver. The low-level console
logic has been moved to uart_cpu_fdt.c and is used for arm,
mips & powerpc only. The FDT bus methods are shared across
all architectures.
5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from: Juniper Networks, Inc.


250624 13-May-2013 ed

Improve readability of static assertions for OFFSET_* macros.

Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.
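
A minimal sketch of the pattern, using a hypothetical structure and a
hypothetical OFFSET_* constant (neither is from the tree). The kernel uses
CTASSERT(); C11 static_assert stands in for it here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure standing in for the real frame layouts. */
struct example_frame {
	uint64_t ef_rdi;
	uint64_t ef_rsi;
	uint64_t ef_rip;
};

/* Hypothetical constant of the kind hard-coded for use from assembly. */
#define OFFSET_EF_RIP 16

/* Fails the build if the structure layout and the constant ever diverge. */
static_assert(offsetof(struct example_frame, ef_rip) == OFFSET_EF_RIP,
    "OFFSET_EF_RIP does not match struct example_frame");
```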


250495 11-May-2013 rpaulo

Fix several standard extended feature bits.

Submitted by: Oliver Pinter <oliver.pntr at gmail.com>


250423 09-May-2013 dchagin

Retire write-only PCB_GS32BIT pcb flag on amd64.


250415 09-May-2013 kib

Correct the type for the literal used on the left side of the shift up
to 63 bit positions.

Do not fill the save area and do not set the saved bit in the xstate
bit vector for the state which is not marked as enabled in xsave_mask.

Reported and tested by: Jim Ohlstein <jim@ohlste.in>
MFC after: 3 days
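
The first point can be illustrated with a small sketch. The helper name is
hypothetical; the point is only that the literal being shifted must be 64 bits
wide, since shifting a plain int by 32 or more positions is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask for bit n (0..63) of a 64-bit feature vector. */
static uint64_t
ex_feature_bit(unsigned int n)
{
	/* (1 << n) would shift a 32-bit int; widen the literal first. */
	return ((uint64_t)1 << n);
}
```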


250153 01-May-2013 kib

Partially saved extended state must be handled always, i.e. for both
fpu-owned context, and for pcb-saved one. More, the XSAVE could do
partial save, same as XSAVEOPT, so qualifier for the handler should be
use_xsave and not use_xsaveopt.

Since xsave_area_desc is now needed regardless of the XSAVEOPT use,
remove the write-only use_xsaveopt variable.

In collaboration with: jhb
MFC after: 1 week


250152 01-May-2013 kib

The check to ensure that xstate_bv always has XFEATURE_ENABLED_X87 and
XFEATURE_ENABLED_SSE bits set is not needed. CPU correctly handles
any bitmask which is subset of the enabled bits in %XCR0.

More, CPU instructions XSAVE and XSAVEOPT could write the mask without
e.g. XFEATURE_ENABLED_SSE, after the VZEROALL. The check prevents the
restoration of the otherwise valid FPU save area.

In collaboration with: jhb
MFC after: 1 week


249601 18-Apr-2013 rpaulo

Print RDSEED, ADX, and SMAP.

Pointed out by: kib


249577 17-Apr-2013 rpaulo

Print more bits from the standard extended features CPUID which will be
available in the Haswell architecture (c.f. Intel Document #319433-012A).


249439 13-Apr-2013 kib

Fix the name of the pcb member in the comments.

Submitted by: Oliver Pinter <oliver.pntr@gmail.com>
MFC after: 3 days


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of
unmapped buffers eliminates the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitly requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitly specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is no longer subject to the maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, are
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not work, because
buffer_map fragmentation does not allow the limit to be reached.

At Jeff Roberson's request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248449 18-Mar-2013 attilio

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
resident/cached pages collections as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations. In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves. However, a new bisection node is not needed on
every insert.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan. It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and reusable by
other consumers may follow.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes, broken into pieces, can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
where the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stub is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


248140 10-Mar-2013 alc

The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by: EMC / Isilon Storage Division


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but a few notes are reported:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247622 02-Mar-2013 attilio

Merge from vmc-playground branch:
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: flo, pho, jhb, davide


247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to estimate further sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements that will be committed in the next days.

The commit is not targeted for MFC.


247400 27-Feb-2013 attilio

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


246855 15-Feb-2013 jkim

Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is
no functional change.
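
A small sketch of why the two forms agree for a power-of-two page size. The
ex_-prefixed macros are illustrative stand-ins written for this note, assuming
a 4096-byte page; they are not copied from the kernel headers.

```c
#include <assert.h>

/* Illustrative stand-ins, assuming 4KB pages. */
#define EX_PAGE_SIZE      4096UL
#define EX_PAGE_MASK      (EX_PAGE_SIZE - 1)

/* Mask-based rounding, valid only for power-of-two page sizes. */
#define ex_round_page(x)  (((x) + EX_PAGE_MASK) & ~EX_PAGE_MASK)

/* Generic rounding by division and multiplication. */
#define ex_roundup(x, y)  ((((x) + ((y) - 1)) / (y)) * (y))
```

Both round up to the next multiple of the page size; the mask-based form also
states the intent more directly.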


246802 14-Feb-2013 kib

Print slightly more useful information on the 'bad pte' panic.

No objections from: alc
MFC after: 1 week


246801 14-Feb-2013 kib

Assert that user address is never qremoved.

No objections from: alc
MFC after: 1 week


246384 06-Feb-2013 neel

Compute the number of initial kernel page table pages (NKPT) dynamically.

This eliminates the need to recompile the kernel when the default value
of NKPT is not big enough - for e.g. when loading large kernel modules
or memory disk images from the loader.

If NKPT is defined in the kernel configuration file then it overrides the
dynamic calculation.

Reviewed by: alc, kib


246248 02-Feb-2013 avg

cpususpend_handler: mark AP as resumed only after fully setting up lapic

Reviewed by: jhb
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after: 12 days


245640 19-Jan-2013 jhb

Fix build with SMP disabled.

Reported by: bf


245577 17-Jan-2013 jhb

Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after: 2 weeks


245204 09-Jan-2013 neel

Add a "pause" to busy wait loops in the cpu reset path.

This should not matter much when running on bare metal but it makes the guest
more friendly when running inside a virtual machine.

Discussed with: jhb
Obtained from: NetApp


244144 12-Dec-2012 grehan

Implement an API to allow a hypervisor to save/restore
guest floating point state without having to know the
size of floating-point state.

Unstaticize fpurestore to allow the hypervisor to
save/restore guest state using fpusave/fpurestore
on the allocated FPU state area.

Reviewed by: kib
Obtained from: NetApp/bhyve
MFC after: 1 week


244077 10-Dec-2012 kib

Add amd64-specific ddb command "show pte". The command displays the
hierarchy of the page table entries which map the specified address.

Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


243836 03-Dec-2012 kib

Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.

Reviewed by: dim
MFC after: 2 weeks


243132 16-Nov-2012 kib

Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by: alc
MFC after: 2 weeks


243040 14-Nov-2012 kib

Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks
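
A simplified sketch of the translation described above. All EX_ constants and
the function name are hypothetical stand-ins with made-up values; only the
mapping between flags and allocation classes follows the commit message.

```c
#include <assert.h>

/* Hypothetical stand-ins for malloc(9) flags (values are illustrative). */
#define EX_M_NOWAIT           0x0001
#define EX_M_USE_RESERVE      0x0400

/* Hypothetical stand-ins for vm_page_alloc() request classes. */
#define EX_VM_ALLOC_INTERRUPT 1  /* may drain the free page reserve */
#define EX_VM_ALLOC_SYSTEM    2  /* less aggressive than INTERRUPT */

/* Sketch: pick the page allocation class from the malloc flags. */
static int
ex_malloc2vm_class(int malloc_flags)
{
	if ((malloc_flags & EX_M_USE_RESERVE) != 0)
		return (EX_VM_ALLOC_INTERRUPT);
	/* M_NOWAIT alone now only means "do not sleep". */
	return (EX_VM_ALLOC_SYSTEM);
}
```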


242828 09-Nov-2012 kib

Do not try to enable new features in the %cr4 if running under
hypervisor. Apparently, hypervisors failed to filter out 'Standard
Extended Features' report from CPUID, but deliver #gp when
corresponding bit in %cr4 is toggled.

This shall be reconsidered later, after hypervisors correct the bug.

Reported and tested by: joel
Reviewed by: avg
MFC after: 2 weeks


242534 03-Nov-2012 attilio

Rework the known rwlocks so that they stay on their own cache
line by using struct rwlock_padalign rather than manual
frobbing.

Reviewed by: alc, jimharris


242433 01-Nov-2012 kib

Enable the new instructions for reading and writing bases for %fs,
%gs, when supported. Note that WRFSBASE and WRGSBASE are not very
useful on FreeBSD right now, because a return from the kernel mode to
userspace reloads the bases specified by the sysarch(2) syscall, most
likely.

Enable the Supervisor Mode Execution Prevention (SMEP) when
supported. Since the loader(8) performs hand-off to the kernel with
the page tables which contradict the SMEP, postpone enabling the SMEP
on BSP until pmap switched for the proper kernel tables.

Debugged with the help from: avg
Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 1 month


242432 01-Nov-2012 kib

Provide the reading and display of the Standard Extended Features,
introduced with the IvyBridge CPUs. Provide the definitions for new
bits in CR3 and CR4 registers.

Tested by: avg, Michael Moll <kvedulv@kvedulv.de>
MFC after: 2 weeks


241880 22-Oct-2012 eadler

The 'testing memory' patch gets printed too many times

Approved by: cperciva (implicit)


241850 22-Oct-2012 eadler

Explain the upcoming delay by printing a message when the kernel
is about to begin testing memory.

Reviewed by: dteske, adri
Approved by: cperciva
MFC after: 1 week


241549 14-Oct-2012 kib

Print the %rip value for uprintf_signal.

MFC after: 1 week


241371 09-Oct-2012 attilio

Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static initializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by: marius


241020 28-Sep-2012 alc

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


240773 21-Sep-2012 dim

After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after: 3 days


240317 10-Sep-2012 alc

Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures. Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after: 10 days


240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


240126 05-Sep-2012 alc

Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them. Both the function names and the comment had grown
stale. Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mappings within a
page table page. Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.


239252 14-Aug-2012 kib

Add a hackish debugging facility to provide a bit of information about
reason for generated trap. The dump of basic signal information and 8
bytes of the faulting instruction are printed on the controlling
terminal of the process, if the machdep.uprintf_signal sysctl is
enabled.

The print is the only practical way to debug traps from a.out
processes I am aware of. Because I have to reimplement it each time I
debug an issue with a.out support on amd64, commit the hack to main
tree.

MFC after: 1 week


239241 13-Aug-2012 jhb

Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs. It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.


239235 13-Aug-2012 jhb

Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4). Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.


239137 08-Aug-2012 alc

The assertion that I added in r238889 could legitimately fail when a
debugger creates a breakpoint. Replace that assertion with a narrower
one that still achieves my objective.

Reported and tested by: kib


239125 07-Aug-2012 kib

Do not apply errata 721 workaround when under hypervisor, since
typical hypervisor does not implement access to the required MSR,
causing #GP on boot.

Reported and tested by: olgeni
PR: amd64/170388
MFC after: 3 days


239123 07-Aug-2012 pluknet

Remove duplicate header inclusion of <sys/sysent.h>

Discussed with: bz


239072 05-Aug-2012 alc

Shave off a few more cycles from the average execution time of pmap_enter()
by simplifying the control flow and reducing the live range of "om".


238970 01-Aug-2012 alc

Revise pmap_enter()'s handling of mapping updates that change the
PTE's PG_M and PG_RW bits but not the physical page frame. First,
only perform vm_page_dirty() on a managed vm_page when the PG_M bit is
being cleared. If the updated PTE continues to have PG_M set, then
there is no requirement to perform vm_page_dirty(). Second, flush the
mapping from the TLB when PG_M alone is cleared, not just when PG_M
and PG_RW are cleared. Otherwise, a stale TLB entry may stop PG_M
from being set again on the next store to the virtual page. However,
since the vm_page's dirty field already shows the physical page as
being dirty, no actual harm comes from the PG_M bit not being set.
Nonetheless, it is potentially confusing to someone expecting to see
the PTE change after a store to the virtual page.


238914 30-Jul-2012 kib

Change (unused) prototype for stmxcsr() to match reality.

Noted by: jhb
MFC after: 1 week


238889 29-Jul-2012 alc

Shave off a few more cycles from pmap_enter()'s critical section. In
particular, do a little less work with the PV list lock held.


238671 21-Jul-2012 kib

Consistently use 2-space sentence breaks.

Submitted by: bde
MFC after: 1 week


238670 21-Jul-2012 kib

Stop caching curpcb in the local variable.

Requested by: bde
MFC after: 1 week


238669 21-Jul-2012 kib

The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the
stopped threads. Implementation assumes that the thread's FPU context
is spilled into the PCB due to stop. This is mostly true, except when
FPU state for the thread is not initialized. Then the requests operate
on the garbage state which is currently left in the PCB, causing
confusion.

The situation is indeed observed after a signal delivery and before
#NM fault on execution of any FPU instruction in the signal handler,
since sendsig(9) drops FPU state for current thread, clearing
PCB_FPUINITDONE. When inspecting context state for the signal handler,
debugger sees the FPU state of the main program context instead of the
clear state supposed to be provided to handler.

Fix this by forcing clean FPU state in PCB user FPU save area by
performing getfpuregs(9) before accessing user FPU save area in
ptrace_machdep.c.

Note: this change will be merged to i386 kernel as well, where it is
much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to
inspect FPU context on CPUs that support SSE. Amd64 version of gdb
uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does
not exhibit the bug.

Reported by: bde
MFC after: 1 week


238668 21-Jul-2012 kib

Stop clearing x87 exceptions in the #MF handler on amd64. If user code
understands FPU hardware enough to catch SIGFPE and unmask exceptions
in control word, then it may as well properly handle return from
SIGFPE without causing an infinite loop of #MF exceptions due to
faulting instruction restart, when needed.

Clearing exceptions causes information loss for handlers which do
understand FPU hardware, and struct siginfo si_code member cannot be
considered adequate replacement for en_sw content due to translation.

Supposed reason for clearing the exceptions, which is IRQ13 handling
oddities, were never applicable to amd64.

Note: this change will be merged to i386 kernel as well, since we do
not support IRQ13 delivery of #MF notifications for some time.

Requested by: bde
MFC after: 1 week


238623 19-Jul-2012 kib

Introduce curpcb magic variable, similar to curthread, which is MD
amd64. It is implemented as __pure2 inline with non-volatile asm read
from pcpu, which allows a compiler to cache its results.

Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb.

Note that __curthread() uses magic value 0 as an offsetof(struct pcpu,
pc_curthread). It seems to be done this way due to machine/pcpu.h
needs to be processed before sys/pcpu.h, because machine/pcpu.h
contributes machine-dependent fields to the struct pcpu definition. As
a result, machine/pcpu.h cannot use struct pcpu yet.

The __curpcb() also uses a magic constant instead of offsetof(struct
pcpu, pc_curpcb) for the same reason. The constants are now defined as
symbols and CTASSERTs are added to ensure that future KBI changes do
not break the code.

Requested and reviewed by: bde
MFC after: 3 weeks


238610 19-Jul-2012 alc

Don't unnecessarily set PGA_REFERENCED in pmap_enter().


238598 18-Jul-2012 kib

On AMD64, provide siginfo.si_code for floating point errors when error
occurs using the SSE math processor. Update comments describing the
handling of the exception status bits in coprocessors control words.

Remove GET_FPU_CW and GET_FPU_SW macros which were used only once.
Prefer to use curpcb to access pcb_save over the longer path of
referencing pcb through the thread structure.

Based on the submission by: Ed Alley <wea llnl gov>
PR: amd64/169927
Reviewed by: bde
MFC after: 3 weeks


238597 18-Jul-2012 kib

Add stmxcsr.

Submitted by: Ed Alley <wea llnl gov>
PR: amd64/169927
MFC after: 3 weeks


238450 14-Jul-2012 kib

Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage
mostly meets the guidelines set by the Intel SDM:
1. We use XRSTOR and XSAVE from the same CPL using the same linear
address for the store area
2. Contrary to the recommendations, we cannot zero the FPU save area
for a new thread, since fork semantic requires the copy of the
previous state. This advice seemingly contradicts the advice
from the item 6.
3. We do use XSAVEOPT in the context switch code only, and the area
for XSAVEOPT already always contains the data saved by XSAVE.
4. We do not modify the save area between XRSTOR, when the area is
loaded into FPU context, and XSAVE. We always spill the fpu context
into save area and start emulation when directly writing into FPU
context.
5. We do not use segmented addressing to access save area, or rather,
always address it using %ds basing.
6. XSAVEOPT can be only executed in the area which was previously
loaded with XRSTOR, since context switch code checks for FPU use by
outgoing thread before saving, and thread which stopped emulation
forcibly get context loaded with XRSTOR.
7. The PCB cannot be paged out while FPU emulation is turned off, since
stack of the executing thread is never swapped out.

The context switch code is patched to issue XSAVEOPT instead of XSAVE
if supported. This approach eliminates one conditional in the context
switch code, which would be needed otherwise.

For user-visible machine context to have proper data, fpugetregs()
checks for unsaved extension blocks and manually copies pristine FPU
state into them, according to the description provided by CPUID leaf
0xd.

MFC after: 1 month


238414 13-Jul-2012 alc

Wring a few cycles out of pmap_enter(). In particular, on a user-space
pmap, avoid walking the page table twice.


238311 09-Jul-2012 jhb

Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h>
on x86 and use that to implement stop_emulating() in the fpu/npx code.
Reimplement start_emulating() in the non-XEN case by using load_cr0() and
rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly
discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in
the description of these instructions in Volume 2 of the ADM.

Reviewed by: kib
MFC after: 1 month


238310 09-Jul-2012 jhb

Partially revert r217515 so that the mem_range_softc variable is always
present on x86 kernels. This fixes the build of kernels that include
'device acpi' but do not include 'device mem'.

MFC after: 1 month


238179 06-Jul-2012 kib

Use assembler mnemonic instead of manually assembling, continuation of r238142.

Reviewed by: jhb
MFC after: 1 month


238166 06-Jul-2012 jhb

Several fixes to the amd64 disassembler:
- Add generic support for opcodes that are escape bytes used for
multi-byte opcodes (such as the 0x0f prefix). Use this to replace
the hard-coded 0x0f special case and add support for three-byte
opcodes that use the 0x0f38 prefix.
- Decode all Intel VMX instructions. invept and invvpid in particular are
three-byte opcodes that use the 0x0f38 escape prefix.
- Rework how the special 'SDEP' size flag works such that the default
instruction name (i_name) is the instruction when the data size
prefix (0x66) is not specified, and the alternate name in i_extra is
used when the prefix is included.
- Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses
between i_name and i_extra based on the address size prefix (0x67).
Use this to fix the decoding for jrcxz vs jecxz which is determined
by the address size prefix, not the operand size prefix. Also, jcxz
is not possible in 64-bit mode, but jrcxz is the default instruction
for that opcode.
- Add support for handling instructions that have a mandatory 'rep'
prefix (this means not outputting the 'repe ' prefix until determining
if it is used as part of an opcode). Make 'pause' less of a special
case this way.
- Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions
but with a REX.W prefix.

MFC after: 1 month
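
The SDEP/ADEP name selection described above might look roughly like the sketch below. The enum values, the struct layout, and the function inst_name() are hypothetical; only the field names i_name and i_extra and the prefix semantics come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of the size flags; not the real table format. */
enum size_flag { SZ_NONE, SZ_SDEP, SZ_ADEP };

struct inst_sketch {
    const char *i_name;     /* name used when the prefix is absent */
    const char *i_extra;    /* name used when the prefix is present */
    enum size_flag i_size;
};

/*
 * Pick the mnemonic: SDEP keys on the data size prefix (0x66),
 * ADEP keys on the address size prefix (0x67).
 */
static const char *
inst_name(const struct inst_sketch *ip, int data_prefix, int addr_prefix)
{
    assert(ip != NULL);
    if (ip->i_size == SZ_SDEP && data_prefix)
        return ip->i_extra;
    if (ip->i_size == SZ_ADEP && addr_prefix)
        return ip->i_extra;
    return ip->i_name;
}
```

With an entry such as { "jrcxz", "jecxz", SZ_ADEP }, the default decode is jrcxz and only the 0x67 prefix selects jecxz; the 0x66 prefix has no effect on it.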


238163 06-Jul-2012 alc

Make pmap_enter()'s management of PV entries consistent with the other pmap
functions that manage PV entries. Specifically, remove the PV entry from
the containing PV list only after the corresponding PTE is destroyed.

Update the pmap's wired mapping count in pmap_enter() before the PV list
lock is acquired.


238142 05-Jul-2012 jhb

Now that our assembler supports the xsave family of instructions, use them
natively rather than hand-assembled versions. For xgetbv/xsetbv, add a
wrapper API to deal with xcr* registers: rxcr() and load_xcr().

Reviewed by: kib
MFC after: 1 month


238126 05-Jul-2012 alc

Calculate the new PTE value in pmap_enter() before acquiring any locks.

Move an assertion to the beginning of pmap_enter().


238124 05-Jul-2012 alc

Correct an error in r237513. The call to reserve_pv_entries() must come
before pmap_demote_pde() updates the PDE. Otherwise, pmap_pv_demote_pde()
can crash.

Crash reported by: kib
Patch tested by: kib


238109 04-Jul-2012 jhb

Decode the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', 'xsetbv', and
'rdtscp' instructions.

MFC after: 1 month


237855 30-Jun-2012 alc

Optimize reserve_pv_entries() using the popcnt instruction.


237813 29-Jun-2012 alc

In r237592, I forgot that pmap_enter() might already hold a PV list lock
at the point that it calls get_pv_entry(). Thus, pmap_enter()'s PV list
lock pointer must be passed to get_pv_entry() for those rare occasions
when get_pv_entry() calls reclaim_pv_chunk().

Update some related comments.


237733 28-Jun-2012 alc

Avoid some unnecessary PV list locking in pmap_enter().


237684 28-Jun-2012 alc

Optimize pmap_pv_demote_pde().


237623 27-Jun-2012 alc

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


237604 26-Jun-2012 alc

Introduce RELEASE_PV_LIST_LOCK().


237592 26-Jun-2012 alc

Add PV list locking to pmap_enter(). Its execution is no longer serialized
by the pvh global lock.

Add a needed atomic operation to pmap_object_init_pt().


237551 25-Jun-2012 alc

Add PV chunk and list locking to pmap_change_wiring(), pmap_protect(), and
pmap_remove(). The execution of these functions is no longer serialized
by the pvh global lock.

Make some stylistic changes to the affected code for the sake of
consistency with related code elsewhere in the pmap.


237513 23-Jun-2012 alc

Introduce reserve_pv_entry() and use it in pmap_pv_demote_pde(). In order
to add PV list locking to pmap_pv_demote_pde(), it is necessary to change
the way that pmap_pv_demote_pde() allocates PV entries. Specifically,
once pmap_pv_demote_pde() begins modifying the PV lists, it can't allocate
any new PV chunks, because that could require the PV list lock to be
dropped. So, all necessary PV chunks must be allocated in advance. To my
surprise, this new approach is a few percent faster than the old one.


237414 22-Jun-2012 alc

Introduce CHANGE_PV_LIST_LOCK_TO_{PHYS,VM_PAGE}() to avoid duplication of
code.


237404 21-Jun-2012 alc

Update the PV stats in free_pv_entry() using atomics. After which, it is
no longer necessary for free_pv_entry() to be serialized by the pvh global
lock.

Retire pmap_insert_entry() and pmap_remove_entry(). Once upon a time,
these functions were called from multiple places within the pmap. Now,
each has only one caller.
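
The statistics update described in the first paragraph of this entry can be sketched with C11 atomics. The counter names and free_pv_entry_stats() are illustrative, and the kernel uses its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative counters; static storage starts at zero. */
static atomic_long pv_entry_frees;
static atomic_long pv_entry_spare;
static atomic_long pv_entry_count;

/*
 * Each counter is updated with its own atomic read-modify-write, so
 * concurrent callers need no common lock to keep the totals correct.
 */
static void
free_pv_entry_stats(void)
{
    /* Sanity check for the sketch: something must be allocated. */
    assert(atomic_load(&pv_entry_count) > 0);
    atomic_fetch_add_explicit(&pv_entry_frees, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&pv_entry_spare, 1, memory_order_relaxed);
    atomic_fetch_sub_explicit(&pv_entry_count, 1, memory_order_relaxed);
}
```

Relaxed ordering is enough in this sketch because the counters are informational and do not publish other data.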


237290 20-Jun-2012 alc

Add PV list locking to pmap_copy(), pmap_enter_object(), and
pmap_enter_quick(). These functions are no longer serialized by the pvh
global lock.

There is no need to release the PV list lock before calling free_pv_chunk()
in pmap_remove_pages().


237264 19-Jun-2012 alc

Condition the implementation of pv_entry_count on PV_STATS. On amd64,
pv_entry_count is purely informational. It does not serve any functional
purpose.

Add PV chunk locking to get_pv_entry().


237243 18-Jun-2012 kib

Adjust the fix in r236953, by not generating the signal manually, but
performing the return to usermode using full return path. This
consolidates the handling of exceptional situations in fewer
places, and is less code as well.

Reviewed by: jhb
MFC after: 1 week


237228 18-Jun-2012 alc

Add PV chunk and list locking to pmap_page_exists_quick(),
pmap_page_is_mapped(), and pmap_remove_pages(). These functions
are no longer serialized by the pvh global lock.


237086 14-Jun-2012 alc

Update a couple comments to reflect r235598.

X-MFC after: r235598


237085 14-Jun-2012 alc

Correctly identify the function in a KASSERT().

MFC after: 3 days


237037 13-Jun-2012 jkim

- Remove unused code for CR3 and CR4.
- Fix a few style(9) nits while I am here.


237027 13-Jun-2012 jkim

- Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.


236953 12-Jun-2012 bz

Fix a problem where zero-length RDATA fields can cause named(8) to crash.
[12:03]

Correct a privilege escalation when returning from kernel if
running FreeBSD/amd64 on non-AMD processors. [12:04]

Fix reference count errors in IPv6 code. [EN-12:02]

Security: CVE-2012-1667
Security: FreeBSD-SA-12:03.bind
Security: CVE-2012-0217
Security: FreeBSD-SA-12:04.sysret
Security: FreeBSD-EN-12:02.ipv6refcount
Approved by: so (simon, bz)


236938 12-Jun-2012 iwasaki

Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().


236930 11-Jun-2012 alc

Avoid unnecessary atomic operations for clearing PGA_WRITEABLE in
pmap_remove_pages(). This reduces pmap_remove_pages()'s running time by
4 to 11% in my tests.

MFC after: 1 week


236830 10-Jun-2012 iwasaki

Some fixes for r236772.

- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by: attilio@


236772 09-Jun-2012 iwasaki

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Differences in the
suspend/resume procedures between them are minimized.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
between amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.

Reviewed by: attilio@, acpi@


236534 04-Jun-2012 alc

Various small changes to PV entry management:

Constify pc_freemask[].

pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.

free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).

get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)

pmap_remove_pages()
Eliminate unnecessary ()'s.


236503 03-Jun-2012 avg

free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG

Those calls are useful with hardware watchdog drivers too.

MFC after: 3 weeks


236494 02-Jun-2012 alc

Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.
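
One common way to isolate a lock from its neighbours is to align it to a cache line and pad out the remainder, as in the sketch below. The 64-byte line size, the lock type, and all names are assumptions; the commit itself may use a different layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE_SKETCH 64    /* assumed line size */

/* Hypothetical stand-in for the real lock type. */
struct lock_sketch {
    volatile int state;
};

/* The lock starts a cache line and the padding fills the rest of it. */
struct isolated_lock {
    alignas(CACHE_LINE_SKETCH) struct lock_sketch lock;
    char padding[CACHE_LINE_SKETCH - sizeof(struct lock_sketch)];
};

static_assert(sizeof(struct isolated_lock) == CACHE_LINE_SKETCH,
    "lock plus padding must occupy exactly one cache line");

/* True when two addresses fall within the same cache line. */
static int
shares_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SKETCH ==
        (uintptr_t)b / CACHE_LINE_SKETCH;
}
```

Because the structure is both aligned to and sized as one line, no unrelated variable can be placed in the line that holds the lock.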


236378 01-Jun-2012 alc

Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().


236291 30-May-2012 alc

Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.


235973 25-May-2012 alc

Correct an error in pmap_pv_reclaim(). In a rare case, when it should have
returned NULL, it might instead return a pointer to a page that it had just
unmapped.


235695 20-May-2012 alc

Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

Reviewed by: kib
X-MFC after: r235598


235598 18-May-2012 alc

Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.

Reviewed by: kib
MFC after: 6 weeks


235555 17-May-2012 kib

Use singular form for a modifier.

Submitted by: alc
MFC after: 3 days


235538 17-May-2012 kib

Fix typo.

MFC after: 3 days


234723 26-Apr-2012 attilio

Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by: gibbs
Reviewed by: jhb, marius
MFC after: 1 week


234208 13-Apr-2012 avg

add actual interrupt counters to back ipi_invlcache_counts

Otherwise one could run into a panic with COUNT_IPIS when cache
invalidation actually happened.

Reviewed by: jhb
MFC after: 1 week


234105 10-Apr-2012 marius

Fix !SMP build after r234074.

Reviewed by: attilio, jhb


234074 09-Apr-2012 attilio

BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupt_ids() is not
called in the !SMP case either.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Removing an obsolete optimization where the BSP is skipped in
set_apic_interrupt_ids().

Reported by: jh
Reviewed by: jhb
MFC after: 3 days
X-MFC: r233961
Pointy hat to: me


234059 09-Apr-2012 jhb

Recognize the RDRAND instruction feature.

Submitted by: Michael Fuckner michael fuckner net
MFC after: 3 days


233954 06-Apr-2012 alc

Micro-optimize free_pv_entry() for the expected case.


233781 02-Apr-2012 jhb

Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encountered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after: 2 weeks


233707 30-Mar-2012 jhb

Move the legacy(4) driver to x86.


233704 30-Mar-2012 jkim

Re-initialize model-specific MSRs when we resume CPUs.

MFC after: 1 week


233702 30-Mar-2012 jkim

Work around Erratum 721 for AMD Family 10h and 12h processors.

"Under a highly specific and detailed set of internal timing conditions,
the processor may incorrectly update the stack pointer after a long series
of push and/or near-call instructions, or a long series of pop and/or
near-return instructions. The processor must be in 64-bit mode for this
erratum to occur."

MFC after: 3 days


233676 29-Mar-2012 jhb

Use a more proper fix for enabling HT MSI mapping windows on Host-PCI
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.

To implement this, each x86 Host-PCI bridge driver has to be able to
locate its actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.

This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.

Tested by: bapt
MFC after: 1 week


233628 28-Mar-2012 fabient

Add software PMC support.

New kernel events can be added at various locations for sampling or counting.
This will, for example, allow easy system profiling, whatever the processor
is, with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at lock acquire failures or page faults while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


233433 24-Mar-2012 alc

Disable detailed PV entry accounting by default. Add a config option
to enable it.

MFC after: 1 week


233291 22-Mar-2012 alc

Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with: kib
Tested by: gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after: 1 week


233290 22-Mar-2012 alc

Change pv_entry_count to a long. During the lifetime of FreeBSD 10.x,
physical memory sizes at the high-end will likely reach a point that
the number of pv entries could overflow an int.

Submitted by: kib
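
A quick back-of-the-envelope check of the overflow concern, assuming 4 KB pages and a 32-bit int. The helper names are hypothetical, and the real pv entry count also depends on how many times each page is mapped.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL    /* 4 KB base pages (assumed) */

static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Number of 4 KB mappings needed to map `bytes` of memory exactly once. */
static uint64_t
mappings_for(uint64_t bytes)
{
    return bytes / SKETCH_PAGE_SIZE;
}

/* Would a counter of type int be able to hold n? */
static int
fits_in_int(uint64_t n)
{
    return n <= (uint64_t)INT_MAX;
}
```

Mapping 8 TB once already needs 2^31 entries, one more than INT_MAX, and shared pages have one entry per mapping, so the limit can be reached with less physical memory than that.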


233256 21-Mar-2012 alc

Eliminate vm.pmap.shpgperproc and vm.pmap.pv_entry_max because they no
longer serve any purpose. Prior to r157446, they served a purpose
because there was a fixed amount of kernel virtual address space
reserved for pv entries at boot time. However, since that change pv
entries are accessed through the direct map, and so there is no limit
imposed by a fixed amount of kernel virtual address space.

Fix a couple of nearby style issues.

Reviewed by: jhb, kib
MFC after: 1 week


233185 19-Mar-2012 kib

Re-apply r233122 erroneously reverted in r233168.

Submitted by: jhb
Pointy hat to: kib
MFC after: 2 weeks


233168 19-Mar-2012 kib

If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions. At least one of the reasons is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accessing
superpage pv list.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 2 weeks


233122 18-Mar-2012 alc

Style fix to pmap_protect().

Submitted by: bde


233097 17-Mar-2012 alc

With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists.

Reviewed by: kib


232842 12-Mar-2012 alc

Simplify the error checking in one branch of trap_pfault() and update
the nearby comment.

Add missing whitespace to a return statement in trap_pfault().

Submitted by: kib [2]


232747 09-Mar-2012 jhb

Move i386's intr_machdep.c to the x86 tree and share it with amd64.


232520 04-Mar-2012 tijl

Copy amd64 ptrace.h to x86 and merge with i386 ptrace.h. Replace
amd64/i386/pc98 ptrace.h with stubs.

For amd64 PT_GETXSTATE and PT_SETXSTATE have been redefined to match the
i386 values. The old values are still supported but should no longer be
used.

Reviewed by: kib


232226 27-Feb-2012 jhb

Update incorrect comment.


231797 15-Feb-2012 jkim

Clean up RFLAGS and CR3 register handling and nearby comments. For BSP, use
spinlock_enter()/spinlock_exit() to save/restore RFLAGS. We know interrupts
are disabled when returning from S3. For AP, we do not have to save/restore
it because IRET will do it for us anyway. Do not save CR3 locally because
savectx() does it and BSP does not have to switch to kernel map for amd64.
Change contigmalloc(9) flag while I am in the neighborhood.


231781 15-Feb-2012 jkim

Some BIOSes are known for corrupting low 64KB between suspend and resume.
Mask off the first 16 pages unless we appear to be running in a VM. This
address may be overridden by 'hw.physmem.start' tunable from loader.
Note Linux used to have a BIOS quirk table for this issue but it seems they
made it default recently.


231441 10-Feb-2012 kib

In cpu_set_user_tls(), consistently set PCB_FULL_IRET pcb flag for
both 64bit and 32bit binaries, not for 64bit only.

Setting the flag is not necessary there, because the only current
user of the cpu_set_user_tls() is create_thread(), which calls
cpu_set_upcall() before and cpu_set_upcall() itself sets PCB_FULL_IRET.
Change the function for consistency and preserve existing KPI for now.

MFC after: 1 week


231169 07-Feb-2012 jkim

Do not EOI local APIC too early. Just do doreti normally after resuming.


230766 30-Jan-2012 kib

Move xrstor/xsave/xsetbv into fpu.c and reorder them.

Requested by: bde
MFC after: 1 month


230623 27-Jan-2012 kmacy

Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64.
Excluding other allocations, including UMA, now entails the addition of
a single flag to kmem_alloc or uma zone create.

Reviewed by: alc, avg
MFC after: 2 weeks


230426 21-Jan-2012 kib

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscalls honour the flag,
allowing the signal handlers to inspect and manipulate extended
state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
place in the fixed-sized mcontext_t to place variable-sized save
area. And, since mcontext_t is embedded into ucontext_t, makes it
impossible to fix in a reasonable way. Instead of extending
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


229085 31-Dec-2011 gavin

Default to not performing the early-boot memory tests when we detect we
are booting inside a VM. There are three reasons to disable this:

o It causes the VM host to believe that all the tested pages of RAM are
in use. This in turn may force the host to page out pages of RAM
belonging to other VMs, or otherwise cause problems with fair resource
sharing on the VM cluster.
o It adds significant time to the boot process (around 1 second/Gig in
testing)
o It is unnecessary - the host should have already verified that the
memory is functional etc.

Note that this simply changes the default when in a VM - it can still be
overridden using the hw.memtest.tests tunable.

MFC after: 4 weeks


228935 28-Dec-2011 alc

Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.


227843 22-Nov-2011 marius

- There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.


227442 11-Nov-2011 kib

Weaken part of the assertions added in r227394. Only check that the
process state is stopped.

MFC after: 1 week


227399 09-Nov-2011 kib

Attempt to improve formatting and content of several comments for
amd64 and i386 MD code.

Based on suggestions by: bde
MFC after: 1 week


227394 09-Nov-2011 kib

Stopped process may legitimately have some threads sleeping and not
suspended, if the sleep is uninterruptible.

Reported and tested by: pho
MFC after: 1 week


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227290 07-Nov-2011 rstone

Fix the DTrace pid return trap interrupt vector. Previously we were using
31, but that vector is reserved.

Without this fix, running dtrace -p <pid> would either cause the target
process to crash or the kernel to page fault.

Obtained from: rpaulo
MFC after: 3 days


226925 30-Oct-2011 marcel

Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.


226893 29-Oct-2011 marcel

Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.


226843 27-Oct-2011 alc

Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc(). While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.


226498 18-Oct-2011 des

Trace attempts to call restricted MD syscalls.


225943 03-Oct-2011 kib

Do not allow the kernel to access usermode pages without installed
fault handler. Panic immediately in such situation, on i386 and amd64.

Reviewed by: avg, jhb
MFC after: 1 week


225936 03-Oct-2011 attilio

Add some improvements in the idle table callbacks:
- Replace instances of the manually assembled "hlt" instruction
with calls to the halt() function.
- In cpu_idle_mwait() avoid races in check to sched_runnable() using
the same pattern used in cpu_idle_hlt() with the 'hlt' instruction.
- Add comments explaining the logic behind the pattern used in
cpu_idle_hlt() and other idle callbacks.

In collaboration with: jhb, mav
Reviewed by: adri, kib
MFC after: 3 weeks


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal to kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225576 15-Sep-2011 kib

Put amd64_syscall() prototype in md_var.h.

Requested by: jhb
Reviewed by: alc, jhb
Approved by: re (bz)
MFC after: 2 weeks


225575 15-Sep-2011 kib

Microoptimize the return path for the fast syscalls on amd64. Arrange
the code to have the fall-through path to follow the likely target.
Do not use intermediate register to reload user %rsp.

Proposed by: alc
Reviewed by: alc, jhb
Approved by: re (bz)
MFC after: 2 weeks


225483 11-Sep-2011 kib

The jump target shall be after the padding, not into it.

Reported by: alc
Approved by: re (bz)
MFC after: 2 weeks


225475 11-Sep-2011 kib

Perform amd64-specific microoptimizations for native syscall entry
sequence. The effect is ~1% on the microbenchmark.

In particular, do not restore registers which are preserved by the
C calling sequence. Align the jump target. Avoid unneeded memory
accesses by calculating some data in syscall entry trampoline.

Reviewed by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225474 11-Sep-2011 kib

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify aflags.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


225194 26-Aug-2011 jhb

Make NKPT a kernel option on amd64 so that it can be set to a non-default
value from kernel config files.

Reviewed by: alc
Approved by: re (kib)
MFC after: 1 week


225048 20-Aug-2011 bz

In HEAD, when doing no further checks, there is no reason to use the
temporary variable and check with if, as TUNABLE_*_FETCH does not
alter values unless it successfully finds the tunable.

Reported by: jhb, bde
MFC after: 3 days
X-MFC with: r224516
Approved by: re (kib)
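
The property relied on here, that a fetch leaves the variable untouched when the tunable is absent, can be sketched in userland. tunable_ulong_fetch() is a hypothetical stand-in that reads the process environment with getenv(3); it is not the kernel's TUNABLE_ULONG_FETCH(), which reads the kernel environment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a tunable fetch.  *valp is written only
 * when the variable exists and parses cleanly; otherwise the caller's
 * default survives.  Returns 1 if the value was updated, 0 if not.
 */
static int
tunable_ulong_fetch(const char *name, unsigned long *valp)
{
    const char *s;
    char *end;
    unsigned long v;

    assert(name != NULL && valp != NULL);
    s = getenv(name);
    if (s == NULL || *s == '\0')
        return 0;
    errno = 0;
    v = strtoul(s, &end, 0);
    if (errno != 0 || *end != '\0')
        return 0;
    *valp = v;
    return 1;
}
```

A caller can therefore initialize the variable to its default and call the fetch directly, with no temporary and no if around the assignment.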


224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


224516 30-Jul-2011 bz

Introduce a tunable to disable the time consuming parts of bootup
memtesting, which can easily save seconds to minutes of boot time.
The tunable name is kept general to allow reusing the code in
alternate frameworks.

Requested by: many
Discussed on: arch (a while ago)
Obtained from: Sandvine Incorporated
Reviewed by: sbruno
Approved by: re (kib)
MFC after: 2 weeks


224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding a unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is planned for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easily represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of the removal of struct pcpu
members and the dependency on the cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast
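
The two derived masks mentioned in this entry can be sketched as below. For brevity the sketch uses a single 64-bit word instead of a real multi-word cpuset_t, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only this CPU's bit set: the old pc_cpumask value. */
static uint64_t
cpu_self_mask(unsigned int cpuid)
{
    assert(cpuid < 64);     /* single-word sketch only */
    return UINT64_C(1) << cpuid;
}

/* Every CPU except this one: the old pc_other_cpus value. */
static uint64_t
cpu_other_mask(uint64_t all_cpus, unsigned int cpuid)
{
    return all_cpus & ~cpu_self_mask(cpuid);
}
```

Since both values follow from pc_cpuid and all_cpus, they can be recomputed on demand rather than stored in every per-CPU structure.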


223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


223692 30-Jun-2011 jonathan

Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)


223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


223668 29-Jun-2011 jonathan

We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.

Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.

Approved by: rwatson
Sponsored by: Google UK Ltd


222929 10-Jun-2011 jhb

Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the
Host-PCI bridge drivers and nexus.


222853 08-Jun-2011 avg

remove code for dynamic offlining/onlining of CPUs on x86

The code has definitely been broken for SCHED_ULE, which is a default
scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt delivery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup. See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all available CPUs and then
the disabled CPUs should be "subtracted" from it. That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR: kern/145385
Discussed with: gcooper, ambrisko, mdf, sbruno
Reviewed by: attilio
Tested by: pho, pluknet
X-MFC after: never


222813 07-Jun-2011 attilio

Retire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are targeted for removal soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu data is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernel-land members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.
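
A minimal userland sketch of the ffs(3)-style lookup described for
cpusetobj_ffs(), assuming a set is stored as an array of unsigned long
words. The sketch_* names and SKETCH_MAXCPU are hypothetical and are not
the kernel's definitions; only the idea (scan word by word, return a
1-based bit index, 0 for an empty set) is taken from the note above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for cpuset_t: a fixed array of words. */
#define SKETCH_MAXCPU		128
#define SKETCH_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NWORDS \
	((SKETCH_MAXCPU + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

struct sketch_cpuset {
	unsigned long bits[SKETCH_NWORDS];
};

/* Mark one CPU as a member of the set. */
static void
sketch_cpuset_set(struct sketch_cpuset *set, size_t cpu)
{
	assert(cpu < SKETCH_MAXCPU);
	set->bits[cpu / SKETCH_WORD_BITS] |= 1UL << (cpu % SKETCH_WORD_BITS);
}

/* Like ffs(3): 1-based index of the lowest set bit, or 0 if empty. */
static int
sketch_cpuset_ffs(const struct sketch_cpuset *set)
{
	size_t i, b;

	for (i = 0; i < SKETCH_NWORDS; i++) {
		if (set->bits[i] == 0)
			continue;
		for (b = 0; b < SKETCH_WORD_BITS; b++)
			if (set->bits[i] & (1UL << b))
				return ((int)(i * SKETCH_WORD_BITS + b + 1));
	}
	return (0);
}
```

Because the set spans more than one word, a plain ffs() on a single
integer no longer suffices once MAXCPU exceeds the word size.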

Please additionally note that no MAXCPU is bumped in this patch, but
private testing has been done up to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revisions of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222756 06-Jun-2011 avg

don't use cpuid level 4 in x86 cpu topology detection if it's not supported

This regression was introduced in r213323.
There are probably no Intel cpus that support amd64 mode, but do not
support cpuid level 4, but it's better to keep i386 and amd64 versions
of this code in sync.

Discovered by: pho
Tested by: pho
MFC after: 2 weeks


222043 17-May-2011 jkim

Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions. The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit. Replace non-existent references with today's
CPUID specifications.


221784 11-May-2011 dchagin

Remove wrong comment.

MFC after: 1 week.


221527 06-May-2011 avg

prepare code that does topology detection for amd cpus for bulldozer

This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by: Gary Jennejohn <gljennjohn@googlemail.com>
(with pre-10h hardware)
MFC after: 2 weeks


221188 28-Apr-2011 jkim

Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after: 3 days


221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, if they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependent on SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid
calling them when not necessary (avoiding involuntary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


221069 26-Apr-2011 sobomax

With the typical memory size of the system in tens of gigabytes
counting memory being dumped in 16MB increments is somewhat silly.
Especially if the dump fails and everything you've got for debugging
is screen filled with numbers in 16 decrements... Replace that with
percentage-based progress with max 10 updates all fitting into one
line.
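
A rough sketch of percentage-based progress limited to ten updates,
assuming the dumper knows the total size up front and reports percent
done. The struct and function names are hypothetical and the actual
dump code may track progress differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical progress state; names are not taken from the kernel. */
struct sketch_progress {
	uint64_t total;		/* total bytes to dump */
	int last_decile;	/* last 10% step already reported, 0..10 */
};

/*
 * Returns the percentage to print (10, 20, ..., 100) when a new 10%
 * step has been reached, or -1 when no update is due.
 */
static int
sketch_progress_update(struct sketch_progress *p, uint64_t done)
{
	int decile;

	assert(p->total > 0 && done <= p->total);
	decile = (int)((done * 10) / p->total);
	if (decile <= p->last_decile)
		return (-1);
	p->last_decile = decile;
	return (decile * 10);
}
```

A caller would print the returned value on the same line whenever it is
not -1, so a full dump produces at most ten numbers.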

Collapse other very "useful" piece of crash information (total ram) into
the same line to save some more space.

MFC after: 1 week


221032 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by: jhb
MFC after: 2 weeks


220803 18-Apr-2011 kib

Make pmap_invalidate_cache_range() available for consumption on amd64.

Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not necessarily mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


220584 13-Apr-2011 jkim

Reduce errors in effective frequency calculation.


220583 12-Apr-2011 jkim

Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available. It was disabled in r216443. Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.


220579 12-Apr-2011 jkim

Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.
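
As an illustration of the ratio mentioned above: MPERF advances at a
fixed reference rate and APERF at the actual rate, so the effective
frequency is the nominal frequency scaled by the ratio of their deltas
over the same interval. The function below is a hypothetical userland
sketch and does not read the MSRs; the deltas are passed in by the
caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective frequency = nominal * (APERF delta / MPERF delta).
 * Assumes both deltas were sampled over the same interval and that the
 * product fits in 64 bits.
 */
static uint64_t
sketch_effective_hz(uint64_t nominal_hz, uint64_t aperf_delta,
    uint64_t mperf_delta)
{
	assert(mperf_delta != 0);
	assert(nominal_hz == 0 || aperf_delta <= UINT64_MAX / nominal_hz);
	return (nominal_hz * aperf_delta / mperf_delta);
}
```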


220461 08-Apr-2011 kib

Remove setting of PCB_FULL_IRET at the places where we are going to call
update_gdt_{f,g}sbase. The functions set the flag when td == curthread,
and sysarch is always called with curthread.

Reviewed by: jhb, jkim
MFC after: 1 week


220460 08-Apr-2011 kib

Disable local interrupts before testing the PCB_FULL_IRET flag.
Thread might be preempted after testing, which causes the flag to be
cleared. If ast was not delivered, we will do sysret with potentially
wrong fs/gs bases.

Reviewed by: jhb, jkim
MFC after: 1 week (together with r220430, r220452)


220453 08-Apr-2011 rstone

Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by: emaste (mentor)
MFC after: 1 week


220452 08-Apr-2011 jhb

Fix a bug in the previous change to restore the fast path for syscall
return. The ast() function may cause a context switch in which case
PCB_FULL_IRET would be set in the pcb. However, the code was not
rechecking the flag after ast() returned and would not properly restore
the FSBASE and GSBASE MSRs. To fix, recheck the PCB_FULL_IRET flag after
ast() returns.

While here, trim an instruction (and memory access) from the doreti path
and fix a typo in a comment.

MFC after: 1 week


220433 07-Apr-2011 jkim

Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now. Worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).


220431 07-Apr-2011 jhb

pcb_flags is an int, so use testl rather than testq.

Pointy hat to: jhb
Submitted by: jkim
MFC after: 1 week


220430 07-Apr-2011 jhb

If a system call does not request a full interrupt return, use a fast
path via the sysretq instruction to return from the system call. This was
removed in 190620 and not quite fully restored in 195486. This resolves
most of the performance regression in system call microbenchmarks between
7 and 8 on amd64.

Reviewed by: kib
MFC after: 1 week


220429 07-Apr-2011 jkim

Remove stale checks for RDTSC support. amd64 must have TSC support anyway.


220090 28-Mar-2011 alc

The new binutils has correctly redefined MAXPAGESIZE on amd64 as 0x200000
instead of 0x100000. As a side effect, an amd64 kernel now loads at
physical address 0x200000 instead of 0x100000. This is probably for the
best because it avoids the use of a 2MB page mapping for the first 1MB of
the kernel that also spans the fixed MTRRs. However, getmemsize() still
thinks that the kernel loads at 0x100000, and so the physical memory between
0x100000 and 0x200000 is lost. Fix this problem by replacing the hard-wired
constant in getmemsize() by a symbol "kernphys" that is defined by the
linker script.

In collaboration with: kib


220058 27-Mar-2011 alc

Amd64 doesn't have a lazypmap ipi.


220021 26-Mar-2011 alc

Move an external declaration to the appropriate header file.


220018 26-Mar-2011 jkim

Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs. These CPUs need explicit MSR configuration to expose certain CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net. Clean up
and simplify VIA PadLock detection while I am in the neighborhood.


219523 11-Mar-2011 mdf

Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.


219473 11-Mar-2011 jkim

Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker. Note tsc_present does not change by this tunable.


219468 10-Mar-2011 mdf

Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name. Also consistently use strlcpy().

Suggested by: Warner Losh


219461 10-Mar-2011 jkim

Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhandler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 weeks


219157 02-Mar-2011 alc

Make a change to the implementation of the direct map to improve performance
on processors that support 1 GB pages. Specifically, if the end of physical
memory is not aligned to a 1 GB page boundary, then map the residual
physical memory with multiple 2 MB page mappings rather than a single 1 GB
page mapping. When a 1 GB page mapping is used for this residual memory,
access to the memory is slower than when multiple 2 MB page mappings are
used. (I suspect that the reason for this slowdown is that the TLB is
actually being loaded with 4 KB page mappings for the residual memory.)
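
The split described above can be sketched as simple arithmetic: whole
1 GB pages cover the aligned part of physical memory and 2 MB pages
cover the residual. The names below are hypothetical, and rounding the
residual up to a 2 MB boundary is an assumption of this sketch rather
than a statement about the actual pmap code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_2MB	(UINT64_C(1) << 21)
#define SKETCH_1GB	(UINT64_C(1) << 30)

/* Hypothetical result type: how many mappings of each size to create. */
struct sketch_dmap_plan {
	uint64_t n_1gb;
	uint64_t n_2mb;
};

static struct sketch_dmap_plan
sketch_plan_dmap(uint64_t phys_end)
{
	struct sketch_dmap_plan plan;
	uint64_t residual;

	assert(phys_end > 0);
	plan.n_1gb = phys_end / SKETCH_1GB;	/* fully covered 1 GB pages */
	residual = phys_end % SKETCH_1GB;	/* tail below the next 1 GB */
	plan.n_2mb = (residual + SKETCH_2MB - 1) / SKETCH_2MB;
	return (plan);
}
```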

X-MFC after: r214425


219134 01-Mar-2011 rwatson

Continue to introduce Capsicum capability mode:

White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here. Sysarch is a morass similar to ioctl, in many senses.

Submitted by: anderson
Discussed with: benl, kris, pjd
Sponsored by: Google, Inc.
Obtained from: Capsicum Project
MFC after: 3 months


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218744 16-Feb-2011 dchagin

To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.


218327 05-Feb-2011 kib

Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with: bde
MFC after: 2 weeks


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


217886 26-Jan-2011 mdf

Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>


217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


217604 19-Jan-2011 kib

Use CTLFLAG_RDTUN for read-only sysctl that exports tunable.

Reminded by: pjd
MFC after: 6 days


217564 18-Jan-2011 kib

Make the length of the LDT a loader tunable, machdep.max_ldt_segment,
and export it with read-only sysctl. Remove unused defines.

Reviewed by: jhb (previous version)
MFC after: 1 week


217563 18-Jan-2011 kib

Use malloc(9) instead of kmem_alloc(9) for temporary copy of the
user-supplied descriptor array.

Noted and reviewed by: jhb (previous version)
MFC after: 1 week


217543 18-Jan-2011 jhb

- Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
when descriptors are provided. Specifically, allow the 'start == 0'
and 'num == 0' special case used to free all LDT entries that previously
failed with EINVAL.

Submitted by: clang via rdivacky (some of 1)
Reviewed by: kib


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217506 17-Jan-2011 jkim

Avoid preemption while manipulating CRs and MTRRs.

Tested by: ariff


217368 13-Jan-2011 mdf

Fix up a few more sysctl(9) mis-typing found in various LINT builds.


217360 13-Jan-2011 jhb

If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared. This causes the local APIC interrupt routine
to fail to find an interrupt to service. Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC. If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by: steve


217151 08-Jan-2011 kib

Create shared (readonly) page. Each ABI may specify the use of page by
setting SV_SHP flag and providing pointer to the vm object and mapping
address. Provide simple allocator to carve space in the page, tailored
to put the code with alignment restrictions.

Enable shared page use for amd64, both native and 32bit FreeBSD
binaries. Page is private mapped at the top of the user address
space, moving a start of the stack one page down. Move signal
trampoline code from the top of the stack to the shared page.

Reviewed by: alc


216673 22-Dec-2010 jkim

Increase size of pcb_flags to four bytes.

Requested by: bde, jhb


216634 22-Dec-2010 jkim

Improve PCB flags handling and make it more robust. Add two new functions
for manipulating pcb_flags. These inline functions are very similar to
atomic_set_char(9) and atomic_clear_char(9) but without unnecessary LOCK
prefix for SMP. Add comments about the rationale[1]. Use these functions
wherever possible. Although there are some places where it is not strictly
necessary (e.g., a PCB is copied to create a new PCB), it is done across
the board for sake of consistency. Turn pcb_full_iret into a PCB flag as
it is safe now. Move rarely used fields before pcb_flags and reduce size
of pcb_flags to one byte. Fix some style(9) nits in pcb.h while I am in
the neighborhood.

Reviewed by: kib
Submitted by: kib[1]
MFC after: 2 months


216443 14-Dec-2010 jkim

Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case. Now we support cpu_get_nominal_mhz()
for the case, instead. Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.


216394 12-Dec-2010 kib

In fpudna()/npxdna(), mark FPU context initialized and optionally
mark user FPU context initialized, if current context is user context.
It was reversed in r215865, by inadequate change of this code fragment
to a call to fpuuserinited()/npxuserinited().

The issue is only relevant for in-kernel users of FPU.

Reported by: Jan Henrik Sylvester <me janh de>, Mike Tancsa <mike sentex net>
Tested by: Mike Tancsa
MFC after: 3 days


216316 09-Dec-2010 cperciva

Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.


216312 08-Dec-2010 jkim

Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.


216308 08-Dec-2010 cperciva

On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512. Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.


216306 08-Dec-2010 cperciva

MFi386 r1.94: If XEN, make pmap_kextract = pmap_kextract_ma. This is a
no-op currently, since FreeBSD/amd64 doesn't have (paravirtualized) Xen
support, but if/when that support is ever added we'll want this, and
until then it's harmless.


216304 08-Dec-2010 cperciva

MFi386 r1.81, r1.82, r1.84: Reorganize code to reduce cache pressure and
branch mispredictions.

No objections from: scottl


216283 08-Dec-2010 jkim

Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.

Discussed with: avg


216276 07-Dec-2010 jkim

Remove stale comments about P-state invariant TSC and fix style(9) nits.


216275 07-Dec-2010 jkim

Do not register an event handler for CPU frequency changes when it is found
P-state invariant. This is continuation of r216274.


216274 07-Dec-2010 jkim

Now the P-state invariant TSC is probed early enough, do not register event
handlers for CPU frequency changes when it is found P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.


216272 07-Dec-2010 jkim

Probe P-state invariant TSC from rightful place.


216255 07-Dec-2010 kib

Update some comments related to use of amd64 full context switch.
In exec_linux_setregs(), use locally cached pointer to pcb to set
pcb_full_iret.
In set_regs(), note that full return is needed when code that sets
segment registers is enabled.

MFC after: 1 week


216253 07-Dec-2010 kib

Retire write-only PCB_FULLCTX pcb flag on amd64.

Reminded by: Petr Salinger <Petr.Salinger seznam cz>
Tested by: pho
MFC after: 1 week


216231 06-Dec-2010 kib

Do not leak %rdx value in the previous image to the new image after
execve(2). Note that ia32 binaries already handle this properly,
since ia32_setregs() resets td_retval[1], but not exec_setregs().

We still do not conform to the amd64 ABI specification, since %rsp
on the image startup is not aligned to 16 bytes.

PR: amd64/124134
Discussed with: Petr Salinger <Petr.Salinger seznam cz>
(who convinced me that there are indeed several bugs)
MFC after: 1 week


216163 03-Dec-2010 jkim

Revert r216161. It is not necessary because we zero-fill BSS anyway.

Requested by: jhb


216161 03-Dec-2010 jkim

Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after: 3 days


216159 03-Dec-2010 jkim

Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to: jkim
MFC after: 3 days


216012 28-Nov-2010 kib

Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by: pho
MFC after: 5 days


215878 26-Nov-2010 alc

Make the size of the direct map easily configurable. Changing NDMPML4E
now suffices.

Increase the size of the direct map to 1TB.

An earlier version of this patch was tested by sbruno@.


215865 26-Nov-2010 kib

Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by: bde
Tested by: pho (i386 and amd64)
MFC after: 1 week


215854 26-Nov-2010 uqs

Remove kernel support for BB profiling, now that kernbb(8) is gone, too.

PR: bin/83558
Reviewed by: jkim


215845 25-Nov-2010 dim

Apply the same fix as in r215823 to sys/amd64/amd64/fpu.c: use
unambiguous inline assembly to load a float variable.


215801 24-Nov-2010 dim

Change ambiguous (or invalid, depending on how strict you want to be :)
assembly instruction "movw %rcx,2(%rax)" to "movw %cx,2(%rax)", since
the intent was to move 16 bits of data, in this case.

Found by: clang
Reviewed by: kib


215754 23-Nov-2010 jkim

Remove a stale tunable introduced in r215703.


215753 23-Nov-2010 jkim

Reinitialize PAT MSR via pmap_init_pat() while resuming. This function does
better job since r215703 and it is safer now.


215703 22-Nov-2010 jkim

- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual. Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-). Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as
WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the
default of WB and UC. This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency. Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks. With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more. Now this function is
identical on both amd64 and i386.
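
To make the index layout above concrete, the sketch below builds the
64-bit PAT value from the eight entries listed (0-3 WB, WT, UC-, UC;
4 WB; 5 WP; 6 WC; 7 UC) and looks up the lowest index for a memory type.
The type encodings are the architectural PAT encodings; every sketch_*
name is hypothetical and this is not the pmap_init_pat() or
pmap_cache_bits() code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural PAT memory type encodings. */
enum sketch_pat_type {
	SKETCH_PAT_UC  = 0x00,
	SKETCH_PAT_WC  = 0x01,
	SKETCH_PAT_WT  = 0x04,
	SKETCH_PAT_WP  = 0x05,
	SKETCH_PAT_WB  = 0x06,
	SKETCH_PAT_UCM = 0x07	/* UC- */
};

#define SKETCH_PAT_ENTRIES	8

/* Index layout as described in the commit message. */
static const uint8_t sketch_pat_layout[SKETCH_PAT_ENTRIES] = {
	SKETCH_PAT_WB, SKETCH_PAT_WT, SKETCH_PAT_UCM, SKETCH_PAT_UC,
	SKETCH_PAT_WB, SKETCH_PAT_WP, SKETCH_PAT_WC, SKETCH_PAT_UC
};

/* Pack the layout into a PAT MSR value: entry i occupies bits 8i..8i+7. */
static uint64_t
sketch_pat_msr(void)
{
	uint64_t msr = 0;
	int i;

	for (i = 0; i < SKETCH_PAT_ENTRIES; i++) {
		assert(sketch_pat_layout[i] <= 0x07);
		msr |= (uint64_t)sketch_pat_layout[i] << (8 * i);
	}
	return (msr);
}

/* Lowest PAT index programmed with the given type, or -1 if none. */
static int
sketch_pat_index(enum sketch_pat_type type)
{
	int i;

	for (i = 0; i < SKETCH_PAT_ENTRIES; i++)
		if (sketch_pat_layout[i] == (uint8_t)type)
			return (i);
	return (-1);
}
```

Precomputing the type-to-index mapping once, as the commit describes,
removes the need for special cases at lookup time.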

Reviewed by: jhb
Tested by: RM (reuf_m at hotmail dot com)
Ryszard Czekaj (rychoo at freeshell dot net)
army.of.root (army dot of dot root at googlemail dot com)
MFC after: 3 days


215415 16-Nov-2010 jkim

Restore CR0 after MTRR initialization for correctness' sake. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases. Remove another pointless optimization for CR4.PGE bit
while I am here.


215414 16-Nov-2010 jkim

Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invalidation of entire TLBs. Since these are very rare events, this
micro-optimization seems useless.

Reviewed by: jhb


215321 14-Nov-2010 kib

Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by: swell.k gmail com
MFC after: 1 week


215309 14-Nov-2010 kib

Use symbolic names instead of hardcoding values for magic p_osrel constants.

MFC after: 1 week


215133 11-Nov-2010 avg

amd64: introduce minidump version 2

After KVA space was increased to 512GB on amd64 it became impractical
to use PTEs as entries in the minidump map of dumped pages, because size
of that map alone would already be 1GB.
Instead, we now use PDEs as page map entries and employ two stage lookup
in libkvm: virtual address -> PDE -> PTE -> physical address. PTEs are
now dumped as regular pages. Fixed page map size now is 2MB.
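
The sizes quoted above follow from simple arithmetic, assuming 8-byte
page table entries, 4 KB pages mapped per PTE and 2 MB mapped per PDE.
The helper name below is hypothetical; it only reproduces the
calculation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ENTRY_SIZE	UINT64_C(8)	/* bytes per PTE or PDE on amd64 */

/* Size of a page map with one entry per 'span' bytes of KVA. */
static uint64_t
sketch_minidump_map_bytes(uint64_t kva_bytes, uint64_t span)
{
	assert(span != 0 && kva_bytes % span == 0);
	return ((kva_bytes / span) * SKETCH_ENTRY_SIZE);
}
```

With 512GB of KVA, one entry per 4 KB page gives a 1GB map, while one
entry per 2 MB region gives the 2MB map mentioned in the message.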

libkvm keeps support for accessing amd64 minidumps of version 1.
Support for 1GB pages is added.

Many thanks to Alan Cox for his guidance, numerous reviews, suggestions,
enhancements and corrections.

Reviewed by: alc [kernel part]
MFC after: 15 days


215002 08-Nov-2010 jhb

A few small style and whitespace fixes.


214954 07-Nov-2010 alc

Don't call pmap_demote_DMAP() on MTRR entries from the BIOS that are marked
as "bogus".

Reported by: Jia-Shiun Li


214835 05-Nov-2010 jhb

Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger. Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock. However, trap interrupts for single-stepping
can still occur even when interrupts are disabled. Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased. Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero. To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.
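
The ordering described above can be sketched in userland, with a global
boolean standing in for the CPU interrupt flag. All sketch_* names are
hypothetical stand-ins; the real routines also deal with critical
sections, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt enable state. */
static bool sketch_intr_enabled = true;

struct sketch_thread {
	int spinlock_count;	/* nesting depth */
	bool saved_intr;	/* state saved by the outermost enter */
};

static bool
sketch_intr_disable(void)
{
	bool old = sketch_intr_enabled;

	sketch_intr_enabled = false;
	return (old);
}

static void
sketch_intr_restore(bool state)
{
	sketch_intr_enabled = state;
}

static void
sketch_spinlock_enter(struct sketch_thread *td)
{
	bool flags;

	if (td->spinlock_count == 0) {
		flags = sketch_intr_disable();	/* hold state in a local */
		td->spinlock_count = 1;
		td->saved_intr = flags;		/* store after count is set */
	} else
		td->spinlock_count++;
}

static void
sketch_spinlock_exit(struct sketch_thread *td)
{
	bool flags;

	assert(td->spinlock_count > 0);
	flags = td->saved_intr;		/* read before count can hit zero */
	td->spinlock_count--;
	if (td->spinlock_count == 0)
		sketch_intr_restore(flags);
}
```

The temporaries ensure the per-thread saved state is only written or
read while the nesting count is nonzero, which is the window in which a
single-step trap cannot clobber it.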

In cooperation with: bde
MFC after: 1 month


214774 04-Nov-2010 avg

x86 topo_probe: do not probe smp topology if only one cpu is visible

This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor. This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
probing, so either being zero is an indication of failed probing

Reported by: vwe, Dan Allen <danallen46@airwired.net>
Tested by: Dan Allen <danallen46@airwired.net>
MFC after: 3 days


214631 01-Nov-2010 jhb

Move <machine/apicreg.h> to <x86/apicreg.h>.


214630 01-Nov-2010 jhb

Move the <machine/mca.h> header to <x86/mca.h>.


214576 30-Oct-2010 alc

Add another safety belt to pmap_demote_DMAP().


214563 30-Oct-2010 alc

Don't demote in pmap_demote_DMAP() if the specified length is zero.


214457 28-Oct-2010 attilio

Merge nexus.c from amd64 and i386 to x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214448 28-Oct-2010 jhb

Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines. This also matches what the
code does when SMP is not enabled.

PR: bin/151616
Tested by: "Damian S. Kolodziejczyk" damkol | gmail
Submitted by: avg
MFC after: 1 week


214446 28-Oct-2010 attilio

Merge the mptable support from MD bits to x86 subtree.

Sponsored by: Sandvine Incorporated
Discussed with: jhb


214425 27-Oct-2010 alc

[1] According to the x86 architectural specifications, no virtual-to-
physical page mapping should span two or more MTRRs of different types.
Add a pmap function, pmap_demote_DMAP(), by which the MTRR module can
ensure that the direct map region doesn't have such a mapping.

[2] Fix a couple of nearby style errors in amd64_mrset().

[3] Re-enable the use of 1GB page mappings for implementing the direct
map. (See also r197580 and r213897.)

Tested by: kib@ on a Westmere-family processor [3]
MFC after: 3 weeks


214373 26-Oct-2010 attilio

Merge dump_machdep.c i386/amd64 under the x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214347 25-Oct-2010 jhb

Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by: bde


213897 15-Oct-2010 alc

Update pmap_extract() to handle 1GB page mappings. Some device drivers
use pmap_extract() rather than pmap_kextract() on direct map addresses.
Thus, pmap_extract() needs to be able to deal with 1GB page mappings if
we are to use 1GB page mappings for the direct map. (See r197580.)


213748 12-Oct-2010 jkim

Remove trailing ", " from `sysctl machdep.idle_available' output.


213452 05-Oct-2010 kib

Display PCID capability of CPU and add CPUID define for it.

MFC after: 1 week


213382 03-Oct-2010 kib

The makectx() function, used by kdb_trap() to reconstruct pcb from
trap frame when trap initiated kdb entry, incorrectly calculated the
value of %rsp for trapped thread.

According to Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 3A: System Programming Guide, Part 1, rev. 035, 6.14.2 64-Bit Mode
Stack Frame, "64-bit mode ... pushes SS:RSP unconditionally, rather than
only on a CPL change."
Even assuming the conditional push of the %ss:%rsp, the calculation
was still wrong because sizeof(tf_ss) + sizeof(tf_rsp) == 16 on amd64.

Always use the tf_rsp from trap frame. The change supposedly fixes
stepping when using kgdb backend for kdb.

Submitted by: Zhouyi Zhou <zhouzhouyi gmail com>
PR: amd64/151167
Reviewed by: avg
MFC after: 1 week


213323 01-Oct-2010 avg

i386 and amd64 mp_machdep: improve topology detection for Intel CPUs

This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
and core masks and match APIC IDs against those masks [started by
jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
prescribed by Intel [jkim]
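
The mask-based decomposition of an APIC ID mentioned above can be sketched
roughly as follows. This is only an illustration under simplifying
assumptions: the helper names (mask_width, apic_smt_id, apic_core_id,
apic_pkg_id) and the way the per-core and per-package counts are passed in
are hypothetical and are not taken from topo_probe_0x4().

```c
#include <stdint.h>

/*
 * Number of bits needed to number `count` items, i.e. ceil(log2(count)).
 * The width is capped at 15 so that the shifts below stay well defined;
 * real thread and core counts are far smaller.
 */
static unsigned
mask_width(unsigned count)
{
	unsigned width = 0;

	while (width < 15 && (1u << width) < count)
		width++;
	return (width);
}

/* SMT (thread) ID: the lowest bits of the APIC ID. */
static unsigned
apic_smt_id(uint32_t apic_id, unsigned threads_per_core)
{
	return (apic_id & ((1u << mask_width(threads_per_core)) - 1));
}

/* Core ID: the bits directly above the SMT bits. */
static unsigned
apic_core_id(uint32_t apic_id, unsigned threads_per_core,
    unsigned cores_per_pkg)
{
	unsigned smt_bits = mask_width(threads_per_core);

	return ((apic_id >> smt_bits) &
	    ((1u << mask_width(cores_per_pkg)) - 1));
}

/* Package ID: whatever remains above the SMT and core bits. */
static unsigned
apic_pkg_id(uint32_t apic_id, unsigned threads_per_core,
    unsigned cores_per_pkg)
{
	return (apic_id >>
	    (mask_width(threads_per_core) + mask_width(cores_per_pkg)));
}
```

With 2 threads per core and 4 cores per package, APIC ID 11 (binary 1011)
splits into thread 1, core 1, package 1.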

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
the same CPU topology, but Athlon cores do not share L2 cache while
Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
implemented for platforms in question
- think how to better describe the old HTT vs new HTT distinction; HTT vs
SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
scheduler know that it shouldn't schedule any threads on those CPUs

PR: kern/145385 (related)
In collaboration with: jkim
Tested by: Sergey Kandaurov <pluknet@gmail.com>,
Jeremy Chadwick <freebsd@jdc.parodius.com>,
Chip Camden <sterling@camdensoftware.com>,
Steve Wills <steve@mouf.net>,
Olivier Smedts <olivier@gid0.org>,
Florian Smeets <flo@smeets.im>
MFC after: 1 month


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.
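
As a rough illustration of why the distinction matters, the sketch below
applies the same power-of-two alignment test to two addresses. The helper
name dma_alignment_ok() is hypothetical, not part of the busdma code; the
point is only that a virtual address can satisfy the test while the
physical address behind it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if addr is a multiple of alignment.
 * alignment must be a non-zero power of two.
 */
static bool
dma_alignment_ok(uint64_t addr, uint64_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((addr & (alignment - 1)) == 0);
}
```

For a 64KB alignment requirement, a virtual address such as
0xfffffe0000210000 passes the test while a physical address such as
0x12345000 fails it, so testing the wrong one of the two gives a misleading
answer.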

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb


212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When the CPU is busy, interrupts are generated at the full
rate of hz + stathz to fulfill scheduler and timekeeping requirements. But
when the CPU is idle, only the minimum set of interrupts (down to 8
interrupts per second per CPU now) needed to handle scheduled callouts is
generated. This allows a significant increase in idle CPU sleep time,
increasing the effect of static power-saving technologies. It should also
reduce host CPU load on virtualized systems when the guest system is idle.

There is a set of tunables, also available as writable sysctls, that
control the behavior of the event timer subsystem:
kern.eventtimer.timer - selects the event timer hardware to use.
On x86 there are up to 4 different kinds of timers. Depending on whether
the chosen timer is per-CPU, the behavior of the other options differs
slightly.
kern.eventtimer.periodic - selects periodic or one-shot operation mode.
In periodic mode, the current timer hardware is taken as the only source
of time for time events. This mode is quite similar to the previous kernel
behavior. One-shot mode instead uses the currently selected time counter
hardware to schedule all needed events one by one and programs the timer to
generate an interrupt exactly at the specified time. The default value
depends on the chosen timer's capabilities, but one-shot mode is preferred
unless the other is forced by the user or the hardware.
kern.eventtimer.singlemul - in periodic mode, specifies how many times
higher the timer frequency should be, so that hardclock() and statclock()
events are not strictly aliased. Default values are 2 and 4, but it can be
reduced to 1 if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU receive every timer interrupt
regardless of whether it is busy. By default this option is disabled. If
the chosen timer is per-CPU and runs in periodic mode, this option has no
effect: all interrupts are generated.

Since this patch modifies cpu_idle() on some platforms, I have also
refactored the one on x86. It now makes use of MONITOR/MWAIT instructions
(if supported) under high sleep/wakeup rates, as a fast alternative to other
methods. This allows the SMP scheduler to wake up sleeping CPUs much faster
without using IPIs, significantly increasing performance on some loads with
heavy task switching.

Tested by: many (on i386, amd64, sparc64 and powerpc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


212413 10-Sep-2010 avg

bus_add_child: change type of order parameter to u_int

This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to: r212213
MFC after: 10 days


212026 30-Aug-2010 jkim

Save MSR_FSBASE, MSR_GSBASE and MSR_KGSBASE directly to PCB as we do not use
these values in the function.


211924 28-Aug-2010 rpaulo

Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done in a subsequent commit.

Sponsored by: The FreeBSD Foundation


211804 25-Aug-2010 rpaulo

Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by: The FreeBSD Foundation


211518 19-Aug-2010 attilio

Revert part of r211149, as I erroneously ported logical_cpus from the
Yahoo! patchset as a mask (and changed the variables manipulating it
accordingly) while it is actually a CPU count.

Submitted by: neel
MFC after: 1 month
X-MFC: 211149


211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


211424 17-Aug-2010 gahr

- The iMac9,1 needs the PAT workaround as well

Approved by: cognet


211292 13-Aug-2010 jkim

Reset switchtime to zero rather than the current CPU ticker (TSC) value.
It is more appropriate in this context because TSC MSR is reset to zero
when the CPU is restarted from S3 and above. Move acpi_resync_clock() back
to where it was before r211202. It does not make a difference any more.


211220 12-Aug-2010 attilio

Revert r211176:
As long as interrupts are disabled and there is not explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.

Reported by: kib, jhb


211202 12-Aug-2010 jkim

Reset switchtime and switchticks after resynchronizing the system clock.
This should fix weird runtime problem after resume on amd64. It also fixes
"calcru: runtime went backwards" warnings with bootverbose.


211197 11-Aug-2010 jhb

Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int. Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.


211176 11-Aug-2010 attilio

IPI handlers may run generally with interrupts disabled because they
are served via an interrupt gate.

However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.

Tested by: gianni
MFC after: 1 month


211151 10-Aug-2010 attilio

Fix a typo due to a stale version of the patch.

Reported by: gianni, rdivacky
MFC after: 1 month
X-MFC: 211149


211149 10-Aug-2010 attilio

Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accesses,
to be addressed in further commits.

In collaboration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month


211117 09-Aug-2010 attilio

Simplify the logic for handling ipi_selected() and ipi_cpu() in the
amd64/i386 case.

Reviewed by: jhb
Tested by: gianni
MFC after: 1 month
X-MFC: 210939


211082 08-Aug-2010 dwmalone

Don't pass sizeof(u_int) to an argument of SYSCTL_PROC that ends up not
being used.


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month


210868 05-Aug-2010 jhb

Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the
generic PCI-PCI bridge driver and only override specific methods. This
should fix suspend/resume of PCI-PCI bridges using these drivers.


210804 03-Aug-2010 jkim

savectx() has not been used for fork(2) for about 15 years. [1]
Do not clobber FPU thread's PCB as it is more harmful. When we resume CPU,
unconditionally reload FPU state.

Pointed out by: bde [1]


210780 02-Aug-2010 jkim

Rearrange struct pcb. r177532 (CVS r1.64 of pcb.h) moved pcb_flags to make
better use of cache lines by placing it before pcb_save (now pcb_user_save),
which is moved to the end of pcb since r210777.


210777 02-Aug-2010 jkim

- Merge savectx2() with savectx() and struct xpcb with struct pcb. [1]
savectx() is only used for panic dump (dumppcb) and kdb (stoppcbs). Thus,
saving additional information does not hurt and it may be even beneficial.
Unfortunately, struct pcb has grown larger to accommodate more data.
Move 512-byte long pcb_user_save to the end of struct pcb while I am here.
- savectx() now saves FPU state unconditionally and copies it to the PCB of
the FPU thread if necessary. This gives panic dump and kdb a chance to take
a look at the current FPU state even if the FPU is "supposedly" not used.
- Resuming CPU now unconditionally reinitializes FPU. If the saved FPU
state was irrelevant, it could be in an unknown state.

Suggested by: bde [1]


210774 02-Aug-2010 jhb

Tweak the logic to disable CLFLUSH in virtual environments to work around
problems with flushing the local APIC register range so that it checks
vm_guest directly.

Reviewed by: kib, alc
MFC after: 2 weeks


210665 30-Jul-2010 delphij

In rdmsr_safe, use zero extend (by doing a 32-bit movl over
eax to itself) instead of a sign extend.
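
The difference between the two extensions can be seen in a small C model of
how the 64-bit MSR value is assembled from the %edx:%eax pair. This is only
a sketch: the function names are hypothetical, and the real rdmsr_safe() is
written in assembly.

```c
#include <stdint.h>

/* Correct: the low half is zero extended before being merged. */
static uint64_t
msr_combine_zext(uint32_t eax, uint32_t edx)
{
	return (((uint64_t)edx << 32) | (uint64_t)eax);
}

/*
 * Buggy variant: sign extending the low half copies bit 31 into
 * bits 32-63, which then overwrite the high half when OR-ed in.
 */
static uint64_t
msr_combine_sext(uint32_t eax, uint32_t edx)
{
	return (((uint64_t)edx << 32) | (uint64_t)(int64_t)(int32_t)eax);
}
```

The two variants agree as long as bit 31 of the low half is clear. Once it
is set, the sign-extended variant returns all ones in the upper 32 bits
regardless of what %edx contained.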

Discussed with: stas
MFC after: 1 month


210615 29-Jul-2010 jkim

Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs). For both cases, there cannot
be a valid pointer in pcb_save. This should restore the previous
behaviour.


210614 29-Jul-2010 jkim

Rename PCB_USER_FPU to PCB_USERFPU not to clash with a macro from fpu.h.


210521 26-Jul-2010 jkim

Simplify fldcw() macro. There is no reason to use pointer here. No object
file change after this commit (verified with md5).


210520 26-Jul-2010 jkim

Add missing ldmxcsr() prototype for lint case.


210518 26-Jul-2010 jkim

Reduce diff against fenv.h:

Mark all inline asms as volatile for safety. No object file change after
this commit (verified with md5).


210517 26-Jul-2010 jkim

FNSTSW instruction can use AX register as an operand.

Obtained from: fenv.h


210514 26-Jul-2010 jkim

Re-implement FPU suspend/resume for amd64. This removes superfluous uses
of critical_enter(9) and critical_exit(9) by fpugetregs() and fpusetregs().
Also, we do not touch PCB flags any more.

MFC after: 1 month


210369 22-Jul-2010 kib

When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by: jhb, nwhitehorn
MFC after: 2 weeks


210124 15-Jul-2010 alc

Optimize pmap_remove()'s handling of PG_G mappings. Specifically,
instead of calling pmap_invalidate_page() for each PG_G mapping, call
pmap_invalidate_range() for each range of PG_G mappings. In addition,
eliminate a redundant call to pmap_invalidate_page(). Both
pmap_remove_pte() and pmap_remove_page() called pmap_invalidate_page()
when the mapping had the PG_G attribute. Now, only pmap_remove_page()
calls pmap_invalidate_page(). Altogether, these changes eliminate 53%
of the TLB shootdowns for a "buildworld" on a ZFS file system. On
FFS, the reduction is 3%.
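
The batching idea can be illustrated with a toy model that counts how many
invalidation calls each strategy issues over a run of pages. The function
names and the boolean-array representation are hypothetical; this is not
the pmap_remove() code, only a sketch of the per-range versus per-page
accounting.

```c
#include <stdbool.h>
#include <stddef.h>

/* One invalidation call per global (PG_G-like) mapping. */
static size_t
count_page_invalidations(const bool *is_global, size_t npages)
{
	size_t calls = 0;

	for (size_t i = 0; i < npages; i++)
		if (is_global[i])
			calls++;
	return (calls);
}

/* One invalidation call per maximal run of consecutive global mappings. */
static size_t
count_range_invalidations(const bool *is_global, size_t npages)
{
	size_t calls = 0;
	bool in_run = false;

	for (size_t i = 0; i < npages; i++) {
		if (is_global[i]) {
			if (!in_run) {
				calls++;	/* a new range starts here */
				in_run = true;
			}
		} else
			in_run = false;	/* the current range ends */
	}
	return (calls);
}
```

For the pattern global, global, global, non-global, global, global, the
per-page strategy issues 5 calls while the per-range strategy issues 2.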

MFC after: 6 weeks


209956 12-Jul-2010 jhb

Remove a dead test. We already exclude NMI traps from this code in an
earlier condition.

MFC after: 1 week


209955 12-Jul-2010 kib

When switching the thread from the processor, store %dr7 content
into the pcb before disabling watchpoints. Otherwise, when the
thread is restored on a processor, watchpoints are still disabled.

Submitted by: Tijl Coosemans <tijl coosemans org>
(I would be much happier if Tijl committed this himself)
MFC after: 1 week


209887 10-Jul-2010 alc

Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.
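
A simplified model of the check described above is sketched here. The
names toy_qenter() and TOY_PTE_VALID, and the flat array standing in for
the page table, are assumptions made for illustration; the real
pmap_qenter() operates on hardware page table entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_PTE_VALID	0x1ULL	/* hypothetical "mapping is valid" bit */

/*
 * Install `count` new entries over pte[]. Returns true only if some
 * previously valid entry was actually changed, i.e. only if a TLB
 * shootdown would be required.
 */
static bool
toy_qenter(uint64_t *pte, const uint64_t *newpte, size_t count)
{
	bool need_shootdown = false;

	for (size_t i = 0; i < count; i++) {
		uint64_t old = pte[i];

		if (old == newpte[i])
			continue;	/* same mapping: nothing to do */
		pte[i] = newpte[i];
		if ((old & TOY_PTE_VALID) != 0)
			need_shootdown = true;	/* replaced a live mapping */
	}
	return (need_shootdown);
}
```

Extending a buffer (the same leading entries plus new entries in
previously invalid slots) returns false in this model, while changing any
live entry returns true.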

MFC after: 3 weeks


209862 09-Jul-2010 kib

For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks


209789 08-Jul-2010 alc

Correctly maintain the per-cpu field "curpmap" on amd64 just like we
do on i386. The consequences of not doing so on amd64 became apparent
with the introduction of the COUNT_IPIS and COUNT_XINVLTLB_HITS
options. Specifically, single-threaded applications were generating
unnecessary IPIs to shoot-down the TLB on other processors. However,
this is clearly nonsensical because a single-threaded application is
only running on the current processor. The reason that this happens
is that pmap_activate() is unable to properly update the old pmap's
field "pm_active" without the correct "curpmap". So, in effect, stale
bits in "pm_active" were leading pmap_protect(), pmap_remove(),
pmap_remove_pages(), etc. to flush the TLB contents on some arbitrary
processor that wasn't even running the same application.

Reviewed by: kib
MFC after: 3 weeks


209649 02-Jul-2010 mav

Revert r209638. After the commit, it turned out that more people liked the
previous names of the stray interrupt counters than had responded on the list.


209638 01-Jul-2010 mav

Make stray irq counters use a format similar to other counters. A unified
format makes string processing (for example by `systat -vm`) easier.


209613 30-Jun-2010 jhb

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.


209483 23-Jun-2010 kib

Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after: 1 week


209463 23-Jun-2010 kib

Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext(). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by: bde


209432 22-Jun-2010 mav

Some style fixes for r209371.

Submitted by: jhb@


209371 20-Jun-2010 mav

Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but use of one-shot mode is planned
for later, as part of the tickless kernel project.

For the moment, the infrastructure is used on the i386 and amd64
architectures. Other archs are welcome to follow, while their current
operation should not be affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers are preferred in one of the
following orders: either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
The user may explicitly specify the wanted timers via loader tunables or
sysctls: kern.eventtimer.timer1 and kern.eventtimer.timer2.
If the requested driver is unavailable or inoperative, the system will try
to replace it. If no more timers are available or "NONE" is specified for
the second, the system will operate using only one timer, multiplying its
frequency several times and using respective dividers to honor the hz,
stathz and profhz values set during initial setup.


209248 17-Jun-2010 mav

Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by: jhb@ (previous version)


209212 15-Jun-2010 jhb

Restore the machine check register banks on resume. For banks being
monitored via CMCI, reset the interrupt threshold to 1 on resume.

Reviewed by: jkim
MFC after: 2 weeks


209208 15-Jun-2010 kib

Remove two obsoleted comments, add a note about 32bit compatibility.

MFC after: 1 month


209204 15-Jun-2010 kib

Rename CRITSECT_ASSERT to CRITICAL_ASSERT.

Suggested by: jhb
MFC after: 1 month


209198 15-Jun-2010 kib

Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of the FPU.

Explicitly assert that the calling conventions for fpudrop() are
adhered to. In cpu_thread_exit(), add the missing critical section entrance.

Reviewed by: bde
Tested by: pho
MFC after: 1 month


209174 14-Jun-2010 jkim

Fix ACPI suspend/resume on amd64, which was broken since r208833.
We need actual storage for FPU state to save and restore.


209155 14-Jun-2010 mav

Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, pass it the LAPIC ID, as in other
places, not the CPU ID.


209059 11-Jun-2010 jhb

Update several places that iterate over CPUs to use CPU_FOREACH().


209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exists_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


208922 08-Jun-2010 jhb

Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.


208921 08-Jun-2010 jhb

Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by: alc


208919 08-Jun-2010 jhb

Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.


208915 08-Jun-2010 jhb

- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers. This
closes a race where ioapic_program_intpin() could use a stale value of
the masked state to compute the masked bit in the register.

Reviewed by: mav
MFC after: 2 weeks


208877 06-Jun-2010 kib

Style-compliant order of declarations.

Noted by: bde
MFC after: 1 month


208833 05-Jun-2010 kib

Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with: fabient
Tested by: pho, fabient
Hardware provided by: Sentex Communications
MFC after: 1 month


208667 31-May-2010 alc

Eliminate a stale comment.


208657 30-May-2010 alc

Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.


208645 29-May-2010 alc

When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().


208621 28-May-2010 jhb

Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after: 1 month


208609 28-May-2010 alc

Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().


208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


208507 24-May-2010 jhb

Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl). The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.
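
The doubling strategy described above might look roughly like the
following. This is a sketch under assumptions: the function name, its
parameters, and the saturation at a maximum value are hypothetical, and the
lowering of the threshold on the hourly poll is not modeled.

```c
#include <stdint.h>

/*
 * Pick the next CMCI threshold after an interrupt.
 *   cur               current threshold (starts at 1)
 *   max               largest threshold the bank accepts (assumed limit)
 *   elapsed_secs      time since the previous CMCI for this bank
 *   min_interval_secs desired minimum spacing between interrupts
 */
static uint32_t
cmci_next_threshold(uint32_t cur, uint32_t max, uint32_t elapsed_secs,
    uint32_t min_interval_secs)
{
	if (elapsed_secs >= min_interval_secs)
		return (cur);		/* already throttled enough */
	if (cur > max / 2)
		return (max);		/* saturate instead of overflowing */
	return (cur * 2);		/* too frequent: double it */
}
```

With a 60 second target interval, an interrupt arriving 10 seconds after
the previous one raises a threshold of 1 to 2, while one arriving after
120 seconds leaves the threshold unchanged.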

Tested by: Sailaja Bangaru sbappana at yahoo com
MFC after: 1 month


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208494 24-May-2010 mav

- Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.
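
One common way to divide a timer interrupt of arbitrary frequency into
events of a lower frequency is an accumulator, sketched below. The
structure and function names are hypothetical and the actual MI helpers
may differ in detail; the sketch only shows the technique.

```c
#include <stdint.h>

struct clock_div {
	uint32_t timer_hz;	/* frequency of the hardware timer */
	uint32_t event_hz;	/* wanted frequency, e.g. hz or stathz */
	uint32_t acc;		/* running remainder, always < timer_hz */
};

/*
 * Called once per timer interrupt. Returns how many events
 * (e.g. hardclock() calls) are due on this interrupt.
 */
static unsigned
clock_div_tick(struct clock_div *d)
{
	unsigned fired = 0;

	if (d->timer_hz == 0)
		return (0);	/* avoid looping forever on bad input */
	d->acc += d->event_hz;
	while (d->acc >= d->timer_hz) {
		d->acc -= d->timer_hz;
		fired++;
	}
	return (fired);
}

/* Convenience wrapper: total events produced over nticks interrupts. */
static uint64_t
clock_div_run(uint32_t timer_hz, uint32_t event_hz, uint32_t nticks)
{
	struct clock_div d = { timer_hz, event_hz, 0 };
	uint64_t total = 0;

	for (uint32_t i = 0; i < nticks; i++)
		total += clock_div_tick(&d);
	return (total);
}
```

Over one second of a 2000 Hz timer (2000 interrupts), this yields 1000
events for a 1000 Hz hardclock and 128 events for a 128 Hz statclock, with
no drift because the remainder is carried from one interrupt to the next.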


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-dependent
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls the syscall implementation from
the ABI sysent table, and sets up the return frame. The end-of-syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208452 23-May-2010 mav

Unify local_apic.c for x86 archs.


208392 21-May-2010 jhb

- Adjust the whitespace for the lines that output fields in 'show pcpu' in
DDB so that all the fields line up.
- Print out the tid of the per-CPU idlethread instead of the pid since
the idle process is now shared across all idle threads.

MFC after: 1 month


208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


208026 13-May-2010 kib

Do not use .extern; it is not strictly needed with gas and it is
customary to omit it.

Requested by: bde
MFC after: 6 days


207958 12-May-2010 kib

Route all returns from the interrupts and faults through the doreti_iret
labeled iretq instruction.

Suppose that a multithreaded process executes two threads, currently
scheduled on different processors. Assume that thread A executes
using %cs or %ss pointing into a descriptor from the LDT. If an IPI
arrives whose handler does not return by a jump to doreti, and meanwhile
thread B invalidates the descriptor pointed to by %cs or %ss, then the
iretq from the IPI handler could fault.

Routing the return through doreti_iret allows the kernel to catch the
situation and recover from it by sending a signal to usermode.

Tested by: pho
MFC after: 1 week


207957 12-May-2010 kib

Remove unneeded overrides of the segment registers in the inner trap
frame upon segment register load fault. The doreti procedure does not
load segment registers when returning to the kernel frame, and the current
values in the segment descriptor cache already allow kernel mode
to run, since they are not modified by the faulted load.

Suggested by: bde
Tested by: pho
MFC after: 1 week


207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


207702 06-May-2010 alc

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


207676 05-May-2010 kib

Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by: Sentex Communications
MFC after: 1 week


207570 03-May-2010 kib

Style and comment adjustments.

Suggested and reviewed by: bde
MFC after: 3 days


207463 01-May-2010 kib

Remove debugging code that was not used once since commit.

Suggested by: bde
MFC after: 1 week


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207329 28-Apr-2010 attilio

- Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by: Sandvine Incorporated
PR: threads/116181
Reviewed by: emaste, marcel
MFC after: 3 weeks


207213 26-Apr-2010 kmacy

missed pv access before pmap lock


207210 25-Apr-2010 kmacy

Incremental reduction of delta with head_page_lock_2 branch

- replace modification of pmap resident_count with pmap_resident_count_{inc,dec}
- the pv list is protected by the pmap lock, but in several cases we are relying
on the vm page queue mutex, move pv_va read under the pmap lock


207205 25-Apr-2010 alc

Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.


207161 24-Apr-2010 kmacy

apply style(9) changes applied to head_page_lock_2

requested by: kib@


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!
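
The difference between the two query styles can be shown on a single page table entry. This is a minimal sketch with hypothetical helper names; the real pmap functions operate on a vm_page and walk all of its mappings. Only the accessed-bit position (bit 5 of an x86 PTE) is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_A_SKETCH 0x020u  /* x86 PTE accessed bit (bit 5) */

/* Non-destructive query, in the spirit of pmap_is_referenced(). */
static bool pte_is_referenced_sketch(const uint64_t *pte)
{
    assert(pte != NULL);
    return (*pte & PG_A_SKETCH) != 0;
}

/* Test-and-clear query, in the spirit of pmap_ts_referenced(). */
static bool pte_ts_referenced_sketch(uint64_t *pte)
{
    bool was_set;

    assert(pte != NULL);
    was_set = (*pte & PG_A_SKETCH) != 0;
    *pte &= ~(uint64_t)PG_A_SKETCH;  /* side effect: reference is lost */
    return was_set;
}
```

Calling the first helper repeatedly gives the same answer; calling the second one changes what every later observer, including the page daemon, will see.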


207081 22-Apr-2010 jkim

If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.
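
A minimal sketch of the idea, using a simplified instruction layout modelled on classic BPF (the struct and function names here are hypothetical, not the JIT's own): when both branch offsets are equal, the outcome of the comparison cannot matter, so the test can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a classic BPF instruction. */
struct insn_sketch {
    uint16_t code;
    uint8_t jt;   /* offset taken when the test is true */
    uint8_t jf;   /* offset taken when the test is false */
    uint32_t k;
};

/* A conditional jump only needs its test if the two targets differ. */
static bool jump_needs_test_sketch(const struct insn_sketch *ins)
{
    assert(ins != NULL);
    return ins->jt != ins->jf;
}

/* Compute the next program counter; cond is ignored when jt == jf. */
static unsigned next_pc_sketch(unsigned pc, const struct insn_sketch *ins,
    bool cond)
{
    if (!jump_needs_test_sketch(ins))
        return pc + 1 + ins->jt;               /* unconditional jump */
    return pc + 1 + (cond ? ins->jt : ins->jf);
}
```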


206901 20-Apr-2010 rpaulo

Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.


206623 14-Apr-2010 kib

ld_gs_base is executing with a stack containing only the frame; the
temporarily pushed %rflags has been popped already.

Pointy hat to: kib
MFC after: 3 days


206553 13-Apr-2010 kib

Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inaccessible or wrong mcontext, and for the dreaded "kernel trap with
interrupts disabled" situation. The latter is changed when the trap is
generated from user mode (shall never be?).

Normalize the messages to include both pid and thread name.

MFC after: 1 week


206459 10-Apr-2010 kib

Handle a case when non-canonical address is loaded into the fsbase or
gsbase MSR.

MFC after: 3 days
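
For reference, a canonical-address check can be written as below. This sketch assumes 48-bit virtual addresses (4-level paging); the helper names are hypothetical, and validating up front is only one possible approach, since the commit text does not say whether the kernel checks the value or recovers from the resulting fault.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, bits 63..47 must be all zeros or all
 * ones (that is, bit 47 sign-extended through bit 63).
 */
static bool is_canonical48_sketch(uint64_t va)
{
    uint64_t top = va >> 47;  /* the 17 most significant bits */

    return top == 0 || top == 0x1ffffu;
}

/* Hypothetical setter: refuse a base that the MSR write would reject. */
static int set_base_sketch(uint64_t *msr_shadow, uint64_t base)
{
    assert(msr_shadow != NULL);
    if (!is_canonical48_sketch(base))
        return EINVAL;
    *msr_shadow = base;
    return 0;
}
```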


205851 29-Mar-2010 jhb

Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after: 1 week


205778 28-Mar-2010 alc

Correctly handle preemption of pmap_update_pde_invalidate().

X-MFC after: r205573


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205403 21-Mar-2010 alc

Eliminate a pointless TLB invalidation from pmap_bootstrap(). No mappings
whatsoever are changed between the earlier load_cr3() and this invalidation.


205402 21-Mar-2010 alc

I am told by AMD that the machine check hardware on the instruction TLB
won't generate bogus exceptions. Therefore, the implementation of the
"unofficial" workaround needn't mask L1TP errors by the instruction cache
unit.


205334 19-Mar-2010 avg

pmap amd64/i386: fix a typo in a comment

MFC after: 3 days


205214 16-Mar-2010 jhb

- Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
including the global machine check capabilities and status registers,
CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after: 1 week


205063 12-Mar-2010 jhb

Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


205013 11-Mar-2010 jhb

Print out the family and model from the cpu_id. This is especially useful
given the advent of the extended family and extended model fields. The
values are printed in hex to match their common usage in documentation.

Submitted by: Alexander Best
MFC after: 1 week
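
Decoding the family and model from the CPUID signature, including the extended fields, can be sketched as follows. The bit positions follow the x86 CPUID leaf 1 layout; the function names and the output format string are made up for this example and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * CPUID.1:EAX layout: model in bits 7:4, family in bits 11:8,
 * extended model in bits 19:16, extended family in bits 27:20.
 * The extended family is added unconditionally here; it is zero on
 * parts whose base family is below 0xf.
 */
static unsigned cpuid_family_sketch(uint32_t id)
{
    return ((id >> 20) & 0xffu) + ((id >> 8) & 0xfu);
}

static unsigned cpuid_model_sketch(uint32_t id)
{
    return (((id >> 16) & 0xfu) << 4) | ((id >> 4) & 0xfu);
}

/* Format both values in hex, as documentation usually quotes them. */
static int format_family_model_sketch(char *buf, size_t len, uint32_t id)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "family=%x model=%x",
        cpuid_family_sketch(id), cpuid_model_sketch(id));
}
```

For example, the signature 0x000106a5 decodes to family 6, model 0x1a, and 0x00100f42 decodes to family 0x10, model 4.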


204957 10-Mar-2010 kib

Fall back to wbinvd when region for CLFLUSH is >= 2MB.

Submitted by: Kevin Day <toasty dragondata com>
Reviewed by: jhb
MFC after: 2 weeks
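
The size-based choice might look like the sketch below. The 2MB threshold comes from the commit text; the enum, the function names and the cache-line counting helper are hypothetical and exist only to show why a per-line flush becomes expensive for large ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLFLUSH_THRESHOLD_SKETCH (2u * 1024 * 1024)  /* 2MB */

enum flush_method_sketch { FLUSH_CLFLUSH, FLUSH_WBINVD };

/* Pick per-line flushing for small ranges, a full flush otherwise. */
static enum flush_method_sketch choose_flush_sketch(uintptr_t sva,
    uintptr_t eva)
{
    assert(sva <= eva);
    return (eva - sva >= CLFLUSH_THRESHOLD_SKETCH) ?
        FLUSH_WBINVD : FLUSH_CLFLUSH;
}

/* Number of cache lines a CLFLUSH loop over [sva, eva) would touch. */
static size_t clflush_count_sketch(uintptr_t sva, uintptr_t eva, size_t line)
{
    uintptr_t first, last;

    assert(sva <= eva);
    assert(line != 0 && (line & (line - 1)) == 0);  /* power of two */
    if (sva == eva)
        return 0;
    first = sva & ~(uintptr_t)(line - 1);
    last = (eva - 1) & ~(uintptr_t)(line - 1);
    return (size_t)((last - first) / line) + 1;
}
```

With 64-byte lines, a 2MB range already needs 32768 flush instructions, which is the cost the fallback avoids.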


204913 09-Mar-2010 jhb

Now that the workaround for the AMD 10h CPUs is in place, re-enable machine
checks by default on amd64.

Discussed with: alc


204907 09-Mar-2010 alc

Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors. With this workaround, superpage promotion can be re-enabled
under virtualization. Moreover, machine check exceptions can safely be
enabled when FreeBSD is running natively on Family 10h processors.

Most of the credit should go to Andriy Gapon for diagnosing the error and
working with Borislav Petkov at AMD to document it. Andriy also reviewed
and tested my patches.

Discussed with: jhb
MFC after: 3 weeks


204641 03-Mar-2010 attilio

Improve the clock auto-tuning by first checking whether the atrtc can be
correctly initialized and only then assigning it to softclock/profclock.
Right now, some atrtc devices seem to report strange diagnostic errors,
making the current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
point in calling the function).
This allows the handling of the machdep.lapic_allclocks tunable, which is
retained, to be moved into common x86 code.

Sponsored by: Sandvine Incorporated
Tested by: yongari, Richard Todd
<rmtodd at ichotolot dot servalan dot com>
MFC: 3 weeks
X-MFC: r202387, 204309


204518 01-Mar-2010 jhb

Print the contents of the miscellaneous (MISC) register to the console if
it is valid along with the other register values when a machine check is
encountered.

MFC after: 1 week


204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


204309 25-Feb-2010 attilio

Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib, jhb, imp
Discussed on: arch
MFC: 3 weeks


204214 22-Feb-2010 gibbs

Enforce stronger semantics for bus-dma alignment (currently only on amd64).
Now all contiguous regions returned from bus-dma will be aligned to the
alignment constraint and all but the last region are guaranteed to be
a multiple of the alignment in length. This also means that the relative
alignment of two adjacent bytes in the I/O stream have a difference of 1
even if they are not physically contiguous.

The old code, when needing to perform a copy in order to align data, only
copied the amount of data needed to reach the next page boundary. This
often left an unaligned end to the segment. Drivers such as Xen's blkfront
can't deal with such segments.

The downside to this approach is that, once an unaligned region is encountered,
the remainder of the I/O will be bounced. However, bouncing should be rare.
It is typically caused by non-performance critical userland programs that
don't bother to align their I/O buffers (e.g. bsdlabel). In-kernel I/O
buffers are always aligned to at least a page boundary.

Reviewed by: scottl
MFC after: 2 weeks


204161 21-Feb-2010 alc

Since create_pagetables() zeroes the page tables, pmap_bootstrap() needn't
zero *CMAP1.


204041 18-Feb-2010 ed

Allow the pmap code to be built with GCC from FreeBSD 7 again.

This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by: jhb


203367 02-Feb-2010 rnoland

Enable MTRR on all VIA CPUs that claim support (amd64).

This is the amd64 part of r203289.

Noticed by: jhb
MFC after: 2 weeks


203160 29-Jan-2010 avg

add static qualifier to definition of a function already declared static

This is for improving code readability only.

MFC after: 1 week


202897 23-Jan-2010 alc

Simplify the mapping of the system message buffer. Use the direct map just
like ia64 does.


202882 23-Jan-2010 kib

For the PT_TO_SCE stop, which stops the ptraced process upon syscall entry,
the syscall arguments are collected before ptracestop() is called. As a
consequence, the debugger cannot modify the syscall or its arguments.

For the i386, amd64 and ia32-on-amd64 MD syscall(), reread the syscall
number and arguments after ptracestop() if the debugger modified anything
in the process environment. Since procfs stopevent requires the number of
syscall arguments in p_xstat, this cannot be solved by moving the
stop/trace point before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

The PT_TO_SCX stop is already executed after cpu_set_syscall_retval().

Reported by: Ali Polatel <alip exherbo org>
Briefly discussed with: jhb
MFC after: 3 weeks


202628 19-Jan-2010 ed

Recommit r193732:

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.


202387 15-Jan-2010 attilio

Handling all the three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because frequencies
are sized in order to fit mainly hardclock.
atrtc used to take care of softclock and profclock, and it still does
if the LAPIC can't handle the clocks properly.

Revert the change that made the LAPIC take charge of all three of
them, and let atrtc handle softclock and profclock unless explicitly
requested otherwise. Such a request can be made by setting the new tunable
machdep.lapic_allclocks to a non-zero value, or it is implied if the new
device ATPIC is not present within the i386 kernel config (atrtc is linked
to atpic presence).

Diagnosed by: Sandvine Incorporated
Reviewed by: jhb, emaste
Sponsored by: Sandvine Incorporated
MFC: 3 weeks


202161 12-Jan-2010 gavin

Spell "Hz" correctly wherever it is user-visible.

PR: bin/142566
Submitted by: N.J. Mann njm njm.me.uk
Approved by: ed (mentor)
MFC after: 2 weeks


202097 11-Jan-2010 marcel

Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after: 1 week


202085 11-Jan-2010 alc

Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized. The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after: 1 week


202047 10-Jan-2010 alc

Eliminate unused declarations.


201890 09-Jan-2010 kib

Set md_ldt (pointer to the LDT) after md_ldt_sd (system segment
descriptor for the LDT) is populated. md_ldt is used by context-switch
code as indicator that LDT segment register shall be loaded with
GUSERLDT segment instead of 0, so context switch at the wrong time may
cause attempt to load non-populated descriptor.

Use store with the barrier to prevent other CPUs from seeing updated
md_ldt but not seeing updated md_ldt_sd. Multithreaded process may
context-switch to another thread of the process on another CPU and read
md_ldt.

MFC after: 1 week
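
The ordering requirement is the usual publish pattern: fill in the data, then store the pointer that announces it with release semantics. Below is a userland sketch using C11 atomics; the struct and function names are hypothetical, and the kernel uses its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct ldt_sketch {
    int nentries;
};

struct mdproc_sketch {
    uint64_t md_ldt_sd;                   /* descriptor, filled in first */
    _Atomic(struct ldt_sketch *) md_ldt;  /* non-NULL means "sd is valid" */
};

/* Writer: populate the descriptor, then publish the pointer. */
static void publish_ldt_sketch(struct mdproc_sketch *md,
    struct ldt_sketch *ldt, uint64_t sd)
{
    assert(md != NULL && ldt != NULL);
    md->md_ldt_sd = sd;
    /* Release: the store to md_ldt_sd is visible before md_ldt is. */
    atomic_store_explicit(&md->md_ldt, ldt, memory_order_release);
}

/* Reader (think: context switch): use the descriptor only if published. */
static uint64_t ldt_descriptor_or_zero_sketch(struct mdproc_sketch *md)
{
    struct ldt_sketch *ldt;

    assert(md != NULL);
    ldt = atomic_load_explicit(&md->md_ldt, memory_order_acquire);
    return (ldt != NULL) ? md->md_ldt_sd : 0;
}
```

If the two stores in the writer were swapped, a reader on another CPU could observe a non-NULL pointer together with a stale descriptor, which is the failure the commit describes.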


201223 29-Dec-2009 rnoland

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


200444 12-Dec-2009 kib

For ia32 syscall(), call cpu_set_syscall_retval(). Update comment inside
cpu_set_syscall_retval() accordingly.

MFC after: 1 week


200064 03-Dec-2009 avg

mca: small enhancements related to cpu quirks

- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
and separation of vendor-specific quirk functions

Reviewed by: jhb
MFC after: 1 week


200033 02-Dec-2009 avg

mca: improve status checking, recording and reporting

- directly print mca information in case we fail to allocate memory
for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by: jhb
MFC after: 10 days


199968 30-Nov-2009 avg

x86 cpu features: add MOVBE reporting and flag

The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.


199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


199721 23-Nov-2009 jkim

- Add more aggressive BPF JIT optimization. This one favors i386 more,
while the previous commit was more amd64-centric.
- Use calloc(3) instead of malloc(3)/memset(3) in user land[1].

Submitted by: ed[1]


199619 21-Nov-2009 jkim

Add an experimental and rudimentary JIT optimizer to reduce unnecessary
overhead from short BPF filter programs such as "get the first 96 bytes".


199615 20-Nov-2009 jkim

General style cleanup, no functional change.


199603 20-Nov-2009 jkim

- Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].

Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]


199531 19-Nov-2009 jkim

Fix tinderbox build for i386 and sync amd64 with it.


199498 18-Nov-2009 jkim

- Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.


199492 18-Nov-2009 jkim

- Make the BPF JIT compiler work again in userland. We are limiting the
size of the generated native binary to the page size for now.
- Update copyright date and fix some style nits.


199253 13-Nov-2009 kib

Amd64 init_secondary() calls initializecpu() while curthread is still
not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to
initializecpu(), which results in a hang because APs are started when the
kernel environment is already dynamic and thus needs to acquire a mutex,
which is too early in the AP start sequence to work.

Extract the code that should be executed only once, because it sets
up global variables, from initializecpu() to initializecpucache(),
and call the latter only from hammer_time() executed on the BSP. Now,
TUNABLE_INT_FETCH() is done only once on the BSP at the early boot stage.

In collaboration with: Mykola Dzham <freebsd levsha org ua>
Reviewed by: jhb
Tested by: ed, battlez


199215 12-Nov-2009 kuriyama

- Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by: avg, kib


199184 11-Nov-2009 avg

reflect that pg_ps_enabled is a tunable, not just a read-only sysctl

Nod from: jhb


199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sunv reviewed
MIPS tested by: gonzo
MFC after: 1 month


199067 09-Nov-2009 kuriyama

- Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
pmap_invalidate_cache_range() even if the CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1. -1 is the same as
the current behavior, which automatically disables CLFLUSH on Intel CPUs
without CPUID_SS (this should occur on Xen only). You can specify 1
when this panic happens on non-Intel CPUs (such as AMD's). Because
disabling CLFLUSH may reduce performance, you can try setting 0
on Intel CPUs without SS to use the CLFLUSH feature.

Reviewed by: kib
Reported by: karl, kuriyama
Related to: kern/138863
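
The three-valued tunable described above can be modelled as a small decision function. The function name and the boolean parameters are hypothetical; only the meaning of -1, 0 and 1 is taken from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * tunable: -1 = decide automatically, 0 = always use CLFLUSH,
 *           1 = never use CLFLUSH.
 * Returns true when CLFLUSH should be avoided.
 */
static bool clflush_disabled_sketch(int tunable, bool is_intel,
    bool has_self_snoop)
{
    assert(tunable >= -1 && tunable <= 1);
    if (tunable == 1)
        return true;
    if (tunable == 0)
        return false;
    /* Automatic: Intel CPUs that do not report self-snoop. */
    return is_intel && !has_self_snoop;
}
```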


198950 05-Nov-2009 attilio

Strip from user-visible messages any external URLs that the project
cannot directly control.

Requested by: kib, rwatson


198931 04-Nov-2009 jkim

Tweak memory allocation for amd64 suspend/resume CPU context.


198868 04-Nov-2009 attilio

The Opteron rev E family of processors exposes a bug where, on very rare
occasions, memory barrier semantics are not honoured by the hardware
itself. As a result, random breakage can happen in ways that cannot be
investigated (for further explanation see the content of the commit itself).

Since only one specific family of an entire architecture is affected,
a complete fix is impractical without harming, to some extent, the
other correct cases.
Considering that (and considering how rarely the bug is exposed),
just print a warning message if the affected machine is identified.

Pointed out by: Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by: jeff
MFC: 3 days


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and pthread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


198170 16-Oct-2009 kib

Move intr_describe() out of #ifdef SMP; the function is always required.

Reviewed by: jhb


198134 15-Oct-2009 jhb

Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().

Requested by: many
Reviewed by: scottl
MFC after: 2 weeks
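
Appending a description to a fixed-size handler name might be done as in the sketch below. The ':' separator, the buffer size and the function name are assumptions made for this example; the commit only says that the description is appended to the name held in a MAXCOMLEN-sized buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE_SKETCH 20  /* assumed: MAXCOMLEN bytes plus terminator */

/*
 * Replace any existing ":description" suffix of name with ":descr".
 * The name is left untouched if the result would not fit.
 */
static int describe_handler_sketch(char *name, size_t size,
    const char *descr)
{
    const char *colon;
    size_t base;

    assert(name != NULL && descr != NULL && size > 0);
    colon = strchr(name, ':');
    base = (colon != NULL) ? (size_t)(colon - name) : strlen(name);
    /* Need room for the base name, ':', the description and the NUL. */
    if (base + 1 + strlen(descr) + 1 > size)
        return ENOSPC;
    snprintf(name + base, size - base, ":%s", descr);
    return 0;
}
```

A driver with several MSI vectors could then label them "rx0", "tx0" and so on, so that each handler shows up under a distinct name.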


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197663 01-Oct-2009 kib

As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by: csjp
Reviewed by: alc
MFC after: 3 days


197580 28-Sep-2009 alc

Temporarily disable the use of 1GB page mappings by the direct map. There
are currently two problems with the use of 1GB page mappings by the direct
map. First, at least one device driver uses pmap_extract() rather than
DMAP_TO_PHYS() to translate a direct map address to a physical address.
Unfortunately, neither pmap_extract() nor pmap_kextract() yet support 1GB
page mappings. Second, pmap_bootstrap() needs to interrogate the MTRRs to
ensure that a 1GB page mapping doesn't span two MTRRs of different types.

Reported and tested by: Daniel O'Connor
MFC after: 3 days


197455 24-Sep-2009 emaste

Add a backtrace to the "fpudna in kernel mode!" case, to help track down
where this comes from.

Reviewed by: bde


197410 22-Sep-2009 jhb

- Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.

PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days
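
A sorted insertion into an array of (base, end) pairs, rejecting overlapping entries, can be sketched like this. The array capacity, the function name and the return convention are hypothetical, and the sketch does not merge adjacent regions, which the real parser may do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PHYSMAP_SIZE_SKETCH 16  /* assumed capacity, in array slots */

/*
 * physmap holds sorted, non-overlapping [base, end) pairs; *used is the
 * number of occupied slots (always even). Returns 1 if the region was
 * inserted or ignored as empty, 0 on overlap or when the array is full.
 */
static int add_region_sketch(uint64_t *physmap, int *used, uint64_t base,
    uint64_t length)
{
    uint64_t end = base + length;
    int i, insert;

    assert(physmap != NULL && used != NULL && (*used % 2) == 0);
    if (length == 0)
        return 1;

    insert = *used;  /* default: append at the end */
    for (i = 0; i < *used; i += 2) {
        if (base < physmap[i + 1] && end > physmap[i])
            return 0;            /* overlaps an existing entry */
        if (base < physmap[i]) {
            insert = i;          /* first entry that sorts after us */
            break;
        }
    }
    if (*used + 2 > PHYSMAP_SIZE_SKETCH)
        return 0;

    /* Shift the tail up by one pair and drop the new pair in place. */
    memmove(&physmap[insert + 2], &physmap[insert],
        (size_t)(*used - insert) * sizeof(physmap[0]));
    physmap[insert] = base;
    physmap[insert + 1] = end;
    *used += 2;
    return 1;
}
```

Because the array stays sorted, entries reported out of order by the firmware end up in the same final layout as in-order ones.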


197389 21-Sep-2009 kib

If the CPU happens to be in usermode when a T_RESERVED trap occurred,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in a panic; for non-debugging kernels the behaviour is
undefined.

Do panic regardless of the execution mode at the moment of the trap.

Reviewed by: jhb
MFC after: 1 month


197317 18-Sep-2009 alc

When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


197070 10-Sep-2009 jkim

Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.


196771 02-Sep-2009 jkim

Fix confusing comments about default PAT entries.


196769 02-Sep-2009 jkim

- Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb


196745 02-Sep-2009 jhb

Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.

MFC after: 3 days


196707 31-Aug-2009 jhb

Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with: alc
MFC after: 3 days


196653 30-Aug-2009 bz

Make sure FreeBSD binaries without a .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statement.
To do this, mark the Debian GNU/kFreeBSD brandinfo as requiring
a .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
forcing native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days


196643 29-Aug-2009 rnoland

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non SelfSnoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
MFC after: 3 days


196512 24-Aug-2009 bz

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


196412 20-Aug-2009 jkim

Check whether the SMBIOS reports a reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementations reported only half of actual memory.

Tested by: bz
Approved by: re (kib)


196390 19-Aug-2009 ed

Make the MacBookPro3,1 hardware boot again.

Tested by: Patrick Lamaiziere <patfbsd davenulle org>
Approved by: re (kib)


196318 17-Aug-2009 kib

Correct a critical accounting error in pmap_demote_pde(). Specifically,
when pmap_demote_pde() allocates a page table page to implement a
user-space demotion, it must increment the pmap's resident page count.
Not doing so can lead to an underflow during address space termination
that causes pmap_remove() to exit prematurely, before it has destroyed
all of the mappings within the specified range. The ultimate effect or
symptom of this error is an assertion failure in vm_page_free_toq()
because the page being freed is still mapped.

This error is only possible when superpage promotion is enabled. Thus,
it only affects FreeBSD versions greater than 7.2.

Tested by: pho, alc
Reviewed by: alc
Approved by: re (rwatson)
MFC after: 1 week


196224 14-Aug-2009 jhb

Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is successfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.

Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks


196196 13-Aug-2009 attilio

* Completely remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using an NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Add a new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when available. In other cases it just matches a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses an NMI on ia32 and amd64
architectures, while on the others it has a normal IPI_STOP effect. It is
the responsibility of maintainers to eventually implement a hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). It mirrors stop_cpu() but
uses the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases.
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and added comments.

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


196033 02-Aug-2009 ed

Make the MacBook3,1 boot again.

Approved by: re (kib)


195907 27-Jul-2009 rpaulo

Refine the MacBook hack to only match early models that have Intel ICH.

Discussed with: kjim
Approved by: re (kib)


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195820 22-Jul-2009 kib

When the page caching attributes are changed, after new mapping is
established, OS shall flush the caches on all processors that may have
used the mapping previously. This operation is not needed if processors
support self-snooping. If not, but clflush instruction is implemented
on the CPU, series of the clflush can be used on the mapping region.
Otherwise, we have to flush the whole cache. The latter operation is very
expensive, and AMD-made CPUs do not have self-snooping.

Implement cache flush for remapped region by using clflush for amd64,
when supported by CPU.
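
The three-way decision and the per-line flush loop described above can be
sketched as below. This is only an illustration under stated assumptions:
choose_flush(), flush_range() and fake_clflush() are hypothetical names,
and fake_clflush() merely records its argument where real code would
execute the clflush instruction (surrounded by the appropriate fences).

```c
#include <assert.h>
#include <stdint.h>

enum flush_method { FLUSH_NONE, FLUSH_BY_LINE, FLUSH_WHOLE_CACHE };

/* Pick the cheapest method the CPU allows. */
static enum flush_method
choose_flush(int self_snoop, int have_clflush)
{
	if (self_snoop)
		return (FLUSH_NONE);		/* hardware keeps caches coherent */
	if (have_clflush)
		return (FLUSH_BY_LINE);		/* flush only the remapped region */
	return (FLUSH_WHOLE_CACHE);		/* expensive fallback */
}

/* Stand-in for clflush: remembers the last address it was given. */
static uintptr_t last_flushed;

static void
fake_clflush(uintptr_t va)
{
	last_flushed = va;
}

/*
 * Walk [sva, eva) one cache line at a time, from the start address
 * upwards, and return the number of lines flushed.
 */
static unsigned
flush_range(uintptr_t sva, uintptr_t eva, unsigned line_size,
    void (*flush_line)(uintptr_t))
{
	unsigned n = 0;

	assert(line_size != 0 && sva <= eva);
	for (; sva < eva; sva += line_size) {
		flush_line(sva);
		n++;
	}
	return (n);
}
```

The cost of the per-line path grows with the size of the region, while the
whole-cache path is independent of it, which is why the former is preferred
for remapped regions whenever clflush exists.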

Proposed and reviewed by: alc
Approved by: re (kensmith)


195774 19-Jul-2009 alc

Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by: re (kib)


195749 18-Jul-2009 alc

An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by: re (kib)


195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


195535 10-Jul-2009 kib

When amd64 CPU cannot load segment descriptor during trap return to
usermode, it generates GPF, that is mirrored to user mode as SIGSEGV.
The offending register in mcontext should contain the value loading of
which generated the GPF, and it is so on i386. On amd64, we currently
report segment descriptor in tf_err, while segment register contains the
corrected value loaded by trap handler.

Fix the issue by behaving like i386, reloading segment register in trap
frame after signal frame is pushed onto user stack.

Noted and tested by: pho
Approved by: re (kensmith)


195486 09-Jul-2009 kib

Restore the segment registers and segment base MSRs for amd64 syscall
return path only when neither thread was context switched while
executing syscall code nor syscall explicitly modified LDT or MSRs.

Save segment registers in trap handlers before interrupts are enabled,
to not allow context switches to happen before registers are saved.
Use separated byte in pcb for indication of fast/full return, since
pcb_flags are not synchronized with context switches.

The change puts back syscall microbenchmark numbers that were slowed
down after commit of the support for LDT on amd64.

Reviewed by: jeff
Tested (and tested, and tested ...) by: pho
Approved by: re (kensmith)


195416 06-Jul-2009 alc

When pmap_change_attr() changes the PAT setting on a kernel mapping, it has
to simultaneously change the PAT setting for the same pages within the
direct map region. This may require the demotion of a 2MB page mapping and
the allocation of a page table page. This revision gives the highest
possible priority (VM_ALLOC_INTERRUPT) to this page allocation, so that
pmap_change_attr() is less likely to fail. (In general, kernel page table
page allocations have the highest priority, so this is not creating a new
precedent.)

(Demotion of 1GB page mappings within the direct map already specifies
VM_ALLOC_INTERRUPT to vm_page_alloc(), so only pmap_demote_pde() must be
changed.)

Approved by: re (kib)


195415 06-Jul-2009 jhb

After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.

Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.

Approved by: re (kensmith)


195410 06-Jul-2009 jhb

MFi386: Add a 'show idt' command to DDB to display the non-default function
pointers in the interrupt descriptor table.

Approved by: re (kensmith)


195249 01-Jul-2009 jhb

Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicking.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

Approved by: re (kib)


195228 01-Jul-2009 dfr

Don't include rpcv2.h - it has been removed.

Submitted by: ed@
Approved by: re


195188 30-Jun-2009 avg

remove unused/unneeded extern declarations

This should result in no changes to compiled code.

Reviewed by: alc
Approved by: re (kib)
MFC after: 1 day


195105 27-Jun-2009 rwatson

Catch missed AUDIT_ARG() -> AUDIT_ARG_CMD() on amd64.

Submitted by: Florian Smeets <flo at kasimir.com>
Approved by: re (kib) (implicit)
MFC after: 1 week


195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set of more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


195002 25-Jun-2009 jhb

Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by: kib


194985 25-Jun-2009 jhb

- Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
IDT vectors for the entire group need to be allocated together. This
also restores the assumptions made by the PCI bus code that it could
invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
assigned to interrupt sources on activation. Instead of assigning the
CPU via pic_assign_cpu() before calling enable_intr(), allow the
different interrupt source drivers to ask the MD interrupt code which
CPU to use when they allocate an IDT vector. I/O APIC interrupt pins
do this in their pic_enable_intr() routines giving the same behavior as
before. MSI sources do it when the IDT vectors are allocated during
msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by: rnoland


194889 24-Jun-2009 jhb

Whitespace fix.


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


194237 15-Jun-2009 mav

Forbid multi-vector MSI interrupt vectors migration to another CPU once
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better than just broken, as it was.

No objections: current@, arch@, jhb@


194209 14-Jun-2009 alc

Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficient number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistence. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping device pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITIOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with: jhb@


193804 09-Jun-2009 ariff

Move the C1E workaround into its own idle function. The previous workaround
works only during the initial boot process, while there are laptops/BIOSes
that tend to act 'smarter' by force-enabling C1E when the main power adapter
is pulled out, rendering the previous workaround ineffective. Given the
fact that we still rely on the local APIC to drive the timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
it is on battery power or not.

URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html
http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html

Tested by: Peter Jeremy <peterjeremy at optushome d com d au>


193734 08-Jun-2009 ed

Revert my change; reintroduce __gnu89_inline.

It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by: rwatson


193732 08-Jun-2009 ed

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc


193535 05-Jun-2009 kib

Put intrcnt, eintrcnt, intrnames and eintrnames into the .data section.

Noted by: "Tseng, Kuo-Lang" <kuo-lang.tseng intel com>, bde
MFC after: 3 days


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192440 20-May-2009 jhb

Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs. This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.


192343 18-May-2009 jhb

- Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code. Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
during a machine check exception, or-in the overflow flag rather than
replacing the other flags.
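
The second item is easiest to see in a small sketch. The bit positions
below follow the architectural IA32_MCi_STATUS layout; the function names
and the count_overflow parameter are assumptions made for illustration
and do not reproduce the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits used to decide whether an event is restartable. */
#define	MC_STATUS_OVER	(1ULL << 62)	/* an earlier error was overwritten */
#define	MC_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define	MC_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */

/* Build the mask of status bits that make an event non-restartable. */
static uint64_t
nonrestartable_mask(int count_overflow)
{
	uint64_t mask;

	assert(count_overflow == 0 || count_overflow == 1);
	mask = MC_STATUS_UC | MC_STATUS_PCC;
	if (count_overflow)
		mask |= MC_STATUS_OVER;	/* plain "=" here would drop UC and PCC */
	return (mask);
}

static int
is_nonrestartable(uint64_t status, int count_overflow)
{
	return ((status & nonrestartable_mask(count_overflow)) != 0);
}
```

Had the overflow flag replaced the mask instead of being or-ed in, a bank
reporting only an uncorrected error would have been treated as restartable.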

PR: i386/134586 [2]
Submitted by: Andi Kleen andi-fbsd firstfloor.org


192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chip DMA
engines, by-passing the D-cache altogether.


192114 14-May-2009 attilio

FreeBSD right now supports 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first things to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by always using cpumask_t correctly when needed.
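
As an illustration of the kind of assumption being removed, the sketch
below builds mask bits through the cpumask_t type rather than through a
bare int constant. The 64-bit typedef and the helper names are
hypothetical and chosen only so the example has something to show.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a mask type wider than 32 bits. */
typedef uint64_t cpumask_t;

/*
 * Cast before shifting: "1 << cpu" is evaluated as an int and is
 * not valid for cpu >= 32, whatever the width of the destination.
 */
#define	CPU_BIT(cpu)	((cpumask_t)1 << (cpu))

static cpumask_t
cpumask_set(cpumask_t mask, unsigned cpu)
{
	assert(cpu < sizeof(cpumask_t) * 8);
	return (mask | CPU_BIT(cpu));
}

static int
cpumask_isset(cpumask_t mask, unsigned cpu)
{
	assert(cpu < sizeof(cpumask_t) * 8);
	return ((mask & CPU_BIT(cpu)) != 0);
}
```

Code written this way keeps working if the width of the type changes,
because no expression hard-codes the 32-bit limit.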

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additionally, make ipi_nmi_pending static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


192050 13-May-2009 jhb

Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.

Discussed with: emaste (briefly)
MFC after: 1 month


192035 13-May-2009 alc

Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after: 6 weeks


191803 05-May-2009 mav

Do not try to initialize LAPIC timer if we are not going to use it.
This fixes an assertion failure when a kernel built with INVARIANTS is
configured to use the i8254 timer.


191788 04-May-2009 jkim

Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.


191744 02-May-2009 mav

Add support for using i8254 and rtc timers as event sources for amd64 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPIs.


191730 01-May-2009 mav

Small addition to r191720.

Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.


191720 01-May-2009 mav

Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.

Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.

On modern CPUs the LAPIC is a part of the CPU core, which shuts down when
the CPU enters C3 or a deeper power state. This causes no problems for
interrupt processing, as the chipset wakes up the CPU when an interrupt
triggers. But entering C3 state kills the LAPIC timer and freezes system
time, making C3 and deeper states practically unusable. Using the i8254
timer avoids this problem.

By using the i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more than a watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.


191708 30-Apr-2009 jkim

- Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by: pluknet (pluknet at gmail dot com)[1]


191648 29-Apr-2009 jeff

- Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
- Remove the cpu_cores/cpu_logical detection from identcpu.
- Describe the layout of the system in cpu_mp_announce().

Sponsored by: Nokia


191438 23-Apr-2009 jhb

Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages. The boundaries
don't determine whether or not parts of a DMA request bounce. Instead,
they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
pages are always page aligned.

Reviewed by: scottl (amd64)
MFC after: 1 month


191405 22-Apr-2009 jhb

Adjust the way we number CPUs on x86 so that we attempt to "group" all
logical CPUs in a package. We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0. While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.
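
The numbering order described above can be sketched as a wrap-around walk
over the APIC ID space. This is a simplified model, assuming a flat array
of "present" flags indexed by APIC ID; assign_cpu_ids_sketch() and its
parameters are hypothetical and are not the kernel's assign_cpu_ids().

```c
#include <assert.h>

/*
 * present[id] is non-zero if a CPU with APIC ID "id" exists (0 <= id < nids).
 * On return cpu_apic_ids[n] holds the APIC ID given logical CPU number n.
 * The boot CPU is always CPU 0; the remaining CPUs are numbered starting
 * with the first APIC ID after the boot CPU, wrapping back to APIC ID 0.
 * Returns the number of CPUs numbered.
 */
static int
assign_cpu_ids_sketch(const int present[], int nids, int boot_apic_id,
    int cpu_apic_ids[])
{
	int i, apic_id, ncpus;

	assert(boot_apic_id >= 0 && boot_apic_id < nids);
	assert(present[boot_apic_id]);

	cpu_apic_ids[0] = boot_apic_id;
	ncpus = 1;
	for (i = 1; i < nids; i++) {
		apic_id = (boot_apic_id + i) % nids;	/* wrap around */
		if (!present[apic_id])
			continue;
		cpu_apic_ids[ncpus++] = apic_id;
	}
	return (ncpus);
}
```

For example, with APIC IDs 0, 1, 4 and 5 present and the boot CPU at APIC
ID 4, the resulting order is 4, 5, 0, 1, so the boot CPU's package sibling
receives the next logical CPU number.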

MFC after: 1 month


191201 17-Apr-2009 jhb

Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by: avg
Reviewed by: scottl


191011 13-Apr-2009 kib

The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version)
PR: amd64/133592
Reviewed by: scottl (original patch), jhb
MFC after: 2 weeks


190919 11-Apr-2009 ed

Simplify in/out functions (for i386 and AMD64).

Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by: Christoph Mallon <christoph mallon gmx de>


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field only if the flag BI_BRAND_NOTE is set. As old
modules won't have the flag set, the new field brand_note is ignored
for them.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days
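
The three steps of r190708 follow a common pattern for extending a structure without breaking older binaries. A minimal sketch of that pattern; the struct layouts, the flag value, and the helper name are hypothetical and much smaller than the real Elf_Brandinfo:

```c
#include <stddef.h>

#define BI_BRAND_NOTE_SK	0x0002	/* illustrative flag value */

struct brand_note_sketch {
	const char *vendor;
};

struct brandinfo_sketch {
	const char *compat_sysname;
	int flags;
	/* New field goes last, so the old layout is a prefix of this one. */
	const struct brand_note_sketch *brand_note;
};

/*
 * Only trust brand_note when the module says it is valid.  An older
 * module never sets the flag, so whatever lies past the end of its
 * shorter structure is never dereferenced.
 */
const struct brand_note_sketch *
brand_note_of_sketch(const struct brandinfo_sketch *bi)
{
	if ((bi->flags & BI_BRAND_NOTE_SK) == 0)
		return (NULL);
	return (bi->brand_note);
}
```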


190629 01-Apr-2009 jkim

Garbage collect unused MSR_GSBASE since r190620.

The only consumer was exception.S and specialreg.h is directly included now.
Note no md5 changes were observed for all assym.s consumers with this.


190626 01-Apr-2009 jkim

Garbage collect unused stack segment since r190620.


190620 01-Apr-2009 kib

Save and restore segment registers when entering and leaving
the kernel on amd64. Fill and read segment registers for mcontext and
signals. Handle traps caused by restoration of the
invalidated selectors.

Implement user-mode creation and manipulation of the process-specific
LDT descriptors for amd64, see sysarch(2).

Implement support for TSS i/o port access permission bitmap for amd64.

Context-switch LDT and TSS. Do not save and restore segment registers on
the context switch, that is handled by kernel enter/leave trampolines
now. Remove segment restore code from the signal trampolines for
freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason.

Implement amd64-specific compat shims for sysarch.

Linuxolator (temporary ?) switched to use gsbase for thread_area pointer.

TODO:
Currently, gdb is not adapted to show segment registers from struct reg.
Also, no machine-dependent ptrace command is added to set segment
registers for debugged process.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb
Linuxolator tested by: dchagin


190619 01-Apr-2009 kib

Add separate gdt descriptors for %fs and %gs on amd64.
Reorder amd64 gdt descriptors so that user-accessible selectors are the
same as on i386. At least Wine hard-codes this into the binary.

In collaboration with: pho
Reviewed by: jhb


190600 31-Mar-2009 jkim

Fix an uninitialized variable from the previous commit.


190599 31-Mar-2009 jkim

Probe size of installed memory modules from loader and display it
as 'real memory' instead of Maxmem if the value is available.
Note amd64 displayed physmem as 'usable memory' since machdep.c r1.640
to unconfuse users. Now it is consistent across amd64 and i386 again.
While I am here, clean up smbios.c a bit and update copyright date.

Reviewed by: jhb


190447 26-Mar-2009 kib

Convert gdt_segs and ldt_segs initialization to C99 style.

Reviewed by: jhb


190426 25-Mar-2009 jhb

Fix a few nits in the earlier changes to prevent local information leakage
in AMD FPUs:
- Do not clear the affected state in the case that the FPU registers for
the thread that already owns the FPU are changed via fpu_setregs(). The
only local information the thread would see is its own state in that
case.
- Fix a type mismatch for the dummy variable used in a "fld". It accepts
a float, not a double.

Reviewed by: bde
Approved by: so (cperciva)
MFC after: 1 month


190413 25-Mar-2009 jhb

Rename (fpu|npx)_cleanstate to (fpu|npx)_initialstate to better reflect
their purpose.

Inspired by: bde
MFC after: 1 month


190239 22-Mar-2009 alc

In general, the kernel virtual address of the pml4 page table page that is
stored in the pmap is from the direct map region. The two exceptions have
been the kernel pmap and the swapper's pmap. These pmaps have used a
kernel virtual address established by pmap_bootstrap() for their shared
pml4 page table page. However, there is no reason not to use the direct
map for these pmaps as well.


190237 22-Mar-2009 alc

Eliminate the recomputation of pcb_cr3 from cpu_set_upcall(). The
bcopy()ed value from the old thread is the correct value because the new
thread and the old thread will share a page table.


189903 17-Mar-2009 jkim

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386, and a large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


189785 14-Mar-2009 alc

Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after: 6 weeks


189783 14-Mar-2009 alc

Correct accounting errors in _pmap_allocpte(). Specifically, the pmap's
resident page count and the global wired page count were not correctly
maintained when page table page allocation failed.

MFC after: 6 weeks


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetches osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


189699 11-Mar-2009 dfr

Merge in support for Xen HVM on amd64 architecture.


189698 11-Mar-2009 alc

Optimize the inner loop of pmap_copy().

MFC after: 6 weeks


189610 10-Mar-2009 alc

Eliminate the last use of the recursive mapping to access user-space page
table pages. Now, all accesses to user-space page table pages are
performed through the direct map. (The recursive mapping is only used
to access kernel-space page table pages.)

Eliminate the TLB invalidation on the recursive mapping when a user-space
page table page is removed from the page table and when a user-space
superpage is demoted.


189572 09-Mar-2009 rwatson

Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise. Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).

MFC after: 3 days


189551 09-Mar-2009 alc

Change pmap_enter_quick_locked() so that it uses the kernel's direct map
instead of the pmap's recursive mapping to access the lowest level of the
page table when it maps a user-space virtual address.


189509 08-Mar-2009 sobomax

Small comment nit: "run time" -> "run-time".

Submitted by: rwatson


189454 06-Mar-2009 alc

If the PDE is known, then use the direct mapping instead of the recursive
mapping to access the PTE.
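
As a rough illustration of r189454: once the PDE is in hand, the PTE's location follows from the page table page's physical address and the PTE index of the virtual address, and the direct map turns that physical address into a kernel virtual address. The constants and function names below are assumptions made for the sketch, and the direct map base in particular is only an illustrative value.

```c
#include <stdint.h>

#define PAGE_SHIFT_SK	12			/* 4KB pages */
#define NPTEPG_SK	512			/* PTEs per page table page */
#define PG_FRAME_SK	0x000ffffffffff000ULL	/* physical frame bits */
#define DMAP_BASE_SK	0xffffff0000000000ULL	/* illustrative base only */

/* Physical address of the PTE that maps va, given the PDE covering va. */
uint64_t
pte_paddr_from_pde_sketch(uint64_t pde, uint64_t va)
{
	uint64_t ptp_pa = pde & PG_FRAME_SK;
	uint64_t index = (va >> PAGE_SHIFT_SK) & (NPTEPG_SK - 1);

	return (ptp_pa + index * sizeof(uint64_t));
}

/* Direct map: every physical address appears at a fixed offset. */
uint64_t
phys_to_dmap_sketch(uint64_t pa)
{
	return (DMAP_BASE_SK | pa);
}
```

No recursive-mapping lookup is involved here, which is the point of the change.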


189423 05-Mar-2009 jhb

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


189415 05-Mar-2009 alc

Make pmap_copy() more TLB friendly. Specifically, make it use the kernel's
direct map instead of the pmap's recursive mapping to access the lowest
level in the page table.

MFC after: 6 weeks


189412 05-Mar-2009 jhb

A few cleanups to the FPU code on amd64:
- fpudna() always returned 1 since amd64 CPUs always have FPUs. Change
the function to return void and adjust the calling code in trap() to
assume the return 1 case is the only case.
- Remove fpu_cleanstate_ready as it is always true when it is tested.
Also, only initialize fpu_cleanstate when fpuinit() is called on the BSP.

Reviewed by: bde


189282 02-Mar-2009 kib

Use the p_sysent->sv_flags flag SV_ILP32 to detect 32bit process
executing on 64bit kernel. This eliminates the direct comparisons
of p_sysent with &ia32_freebsd_sysvec, that were left intact after
r185169.
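
The difference between the two styles of check can be sketched as below. The structures are cut-down, hypothetical stand-ins for struct proc and struct sysentvec, and the flag value is illustrative.

```c
#define SV_ILP32_SK	0x000100	/* illustrative flag value */

/* Hypothetical, minimal stand-ins for the kernel structures. */
struct sysentvec_sketch {
	const char *sv_name;
	unsigned sv_flags;
};

struct proc_sketch {
	const struct sysentvec_sketch *p_sysent;
};

/*
 * Flag test: true for any 32-bit ABI, without naming a specific
 * sysentvec such as ia32_freebsd_sysvec.
 */
int
proc_is_ilp32_sketch(const struct proc_sketch *p)
{
	return ((p->p_sysent->sv_flags & SV_ILP32_SK) != 0);
}
```

A pointer comparison against one known sysentvec would miss other 32-bit ABIs, whereas the flag covers any image that declares it.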


189057 25-Feb-2009 sobomax

Fix typo in comments in r189023.


189055 25-Feb-2009 jkim

Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64. On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by: alc, jhb


189023 25-Feb-2009 sobomax

Make the machdep.hyperthreading_enabled tunable work with SCHED_ULE.
Unlike with SCHED_BSD, however, it can only be set to 0 at boot time,
it's not possible to change it at runtime.

Reviewed by: jhb
MFC after: 1 month


188938 23-Feb-2009 jhb

Some whitespace and style fixes.

Submitted by: bde (partly)


188932 23-Feb-2009 alc

Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.

MFC after: 1 week


188904 21-Feb-2009 jeff

- Resolve an issue where we may clear an idt while an interrupt on a
different cpu is still assigned to that vector by never clearing idt
entries. This was only provided as a debugging feature and the bugs
are caught by other means.
- Drop the sched lock when rebinding to reassign an interrupt vector
to a new cpu so that pending interrupts have a chance to be delivered
before removing the old vector.

Discussed with: tegge, jhb


188608 14-Feb-2009 alc

Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup(). (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after: 1 week


188403 09-Feb-2009 cognet

The bounce zone sees its page number increased if multiple dma maps use it in
the same dma tag. However, it can happen that multiple dma tags share the same
bounce zone too, so add a per-bounce zone map counter, and check it instead of
the dma tag map counter, to know if we have to alloc more pages.

Reported by: miwi
Reviewed by: scottl


188350 08-Feb-2009 imp

When bouncing pages, allow a new option to preserve the intra-page
offset. This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on. This solution doesn't break anything that doesn't ask for it
directly. The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes. I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this change turns out not to be sufficient.

Submitted by: alfred@ from Hans Peter Selasky's original


188065 03-Feb-2009 jkoshy

Improve robustness of NMI handling, for NMIs recognized in kernel
mode.

- Make the NMI handler run on its own stack (TSS_IST2).
- Store the GSBASE value for each CPU just before the start of
each NMI stack, permitting efficient retrieval using %rsp-relative
addressing.
- For NMIs taken from kernel mode, program MSR_GSBASE explicitly
since one or both of MSR_GSBASE and MSR_KGSBASE can be potentially
invalid. The current contents of MSR_GSBASE are saved and restored
at exit.
- For NMIs handled from user mode, continue to use 'swapgs' to
load the per-CPU GSBASE.

Reviewed by: jeff
Debugging help: jeff
Tested by: gnn, Artem Belevich <artemb at gmail dot com>


187948 31-Jan-2009 obrien

Change some movl's to mov's. Newer GAS no longer accepts 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


187880 29-Jan-2009 jeff

- Allocate apic vectors on a per-cpu basis. This allows us to allocate
more irqs as we have more cpus. This is principally useful on systems
with msi devices which may want many irqs per-cpu.

Discussed with: jhb
Sponsored by: Nokia


187867 28-Jan-2009 jhb

Use a different value for the initial control word for the FPU state for
32-bit processes. The value matches the initial setting used by
FreeBSD/i386. Otherwise, 32-bit binaries using floating point would use
a slightly different initial state when run on FreeBSD/amd64.

MFC after: 1 week


187598 22-Jan-2009 jkim

VIA Nano processor has a special MSR (CENT_HARDWARECTRL3) bit 32 to determine
whether TSC is P-state invariant or not. In fact, this MSR is writable but
we just leave it at the BIOS default for now.


187470 20-Jan-2009 kib

The context switch to the 32bit binary does not properly restore
the fsbase value. The switch loads the fs segment register, which
invalidates the value in the fsbase msr, thus the value in %r9 can not be
considered the current value for fsbase anymore.

Unconditionally reload fsbase when switching to 32bit binary.

PR: 130526
MFC after: 3 weeks


187221 14-Jan-2009 kib

Disable interrupts, if they were enabled, before doing swapgs.
Otherwise, interrupt may happen while we run with kernel CS and usermode
gsbase.

Reviewed by: jeff
MFC after: 1 week


187109 12-Jan-2009 jkim

Add basic amd64 support for VIA Nano processors.


186797 05-Jan-2009 jkim

Add Centaur/IDT/VIA vendor ID for Nano family, which has long mode support.


186076 14-Dec-2008 jkoshy

Bug fix: %ebx needs to be preserved in the user callchain capture
path.


186037 13-Dec-2008 jkoshy

- Bug fix: prevent a thread from migrating between CPUs between the
time it is marked for user space callchain capture in the NMI
handler and the time the callchain capture callback runs.

- Improve code and control flow clarity by invoking hwpmc(4)'s user
space callchain capture callback directly from low-level code.

Reviewed by: jhb (kern/subr_trap.c)
Testing (various patch revisions): gnn,
Fabien Thomas <fabien dot thomas at netasq dot com>,
Artem Belevich <artemb at gmail dot com>


186009 12-Dec-2008 jkim

Add more CPUID bits from AMD CPUID Specification Rev. 2.28.


185991 12-Dec-2008 jkoshy

Expose symbol `PMC_FN_USER_CALLCHAIN' to assembler code.


185933 11-Dec-2008 jhb

Add constants for fields in the local APIC error status register and a
routine to read it.


185715 06-Dec-2008 alc

Change the default value for the flag enabling superpage mapping and
promotion to "on".

Reminded by: jhb
Tested by: kris


185634 05-Dec-2008 kib

Improve db_backtrace() for compat ia32 on amd64. 32bit image enters
the kernel via Xint0x80_syscall().

Submitted by: dchagin
MFC after: 1 week


185561 02-Dec-2008 ganbold

Remove unused variable.

Found with: Coverity Prevent(tm)
CID: 3685

Approved by: jhb


185460 30-Nov-2008 mav

According to "Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide, Part 2", CPUs with family 0x6 and model
0xE or above and CPUs with family 0xF and model 0x3 or above have invariant
TSC.
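
The family/model rule stated in r185460 can be written as a small predicate. This is only a sketch of the rule as worded in the commit message; the function name is hypothetical, and vendor detection and CPUID decoding are left out.

```c
#include <stdbool.h>

/*
 * Intel rule from the commit message: family 0x6 with model 0xE or above,
 * or family 0xF with model 0x3 or above, has an invariant TSC.
 */
bool
intel_tsc_is_invariant_sketch(unsigned family, unsigned model)
{
	return ((family == 0x6 && model >= 0xe) ||
	    (family == 0xf && model >= 0x3));
}
```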


185343 26-Nov-2008 jkim

Use newly introduced cpu_vendor_id to make invariant TSC detection
clearer and merge r185295 to amd64.


185341 26-Nov-2008 jkim

Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").

Reviewed by: jhb, peter (early amd64 version)


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


184499 31-Oct-2008 kib

Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by: jhb
Retested by: pho
MFC after: 3 days (+ 176304 + 184136)


184378 27-Oct-2008 sobomax

Fix r184323 - set stathz to be the same as lapic_timer_hz when lapic_timer_hz
is less than 128. Remove extra {} to match existing style.


184293 26-Oct-2008 sobomax

Fix division by zero panic if kern.hz less than 32.

MFC after: 1 day


184169 22-Oct-2008 jkim

Add AMD Family 0Fh, Model 6Bh, Stepping 2 to the list of invariant TSCs
and fix i386 test.


184146 22-Oct-2008 jkim

Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher
even if BIOS does not advertise it.


184102 21-Oct-2008 jkim

Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.


184101 21-Oct-2008 jkim

Detect Advanced Power Management Information for AMD CPUs.


183615 05-Oct-2008 davidxu

If the current thread has the trap bit set (i.e. a debugger had
single stepped the process to the system call), we need to clear
the trap flag from the new frame. Otherwise, the new thread will
receive a (likely unexpected) SIGTRAP when it executes the first
instruction after returning to userland.
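
In outline, the fix in r183615 copies the frame and then clears the single-step bit in the copy. A minimal sketch, assuming the trap flag is bit 8 of the flags register; the struct and function names are hypothetical stand-ins for the real trapframe handling:

```c
#include <stdint.h>

#define PSL_T_SK	0x00000100u	/* trap (single-step) flag, bit 8 */

/* Hypothetical, cut-down trap frame holding only the flags register. */
struct trapframe_sketch {
	uint64_t tf_rflags;
};

/*
 * Build the new thread's frame from the current one, but drop the
 * trap flag so the new thread does not take a SIGTRAP on its first
 * userland instruction.
 */
void
new_thread_frame_sketch(const struct trapframe_sketch *cur,
    struct trapframe_sketch *newtf)
{
	*newtf = *cur;
	newtf->tf_rflags &= ~(uint64_t)PSL_T_SK;
}
```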


183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitly initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183151 18-Sep-2008 stas

- Recognize SAVE and OSXSAVE extended processor features.

Approved by: kib (mentor)
MFC after: 1 month


182936 11-Sep-2008 jhb

Update the comments above the 0xcf9 register reset attempt to match the
code. We only attempt a single reset using this method (a "hard" reset),
and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to
force a reset.

MFC after: 1 week
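
The two-write sequence described in r182936 can be sketched as follows. Port writes are recorded by a stand-in function rather than sent to hardware, and the meaning given to bit 1 (selecting a hard reset) is an assumption not stated in the commit message; all names are hypothetical.

```c
#include <stdint.h>

#define RESET_PORT_SK	0xcf9
#define RESET_HARD_SK	0x02	/* assumed: bit 1 selects a hard reset */
#define RESET_CPU_SK	0x04	/* bit 2: its 0 -> 1 edge forces the reset */

/* Stand-in for outb(): records writes instead of touching hardware. */
uint8_t cf9_writes[4];
int cf9_nwrites;

void
outb_sketch(uint16_t port, uint8_t value)
{
	if (port == RESET_PORT_SK && cf9_nwrites < 4)
		cf9_writes[cf9_nwrites++] = value;
}

void
reset_via_cf9_sketch(void)
{
	/* First write: request the reset type with bit 2 still clear. */
	outb_sketch(RESET_PORT_SK, RESET_HARD_SK);
	/* Second write: same request with bit 2 set, giving the 0 -> 1 edge. */
	outb_sketch(RESET_PORT_SK, RESET_HARD_SK | RESET_CPU_SK);
}
```

Writing the value with bit 2 clear first guarantees the transition even if the bit was already set when the routine was entered.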


182868 08-Sep-2008 kib

The pcb_gs32p should be per-cpu, not per-thread pointer. This is
location in GDT where the segment descriptor from pcb_gs32sd is
copied, and the location is in GDT local to CPU.

Noted and reviewed by: peter
MFC after: 1 week


182867 08-Sep-2008 kib

Provide private per-CPU GDTs on amd64. This is required at least for the
linux CB_GS32BIT to work.

Noted by: nox
Reviewed by: peter
MFC after: 1 week


182865 08-Sep-2008 kib

Fix inconsistencies in the comments.

MFC after: 1 week


182684 02-Sep-2008 kib

- When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386
processes, clear PCB_32BIT and PCB_GS32BIT bits [1].

- Reread the fs and gs bases from the msr unconditionally, not believing
the values in pcb_fsbase and pcb_gsbase, since usermode may reload
segment registers, invalidating the cache. [2].

Both problems resulted in the wrong fs base, causing the wrong TLS pointer
to be dereferenced in usermode.

Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1]
Reported by: Bernd Walter <ticsoat cicely7 cicely de>,
Artem Belevich <fbsdlist at src cx>[2]
Reviewed by: peter
MFC after: 3 days


182220 26-Aug-2008 jkim

Move empty filter handling to MI source.

MFC after: 3 days


182173 25-Aug-2008 jkim

Fix a typo in copyrights.


182046 23-Aug-2008 jhb

Adjust the handling of the various timer frequencies when using the lapic
timer. Previously, the various divisors were fixed which meant that while
it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails
with any other hz value. With these changes, we now pick a lapic timer hz
based on the value of hz. If hz is >= 1500, then the lapic timer runs at
hz. If 1500 > hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we
run at hz * 4. We compute a divider at runtime to make stathz run as close
to 128 as we can since stathz really wants to be run at something close to
that frequency. Profiling just runs on every clock tick. So some examples:

With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz
will be 133, and profhz = 400. With hz = 1000 (default), the lapic timer
is still at 2000 (as it is now), stathz is at 133 (as it is now), and
profhz will be 2000 (previously 666).

MFC after: 2 weeks
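
The selection rule in r182046 can be reproduced with a few lines of integer arithmetic. This is a sketch under assumptions: the function names are hypothetical, and the divider is taken to be the timer rate divided by 128 with integer truncation, which matches the figures quoted in the commit message. The guard for timer rates below 128 mirrors the later fix noted under r184378.

```c
/* Pick the lapic timer frequency from hz, per the three ranges above. */
int
lapic_timer_hz_sketch(int hz)
{
	if (hz >= 1500)
		return (hz);
	if (hz >= 750)
		return (hz * 2);
	return (hz * 4);
}

/* Derive stathz so that it lands as close to 128 as the divider allows. */
int
stathz_sketch(int timer_hz)
{
	int divisor;

	/* Below 128 the divider would be zero; run stathz at the timer rate. */
	if (timer_hz < 128)
		return (timer_hz);
	divisor = timer_hz / 128;
	return (timer_hz / divisor);
}
```

For hz = 100 this gives a timer rate of 400 and stathz of 133; for hz = 1000 it gives 2000 and 133, in line with the examples in the commit message. profhz is simply the timer rate.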


181848 18-Aug-2008 jkim

Correctly check unsignedness of all BPF_LD|BPF_IND instructions.
This is roughly from sys/net/bpf_filter.c r1.12 and r1.14.


181846 18-Aug-2008 jkim

- Make these files compilable on user land.
- Update copyrights and fix style(9).


181823 18-Aug-2008 kib

The doreti_iret_fault code is always called with gs base MSR containing
kernel gs base, because %rip is adjusted only on kernel-mode trap caused
by iretq execution. On the other hand, the stack contains (hardware
part of) trap frame from the usermode. As a consequence, checking for
frame mode and doing swapgs causes the kernel to enter trap() with
usermode gs base.

Remove the check for mode and conditional swapgs, we already have right
gs base in the MSR.

Submitted by: Nate Eldredge <neldredge math ucsd edu>
MFC after: 3 days


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181700 13-Aug-2008 jkim

Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does.


181697 13-Aug-2008 jkim

- Remove unnecessary jump instruction(s) when offset(s) is/are zero(s).
- Consistently use conditional jumps for unsigned integers.


181648 12-Aug-2008 jkim

Update copyrights and fix style(9).


181644 12-Aug-2008 jkim

Replace all stack usages with registers and remove unused macros.


181606 11-Aug-2008 jhb

Decode some more "exotic" instructions including: fxsave, fxrstor, ldmxcsr,
stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter,
sysexit, pause, monitor, mwait, and swapgs (amd64 only).

MFC after: 1 week


181456 09-Aug-2008 alc

Intel describes the behavior of their processors as "undefined" if two or
more mappings to the same physical page have different memory types, i.e.,
PAT settings. Consequently, if pmap_change_attr() is applied to a virtual
address range within the kernel map, then the corresponding ranges of the
direct map also need to be changed. Enhance pmap_change_attr() to handle
this case automatically.

Add a comment describing what pmap_change_attr() does.

Discussed with: jhb


181430 08-Aug-2008 stas

- Add cpuctl(4) pseudo-device driver to provide access to some low-level
features of CPUs like reading/writing machine-specific registers,
retrieving cpuid data, and updating microcode.
- Add cpucontrol(8) utility, that provides userland access to
the features of cpuctl(4).
- Add subsequent manpages.

The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
is created for each cpu present in the system. The pseudo-device minor
number corresponds to the cpu number in the system. The cpuctl(4)
pseudo-device allows a number of ioctls to be performed, namely
RDMSR/WRMSR/CPUID and UPDATE. The first pair allows the caller to read/write
machine-specific registers from the corresponding CPU. cpuid data can be
retrieved using the CPUID call, and microcode updates are applied via UPDATE.

The permissions are enforced based on the pseudo-device file permissions.
RDMSR/CPUID will be allowed when the caller has read access to the device
node, while WRMSR/UPDATE will be granted only when the node is opened
for writing. There are also a number of priv(9) checks.

The cpucontrol(8) utility is intended to provide userland access to
the cpuctl(4) device features. The utility also allows one to apply
cpu microcode updates.

Currently only Intel and AMD cpus are supported and were tested.

Approved by: kib
Reviewed by: rpaulo, cokane, Peter Jeremy
MFC after: 1 month
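
The permission rule described for cpuctl(4) can be sketched as a small predicate. The enum, the open-mode bits, and the function name are hypothetical and are not the driver's real ioctl numbers or flags; the additional priv(9) checks are not modelled.

```c
#include <stdbool.h>

/* Hypothetical command and open-mode encodings for the sketch. */
enum cpuctl_cmd_sketch { CMD_RDMSR, CMD_WRMSR, CMD_CPUID, CMD_UPDATE };

#define OPEN_READ_SK	0x1
#define OPEN_WRITE_SK	0x2

/*
 * RDMSR and CPUID need the node opened for reading; WRMSR and UPDATE
 * need it opened for writing.
 */
bool
cpuctl_cmd_allowed_sketch(enum cpuctl_cmd_sketch cmd, int open_mode)
{
	switch (cmd) {
	case CMD_RDMSR:
	case CMD_CPUID:
		return ((open_mode & OPEN_READ_SK) != 0);
	case CMD_WRMSR:
	case CMD_UPDATE:
		return ((open_mode & OPEN_WRITE_SK) != 0);
	}
	return (false);
}
```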


181356 07-Aug-2008 alc

Introduce pmap_change_attr_locked().


181284 04-Aug-2008 alc

Make pmap_kenter_attr() static.


181151 02-Aug-2008 alc

Enhance pmap_mapdev_attr(). Take advantage of recent enhancements to
pmap_change_attr() in order to use the direct map for any cache mode, not
just write-back mode.

It is worth noting that this change also eliminates a situation in which we
have two mappings to the same physical memory with different cache modes.

Submitted by: Magesh Dhasayyan (with some changes by me)
Discussed with: jhb


181112 01-Aug-2008 alc

Enhance pmap_change_attr() with the ability to demote 1GB page mappings.


181077 31-Jul-2008 alc

Enhance pmap_change_attr(). Specifically, avoid 2MB page demotions, cache
mode changes, and cache and TLB invalidation when some or all of the
specified range is already mapped with the specified cache mode.

Submitted by: Magesh Dhasayyan


181043 31-Jul-2008 alc

Eliminate recomputation of the PDE by pmap_pde_attr().


180992 30-Jul-2008 kib

Bring back the save/restore of the %ds, %es, %fs and %gs registers for
the 32bit images on amd64.

Change the semantic of the PCB_32BIT pcb flag to request the context
switch code to operate on the segment registers. Its previous meaning
of saving or restoring the %gs base offset is assigned to the new
PCB_GS32BIT flag.

FreeBSD 32bit image activator sets the PCB_32BIT flag, while Linux 32bit
emulation sets PCB_32BIT | PCB_GS32BIT.

Reviewed by: peter
MFC after: 2 weeks
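
The flag assignment described in r180992 can be summarised in a short sketch. The flag values, the enum, and the function name are illustrative only; the native amd64 case returning no flags follows the r182684 entry, which clears both bits when an amd64 binary is executed.

```c
#define PCB_32BIT_SK	0x40	/* illustrative value */
#define PCB_GS32BIT_SK	0x20	/* illustrative value */

/* Hypothetical labels for the three kinds of image discussed. */
enum image_abi_sketch { ABI_FREEBSD_AMD64, ABI_FREEBSD_IA32, ABI_LINUX_I386 };

/*
 * PCB_32BIT asks the context switch code to handle segment registers;
 * PCB_GS32BIT additionally asks it to save and restore the %gs base.
 */
unsigned
pcb_flags_for_image_sketch(enum image_abi_sketch abi)
{
	switch (abi) {
	case ABI_FREEBSD_IA32:
		return (PCB_32BIT_SK);
	case ABI_LINUX_I386:
		return (PCB_32BIT_SK | PCB_GS32BIT_SK);
	case ABI_FREEBSD_AMD64:
	default:
		return (0);
	}
}
```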


180872 28-Jul-2008 alc

Don't allow pmap_change_attr() to be applied to the recursive mapping.


180870 28-Jul-2008 alc

Add a check for 1GB page mappings to pmap_change_attr() so that it fails
gracefully. (On K10 family processors the direct map is implemented using
1GB page mappings.)


180846 27-Jul-2008 alc

Style fixes to several function definitions.


180845 27-Jul-2008 alc

Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.

Submitted by: Magesh Dhasayyan


180601 18-Jul-2008 alc

Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses is mapped. Previously, the loop was testing the
same address every time.

Submitted by: Magesh Dhasayyan


180600 18-Jul-2008 alc

Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().


180533 15-Jul-2008 alc

Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by: scottl


180498 13-Jul-2008 alc

Handle a race between pmap_kextract() and pmap_promote_pde(). This race
caused ZFS to crash when restoring a snapshot with superpage promotion
enabled.

Reported by: kris


180485 12-Jul-2008 alc

Refine the changes made in SVN rev 180430. Specifically, instantiate a new
page table page only if the 2MB page mapping has been used. Also, refactor
some assertions.


180483 12-Jul-2008 alc

In order to apply pmap_demote_pde() to a page directory entry (PDE) from the
direct map, the PDE must have PG_M and PG_A preset.

Noticed by: Magesh Dhasayyan


180430 10-Jul-2008 alc

Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.


180393 09-Jul-2008 peter

Band-aid a problem with 32 bit selector setup.

Initialize %ds, %es, and %fs during CPU startup. Otherwise a garbage
value could leak to a 32-bit process if a process migrated to a different
CPU after exec and the new CPU had never exec'd a 32-bit process.

A more complete fix is needed, but this mitigates the most frequent
manifestations.

Obtained from: ups


180378 09-Jul-2008 alc

Fix lines that are too long in pmap_growkernel() by substituting shorter but
equivalent expressions.


180373 08-Jul-2008 alc

Eliminate pmap_growkernel()'s dependence on create_pagetables() preallocating
page directory pages from VM_MIN_KERNEL_ADDRESS through the end of the
kernel's bss. Specifically, the dependence was in pmap_growkernel()'s one-
time initialization of kernel_vm_end, not in its main body. (I could not,
however, resist the urge to optimize the main body.)

Reduce the number of preallocated page directory pages to just those needed
to support NKPT page table pages. (In fact, this allows me to revert a
couple of my earlier changes to create_pagetables().)


180362 08-Jul-2008 alc

Rev 180333, ``Change create_pagetables() and pmap_init() so that many fewer
page table pages have to be preallocated ...'', violates an assumption made
by minidumpsys(): kernel_vm_end is the highest virtual address that has ever
been used by the kernel. Now, however, the kernel code, data, and bss may
reside at addresses beyond kernel_vm_end. This revision modifies the upper
bound on minidumpsys()'s two page table traversals to account for this
possibility.


180352 07-Jul-2008 alc

In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after: 1 week


180333 06-Jul-2008 alc

Change create_pagetables() and pmap_init() so that many fewer page table
pages have to be preallocated by create_pagetables().


180255 04-Jul-2008 alc

Eliminate an unused declaration. (In fact, the declaration is bogus
because the variable is defined static to pmap.c on i386.)

Found by: CScout


180170 02-Jul-2008 alc

Eliminate an unnecessary static variable: nkpt.


179977 24-Jun-2008 jkim

Emit opcodes closer to GNU as(1) generated codes and micro-optimize.


179967 23-Jun-2008 jkim

Rehash and clean up BPF JIT compiler macros to match AT&T notations.


179917 21-Jun-2008 alc

Prepare for a larger kernel virtual address space. Specifically, once
KERNBASE and VM_MIN_KERNEL_ADDRESS are no longer the same, the physical
memory allocated during bootstrap will be offset from the low-end of the
kernel's page table.


179898 20-Jun-2008 alc

Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).
That said, kris@ has tested crash dumps under the full patch that
increases the kernel virtual address space on amd64 to 6GB.

Tested by: kris@


179886 20-Jun-2008 alc

Make preparations for increasing the size of the kernel virtual
address space on the amd64 architecture. The amd64 architecture
requires kernel code and global variables to reside in the highest 2GB
of the 64-bit virtual address space. Thus, KERNBASE cannot change.
However, KERNBASE is sometimes used as the start of the kernel virtual
address space. Henceforth, VM_MIN_KERNEL_ADDRESS should be used
instead. Since KERNBASE and VM_MIN_KERNEL_ADDRESS are still the same
address, there should be no visible effect from this change (yet).


179777 13-Jun-2008 alc

Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.


179749 12-Jun-2008 alc

Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that hasn't
been modified (PG_M). In general, if there are two or more such PTEs to
choose among, it is better to write protect the one nearer the high end of
the page table page rather than the low end. This is because most programs
access memory in an ascending direction. The net result of this change is a
sometimes significant reduction in the number of failed promotion attempts
and the number of pages that are write protected by pmap_promote_pde().


179471 01-Jun-2008 alc

Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space. Specifically,
pmap_promote_pde() is only called when the page table page (PTP) that
is referenced by the given PDE has a full "use count", i.e., its
wire_count is 512. Although this guarantees for a user address space
that all 512 PTEs in the PTP hold valid mappings, the same is not true
of the kernel's address space. A kernel PTP always has a use count of
512 regardless of the state of the PTEs. Therefore,
pmap_promote_pde() should not assume (or assert) that the first PTE in
the PTP is valid.


179279 24-May-2008 jb

Add the DTrace hooks for exception handling (Function boundary trace
-fbt- provider), cyclic clock and syscalls.


179229 23-May-2008 alc

The VM system no longer uses setPQL2(). Remove it and its helpers.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


179049 16-May-2008 attilio

Removed unused assembly offsets for structures digging.


178947 11-May-2008 alc

Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.


178875 09-May-2008 alc

Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.
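
The adjustment described above can be sketched as follows. This is only an illustration, not the committed code: the helper name align_for_superpage() is hypothetical, and NBPDR/PDRMASK are assumed to have their amd64 values for a 2MB superpage. The idea is to move the start address up until its offset within a superpage matches the offset of the backing object.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed amd64 values: one PDE maps a 2MB superpage. */
#define NBPDR   (1ULL << 21)
#define PDRMASK (NBPDR - 1)

/*
 * Hypothetical helper: return an address >= addr whose offset within a
 * 2MB superpage equals that of the object offset, or addr unchanged if
 * realignment cannot yield a superpage mapping.
 */
static uint64_t
align_for_superpage(uint64_t offset, uint64_t addr, uint64_t size)
{
	uint64_t superpage_offset;

	if (size < NBPDR)
		return (addr);
	superpage_offset = offset & PDRMASK;
	/* No whole superpage fits after alignment, or already aligned. */
	if (size - ((NBPDR - superpage_offset) & PDRMASK) < NBPDR ||
	    (addr & PDRMASK) == superpage_offset)
		return (addr);
	if ((addr & PDRMASK) < superpage_offset)
		return ((addr & ~PDRMASK) + superpage_offset);
	return (((addr + PDRMASK) & ~PDRMASK) + superpage_offset);
}
```

The address is only ever increased, which matches the description of the routine "increasing the starting virtual address".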


178493 25-Apr-2008 alc

Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.


178471 25-Apr-2008 jeff

- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle(), to wake up idle threads that are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.

Sponsored by: Nokia


178429 22-Apr-2008 phk

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of the kernel code do, however, care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


178314 19-Apr-2008 peter

Put in a real isa_irq_pending() stub in order to remove two lines of dmesg
noise from sio per unit. sio likes to probe if interrupts are configured
correctly by looking at the pending bits of the atpic in order to put a
non-fatal warning on the console. I think I'd rather read the pending
bits from the apics, but I'm not sure it's worth the hassle.


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


178070 10-Apr-2008 alc

Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().


177967 07-Apr-2008 alc

Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describing what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177917 04-Apr-2008 alc

Eliminate an unnecessary test and its misleading comment from pmap_enter().


177851 02-Apr-2008 alc

Optimize pmap_pml4e() and pmap_pdpe() based upon two observations: The
given pmap is never NULL, and therefore pmap_pml4e() can never return
NULL. The pervasive use of these inline functions throughout the pmap
makes these simple changes worthwhile.


177680 28-Mar-2008 ps

Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by: alc, ups


177631 26-Mar-2008 phk

Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.


177535 23-Mar-2008 peter

First pass at (possibly futile) microoptimizing of cpu_switch. Results
are mixed. Some pure context switch microbenchmarks show up to 29%
improvement. Pipe based context switch microbenchmarks show up to 7%
improvement. Real world tests are far less impressive as they are
dominated more by actual work than switch overheads, but depending on
the machine in question, workload, kernel options, phase of moon, etc, a
few percent gain might be seen.

Summary of changes:
- don't reload MSR_[FG]SBASE registers when context switching between
non-threaded userland apps. These typically cost 120 clock cycles each
on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
faster on this.
- The above change only helps unthreaded userland apps that tend to use
the same value for gsbase. Threaded apps will get no benefit from this.
- reorder things like accessing the pcb to be in memory order, to give
prefetching a better chance of working. Operations are now in increasing
memory address order, rather than reverse or random.
- Push some lesser used code out of the main code paths. Hopefully
allowing better code density in cache lines. This is probably futile.
- (part 2 of previous item) Reorder code so that branches have a more
realistic static branch prediction hint. Both Intel and AMD cpus
default to predicting branches to lower memory addresses as being
taken, and to higher memory addresses as not being taken. This is
overridden by the limited dynamic branch prediction subsystem. A trip
through userland might overflow this.
- Futile attempt at spreading the use of the results of previous operations
in new operations. Hopefully this will allow the cpus to execute in
parallel better.
- stop wasting 16 bytes at the top of kernel stack, below the PCB.
- Never load the userland fs/gsbase registers for kthreads, but preserve
curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)

Microbenchmarking this code seems to be really sensitive to things like
scheduling luck, timing, cache behavior, tlb behavior, kernel options,
other random code changes, etc.

While it doesn't help heavy userland workloads much, it does help high
context switch loads a little, and should help those that involve
switching via kthreads a bit more.

A special thanks to Kris for the testing and reality checks, and Jeff for
tormenting me into doing this. :)

This is still work-in-progress.


177534 23-Mar-2008 alc

Correct an error in pmap_mincore() when applied to a 2MB page mapping:
Use PG_PS_FRAME, not PG_FRAME, to obtain the physical address of the
2MB physical page from the PDE.
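
A small sketch of why the choice of mask matters. The mask values below are assumed amd64 definitions, and pde_2mb_pa() is a hypothetical helper. In a PDE that maps a 2MB page, bit 12 is the PAT bit rather than part of the address, so masking with PG_FRAME can leak it into the result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed amd64 masks (up to 52 physical address bits). */
#define PG_FRAME    0x000ffffffffff000ULL /* bits 12..51: 4KB frame */
#define PG_PS_FRAME 0x000fffffffe00000ULL /* bits 21..51: 2MB frame */
#define PG_V        0x001ULL              /* valid */
#define PG_PS       0x080ULL              /* PDE maps a 2MB page */
#define PG_PDE_PAT  0x1000ULL             /* PAT bit of a 2MB PDE */

/* Hypothetical helper: physical address of the 2MB page named by a PDE. */
static uint64_t
pde_2mb_pa(uint64_t pde)
{
	return (pde & PG_PS_FRAME);
}
```

With a PDE of 0x40000000 | PG_PDE_PAT | PG_PS | PG_V, pde_2mb_pa() yields 0x40000000, whereas masking with PG_FRAME would yield 0x40001000.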


177533 23-Mar-2008 peter

Export TDP_KTHREAD to asm files.


177529 23-Mar-2008 alc

To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set. However, these changes enable
me to remove a work-around from pmap_promote_pde(), the superpage
promotion procedure.

(Note: The AMD processors that we have tested, including the latest,
the Phenom, still exhibit the historical behavior.)

Acknowledgments: After I observed the problem, Stephan (ups) was
instrumental in characterizing the exact behavior of Intel's recent
TLBs.

Tested by: Peter Holm
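
The rule "ignore PG_M unless PG_RW is set" can be illustrated with a short predicate. The bit values are the standard x86 PTE bits; the function name pte_is_dirty() is hypothetical and is not taken from the pmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_V  0x001ULL /* valid */
#define PG_RW 0x002ULL /* writable */
#define PG_M  0x040ULL /* modified (dirty) */

/*
 * Hypothetical predicate: treat a PTE as dirty only when PG_M and PG_RW
 * are both set, since some processors may set PG_M on a write attempt
 * that then faults because PG_RW is clear.
 */
static bool
pte_is_dirty(uint64_t pte)
{
	return ((pte & (PG_M | PG_RW)) == (PG_M | PG_RW));
}
```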


177525 23-Mar-2008 kib

Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week
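
A minimal sketch of the kind of wrap-around check described above, assuming 2MB page directory entries on amd64. The helper name next_pde_boundary() is hypothetical; the real loops in pmap.c are structured differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed amd64 values: one PDE covers 2MB of virtual address space. */
#define NBPDR   (1ULL << 21)
#define PDRMASK (NBPDR - 1)

/*
 * Hypothetical helper: address at which the next page directory entry
 * begins, clamped to eva if the addition wraps past the top of the
 * address space.
 */
static uint64_t
next_pde_boundary(uint64_t sva, uint64_t eva)
{
	uint64_t va_next;

	va_next = (sva + NBPDR) & ~PDRMASK;
	if (va_next < sva)	/* overflow: wrapped around to low memory */
		va_next = eva;
	return (va_next);
}
```

Without the comparison, a start address in the last 2MB of the address space would produce a va_next of 0, and a loop running "until va_next" would walk nearly the whole address space.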


177467 20-Mar-2008 jhb

Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU. The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr(). Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by: gallatin


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177160 14-Mar-2008 jhb

Fix a silly bogon which prevented all the CPUs that are tagged as interrupt
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs. In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would work around the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.

MFC after: 1 week
Pointy hat to: jhb


177157 13-Mar-2008 jhb

Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. Its probe routine now returns BUS_PROBE_GENERIC so it
can be overridden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains a new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.

Discussed with: imp
Silence on: arch@


177145 13-Mar-2008 kib

Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks
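
In essence the change clears one flag bit when building the register state for a signal handler. A sketch, assuming the usual x86 definition of the direction flag (bit 10, PSL_D) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define PSL_D 0x00000400ULL /* direction flag (DF), bit 10 of {e,r}flags */

/*
 * Hypothetical helper: flags value a signal handler starts with, given
 * the interrupted context's flags. DF is cleared as the ABI expects at
 * function entry; all other bits are left untouched.
 */
static uint64_t
signal_entry_rflags(uint64_t rflags)
{
	return (rflags & ~PSL_D);
}
```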


177125 12-Mar-2008 jhb

The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU. The old code assumed 36 bits on i386 and 40 bits on
amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.). The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available. If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after: 1 week
PR: i386/120516
Reported by: Nokia
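
The run-time mask can be sketched as below. Both function names are hypothetical. The only facts assumed are that the field starts at bit 12 and that EAX[7:0] of CPUID leaf 0x80000008 reports the number of physical address bits; pa_bits is expected to be below 64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of physical address bits, taken from
 * EAX[7:0] of CPUID leaf 0x80000008 when that leaf exists, else 36.
 */
static unsigned
pa_bits_from_cpuid(int has_ext_leaf, uint32_t eax)
{
	return (has_ext_leaf ? (eax & 0xff) : 36);
}

/*
 * Hypothetical helper: mask covering bits 12 .. pa_bits-1, the valid
 * PhysBase/PhysMask field for a CPU with pa_bits address bits.
 */
static uint64_t
mtrr_physmask(unsigned pa_bits)
{
	return (((1ULL << pa_bits) - 1) & ~0xfffULL);
}
```

For 36, 38 and 40 address bits this gives 0x0000000ffffff000, 0x0000003ffffff000 and 0x000000fffffff000 respectively, which shows how a hard-coded 36-bit mask drops the top two bits on a 38-bit CPU.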


177123 12-Mar-2008 jhb

Minimize diffs with i686_mem.c:
- A few whitespace changes I missed in the style(9) changes.
- Move M_MEMDESC to mem.c.


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


177070 11-Mar-2008 jhb

Style(9) these files. No changes in the compiled code. (Verified by
diff'ing objdump -d output).


177069 11-Mar-2008 jhb

Add constants for the various fields in MTRR registers.

MFC after: 1 week
Verified by: md5(1)


177041 10-Mar-2008 jhb

Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.

MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua


177006 10-Mar-2008 jeff

- Rather than repeating the same preemption code everywhere call the scheduler
specific sched_preempt() routine.


176803 04-Mar-2008 alc

Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. (Expect this to change.)

Reviewed by: ups
Tested by: kris, Peter Holm


176734 02-Mar-2008 jeff

- Replace the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a separate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176304 15-Feb-2008 scottl

Teach the dump and minidump code to respect the maxiosize attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.
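
A rough sketch of splitting a dump write into chunks no larger than the device limit. Everything here is hypothetical (the callback type, dump_in_chunks(), and the record_chunk() demo callback); it only illustrates replacing a fixed 64K chunk size with a per-device value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical write callback; returns 0 on success. */
typedef int (*dump_write_fn)(void *arg, const char *buf, size_t off,
    size_t len);

/* Hypothetical driver loop: issue writes of at most maxiosize bytes. */
static int
dump_in_chunks(dump_write_fn wr, void *arg, const char *buf, size_t len,
    size_t maxiosize)
{
	size_t n, off;
	int error;

	if (maxiosize == 0)
		return (-1);
	for (off = 0; off < len; off += n) {
		n = len - off;
		if (n > maxiosize)
			n = maxiosize;
		error = wr(arg, buf, off, n);
		if (error != 0)
			return (error);
	}
	return (0);
}

/* Demo callback: count the writes and remember the largest one. */
struct chunk_stats {
	size_t count;
	size_t largest;
};

static int
record_chunk(void *arg, const char *buf, size_t off, size_t len)
{
	struct chunk_stats *st = arg;

	(void)buf;
	(void)off;
	st->count++;
	if (len > st->largest)
		st->largest = len;
	return (0);
}
```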


176206 12-Feb-2008 scottl

If busdma is being used to realign dynamic buffers and the alignment is set to
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages. Adjust to be correct. Review of other architectures is
forthcoming.

Submitted by: Joseph Golio


175905 02-Feb-2008 das

Add a few more CPUID feature bits while here. We don't support these
features yet.


175904 02-Feb-2008 das

SSE4 CPUID bits


175768 28-Jan-2008 ru

Add a wrapper function that bounds-checks writes to the dump device.


175404 17-Jan-2008 alc

Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.


175325 14-Jan-2008 alc

Make pmap_is_prefaultable() more TLB friendly. Specifically, make it use
the kernel's direct map instead of the pmap's recursive mapping to access
the lowest level in the page table. The direct map is preferable for two
reasons: (1) The TLB is more likely to hold the required direct mapping
because pmap_enter() has already used the direct map to access a nearby
PTE and (2) loading a direct mapping into the TLB involves walking only 2
or 3 levels of the page table instead of 4.


175155 08-Jan-2008 alc

Convert a PMAP_DIAGNOSTIC to a KASSERT.


175119 06-Jan-2008 alc

Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


175056 02-Jan-2008 alc

Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel()
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.

Correct a nearby style error: Pointers should be compared to NULL.


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174557 12-Dec-2007 rpaulo

Disallow the legacy USB circuit to generate an SMI# via an ICH
register (MacBooks only).
This allows MacBooks to boot in SMP mode without any trick and solves
the timer problems with HZ=1000.

MFC after: 1 week

Reviewed by: njl (mentor), jhb
Approved by: njl (mentor), jhb


174496 09-Dec-2007 alc

Eliminate compilation warnings due to the use of non-static inlines
through the introduction and use of the __gnu89_inline attribute.

Submitted by: bde (i386)
MFC after: 3 days


174454 08-Dec-2007 alc

Use 1GB virtual pages to implement the direct map on architectures that
support this feature.

Wrap a nearby line that is too long.

MFC after: 6 weeks


174452 08-Dec-2007 alc

Recognize architectural support for 1GB virtual pages.

MFC after: 6 weeks


174395 07-Dec-2007 jkoshy

Kernel and hwpmc(4) support for callchain capture.

Sponsored by: FreeBSD Foundation and Google Inc.


174254 04-Dec-2007 kib

Fix the ABI change of the signal delivered on the access to the page
with insufficient protection mode.

For the i386 and amd64, create the tunable, machdep.prot_fault_translation,
with the following behaviour:
0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE
from vm_fault based on the ELF OSABI note:
no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT
note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR
1 = always SIGBUS/BUS_PAGE_FAULT
2 = always SIGSEGV/SEGV_ACCERR

This would do mostly automatic correction of ABI breakage, with the exception
of the untagged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For
them, the sysctl would allow running the binary with manual settings.

Discussed with: portmgr (kris)
PR: kern/118304
MFC after: 3 days
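
The decision table above can be written as a small function. This is a sketch only: the function and parameter names are hypothetical, only the signal number is returned (the si_code values BUS_PAGE_FAULT and SEGV_ACCERR appear in comments), and treating unlisted tunable values like 0 is an assumption.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper: signal to deliver on KERN_PROTECTION_FAILURE.
 * translation: value of machdep.prot_fault_translation.
 * has_note:    binary carries an ELF OSABI note.
 * osrel:       __FreeBSD_version recorded in that note.
 */
static int
prot_fault_signal(int translation, bool has_note, int osrel)
{
	switch (translation) {
	case 1:
		return (SIGBUS);	/* always SIGBUS/BUS_PAGE_FAULT */
	case 2:
		return (SIGSEGV);	/* always SIGSEGV/SEGV_ACCERR */
	default:			/* 0 = autodetect (assumed default) */
		if (has_note && osrel >= 700004)
			return (SIGSEGV);
		return (SIGBUS);
	}
}
```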


174249 04-Dec-2007 alc

Style change: Use NULL rather than 0 where appropriate.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


174104 30-Nov-2007 alc

Improve get_pv_entry()'s handling of low-memory conditions. After page
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.

Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.

MFC after: 6 weeks


174067 29-Nov-2007 bde

Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPU's
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles. Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this. I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each. For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.


174066 29-Nov-2007 bde

Remove entry points for -finstrument-functions since they are currently
unused except to obfuscate disassemblies. -mprofiler-epilogue is
currently broken with gcc-4 (it does too little), but -finstrument-functions
is broken in a different way (it does too much).

amd64 version: merge whitespace fixes from i386 version.


174056 28-Nov-2007 alc

Account for pv entry pages in the total number of wired pages. (Note: pv
entry pages have always been included in the total number of wired pages
on i386 just not amd64.)

MFC after: 6 weeks


173988 27-Nov-2007 jhb

Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after: 1 week
Reviewed by: scottl


173855 23-Nov-2007 jkoshy

MFP4: Add assembly language symbols used by hwpmc(4)'s callchain capture.


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheduler might see it as having such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a scheduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroyed these
wired, managed mappings, but did not reduce the pages' wire count.
Consequently, when the file was unmapped, the pages were not unwired
because the wired mapping had been destroyed. Moreover, when the vm
object was finally destroyed, the pages were leaked because they were still
wired. The fix is to reduce the pages' wire count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


173659 15-Nov-2007 jhb

Add support for cross double fault frames in stack traces:
- Populate the register values for the trapframe put on the stack by the
double fault handler.
- Teach DDB's trace routine to treat a double fault like other trap frames.

MFC after: 3 days


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173601 14-Nov-2007 julian

A bunch more files that should probably print out a thread name
instead of a process name.


173600 14-Nov-2007 julian

Generally we are interested in what thread did something as
opposed to what process. Since threads by default have the name of the
process unless over-written with more useful information, just print the
thread name instead.


173370 05-Nov-2007 alc

Add comments explaining why all stores updating a non-kernel page table
must be globally performed before calling any of the TLB invalidation
functions.

With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)

Reported and reviewed by: ups
MFC after: 3 days


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicking or dereferencing NULL.

As a consequence, vmspace_exec() and vmspace_unshare() return an errno
int. A struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


173296 03-Nov-2007 alc

Eliminate spurious "Approaching the limit on PV entries, ..."
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)

Reported by: Jeremy Chadwick

Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.

MFC after: 5 days


173118 28-Oct-2007 jhb

- Add constants for the different memory types in the SMAP table.
- Use the SMAP types and constants from <machine/pc/bios.h> in the boot
code rather than duplicating it.


173061 27-Oct-2007 jhb

Don't test the APIC flag in the cpuid features for amd64 to see if a
local APIC is present or not. All amd64 CPUs have a local APIC and some
BIOSen don't set the CPUID_APIC flag.

MFC after: 1 week


172937 24-Oct-2007 jhb

Update copyright attribution.

MFC after: 3 days


172394 30-Sep-2007 marius

Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)


172212 17-Sep-2007 peter

Fix an undefined symbol that as/ld neglected to flag as a problem. It
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.

The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.

This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.

Approved by: re (kensmith)


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


172189 15-Sep-2007 alc

It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This change fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)


172144 11-Sep-2007 attilio

This is a follow-up, cleaning-up commit about recent changes involving
topology foo functions.
Working on the patch for topology problems in ia32/amd64 exposed some
problems regarding function ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new or modified versions of the involved
functions, a correct ordering is now semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re


171916 22-Aug-2007 jkoshy

Assign sizes to assembly language support functions.

Approved by: re (kensmith)


171907 21-Aug-2007 alc

In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by: re (kensmith)


171702 02-Aug-2007 peter

Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.

This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Opteron cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.

Obtained from: jhb
Approved by: re (kensmith)


171597 26-Jul-2007 jhb

If the trap number stored in the trapframe is corrupted into a negative
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault. Make the 'type' variable holding the
trap number unsigned to avoid this.

MFC after: 2 weeks
Approved by: re (rwatson)
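
Illustration (not part of the commit): a minimal sketch of why a signed trap number defeats an upper-bound check while an unsigned one does not. The shortened trap_msg[] table and the helper names below are hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened stand-in for the kernel's trap_msg[] table. */
static const char *const trap_msg[] = {
	"",				/* 0: unused */
	"privileged instruction fault",	/* 1 */
	"",				/* 2: unused */
	"breakpoint instruction fault",	/* 3 */
};
#define	TRAP_MSG_COUNT	(sizeof(trap_msg) / sizeof(trap_msg[0]))

/* Signed comparison: a corrupted negative trap number passes the check. */
static int
signed_check_accepts(int type)
{
	return (type < (int)TRAP_MSG_COUNT);
}

/* Unsigned comparison: the same bit pattern is a huge value and fails. */
static int
unsigned_check_accepts(unsigned int type)
{
	return (type < TRAP_MSG_COUNT);
}

/* Look up a message, returning NULL for any out-of-range trap number. */
static const char *
trap_name(unsigned int type)
{
	if (type >= TRAP_MSG_COUNT)
		return (NULL);
	assert(trap_msg[type] != NULL);
	return (trap_msg[type]);
}
```

With an unsigned type, a single upper-bound comparison covers both the "too large" and the "corrupted negative" cases.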


171481 17-Jul-2007 jeff

- Optimize the amd64 cpu_switch() TD_LOCK blocking and releasing to
require fewer blocking loops.
- Don't use atomic ops with 4BSD or on UP.
- Only use the blocking loop if ULE is compiled in.
- Use the correct memory barrier.

Discussed with: attilio, jhb, ssouhlal
Tested by: current@
Approved by: re


171128 01-Jul-2007 alc

Pages that do belong to an object and page queue can now be freed without
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.

Approved by: re (kensmith)


170867 17-Jun-2007 mjacob

Check for pte being NULL in return from pmap_pte_pde- unlikely or
even impossible, but it's better to have a panic and a quiesced
gcc4.2.


170866 17-Jun-2007 mjacob

Initialize lastaddr to zero to make gcc4.2 happy.


170564 11-Jun-2007 mjacob

Check against maxsegsz being zero in bus_dma_tag_create and return EINVAL
if it is.

Reviewed by: scott long


170517 10-Jun-2007 attilio

Optimize vmmeter locking.
In particular:
- Add an explanatory table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some useless comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)


170368 06-Jun-2007 davidxu

Backout experimental adaptive-spin umtx code.


170340 05-Jun-2007 jhb

Move a warning under bootverbose as no machines that trigger it have ended
up being broken.


170310 05-Jun-2007 jeff

- Add a new argument to cpu_switch. This is a pointer to a mutex that
oldthread should point at before we return.
- When cpu_switch() is called the td_lock pointer in the old thread may
point at the blocked lock. This prevents other processors from
switching into this thread while we're still switching out. Wait
until we're done deactivating the vmspace before we release the
thread by assigning to td_lock.
- Before we can activate the new vmspace we must make sure that the new
thread is not assigned to the blocked lock. It may be in the process
of switching out on another cpu. Spin until the new thread is
available.


170309 05-Jun-2007 jeff

- Expose td_lock to assembly so it may be used in cpu_switch().


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
synchronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170304 04-Jun-2007 jeff

Commit 11/14 of sched_lock decomposition.
- There is no globally visible scheduler lock any longer. For now the
watchdog can only check Giant. This model of checking particular locks
is flawed and should be revisited. Other metrics should be considered.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170303 04-Jun-2007 jeff

Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170289 04-Jun-2007 dwmalone

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
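
Illustration (not part of the commit): a minimal userland sketch of the truncation described above. The helpers export_as_int() and export_as_quad() are hypothetical and only mimic "copy out an int" versus "copy out 64 bits"; they are not the sysctl handlers themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy out only sizeof(int) bytes, as a handler that assumes an int would. */
static int
export_as_int(const void *arg)
{
	int v;

	assert(arg != NULL);
	memcpy(&v, arg, sizeof(v));
	return (v);
}

/* Copy out the full 64 bits, as a quad-aware handler would. */
static int64_t
export_as_quad(const void *arg)
{
	int64_t v;

	assert(arg != NULL);
	memcpy(&v, arg, sizeof(v));
	return (v);
}

/* Example 64-bit counter whose value does not fit in 32 bits. */
static const int64_t example_counter = (INT64_C(1) << 32) + 7;
```

Which half of the value survives the int-sized copy depends on byte order, but on either byte order the exported value differs from the real one.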


170253 03-Jun-2007 alc

Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by: re


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probably, a general approach is not the best solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170135 30-May-2007 des

MFi386: PDCM, remove pointless message

MFC after: 3 days


170086 29-May-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_load_buffer() as their back-end.

Reviewed by: scottl
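
Illustration (not part of the commit): a minimal sketch of clamping a segment to both the page remainder and the tag's maxsegsz. segment_size() is a hypothetical helper, not the busdma code, and the 4 KB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	PAGE_SIZE	((size_t)4096)
#define	PAGE_MASK	(PAGE_SIZE - 1)

/*
 * Size of the next DMA segment starting at vaddr: never cross a page
 * boundary, never exceed the tag's maxsegsz, never exceed what is left.
 */
static size_t
segment_size(uintptr_t vaddr, size_t buflen, size_t maxsegsz)
{
	size_t sgsize;

	assert(maxsegsz != 0);	/* a zero maxsegsz is rejected at tag creation */
	sgsize = PAGE_SIZE - ((size_t)vaddr & PAGE_MASK);
	if (sgsize > maxsegsz)
		sgsize = maxsegsz;
	if (sgsize > buflen)
		sgsize = buflen;
	return (sgsize);
}
```

For a page-aligned buffer and a 512-byte maxsegsz this yields 512-byte segments rather than a full page.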


170028 27-May-2007 rwatson

Remove "XXX Giant" comments before calls to kdb_trap() -- the kernel
debugger is quite capable of handling Giant-free execution at this
point. Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.


169895 23-May-2007 kib

Move futex support code from <arch>/support.s into linux compat directory.
Implement all futex atomic operations in assembler so they do not depend on
fuword(), which cannot distinguish between a value of -1 and a failure return.
Correctly return 0 from atomic operations on success.

In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binaries in the linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169805 20-May-2007 jeff

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


169731 19-May-2007 kan

Remove extern struct pcpu __pcpu[]; from the header file and
move it to the only file where it appears to be used.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169395 08-May-2007 jhb

Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.

MFC after: 1 week


169391 08-May-2007 jhb

Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.
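
Illustration (not part of the commit): a minimal sketch of replacing an "enabled" boolean with a handler count, so that the last removal can be noticed and the source disabled. The struct and function names are hypothetical; only the is_handlers field name comes from the message above.

```c
#include <assert.h>

struct intsrc_sketch {
	unsigned int is_handlers;	/* number of handlers attached */
	int pic_enabled;		/* 1 while the PIC delivers this source */
};

/* The first handler enables the source at the PIC. */
static void
add_handler(struct intsrc_sketch *isrc)
{
	if (isrc->is_handlers++ == 0)
		isrc->pic_enabled = 1;
}

/* The last handler to go away disables the source again. */
static void
remove_handler(struct intsrc_sketch *isrc)
{
	assert(isrc->is_handlers > 0);
	if (--isrc->is_handlers == 0)
		isrc->pic_enabled = 0;
}

/* Attach "adds" handlers, detach "removes" of them, report the PIC state. */
static int
enabled_after(unsigned int adds, unsigned int removes)
{
	struct intsrc_sketch isrc = { 0, 0 };
	unsigned int i;

	assert(removes <= adds);
	for (i = 0; i < adds; i++)
		add_handler(&isrc);
	for (i = 0; i < removes; i++)
		remove_handler(&isrc);
	return (isrc.pic_enabled);
}
```

A plain boolean cannot tell "one of two handlers removed" apart from "last handler removed"; the count can.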


169320 06-May-2007 piso

Bring in the remaining bits to make interrupt filtering work:

o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.

Approved by: re (implicit?)


169221 02-May-2007 jhb

Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has to be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indices to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI pseudo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.

MFC after: 1 month
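
Illustration (not part of the commit): a minimal sketch of the remap-table rule described above, where each table entry names an allocated message vector (1-based, 0 meaning "no IRQ") and unused vectors must be at the end. remap_vectors_kept() and the example tables are hypothetical; this is not the pci_remap_msix() implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many of the "allocated" vectors the table keeps, or -1 if
 * the table names a vector that was not allocated or leaves a gap before
 * the highest vector it uses.
 */
static int
remap_vectors_kept(const unsigned int *table, int count, int allocated)
{
	unsigned int max = 0, v;
	int i, used;

	assert(table != NULL && count >= 0 && allocated >= 0);
	for (i = 0; i < count; i++) {
		if (table[i] > (unsigned int)allocated)
			return (-1);
		if (table[i] > max)
			max = table[i];
	}
	/* Every vector from 1 to max must be referenced at least once. */
	for (v = 1; v <= max; v++) {
		used = 0;
		for (i = 0; i < count; i++)
			if (table[i] == v)
				used = 1;
		if (!used)
			return (-1);
	}
	return ((int)max);
}

/* Four MSI-X messages sharing the first two vectors. */
static const unsigned int example_remap[4] = { 1, 2, 1, 2 };
/* Invalid: uses vector 3 but skips vector 2. */
static const unsigned int gap_remap[4] = { 1, 3, 0, 0 };
```

With example_remap and three allocated vectors, two are kept and the third could be released; the same table also works when only two vectors were allocated.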


169042 25-Apr-2007 ariff

Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep
local APIC timer alive.

Reviewed by: jhb
PR: i386/104678
MFC after: 3 days


169030 24-Apr-2007 jhb

Fix the triple fault used as a last resort during a reboot to actually
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.

The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger an int 3 breakpoint to force an interrupt and
kick off a triple fault.

MFC after: 3 days


169029 24-Apr-2007 jhb

MFi386: Attempt to reset the machine using the Reset Control register and
Fast A20 and Init register if the keyboard reset doesn't work before
resorting to a triple fault.


168930 21-Apr-2007 ups

Modify TLB invalidation handling.

Reviewed by: alc@, peter@
MFC after: 1 week


168822 17-Apr-2007 jhb

Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after: 1 week
Requested by: ariff
Reviewed by: scottl


168691 13-Apr-2007 alc

Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@
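
Illustration (not part of the commit): a minimal sketch of why a page-table frame mask is the wrong tool for truncating a virtual address. The PG_FRAME value is an assumption about the amd64 definition, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define	PAGE_SIZE	UINT64_C(4096)
#define	PAGE_MASK	(PAGE_SIZE - 1)
/* Assumed amd64 value: selects the physical frame bits of a PTE. */
#define	PG_FRAME	UINT64_C(0x000ffffffffff000)

/* Misuse: also clears the upper bits of a kernel virtual address. */
static uint64_t
truncate_with_pg_frame(uint64_t va)
{
	assert((PG_FRAME & PAGE_MASK) == 0);	/* offset bits are cleared */
	return (va & PG_FRAME);
}

/* Intended operation: clear only the offset within the page. */
static uint64_t
truncate_to_page(uint64_t va)
{
	return (va & ~PAGE_MASK);
}
```

The two agree for low user addresses, which is how such a misuse can go unnoticed, but they differ for addresses in the upper half of the address space.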


168109 31-Mar-2007 jkim

Correct BB-profiling and adjust comments.

Pointed out by: bde
Reviewed by: bde


168102 30-Mar-2007 jkim

Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by: bde
Discussed with: kib
Reviewed by: bde


168088 30-Mar-2007 jkim

Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.

Pointed out by: bde[1]


168078 30-Mar-2007 jkim

Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.

Pointed out by: bde


168037 30-Mar-2007 jkim

MFP4: Linux futex support for amd64.

Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by: emulation


168035 30-Mar-2007 jkim

MFP4: Linux set_thread_area syscall (aka TLS) support for amd64.

Initial version was submitted by Divacky Roman and mostly rewritten by me.

Tested by: emulation


167912 26-Mar-2007 kris

Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by: jhb


167905 26-Mar-2007 njl

Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change. cpufreq_post_change provides the results of the
change (success or failure). cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed. Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value. When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq. This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by: bde, phk
MFC after: 1 month


167767 21-Mar-2007 jhb

Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98
drivers.


167747 20-Mar-2007 jhb

Add a new apic0 pseudo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system. Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.


167745 20-Mar-2007 jhb

Add a new ram0 pseudo-device that claims memory resources for physical
addresses corresponding to system RAM. On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions. On i386 ram0 uses the
dump_avail[] array. Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.


167744 20-Mar-2007 jkim

- Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capitalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.


167742 20-Mar-2007 jhb

Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related pseudo-devices are added at an order of 5. acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.


167741 20-Mar-2007 jhb

MFi386 1.173: Display two new Intel feature bits.


167493 12-Mar-2007 jkim

Add another CPUID for AMD CPUs and fix style(9) while I am here.


167423 10-Mar-2007 alc

Completely eliminate "avail_start". It serves no useful purpose.


167364 09-Mar-2007 jhb

Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf. Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.
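
The timestamp comparison described above can be sketched in user space. This
is only an illustrative model, not the NFS client code: `struct attr_stamp`
and `stamp_matches()` are hypothetical names, and the field types are
assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical <pid, tid, thread syscalls> tuple recorded at each attr load. */
struct attr_stamp {
	int pid;
	int tid;
	unsigned long syscalls;	/* per-thread syscall counter at load time */
};

/*
 * Returns 1 if both stamps were taken by the same thread during the same
 * syscall, i.e. the attributes were already fetched in this syscall.
 */
static int
stamp_matches(const struct attr_stamp *a, const struct attr_stamp *b)
{
	assert(a != NULL && b != NULL);
	return (a->pid == b->pid && a->tid == b->tid &&
	    a->syscalls == b->syscalls);
}
```

In this model, the open path would skip its own GETATTR only when the stamp
stored with the cached attributes matches the calling thread's current stamp.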


167277 06-Mar-2007 scottl

Don't increment total_bounced when doing no-op dmamap_sync ops.


167273 06-Mar-2007 jhb

Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.


167250 05-Mar-2007 alc

Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks


167247 05-Mar-2007 jhb

Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().


167240 05-Mar-2007 jhb

Add a simple device driver to "eat" any I/O APICs that show up as PCI
devices.

MFC after: 1 week


166922 23-Feb-2007 jhb

Use ih_filter instead of ih_handler in a couple of places. This fixes
most INTR_FAST handlers on i386.

Reviewed by: piso


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166823 19-Feb-2007 kib

MFi386 rev. 1.544 of i386/i386/pmap.c:
Rounding addr upwards to next 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.

Reported and tested by: kris
Suggested by: alc
MFC after: 1 week
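
The wraparound described in this entry can be reproduced in user space. The
following is a minimal sketch, not the pmap code itself: `NBPDR_SKETCH`,
`round_up_2m()`, and `round_up_wrapped()` are hypothetical names, and a 64-bit
address type is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 2M granularity used when rounding. */
#define NBPDR_SKETCH ((uint64_t)1 << 21)

/* Round addr up to the next 2M boundary; unsigned arithmetic wraps at 2^64. */
static uint64_t
round_up_2m(uint64_t addr)
{
	/* The mask trick below is only valid for a power-of-two size. */
	assert((NBPDR_SKETCH & (NBPDR_SKETCH - 1)) == 0);
	return ((addr + NBPDR_SKETCH - 1) & ~(NBPDR_SKETCH - 1));
}

/* Returns 1 when rounding wrapped past the top of the address space. */
static int
round_up_wrapped(uint64_t addr)
{
	return (round_up_2m(addr) < addr);
}
```

An address in the last 2M of the address space rounds up to 0, which is the
condition a caller has to detect before treating the result as an upper bound.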


166810 18-Feb-2007 alc

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


166776 15-Feb-2007 jhb

Add bootverbose printfs to indicate which IDT vectors are assigned to MSI
interrupts.


166569 08-Feb-2007 jhb

Don't send interrupts to CPUs disabled via lapic hints.

Reported by: Ludger Bolmerg <lbolmerg ! web.de>
MFC after: 3 days
Pointy hat to: jhb


166283 27-Jan-2007 jkoshy

Use a known good stack at the time of servicing an NMI --- reuse
the space allocated for the double fault handler since this space
is otherwise unused till the time a double fault occurs.

This change should have been committed alongside r1.127 of
"exception.S", but I somehow missed doing so.

Problem reported by: jeff
Pointy hat to: jkoshy


166187 23-Jan-2007 jeff

- Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.


166186 23-Jan-2007 bde

Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.


166176 22-Jan-2007 jhb

Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than the default of using the first N slots (or indices) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.

Tested by: scottl
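
The slot remapping in the { 1, 4 } example above can be modelled in user
space. This sketch does not call the PCI API; `struct msix_model`,
`model_alloc()`, and `model_remap()` are hypothetical names, and the error
handling (rejecting out-of-range or repeated indices) is an assumption made
for illustration.

```c
#include <assert.h>

#define SKETCH_MAX_SLOTS 8	/* hypothetical table size for this model */

/*
 * Hypothetical model of a device's MSI-X table: slot[s] holds the IRQ
 * assigned to 1-based message index s, or 0 if the slot is unused.
 * slot[0] stays unused, mirroring rid 0 being reserved for INTx.
 */
struct msix_model {
	int slot[SKETCH_MAX_SLOTS + 1];
	int irqs[SKETCH_MAX_SLOTS];	/* allocated IRQs, in allocation order */
	int nalloc;
};

/* Default layout after allocation: the n IRQs occupy slots 1..n. */
static void
model_alloc(struct msix_model *m, const int *irqs, int n)
{
	assert(n >= 0 && n <= SKETCH_MAX_SLOTS);
	for (int s = 0; s <= SKETCH_MAX_SLOTS; s++)
		m->slot[s] = 0;
	for (int i = 0; i < n; i++) {
		m->irqs[i] = irqs[i];
		m->slot[i + 1] = irqs[i];
	}
	m->nalloc = n;
}

/*
 * Reassign the i-th allocated IRQ to message index indices[i].
 * Returns 0 on success, -1 if an index is out of range or repeated.
 */
static int
model_remap(struct msix_model *m, const int *indices, int n)
{
	if (n != m->nalloc)
		return (-1);
	for (int i = 0; i < n; i++) {
		if (indices[i] < 1 || indices[i] > SKETCH_MAX_SLOTS)
			return (-1);
		for (int j = 0; j < i; j++)
			if (indices[j] == indices[i])
				return (-1);
	}
	for (int s = 0; s <= SKETCH_MAX_SLOTS; s++)
		m->slot[s] = 0;
	for (int i = 0; i < n; i++)
		m->slot[indices[i]] = m->irqs[i];
	return (0);
}
```

After remapping two IRQs to indices { 1, 4 }, the model has entries only at
slots 1 and 4, matching the rids the entry says a driver would use.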


165947 11-Jan-2007 jhb

Remove magic from rman_activate_resource() that uses the direct map at
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.

MFC after: 2 weeks


165929 11-Jan-2007 jeff

- Use the correct test in the ipi bitmask handler for IPI_PREEMPT so that
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.

Discussed with: ups
Tested on: i386, amd64


165918 09-Jan-2007 jkim

Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors.


165479 23-Dec-2006 davidxu

Fix a panic when rebooting an SMP machine. When option STOP_NMI is used,
the NMI handler is used to stop other processors, and the NMI handler calls
trap(). However, trap() now accepts a pointer rather than a reference; this
was changed by kmacy@.


165369 20-Dec-2006 davidxu

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allows us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it can hardly
be implemented efficiently without wasting cpu cycles. However,
exporting thread running state is unlikely to be implemented soon, as
the interfaces would have to be designed and stabilized. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used as a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130
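
How the two sysctls might combine with a mutex's own spin count can be
sketched as follows. This is an illustrative model only: `effective_spins()`
is a hypothetical helper, and applying the upper limit after the default is an
assumption, not something the entry states.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the spin cycle count for one lock attempt.
 *   m_spincount: the mutex's own count (0 means "not set")
 *   dflt_spins:  model of kern.threads.umtx_dflt_spins
 *   max_spins:   model of kern.threads.umtx_max_spins
 */
static int
effective_spins(int m_spincount, int dflt_spins, int max_spins)
{
	int spins = m_spincount;

	assert(max_spins >= 0);
	/* A zero per-mutex count falls back to a non-zero default. */
	if (spins == 0 && dflt_spins != 0)
		spins = dflt_spins;
	/* Assumed here: the limit caps whichever value was chosen. */
	if (spins > max_spins)
		spins = max_spins;
	return (spins);
}
```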


165303 17-Dec-2006 kmacy

Newer versions of gcc don't support treating structures passed by value
as if they were really passed by reference. Specifically, the dead stores
elimination pass in the GCC 4.1 optimiser breaks the non-compliant behavior
on which FreeBSD relied. This change brings FreeBSD up to date by switching
trap frames to being explicitly passed by reference.

Reviewed by: kan
Tested by: kan
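
The language rule behind this change can be shown with a small user-space
sketch. `struct frame_sketch` and both handlers are hypothetical; the point is
only that standard C gives a callee its own copy of a by-value structure, so
stores to it need not reach the caller and may be eliminated by the optimiser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trap frame with a single saved register. */
struct frame_sketch {
	long rax;
};

/* By value: the store goes to the callee's private copy and is dead. */
static void
handler_by_value(struct frame_sketch f)
{
	f.rax = 42;
	(void)f;
}

/* By reference: the store is visible to the caller. */
static void
handler_by_ref(struct frame_sketch *f)
{
	assert(f != NULL);
	f->rax = 42;
}
```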


165128 12-Dec-2006 jhb

Give Host-PCI bridge drivers their own pcib_alloc_msi() and
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.


165125 12-Dec-2006 jhb

Add a function to return the MD interrupt source cookie associated with
an interrupt event. Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.


164951 06-Dec-2006 sobomax

Allow machdep.cpu_idle_hlt to be set from the loader. This should make it
possible to work around the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.

MFC after: 3 days


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still requires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164912 05-Dec-2006 ru

Use a different bitmask for superpages' base address so that it
doesn't conflict with the PG_PDE_PAT bit. (We still don't mask
off all the reserved bits but that's okay for now.)

Reviewed by: alc


164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


164726 28-Nov-2006 ru

Differentiate between data and instruction fetch in the fatal
page fault trap handler.

Reviewed by: alc


164565 23-Nov-2006 ru

Use a define instead of a "magic" value.


164564 23-Nov-2006 ru

Finish the PG_NX support at the pmap level.

Reviewed by: alc


164413 19-Nov-2006 alc

The global variable avail_end is redundant and only used once. Eliminate
it. Make avail_start static to the pmap on amd64. (It no longer exists
on other architectures.)


164365 17-Nov-2006 jhb

Add support for 8 byte hardware watches in long mode. Kernel hardware
watches support 8 byte watches. For userland, we disallow 8 byte watches
for 32-bit tasks.


164362 17-Nov-2006 jhb

- Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.


164358 17-Nov-2006 jhb

Trim some noise from bootverbose:
- Drop the printf in intr_machdep.c when we assign an interrupt source to
a CPU. Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
ioapic pin, just return success without printing anything if the new
value matches the current one.

MFC after: 2 weeks


164357 17-Nov-2006 jhb

A few more style fixes.


164303 15-Nov-2006 jhb

Various whitespace and style fixes.


164301 15-Nov-2006 jhb

Fix a typo that broke MSI (MSI-X worked fine) in the later revisions of
the MSI patches.


164265 13-Nov-2006 jhb

MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
This function is used to allocate IDT vectors for a group of MSI
messages.
- Add MSI and MSI-X PICs. The PIC code here provides methods to manage
edge-triggered MSI messages as x86 interrupt sources. In addition to
the PIC methods, msi.c also includes methods to allocate and release
MSI and MSI-X messages. For x86, we allow for up to 128 different
MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
ask the MSI PIC code to allocate resources and IDT vectors.

MFC after: 2 months
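
The "N contiguous vectors aligned on an M >= N boundary" requirement can be
sketched with a simple first-fit search. This is not the local APIC code:
`alloc_vectors_sketch()`, `NVEC_SKETCH`, and the boolean in-use table are
hypothetical, chosen only to illustrate the constraint.

```c
#include <assert.h>
#include <stdbool.h>

#define NVEC_SKETCH 32	/* hypothetical number of allocatable vectors */

/*
 * Find `count` contiguous free entries whose first index is a multiple of
 * `align`, mark them in use, and return the first index (or -1 if none).
 */
static int
alloc_vectors_sketch(bool used[NVEC_SKETCH], int count, int align)
{
	assert(count > 0 && align >= count);
	for (int base = 0; base + count <= NVEC_SKETCH; base += align) {
		int i;

		/* Check that the whole run starting at base is free. */
		for (i = 0; i < count; i++)
			if (used[base + i])
				break;
		if (i == count) {
			for (i = 0; i < count; i++)
				used[base + i] = true;
			return (base);
		}
	}
	return (-1);
}
```

Stepping the candidate base by `align` is what keeps a group of MSI messages
on an aligned boundary; a single busy entry disqualifies the whole run.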


164263 13-Nov-2006 jhb

Various fixes:
- Remove an extra entry from the array for 0x0f prefixed instruction groups.
This fixes decoding of instructions where the second opcode >= 0x80.
- Add support for the 64-bit immediate mov instructions.
- When short_addr is enabled, don't parse the modr/m byte for a 16-bit
address, but as a 32-bit address.
- Support %rip relative addressing.
- Don't print a displacement of 0 if there is a base or index register.

MFC after: 3 days


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


164078 07-Nov-2006 ru

Spelling.


164077 07-Nov-2006 ru

Line up memory amount reporting that got broken when s/real/usable/.


164064 07-Nov-2006 jhb

Remove duplicate IDTVEC macro definition, it's already defined in
<machine/intr_machdep.h>.


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size is 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up to the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163756 29-Oct-2006 bde

Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time. Just don't default to using
the TSC if there are multiple active CPUs. Also, don't default to
using the TSC if it is broken. SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU. The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range. Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.


163727 28-Oct-2006 bde

Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.


163726 28-Oct-2006 bde

Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities. It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME(). CNAME() is easy enough to use in pure
asm code, but using it in inline asm requires messy quoting. The
core pure asm code has been hacked on more and all uses of CNAME() in
it have already gone away. Just assume the elf convention here too.
- Removed now-unneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.


163722 27-Oct-2006 bde

Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


163534 20-Oct-2006 bde

Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among of the least
interesting special registers (far behind %cr3). The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistency problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163442 16-Oct-2006 jhb

Add one more include to fix the case of !DDB and !atpic.


163386 15-Oct-2006 hrs

Add a newline to the printf().

Spotted by: Peter Carah <pete@altadena.net>
MFC after: 3 days


163317 13-Oct-2006 jhb

Move the 2 additional #includes down into the #ifndef DEV_ATPIC section.


163286 13-Oct-2006 jb

Attempt to fix the GENERIC kernel build which has been failing on
tinderbox for a couple of days.


163267 12-Oct-2006 jhb

Fix nodevice atpic compile.

Pointy hat to: jhb


163219 10-Oct-2006 jhb

Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources. This allows interrupt controllers
with no interrupt pins (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
on resume. This gets suspend/resume working with APIC on UP systems.
SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by: anholt (i386)
Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after: 1 week


162958 02-Oct-2006 phk

Second part of a little cleanup in the calendar/timezone/RTC handling.

Split subr_clock.c in two parts (by repo-copy):
subr_clock.c contains generic RTC and calendaric stuff. etc.
subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c. They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & space-efficient BCD routines.


162713 27-Sep-2006 sobomax

Extend comment explaining why code is conditional at !defined(SCHED_ULE).

Suggested by: ru


162708 27-Sep-2006 sobomax

Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in ULE case. Should be
probably fixed in ULE code instead, but we have no real maintainer for
ULE to do it.

PR: 103697


162378 17-Sep-2006 davidxu

Make cpu_set_upcall_kse() and cpu_set_user_tls() work for 32bit process.


162233 11-Sep-2006 jhb

Add a new ddb command 'show lapic' to dump details about the local APIC
registers for the current CPU.

MFC after: 3 days


162232 11-Sep-2006 jhb

Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30. I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.

Patience from: jmg
Pointy hat: jhb


162224 11-Sep-2006 jhb

- Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
This is needed to cope with BIOSen that split up ports for system devices
(like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
x86 nexus drivers to populate the IRQ rman that manually coalesced the
regions.

MFC after: 1 week
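
The overlap check and the collapsing of adjacent free regions can be sketched
on a small sorted array. This is a user-space model, not the rman
implementation: `struct region_list` and `region_add()` are hypothetical, and
regions are treated as inclusive [start, end] ranges.

```c
#include <assert.h>
#include <stddef.h>

#define MAXR_SKETCH 16	/* hypothetical capacity of the model */

struct region {
	unsigned long start, end;	/* inclusive bounds */
};

struct region_list {
	struct region r[MAXR_SKETCH];	/* kept sorted by start */
	size_t n;
};

/*
 * Add [start, end] to the list. Returns -1 on overlap, bad bounds, or a
 * full list; otherwise merges with adjacent neighbours and returns 0.
 */
static int
region_add(struct region_list *l, unsigned long start, unsigned long end)
{
	size_t pos, i;
	int prev, next;

	assert(l != NULL);
	if (start > end)
		return (-1);
	/* Skip regions that lie entirely below the new one. */
	for (pos = 0; pos < l->n && l->r[pos].end < start; pos++)
		;
	if (pos < l->n && l->r[pos].start <= end)
		return (-1);	/* overlaps an existing region */
	prev = (pos > 0 && l->r[pos - 1].end + 1 == start);
	next = (pos < l->n && end + 1 == l->r[pos].start);
	if (prev && next) {
		/* The new region bridges two neighbours: collapse all three. */
		l->r[pos - 1].end = l->r[pos].end;
		for (i = pos; i + 1 < l->n; i++)
			l->r[i] = l->r[i + 1];
		l->n--;
	} else if (prev) {
		l->r[pos - 1].end = end;
	} else if (next) {
		l->r[pos].start = start;
	} else {
		if (l->n == MAXR_SKETCH)
			return (-1);
		for (i = l->n; i > pos; i--)
			l->r[i] = l->r[i - 1];
		l->r[pos].start = start;
		l->r[pos].end = end;
		l->n++;
	}
	return (0);
}
```

With this behaviour, a device whose ports are split across several adjacent
entries ends up as a single region in the model.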


162112 07-Sep-2006 jhb

Use a single constant to define the sizes of the physmap[], phys_avail[],
and dump_avail[] arrays so they are in sync (previously it was possible
to store more entries in the physmap[] than we could store in phys_avail[],
which was pointless). While I'm here, bump up the length of these tables
to hold 30 entries on amd64 and 16 on i386. This allows machines with
fairly fragmented memory maps to boot ok (at least one machine would
not boot FreeBSD/i386 but would boot FreeBSD/amd64 because amd64 allowed
for more fragments).

MFC after: 3 days


162087 06-Sep-2006 sobomax

Unbreak in the case when device apic is compiled into non-SMP kernel.

Reported by: jhay
MFC after: 2 weeks


162042 05-Sep-2006 sobomax

By default, FreeBSD "disables" hyper-threading cores by not scheduling
any threads to them. However, it still counts those cores as "active but
permanently idle" when calculating system-wide CPUs statistics. It is
incorrect, since it skews statistics quite a bit and creates real problems
for certain types of applications (monitoring applications for example),
by making them believe that the system does have enough idle CPU resources,
while in fact it does not.

Correct the problem by not calling performance counting routines on "disabled"
cores. The cleaner solution would be to just disable APIC timer interrupts on
those cores completely, but ENOTIME here and it is not clear if the
additional complexity is really worth the minor performance gain.

Reviewed by: ssouhlal
Sponsored by: Sippy Software, Inc.
MFC after: 2 weeks


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer. Thanks to Marcel Moolenaar,
who wrote the IA64 version of casuword32.


161366 16-Aug-2006 davidxu

Change xorq back to xorl.

Noticed by: bde


161342 15-Aug-2006 davidxu

Back out revision 1.117: xorl and xorq have the same result, but xorq needs
longer decoding.


161308 15-Aug-2006 davidxu

Because fuword on AMD64 returns 64bit long integer -1 on fault, clear
entire %rax to zero instead of only clearing %eax, otherwise it will
leave garbage data in upper 32 bits.


161292 14-Aug-2006 alc

Eliminate an unnecessary initialization from trap_pfault() that also
happens to contain a style error.


161285 14-Aug-2006 jhb

Don't try to preserve PAT bits in pmap_enter(). We currently only use them on pages that
aren't mapped via pmap_enter() (KVA). We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by: alc


161272 14-Aug-2006 alc

It's not entirely obvious that PGEX_I must be zero if no-execute is neither
supported nor enabled. Just to be sure, verify that no-execute is enabled
before passing VM_PROT_EXECUTE to vm_fault().

Suggested by: tegge@


161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


161066 08-Aug-2006 alc

Pass VM_PROT_EXECUTE to vm_fault() instead of VM_PROT_READ if the page
fault was caused by an instruction fetch.


161016 06-Aug-2006 alc

Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160869 01-Aug-2006 obrien

Correct spelling of 3DNow!.


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160763 27-Jul-2006 jhb

Don't allow MAXMEM or hw.physmem to extend the top of memory if our memory
map was obtained from the SMAP. SMAP is trustworthy, and the memory
extending feature is a band-aid for older systems where FreeBSD's methods
of detecting memory were not always trustworthy. This fixes the issue
where using hw.physmem could result in the ACPI tables getting trashed
breaking ACPI.

MFC after: 3 days
Tested on: i386


160617 24-Jul-2006 davidxu

Remove a duplicated line.


160525 20-Jul-2006 alc

Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system have subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@
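
The difference between the two semantics can be sketched with plain bit
masks. This is an illustrative model, not pmap code: the `SK_PROT_*`
constants and both helpers are hypothetical names.

```c
#include <assert.h>

/* Hypothetical access-right bits for this model. */
#define SK_PROT_READ	0x1
#define SK_PROT_WRITE	0x2
#define SK_PROT_EXECUTE	0x4
#define SK_PROT_ALL	(SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXECUTE)

/*
 * Faithful pmap_page_protect()-style semantics: rights named in prot are
 * kept if present, every other right is taken away.
 */
static int
page_protect_sketch(int cur, int prot)
{
	assert((prot & ~SK_PROT_ALL) == 0);
	return (cur & prot);
}

/* pmap_clear_write()-style semantics: take away write access only. */
static int
clear_write_sketch(int cur)
{
	return (cur & ~SK_PROT_WRITE);
}
```

Passing only the read bit to the first helper also drops execute access,
which is the subtle effect the entry describes; the second helper leaves
execute access untouched.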


160419 17-Jul-2006 alc

Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked. At
present, the page queues lock provides sufficient synchronization. In the
future, the page queues lock may not always be held when free_pv_entry() is
called.


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


160286 12-Jul-2006 jkim

Add two new CPUID bits for AMD CPUs, i.e., SVM and extended APIC register.


160125 06-Jul-2006 alc

Make two simplifications to pmap_ts_referenced(): Eliminate an unnecessary
test and exit the loop in a shorter way.


160106 05-Jul-2006 alc

pmap_clear_ptes() is already convoluted. This will worsen with the
implementation of superpages. Eliminate it and add pmap_clear_write().

There are no functional changes. Checked by: md5


160073 02-Jul-2006 alc

Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.


160069 01-Jul-2006 alc

Tidy up pmap_ts_referenced(): Eliminate excessive white space. Eliminate
an initialized but otherwise unused variable. Explicitly check a pointer
against NULL.

There are no functional changes. Checked by: md5


160059 01-Jul-2006 alc

Eliminate the remaining uses of "register".

Convert the remaining K&R-style function declarations to ANSI-style.


159970 27-Jun-2006 alc

Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE. Suppose the mapping
is also wired. vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE. I
don't think this is catastrophic. It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.
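
A minimal sketch of the idea, assuming the wired attribute is a single
software-defined bit in the PTE. The SK_PG_* values and sk_copy_pte() are
assumptions made for illustration, not the actual pmap_copy() code.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PG_V 0x001ULL  /* valid bit */
#define SK_PG_W 0x200ULL  /* assumed software "wired" bit */

/*
 * Derive the child's PTE from the parent's: the mapping is copied,
 * but the wired attribute is not inherited.
 */
static uint64_t sk_copy_pte(uint64_t src_pte)
{
    return src_pte & ~SK_PG_W;
}
```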

Reviewed by: tegge@


159934 25-Jun-2006 alc

Eliminate a comment that became stale after revision 1.540.

Wrap a nearby line.


159803 20-Jun-2006 alc

Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE. In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page. This prevents a race between
pmap_enter_object() and the page daemon. Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped. Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry(). Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings. So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object(). Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@


159790 20-Jun-2006 yar

We no longer need to disable interrupts in MD trap machinery
when we're about to call kdb_trap() because the latter MI
function can disable interrupts by itself now.

Pointed out by: bde
X-MFC remark: depends on kern/subr_kdb.c#1.18
Sponsored by: RiNet (Cronyx Plus LLC)


159783 19-Jun-2006 davidxu

Add variable cpu_mxcsr_mask to save valid bits of mxcsr register.


159782 19-Jun-2006 davidxu

MFi386:
Use the method described in IA-32 Intel Architecture Software
Developer's Manual chapter 11.6.6 to get valid mxcsr bits,
use the mxcsr mask to clear invalid bits passed by user code.
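
A rough sketch of that method: the MXCSR_MASK field saved by FXSAVE gives
the valid bits, and a value of zero means the default mask 0x0000FFBF
applies. The sk_* names are hypothetical; reading the field from a real
FXSAVE image is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Default mask to use when the FXSAVE image reports a mask of zero. */
#define SK_MXCSR_DEFAULT_MASK 0x0000FFBFu

/* Pick the mask of valid MXCSR bits from the value FXSAVE reported. */
static uint32_t sk_valid_mxcsr_mask(uint32_t fxsave_mask)
{
    return fxsave_mask != 0 ? fxsave_mask : SK_MXCSR_DEFAULT_MASK;
}

/* Clear any bits in a user-supplied MXCSR value that are not valid. */
static uint32_t sk_sanitize_mxcsr(uint32_t user_mxcsr, uint32_t mask)
{
    return user_mxcsr & mask;
}
```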


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159546 12-Jun-2006 alc

Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.
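
The optimization can be sketched as below, with a counter standing in for
the real TLB invalidation. All sk_* names are hypothetical and the loop is
a simplification, not the actual pmap_qenter() body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PG_V 0x001ULL  /* valid bit */

static int sk_invalidations;  /* counts simulated TLB range invalidations */

/*
 * Install new PTEs, remembering whether any old PTE was valid.
 * Invalidate only if a valid mapping was actually replaced.
 */
static void sk_qenter(uint64_t *ptes, const uint64_t *new_ptes, size_t count)
{
    uint64_t old_bits = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        old_bits |= ptes[i];
        ptes[i] = new_ptes[i];
    }
    if (old_bits & SK_PG_V)
        sk_invalidations++;
}
```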

Reviewed by: tegge@


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


159130 01-Jun-2006 silby

After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter. In the theory that other drivers will fall into
this same category, we agreed that panicking or making the allocation
fail will cause more hardship than is necessary. The printf should
be sufficient motivation to get the driver glitch fixed.
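
A sketch of the warn-but-continue policy, assuming power-of-two
alignments. sk_is_misaligned() and sk_check_alloc() are hypothetical
helpers, and the message text is made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr is not a multiple of alignment (a power of two). */
static int sk_is_misaligned(uintptr_t addr, uintptr_t alignment)
{
    return (addr & (alignment - 1)) != 0;
}

/*
 * Warn about a misaligned allocation, but still hand the memory back
 * instead of panicking or failing the request.
 */
static uintptr_t sk_check_alloc(uintptr_t addr, uintptr_t alignment)
{
    if (sk_is_misaligned(addr, alignment))
        fprintf(stderr, "sketch: failed to align memory properly\n");
    return addr;
}
```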


159092 31-May-2006 mjacob

Turn the panic on not being able to meet alignment constraints
in bus_dmamem_alloc into the more reasonable EINVAL return.

Also, reclaim memory allocated but then not used if we had
an error return.


159012 28-May-2006 silby

MFi386 rev 1.78:

Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.

Add a panic to detect cases where we've still failed to properly align.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158445 11-May-2006 phk

Clean out sysctl machdep.* related defines.

The cmos clock related stuff should really be in MI code.


158264 03-May-2006 scottl

Allow bus_dmamap_load() to pass ENOMEM back to the caller. This puts it into
conformance with the mbuf and uio load routines. ENOMEM can only happen
when BUS_DMA_NOWAIT is passed in, thus the deferrals are disabled. I don't
like doing this, but fixing this fixes assumptions in other important drivers,
which is a net benefit for now.


158238 01-May-2006 jhb

Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after: 1 month


158236 01-May-2006 jhb

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


158133 29-Apr-2006 alc

Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().


158088 27-Apr-2006 alc

In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().
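
The ambiguity of bit 0x080 can be illustrated with a small sketch in which
the caller must say which level an entry came from. The sk_* names are
hypothetical and this is not the pmap_enter() code.

```c
#include <assert.h>
#include <stdint.h>

#define SK_BIT7 0x080ULL  /* PS in a PDE, PAT in a PTE */

enum sk_level { SK_LEVEL_PTE, SK_LEVEL_PDE };

/* Bit 0x080 means "large page" only when the entry is a PDE. */
static int sk_is_superpage(uint64_t entry, enum sk_level level)
{
    return level == SK_LEVEL_PDE && (entry & SK_BIT7) != 0;
}

/* The same bit selects a PAT index only when the entry is a PTE. */
static int sk_pte_pat_set(uint64_t entry, enum sk_level level)
{
    return level == SK_LEVEL_PTE && (entry & SK_BIT7) != 0;
}
```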

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb


158059 26-Apr-2006 peter

Move vm.pmap.pv_entry_count out from the PV_STATS ifdefs. It is always
available and is a real counter, not a statistic.


158007 25-Apr-2006 jkim

Check if reported HTT cores are physical cores. This commit does not
affect AMD CPUs at all because HTT bit is disabled earlier. Intel
multicore CPUs and ULE scheduler may be affected.


158004 24-Apr-2006 jkim

Add another Intel CPU feature flag, xTPR (Send Task Priority Messages).


158003 24-Apr-2006 jkim

Check if deterministic cache parameters leaf is valid before use.


158000 24-Apr-2006 cperciva

Adjust dangerous-shared-cache-detection logic from "all shared data
caches are dangerous" to "a shared L1 data cache is dangerous". This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.

As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.

Discussed with: jhb, scottl
Security: See FreeBSD-SA-05:09.htt for background material.


157912 21-Apr-2006 peter

Oops. Minidumps were developed on 6.x, without the small pv entry code.
Add some strategic dump_add_page()/dump_drop_page() lines to include pv
chunks in the minidumps - these operate in the direct map region like UMA.


157908 21-Apr-2006 peter

Introduce minidumps. Full physical memory crash dumps are still available
via the debug.minidump sysctl and tunable.

Traditional dumps store all physical memory. This was once a good thing
when machines had a maximum of 64M of ram and 1GB of kvm. These days,
machines often have many gigabytes of ram and a smaller amount of kvm.
libkvm+kgdb don't have a way to access physical ram that is not mapped
into kvm at the time of the crash dump, so the extra ram being dumped
is mostly wasted.

Minidumps invert the process. Instead of dumping physical memory in
order to guarantee that all of kvm's backing is dumped, minidumps
instead dump only memory that is actively mapped into kvm.

amd64 has a direct map region that things like UMA use. Obviously we
cannot dump all of the direct map region because that is effectively
an old style all-physical-memory dump. Instead, introduce a bitmap
and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that
allow certain critical direct map pages to be included in the dump.
uma_machdep.c's allocator is the intended consumer.
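
A minimal sketch of such a bitmap, indexed by physical page number. The
sk_* names, the fixed SK_MAX_PAGES bound, and the lack of atomic updates
are simplifying assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12    /* 4K pages */
#define SK_MAX_PAGES  1024  /* arbitrary bound for the sketch */

static uint64_t sk_dump_bitmap[SK_MAX_PAGES / 64];

/* Mark the page containing physical address pa for inclusion in the dump. */
static void sk_dump_add_page(uint64_t pa)
{
    uint64_t idx = pa >> SK_PAGE_SHIFT;

    if (idx < SK_MAX_PAGES)
        sk_dump_bitmap[idx / 64] |= 1ULL << (idx % 64);
}

/* Remove the page containing pa from the set of pages to dump. */
static void sk_dump_drop_page(uint64_t pa)
{
    uint64_t idx = pa >> SK_PAGE_SHIFT;

    if (idx < SK_MAX_PAGES)
        sk_dump_bitmap[idx / 64] &= ~(1ULL << (idx % 64));
}

/* Report whether the page containing pa is currently marked. */
static int sk_dump_has_page(uint64_t pa)
{
    uint64_t idx = pa >> SK_PAGE_SHIFT;

    return idx < SK_MAX_PAGES &&
        (sk_dump_bitmap[idx / 64] & (1ULL << (idx % 64))) != 0;
}
```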

Dumps are a custom format. At the very beginning of the file is a header,
then a copy of the message buffer, then the bitmap of pages present in
the dump, then the final level of the kvm page table trees (2MB mappings
are expanded into 4K page mappings), then the sparse physical pages
according to the bitmap. libkvm can now conveniently access the kvm
page table entries.

Booting my test 8GB machine, forcing it into ddb and forcing a dump
leads to a 48MB minidump. While this is a best case, I expect minidumps
to be in the 100MB-500MB range. Obviously, never larger than physical
memory of course.

Minidumps are on by default. It would only be necessary to turn them off
if it was necessary to debug corrupt kernel page table management as that
would mess up minidumps as well.

Both minidumps and regular dumps are supported on the same machine.


157893 20-Apr-2006 imp

Set the rid for a resource allocated with rman_reserve_resource.


157860 19-Apr-2006 cperciva

Correct a local information leakage bug affecting AMD FPUs.

Security: FreeBSD-SA-06:14.fpu


157850 18-Apr-2006 peter

If we're doing a try-alloc of a pv entry and give up early, do not forget
to reduce the pv_entry_count counter. This was found by Tor Egge. In the
same email, Tor also pointed out the pv_stats problem in the previous
commit, but I'd forgotten about it until I went looking for this email
about this allocation problem.


157849 18-Apr-2006 peter

pv_entry_count is more than a statistic. It is used for resource limiting.
Do not compile out its counter updates if pv entry stats are turned off.


157701 13-Apr-2006 alc

Include opt_pmap.h for PMAP_SHPGPERPROC.

PR: 94509


157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


157541 05-Apr-2006 jhb

Cache the value of the lower half of each I/O APIC redirection table entry
so that we only have to do an ioapic_write() instead of an ioapic_read()
followed by an ioapic_write() every time we mask and unmask level triggered
interrupts. This cuts the execution time for these operations roughly in
half.
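
The saving can be sketched with a simulated pin that counts register
accesses. The structure and sk_* functions are hypothetical; only the
position of the mask bit (bit 16 of the low half) is taken from the
I/O APIC redirection entry layout.

```c
#include <assert.h>
#include <stdint.h>

#define SK_IOART_INTMASK 0x00010000u  /* mask bit (bit 16) of the low half */

struct sk_ioapic_pin {
    uint32_t hw_low;      /* simulated hardware register */
    uint32_t cached_low;  /* software copy of the low half */
    int reads;            /* simulated ioapic_read() calls */
    int writes;           /* simulated ioapic_write() calls */
};

/* Old scheme: read the register, modify it, write it back. */
static void sk_mask_uncached(struct sk_ioapic_pin *pin)
{
    uint32_t low = pin->hw_low;

    pin->reads++;
    pin->hw_low = low | SK_IOART_INTMASK;
    pin->writes++;
}

/* New scheme: update the cached copy and issue a single write. */
static void sk_mask_cached(struct sk_ioapic_pin *pin)
{
    pin->cached_low |= SK_IOART_INTMASK;
    pin->hw_low = pin->cached_low;
    pin->writes++;
}

/* Unmasking with the cache is likewise a single write. */
static void sk_unmask_cached(struct sk_ioapic_pin *pin)
{
    pin->cached_low &= ~SK_IOART_INTMASK;
    pin->hw_low = pin->cached_low;
    pin->writes++;
}
```

One register access instead of two per operation is consistent with the
roughly halved execution time reported above.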

Profiled by: Paolo Pisati <p.pisati@oltrelinux.com>
MFC after: 1 week


157505 04-Apr-2006 peter

Convert pv_entry_frees and pv_entry_allocs stats counters from int to long,
they wrap way too quickly.


157458 04-Apr-2006 marcel

Sync with i386: Map exceptions to signals in gdb_cpu_signal() so
that kgdb(1) gets a SIGTRAP when it needs to.

Pointed out by: grehan@


157446 03-Apr-2006 peter

Shrink the amd64 pv entry from 48 bytes to about 24 bytes. On a machine
with large mmap files mapped into many processes, this saves hundreds of
megabytes of ram.
pv entries were individually allocated and had two tailq entries and two
pointers (or addresses). Each pv entry was linked to a vm_page_t and
a process's address space (pmap). It had the virtual address and a
pointer to the pmap.
This change replaces the individual allocation with a per-process
allocation system. A page ("pv chunk") is allocated and this provides
168 pv entries for that process. We can now eliminate one of the 16 byte
tailq entries because we can simply iterate through the pv chunks to find
all the pv entries for a process. We can eliminate one of the 8 byte
pointers because the location of the pv entry implies the containing
pv chunk, which has the pointer. After overheads from the pv chunk
bitmap and tailq linkage, this works out that each pv entry has an
effective size of 24.38 bytes.
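
The arithmetic can be checked with a sketch of a possible chunk layout.
The structures below are illustrative only: the field names are
hypothetical, and the two spare words are an assumption made so that the
chunk fills exactly one 4096-byte page.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096
#define SK_NPCPV     168                     /* pv entries per chunk */
#define SK_NPCM      ((SK_NPCPV + 63) / 64)  /* 64-bit words in the bitmap */

/* One pv entry: a virtual address plus one remaining 16-byte list linkage. */
struct sk_pv_entry {
    uint64_t va;
    uint64_t list_link[2];
};

/* One page-sized chunk holding the pmap pointer shared by all entries. */
struct sk_pv_chunk {
    uint64_t pmap;                         /* stands in for the pmap pointer */
    uint64_t chunk_link[2];                /* tailq linkage of the chunk */
    uint64_t free_map[SK_NPCM];            /* bitmap of free entries */
    uint64_t spare[2];                     /* assumed padding */
    struct sk_pv_entry entries[SK_NPCPV];
};

_Static_assert(sizeof(struct sk_pv_entry) == 24, "pv entry is 24 bytes");
_Static_assert(sizeof(struct sk_pv_chunk) == SK_PAGE_SIZE,
    "chunk fills one page");

/* Page size divided by entries per page: the effective per-entry cost. */
static double sk_effective_pv_size(void)
{
    return (double)SK_PAGE_SIZE / SK_NPCPV;
}
```

Here 168 entries of 24 bytes take 4032 bytes, and 4096 / 168 is about
24.38 bytes per entry once the per-chunk overhead is spread out.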

Future work still required, and other problems:
* when running low on pv entries or system ram, we may need to defrag
the chunk pages and free any spares. The stats (vm.pmap.*) show that
this doesn't seem to be that much of a problem, but it can be done if
needed.
* running low on pv entries is now a much bigger problem. The old
get_pv_entry() routine just needed to reclaim one other pv entry.
Now, since they are per-process, we can only use pv entries that are
assigned to our current process, or by stealing an entire page worth
from another process. Under normal circumstances, the pmap_collect()
code should be able to dislodge some pv entries from the current
process. But if needed, it can still reclaim entire pv chunk pages
from other processes.
* This should port to i386 really easily, except there it would reduce
pv entries from 24 bytes to about 12 bytes.

(I have integrated Alan's recent changes.)


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


157394 02-Apr-2006 alc

Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.
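
The "attempt and handle the failure" pattern might look like the sketch
below. The counters and sk_* functions are hypothetical stand-ins for the
pv entry bookkeeping, and the high water mark is set to 2 only to keep
the example small.

```c
#include <assert.h>

static int sk_pv_entry_count;           /* entries currently allocated */
static int sk_pv_entry_high_water = 2;  /* tiny limit for the example */

/* Returns 1 on success, 0 if the high water mark would be exceeded. */
static int sk_try_insert_pv_entry(void)
{
    if (sk_pv_entry_count + 1 > sk_pv_entry_high_water)
        return 0;
    sk_pv_entry_count++;
    return 1;
}

/*
 * Copy up to n mappings. The limit is tested at each allocation in the
 * inner loop, and the copy simply stops at the first failure.
 */
static int sk_copy_mappings(int n)
{
    int copied = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (!sk_try_insert_pv_entry())
            break;
        copied++;
    }
    return copied;
}
```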

Reviewed by: tegge
Reported by: kris
MFC after: 5 days


156963 21-Mar-2006 alc

Eliminate unnecessary invalidations of the entire TLB by pmap_remove().
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.

Reviewed by: tegge


156930 21-Mar-2006 davidxu

Remove stale KSE code.

Reviewed by: alc


156920 20-Mar-2006 jhb

Drop some unneeded casts since we program the kernel in C rather than C++.


156847 18-Mar-2006 ups

Enable global pages TLB extension on Application Processors.

MFC after: 3 days


156706 14-Mar-2006 jhb

Don't allow userland to set hardware watch points on kernel memory at all.
Previously, we tried to allow this only for root. However, we were calling
suser() on the *target* process rather than the current process. This
means that if you can ptrace() a process running as root you can set a
hardware watch point in the kernel. In practice I think you probably have
to be root in order to pass the p_candebug() checks in ptrace() to attach
to a process running as root anyway. Rather than fix the suser(), I just
axed the entire idea, as I can't think of any good reason _at all_ for
userland to set hardware watch points for KVM.

MFC after: 3 days
Also thinks hardware watch points on KVM from userland are bad: bde, rwatson


156695 13-Mar-2006 peter

MFi386: add a TRAP_INTERRUPT case


156694 13-Mar-2006 peter

Cosmetic sync with i386


156674 13-Mar-2006 ps

Fix the format/display descriptor of vm.kmem_size and vm.kmem_free
to be 'long' instead of 'int' so that sysctl(8) correctly displays
the 8 returned bytes as a single 'long' instead of two 'int' values.

Submitted by: peter


156504 09-Mar-2006 jhb

Flip the switch and don't route interrupts to hyperthreads in a HT system.
In at least one benchmark this showed around a 20% performance increase.
If other workloads do benefit from having hyperthreads service interrupts,
we can always make this a loader tunable.

MFC after: 3 days
Tested by: ps


156124 28-Feb-2006 jhb

Rework how we wire up interrupt sources to CPUs:
- Throw out all of the logical APIC ID stuff. The Intel docs are somewhat
ambiguous, but it seems that the "flat" cluster model we are currently
using is only supported on Pentium and P6 family CPUs. The other
"hierarchy" cluster model that is supported on all Intel CPUs with
local APICs is severely underdocumented. For example, it's not clear
if the OS needs to glean the topology of the APIC hierarchy from
somewhere (neither ACPI nor MP Table include it) and setup the logical
clusters based on the physical hierarchy or not. Not only that, but on
certain Intel chipsets, even though there were 4 CPUs in a logical
cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
the local APIC IDs. This code has also moved out of the ioapic PIC
driver and into the common interrupt source code so that it can be
shared with MSI interrupt sources since MSI is addressed to APICs the
same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are available to
handle interrupts in a simpler and more intuitive manner. For one thing,
it means we could now choose to not route interrupts to HT cores if we
wanted to (this code is currently in place in fact, but under an #if 0
for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
interrupt handler is added, just as before, with the change that IRQs are now
bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
be easier to manage this mapping from higher levels. For example, we
could allow drivers to specify a CPU affinity map for their interrupts,
or we could allow a userland tool to bind IRQs to specific CPUs.
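
A sketch of static round-robin assignment over the set of CPUs that the
SMP code has marked as available for interrupts. The sk_* names and the
32-CPU bitmask are assumptions for illustration, not the actual
interrupt source code.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sk_intr_cpus;       /* bitmask of CPUs allowed to take IRQs */
static int sk_current_cpu = -1;     /* last CPU handed out, -1 if none yet */

/* Called by the SMP code for each CPU that may handle interrupts (0..31). */
static void sk_intr_add_cpu(int cpu)
{
    sk_intr_cpus |= 1u << cpu;
}

/* Return the next eligible CPU in round-robin order, or -1 if none. */
static int sk_next_cpu(void)
{
    if (sk_intr_cpus == 0)
        return -1;
    do {
        sk_current_cpu = (sk_current_cpu + 1) % 32;
    } while ((sk_intr_cpus & (1u << sk_current_cpu)) == 0);
    return sk_current_cpu;
}
```

Leaving a CPU out of the mask, for example a hyperthread sibling, is
enough to keep interrupts away from it.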

The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).

MFC after: 1 week


155720 15-Feb-2006 dwmalone

It seems bit 5 of cpu_feature2 is the VMX (Virtual Machine Extensions)
bit. While I'm here, delete a comment that was cut and pasted from the
cpu_features code that doesn't belong here.


155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.


155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestamps
at every context switch come out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


155313 04-Feb-2006 wsalamon

Call the audit syscall enter/exit functions for the amd64 architecture,
both 32-bit and 64-bit paths. System calls will now be audited.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


155239 03-Feb-2006 davidxu

MFi386:
Clear carry flag in get_mcontext so that setcontext does not
return a bogus error.


155234 03-Feb-2006 peter

Make PV entries dynamic on amd64. i386 has a pre-reserved block of kva
dedicated to storing pv entries, originally so that kva didn't have to be
allocated at inconvenient times. For amd64, we can get the same effect by
using the direct map area. Allocating pages is the same as with the object
backed method, but now we can just lookup the page in the direct map area.
Thus, no more pageable kva is reserved. This is the single largest
consumer of kva on our work machines and this change should help conserve
the fixed size 2GB pageable kva on the amd64 kernel.

There are a pair of sysctl nodes introduced, named the same as their
tunable counterparts. vm.pmap.shpgperproc and vm.pmap.pv_entry_max
They work just like the tunables of the same path, except the values are
linked. The pv entry cap is now dynamically changeable.

I didn't make them totally unlimited because we need some sort of safety
limit still. One could consume all physical memory without a cap.


154935 27-Jan-2006 jhb

Call WITNESS_CHECK() in the page fault handler and immediately assume it
is a fatal fault if we are holding any non-sleepable locks. This should
cut down on the number of bogus LORs we currently get when the kernel
panics due to a NULL (or bogus) pointer dereference that goes wandering
off into the VM system which tries to acquire locks and then kicks off
the spurious LORs. This should probably be ported to all the archs at
some point.

Tested on: i386


154367 14-Jan-2006 scottl

Free the newtag if we exit with a failure from alloc_bounce_zone().

Found by: Coverity Prevent(tm)


154079 06-Jan-2006 jhb

- Make pcib_devclass private to sys/dev/pci/pci_pci.c and change all the
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.


154074 06-Jan-2006 jhb

Fix various places that were testing td_critnest to see if interrupts
should remain disabled during a trap or not to check
td_md.md_spinlock_count instead.


153995 03-Jan-2006 jkim

- Explicitly validate an empty filter to match bpf_filter() comment[1].
- Do not use BPF JIT compiler for an empty filter.

[1] Pointed out by: darrenr


153947 01-Jan-2006 netchild

Unbreak kernel build.

A happy new year to all.

Submitted by: Goran Gajic <ggajic@afrodita.rcub.bg.ac.yu>, bz
Pointy hat to: netchild
Apologies to: all


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contain
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153766 27-Dec-2005 pjd

Fix watch address truncation. The address was truncated when it was passed to
amd64_set_watch() as 'unsigned int' and 'unsigned int' is 32bit long on amd64.
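
The truncation can be reproduced with a small sketch. The two functions
are hypothetical and only echo their argument; the example assumes that
'unsigned int' is 32 bits wide, as it is on amd64.

```c
#include <assert.h>
#include <stdint.h>

_Static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Buggy shape: the parameter type cannot hold a 64-bit address. */
static uint64_t sk_set_watch_truncating(unsigned int watchaddr)
{
    return watchaddr;
}

/* Fixed shape: the parameter is wide enough for any amd64 address. */
static uint64_t sk_set_watch_fixed(uint64_t watchaddr)
{
    return watchaddr;
}
```

Passing a kernel-range address such as 0xffffffff80123456 through the
narrow parameter loses the upper 32 bits.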

Even with that fix hardware watchpoints don't work for me on amd64, i.e., when
I set the watchpoint and write a byte there, nothing happens.


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153694 23-Dec-2005 jeff

- Improve the INKERNEL macro such that it can no longer give false positives.
This fixes the stack(9) functionality.

Submitted by: Antoine Brodin <antoine.brodin@laposte.net>


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that effect and remove a condition
to slightly optimize the non-lapic case.
- Change prototype of arm_handler_execute() so that its first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153504 18-Dec-2005 marcel

Make our ELF64 type definitions match standards. In particular this
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.
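
The widths listed above can be written down as a sketch using fixed-width
types. The SkElf64_* names are prefixed to make clear that these are
illustrative typedefs, not the contents of the system ELF headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t SkElf64_Half;    /* 16-bit */
typedef uint32_t SkElf64_Word;    /* 32-bit */
typedef uint64_t SkElf64_Xword;   /* 64-bit unsigned */
typedef int64_t  SkElf64_Sxword;  /* 64-bit signed */

_Static_assert(sizeof(SkElf64_Half) == 2, "Half is 16 bits");
_Static_assert(sizeof(SkElf64_Word) == 4, "Word is 32 bits");
_Static_assert(sizeof(SkElf64_Xword) == 8, "Xword is 64 bits");
_Static_assert(sizeof(SkElf64_Sxword) == 8, "Sxword is 64 bits");
```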

MFC after: 2 weeks


153469 16-Dec-2005 scottl

Don peril sensitive sunglasses and jack up the MAX_BPAGES limit to 8192
on amd64. If you're going to stuff >4GB into your box, reserving 32MB for
bounce pages amounts to a rounding error in the overall scheme of things.


153426 14-Dec-2005 jhb

Fix stale comment.


153383 13-Dec-2005 jhb

Revert previous commit. The BIOS braindamage is even worse than I
originally thought. The BIOS that cleared CPUID_APIC actually managed
to disable the local APIC entirely and even Windows 64 doesn't boot on
it.

Reported by: bz


153377 13-Dec-2005 jhb

Don't check the CPUID_APIC bit in the cpu_features flags field to determine
if the boot CPU has a local APIC because some BIOS vendors are not
competent enough to set this bit. Instead, just assume that we always have
a local APIC on amd64. For i386 the check is a bit more subtle. FreeBSD
requires either an MP Table or an ACPI MADT table to enumerate APICs. The
only systems that have one of those tables that don't have local APICs are
some presumably rare (and old) SMP 486 systems using external APICs. Thus,
instead of checking the CPUID_APIC flag, check the CPU class and abort if
we are running on a 486.

MFC after: 1 week
Reported by: bz


153268 09-Dec-2005 davidxu

Sync with i386, fix compiling for non-SMP.


153241 08-Dec-2005 jhb

MFi386:
- Move PUSH_FRAME and POP_FRAME to asmacros.h and use PUSH_FRAME in
atpic entry points.
- Move PCPU_* asm macros out of the middle of the asm profiling macros.
- Pass IRQ vector argument as an int rather than void * to reduce diffs
with i386.
- EOI the lapic in C for the lapic timer handler.
- GC unused Xcpuast function.
- Split IPI_STOP handling code of ipi_nmi_handler() out into a
cpustop_handler() function and call it from Xcpustop rather than
duplicating all the logic in assembly.
- Fixup the list of symbols with interrupt frames in ddb traces.
Xatpic_fastintr* have never existed on amd64, and the lapic timer
handler and various IPI handlers were missing.
- Use trapframe instead of intrframe for interrupt entry points (on amd64
the interrupt vector was already a separate argument, so the two frames
were already identical) and GC intrframe.

Submitted by: peter (3)


153177 06-Dec-2005 jkim

Fix ZERO_EDX() macro from the previous commit. It was emitting
`xor %ecx, %ecx', not `xor %edx, %edx'.


153157 06-Dec-2005 jkim

s/M_WAITOK/M_NOWAIT/ while mutex is held.

Pointed out by: csjp


153156 06-Dec-2005 jkim

- Micro-optimize `mov $0, %edx' -> `xor %edx, %edx'.
- Correct amd64 macro style (no functional change).


153151 06-Dec-2005 jkim

Add experimental BPF Just-In-Time compiler for amd64 and i386.

Use the following kernel configuration option to enable:

options BPF_JITTER

If you want to use bpf_filter() instead (e.g., debugging), do:

sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).

Obtained from: WinPcap 3.1 (for i386)


152906 28-Nov-2005 jhb

If we get a stray interrupt, return after logging it. In the extremely
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.

Submitted by: Mark Tinguely tinguely at casselton dot net


152775 24-Nov-2005 le

Fix typo.


152753 24-Nov-2005 ru

Add missing "struct" in i386/i386/machdep.c,v 1.497 by deischen@.


152651 21-Nov-2005 jhb

Expand the hack to mask the atpics if 'device atpic' is not in the kernel
during boot up. Now we do a full reset of the 8259As and setup a simple
interrupt handler (we actually borrow the apic one that just does an
immediate iret) to handle any spurious interrupts triggered by either chip.
This should fix some folks that were getting a Trap 30 during bootup of
certain SMP AMD systems. This might get pushed into the 6.0 branch as an
errata. For now a suitable workaround is to add 'device atpic' to your
kernel config.

Tested by: scottl
Helpful info from: dillon
MFC after: 1 week


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152588 18-Nov-2005 jhb

- Always print the trap number so that we have something to start with for
mystery traps. If we don't have a message for a given trap, just use
UNKNOWN for the message.
- Add trap messages for T_XMMFLT and T_RESERVED.

MFC after: 1 week


152537 17-Nov-2005 obrien

Fix spelling mistake.

Submitted by: kris


152531 16-Nov-2005 jhb

Revert a part of the previous commits to these files that made the NMI
IPI_STOP handling code use atomic_readandclear() to execute the restart
function on the first CPU to resume and restore the behavior of always
executing the restart function on the BSP since this is in fact what the
non-NMI IPI_STOP handler does. I did add back in a statement to clear
the restart function pointer after it is executed to match the behavior
of the non-NMI IPI_STOP handler.


152529 16-Nov-2005 jhb

Revert previous commit to these files. There isn't a race necessitating
an xchg instruction as we only try to execute the startup function if
the CPU ID is 0 (i.e. the BSP). I missed this earlier.


152528 16-Nov-2005 jhb

Fix a typo in the check for an invalid APIC. If we are told about an
I/O APIC that doesn't exist, then a read of the version register is going
to return -1 which is 0xffffffff not 0xffffff.

Tested on: i386
Tested by: Nikos Ntarmos ntarmos at ceid dot upatras dot gr
MFC after: 1 week
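
The typo is easiest to see in isolation. The following is a minimal sketch,
not the kernel's code: the helper names are hypothetical, and it assumes only
what the message states, namely that a read from a nonexistent I/O APIC
returns all ones in a 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's function): treat an all-ones
 * version register as "no I/O APIC here".
 */
static bool
ioapic_version_is_valid(uint32_t version_reg)
{
    /* -1 as a 32-bit value is 0xffffffff (eight f's). */
    return (version_reg != 0xffffffffu);
}

/* The mistyped comparison: all ones never equals 0xffffff (six f's). */
static bool
ioapic_version_is_valid_buggy(uint32_t version_reg)
{
    return (version_reg != 0xffffffu);
}
```

With the six-digit constant, a missing I/O APIC is accepted as valid because
the comparison can never match the value that is actually read back.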


152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
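
The message policy described above (at most once per minute, but never
silenced for good) can be sketched as follows. This is an illustration only:
the names should_warn() and last_warning are hypothetical, and the caller
passes the current time in seconds rather than reading a kernel clock.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical state: time of the last message, 0 if never printed. */
static time_t last_warning;

/*
 * Return true if the caller should print the warning now.  The message
 * is allowed at most once per 60 seconds and is never permanently
 * suppressed, unlike a "stop after the fifth time" counter.
 */
static bool
should_warn(time_t now)
{
    if (last_warning != 0 && now - last_warning < 60)
        return (false);
    last_warning = now;
    return (true);
}
```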


152076 04-Nov-2005 peter

Define M_IOAPIC the same as i386


152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages were exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


151979 02-Nov-2005 jhb

Change the x86 code to allocate IDT vectors on-demand when an interrupt
source is first enabled similar to how intr_event's now allocate ithreads
on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC
intpins. On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors. Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.

Tested on: amd64, i386
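
The idea of handing out vectors on demand, rather than mapping them 1:1 to
IRQs, can be sketched with a small table. This is not the kernel
implementation: vector_alloc() and vector_free() are hypothetical names
chosen for the example (the real API named above is apic_alloc_vector() and
apic_free_vector(), whose signatures are not shown in this log), and only
the count of 191 vectors comes from the message.

```c
#define NUM_IO_VECTORS  191     /* count taken from the commit message */
#define VECTOR_FREE     (-1)

/* vector_to_irq[v] holds the IRQ using vector v, or VECTOR_FREE. */
static int vector_to_irq[NUM_IO_VECTORS];
static int table_ready;

static void
vector_table_init(void)
{
    for (int v = 0; v < NUM_IO_VECTORS; v++)
        vector_to_irq[v] = VECTOR_FREE;
    table_ready = 1;
}

/* Claim the first free vector for irq; return its index or -1 if full. */
static int
vector_alloc(int irq)
{
    if (!table_ready)
        vector_table_init();
    for (int v = 0; v < NUM_IO_VECTORS; v++) {
        if (vector_to_irq[v] == VECTOR_FREE) {
            vector_to_irq[v] = irq;
            return (v);
        }
    }
    return (-1);
}

/* Release a vector so that another interrupt source can reuse it. */
static void
vector_free(int v)
{
    if (v >= 0 && v < NUM_IO_VECTORS)
        vector_to_irq[v] = VECTOR_FREE;
}
```

Because the table is indexed by vector and not by IRQ, an IRQ number above
190 can still be served as long as fewer than 191 sources are enabled at
once.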


151910 31-Oct-2005 alc

Instead of panic()ing in pmap_insert_entry() if get_pv_entry()
fails, reclaim a pv entry by destroying a mapping to an inactive
page.

Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151889 30-Oct-2005 alc

Replace diagnostic printf()s by assertions. Use consistent style for
similar assertions.


151724 26-Oct-2005 peter

MFi386: Various apic fixes and tweaks
* Don't recursively panic if we've already panicked and the local apic is
now stuck.
* Add hw.apic.* tunables/sysctls for extint controls
* Change "lapic%d timer" to "cpu%d timer" intname to match i386


151719 26-Oct-2005 peter

Change PHYSMAP_SIZE to allow for more memory segments. The old value was
too low for certain Dell amd64 machines.


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)
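
The sharing rule for INTR_FAST handlers described in this commit (run every
fast handler first, then schedule the interrupt thread if any threaded
handler exists) can be sketched as below. The structures and names are
hypothetical and much simpler than struct intr_event and struct
intr_handler; scheduling the thread is reduced to a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified handler record. */
struct handler {
    void (*func)(void *);
    void *arg;
    bool fast;              /* true for an INTR_FAST-style handler */
    struct handler *next;
};

/* Hypothetical, simplified event: a handler list plus a wakeup count. */
struct event {
    struct handler *handlers;
    int thread_scheduled;   /* counts simulated thread wakeups */
};

/* Example handler body: increment the int that arg points to. */
static void
bump(void *arg)
{
    (*(int *)arg)++;
}

/* Run fast handlers at once; defer the rest to the event's thread. */
static void
dispatch(struct event *ev)
{
    bool need_thread = false;

    for (struct handler *h = ev->handlers; h != NULL; h = h->next) {
        if (h->fast)
            h->func(h->arg);
        else
            need_thread = true;
    }
    if (need_thread)
        ev->thread_scheduled++;
}
```

An event whose list holds only fast handlers never touches the thread
counter, which matches the statement that such events no longer need an
associated thread.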


151634 24-Oct-2005 jhb

Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.


151633 24-Oct-2005 jhb

When restarting the BSP during cpu_reset() use a membar to ensure that
the updated cpustop_restartfunc is seen when the BSP resumes execution.
This matches the membar already present in restart_cpus().


151632 24-Oct-2005 jhb

Use xchg in Xcpustop to close a race and make cpustop_restartfunc truly
one-shot in the SMP case (before using the simple mov / cmp / mov sequence
could allow multiple CPUs to execute the restart function on resume).
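
The effect of an atomic exchange on a one-shot function pointer can be
sketched in portable C11. This is only an illustration of the technique:
the commit itself changes assembly in Xcpustop, and the names restart_func,
run_restart_once() and count_restart() are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*restart_fn)(void);

/* Hypothetical stand-in for the restart function pointer. */
static _Atomic(restart_fn) restart_func;

/* Example restart function that records how often it ran. */
static int restart_calls;

static void
count_restart(void)
{
    restart_calls++;
}

/*
 * Atomically take the pointer and leave NULL behind.  Only one caller
 * can observe the non-NULL value, so the function runs at most once.
 * A separate load, compare and store would leave a window in which two
 * CPUs both see the pointer before either clears it.
 */
static int
run_restart_once(void)
{
    restart_fn fn = atomic_exchange(&restart_func, (restart_fn)NULL);

    if (fn == NULL)
        return (0);
    fn();
    return (1);
}
```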


151631 24-Oct-2005 jhb

- Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
places.


151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


151431 17-Oct-2005 jkim

Redo physical/logical CPU count.

Suggested by: jhb


151429 17-Oct-2005 davidxu

Micro-optimization for context switch: eliminate the code for saving gs.base
and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase when the
user sets them, so the context switch routine only needs to write them into
the registers; it never has to read them back from the registers when a
thread is switched away. Since rdmsr is a serializing instruction, a
micro-benchmark shows that this is worth doing.

Reviewed by: peter, jhb


151418 17-Oct-2005 jkim

Split displaying number of physical and logical cores.


151375 16-Oct-2005 obrien

For AMD processors, nullify CPUID.HTT. FreeBSD has no need for the
information it conveys, and it is only confusing people.
This fixes incorrect output in the previous commit.


151348 14-Oct-2005 jkim

- Print number of physical/logical cores and more CPUID info.
- Add newer CPUID definitions for future use.

Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.

Approved by: anholt (mentor)


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *. Most
changes in MD code are trivial. Before this change, trapsignal and
sendsig used discrete parameters; now they use member fields of the
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, the pending signal set is kept as before;
an extra siginfo list holds additional siginfo_t data for signals.
Kernel code that uses psignal() still behaves as before and will not fail
even under memory pressure. The only exception is when deleting a signal:
we should call sigqueue_delete to remove the signal from the sigqueue
rather than use SIGDELSET. Currently no kernel code delivers a signal
with additional data, so the kernel should be as stable as before.
A ksiginfo can carry more information, for example, to allow a signal to
be delivered but throw away the siginfo data if there is not enough memory.
SIGKILL and SIGSTOP have a fast path in sigqueue_add, because they cannot
be caught or masked.
The sigqueue() syscall allows user code to queue a signal to a target
process; if resources are unavailable, EAGAIN is returned as the
specification says.
Just before a thread exits, signal queue memory is freed by
sigqueue_flush.
Currently, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


150789 01-Oct-2005 glebius

Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, instead of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. They are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enables/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If it is not set, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns. This is important to protect from spurious
interrupts.

Reviewed by: ru, sam, jhb


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer needs to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150647 27-Sep-2005 peter

Kill pcb_rflags. It served no purpose.

Reported by: bde


150640 27-Sep-2005 peter

Fix a minor nit that has been bugging me for a while. Fix the obvious
cases of using a 64 bit operation to zero a register. 32 bit opcodes are
smaller and supposedly faster, and clear the upper 32 bits for free.


150639 27-Sep-2005 peter

Add a bare minimum (but wrong) R_X86_64_JMP_SLOT relocation type for
kernel modules. We actually need to include any addends and the symbol
offset value, but gcc/binutils didn't set it anywhere I've found on
'cc -fpic -shared' kernel modules.


150638 27-Sep-2005 peter

Don't report Maxmem as 'real memory'. It is really the highest address
available and can give the wrong impression when there are memory holes.
Report the total amount of usable memory that we detected instead of the
highest address.


150637 27-Sep-2005 peter

MFi386: If we take a trap with interrupts disabled while in a critical
section, don't enable them if we're servicing an NMI.


150635 27-Sep-2005 peter

Don't let the upper bits of %dr6/%dr7 get set.

Submitted by: Nate Eldredge <neldredge@math.ucsd.edu>


150546 25-Sep-2005 phk

__RMAN_RESOURCE_VISIBLE is not actually needed.


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having
non-MPSAFE tty code.

MFC after: 1 week


150266 18-Sep-2005 imp

MFi386: pci attribute allocation fixes.


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149786 04-Sep-2005 alc

Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.

Reviewed by: tegge


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine
whether the mapping should permit execute access.


149526 27-Aug-2005 jkoshy

- Special-case NMI handling on the AMD64.

On entry or exit from the kernel the 'alltraps' and 'doreti' code
used by normal traps disables interrupts to protect the
critical sections where it is setting up %gs.

This protection is insufficient in the presence of NMIs since NMIs
can be taken even when the processor has disabled normal interrupts.
Thus the NMI handler needs to actually read MSR_GSBASE on entry to
the kernel to determine whether a swap of %gs using 'swapgs' is
needed. However, reads of MSRs are expensive and integrating this
check into the 'alltraps'/'doreti' path would penalize normal
interrupts.

- Teach DDB about the 'nmi_calltrap' symbol.

Reviewed by: bde, peter (older versions of this change)


149484 26-Aug-2005 alc

Remedy the following three problems:

1. The amd64 pmap, unlike the i386 pmap, maintains a reference count
for each page directory (PD) page. However, in the transformation
of the i386 pmap into the amd64 pmap, operations, such as
pmap_copy() and pmap_object_init_pt(), that create 2MB "superpage"
mappings by setting the PG_PS bit in a PD entry were not modified
to adjust the underlying PD page's reference count. Consequently,
superpage mappings could disappear prematurely.

2. pmap_object_init_pt() could crash or corrupt memory if either the
virtual address range being mapped crosses a 1GB boundary in the
virtual address space or nothing is mapped in the 1GB area.

3. When pmap_allocpte() destroys a 2MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should. (This
bug is inherited from i386.)

Discussed with: peter
Reviewed by: tegge


149475 25-Aug-2005 ups

NMI handler should not enable interrupts.

Tested by: kris@
MFC after: 3 weeks


149377 22-Aug-2005 alc

Pass the PDE from pmap_remove() to pmap_remove_page() so that the latter
procedure doesn't have to recompute it.


149364 22-Aug-2005 alc

Change pmap_extract() and pmap_extract_and_hold() to use PG_FRAME rather
than ~PDRMASK to extract the physical address of a superpage from a PDE.
The use of ~PDRMASK is problematic if the PDE has PG_NX set. Specifically,
the PG_NX bit will be included in the physical address if ~PDRMASK is used.

Reviewed by: peter
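
The difference between the two masks can be shown with plain integer
arithmetic. The constants below are assumptions made for the sketch (bit 63
for PG_NX, a 2MB superpage, and a frame mask covering bits 12 through 51);
the helper names are hypothetical and this is not the pmap code itself.

```c
#include <stdint.h>

/* Values assumed for illustration; check them against the real headers. */
#define PG_NX       (1ULL << 63)            /* no-execute bit */
#define PG_PS       (1ULL << 7)             /* superpage bit */
#define PG_FRAME    0x000ffffffffff000ULL   /* physical address bits */
#define PDRMASK     ((1ULL << 21) - 1)      /* offset within a 2MB page */

/* Problematic form: clears only the low 21 bits, so PG_NX survives. */
static uint64_t
superpage_pa_buggy(uint64_t pde, uint64_t va)
{
    return ((pde & ~PDRMASK) | (va & PDRMASK));
}

/* Fixed form: PG_FRAME drops PG_NX along with the low flag bits. */
static uint64_t
superpage_pa(uint64_t pde, uint64_t va)
{
    return ((pde & PG_FRAME) | (va & PDRMASK));
}
```

For a PDE that maps physical address 0x40000000 with PG_NX set, the first
form returns an address with bit 63 set, while the second returns the
expected physical address plus the offset taken from the virtual address.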


149341 20-Aug-2005 alc

Introduce pmap_pml4e_to_pdpe() and pmap_pdpe_to_pde() and use them to avoid
recomputation of the pml4e and pdpe in pmap_copy(), pmap_protect(), and
pmap_remove().


149300 19-Aug-2005 pjd

Avoid code duplication and implement bitcount32() function in systm.h only.

Reviewed by: cperciva
MFC after: 3 days


149272 19-Aug-2005 alc

Correct a performance bug in revision 1.462. The effect of the bug is to
execute the outer loop in procedures such as pmap_protect() many more times
than necessary.

Reviewed by: tegge


149058 14-Aug-2005 alc

Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.

Eliminate a stale comment from pmap_enter().

Reviewed by: tegge


148976 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)


148971 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Reviewed by: tegge


148952 11-Aug-2005 alc

Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.

Reviewed by: tegge


148835 07-Aug-2005 alc

When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.

The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.


148667 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


148664 03-Aug-2005 jeff

- Improve the definition of INKERNEL() to include the DMAP area and the
proper start of the kernel area.

Discussed with: peter


148262 21-Jul-2005 peter

Actually create the double fault stack page for AP cpus so that we have a
chance of getting a working double fault instead of converting it to an
instant triple fault reset.


148231 21-Jul-2005 phk

Make the facility for recognizing BIOS-signatures more general
and return a printable representation.

This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).

Also, add watchdog support for PCM-582x platforms.

Submitted by: Adrian Steinmann <ast@marabu.ch>
Slightly changed by: phk
PR: 81360


147889 10-Jul-2005 davidxu

Validate that the value written into {FS,GS}.base is a canonical
address. Writing a non-canonical address can cause a kernel panic.
Restricting base values to 0..VM_MAXUSER_ADDRESS ensures that
only canonical values get written to the registers.

Reviewed by: peter, Joseph Koshy <joseph.koshy at gmail dot com>
Approved by: re (scottl)
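
A short sketch of why a user-address limit implies canonical form. It
assumes 48-bit virtual addresses and a hypothetical limit USER_ADDR_LIMIT
equal to 2^47, used here as an exclusive bound; the kernel's actual constant
and comparison are not shown in this log, and the function names are
invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: first address above the lower canonical half. */
#define USER_ADDR_LIMIT 0x0000800000000000ULL

/*
 * With 48-bit virtual addresses, an address is canonical when bits
 * 63..47 are all zeros or all ones.
 */
static bool
is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 uppermost bits */

    return (top == 0 || top == 0x1ffff);
}

/* The stricter check: accept only addresses below the user limit. */
static bool
base_is_acceptable(uint64_t base)
{
    return (base < USER_ADDR_LIMIT);
}
```

Every value that passes base_is_acceptable() has its upper 17 bits clear and
is therefore canonical, so the simple range check is enough to keep
non-canonical values out of the base registers.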


147740 02-Jul-2005 marcel

Fix a buglet that was present in the ia64 code and that got inherited
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such it would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by calling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid core file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.

Approved by: re (scottl)
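
The buffered-write pattern and the missing reset can be sketched as follows.
Only the variable name fragsz and the block-at-a-time behavior come from the
message; BLOCK_SIZE stands in for DEV_BSIZE, the device is a counter, and
buf_write(), buf_flush() and the zero padding of the last block are
assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  512     /* stand-in for DEV_BSIZE */

static char buffer[BLOCK_SIZE];
static size_t fragsz;           /* bytes currently collected in buffer */
static size_t blocks_written;   /* counts calls to the fake device */

/* Fake dump device: only counts the blocks handed to it. */
static void
write_block(const char *block)
{
    (void)block;
    blocks_written++;
}

/* Collect data and write it out one full block at a time. */
static void
buf_write(const char *data, size_t len)
{
    while (len > 0) {
        size_t n = BLOCK_SIZE - fragsz;

        if (n > len)
            n = len;
        memcpy(buffer + fragsz, data, n);
        fragsz += n;
        data += n;
        len -= n;
        if (fragsz == BLOCK_SIZE) {
            write_block(buffer);
            fragsz = 0;
        }
    }
}

/* Flush a partially filled block, then reset the fill count. */
static void
buf_flush(void)
{
    if (fragsz == 0)
        return;
    memset(buffer + fragsz, 0, BLOCK_SIZE - fragsz);   /* pad the tail */
    write_block(buffer);
    fragsz = 0;     /* the reset that the fix adds */
}
```

Without the final assignment in buf_flush(), a second dump would start with
a stale fragsz and append its data behind leftovers from the first one.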


147687 30-Jun-2005 peter

Sync i386->amd64.
* Add ichwd (The Intel EM64T folks have an ICH)
* Cosmetic comment syncs
* Merge cpufreq change over to NOTES
* add pbio (it compiles, but isn't useful since no boxes have ISA slots)
* copy ath settings (note: wlan disabled here since its in global NOTES)
* copy profiling, including fixing a previous i386->amd64 merge typo.

Approved by: re (blanket i386 <-> amd64 sync/convergence)


147677 30-Jun-2005 peter

Add a special-case handler for general protection faults. It appears to
be possible to get the swapgs state reversed if doreti traps during
the iretq. Attempt to handle this. load_gs() might need special
handling too. Running the kernel with the user's TLS and the
kernel's PCPU space interchanged would be bad(TM).

Discovered as a result of a conversation with: bde
Approved by: re


147674 29-Jun-2005 peter

Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h

Approved by: re


147671 29-Jun-2005 peter

Switch AMD64 and i386 platforms to using ELF as their kernel crash
dump format. The key reason to do this is so that we can dump sparse
address space. For example, we need to be able to skip the PCI hole
just below the 4GB boundary. Trying to destructively dump MMIO device
registers is Really Bad(TM). The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).

This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.

Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped. We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump. eg: the vm page array
and the dmesg buffer. dump_avail[] is pretty much an unmolested version
of phys_avail[]. It does have Maxmem correction.

Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this. amd64 was actually using the i386 dump version number.

libkvm support to follow.

Approved by: re


147604 25-Jun-2005 ups

Disable the interrupts in trap_fatal before calling kdb_trap.
(required now that critical sections no longer block interrupts)

Reviewed by: jhb@
Approved by: re (scottl)
Tested by: kris@,glebius@


147569 24-Jun-2005 peter

Various trivial comment fixes

Approved by: re


147568 24-Jun-2005 peter

Eliminate a source of 'trap xx with interrupts disabled'. I was jumping to
the wrong backend code and neglecting to re-enable interrupts after the
stack prep.

Approved by: re


147566 24-Jun-2005 peter

MFi386: 1.258: Minor cleanups

Approved by: re (blanket i386<->amd64 sync)


147565 24-Jun-2005 peter

Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by: re


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


147181 09-Jun-2005 ups

Add IPI support for preempting a thread on another CPU.

MFC after: 3 weeks


146799 30-May-2005 jkoshy

Kernel hooks to support PMC sampling modes.

Reviewed by: alc


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146767 29-May-2005 schweikh

Chop a '>' in a feature name (RSVD2>) that snuck in;
this now balances the <> flags displayed at boot, e.g. without this
Features2=0x41d<SSE3,RSVD2>,MON,DS_CPL,CNTX-ID>

MFC after: 1 week


146551 23-May-2005 obrien

Sync the style of these two files.


146507 22-May-2005 peter

Fix some of the problems Bruce observed with this code.


146492 22-May-2005 peter

MFi386: set PMC vector


146461 21-May-2005 peter

For non-profiling kernels, there were two symbols assigned to the same
address. One was alltraps_with_regs_pushed, the other was calltrap.

When the stack tracer walks up, it looks for magic symbol names to
determine how to parse non-standard stack frames, such as a trapframe.
It was looking for "calltrap". Which of the two symbols you got depended
on things like Phase of moon, etc. If you were unlucky, you got a
garbage stack trace for things like 'debug.trace_on_panic', which would
completely hide the actual source of the problem.


146457 20-May-2005 obrien

Adjust the start_ap delay to match i386.


146456 20-May-2005 obrien

Fix mismerge in rev 1.226: wait 5 seconds as the comment documents,
not .5 seconds.


146211 14-May-2005 nyan

- Move timerreg.h to <arch>/include and split i8253 specific defines into
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98


146172 13-May-2005 nectar

Default hyperthreading on in -CURRENT. No seatbelts in CURRENT (^_^)

Requested by: peter, jhb


146170 13-May-2005 nectar

Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed".
Default off due to information disclosure on multi-user systems.

Submitted by: cperciva
Reviewed by: jhb


145911 05-May-2005 peter

Remove unused (besides being initialized) variable.


145889 04-May-2005 davidxu

Turn on PCB_FULLCTX in set_regs to fully restore context
set by debugger.


145727 30-Apr-2005 dwhite

Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by: ups


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to a more generic style, so we can reuse it
in other code. Add cpu_set_user_tls and use it to tweak the user register
and set up user TLS. I had wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads, which may
not need this feature, I wrote a separate cpu_set_user_tls.


145343 20-Apr-2005 ps

Don't enter the debugger if KDB_UNATTENDED is set or if
debug.debugger_on_panic=0.

MFC after: 2 weeks


145122 15-Apr-2005 peter

MFi386: use the lapic timer for UP systems that are using the apic so that
IRQ0 and mixed mode isn't a problem anymore. This removes mixed mode
support because nothing is left that uses it.


145077 14-Apr-2005 peter

Implement 32-bit compatible fsbase/gsbase methods so that we can run
(newer) unmodified static i386 binaries again.


145055 14-Apr-2005 jhb

Always use the local APIC timer, even on UP machines.


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144868 10-Apr-2005 alc

Eliminate a conditional branch and as a side-effect eliminate a branch to
a return instruction. (The latter is discouraged by the Opteron
optimization manual because it disables branch prediction for the return
instruction.)

Reviewed by: bde


144696 06-Apr-2005 cperciva

Fully initialize the required TSS fields so that the io permission
bitmap is set correctly.

Patch from: peter
Security: FreeBSD-SA-05:03.amd64


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any effect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
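
The split described above can be illustrated with a minimal C sketch. It
assumes a per-thread nesting counter plus a deferred-preemption flag; the
names thread_sketch, critical_enter_sketch(), critical_exit_sketch() and
preempt_request_sketch() are hypothetical stand-ins, not the kernel's actual
definitions, and interrupt blocking (now the job of spinlock_enter()) is not
modelled at all.

```c
#include <stdbool.h>

/* Hypothetical per-thread state; illustrative only, not struct thread. */
struct thread_sketch {
    int  critnest;    /* critical section nesting depth */
    bool owepreempt;  /* a preemption was requested while nested */
    int  switches;    /* counts simulated context switches */
};

/* Entering a critical section is just an unlocked increment. */
void critical_enter_sketch(struct thread_sketch *td)
{
    td->critnest++;
}

/* A preemption request is deferred while inside a critical section. */
void preempt_request_sketch(struct thread_sketch *td)
{
    if (td->critnest > 0)
        td->owepreempt = true;
    else
        td->switches++;
}

/* Leaving the outermost section performs any deferred preemption. */
void critical_exit_sketch(struct thread_sketch *td)
{
    td->critnest--;
    if (td->critnest == 0 && td->owepreempt) {
        td->owepreempt = false;
        td->switches++;
    }
}
```

In this model a request made inside a nested section costs nothing until
the outermost exit, which is why bare critical sections can be cheap.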


144354 30-Mar-2005 peter

Checkpoint today's tidy-up of the WIP disassembler. It now agrees with
objdump --disassemble when disassembling itself in userland. I've added
the cmovCC instruction group and tweaked a bunch of size sensitive array
indexes to either fix my mistakes and/or force it to work by any means
necessary.

I'm committing this because it is usable enough to see what is going on
when single stepping via ddb.

It might still tell lies, but its lies will be far more subtle now. I'm
not sure that this is a good thing or not.


144353 30-Mar-2005 peter

Commit my checkpoint of db_disasm.c that I hacked to understand some amd64
instructions as it was when I dropped it back on May 31, 2003. I'm
committing this as an intermediate stage because back then I thought I
understood what I was doing with this file.


143801 18-Mar-2005 phk

s/SLIST/STAILQ/
/imp/a\
pointy hat
.


143450 12-Mar-2005 scottl

MFi386: Prevent integer underflow that could result in all memory being
consumed.


143434 11-Mar-2005 peter

Remove diffs to i386 version that came in via the compiler support ifdefs.
This changes things like whitespace, inconsistent use of #ifndef vs
#if !defined(), different macro argument orders, mismatched comments, etc.


143433 11-Mar-2005 peter

MFi386: reduce apic clock interrupt rate


143294 08-Mar-2005 mux

Fixup KTR traces.


143284 08-Mar-2005 mux

Use __func__ in the KTR_BUSDMA traces. This avoids copy and paste
errors like in the bus_dmamap_load_mbuf_sg() case where we were wrongly
displaying the function name as bus_dmamap_load_mbuf.
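
A minimal C sketch of why __func__ avoids this class of error. The TRACE
macro and trace_buf below are hypothetical stand-ins that format into a
buffer; they are not the kernel's KTR macros. The function name comes from
the compiler rather than from a hand-typed string, so it cannot go stale
when code is copied.

```c
#include <stdio.h>

/* Hypothetical trace sink and macro; illustrative only. */
static char trace_buf[128];

#define TRACE(fmt, ...) \
    snprintf(trace_buf, sizeof(trace_buf), "%s: " fmt, __func__, __VA_ARGS__)

/* The prefix is always the enclosing function's real name. */
int load_mbuf_sg_sketch(int nsegs)
{
    TRACE("nsegs %d", nsegs);
    return 0;
}
```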


143202 07-Mar-2005 scottl

Remove dead code.


143162 05-Mar-2005 des

MFi386: use TUNABLE_ULONG_FETCH to retrieve hw.physmem.


143159 05-Mar-2005 des

Replace goto with continue.


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago


142866 01-Mar-2005 obrien

Catch up with the "physical memory" sysctl change.
(MFi386: rev 1.608)


142839 28-Feb-2005 peter

MFi386: Bring over John's local apic timer code


142765 28-Feb-2005 pjd

Typo.


142722 27-Feb-2005 obrien

MFi386: rev 1.3:
- Add debug.watchdog tunable, so we can specify watchdog CPU from loader
which will help to debug hangs on boot.
- Remove 'U' from debug.watchdog sysctl definition, so if we set it to '-1'
it really shows '-1'.
- Fix comment.


142261 22-Feb-2005 jhb

MFi386: r1.17: Treat pin 0 as IRQ 0 rather than ExtINT if mixed mode is not
enabled by the enumerator.


141944 15-Feb-2005 njl

MFi386 rev 1.61: Fix a few bugs in the legacy cpu attachment ivars.


141380 06-Feb-2005 njl

Staticize the legacy cpu devclasses and revert the name for the acpi_cpu
devclass. As pointed out by dfr@, devclasses don't have to share the same
linkage if multiple drivers have the same name. Newbus should match the
devclasses based on name and allocate non-conflicting unit numbers.


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141374 05-Feb-2005 njl

Make cpu_est_clockrate() more accurate by disabling interrupts for the
millisecond it is calibrating. Suggested by jhb@ and bde@. Don't clobber
the tsc_freq with the new value since it isn't accurate enough for
timecounters and the timecounter system as a whole needs support for
changing rates before we do this. Subtract 0.5% from our measurement
to account for overhead in DELAY. Note that this interface is for
estimating the clockrate and needs to work well at runtime so doing a full
calibration including disabling interrupts for a second is not feasible.
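
The arithmetic described above can be sketched in C as follows. This is an
illustration only, not the committed code: est_clockrate_sketch() and its
parameters are hypothetical, and the counter samples are passed in rather
than read from the TSC around a DELAY call.

```c
#include <stdint.h>

/*
 * Estimate a clock rate in Hz from two cycle-counter samples taken
 * delay_us microseconds apart, then subtract 0.5% to account for
 * overhead in the delay itself.
 */
uint64_t est_clockrate_sketch(uint64_t tsc_start, uint64_t tsc_end,
    uint64_t delay_us)
{
    uint64_t hz = (tsc_end - tsc_start) * 1000000 / delay_us;

    return hz - hz / 200;   /* hz / 200 is 0.5% of hz */
}
```

For example, 2,000,000 cycles counted over a 1000 microsecond delay gives
a raw 2.0 GHz, reported as 1.99 GHz after the adjustment.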


141367 05-Feb-2005 alc

Implement proper handling of PG_G mappings in pmap_protect(). (I don't
believe that this omission mattered before the introduction of MemGuard.)

Reviewed by: tegge@
MFC after: 1 week


141244 04-Feb-2005 njl

MFi386: Merge updates to the cpu pseudo-driver. Compile, not runtime
tested.


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


140555 21-Jan-2005 peter

JumboMFi386: use bitmapped IPI handler. Update elcr and default mptable
config handler. Tidy up various local apic initialization.


140554 21-Jan-2005 peter

MFi386: handle PSL_T properly across fork. Typo fix.


140553 21-Jan-2005 peter

MFi386: whitespace, copyright header, etc updates


140552 21-Jan-2005 peter

MFi386: use %rip - 1 for the symbol search address (for noreturn funcs)


140257 14-Jan-2005 jhb

Remove redundant code to drop per-thread debug register state from
cpu_exit() as this is already performed in cpu_thread_exit() and the
debug state is per-thread rather than per-process.


140032 11-Jan-2005 imp

There are no PC98 amd64 machines, so gc a few stray ifdefs.


139840 07-Jan-2005 scottl

Introduce bus_dmamap_load_mbuf_sg(). Instead of taking a callback arg, this
cuts to the chase and fills in a provided s/g list. This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred. This is just for amd64 and
i386 at the moment, other arches will be coming shortly.


139731 05-Jan-2005 imp

Begin all license/copyright comments with /*-


139345 27-Dec-2004 njl

MFi386: Restore cpu_reset proxy code to enable reset from ddb on an AP.


139344 27-Dec-2004 njl

Reduce diffs to i386.


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


139143 21-Dec-2004 alc

Use vtopde() instead of pmap_pde() in pmap_kextract(); vtopde() is smaller
and faster in cases, such as pmap_kextract(), where the pde is known to
exist.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138500 06-Dec-2004 peter

MFi386: rev 1.12: re-allow fast interrupts to cause preemption


138375 04-Dec-2004 alc

Replace (inlined) pmap_pte() calls with smaller, faster code where
possible, such as the inner loop of pmap_copy().

Remove two comments that apply to i386 but not amd64.


138304 02-Dec-2004 alc

For efficiency eliminate the call to pmap_pte() from pmap_protect()'s and
pmap_remove()'s inner loop. Instead, call pmap_pde_to_pte(), a new
function, prior to the inner loop.

Reviewed by: peter@, tegge@


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix of bswapping the register value in gdb_cpu_setreg(), as
added on i386 and amd64, can therefore be removed, and in fact has
been.

Tested on: alpha, amd64, i386, ia64, sparc64
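
A minimal C sketch of the idea, under stated assumptions: the names
decode_reg_bytes() and setreg_sketch() are hypothetical and do not reflect
the real gdb_cpu_setreg() signature. The packet's hex digits are decoded
pairwise into a byte array, which already is the in-memory representation
in target byte order, and that array is copied into the register slot by
pointer with no byte swapping.

```c
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode 2*len hex digits into len bytes, preserving packet byte order. */
int decode_reg_bytes(const char *hex, unsigned char *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        int hi = hex_digit(hex[2 * i]);
        if (hi < 0)
            return -1;
        int lo = hex_digit(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}

/* Hypothetical setter: takes a pointer to the value, not the value. */
void setreg_sketch(void *slot, const void *val, size_t len)
{
    memcpy(slot, val, len);
}
```

Passing a pointer and a length also covers registers wider than an
integer type, which is the first reason given above.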


138237 30-Nov-2004 peter

Remove unused cnt variable for the SMP case. Trim some excessive blank
lines while here.


138212 30-Nov-2004 peter

Update the gdb register extraction support to use the pcb wherever
possible, like on i386. Registers are handled differently for caller
vs callee saved registers.


138208 29-Nov-2004 peter

MFi386: join the %cr0 setup line now that i386 has lost the I386 ifdefs.


138207 29-Nov-2004 peter

Take advantage of the shutdown processing being wired to the BSP and
eliminate the evil cpu_reset_proxy code now that it will never be
activated. i386 should pick this up as well.


138194 29-Nov-2004 scottl

Don't flag alignment constraints as a reason for bouncing. This fixes the
trigger for other misbehaviour in the sym driver that was causing freezes at
boot. Thanks to phk@ for reporting and testing this.


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137966 21-Nov-2004 scottl

Remove an extra #include


137964 21-Nov-2004 scottl

Consolidate all of the bounce tests into the BUS_DMA_COULD_BOUNCE flag.
Allocate the bounce zone at either tag creation or map creation to help
avoid null-pointer derefs later on. Track total pages per zone so that
each zone can get a minimum allocation at tag creation time instead of
being defeated by mis-behaving tags that suck up the max amount.


137917 20-Nov-2004 das

Remove references to U area and garbage collect includes.

Reviewed by: arch@


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137893 19-Nov-2004 scottl

Revert part of rev 1.56. Tag boundaries are handled by splitting segments,
not through bouncing.


137499 10-Nov-2004 scottl

MFi386 rev 1.63-1.64:
Use tag-specific pools of bounce pages instead of a single global pool.


137262 05-Nov-2004 peter

MFi386 1.238 (jhb): Allow hints to disable cpus


137261 05-Nov-2004 peter

MFi386:
rev 1.61 (scottl): Add KTR tracing
rev 1.62 (scottl): Optimize (td->pmap, inlines, etc)


137165 03-Nov-2004 scottl

Don't use atomic ops to increment interrupt stats. This was only done on
amd64 and i386 anyways. The stats are only kept for informational purposes.


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


137012 28-Oct-2004 simokawa

MFi386: preserve dcons buffer passed by loader.


136521 14-Oct-2004 njl

Print flags in the nexus for child devices.


136252 08-Oct-2004 alc

Make pte_load_store() an atomic operation in all cases, not just i386 PAE.

Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852


136101 03-Oct-2004 alc

Undo revision 1.251. This change was a performance pessimizing work-around
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).

Note: Even with this change, the handling of a copy-on-write fault is
inefficient, in such cases pmap_enter() calls pmap_invalidate_page() twice.

Discussed with: bde@
PR: kern/16568


136070 03-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


136049 02-Oct-2004 alc

Remove an unused declaration. (I should have included this change in
revision 1.486.)


135939 29-Sep-2004 alc

Prevent the unexpected deallocation of a page table page while performing
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)

Reviewed by: tegge@


135914 29-Sep-2004 peter

MFi386: rev 1.239 - invalidate tlb after pte update


135913 29-Sep-2004 peter

MFi386: rev 1.236 - improve panic message for a busted mptable


135691 24-Sep-2004 peter

Like on i386, use the definition of struct bios_smap from machine/pc/bios.h
again.


135690 24-Sep-2004 peter

Converge towards i386. I originally resisted creating <machine/pc/bios.h>
because it was mostly irrelevant - except for the silly BIOS_PADDRTOVADDR
etc macros. Along the way of working around this, I missed a few things.

* Make syscons properly inherit the bios capslock/shiftlock/etc state like
i386 does. Note that we cannot inherit the bios key repeat rate because
that requires a bios call (which is impossible for us).
* Give syscons the ability to beep on amd64. Oops.

While here, make bios.c compile and add it to files.amd64.


135689 24-Sep-2004 peter

Severely strip down the repocopied i386/bios.c and bios.h files. It turns
out that bios_sigsearch() etc is useful for finding tables in roms.


135565 22-Sep-2004 alc

Correct a long-standing error in _pmap_unwire_pte_hold() affecting
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.

See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.

Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.

MT5 Candidate


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64


135479 19-Sep-2004 alc

Simplify the reference counting of page table pages. Specifically, use
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:

1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.

2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.

3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.

Reviewed by: tegge@


135452 19-Sep-2004 alc

Remove an outdated assertion from _pmap_allocpte(). (When vm_page_alloc()
succeeds, the page's queue field is unconditionally set to PQ_NONE by
vm_pageq_remove_nowakeup().)


135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


135123 12-Sep-2004 alc

Use an atomic op to update the pte in pmap_protect(). This is to prevent
the loss of a page modified (PG_M) bit in a race between processors.

Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.

Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.

In collaboration with: tegge@

MT5 candidate
PR: 61852
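
The race can be illustrated with a minimal user-space sketch using C11
atomics. It is an assumption-laden illustration, not the pmap code:
pte_remove_write_sketch() is a hypothetical name, and only the x86 bit
positions for "writable" (bit 1) and "dirty" (bit 6) are taken as given. A
plain read-modify-write could overwrite a dirty bit set by another
processor between the read and the write; a compare-and-exchange loop
retries instead, so the final old value always reflects it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW_SKETCH 0x002u  /* x86 page table entry: writable */
#define PTE_M_SKETCH  0x040u  /* x86 page table entry: modified (dirty) */

/*
 * Atomically clear the writable and modified bits. Returns true if the
 * entry was dirty at the moment the update took effect, so the caller
 * can record the modification instead of losing it.
 */
bool pte_remove_write_sketch(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_load(pte);
    uint64_t new;

    do {
        new = old & ~(uint64_t)(PTE_RW_SKETCH | PTE_M_SKETCH);
    } while (!atomic_compare_exchange_weak(pte, &old, new));

    return (old & PTE_M_SKETCH) != 0;
}
```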


134960 08-Sep-2004 alc

Use atomic ops in pmap_clear_ptes() to prevent SMP races that could
result in the loss of an accessed or modified bit from the pte.

In collaboration with: tegge@

MT5 candidate


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritance that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)
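
The inheritance rule stated above can be written as a short C sketch. The
function name inherit_boundary_sketch() and the use of uint64_t are
assumptions for illustration; the actual tag creation code may differ. A
boundary of 0 is treated as "no constraint", so the nonzero value wins,
and otherwise the smaller (stricter) boundary is chosen.

```c
#include <stdint.h>

/* Pick the child's effective boundary given the parent's. */
uint64_t inherit_boundary_sketch(uint64_t parent, uint64_t child)
{
    if (parent == 0)
        return child;       /* parent imposes no constraint */
    if (child == 0)
        return parent;      /* child inherits the parent's constraint */
    return parent < child ? parent : child;
}
```

Taking the maximum instead would let a child cross a boundary that its
parent forbids, which is the bug being fixed.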


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies the scheduler that no more than N threads from that ksegrp
should be allowed to be concurrently scheduled. This is also
used to enforce 'fairness' at this time so that a ksegrp with
10000 threads can not swamp the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventually develop
its own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
libkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134591 01-Sep-2004 julian

Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after: 5 days


134571 31-Aug-2004 julian

Remove an unneeded argument.
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that
curthread could be expensive to derive on some systems, so leave it as
an argument. Having both proc and thread as arguments just gives an
opportunity for them to get out of sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134553 30-Aug-2004 peter

Add the mp_watchdog hooks, although it locks up my SMP test box. It might
be useable to somebody.


134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


134416 28-Aug-2004 obrien

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


134263 24-Aug-2004 njl

Catch up with i386 nexus.c rev 1.59: add bus_get_resource_list().


134233 24-Aug-2004 peter

It is now an error to call pmap_unuse_pt without the paddr of the pde
that contained the pte.


134232 24-Aug-2004 peter

Oops, I forgot to have the idle loop call mp_grab_cpu_hlt() on the amd64
SMP case.


134227 23-Aug-2004 peter

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
acquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to acquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


133907 16-Aug-2004 peter

Sync with i386 - Optimize intr_execute_handlers a bit etc.


133906 16-Aug-2004 peter

Sync with i386 - remove unused includes


133903 16-Aug-2004 peter

Sync with i386 - add foot shooting protection for the DDB/KDB thing.


133902 16-Aug-2004 peter

Sync with i386 - set rbp reg to 0 for upcalls as a frame marker, not that
it is guaranteed to be used in userland though.


133901 16-Aug-2004 peter

Sync with i386 - trace syscall entry/exit times, and a cosmetic fix.


133899 16-Aug-2004 peter

Sync with i386 - fix bounds check in lapic_create()


133898 16-Aug-2004 peter

Sync with i386 - pass resource requests up to parent


133897 16-Aug-2004 peter

Sync with i386 - s/cpu_swtch/cpu_switch/


133896 16-Aug-2004 peter

Sync with i386 - don't count needed bounce pages if loading a buffer that
was created with bus_dmamem_alloc()


133854 16-Aug-2004 obrien

Complete 'IA32' -> 'COMPAT_IA32' change for the Linuxulator32.


133770 15-Aug-2004 rwatson

Preemptive anti-footshooting: cause a #error if MP_WATCHDOG is compiled
with SCHED_ULE.


133759 15-Aug-2004 rwatson

Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond. A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog. If the callout fails to reset the timer for ten
seconds, the watchdog will fire. The sysctl allows you to select which
CPU will run the watchdog.

A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog. On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.

This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.

On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.


133672 13-Aug-2004 ambrisko

Fix the memory scaling bug when basemem was converted to Kbytes from
bytes for AMD64. Otherwise the AP will be started at 640K which
won't work. Bug found on a Xeon 64bit system.


133529 11-Aug-2004 davidxu

Mark end of frames.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133431 10-Aug-2004 davidxu

As AMD64 architecture volume 1 chapter 3.1.2 says, the high 32 bits of %rflags
are reserved: they can be written with anything, but they always read
as zero. We should simulate this in set_regs(), as though we were
reading/writing the real hardware %rflags register.
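
The masking described above can be sketched as follows. This is only an illustration of the "upper half reads as zero" rule, not the committed set_regs() change; the helper name sanitize_rflags is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's set_regs()): model the hardware
 * behavior in which the upper 32 bits of %rflags always read as zero.
 */
uint64_t
sanitize_rflags(uint64_t requested)
{
	/* Keep only the low 32 bits; the reserved upper half is dropped. */
	return (requested & 0xffffffffULL);
}
```

A caller that stores a user-supplied flags value could pass it through such a helper first, so that a later read returns what real hardware would.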


133413 09-Aug-2004 davidxu

In syscall, always make a copy of the parameters from the trapframe. This
is because some syscalls that use set_mcontext can sneakily change the
parameters, and when those syscalls later reference their parameters,
they will wrongly use register values from the mcontext_t.

Approved by: peter


133292 08-Aug-2004 alc

With the advent of pmap locking it makes sense for pmap_copy() to be less
forgiving about inconsistencies in the source pmap. Also, remove a new-
line character terminating a nearby panic string.


133255 07-Aug-2004 scottl

Move the definition of M_MEMDESC to a non-optional file. This allows
kernel configurations without the 'mem' device to compile.


133250 07-Aug-2004 alc

Eliminate a variable that became unused in the i386 to amd64 conversion.


133195 06-Aug-2004 markm

MFi386: Fix mem device. Grrr.


133194 06-Aug-2004 markm

MFi386: sort out the mem device. Grrrr.


133129 04-Aug-2004 markm

Fix module builds for i386 and amd64.


133124 04-Aug-2004 alc

Post-locking clean up/simplification, particularly, the elimination of
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.

Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().


133035 02-Aug-2004 markm

Diff reduction WRT i386 version.


132992 02-Aug-2004 obrien

Fix the build by providing 'PHYS_TO_DMAP' and 'M_MEMDESC'.


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132924 31-Jul-2004 davidxu

Turn on PCB_FULLCTX for set_mcontext; functions like kse_switchin
need to fully restore an asynchronous context which did not come
from a fast syscall.


132919 31-Jul-2004 alc

Add pmap locking to pmap_object_init_pt().


132852 29-Jul-2004 alc

Advance the state of pmap locking on alpha, amd64, and i386.

- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)


132808 28-Jul-2004 phk

Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call. (Yet another example of mindless copy&paste).


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64


132427 20-Jul-2004 alc

Remove the allpmaps list. It's unused.

Reviewed by: peter@


132408 19-Jul-2004 jhb

As a temporary hack, turn off deferred preemptions that are the result of
a fast interrupt handler doing an swi_sched(). This fixed the lockups I
saw on my laptop when using xmms in KDE and on rwatson's MySQL benchmarks
on SMP. This will eventually be removed and/or modified when I figure out
what the root cause is and fix that.


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132141 14-Jul-2004 peter

Like on i386, eliminate pv_ptem (which was suggested by alc). This
reduces the size of the pv_entry structure a small but significant amount.

This is implemented a little differently because it isn't so cheap to get
the physical address of the page table page on amd64. Instead of it
being directly accessible from the top level page directory, it is now
two additional tree levels down. However, in almost all cases, we
recently had the physical address of the page table page a short while
before we needed it, but it slipped through our fingers. This patch
saves it for when we do need it. Also, for the one case where we do not
have the ptp paddr, we are always running in curproc context and so we
can do a vtopte-like trick. I've implemented vtopde() for this purpose.

There is still a CYA entry in pmap_unuse_pt() that needs to be removed. I
think it can be removed now but I forgot to test with it gone.


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(); alpha has already had it for years. The
function will be used by ptrace to clear a thread's single step state.


132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


131960 11-Jul-2004 marcel

Remove the now unused GDB stubs. See src/sys/gdb/* for the new KDB
backend.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131941 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging support conditional upon KDB instead of DDB.
o Remove implementation of Debugger().
o Don't make setjump() and longjump() conditional upon DDB.
o s/ddb_on_nmi/kdb_on_nmi/g
o Call kdb_reenter() when kdb_active is non-zero. Call kdb_trap()
otherwise.


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.


131840 08-Jul-2004 brian

Change the following environment variables to kernel options:

bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone


131814 08-Jul-2004 brian

Change the following kernel options to environment variables:

BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.


131778 08-Jul-2004 peter

MFi386: various io apic cleanups


131777 08-Jul-2004 peter

MFi386: use rman access methods instead of groping around inside
struct resource


131775 08-Jul-2004 peter

MFi386: fix up CR0 settings


131774 08-Jul-2004 peter

MFi386: 1.57: transparently respect alignment/boundary tags


131744 07-Jul-2004 alc

Simplify the control flow in pmap_extract(), enabling the elimination of a
PMAP_UNLOCK() call.


131730 07-Jul-2004 alc

White space and style changes only.


131666 06-Jul-2004 alc

Style changes to pmap_extract().


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switched to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)
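
The decision flow described in the first bullet above can be sketched roughly as below. This is a simplified model, not the committed maybe_preempt(): the struct, enum, and function names are hypothetical, "lower value means higher priority" is an assumption of the sketch, and any critical section nesting depth above zero is treated as "in a critical section".

```c
#include <stdbool.h>

enum preempt_action {
	PREEMPT_ENQUEUE,	/* add the new thread to the run queue */
	PREEMPT_DEFER,		/* enqueue it and record that a preemption is owed */
	PREEMPT_SWITCH		/* switch to the new thread immediately */
};

/* Hypothetical, reduced view of a thread for this sketch only. */
struct sketch_thread {
	int	priority;	/* lower value means higher priority (assumed) */
	int	critnest;	/* critical section nesting depth */
	bool	owepreempt;	/* models the TDF_OWEPREEMPT flag */
};

/* Decide what to do with newtd while cur is running. */
enum preempt_action
maybe_preempt_sketch(struct sketch_thread *cur,
    const struct sketch_thread *newtd, bool safe)
{
	/* Unsafe to preempt, or new thread is not more important. */
	if (!safe || newtd->priority >= cur->priority)
		return (PREEMPT_ENQUEUE);
	/* Inside a critical section: remember the preemption for later. */
	if (cur->critnest > 0) {
		cur->owepreempt = true;
		return (PREEMPT_DEFER);
	}
	return (PREEMPT_SWITCH);
}

/* Returns true when the caller should perform the deferred switch. */
bool
critical_exit_sketch(struct sketch_thread *cur)
{
	cur->critnest--;
	if (cur->critnest == 0 && cur->owepreempt) {
		cur->owepreempt = false;
		return (true);
	}
	return (false);
}
```

In this model the deferred case costs only a flag write inside the critical section; the actual switch happens once the outermost critical section is left.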


131359 30-Jun-2004 imp

We need to make resources visible here as well.


130983 23-Jun-2004 jhb

Fetch the actual acpi0 device_t and use device_is_attached() to see if
it's alive rather than trying to fetch its softc pointer via its devclass.

Glanced at by: imp, njl


130958 23-Jun-2004 alc

Implement the protection check required by the pmap_extract_and_hold()
specification. This enables the elimination of Giant from that function.


130814 20-Jun-2004 alc

- Simplify pmap_remove_pages(), eliminating unnecessary indirection.
- Simplify the locking of pmap_is_modified() by converting control flow to
data flow.


130765 20-Jun-2004 alc

Add pmap locking to pmap_is_prefaultable().


130738 19-Jun-2004 alc

Remove unused pt_entry_ts. Remove an unneeded semicolon.


130667 18-Jun-2004 peter

Try harder to give new processes a clean initial fpu state. fpu_cleanstate
wasn't actually clean, it was saving the xmm registers as left over by the
bios. fninit() doesn't clear those.

In fpudna(), instead of doing a fninit() and forgetting to load the initial
mxcsr, do a full fxrstor(&fpu_cleanstate). Otherwise we hand over whatever
random values are left in the xmm registers by the last user.

I'm not certain of whether this is excessive paranoia or not, but there was
an outright bug in neglecting to set the mxcsr value that caused awk to
SIGFPE in some case. Especially for Tim Robbins. :-)

i386 probably should do something about the mxcsr settings too.

Found by: tjr


130641 17-Jun-2004 njl

Revert last change. If acpi is loaded or compiled into the kernel, its
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.

Explained by: jhb


130626 17-Jun-2004 alc

Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object. Thus, PG_BUSY serves no purpose.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130577 16-Jun-2004 alc

Add some lock assertions. Lock a small part of pmap_enter().


130553 16-Jun-2004 alc

Correct an error in the implementation of pmap_is_prefaultable(). When I
introduced this function in revision 1.441, I inverted one of the
comparisons.


130539 15-Jun-2004 alc

Remove a stale comment.


130520 15-Jun-2004 alc

Add pmap locking to pmap_extract(), pmap_mincore(), and pmap_remove().


130510 15-Jun-2004 njl

We only need the devclass_find() result, not the softc.


130444 14-Jun-2004 alc

Introduce pmap locking to many of the pmap functions. There is more to
come later.


130433 13-Jun-2004 alc

Prevent the loss of a PG_M bit through an SMP race in pmap_ts_referenced().


130427 13-Jun-2004 alc

Remove dead or unneeded code, e.g., spl calls.


130386 12-Jun-2004 alc

In a multiprocessor, the PG_W bit in the pte must be changed atomically.
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.

Reviewed by: tegge@
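
The lost-update hazard described above can be illustrated with C11 atomics. This is a userland sketch under stated assumptions, not the kernel's pmap code: the function name is hypothetical and the bit values are defined locally for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bit values local to this sketch. */
#define SKETCH_PG_M	0x040ULL	/* "modified" bit, set by the CPU */
#define SKETCH_PG_W	0x200ULL	/* "wired" bit, managed by software */

/*
 * Change only the wired bit with a single atomic read-modify-write.
 * A plain load, modify, store sequence could overwrite a modified bit
 * that another processor set between the load and the store.
 */
void
pte_set_wired(_Atomic uint64_t *pte, int wired)
{
	if (wired)
		atomic_fetch_or(pte, SKETCH_PG_W);
	else
		atomic_fetch_and(pte, ~SKETCH_PG_W);
}
```

Because the update is a single atomic operation on the whole word, every other bit keeps whatever value it had at the instant of the update.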


130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


130313 10-Jun-2004 jhb

- Use the correct devclass name ("acpi" vs "ACPI") to detect if acpi0 is
present and thus that the PnPBIOS probe should be skipped instead of
having ACPI zero out the PnPBIOStable pointer.
- Make the PnPBIOStable pointer static to i386/i386/bios.c now that that is
the only place it is used.


130312 10-Jun-2004 jhb

Remove atdevbase and replace its remaining uses with direct references to
KERNBASE instead.


130229 08-Jun-2004 peter

In pmap_extract_and_hold(), there is no need to mask off PG_FRAME because
pmap_extract() already does it.
In pmap_enter(), opa has already been masked so don't do it again.
Wrap a long line (recent transgression).
Use trunc_page() in pmap_mapdev() instead of anding with PG_FRAME, since
that is what we really meant.

Submitted by: alc (first item)


130228 08-Jun-2004 peter

Fix my silly typo in asm statement in previous commit.


130227 08-Jun-2004 peter

Argh. Remove stray number that slipped into the previous commit.


130226 08-Jun-2004 peter

Reapply rev 1.151 after enable sse/fpuinit order fixed in mp_machdep.c

Obtained from: das


130225 08-Jun-2004 peter

Set up the fpu *after* enabling SSE mode on AP's

Submitted by: (argh, I can't find the email)


130224 08-Jun-2004 peter

Initial PG_NX support (no-execute page bit)
- export the rest of the cpu features (and amd's features).
- turn on EFER_NXE, depending on the NX amd feature bit
- reorg the identcpu stuff a bit in order to stop treating the
amd features as second class features (since it is now a primary feature
bit set) and make it easier to export.


130223 08-Jun-2004 peter

Mask pte's with PG_FRAME before passing it to PHYS_TO_VM_PAGE().. PG_NX
lives in the top 12 'available' bits. atop() in the PHYS_TO_VM_PAGE()
macro only masks off the lower bits (by accident) and the upper bits
in the 64 bit ptes turn into "interesting" index values.
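
A small sketch of why the mask matters, assuming a 4K page size. The macro and function names are local to this example; they are not the kernel's atop() or PHYS_TO_VM_PAGE().

```c
#include <stdint.h>

/* Values local to this sketch. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PG_NX		(1ULL << 63)		/* no-execute bit */
#define SKETCH_PG_FRAME		0x000ffffffffff000ULL	/* physical frame bits */

/* Shifting alone drops the low flag bits but keeps the high ones. */
uint64_t
pte_to_pfn_unmasked(uint64_t pte)
{
	return (pte >> SKETCH_PAGE_SHIFT);
}

/* Masking first removes flag bits at both ends of the entry. */
uint64_t
pte_to_pfn_masked(uint64_t pte)
{
	return ((pte & SKETCH_PG_FRAME) >> SKETCH_PAGE_SHIFT);
}
```

With the no-execute bit set, the unmasked variant yields a page frame number with bit 51 set, which is the kind of "interesting" index value the message refers to.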


130221 08-Jun-2004 peter

Use trunc_page(va) when we mean it rather than anding it with PG_FRAME
(which doesn't work all that well when there are bits at the top that are
masked by PG_FRAME)


130219 07-Jun-2004 peter

Fix a serious problem that manifested during swap, and a few other times.
pmap_remove() would be called with a huge range and we'd stride across
it in only 2MB chunks. This would manifest as massive cpu time and a
largely unresponsive system during hard swap. Instead, check the higher
page directories which means we can run pmap_remove() in just a few
hundred loop iterations instead of millions since we can process
address space in chunks of 512GB and 1GB as well as 2MB.

Eternal thanks to: tmm
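
The iteration counts can be estimated with a small model. This sketch assumes the whole range is unmapped and that the stride is a power of two; the names are hypothetical and it does not reproduce pmap_remove() itself.

```c
#include <stdint.h>

/* Address space covered by one entry at each level; local to this sketch. */
#define SKETCH_2MB	(1ULL << 21)
#define SKETCH_1GB	(1ULL << 30)
#define SKETCH_512GB	(1ULL << 39)

/*
 * Count the loop iterations needed to walk [sva, eva) when every
 * iteration may skip ahead to the next boundary of size "stride".
 */
uint64_t
count_strides(uint64_t sva, uint64_t eva, uint64_t stride)
{
	uint64_t n = 0;
	uint64_t va = sva;

	while (va < eva) {
		n++;
		/* Advance to the next stride-aligned boundary above va. */
		va = (va + stride) & ~(stride - 1);
	}
	return (n);
}
```

For an empty 1TB range, 2MB strides take 524288 iterations in this model, while 512GB strides take 2.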


130140 06-Jun-2004 das

Back out revision 1.150, since dwmalone reports that it causes a panic
upon startup on his machine.


130105 05-Jun-2004 das

Initialize the MXCSR to the appropriate default value at startup.

Tested on: tjr


130040 03-Jun-2004 phk

Add new bios_string() which will hunt for a string inside a given range
of the BIOS. This can be used for finding arbitrary magic in the BIOS
in order to recognize particular platforms.


130035 03-Jun-2004 peter

MFi386: apic intpin programming updates etc.


130034 03-Jun-2004 peter

MFi386: remove debug printf


130032 03-Jun-2004 peter

MFi386: move cpu_nameclass struct next to its only consumer


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130025 03-Jun-2004 phk

Add missing <sys/module.h> instances which were shadowed by the nested
include in <sys/kernel.h>


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129989 02-Jun-2004 tjr

Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by: davidxu


129876 30-May-2004 phk

Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>


129860 30-May-2004 alc

MFi386 revision 1.6
Reenable ithread preemption for interrupts that occur while executing in
the kernel.


129825 29-May-2004 tjr

Implement __bb_init_func. This is a fairly straightforward conversion
of the i386 version.


129817 28-May-2004 alc

Remove a broken micro-optimization from pmap_enter(). The ill effect
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.

Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.

For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.

Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129744 26-May-2004 bde

Quick fix for overflow when tsc_freq >= 2^31. "int profrate" in struct
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC. It broke a year or two
ago. This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
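
The arithmetic behind the workaround can be sketched as below, assuming a 32-bit int. The helper name is hypothetical and the sketch is not the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a counter frequency so that it fits in an
 * "int profrate" field. Any frequency below 2^32 halves to a value
 * below 2^31, which a 32-bit signed int can hold.
 */
int
profrate_from_freq(uint64_t freq)
{
	return ((int)(freq / 2));
}
```

A 3 GHz counter, which would overflow a 32-bit signed int if stored directly, is recorded as 1500000000 in this model.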


129656 24-May-2004 bde

Oops, ".align 4" for the data section in the previous commit should
have been ".p2align 4". This bug is cosmetic since the data section
happens to be empty.


129653 24-May-2004 bde

Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.115 of
amd64/exception.S. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .S files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.

This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.


129625 23-May-2004 bde

Adjusted for amd64 after repo-copy. The adjustments are routine, except:
- perfmon headers must be avoided until perfmon is supported.
- all call-used registers including return registers must be preserved
by .mcount(), etc., not quite as in profile.h. __cyg_profile_func_*()
don't require this, but they are (mis)implemented as aliases for
.mcount(), etc. so they preserve the registers.
- i386 ifdefs related to perfmon have not been adjusted yet.


129623 23-May-2004 bde

Restored FAKE_MCOUNT() and MEXITCOUNT invocations and adjusted them for
amd64 as necessary. This is routine, except:
- the FAKE_MCOUNT($bintr) in doreti was missing the '$'. This gave a
garbage address made up of padding bytes (with the nop byte 0x90 as
the MSB) instead of the intended address of bintr. This accidentally
worked on i386's because (0x90 << 24) is close enough to bintr, but
it doesn't work on amd64's because (0x90 << 56) is much further away
from bintr.
- the FAKE_MCOUNT($btrap) in calltrap was similarly broken. It hasn't
been needed since FreeBSD-1, so just delete it.


129618 23-May-2004 bde

Adjusted FAKE_MCOUNT()s for amd64. This is needed for both ordinary
and high resolution profiling of interrupt handlers. The adjustments
are routine once the magic stack offset 13*4 is decoded to be TF_RIP
(there were originally more types of stack frames so using TF_EIP for
one of them wouldn't have been much simpler).

Removed garbage comments attached to some of the FAKE_MCOUNT()s.


129612 23-May-2004 bde

Spell "retq" as "ret" in pagezero() like it is everywhere else, so
that the usual macro for "ret" hides the detail of calling .mexitcount
before returning.

Fixed missing call to .mexitcount in lgdt(). This was missing on
i386's, mainly because lgdt() uses lret[q] instead of ret. This is
very unimportant since lgdt() is not (normally?) called until after
profiling is initialized.


129551 21-May-2004 bde

MFi386 (1.103 and 1.104: fixed some problems in high resolution profiling
and improved some comments). Also, made the documented {f,s}uword()
functions the standard entry points and the undocumented {f,s}uword64()
functions alternative entry points, like {f,s}uword32() for i386's. The
bitrot in the comments was a little larger here -- there are new undocumented
32-bit sub-word functions, not just renaming of 16-bit functions from
documented ones to undocumented ones.


129460 19-May-2004 peter

Like on i386, clear the last three entries in the pml4 page when doing a
pmap_release(), and put it on the free queue marked as already zeroed.


129412 19-May-2004 peter

Unbreak builds without DDB. Bad Bruce! No cookie! :-)


129373 18-May-2004 bde

Fixed DDB_NOKLDSYM on amd64's:

machdep.c:
Initialize the symbol table pointers, not quite like for other arches.

db_elf.c:
Don't claim to be an i486 in the fake ELF header.


129361 17-May-2004 peter

Deal with REL records that have the addend embedded in variable sized
targets rather than in the RELA table. I don't know if binutils will ever
generate REL records, but just in case.....
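
The difference between the two record styles can be sketched for a base-relative relocation. This is a simplified illustration with hypothetical function names: it handles only 64-bit targets, whereas the message above mentions variable sized ones, and it is not the kernel's elf_reloc().

```c
#include <stdint.h>

/* RELA style: the addend is carried in the relocation record itself. */
void
apply_relative_rela(uint64_t *where, uint64_t relocbase, int64_t addend)
{
	*where = relocbase + (uint64_t)addend;
}

/* REL style: the addend is whatever is already stored at the target. */
void
apply_relative_rel(uint64_t *where, uint64_t relocbase)
{
	*where = relocbase + *where;
}
```

Both variants produce the same result when the REL target was pre-seeded with the value a RELA record would have carried as its addend.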


129309 16-May-2004 peter

Checkpoint some of what I was starting to tinker with for having some
different context support for 32 vs 64 bit processes. This simply omits
the save/restore of the segment selector registers for non 32 bit
processes. This avoids the rdmsr/wrmsr juggling when restoring %gs
clobbers the kernel msr that holds the gsbase.

However, I suspect it might be better to conditionally do this at
user<->kernel transition where we wouldn't need to do the juggling in the
first place. Or have per-thread extended context save/restore hooks.


129305 16-May-2004 peter

Kill the LAZYPMAP ifdefs. While they worked, they didn't do anything
to help the AMD cpus (which have a hardware tlb flush filter). I held
off to see what the 64 bit Intel cpus did, but it doesn't seem to help
much there either. Oh well, store it in the Attic.


129287 16-May-2004 peter

MFi386: avoid partial register references, for what it's worth.


129286 16-May-2004 peter

For consistency with i386, have pmap_kenter_temporary() take a vm_paddr_t
argument. It is actually the same type on amd64 (vm_paddr_t = vm_offset_t)
but this reduces the i386<->amd64 diffs a little.


129284 16-May-2004 peter

MFi386: numerous interrupt and acpi updates


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


128384 18-Apr-2004 alc

Simplify the sf_buf implementation. In short, make it a trivial veneer
over the direct virtual-to-physical mapping.


128297 16-Apr-2004 alc

Set the "global" attribute on the page table entries for the kernel and
direct mappings. This shaves a few seconds off of my buildworld times.

Discussed with: peter@


128100 11-Apr-2004 alc

- is_physical_memory()'s parameter, which is a physical address, should be
a vm_paddr_t not a vm_offset_t.


127974 07-Apr-2004 peter

Update to include both the L1 and L2 TLB stats, as well as the separate
2M/4M page TLB vs 4K page TLB stats. This also applies to the i386
platform, as does the cpu features fixes.


127973 07-Apr-2004 peter

MFi386: move rss() from db_interface.c to cpufunc.h


127920 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999 and email from Peter Wemm.

Approved by: core, peter


127914 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


127913 05-Apr-2004 imp

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999 and permission from Alan Cox.

Approved by: core, alc@


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127803 03-Apr-2004 alc

Remove ptmmap and ptvmmap. They are unused on amd64.


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127786 03-Apr-2004 alc

Microoptimize pagezero() based upon something that I learned writing the
optimized pagecopy(). This also has the virtue of making these two
functions more similar in style.


127653 31-Mar-2004 alc

- Add an optimized page copy function for use by pmap_copy_page(). It is
roughly four times faster than bcopy() for uncached pages.
- Sort the function prototypes in md_var.h.


127582 29-Mar-2004 peter

Finish tidying up a couple of leftovers from the KSTACK_PAGES stuff. Some
files still #included the opt_ file. powerpc hadn't been updated yet.


127392 25-Mar-2004 peter

MFi386: correctly calculate the top-of-stack when a kthread is created
with a larger kernel stack.


127391 25-Mar-2004 peter

Run print_AMD_features() for both AuthenticAMD and GenuineIntel cpus.
Report the %ecx bits in cpuid function 1. This is a hack.
When reporting AMD Features, only mask off the common bits. Otherwise
the SEP bit masks off SYSCALL etc in the report.


127241 20-Mar-2004 alc

- Add uiomove_fromphys() implementations to alpha and ia64. These only
differ trivially from amd64.
- Correct a spelling error in a comment.


127236 20-Mar-2004 alc

Introduce uiomove_fromphys(). This is a variant of uiomove() that takes
a collection of physical pages as the source. On amd64 it is implemented
using the direct virtual-to-physical map.


127158 18-Mar-2004 obrien

Document machdep.hlt_cpus.

Submitted by: Craig Rodrigues <rodrigc@crodrigues.org>


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose ephemeral map cache.


126925 13-Mar-2004 peter

Reduce the scope of the Giant lock being held for non-mpsafe syscalls.
There was way too much code being covered.


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126891 12-Mar-2004 trhodes

These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation of gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's. As
icc v8 may define __GNUC__, some parts may look strange but are
necessary.

Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c

1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c

1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c

and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch have survived gcc compiles for a long time;
I use it on my desktop. An icc-compiled kernel has worked since Nov. 2003
(exceptions: snd_* if used as modules), and it survives a build of the
entire ports collection with icc.

Parts of this commit contain suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by: -arch
Submitted by: netchild


126828 11-Mar-2004 marcel

Remove a stale and broken call to kdb_trap() that was protected by the
non-option KDB. Besides being wrong, it also interferes with ongoing
work.


126735 08-Mar-2004 peter

Stop depending on #include pollution from cpufunc.h


126733 08-Mar-2004 peter

MFi386: curpcb can no longer be null, so do not test for it.


126732 08-Mar-2004 peter

MFi386: set initial curpcb pcpu variable at startup time rather than
waiting for a context switch


126731 08-Mar-2004 peter

MFi386: wait for local apic to become free before using it


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_pinit2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126677 06-Mar-2004 peter

When faced with a "GenuineIntel", we know what they call it now. Replace
snide comment with a different one.


126654 05-Mar-2004 bde

MFi386: (all: keep a comment in sync with code, and don't depend on
namespace pollution).


126246 25-Feb-2004 peter

Since we don't use PG_NX yet, don't turn on EFER_NXE quite yet. This needs
to be done based on the cpuid bits. AMD says that we should test the cpuid
features bits for certain things, such as this.


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


125530 06-Feb-2004 peter

Remove the badsw* INVARIANTS checks. The events that this attempts
to catch are already nicely caught by trapping the null pointer derefs.
Remove no-longer-used noswitch/nothrow strings. They were referenced
by the stub cpu_switch() etc functions before they were implemented.
Try something a little different for the lock prefixes.

Prompted by: bde (the first two items anyway)


125487 05-Feb-2004 kan

Rename cn_unavailable to cnunavailable for a little more consistency.
Garbage collect unused cndebug() function.

Suggested by: bde


125467 05-Feb-2004 kan

Eliminate global cons_unavailable flag and replace it by the status
bit maintained on a per-device basis. Single variable is inadequate
on machines running with multiple consoles enabled.


125463 05-Feb-2004 peter

Fix long/int printf format problems exposed by PMAP_DIAGNOSTIC


125221 30-Jan-2004 peter

Merge some more changes from i386.


125183 29-Jan-2004 peter

Re-add debug register support.
Some other minor tweaks snuck in here, including supporting more
discontiguous memory segments and some cosmetic tweaks.


125182 29-Jan-2004 peter

Re-add user_dbreg_trap() for debug register support


125181 29-Jan-2004 peter

Take another shot at the invariants calls to __panic. They hadn't been
updated for the regparm ABI on amd64.
Context switch debug regs.
Update for fpu simplification
Don't needlessly reload %cr3, in case the cpu has the tlb flush filter
turned off. Re-add LAZY_SWITCH stubs.


125180 28-Jan-2004 peter

deal with dbregs for fork etc
update for fpu.c simplification
Merge #include sort from i386


125179 28-Jan-2004 peter

Un-stub the hardware debug register stuff.


125178 28-Jan-2004 peter

Export PCB_DR* symbols


125177 28-Jan-2004 peter

We can simplify a lot of things now that we don't have to worry about
hardware bugs on external 386 cpus and now that we can depend on SSE.


125174 28-Jan-2004 peter

MFi386: mp_topology().


124950 25-Jan-2004 alc

MFi386 revision 1.230
- Move smp_topology to subr_smp.c so that it is defined on all architectures.


124850 23-Jan-2004 peter

Unbreak amd64: Rename calls from panic to __panic


124092 03-Jan-2004 davidxu

Make sigaltstack per-thread, because per-process sigaltstack state
is useless for threaded programs: multiple threads cannot share the same
stack.
The alternate signal stack is private to the thread, so no lock is needed.
The original P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single-threaded or Linux clone()-based threaded programs, there is no
semantic change, because those programs have only one kernel thread
in every process.

Reviewed by: deischen, dfr


124038 01-Jan-2004 alc

- Use pagezero() instead of bzero() in pmap_pinit(). (pagezero() is much
faster.)
MFi386:
- Don't bother clearing PG_ZERO on the page table page in
_pmap_allocpte(); it serves no purpose.
- Don't bother clearing and setting PG_BUSY on page table directory pages.


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde
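
Why the signedness matters can be shown with a small sketch. The two helpers below are local copies written to mirror the shape of the kernel's unsigned max() and signed imax(); the sketch_ prefix marks them as illustrative, not the libkern definitions themselves.

```c
#include <assert.h>

/* Unsigned comparison: a negative int argument is converted to a huge value. */
unsigned int
sketch_max(unsigned int a, unsigned int b)
{
	return (a > b ? a : b);
}

/* Signed comparison: suitable for signed counters. */
int
sketch_imax(int a, int b)
{
	return (a > b ? a : b);
}

/* Track a peak value for a signed usage counter. */
int
sketch_update_peak(int peak, int used)
{
	return (sketch_imax(peak, used));
}
```

With the unsigned helper, a counter that is transiently negative would be converted to a very large unsigned number and would win the comparison, which is the mistake the signed variant avoids.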


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123710 22-Dec-2003 alc

- Significantly reduce the number of preallocated pv entries in
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)


123428 11-Dec-2003 peter

MFi386: (jhb): Deal with MAXCPU etc correctly


123180 06-Dec-2003 peter

Various whitespace and cosmetic sync-up's with i386.

Approved by: re (scottl)


123179 06-Dec-2003 peter

amd64_protection_init and the protection_codes[] array were overkill.
Inline it instead.

Approved by: re (scottl)


123177 06-Dec-2003 peter

MFi386: put the apic disable hook in a better place.

Approved by: re (scottl)


123175 06-Dec-2003 peter

Revert some amd64 changes that cached curthread and converge back to the
i386 version. The curthread special case in pcpu.h solves my complaint
about the verbose macro expansion in this case. Note that the i386
version still has some OBE comments, I didn't re-add them back again.

Approved by: re (scottl)


123126 03-Dec-2003 jhb

Fix all users of mp_maxid to use the same semantics, namely:

1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by: re (scottl)
Tested on: i386, amd64, alpha
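
Under these semantics, a loop over CPU IDs uses an inclusive upper bound. A minimal userland sketch, assuming a hypothetical bitmask of active CPUs (all sketch_* names are illustrative):

```c
#include <assert.h>

#define SKETCH_MAXCPU	8

/* Hypothetical state: bit i set means CPU i is active. */
unsigned int sketch_all_cpus;
int sketch_mp_maxid;

/* Count active CPUs; mp_maxid itself is a valid ID, hence "<=". */
int
sketch_count_cpus(void)
{
	int i, n = 0;

	assert(sketch_mp_maxid >= 0 && sketch_mp_maxid <= SKETCH_MAXCPU - 1);
	for (i = 0; i <= sketch_mp_maxid; i++)
		if (sketch_all_cpus & (1u << i))
			n++;
	return (n);
}
```

Note that CPU IDs may be sparse, so the loop still has to test each ID rather than assume every ID up to mp_maxid is in use.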


123074 30-Nov-2003 jeff

- Make mp_maxid reflect the same meaning as it does on other architectures.
It is one past the last valid cpuid. This relied on a different bug in
UMA to work properly.

Reported/Tested by: phk
Approved by: rwatson


123010 27-Nov-2003 peter

Fix i386 apic support merge botch. sizeof(long) is 8, not 4. This fixes
the annoying 'sysctl: hw.intrcnt: out of memory' error message in systat.

Approved by: re (rwatson)


122950 22-Nov-2003 peter

Argh! The Athlon64 and Opteron only implement 40 bits of address space in
the MTRR Base/Mask registers. If you use the documented algorithm in the
systems programming guide, you'll get a GPF. The only thing that has
prevented this so far is that the bios pre-sets some MTRR entries which
we mis-interpreted sufficiently to fool the memcontrol interface into
thinking all the address space was taken and therefore rejected XFree86's
requests. However, not all bioses do this. You get an insta-panic in
that case. Grrr. A better fix (dynamic mask) will happen by 5.3/5-stable
so that we automatically adapt to more than 40 physical bits.

Approved by: re (scottl)
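
The masking issue can be illustrated with a short sketch. It assumes the usual variable-range MTRR layout (mask bits start at bit 12, valid bit at bit 11) and takes the number of implemented physical address bits as a parameter; it is not the fix that was committed, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MTRR_VALID	(UINT64_C(1) << 11)	/* PhysMask valid bit */

/*
 * Build a variable-range MTRR PhysMask value for a power-of-two region
 * size, keeping only the address bits the CPU implements.  Bits at or
 * above `physbits` are left clear because writing them would fault.
 */
uint64_t
sketch_mtrr_physmask(uint64_t size, unsigned int physbits)
{
	uint64_t addrmask;

	assert(size >= 4096 && (size & (size - 1)) == 0);
	assert(physbits > 12 && physbits < 64);
	addrmask = ((UINT64_C(1) << physbits) - 1) & ~UINT64_C(0xfff);
	return ((~(size - 1) & addrmask) | SKETCH_MTRR_VALID);
}
```

Passing 40 for physbits corresponds to the hard-coded case described above; a dynamic mask would derive the value from cpuid instead.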


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since sysinits
in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122940 21-Nov-2003 peter

Cosmetic and/or trivial sync up with i386.

Approved by: re (rwatson)


122939 21-Nov-2003 peter

MFi386 rev 1.54 (jhb): Add interrupts that are actually available to the
resource manager, rather than adding everything.

Approved by: re (scottl)


122930 20-Nov-2003 peter

Provide a streamlined '#define curthread __curthread()' for amd64 to avoid
the compiler having to parse and optimize the PCPU_GET(curthread) so often.
__curthread() is an inline optimized version of PCPU_GET(curthread) that
knows that pc_curthread is at offset zero in the pcpu struct. Add a
CTASSERT() to catch any possible changes to this. This accounts for
just over a 1% wall clock speedup for total kernel compile/link time,
and 20% compile time speedup on some specific files depending on which
compile options are used.

Approved by: re (jhb)
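
The offset-zero trick and its compile-time guard can be sketched in portable C. The kernel version reads through the %gs segment; this userland stand-in uses an ordinary pointer, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_thread {
	int td_tid;
};

struct sketch_pcpu {
	struct sketch_thread *pc_curthread;	/* must stay first */
	int pc_cpuid;
};

/* In the spirit of CTASSERT(): break the build if the field ever moves. */
_Static_assert(offsetof(struct sketch_pcpu, pc_curthread) == 0,
    "pc_curthread must be at offset 0");

/* Userland stand-in for the per-CPU base the kernel reaches through %gs. */
struct sketch_pcpu *sketch_pcpu_base;

struct sketch_thread *
sketch_curthread(void)
{
	/* Offset zero: the base pointer also points at pc_curthread. */
	return (*(struct sketch_thread **)(void *)sketch_pcpu_base);
}
```

The point of the assertion is that the shortcut silently depends on structure layout, so the dependency is made a build failure rather than a runtime bug.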


122849 17-Nov-2003 peter

Initial landing of SMP support for FreeBSD/amd64.

- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
(in particular, bootstrap)
- This is still a WIP. It seems that there are some highly bogus bioses
on nVidia nForce3-150 boards. I can't stress how broken these boards
are. I have a workaround in mind, but right now the Asus SK8N is broken.
The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE. SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
atpic' in addition, because they somehow managed to forget to connect the
8254 timer to the apic, even though it's in the same silicon! ARGH!
This directly violates the ACPI spec.


122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architectures
except as an opaque pointer that references a vm page.


122771 16-Nov-2003 bde

Localized the cy driver's locking.


122763 15-Nov-2003 njl

Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to
dereference the softc.


122713 14-Nov-2003 peter

Minor source sync with amd64. Use int as the type for the width
field of %.*s rather than size_t.


122700 14-Nov-2003 jhb

If an interrupt source doesn't have an ithread, treat it as a stray
interrupt. This can only happen if an unregistered interrupt source
triggers an interrupt.


122697 14-Nov-2003 peter

basemem is in K, not bytes. I think I tricked jhb into making the same
mistake I did and then committing it to cvs.


122690 14-Nov-2003 jhb

Shuffle the APIC interrupt vectors around a bit:
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.

Requested by: peter (2)
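
The figure of 191 can be checked with a little arithmetic. The sketch below assumes that device vectors start 16 past the first non-exception vector (0x20), end at 0xef, and skip 0x80; the exclusion of 0x80 (the int 0x80 syscall gate) is an assumption made here to account for the count, and the constant names are hypothetical.

```c
#include <assert.h>

#define SKETCH_IDT_IO_INTS	0x20	/* first vector past CPU exceptions */
#define SKETCH_APIC_IO_INTS	(SKETCH_IDT_IO_INTS + 16) /* skip ATPIC range */
#define SKETCH_APIC_LAST_IO	0xef	/* 0xf0 - 0xff: IPIs and local APIC */
#define SKETCH_IDT_SYSCALL	0x80	/* assumed reserved, see lead-in */

/* Count the vectors left over for device interrupts. */
int
sketch_num_device_vectors(void)
{
	int v, n = 0;

	assert(SKETCH_APIC_IO_INTS < SKETCH_APIC_LAST_IO);
	for (v = SKETCH_APIC_IO_INTS; v <= SKETCH_APIC_LAST_IO; v++)
		if (v != SKETCH_IDT_SYSCALL)
			n++;
	return (n);
}
```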


122620 13-Nov-2003 jhb

Whitespace.


122595 13-Nov-2003 peter

Stop pretending to support kernel profiling. The FAKE_MCOUNT() etc
calls are just gradually getting more and more stale. At this point it
would be better to start from scratch once prof_machdep.c is adapted.


122572 12-Nov-2003 jhb

- Move manipulation of td_intr_nesting_level out of assembly interrupt
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.

Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)


122520 12-Nov-2003 peter

Cosmetic sync with i386


122511 11-Nov-2003 jhb

Don't probe busses in the MP Table for the MP Table PCI bridge drivers
if the bus number doesn't correspond to a PCI bus in the MP Table.

Reported by: jhay


122491 11-Nov-2003 jhb

Enable HTT CPUs by default instead of halting them by default. Users
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.


122490 11-Nov-2003 jhb

Disable probing of HTT CPUs by default for the MP Table case. HTT CPUs
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs as we have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.


122438 10-Nov-2003 jhb

MFamd64 (via P4, not in CVS yet):
- Use the static boot_address variable directly rather than passing it
around to several functions.
- Clean up a couple of magic numbers.


122434 10-Nov-2003 jhb

Bump APIC ID limits up to 32 since a machine with 16 CPUs will have APIC
IDs for the I/O APICs that are greater than 16.

Reported by: John Cagle <john.cagle@hp.com>


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().
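
The bit layout described above can be written down as a small sketch. The macro and function names are hypothetical and only illustrate the split between the clear_ret bit, the reserved MI bits, and the MD bits.

```c
#include <assert.h>

#define SKETCH_MC_CLEAR_RET	0x0001	/* bit 0: the former clear_ret */
#define SKETCH_MC_MI_RESERVED	0x000e	/* bits 1-3: reserved for MI code */
#define SKETCH_MC_MI_MASK	(SKETCH_MC_CLEAR_RET | SKETCH_MC_MI_RESERVED)

/* Does the caller want the return registers cleared? */
int
sketch_wants_clear_ret(int flags)
{
	return ((flags & SKETCH_MC_CLEAR_RET) != 0);
}

/* Extract the bits that belong to machine-dependent code. */
int
sketch_md_flags(int flags)
{
	return (flags & ~SKETCH_MC_MI_MASK);
}
```

Existing callers that passed 0 or 1 keep their meaning unchanged, since bit 0 carries exactly the old argument.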


122296 08-Nov-2003 peter

Update the graffiti.


122295 08-Nov-2003 peter

Switch from having a fpu "device" to something that is more like the
integrated part of the cpu core that it is.


122292 08-Nov-2003 peter

The great s/npx/fpu/gi


122278 08-Nov-2003 peter

Rename npx* to fpu*. I haven't done the flags/function names yet.


122269 08-Nov-2003 peter

There isn't much point printing 'npx0: INT 16 interface' because that is
the only way it works here.


122268 07-Nov-2003 jhb

Dump the trigger and polarity of each intpin's default setting in the
bootverbose output.


122156 06-Nov-2003 peter

OK, this might be a bit silly, but add another popcnt() candidate.


122149 05-Nov-2003 jhb

When remapping an ISA interrupt from one intpin to another, disable the
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.

Reported by: kan


122148 05-Nov-2003 jhb

Two style nits.


122124 05-Nov-2003 jhb

- Adjust some of the bitfields in the ioapic_intsrc struct to be unsigned
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.

Reported by: Michal Mertl <mime@traveller.cz>


122123 05-Nov-2003 jhb

Add a workaround for MP Tables that list the same PCI IRQ twice with
the same APIC / pin destination in both cases.

Reported by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


122064 04-Nov-2003 jhb

Tweak the version string output for ioapic devices.


121996 03-Nov-2003 jhb

New i386 SMP code:

- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and multiply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.


121995 03-Nov-2003 jhb

Don't probe PnP BIOS devices for PICs for now to avoid problems with those
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.


121991 03-Nov-2003 jhb

Add the MP Table APIC enumerator. This code uses the BIOS MP Table to
enumerate I/O APICs as well as local APICs. It also provides Host-PCI
and PCI-PCI bridge drivers to use the MP Table to route PCI interrupts.


121986 03-Nov-2003 jhb

New APIC support code:

- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always disable the local APIC to work around an erratum in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to pseudo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.
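
The shared-entry-point idea from the first item can be sketched as follows. Each entry point covers a block of 32 vectors and decodes one 32-bit in-service word to find the vector that fired. Choosing the highest set bit is an assumption of this sketch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the 32-bit in-service word for one block of vectors and the first
 * vector of that block, return the vector being serviced, or -1 if no bit
 * is set.
 */
int
sketch_isr_to_vector(uint32_t isr, int base)
{
	int bit;

	assert(base % 32 == 0);
	for (bit = 31; bit >= 0; bit--)
		if (isr & (UINT32_C(1) << bit))
			return (base + bit);
	return (-1);
}
```

One small stub per 32-vector block then replaces one stub per vector, at the cost of a register read on every interrupt.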


121982 03-Nov-2003 jhb

New device interrupt code. This defines an interrupt source abstraction
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector to a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.

By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.
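
A minimal userland sketch of such an abstraction follows. It assumes a fixed-size table indexed by IRQ and a PIC driver that supplies mask/unmask methods; the structure layout and every sketch_* name are hypothetical and much simpler than the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_IRQS	16

struct sketch_intsrc;

/* Methods a PIC driver supplies for the sources it owns. */
struct sketch_pic {
	void (*pic_mask)(struct sketch_intsrc *);
	void (*pic_unmask)(struct sketch_intsrc *);
};

struct sketch_intsrc {
	struct sketch_pic *is_pic;	/* provider of this source */
	void (*is_handler)(void *);	/* NULL until a handler is added */
	void *is_arg;
	int is_masked;
	int is_straycount;
};

/* Table of interrupt sources indexed by IRQ number. */
struct sketch_intsrc *sketch_sources[SKETCH_NUM_IRQS];

int
sketch_register_source(int irq, struct sketch_intsrc *isrc)
{
	if (irq < 0 || irq >= SKETCH_NUM_IRQS || sketch_sources[irq] != NULL)
		return (-1);
	sketch_sources[irq] = isrc;
	return (0);
}

/* Run the handler, or count the interrupt as stray and mask the source. */
void
sketch_execute_handlers(int irq)
{
	struct sketch_intsrc *isrc;

	assert(irq >= 0 && irq < SKETCH_NUM_IRQS);
	isrc = sketch_sources[irq];
	if (isrc == NULL)
		return;
	if (isrc->is_handler == NULL) {
		isrc->is_straycount++;
		isrc->is_pic->pic_mask(isrc);
		return;
	}
	isrc->is_handler(isrc->is_arg);
}

/* A trivial PIC driver and handler for demonstration. */
void sketch_dummy_mask(struct sketch_intsrc *isrc) { isrc->is_masked = 1; }
void sketch_dummy_unmask(struct sketch_intsrc *isrc) { isrc->is_masked = 0; }
struct sketch_pic sketch_dummy_pic = { sketch_dummy_mask, sketch_dummy_unmask };
void sketch_count_handler(void *arg) { ++*(int *)arg; }
```

The generic code never needs to know which kind of controller backs a given IRQ; it only calls through the methods of the source's PIC.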


121755 30-Oct-2003 jhb

Include "opt_pmap.h" so that the DISABLE_P* options are honored.


121754 30-Oct-2003 jhb

Always export r_gdt and r_idt and give them extern declarations in
machine/segments.h.


121751 30-Oct-2003 peter

MFi386: thread specific fpu state optimizations


121723 30-Oct-2003 peter

MFi386: rev 1.451 (jhb): call pmap_kremove() rather than duplicate it


121722 30-Oct-2003 peter

MFi386: trap.c rev 1.259: fetch thread mailbox address in page fault trap


121623 28-Oct-2003 peter

Oops. Remove some rather noisy debug printfs that slipped in there
somehow.


121481 24-Oct-2003 jhb

A few whitespace and comment tweaks.


121307 21-Oct-2003 silby

Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.


121228 18-Oct-2003 njl

Add the cpu_idle_hook() function pointer so that other idlers can be
hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This
prepares the way for further ACPI sleep states.
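
The hook mechanism can be sketched in userland C. Counters stand in for actually halting the CPU, the alternative idler is hypothetical, and none of the sketch_* names are the kernel's.

```c
#include <assert.h>
#include <stddef.h>

int sketch_default_calls;
int sketch_other_calls;

/* Stand-in for the default C1 idle (HLT in the kernel). */
void
sketch_cpu_idle_default(void)
{
	sketch_default_calls++;
}

/* Hypothetical alternative idler, e.g. one entering a deeper sleep state. */
void
sketch_other_idle(void)
{
	sketch_other_calls++;
}

/* The hook starts out pointing at the default and can be replaced at runtime. */
void (*sketch_cpu_idle_hook)(void) = sketch_cpu_idle_default;

void
sketch_cpu_idle(void)
{
	assert(sketch_cpu_idle_hook != NULL);
	(*sketch_cpu_idle_hook)();
}
```

A driver that knows about deeper sleep states only has to store its own function in the hook; the idle loop itself does not change.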


121133 16-Oct-2003 bde

Don't forget to load %es with the kernel data segment selector in
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.

This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.


121103 15-Oct-2003 peter

Pull the tier-2 card one last time and break the get/setcontext and
sigreturn() ABI and the signal context on the stack.

Make the trapframe (and its shadows in the ucontext and sigframe etc)
8 bytes larger in order to preserve 16 byte stack alignment for the
following C code calls. I could have done some padding after the
trapframe was saved, but some of the C code still expects an argument of
'struct trapframe'. Anyway, this gives me a spare field that can be used
to store things like 'partial trapframe' status or something else in
the future.

The runtime impact is fairly small, *except* for threaded apps and things
that decode contexts and the signal stack (eg: cvsup binary). Signal
delivery isn't too badly affected because the kernel generates the
sigframe that sigreturn uses after the handler has been called.

The size of mcontext_t and struct sigframe hasn't changed. Only
the last few fields (sc_eip etc) got moved a little and I eliminated
a spare field. mc_len/sc_len did change location though so the
sanity checks there will still trap it.


121081 14-Oct-2003 alc

MFia64
Move uma_small_alloc() and uma_small_free() to uma_machdep.c.


120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


120772 05-Oct-2003 alc

Don't bother setting a page table page's valid field. It is unused and
not setting it is consistent with other uses of VM_ALLOC_NOOBJ pages.


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulates the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120661 02-Oct-2003 alc

Reimplement pagezero() using "movnti".


120654 01-Oct-2003 peter

Commit Bosko's patch to clean up the PSE/PG_G initialization and
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.

For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.

A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.

Obtained from: bmilekic


120595 30-Sep-2003 jeff

- Remove the definition for TD_SWITCHIN as it is not used.

Approved by: peter


120525 27-Sep-2003 alc

Eliminate the pte object.


120449 26-Sep-2003 alc

MFi386
Allocate the page table directory page as "no object" pages.


120427 25-Sep-2003 alc

MFi386
Reimplement pmap_release() such that it uses the page table rather than
the pte object to locate the page table directory pages. (Temporarily,
retain an assertion on the emptiness of the pte object.)


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 hierarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120367 23-Sep-2003 peter

Fix patch transcription typo. s/IDT_BPT/IDT_BP/


120365 23-Sep-2003 peter

Sync with i386 version. The quality initialization was missing and some
other junk.


120360 22-Sep-2003 peter

Move basemem variable into global scope so that the MP startup code can
refer to it for looking for tables.


120357 22-Sep-2003 peter

MFi386 rev 1.51 by scottl: make dflt_lock() always panic.


120356 22-Sep-2003 peter

MFi386 rev 1.53 by scottl: Allocate the S/G list in the tag, not on
the stack. This means that s/g lists can be arbitrarily long.


120355 22-Sep-2003 peter

MFi386 machdep.c rev 1.201, clock.c 1.201, clock.h 1.45 by phk: Don't
initialize a TSC timecounter until we know if it is broken or not.

XXX I think there is a bug in the i386 code here. init_TSC_tc() comes
after:
if (statclock_disable)
return;

ie: if you turn off the statclock interrupt, you don't get the TSC either.


120354 22-Sep-2003 peter

MFi386 rev 1.105 by jhb: fix comment typo


120353 22-Sep-2003 peter

MFi386 rev 1.256 by jhb: remove redundant #include <sys/sysctl.h>


120351 22-Sep-2003 peter

MFi386 rev 1.55 by sam: remove unused #define BUS_DMAMAP_NSEGS


120348 22-Sep-2003 peter

MFi386: trap.c rev 1.257 by bde. Don't forget to reenable interrupts
for breakpoint and trace traps from usermode. Although all the setidt
entries are interrupt gates on amd64, all but the trace and bpt trap
entry handlers reenable interrupts after the swapgs instruction in order
to simulate the trap/interrupt gate distinction. In other words, the
amd64 code behaves the same way that i386 does here.


120346 22-Sep-2003 peter

MFi386 by jhb: use symbolic constants for the IDT entries.


120345 22-Sep-2003 peter

MFi386: machdep.c:1.570 clock.c:1.204 by bde: Quick fix for calling DELAY
for ddb input in some atkbd-based console drivers. ddb must not use any
normal locks but DELAY() normally calls getit() which needs clock_lock.
This also removes the need for recursion on clock_lock.


120040 13-Sep-2003 alc

Simplify (and micro-optimize) pmap_unuse_pt(): Only one caller,
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().


119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


119941 10-Sep-2003 jhb

Remove an XXX comment by using the per CPU mask added after this comment
was added.


119924 09-Sep-2003 peter

Clean up get/set_mcontext() and get/set_fpcontext(). These operate
on data structures on the kernel stack which are guaranteed to be 16 byte
aligned by gcc, the amd64 ABI and __aligned(16).

Ensure the tss_rsp0 initial stack pointer is 16 byte aligned in case
sizeof(pcb) becomes odd at some point. This is convenient for the
interrupt handler case because the ring crossing pushes cause the
required odd alignment before the call to the C code.

Have fast_syscall add an additional 8 bytes to ensure that the trapframe
has the correct odd alignment for the call to C code. Note that there are
no checks to make sure that the trapframe size is appropriate for this.

This makes get/setfpcontext work properly (finally). You get a GPF in
kernel mode if any of this is botched without the alignment fixup code
that is apparently needed on i386.
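
The alignment arithmetic discussed in r119924 can be sketched in a few lines of C. This is an illustration under stated assumptions, not the kernel code: the helper names (`align_down_16`, `is_odd_multiple_of_8`, `entry_sp_below`) are hypothetical, and "odd alignment" is read here as "an odd multiple of 8", which is what a callee sees at entry once the 8-byte return address has been pushed onto a 16-byte aligned stack.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to the nearest 16-byte boundary. */
static uintptr_t
align_down_16(uintptr_t addr)
{
	return (addr & ~(uintptr_t)0xf);
}

/* True when sp is an odd multiple of 8, i.e. sp % 16 == 8. */
static int
is_odd_multiple_of_8(uintptr_t sp)
{
	return ((sp & 0xf) == 8);
}

/*
 * Hypothetical helper: pick a stack pointer at or below 'top' that looks
 * the way a callee expects at entry, namely a 16-byte aligned address
 * minus the 8-byte return address slot.
 */
static uintptr_t
entry_sp_below(uintptr_t top)
{
	uintptr_t sp = align_down_16(top) - 8;

	assert(is_odd_multiple_of_8(sp));
	return (sp);
}
```

The same arithmetic applies to the upcall stack alignment mentioned in r117985 further down this log.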


119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid ephemeral mappings in the kernel's
address space. On an SMP, the ephemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119452 25-Aug-2003 obrien

Fix copyright comment & FBSDID style nits.

Requested by: bde


119399 24-Aug-2003 alc

Eliminate the last (direct) uses of vm_page_lookup() on the pte object.


119340 23-Aug-2003 peter

AMD64 mtrr driver.


119158 20-Aug-2003 alc

- Lock the pte object when performing vm_page_grab().
- Ensure that the page table page is zero filled before adding it
to the page table.


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119006 17-Aug-2003 alc

In pmap_copy(), since we have the page table page's physical address
in hand, use PHYS_TO_VM_PAGE() rather than vm_page_lookup().


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118983 16-Aug-2003 alc

Eliminate pmap_page_lookup() and its uses. Instead, use PHYS_TO_VM_PAGE()
to convert the pte's physical address into a vm page.

Reviewed by: peter


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118832 12-Aug-2003 ps

Halted CPUs should not accumulate time.

Reviewed by: jhb


118741 10-Aug-2003 alc

Rename pmap_changebit() to pmap_clear_ptes() and remove the last
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.

Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.


118641 08-Aug-2003 alc

MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry().


118451 04-Aug-2003 scottl

In _bus_dmamap_load_buffer(), only count the number of bounce pages needed if
they haven't been counted before. This test was omitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.

Thanks to Paul Saab for beating his head against this for a few hours =-)
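
The guard described in r118451 can be sketched as follows. This is a simplified stand-in, not the busdma code: `struct dmamap_sketch`, `count_bounce_pages()` and the `needs_bounce` array are hypothetical, and only the `pagesneeded` field name comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-map bounce-page bookkeeping. */
struct dmamap_sketch {
	int pagesneeded;	/* bounce pages required by this load */
};

/*
 * Count bounce pages only on the first pass. A deferred load calls this
 * again with the same map; without the guard the count would keep
 * growing on every retry.
 */
static void
count_bounce_pages(struct dmamap_sketch *map, const int *needs_bounce,
    size_t npages)
{
	size_t i;

	assert(map != NULL);
	if (map->pagesneeded != 0)
		return;		/* already counted on an earlier attempt */
	for (i = 0; i < npages; i++) {
		if (needs_bounce[i])
			map->pagesneeded++;
	}
}
```

Calling the function twice on the same map leaves the count unchanged after the first call, which is the behaviour the fix restores.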


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118365 02-Aug-2003 alc

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in pmap_mapdev().
See revision 1.140 of kern/sys_pipe.c for a detailed rationale.

Submitted by: tegge


118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessary. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


118235 31-Jul-2003 peter

Cosmetic: fix disorder of opt_kstack_pages.h include.


118156 29-Jul-2003 davidxu

Use PSL_KERNEL as upcall thread's initial rflags, don't use
scratch user rflags.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


118031 25-Jul-2003 obrien

Use __FBSDID().

Brought to you by: a boring talk at Ottawa Linux Symposium


118024 25-Jul-2003 alc

MFi386 revision 1.416
Add vm object locking to pmap_prefault().

Note: powerpc and sparc64 do not implement this function.


117985 25-Jul-2003 davidxu

Align the upcall stack top to an odd multiple of 8. GCC accounts for the
return address in the callee when computing stack alignment.


117962 24-Jul-2003 davidxu

Implement cpu_set_upcall and cpu_set_upcall_kse.

Reviewed by: peter


117961 24-Jul-2003 davidxu

Set fault address to si_addr.

Reviewed by: peter


117943 23-Jul-2003 peter

Make the breakpoint instruction trap gate available to users.
ptrace() needs this.

Submitted by: Mark Kettenis <kettenis@chello.nl>


117942 23-Jul-2003 peter

Set the %gs base to pcb_gsbase, not pcb_fsbase. Oops.

Discovered by: davidxu


117929 23-Jul-2003 alc

Annotate pmap_changebit() as __always_inline. This function was
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.
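
The "template" idea in r117929 can be illustrated with a small sketch. It assumes a GCC-compatible compiler for `__attribute__((always_inline))`; the function names and the flat array of entries are hypothetical and do not mirror the real pmap data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic worker: 'setem' selects between setting and clearing 'bit'.
 * When this is force-inlined into a caller that passes a constant,
 * the compiler can fold the branch away inside the loop.
 */
static inline __attribute__((always_inline)) void
change_bit(uint64_t *entries, size_t n, uint64_t bit, int setem)
{
	size_t i;

	assert(entries != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (setem)
			entries[i] |= bit;
		else
			entries[i] &= ~bit;
	}
}

/* Specialized wrappers: each passes a compile-time constant flag. */
static void
clear_bit_all(uint64_t *entries, size_t n, uint64_t bit)
{
	change_bit(entries, n, bit, 0);
}

static void
set_bit_all(uint64_t *entries, size_t n, uint64_t bit)
{
	change_bit(entries, n, bit, 1);
}
```

Whether the branch is actually eliminated depends on the compiler and optimization level, which is the point r118741 above makes about relying on it.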


117928 23-Jul-2003 jhb

Use macros from apic.h when writing to the ICR to send IPIs to startup
APs rather than magic numbers.

Tested by: scottl


117600 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo.

Suggested by: jhb


117385 10-Jul-2003 markm

Protect lint(1) from a #error.


117372 10-Jul-2003 peter

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


117369 09-Jul-2003 peter

Fix up bogus index/offset/mask calculations in the allocpte and the
corresponding release code. This was preventing the use of more than
1/2TB of user VM. I also spent a week staring at this code only to
eventually find that I'd mistakenly typed a P as an R.


117368 09-Jul-2003 peter

Turn the 2MB page mappings that cover the kernel text+data+bss area back
on now that pmap_pte() can handle it. I never actually ran into anything
that broke that I know of, but this was turned off as a precaution.


117367 09-Jul-2003 peter

Have pmap_pte() on a 2MB mapped address return the 2MB pde itself
rather than a non-existing pte. There is code elsewhere in i386/amd64
pmap that neglects to handle the large page cases because it knows that
it will see PG_PS in the returned "pte".


117340 08-Jul-2003 alc

In pmap_object_init_pt(), the pmap_invalidate_all() should be performed on
the caller-provided pmap, not the kernel_pmap. Using the kernel_pmap
results in an unnecessary IPI for TLB shootdown on SMPs.

Reviewed by: jake, peter


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117136 01-Jul-2003 mux

Sync more things with other backends.


117129 01-Jul-2003 mux

Honor the boundary of the busdma tag when allocating bounce pages.
This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and
was never fixed in other busdma backends using bounce pages.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dflt_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on ia64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


117006 28-Jun-2003 jeff

- Construct a cpu topology map for Hyper Threading systems so that ULE may
take advantage of them.


116958 28-Jun-2003 davidxu

Add a machine-dependent function, thread_siginfo. SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian


116947 28-Jun-2003 scottl

Catch amd64 up with the pending busdma async callback locking. Though this
mechanism might change in the near future, it's best to keep everything in
sync right now.

Reminded by: peter


116856 26-Jun-2003 peter

Add back in the ability for pmap_mapdev() to use KVM if the region
being requested is outside of the range of the direct map region. eg:
for pci windows. While here, increase the minimum size of the direct
map region to be 4GB instead of 1GB.


116709 23-Jun-2003 alc

MFi386
Add vm object locking to pmap_object_init_pt().


116684 22-Jun-2003 simokawa

- Allow access to direct mapped region via /dev/kmem. This makes
'netstat -r' work.
- Use direct map for /dev/mem.


116683 22-Jun-2003 simokawa

- Allocate a new PD Table if kernel grows beyond 1GB boundary.
Reviewed by: peter

- Use direct map in pmap_mapdev().


116619 20-Jun-2003 simokawa

Use direct map in pmap_map().

This saves much KVA for vm_pages and you don't need to increase NKPT
for large physical memory anymore.

Suggested by: dfr


116579 19-Jun-2003 simokawa

Fix direct map page table for 2GB+ physical memory.

You may still need to increase NKPT for larger memory.
I have successfully booted 8GB system with NKPT=256.


116510 18-Jun-2003 alc

Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115907 06-Jun-2003 jhb

- Use IDTVEC() to declare IPI handlers since they are also IDT vectors.
- Make handlers for IPI's used by SMP kernels #ifdef SMP.


115861 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simple bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115736 02-Jun-2003 peter

Fix restarted syscalls. When we rewind %rip, we also need to restore
all the argument registers etc since we have almost certainly trashed
them by now. Take particular care of %r10 since it held the original value
of %rcx (which we saved in tf_rcx on entry and doreti doesn't know this).


115703 02-Jun-2003 obrien

Use __FBSDID().


115683 02-Jun-2003 obrien

Use __FBSDID().


115577 31-May-2003 peter

MFi386: rev 1.56: remove break after return


115576 31-May-2003 peter

MFi386: rev 1.23: use gdb_strlen()/gdb_strcpy() directly.


115575 31-May-2003 peter

MFi386: rev 1.50: remove unused variable


115546 31-May-2003 phk

Avoid unbalancing the { } count in the source file with #ifdef by
putting the opening { after the #ifdef ... #endif sequence.

Found by: FlexeLint


115432 31-May-2003 peter

Add acpi to the build. Remove the hack from machdep.c that lies to the
loader to shut it up.


115431 31-May-2003 peter

Have hammer_time() return the proc0 stack location, and have locore
switch to it before calling mi_startup(). The bootstack is WAY too small
for running acpica during probe/attach. While here, pass modulep/physfree
to the startup routine, rather than writing to the global variables in
locore.S.

Approved by: re (amd64/*)


115404 30-May-2003 peter

Nasty 'make it compile' port to amd64. Note that it needs some other
wire protocol for the extra registers. I should probably just remove it
from here for now since it's quite useless.

Approved by: re (amd64/* blanket)


115403 30-May-2003 peter

Initial port to amd64 after repocopy from i386. Note that the
disassembler has not been updated yet, and will do some very strange
things. It does tracebacks (without function arguments due to regparm
calling conventions) if -fno-omit-frame-pointer is used (to come later).
This achieves basic functionality.

Approved by: re (amd64/* blanket)


115402 30-May-2003 peter

Add setjmp/longjmp for ddb


115358 27-May-2003 peter

Update AMD Features vector to include NX (page table entry no-execute bit)
and LM (long mode) etc.


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. This does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115316 26-May-2003 scottl

De-orbit bus_dmamem_alloc_size(). It's a hack and was never used anyways.
No need for it to pollute the 5.x API any further.

Approved by: re (bmah)


115251 23-May-2003 peter

Major pmap rework to take advantage of the larger address space on amd64
systems. Of note:
- Implement a direct mapped region using 2MB pages. This eliminates the
need for temporary mappings when getting ptes. This supports up to
512GB of physical memory for now. This should be enough for a while.
- Implement a 4-tier page table system. Most of the infrastructure is
there for 128TB of userland virtual address space, but only 512GB is
presently enabled due to a mystery bug somewhere. The design of this
was heavily inspired by the alpha pmap.c.
- The kernel is moved into the negative address space(!).
- The kernel has 2GB of KVM available.
- Provide a uma memory allocator to use the direct map region to take
advantage of the 2MB TLBs.
- Fixed some assumptions in the bus_space macros about the ability
to fit virtual addresses in an 'int'.

Notable missing things:
- pmap_growkernel() should be able to grow to 512GB of KVM by expanding
downwards below kernbase. The kernel must be at the top 2GB of the
negative address space because of gcc code generation strategies.
- need to fix the >512GB user vm code.

Approved by: re (blanket)
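
For readers unfamiliar with the 4-tier layout mentioned in r115251, the sketch below splits a virtual address into its four table indices. It assumes the standard x86-64 4-level paging layout (4KB pages, 9 index bits per level); the `SK_` constants, `struct va_indices` and `split_va()` are hypothetical names for illustration and are not taken from the pmap source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift values for standard x86-64 4-level paging (assumed here). */
#define SK_PAGE_SHIFT	12	/* 4KB page */
#define SK_PDRSHIFT	21	/* one page directory entry spans 2MB */
#define SK_PDPSHIFT	30	/* one PDP entry spans 1GB */
#define SK_PML4SHIFT	39	/* one PML4 entry spans 512GB */
#define SK_INDEX_MASK	0x1ffULL	/* 9 bits, 512 entries per table */

struct va_indices {
	unsigned int pml4;	/* top level index */
	unsigned int pdp;	/* page directory pointer index */
	unsigned int pd;	/* page directory index */
	unsigned int pt;	/* page table index */
};

/* Split a virtual address into its per-level table indices. */
static void
split_va(uint64_t va, struct va_indices *out)
{
	assert(out != NULL);
	out->pml4 = (unsigned int)((va >> SK_PML4SHIFT) & SK_INDEX_MASK);
	out->pdp = (unsigned int)((va >> SK_PDPSHIFT) & SK_INDEX_MASK);
	out->pd = (unsigned int)((va >> SK_PDRSHIFT) & SK_INDEX_MASK);
	out->pt = (unsigned int)((va >> SK_PAGE_SHIFT) & SK_INDEX_MASK);
}
```

With these shifts a single top-level entry covers 2^39 bytes (512GB) and the lower half of the 48-bit address space is 2^47 bytes (128TB), the two sizes quoted in the commit message.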


115237 22-May-2003 peter

Merge from i386/trap.c rev 1.252. Use td_critnest instead of the
spinlocks count for explicitly enabling interrupts.

Approved by: re (blanket)


115093 17-May-2003 peter

Actually get all the bits for sd_hibase.. it was 16 bits short. oops.

Approved by: re (amd64/* blanket)


115016 15-May-2003 alc

Initialize logical_cpus_mask when the logical CPUs are enumerated in
the mptable. (Previously, logical_cpus_mask was only initialized if
the hyperthreading fixup was executed.)

Approved by: re (jhb)
Reviewed by: ps


115006 15-May-2003 peter

Collect the nastiness for preserving the kernel MSR_GSBASE around the
load_gs() calls into a single place that is less likely to go wrong.

Eliminate the per-process context switching of MSR_GSBASE, because it
should be constant for a single cpu. Instead, save/restore it during
the loading of the new %gs selector for the new process.

Approved by: re (amd64/* blanket)


115005 15-May-2003 peter

Use compile time constants for things like PTmap[] etc because they're
about to move outside of the +/- 2GB range

Suggested by: jake
Approved by: re (amd64/* blanket)


114987 14-May-2003 peter

Add BASIC i386 binary support for the amd64 kernel. This is largely
stolen from the ia64/ia32 code (indeed there was a repocopy), but I've
redone the MD parts and added and fixed a few essential syscalls. It
is sufficient to run i386 binaries like /bin/ls, /usr/bin/id (dynamic)
and p4. The ia64 code has not implemented signal delivery, so I had
to do that.

Before you say it, yes, this does need to go in a common place. But
we're in a freeze at the moment and I didn't want to risk breaking ia64.
I will sort this out after the freeze so that the common code is in a
common place.

On the AMD64 side, this required adding segment selector context switch
support and some other support infrastructure. The %fs/%gs etc code
is hairy because loading %gs will clobber the kernel's current MSR_GSBASE
setting. The segment selectors are not used by the kernel, so they're only
changed at context switch time or when changing modes. This still needs
to be optimized.

Approved by: re (amd64/* blanket)


114986 14-May-2003 peter

Fix some misunderstandings about 64 bit extension.
Fix fuword/suword - they're supposed to be 'long' - ie: point them
at fuword64/suword64 instead of the incorrect 32 bit versions.


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114953 12-May-2003 peter

Really stop the loader from trying to load the acpi module by lying and
pretending that it is already here.

Approved by: re (amd64/* stuff)


114952 12-May-2003 peter

For the page fault handler, save %cr2 in the outer trap handler so that
we do not have to run so long with interrupts disabled. This involved
creating tf_addr in the trapframe. Reorganize the trap stubs so that
they consistently reserve the stack space and initialize any missing
bits.

Approved by: re (amd64 stuff)


114951 12-May-2003 peter

Sync ucontext with reality. The struct trapframe changes need to be
reflected here.

Approved by: re (blanket amd64/*)


114930 12-May-2003 peter

AMD64 physical space is much larger than i386, de-i386 the bus_space and
bus_dma MD code for AMD64. (And a trivial ifdef update in dev/kbd because
of this). More updates are needed here to take advantage of the 64 bit
instructions.

Approved by: re (blanket amd64/*)


114928 12-May-2003 peter

Give a %fs and %gs to userland. Use swapgs to obtain the kernel %GS.base
value on entry and exit. This isn't as easy as it sounds because when
we recursively trap or interrupt, we have to avoid duplicating the
swapgs instruction or we end up back with the userland %gs. I implemented
this by testing TF_CS to see if we're coming from supervisor mode
already, and check for returning to supervisor. To avoid a race with
interrupts in the brief period after beginning executing the handler and
before the swapgs, convert all trap gates to interrupt gates, and reenable
interrupts immediately after the swapgs. I am not happy with this.
There are other possible ways to do this that should be investigated.
(eg: storing the GS.base MSR value in the trapframe)

Add some sysarch functions to let the userland code get to this.

Approved by: re (blanket amd64/*)
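
The TF_CS test described in r114928 boils down to inspecting the privilege bits of the saved code segment selector. The real check lives in assembly; the C sketch below only shows the logic. `struct sk_trapframe`, `SK_SEL_RPL_MASK` and `came_from_user()` are hypothetical names, and the selector values used with it are arbitrary examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The low two bits of a segment selector hold the privilege level. */
#define SK_SEL_RPL_MASK	0x3

/* Hypothetical, cut-down trapframe holding only the saved %cs. */
struct sk_trapframe {
	uint64_t tf_cs;
};

/*
 * Non-zero when the trap arrived from user mode. Only in that case
 * should the entry path execute swapgs; a nested trap from kernel
 * mode must skip it or it would switch back to the user %gs base.
 */
static int
came_from_user(const struct sk_trapframe *tf)
{
	assert(tf != NULL);
	return ((tf->tf_cs & SK_SEL_RPL_MASK) != 0);
}
```

The return path needs the mirror-image check, so that swapgs runs only when returning to user mode.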


114923 11-May-2003 peter

Call it an AMD64 Processor, not a Hammer. Also, it seems that the cpuid
model numbers are wider than I first thought.

Approved by: re (blanket amd64/*)


114922 11-May-2003 peter

I missed another printf format error while extracting the patch.

Approved by: re (blanket amd64/*)


114921 11-May-2003 peter

Make atdevbase long for the KERNBASE > 4GB case

Approved by: re (amd64/* blanket)


114919 11-May-2003 peter

Fix printf format errors that were undetected due to using the standard
FSF compiler during early development.


114918 11-May-2003 peter

Export PML4SHIFT and PDPSHIFT

Approved by: re (blanket amd64/*)


114917 11-May-2003 peter

Since compiling natively, the compile environment has been less forgiving
about silly typos. Use the correct comment sequences.


114867 10-May-2003 peter

Finish translating i386/support.s into amd64 asm - replace bcopy etc with
asm versions. This yields about a 5% kernel compile time speedup.


114837 08-May-2003 peter

Oops. Turn T_PAGEFLT back into an interrupt gate. It is *critical*
that interrupts be disabled and remain disabled until %cr2 is read.
Otherwise we can preempt and another process can fault, and by the
time we read %cr2, we see a different process's fault address. This
Greatly Confuses vm_fault() (to say the least). The i386 port has
got this marked as a bug workaround for a Cyrix CPU, which is what
led me astray. It's actually necessary for preemption, regardless
of whether Cyrix cpus had a bug or not.


114821 08-May-2003 peter

Leave space for the 128 byte red-zone on the stack.


114820 08-May-2003 peter

#include <machine/metadata.h> was missing; add it


114819 08-May-2003 peter

Fix a preemption race. I was reenabling interrupts in the fast system
call handler before it was safe. It was possible to lose context
and for something else to clobber the PCPU scratch variable. This
moves the interrupt enable *way* too late, but it's better safe than
sorry for the moment.


114560 03-May-2003 peter

Repocopy *.s to *.S


114381 01-May-2003 peter

I changed the numbering of the MODINFOMD_SMAP during the commit, so
recognize the old number for my development boxes so I can use old
loader/pxeboot for a while if I need to.


114349 01-May-2003 peter

Commit MD parts of a loosely functional AMD64 port. This is based on
a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to
attempt to get a stable base to start from. There is a lot missing still.
Worth noting:
- The kernel runs at 1GB in order to cheat with the pmap code. pmap uses
a variation of the PAE code in order to avoid having to worry about 4
levels of page tables yet.
- It boots in 64 bit "long mode" with a tiny trampoline embedded in the
i386 loader. This simplifies locore.s greatly.
- There are still quite a few fragments of i386-specific code that have
not been translated yet, and some that I cheated and wrote dumb C
versions of (bcopy etc).
- It has both int 0x80 for syscalls (but using registers for argument
passing, as is native on the amd64 ABI), and the 'syscall' instruction
for syscalls. int 0x80 preserves all registers, 'syscall' does not.
- I have tried to minimize looking at the NetBSD code, except in a couple
of places (eg: to find which register they use to replace the trashed
%rcx register in the syscall instruction). As a result, there is not a
lot of similarity. I did look at NetBSD a few times while debugging to
get some ideas about what I might have done wrong in my first attempt.


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho
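
The fix in r114305 follows a common pattern: validate an index before using it to subscript a table. A minimal sketch, assuming a caller-supplied name table; `lookup_name()` and its parameters are hypothetical and not the kernel's interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the name for 'code', or a fixed fallback string when 'code'
 * falls outside the table. 'names' and 'count' describe the table.
 */
static const char *
lookup_name(const char *const *names, size_t count, unsigned int code)
{
	assert(names != NULL);
	if (code >= count)
		return ("unknown");	/* out of range: do not index */
	return (names[code]);
}
```

Because the code is unsigned, a single upper-bound comparison is enough; a signed value would also need a check for negative numbers.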


114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


114291 30-Apr-2003 markm

Warns fixing. Protect against inappropriate linting, and mark
GCC-specific assembly code as such (in #ifdefs). Fix an easy
static variable warning while I'm here.


114177 28-Apr-2003 jake

Use inlines for loading and storing page table entries. Use cmpxchg8b for
the PAE case to ensure idempotent 64 bit loads and stores.

Sponsored by: DARPA, Network Associates Laboratories


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


114013 25-Apr-2003 jake

Remove harmless invalid cast.

Sponsored by: DARPA, Network Associates Laboratories


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by: jake
Reviewed by: bde (months ago)


113953 24-Apr-2003 davidxu

Don't print anything for a fault at cpu_switch_load_gs. Like the other
code in doreti that recovers from faults caused by invalid segment
registers, silently push the error to userland.


113845 22-Apr-2003 davidxu

Move the interrupt level testing code down a bit; a cpu_switch_load_gs
fault can occur while interrupts are nested.


113843 22-Apr-2003 davidxu

Fix some problems with cpu_switch_load_gs. When the fault address is at
cpu_switch_load_gs, the CPU is in a context switch, so don't enable
interrupts. Because it is in a context switch, sched_lock is expected to
be held, so don't PROC_LOCK(p) and psignal(); that would be a LOR. We
could probably set a P_XSIGBUS-like flag in p_sflags and TDF_ASTPENDING
in td_flags, then post a SIGBUS to the process in ast() if P_XSIGBUS is set.


113833 22-Apr-2003 davidxu

Remove the single-threading detection code. This code really should be
replaced by thread_user_enter(), but currently we don't want to enable
this in trap.


113796 21-Apr-2003 davidxu

Reset pcb_gs and %gs before possibly invalidating it.


113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


113682 18-Apr-2003 jhb

Hold the proc lock for curproc around sigonstack().


113621 17-Apr-2003 jhb

Remove a couple of unused symbols.


113492 15-Apr-2003 mux

style(9)


113472 14-Apr-2003 simokawa

Restore delayed load support for the resource shortage case.
It was missed in the previous change.
Now, _bus_dmamap_load_buffer() accepts BUS_DMA_WAITOK/BUS_DMA_NOWAIT flags.

Original idea from: jake


113459 14-Apr-2003 simokawa

* Use _bus_dmamap_load_buffer() and respect maxsegsz in bus_dmamap_load().
Ignoring maxsegsz may lead to fatal data corruption for some devices.
ex. SBP-2/FireWire
We should apply this change to other platforms except for sparc64.

MFC after: 1 week


113364 11-Apr-2003 davidxu

Copy %gs from current CPU not from a stale PCB backup.


113363 11-Apr-2003 davidxu

set_user_ldt_rv() should check for the same proc, not the same thread;
this commit fixes a user LDT SMP rendezvous bug.


113348 10-Apr-2003 des

Convert the SMP_TSC kernel option into a loader tunable. Also enable
the TSC timecounter on single-CPU systems even when they are running
an SMP kernel.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.
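
A sketch of why flags allow several operations per call where an enum did not. The `DMASYNC_*` names and values here are illustrative assumptions, not copied from the busdma headers:

```c
#include <assert.h>

/* Illustrative flag values: one bit per sync operation. */
#define DMASYNC_PREREAD   0x1
#define DMASYNC_POSTREAD  0x2
#define DMASYNC_PREWRITE  0x4
#define DMASYNC_POSTWRITE 0x8

/* Count how many sync operations a single int argument requests. */
static int
dmasync_count_ops(int op)
{
	int n = 0;

	for (int bit = DMASYNC_PREREAD; bit <= DMASYNC_POSTWRITE; bit <<= 1)
		if (op & bit)
			n++;
	return (n);
}
```

A caller can then pass `DMASYNC_PREREAD | DMASYNC_PREWRITE` in one call instead of making two.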


113339 10-Apr-2003 julian

Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but its purpose will change a bit
when we add the ability to lock a KSE to a cpu.


113321 10-Apr-2003 wes

Add a sysctl that records and reports the CPU clock rate calculated
at boot. Funny how often this trivial piece of information crops up
in embedded boxen.

Sponsored by: St. Bernard Software


113228 07-Apr-2003 jake

Add support for bounce buffers to _bus_dmamap_load_buffer, which is the
backend for bus_dmamap_load_mbuf and bus_dmamap_load_uio.

- Increase MAX_BPAGES to 512. Less than this causes fxp to quickly run out
of bounce pages.
- Add an argument to reserve_bounce_pages indicating whether this operation
should fail or be queued for later processing if we run out of memory.
The EINPROGRESS return value is not handled properly by consumers of
bus_dmamap_load_mbuf.
- If bounce buffers are required allocate minimum 1 bounce page at map
creation time. If maxsize was small previously this could get truncated
to 0 and the drivers would quickly run out of bounce pages.
- Fix a bug handling the return value of alloc_bounce_pages at map creation
time. It returns the number of pages allocated, not 0 on success.
- Use bus_addr_t for physical addresses to avoid truncation.
- Assert that the map is non-null and not the no bounce map in
add_bounce_pages.

Sponsored by: DARPA, Network Associates Laboratories


113148 05-Apr-2003 peter

Unbreak the !LAZY_SWITCH case. I #ifdef'ed too much when I added
the ifdefs prior to commit and killed the same-address-space test.

Submitted by: bde


113100 04-Apr-2003 tegge

Add SMP_TSC option, which can be used on SMP systems where the TSCs
are synchronized to reduce context switch cost.


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


113040 03-Apr-2003 jake

- Removed APTD and associated macros, it is no longer used.

BANG BANG BANG etc.

Sponsored by: DARPA, Network Associates Laboratories


112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. It isn't as lazy
as it could be and can do with some more cleanup. Currently it's under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

It's not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually separate from the lazy switch stuff.

Glanced at by: jake, jhb


112967 02-Apr-2003 jake

- Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by: jeff
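
A userland sketch of "return the old value" compare-and-set semantics, using C11 atomics. This is only an analogy: the real casuptr operates on user-space addresses from inside the kernel, and the helper name `cas_return_old` is hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Compare-and-set that returns the value found at *p.
 * The store happened if and only if the return value equals `expected`.
 */
static intptr_t
cas_return_old(_Atomic intptr_t *p, intptr_t expected, intptr_t desired)
{
	intptr_t seen = expected;

	/* On failure the current value of *p is written into `seen`. */
	atomic_compare_exchange_strong(p, &seen, desired);
	return (seen);
}
```

Returning the old value lets the caller distinguish success from failure and also learn who currently holds the word, which is useful for a lock primitive.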


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112897 01-Apr-2003 jeff

- In npxgetregs() use the td argument to save the fpu state from and not
curthread. Nothing currently depends on this behavior.
- Clean up an extra newline.

Obtained from: bde


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112886 31-Mar-2003 jeff

- Fix two calls to trapsignal() that were still passing in 'struct proc'.
These were missed in my last commit.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112858 31-Mar-2003 jeff

- In npxsetregs don't set the floating point if td == fpcurthread not if
curthread == fpcurthread. This is important when we're saving the fp
state for a thread other than curthread as in from set_mcontext.


112841 30-Mar-2003 jake

- Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel option 'options PAE'. This will only work with device drivers which
either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems


112837 30-Mar-2003 jake

- Remove invalid casts.

Sponsored by: DARPA, Network Associates Laboratories


112836 30-Mar-2003 jake

- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When
accessing an alternate address space this causes 1 page table page at
a time to be mapped in, rather than using the recursive mapping technique
to map in an entire alternate address space. The recursive mapping
technique changes large portions of the address space and requires global
tlb flushes, which seem to cause problems when PAE is enabled. This will
also allow IPIs to be avoided when mapping in new page table pages using
the same technique as is used for pmap_copy_page and pmap_zero_page.

Sponsored by: DARPA, Network Associates Laboratories


112687 26-Mar-2003 ps

Nuke options HTT in favor of machdep.hlt_logical_cpus tunable/sysctl.
This keeps the logical cpu's halted in the idle loop. By default
the logical cpu's are halted at startup. It is also possible to
halt any cpu in the idle loop now using machdep.hlt_cpus.

Examples of how to use this:
    machdep.hlt_cpus=1    halt cpu0
    machdep.hlt_cpus=2    halt cpu1
    machdep.hlt_cpus=4    halt cpu2
    machdep.hlt_cpus=3    halt cpu0,cpu1

Reviewed by: jhb, peter
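
The examples above read as a bitmask with one bit per CPU. A minimal sketch of that interpretation; the helper name `cpu_is_halted` is hypothetical:

```c
#include <assert.h>

/* Return nonzero if `cpu` is selected by a hlt_cpus-style bitmask. */
static int
cpu_is_halted(unsigned int mask, unsigned int cpu)
{
	/* Bit N of the mask corresponds to cpuN. */
	if (cpu >= 32)
		return (0);
	return ((mask >> cpu) & 1u);
}
```

With this reading, a mask of 3 (binary 011) selects cpu0 and cpu1, matching the last example.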


112686 26-Mar-2003 peter

Halt the cpus in the idle loop for SMP as well for several reasons:
1) It's critical for HTT. There's less foot-shooting opportunity.
2) I've seen significant improvements in interactive response to commands
over ssh sessions. I assume this is less lock contention.
3) As incentive to finish the idle cpu IPI wakeup stuff.
4) The machine on my desk was blowing hot air in my general direction
because somebody forgot to turn the hlt on, and it saves 50 watts per
cpu..

The machdep.cpu_idle_hlt sysctl is still available, but now the default
is the same as on UP kernels.


112569 25-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses are larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


112526 24-Mar-2003 bde

Disable interrupts while in kdb_trap() to handle cases where the caller
doesn't do it. This fixes all known causes of "Context switches not
allowed in the debugger" in mi_switch(). The main cause was trap_fatal()
calling kdb_trap() with interrupts enabled. Switching to ithreads for
interrupt handling then made fatal traps more fatal and harder to debug.
The problem was limited in -current because most interrupt handlers are
blocked by Giant, but it occurred almost deterministically for me because
my clock interrupt handlers are non-fast and not blocked by Giant.


112445 20-Mar-2003 dwmalone

Extend CPU_ATHLON_SSE_HACK to cover a few more revisions of Athlon CPUs.

Submitted by: Jon Kuster <kwsn@earthlink.net>
MFC after: 2 weeks


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared across
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.
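
A userland sketch of an atomically maintained reference count, using C11 atomics rather than the kernel's atomic(9) routines. The `struct dma_tag` type and helper names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

struct dma_tag { atomic_int ref_count; };	/* hypothetical tag */

/* Take a reference on the tag. */
static void
tag_hold(struct dma_tag *t)
{
	atomic_fetch_add(&t->ref_count, 1);
}

/* Drop a reference; returns 1 when the caller dropped the last one. */
static int
tag_release(struct dma_tag *t)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	return (atomic_fetch_sub(&t->ref_count, 1) == 1);
}
```

A plain `ref_count++` is a read-modify-write sequence, so two drivers sharing a tag could lose an update without the atomic form.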


112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a nested include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


112346 17-Mar-2003 mux

- Lock down the bounce pages structures. We use the same locking scheme
as with the alpha backend because both implementations of bounce pages
are identical.
- Remove useless splhigh()/splx() calls.


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


112133 12-Mar-2003 jake

- Added support for multiple page directory pages to pmap_pinit and
pmap_release.
- Merged pmap_release and pmap_release_free_page. When pmap_release is
called only the page directory page(s) can be left in the pmap pte object,
since all page table pages will have been freed by pmap_remove_pages and
pmap_remove. In addition, there can only be one reference to the pmap and
the page directory is wired, so the page(s) can never be busy. So all there
is to do is clear the magic mappings from the page directory and free the
page(s).

Sponsored by: DARPA, Network Associates Laboratories


111975 08-Mar-2003 davidxu

Initialize eflags in the fake frame to a default value rather than a random
one. The random value sometimes causes the CLKF_USERMODE macro to return true
because the PSL_VM bit is set when it really shouldn't be. This causes
statclock() to take the wrong path, which in turn breaks the KSE code and
crashes the kernel when executing a threaded program.


111939 06-Mar-2003 rwatson

Instrument sysarch() MD privileged I/O access interfaces with a MAC
check, mac_check_sysarch_ioperm(), permitting MAC security policy
modules to control access to these interfaces. Currently, they
protect access to IOPL on i386, and setting HAE on Alpha.
Additional checks might be required on other platforms to prevent
bypass of kernel security protections by unauthorized processes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111878 04-Mar-2003 jhb

Wrap the hyperthreading support code with the HTT kernel option.
Hyperthreading support is now off unless the HTT option is added.

MFC-after: 3 days


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)
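
A sketch of C99 sparse (designated) initialization. The `struct dev_switch` type and its members are hypothetical stand-ins for struct cdevsw, chosen only to show the technique:

```c
#include <assert.h>
#include <stddef.h>

typedef int dev_op_t(int unit);		/* hypothetical handler type */

struct dev_switch {			/* hypothetical cdevsw stand-in */
	dev_op_t	*d_open;
	dev_op_t	*d_close;
	const char	*d_name;
	int		 d_flags;
};

static int
demo_open(int unit)
{
	return (unit >= 0 ? 0 : -1);
}

/*
 * Members not named here are zero-initialized, so defaults need not be
 * listed, and reordering the struct members does not break this.
 */
static struct dev_switch demo_sw = {
	.d_open = demo_open,
	.d_name = "demo",
};
```

This is why the patch could be tested with the struct fields in reverse order: the initializer binds by name, not by position.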


111638 27-Feb-2003 jhb

Expand some #ifdef's to fix I386_CPU compile.

Reported by: Andy Farkas <andyf@speednet.com.au>


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111535 26-Feb-2003 davidxu

Better to not know anything about KSE.


111524 26-Feb-2003 mux

Correctly set BUS_SPACE_MAXSIZE in all the busdma backends.
It was bogusly set to 64 * 1024 or 128 * 1024 because it was
bogusly reused in the BUS_DMAMAP_NSEGS definition.


111493 25-Feb-2003 jake

- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate
for testing and setting the current and alternate address spaces.
- Changed PTDpde and APTDpde to arrays to support multiple page directory
pages.

Sponsored by: DARPA, Network Associates Laboratories


111477 25-Feb-2003 davidxu

Remove an unsafe KASSERT.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure whether we need a __FreeBSD_version bump for this,
since we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386
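
A sketch of the calling convention described above: return 0 on success and fill in the address through a pointer, otherwise return an error. The handler signature, `paddr_t`, and the `DEV_*` constants are hypothetical and do not reproduce the real d_mmap_t prototype:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uintptr_t paddr_t;	/* hypothetical, stands in for vm_offset_t */

#define DEV_BASE 0x100000u	/* hypothetical device window */
#define DEV_SIZE 0x4000u

/* New-style handler: 0 with *paddr filled in, or an errno value. */
static int
demo_mmap(uintptr_t offset, paddr_t *paddr)
{
	if (offset >= DEV_SIZE)
		return (EINVAL);
	*paddr = DEV_BASE + offset;
	return (0);
}
```

Because the address no longer shares the return value, an errno constant can never be mistaken for a physical address.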


111428 24-Feb-2003 nyan

The mpbiosreason variable is not used for pc98.


111385 24-Feb-2003 jake

Use the direct mapping of IdlePTD setup in locore for proc0's page directory,
instead of allocating another page of kva and mapping it in again. This was
likely an oversight in revision 1.174 (cut and paste from pmap_pinit).

Discussed with: peter, tegge
Sponsored by: DARPA, Network Associates Laboratories


111382 23-Feb-2003 tegge

Allow machines with one CPU and a valid mp table to boot an SMP kernel.


111372 23-Feb-2003 jake

Previous commit missed a 1 that should be NPGPTD, and an NPDEPG that should
be NPDEPTD. Grumble.

Sponsored by: DARPA, Network Associates Laboratories


111363 23-Feb-2003 jake

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories


111299 23-Feb-2003 jake

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


111272 22-Feb-2003 alc

The root of the splay tree maintained within the pm_pteobj always refers
to the last accessed pte page. Thus, the pm_ptphint is redundant and can
be removed.


111271 22-Feb-2003 jake

unsigned -> pt_entry_t.

Sponsored by: DARPA, Network Associates Laboratories


111167 20-Feb-2003 peter

Fix fumble in rev 1.525. pmap_kenter()'s second argument is a physical
address, not a page index.

Laughed at by: jake


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listened to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much of the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


111017 16-Feb-2003 phk

Change "dev_t gdbdev" to "void *gdb_arg", some possible paths for GDB
will not have a dev_t.


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110955 15-Feb-2003 alc

Assert that the kernel map's system mutex is held in pmap_growkernel().


110845 14-Feb-2003 alc

- Add a mutex for synchronizing the use of CMAP/CADDR 1 and 2.
- Eliminate small style differences between pmap_zero_page(),
pmap_copy_page(), etc.


110781 13-Feb-2003 peter

Oops. I mis-remembered about the P4 problems. It was 5.0-DP2 that
was shipped with DISABLE_PG_G and DISABLE_PSE, not 5.0-REL. *blush*
Disable the code - but still leave it there in case its still lurking.


110780 13-Feb-2003 peter

Turn off PG_PS and PG_G for Pentium-4 cpus at boot time. This is so
that we can stop turning off PG_G and PG_PS globally for releases.


110747 12-Feb-2003 alc

Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


110532 08-Feb-2003 alc

MF alpha
- Synchronize access to the allpmaps list with a mutex.


110480 07-Feb-2003 peter

Commit some cosmetic changes I had lying around and almost included
with another commit. Unwrap a line. Unexpand a pmap_kenter().


110379 05-Feb-2003 phk

This file has no longer any content from the original Berkeley file so
replace the UCB copyright with a FreeBSD 2 clause thing.

Remove some no longer relevant comments.


110370 05-Feb-2003 phk

i386/i386/tsc.c was repo-copied from i386/isa/clock.c.

Remove all the stuff that does not relate to the TSC.

Change the calibration to use DELAY(1000000) rather than trying to check
it against the CMOS RTC, this drastically increases precision:

Using 25 samples on a Athlon 700MHz UP machine I find:

        stddev       min           max          average
CMOS    22200 Hz     -74980 Hz     34301 Hz     704928721 Hz
DELAY   1805 Hz      -1984 Hz      2678 Hz      704937583 Hz

(The difference between the two averages is not statistically significant.)

Expressed in PPM of the frequency:

        stddev       min           max
CMOS    31.49 PPM    -106.37 PPM   48.66 PPM
DELAY   2.56 PPM     -2.81 PPM     3.80 PPM

This code will not be used until a followup commit to sys/isa/clock.c
and sys/pc98/pc98/clock.c which will only happen after some field testing.
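
The PPM figures follow from dividing each deviation in Hz by the average frequency. A minimal sketch of that conversion; the helper name `hz_to_ppm` is hypothetical:

```c
#include <assert.h>

/* Express a frequency deviation (Hz) in parts per million of freq_hz. */
static double
hz_to_ppm(double deviation_hz, double freq_hz)
{
	return (deviation_hz / freq_hz * 1e6);
}
```

For the CMOS row, `hz_to_ppm(22200.0, 704928721.0)` comes to roughly 31.49, in line with the stddev figure quoted in the commit.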


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is created by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110299 03-Feb-2003 phk

Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by: tjr


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110254 03-Feb-2003 alc

- Make allpmaps static.
- Use atomic subtract to update the global wired pages count. (See
also vm/vm_page.c revision 1.233.)
- Assert that the page queue lock is held in pmap_remove_entry().


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110039 29-Jan-2003 phk

Make tsc_freq a 64bit quantity.

Inspired by: http://www.theinquirer.net/?article=7481
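
The log does not state the motivation. Assuming it is headroom for clock rates above 2^32 Hz (about 4.29 GHz), the following sketch shows what a 32-bit frequency variable would lose. The helper name `fits_in_32_bits` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Does a frequency in Hz survive a round trip through 32 bits? */
static int
fits_in_32_bits(uint64_t freq_hz)
{
	/* Truncate to 32 bits, widen again, and compare. */
	return ((uint64_t)(uint32_t)freq_hz == freq_hz);
}
```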


110030 29-Jan-2003 scottl

Implement bus_dmamem_alloc_size() and bus_dmamem_free_size() as
counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.

This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.

Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.

Reviewed by: jake (sparc64)


109994 28-Jan-2003 jake

Remove BDE_DEBUGGER.

Discussed with: bde


109964 28-Jan-2003 alc

Merge pmap_testbit() and pmap_is_modified(). The latter is the only caller
of the former.


109898 26-Jan-2003 julian

Fix KSE related patch.
Make it compile for the SMP case..
statclock_process() has changed prototypes.


109877 26-Jan-2003 davidxu

Move the UPCALL-related data out of struct kse and introduce a new
data structure, kse_upcall, to manage UPCALLs. All KSE binding
and loaning code is gone.

A thread that owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and take those contexts back
to userland. Any thread without an upcall structure has to export its
context and exit at the user boundary.

Any thread running in user mode owns an upcall structure. When it enters
the kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread blocks in the kernel a new UPCALL thread is created and
the upcall structure is transferred to the new UPCALL thread. If the kse
mailbox's current thread pointer is NULL, then no UPCALL thread is created
when the thread blocks in the kernel.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit; when all upcalls in a ksegrp are removed, the group is
automatically shut down. An upcall owner thread also exits when the process
is exiting. When an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity that represents a virtual CPU. When a thread
is running, it always has a KSE associated with it. The scheduler is free to
assign a KSE to a thread according to thread priority; if a thread's priority
changes, the KSE can be moved from one thread to another.

When a ksegrp is created, N KSEs are always created in the group, where
N is the number of physical CPUs in the current system. This makes it
possible for threads in the kernel to execute on different CPUs in parallel
even when the userland UTS is only single-CPU safe. Userland calls kse_create
to add more upcall structures to a ksegrp to increase concurrency in userland
itself; the kernel is not restricted by the number of upcalls userland provides.

The code hasn't been tested under SMP by the author due to lack of hardware.

Reviewed by: julian


109715 23-Jan-2003 peter

Now that TPR isn't bogusly raised at boot, there is no need to clear
it at context switch.


109700 22-Jan-2003 jhb

- Move enable_sse()'s prototype to machine/md_var.h.
- Sort definition of cpu_* variables appropriately.
- Move cpu_fxsr out of the magic non-BSS set of variables and stick it in
the BSS along with hw_instruction_sse (make the latter static as well).

Submitted by: bde (partially)


109696 22-Jan-2003 jhb

Rename cpuid_cpuinfo to cpu_procinfo. bde requested that I rename this
variable to something in the cpu_* namespace since that's what all the
other cpuid variables were named and cpu_procinfo is what I came up with.

Requested by: bde


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


109027 09-Jan-2003 jhb

Remove earlysetcpuclass() as it has been OBE.

Suggested by: bde


109026 09-Jan-2003 jhb

Rework part of the previous processor name changes so that we read
cpu_exthigh and cpu_brand in printcpuinfo() instead of in identify_cpu().
We also only do it for known-good values of cpu_vendor which is a bit more
conservative.

Reviewed by: bde (mostly)


108961 08-Jan-2003 jhb

Consistently use spaces in between arguments to strcmp(). Whitespace
only.


108948 08-Jan-2003 jhb

- Use cpu_exthigh instead of executing cpuid again to retrieve it for the
print_AMD_foo() functions.
- Add a brand name table for the brand index provided on Intel CPU's in
%ebx after cpuid 1.
- For Intel CPUs, if we don't get a processor name from the extended cpuid
then use the brand index in cpuid_cpuinfo to pick a name from the brand
table and copy that name into cpu_brand.
- Replace the duplicated code to use the extended cpuid to replace
cpu_model with the processor name in the AMD and Transmeta sections of
printcpuinfo() with generic code that replaces cpu_model with
cpu_brand if cpu_brand is not an empty string. We also trim leading
spaces from cpu_brand prior to doing this since at least some processor
names (notably those of Intel CPUs) have leading spaces in the name.
- Give print_AMD_features() its own private regs[] array since
printcpuinfo() doesn't use the one it has anymore.


108947 08-Jan-2003 jhb

- Add a cpu_exthigh variable to hold the highest extended cpuid value
returned from cpuid 0x80000000.
- Add a cpu_brand char array to hold the processor name returned by
cpuid 0x80000002-0x80000004 on AMD, Intel, Transmeta, and possibly
other CPUs.
- Use cpuid to set cpu_exthigh and read the processor name if it is present
in identify_cpu().


108946 08-Jan-2003 jhb

Bah, get the test for more than one logical CPU right so we don't bogusly
claim a CPU has HT support when it lists 0 or 1 logical CPU's per physical
processor.


108914 08-Jan-2003 jhb

Enumerate logical hyperthread CPUs manually if they aren't already listed
in the mptable. The way this works is that we determine if the system
has hyperthreading and how many logical CPU's should be in each physical
CPU by using the information returned by cpuid. During the first pass of
the mptable, we build a bitmask of the APIC IDs of the CPUs listed in the
mptable. We then scan that bitmask to see if the CPUs are already listed
by the mptable, or if there are any APIC IDs already in use that would
conflict with the APIC IDs of the logical CPUs. If that test succeeds,
then we fixup the count of application processors. Later on during the
second pass of the mptable we create fake processor entries for logical
CPUs and add them to the system.

We only need this type of fixup hack when using the mptable to enumerate
CPUs. The ACPI MADT table properly enumerates all logical CPUs.


108913 08-Jan-2003 jhb

If the boot processor supports hyperthreading and contains more than one
logical CPU, display the number of logical CPUs per physical processor
underneath the list of CPU features.


108911 08-Jan-2003 jhb

Add a cpuid_cpuinfo variable to hold the results of %ebx from cpuid with
%eax of 1 and set it in identify_cpu().


108608 03-Jan-2003 jhb

Document bit 31 of the cpuid features word as PBE (Pending Break Enable).


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108517 31-Dec-2002 njl

Return an error when r/w is requested on an unsupported device instead of
looping.

Submitted by: Sean Kelly <smkelly@zombie.org>
Pointed out by: bde


108338 28-Dec-2002 julian

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads share KSEs. Every KSE now has a thread, which is
considered its "owner"; however, a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
In this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creation of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into the KSE's owner, and sets it up
for doing an upcall. It is just inhibited from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no longer needed.
"Idle" KSEs are now on the loanable queue.


108337 28-Dec-2002 alc

Assert that the page queues lock is held in pmap_testbit().


108253 24-Dec-2002 alc

- Hold the page queues lock around calls to vm_page_wakeup() and
vm_page_flag_clear().


107965 17-Dec-2002 njl

Back out 1.19 to rethink approach

Requested by: julian@


107962 17-Dec-2002 njl

Automatically issue a "continue" along with the "detach" command. This
fixes the problem of cleanly restarting a target after entering gdb mode.

Reviewed by: archie@


107957 16-Dec-2002 julian

Reformat last change
Requested by: nate@


107955 16-Dec-2002 julian

Don't dump core into a partition that is too small for it.
If we do, we usually write backwards into the preceding partition,
which is usually the root partition.


107867 14-Dec-2002 phk

Only dump the BIOS geometry table from bootinfo on PC98, we don't use
the contents on i386 anymore.


107853 14-Dec-2002 alc

Add page locking to pmap_mincore().

Submitted (in part) by: tjr@


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zombie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107576 04-Dec-2002 phk

Use the correct value when writing the Day Of Week byte in the CMOS.
The correct range is [1...7] with Sunday=1, but we have been writing
[0...6] with Sunday=0.

The Soekris computers flagged the zero and zapped the date, so if you
rebooted your Soekris on a Sunday, it would come up with a wrong
date.

Bruce has a more extensive rework of this code, but we will stick with
the minimalist fix for now.

Spotted by: Soren Kristensen <soren@soekris.com>
Thanks to: Michael Sierchio <kudzu@tenebras.com>.
Confirmed by: bde
Approved by: re
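A minimal sketch of the range conversion described above, assuming the
kernel's internal weekday uses the tm_wday convention (0 = Sunday). The
function name rtc_day_of_week() is hypothetical:

```c
#include <assert.h>

/*
 * Hypothetical helper: map a 0..6 weekday (Sunday = 0) to the value
 * written to the CMOS Day Of Week byte, which uses 1..7 (Sunday = 1).
 */
static unsigned char
rtc_day_of_week(int wday)
{
	assert(wday >= 0 && wday <= 6);
	return ((unsigned char)(wday + 1));
}
```

Writing wday directly (without the + 1) is what produced the zero that
some hardware rejected on Sundays.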


107538 03-Dec-2002 alc

Avoid recursive acquisition of the page queues lock in pmap_unuse_pt().

Approved by: re


107521 02-Dec-2002 deischen

Align the FPU state in the ucontext and sigcontext to 16 bytes
to accommodate the new SSE/XMM floating point save/restore
instructions.

This commit is mostly from bde and includes some style nits.

Approved by: re (jhb)


107490 02-Dec-2002 alc

Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.

Approved by: re (blanket)


107434 01-Dec-2002 alc

Assert that the page queues lock is held in pmap_changebit()
and pmap_ts_referenced().

Approved by: re (blanket)


107410 30-Nov-2002 alc

Assert that the page queues lock is held in pmap_page_exists_quick().

Approved by: re (blanket)


107217 25-Nov-2002 alc

Assert that the page queues lock is held in pmap_remove_pages().

Approved by: re (blanket)


107212 24-Nov-2002 alc

Add page queues locking to vunmapbuf(); reduce differences with respect
to the sparc64 implementation. (Note: With modest effort on the alpha and
ia64 this function could migrate to the MI part of the kernel.)

Approved by: re (blanket)


107184 23-Nov-2002 alc

- Assert that the page queues lock is held in pmap_remove_all().
- Fix a diagnostic message and comment in pmap_remove_all().
- Eliminate excessive white space from pmap_remove_all().

Approved by: re


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
holding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106842 13-Nov-2002 mdodd

Loader tunable 'machdep.disable_mtrrs'.
Sysctl of same name to reflect status.

Submitted by: jhb
Approved by: re (murray)
MFC after: 1 day


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


106707 09-Nov-2002 iwasaki

Add a new loader tunable, hw.hasbrokenint12, to indicate that the BIOS
has a broken INT 12H.
If hw.hasbrokenint12="1" is set in the loader environment, the kernel never
uses the BIOS INT 12H call to determine the base memory size.
Otherwise, the kernel uses INT 12H as before.
This should fix the kernel panic caused by the 1.544 changes.

MFC after: 1 day


106697 09-Nov-2002 des

Print real / avail memory in megabytes rather than kilobytes.


106605 07-Nov-2002 tmm

Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics of hw.availpages on i386 and pc98 are slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.


106567 07-Nov-2002 alc

Simplify and optimize pmap_object_init_pt(). More specifically,
take advantage of the fact that the vm object's list of pages is
now ordered to reduce the overhead of finding the desired set of
pages to be mapped. (See revision 1.215 of vm/vm_page.c.)


106542 07-Nov-2002 davidxu

1. Fix an SMP race between kernel vm86 BIOS calls and userland vm86 mode code:
remove the global variable in_vm86call and set a vm86 calling flag in the PCB flags.

2. Fix preemption of vm86 BIOS calls by changing the vm86_lock mutex type
from MTX_DEF to MTX_SPIN. vm86pcb is not remembered in the thread struct, so
when the thread calling the vm86 BIOS is preempted by an interrupt thread,
switching back to the thread later would load an incorrect context
into the CPU registers, leading to a kernel crash.


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


106443 05-Nov-2002 davidxu

Fix typo. ioport_rid should be irq_rid.


105952 25-Oct-2002 peter

Finish fixing the 5.x FPU code for dealing with signal handlers.

Obtained from: bde


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105949 25-Oct-2002 iwasaki

Change method to determine base memory size.
Try INT 15H/E820H first, then fall back to the old compatibility
method (INT 12H).
This is a workaround for newer machines which have broken INT 12H BIOS
service implementation.

Reviewed by: -current ML
MFC after: 3 days


105900 24-Oct-2002 julian

Extract out KSE specific code from machine specific code
so that there is only one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitimately happen).

Submitted by: (parts) davidxu


105554 20-Oct-2002 phk

Change the definition of the debugging registers to be an array, so
that we can index into it, rather than do pointer gymnastics on a
structure containing 8 elements.

Verified by: MD5 hash on the produced .o files.
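The difference between the two layouts can be sketched as follows. The
struct and function names here are illustrative only and are not the
definitions from the tree; the point is that an array of 8 elements can be
indexed directly while keeping the same size as 8 named fields:

```c
#include <assert.h>

/* Illustrative "before" layout: eight separately named registers. */
struct dbreg_fields {
	unsigned int dr0, dr1, dr2, dr3, dr4, dr5, dr6, dr7;
};

/* Illustrative "after" layout: the same storage as an indexable array. */
struct dbreg_array {
	unsigned int dr[8];
};

/* Hypothetical accessors: plain indexing, no pointer arithmetic on fields. */
static void
set_dr(struct dbreg_array *d, int i, unsigned int val)
{
	assert(i >= 0 && i < 8);
	d->dr[i] = val;
}

static unsigned int
get_dr(const struct dbreg_array *d, int i)
{
	assert(i >= 0 && i < 8);
	return (d->dr[i]);
}
```

Because the storage is unchanged, a change of this kind can plausibly
leave the generated object code identical, which is consistent with the
MD5 check mentioned above.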


105534 20-Oct-2002 phk

Hide inline assembly if lint is defined.


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


105328 17-Oct-2002 iwasaki

1. Fix a comment. Locking _is_ needed (but not done).
2. Update a comment. We now restore much more than RTC updates and
interrupts.
3. Order change. Stop interrupts by writing to RTC_STATUSB,
restore rate bits for the interrupts by writing to RTC_STATUSA,
then enable interrupts again.
This seems to be done perfectly backwards in startrtclock().
Otherwise, the idea for this change was obtained from
startrtclock().
4. Don't stop the clock (RTCB_HALT). We only program some control bits
and don't want to stop the clock.
5. (Not really related.) Add caveats to the comment about timer_restore().
The update is non-atomic since locking is not done.

On locking:
6. rtcin() and writertc() are locked() adequately by splhigh() in RELENG_4,
but this locking is null in -current.
7. Doing things in the correct order in (3) combined with (6) is probably
enough locking for rtcrestore() in RELENG_4. In -current, the
writertc()'s race with rtcintr() unless the BIOS disables RTC interrupts.

Submitted by: bde (including commit message)
MFC after: 1 week
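The ordering in item 3 can be sketched in user space with a mock
writertc(). The register indices and bit values follow the MC146818
register layout (status A at 0x0a, status B at 0x0b, 24-hour mode bit
0x02); the function rtc_restore_sketch() and the logging array are
hypothetical and only illustrate the sequence, not the committed code:

```c
#define RTC_STATUSA	0x0a	/* status register A (rate bits) */
#define RTC_STATUSB	0x0b	/* status register B (interrupt enables) */
#define RTCSB_24HR	0x02	/* 24-hour mode, no interrupt or halt bits */

static unsigned char rtc_regs[64];	/* mock CMOS register file */
static int write_log[16];		/* order of register writes */
static int nwrites;

/* Mock of writertc(): store the value and remember the write order. */
static void
writertc(int reg, unsigned char val)
{
	rtc_regs[reg] = val;
	if (nwrites < 16)
		write_log[nwrites++] = reg;
}

/*
 * Hypothetical restore sequence: disable RTC interrupts without
 * halting the clock, restore the rate bits, then re-enable interrupts.
 */
static void
rtc_restore_sketch(unsigned char statusa, unsigned char statusb)
{
	writertc(RTC_STATUSB, RTCSB_24HR);	/* interrupts off, clock runs */
	writertc(RTC_STATUSA, statusa);		/* restore rate bits */
	writertc(RTC_STATUSB, statusb);		/* interrupts back on */
}
```

As the commit message notes, the ordering alone does not replace locking;
the sketch ignores concurrency entirely.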


105216 16-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint.


104964 12-Oct-2002 jeff

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.
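A short sketch of the kind of cast this change requires. Standard C does
not define arithmetic on `void *`, so code that steps through an iovec
casts iov_base to `char *` first. The helper iov_advance() is a
hypothetical name used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: consume n bytes from the front of an iovec.
 * The cast is needed because iov_base is a void pointer.
 */
static void
iov_advance(struct iovec *iov, size_t n)
{
	assert(n <= iov->iov_len);
	iov->iov_base = (char *)iov->iov_base + n;
	iov->iov_len -= n;
}
```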


104695 09-Oct-2002 julian

Round out the facility for a 'bound' thread to loan out its KSE
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
the owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is conceivable that the
borrower should inherit the priority of the owner too.
That's another discussion and would be simple to do.

Also, as part of this, the "preallocated spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".

DDB now shows a lot more info for threaded processes though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)

Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.


104513 05-Oct-2002 deischen

Fix building of minimal kernels without npx by rearranging ifdefs.
Also fix some style bugs in surrounding code, and add a comment
about FP state restoral that seems questionable.

Submitted by: bde


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Tested on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104474 04-Oct-2002 jhb

Fix a bogon in previous commit. bcopy() from the malloc'd memory that we
already copied into, rather than doing the bcopy() from the userland
pointer. "Oops."


104460 04-Oct-2002 deischen

Add another temporary hack to allow running older i386 binaries.
This will be removed when new versions of syscalls sigreturn()
and sigaction() are added (mini is working on this but is in
the middle of a move).

This should fix the problem of cvsupd dying.


104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
whose size can be specified when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


104319 01-Oct-2002 phk

The pmap_prefault_pageorder[] array was initialized with wrong values
due to a missing comma.

I have no idea what trouble, if any, this may have caused.

Pointed out by: FlexeLint
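To illustrate how a missing comma silently corrupts an initializer, here
is a small sketch. The array names and values are illustrative and are not
claimed to be the exact contents of pmap_prefault_pageorder[]; the point
is that two adjacent expressions fuse into one subtraction, so the array
ends up one element short with a wrong value in the middle:

```c
#define PAGE_SIZE	4096
#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/* Illustrative buggy initializer: no comma after "3 * PAGE_SIZE". */
static const int order_buggy[] = {
	-PAGE_SIZE, PAGE_SIZE,
	-2 * PAGE_SIZE, 2 * PAGE_SIZE,
	-3 * PAGE_SIZE, 3 * PAGE_SIZE	/* missing comma here */
	-4 * PAGE_SIZE, 4 * PAGE_SIZE
};

/* Same list with the comma restored. */
static const int order_fixed[] = {
	-PAGE_SIZE, PAGE_SIZE,
	-2 * PAGE_SIZE, 2 * PAGE_SIZE,
	-3 * PAGE_SIZE, 3 * PAGE_SIZE,
	-4 * PAGE_SIZE, 4 * PAGE_SIZE
};
```

In the buggy version the sixth element is parsed as
3 * PAGE_SIZE - 4 * PAGE_SIZE, so the array has seven elements instead of
eight. The compiler accepts both forms, which is why a lint tool was
needed to spot it.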


104224 30-Sep-2002 jhb

- Give legacy an identify routine that always adds 'legacy0' at an order
of 1 so that it is not probed until after acpi0 is probed and attached.
- In legacy_probe(), return ENXIO if acpi0 is around and alive.
- nexus_attach() is now much simpler and just lets its child drivers do
all the work.


104215 30-Sep-2002 obrien

Turn back on the "SMP: AP CPU #N Launched!" message on normal boots.
Peter's rev 1.189 should fix the lost console on SCSI-based systems due
to this message.


104175 30-Sep-2002 obrien

Only print out the "SMP: AP CPU #N Launched!" message on verbose boots.
The kernel printf() isn't race-free.


104174 30-Sep-2002 obrien

Save the FP state in the PCB as that is compatible with releng4 binaries.

This is a band-aid until the KSE pthread committers get back on the ground
and have their machines setup.

Submitted by: eischen


104118 28-Sep-2002 peter

Deal with some SMP races by doing the entire copyin at once rather
than doing the checks piecemeal and then doing a second copyin later.

PR: 38021
Submitted by: davidx (I've tweaked the patch a bit)
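The general pattern behind this fix (copy once, then validate the private
copy) can be sketched in user space as below. The structure req_args, the
limit MAX_ENTRIES, and fake_copyin() are hypothetical stand-ins; the
commit message does not say which structure or limits are involved:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES	8192	/* hypothetical upper bound */

struct req_args {		/* hypothetical argument block */
	unsigned int start;
	unsigned int num;
};

/* User-space stand-in for copyin(); always succeeds here. */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	memcpy(kaddr, uaddr, len);
	return (0);
}

/*
 * Copy the whole argument block in one step and validate only the
 * private copy, so another thread changing the user memory after the
 * checks cannot affect the values that are used later.
 */
static int
fetch_args(const void *uap, struct req_args *out)
{
	struct req_args ka;
	int error;

	error = fake_copyin(uap, &ka, sizeof(ka));
	if (error != 0)
		return (error);
	if (ka.num > MAX_ENTRIES || ka.start > MAX_ENTRIES - ka.num)
		return (EINVAL);
	*out = ka;
	return (0);
}
```

Checking fields piecemeal and then copying again later leaves a window in
which the user data can change between the check and the use.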


104106 28-Sep-2002 peter

Repair range checking for reading the ldt list.

PR: 38016
Submitted by: davidx


104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


103870 23-Sep-2002 alfred

use __packed.


103865 23-Sep-2002 jhb

Update the nexus driver for the addition of the legacy driver:
- nexus no longer has PCI bridges as direct children, so the PCI bus
ivar is no longer used and is removed.
- Don't attach default EISA, ISA, or MCA busses. Instead, if we do not
have an acpi0 device after bus_generic_probe(), add a legacy0 child
device.
- Remove machine/nexusvar.h.


103862 23-Sep-2002 jhb

Add a new legacy(4) device driver for use on machines that do not have
ACPI or for when ACPI support is disabled or not present in the kernel.
Basically, the nexus device is now split into two with some parts
(such as adding default ISA, MCA, and EISA busses if they aren't found
as well as support for PCI bus device ivars) being moved to the legacy
driver.


103778 22-Sep-2002 peter

Create inlines for ltr(sel), lldt(sel), lidt(addr) rather than
functions that have one instruction.


103772 22-Sep-2002 mdodd

- Move the init of %gs and pcb_gs before user_ldt_free().
- Always call load_gs()
- Trim comments.

This addresses some of the issues raised by BDE.


103770 22-Sep-2002 jake

Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicking later on. Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.


103755 21-Sep-2002 markm

A good dose of style.9. No functional change.


103753 21-Sep-2002 markm

Code tidy-up. ISOfy, turn a macro into an inline for lint(1) (perhaps
this needs to go to cpufunc.h?), de-register.


103733 21-Sep-2002 phk

Fix a 3 year old oversight: Remove the #ifdef/#endif pair now that there
is nothing between them anymore.

Spotted by: peter.


103706 20-Sep-2002 phk

We need neither <sys/diskslice.h> nor <sys/disklabel.h> here.

Sponsored by: DARPA & NAI Labs.


103703 20-Sep-2002 phk

For reasons now lost in historical fog, the bounds_check_with_label()
function was put in i386/i386/machdep.c from where it has been
cut and pasted to other architectures with only minor corruption.

Disklabel is really a MI format in many ways, at least it certainly
is when you operate on struct disklabel.

Put bounds_check_with_label() back in subr_disklabel.c where it belongs.

Sponsored by: DARPA & NAI Labs.


103682 20-Sep-2002 jhb

fork_trampoline() marks a trap frame.

Submitted by: bde


103681 20-Sep-2002 jhb

Use proper type for a variable used as a DDB symbol.


103680 20-Sep-2002 jhb

Trim includes.

Submitted by: bde


103679 20-Sep-2002 jhb

Various style fixes, including moving db_print_backtrace() out of the
middle of the watchpoint code.

Submitted by: bde


103649 19-Sep-2002 mdodd

This patch enables FreeBSD i686 MTRR support on Intel Pentium
4/XEON processors, which are not currently recognized.

Submitted by: Christian Zander <zander@minion.de>


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103645 19-Sep-2002 mdodd

From Christian Zander:

This patch addresses a bug that can cause a GPF in the kernel - if a
process makes use of i386_set_ldt to install a LDT entry, then loads
a corresponding segment descriptor into %gs, forks, and if the child
execs.

In this scenario, setregs executes user_ldt_free and then determines
how to reset the %gs register:

	/* reset %gs as well */
	if (pcb == curpcb)
		load_gs(_udatasel);
	else
		pcb->pcb_gs = _udatasel;

This is insufficient in the fork/exec case, since pcb will be equal
to curpcb when the child execs; load_gs will reset %gs to _udatasel
but it doesn't reset pcb->pcb_gs; upon return from the system call,
cpu_switch_load_gs will thus attempt to restore %gs from pcb->pcb_gs
and trigger a GPF since all LDT entries have already been cleared.

The fix is to always reset pcb->pcb_gs to _udatasel.

Submitted by: Christian Zander <zander@minion.de>
Reviewed by: jake
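The difference between the old and fixed behaviour can be modelled in user
space as follows. Everything here is a mock: cpu_gs stands in for the %gs
register, udatasel for _udatasel (its value is arbitrary), and the two
reset functions are hypothetical names. It is a sketch of the logic
described above, not the kernel code:

```c
struct pcb {
	int pcb_gs;		/* saved %gs selector */
};

static int cpu_gs;		/* mock of the live %gs register */
static const int udatasel = 0x2f;	/* arbitrary stand-in for _udatasel */

/* Mock of load_gs(): only updates the live register. */
static void
load_gs(int sel)
{
	cpu_gs = sel;
}

/* Old logic: pcb_gs is left stale when pcb == curpcb. */
static void
reset_gs_old(struct pcb *pcb, struct pcb *curpcb)
{
	if (pcb == curpcb)
		load_gs(udatasel);
	else
		pcb->pcb_gs = udatasel;
}

/* Fixed logic: always reset the saved selector as well. */
static void
reset_gs_fixed(struct pcb *pcb, struct pcb *curpcb)
{
	pcb->pcb_gs = udatasel;
	if (pcb == curpcb)
		load_gs(udatasel);
}
```

With the old logic, a later restore of %gs from pcb_gs would reload the
stale LDT selector, which is the path to the GPF described in the message.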


103527 18-Sep-2002 iwasaki

Restore status register A of RTC at resume time.
This should fix the 'too many RTC interrupts and statclock seems
broken after resume' problem.

MFC after: 1 week


103477 17-Sep-2002 sobomax

Don't reference cpu_fxsr unless CPU_ENABLE_SSE is defined. This fixes kernel
in !CPU_ENABLE_SSE case.


103436 17-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


103409 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: bde, deischen, julian
Approved by: -arch


103407 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: deischen, julian
Approved by: -arch


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separately and remove them from the proc structure.
The next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103346 15-Sep-2002 dwmalone

Some BIOSs are using MTRR values that are only documented under NDA
to control the mapping of things like the ACPI and APM into memory.

The problem is that starting X changes these values, so if something
was using the bits of BIOS mapped into memory (say ACPI or APM),
then next time they access this memory the machine would hang.

This patch refuses to change MTRR values it doesn't understand,
unless a new "force" option is given. This means X doesn't change
them by accident but someone can override that if they really want
to.

PR: 28418
Tested by: Christopher Masto <chris@netmonger.net>,
David Bushong <david@bushong.net>,
Santos <casd@myrealbox.com>
MFC after: 1 week


103081 07-Sep-2002 jmallett

Fill out two fields (si_pid, si_uid) in the siginfo structure handed back
to userland in the signal handler that were not being filled out before, but
should and can be.

This part of sendsig could be slightly refactored to use an MI interface, or
ideally, *sendsig*() would have an API change to accept a siginfo_t, which
would be filled out by an MI function in the level above sendsig, and said MI
function would make a small call into MD code to fill out the MD parts (some
of which may be bogus, such as the si_addr stuff in some places). This would
eventually make it possible for parts of the kernel sending signals to set up
a siginfo with meaningful information.

Reviewed by: mux
MFC after: 2 weeks


103076 07-Sep-2002 jmallett

Match the more modern ports and comment the filling of POSIX parts of siginfo
with 'Fill in POSIX parts'. (Diff reduction.)


103064 07-Sep-2002 peter

Automatically enable CPU_ENABLE_SSE (detect and enable SSE instructions)
if compiling with I686_CPU as a target. CPU_DISABLE_SSE will prevent
this from happening and will guarantee the code is not compiled in.

I am still not happy with this, but gcc is now generating code that uses
these instructions if you set CPUTYPE to p3/p4 or athlon-4/mp/xp or higher.


103049 07-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


102974 05-Sep-2002 jhb

Move some variables to the BSS instead of explicitly zero'ing them. This
also makes all of the PCIbios variable be zero'd, not just the entry field.


102934 04-Sep-2002 phk

Change the support for AMDs ElanSC520 CPU from being a device driver to
be
options CPU_ELAN
(NB: Soekris.com users!)

It is cleaner this way. We still recognize the cpu on the host-pci bridge.


102920 04-Sep-2002 jhb

Use resource_list_print_type() instead of duplicating the code in
nexus_print_resources().


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102667 31-Aug-2002 bde

db_ps.c:
Don't attempt to follow null pointers for zombie processes in db_ps().

Style fix: use an explicit comparison with NULL for all null pointer
checks in db_ps() instead of for half of them.

db_interface.c:
Fixed ddb's handling of traps from within ddb on i386 only.

This was mostly fixed in rev.1.27 (by longjmp()'ing back to the top
level) but was completely broken in rev.1.48 (by not unwinding the new
state (mainly db_active) either before or after the longjmp()). This
mostly never worked for other arches, since rev.1.27 has not been ported
and lower level longjmp()'s only handle traps for memory accesses. All
cases should be handled at a lower level to provide better control and
simplify unwinding of state.

Implementation details: don't pretend to maintain db_active in a nested
way -- ddb cannot be reentered in a nested way. Use db_active instead
of the db_global_jmpbuf_valid flag and longjmp()'s return value for things
related to reentering ddb. [re]entering is still not atomic enough.


102666 31-Aug-2002 peter

Take a shot at fixing up a whole stack of style and other embarrassing
unforced errors that Bruce identified. I have not yet addressed all of
his concerns.


102603 30-Aug-2002 ache

Unbreak kernel build by printing Maxmem using %ld instead of old (now changed)
%u


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102543 28-Aug-2002 peter

OK, I have had it with losing my console because the AP's print their "I am
alive!" message right as the scsi probe messages happen. This is a bit
nasty, but it seems to work. At the point that we unlock the AP's, briefly
wait till they are all done while we hold the console on their behalf.


102412 25-Aug-2002 charnier

Replace various spelling with FALLTHROUGH which is lint()able


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


102329 23-Aug-2002 peter

Ok, somebody please shoot me. The asm I wrote for the ranged IPI shootdown
was wrong. It only ever invalidated one page due to me getting the loop
terminator wrong. This explains the DISABLE_PG_G effect on SMP.


102241 21-Aug-2002 archie

Don't use "NULL" when "0" is really meant.


102041 18-Aug-2002 alc

o Simplify the ptphint test in pmap_release_free_page(). In other words,
make it just like the test in _pmap_unwire_pte_hold().


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to clarify which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes are largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
- pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101879 14-Aug-2002 jmallett

Document why the has_f00f_bug variable is initialised rather than placed into
the BSS (so that it can be binary-patched).

Inspired by: bde


101770 13-Aug-2002 alc

o Remove an unnecessary vm_page_flash() from _pmap_unwire_pte_hold().

Reviewed by: peter


101751 12-Aug-2002 alc

o Convert three instances of vm_page_sleep_busy() into vm_page_sleep_if_busy()
with page queue locking.


101721 12-Aug-2002 iedowse

Use roundup2() to avoid a problem where pmap_growkernel was unable
to extend the kernel VM to the maximum possible address of 4G-4M.

PR: i386/22441
Submitted by: Bill Carpenter <carp@world.std.com>
Reviewed by: alc
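
A possible illustration of why the rounding matters at the 4G-4M
boundary. The roundup2() macro below is the usual power-of-two form;
old_round() is an assumption about the kind of expression that would
fail there, not a quote of the replaced code, and NBPDR_SKETCH is a
name made up for this sketch.

```c
#include <stdint.h>

/* Address range covered by one page-table page on non-PAE i386: 4 MB. */
#define NBPDR_SKETCH (4u * 1024u * 1024u)

/* Round x up to a multiple of y, where y is a power of two. */
#define roundup2(x, y) (((x) + ((y) - 1)) & ~((y) - 1))

/* Assumed old shape: always steps past the current 4 MB boundary. */
static uint32_t
old_round(uint32_t addr)
{
	return ((addr + NBPDR_SKETCH) & ~(NBPDR_SKETCH - 1));
}

/* An already aligned address is left alone, so 4G-4M does not wrap. */
static uint32_t
new_round(uint32_t addr)
{
	return (roundup2(addr, NBPDR_SKETCH));
}
```

With 32-bit arithmetic, old_round(0xFFC00000) wraps around to 0, while
new_round(0xFFC00000) returns the address unchanged.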


101635 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


101352 05-Aug-2002 peter

Revert rev 1.356 and 1.352 (pmap_mapdev hacks). It wasn't worth the
pain.


101322 04-Aug-2002 peter

Fix a mistake in 1.352 - I was returning a pointer to the rounded down
address. I expect this will fix acpica.


101294 04-Aug-2002 alc

o Request a wired page from vm_page_grab() in _pmap_allocpte().


101280 03-Aug-2002 alc

o Ask for a prezeroed page in pmap_pinit() for the page directory page.


101254 03-Aug-2002 alc

o Don't set PG_MAPPED on the page allocated and mapped in _pmap_allocpte().
(Only set this flag if the mapping has a corresponding pv list entry,
which this mapping doesn't.)


101249 03-Aug-2002 peter

Take advantage of the fact that there is a small 1MB direct mapped region
on x86 in between KERNBASE and the kernel load address. pmap_mapdev()
can return pointers to this for devices operating in the isa "hole".


101248 03-Aug-2002 peter

Take a shot at fixing a nasty bug in the pmap changes that I did. I
missed the pmap_kenter/kremove in this file, which leads to read()/write()
of /dev/mem using stale TLB entries. (gah!) Fortunately, mmap of /dev/mem
wasn't affected, so it wasn't as bad as it could have been. This throws
some light on the 'X server affects stability' thread....

Pointed out by: bde


101197 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


101105 31-Jul-2002 alc

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


101054 31-Jul-2002 phk

The Elan SC520 MMCR is actually 16bit wide, so u_char is inconvenient.


100912 30-Jul-2002 alc

o Lock page queue accesses by pmap_release_free_page().


100862 29-Jul-2002 alc

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


100781 28-Jul-2002 peter

Unwind the syscall_with_err_pushed tweak that jake did some time back.

OK'ed by: jake


100646 24-Jul-2002 julian

Add some locking asserts and some comments


100432 21-Jul-2002 peter

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100378 19-Jul-2002 alc

o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().


100321 18-Jul-2002 phk

Add initialization code for the AMD Elan sc520 which maps the MMCR
into KVM and sets the i8254 frequency to the correct value.


100275 18-Jul-2002 peter

Use pmap_kenter() rather than vtopte() and bashing the page tables
directly.


100264 17-Jul-2002 peter

Avoid trying to set PG_G on the first 4MB when we set up the 4MB page.
This solves the SMP panic for at least one system. I'd still like to know
why my xeon works though.

Tested by: bmilekic


100220 17-Jul-2002 dillon

Qualify comment on machdep.cpu_idle_hlt. Turning this on on a SMP
machine will result in approximately a 4.2% loss of performance (buildworld)
and approximately a 5% reduction in power consumption (when idle). Add XXX
note on how to really make hlt work (send an IPI to wakeup HLTed cpus on
a thread-schedule event? Generate an interrupt somehow?).


100152 15-Jul-2002 peter

The pmap_invalidate_all() here is definitely not a good idea. We are
running with interrupts disabled, other cpus locked down, and only
making a temporary local mapping that we immediately back out again.

Tested by: gallatin


99987 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


99932 13-Jul-2002 bde

Quick fix for high resolution kernel profiling on i386's. Use
-finstrument-functions instead of -mprofiler-epilogue. The former
works essentially the same as the latter but has a higher overhead
(about 22 more bytes per function for passing unused args to the
profiling functions).

Removed all traces of the IDENT Makefile variable, which had been
reduced to just a place for holding profiling's contribution to CFLAGS
(the IDENT that gives the kernel identity was renamed to KERN_IDENT).


99931 13-Jul-2002 peter

Two invlpg's slipped through that were not protected from I386_CPU

Pointed out by: dillon


99930 13-Jul-2002 peter

invlpg() does not work too well on i386 cpus. Add token i386 support
back in to the pmap_zero_page* stuff.


99929 13-Jul-2002 peter

Do global shootdowns when switching to/from 4MB pages. I believe we can
do a shootdown on a 4MB "page" though, but this should be safer for now.

Noticed by: tegge


99928 13-Jul-2002 peter

Bandaid for SMP. Changing APTDpde without a global shootdown is not
safe yet. We used to do a global shootdown here anyway so another day
or so shouldn't hurt.


99925 13-Jul-2002 alc

o Lock some page queue accesses, in particular, those by vm_page_unwire().


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99890 12-Jul-2002 dillon

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99862 12-Jul-2002 peter

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


99852 12-Jul-2002 peter

Unexpand a couple of 8-space indents that I added in rev 1.285.


99766 11-Jul-2002 peter

Bah, move the invltlb counter to C code and hook a debug sysctl onto it.


99765 11-Jul-2002 peter

s/NCPU/MAXCPU/ to try and get this to compile.


99746 10-Jul-2002 julian

fix a comment and note a problem with XXXSMP


99742 10-Jul-2002 dillon

Remove the critmode sysctl - the new method for critical_enter/exit (already
the default) is now the only method for i386.

Remove the paraphernalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().


99741 10-Jul-2002 obrien

Consistently line-up /**/ comments so they don't cause line wrappage.


99703 10-Jul-2002 julian

Include all of isa/ipl.s into exception.s as there is now nothing left in
ipl.s except doreti which really belongs in with the exceptions as it's
just the other side of the same coin. Will remove ipl.s in a separate commit.

Agreed by: several including bde@freebsd.org


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99567 08-Jul-2002 peter

s/procrunnable/kserunnable/ in a comment


99561 08-Jul-2002 peter

Fix a hideous TLB bug. pmap_unmapdev neglected to remove the device
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99537 07-Jul-2002 mux

One #include <sys/lock.h> is enough.

Submitted by: Olivier Houchard <cognet@ci0.org>


99415 04-Jul-2002 peter

Diff reduction (microoptimization) with another WIP. Move the frame
calculation in get_ptbase() to a little later on.


99398 04-Jul-2002 julian

Don't free pages we never allocated..

My eyes opened by: Matt


99397 04-Jul-2002 julian

Slight restatement of the code and remove some unused variables.


99381 03-Jul-2002 julian

Add comments and slightly rearrange the thread stack assignment code
to try make it less obscure.


99380 03-Jul-2002 julian

Remove vestiges of old code...
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))


99095 29-Jun-2002 julian

Fix reverse ordering of locks. add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(on one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


98902 27-Jun-2002 arr

Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)

"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."

Submitted by: tegge
Approved by: tegge's patches never fail


98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


98824 25-Jun-2002 iedowse

Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does not attempt to optimise performance. The details are:

o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().


98778 24-Jun-2002 peter

Compile in the cpu halt code even on SMP, instead just default the
sysctl (machdep.cpu_idle_hlt) to off in the SMP case. This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98728 24-Jun-2002 mini

userout -> out. These two labels are now identical.

Approved by: alfred


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98618 22-Jun-2002 mp

Clock frequencies reported by sysctl should be unsigned values. Discovered
when machdep.tsc_freq returned a negative number on a 2.2GHz Xeon.

Submitted by: Brian Harrison <bharrison@ironport.com>
Reviewed by: phk
MFC after: 1 week
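
A small sketch of the failure mode, assuming a 32-bit signed type was
used for the value; as_signed() is a hypothetical helper. Converting an
out-of-range unsigned value to a signed type is implementation-defined
in C, and common compilers wrap it, which is what makes 2.2 GHz show up
as a negative number.

```c
#include <stdint.h>

/* Reinterpret a frequency in Hz as a signed 32-bit quantity. */
static int32_t
as_signed(uint32_t hz)
{
	/* Implementation-defined for hz > INT32_MAX; typically wraps. */
	return ((int32_t)hz);
}
```

Any frequency above 2147483647 Hz exceeds INT32_MAX, so reporting the
value as unsigned avoids the problem.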


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


98145 12-Jun-2002 bde

If trap() is called when ddb is active, then go directly to trap_fatal();
do not blunder around enabling interrupts and running trap handlers.
trap_pfault() will normally pass control to ddb's fault handler which
will normally do the right thing.

This bug is very old, but in old versions of FreeBSD it is probably only
serious for trap handling that involves sleeping. In -current, attempting
to examine unmapped memory while stopped at a breakpoint at mi_switch()
was always fatal.


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha where we
skip the actual syscall invocation if error != 0. This fixes a bug
where if the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.
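
The difference between the two control-flow shapes can be sketched in
user space. Everything here is a stand-in: giant_held models Giant as a
simple counter, and fake_copyin() and fake_call() are hypothetical
substitutes for the argument copyin() and the syscall body. The "old"
function is a guess at the general shape of the bug described above,
not the removed code.

```c
#include <errno.h>
#include <stdbool.h>

static int giant_held;		/* models how often Giant is held */

static void giant_lock(void)   { giant_held++; }
static void giant_unlock(void) { giant_held--; }

/* Hypothetical stand-ins for copyin() of arguments and the syscall body. */
static int fake_copyin(bool fail) { return (fail ? EFAULT : 0); }
static int fake_call(void) { return (0); }

/* Old shape: a failed copyin() jumps to code that assumes Giant is held. */
static int
syscall_goto_bad(bool copyin_fails, bool mpsafe)
{
	int error;

	error = fake_copyin(copyin_fails);
	if (error != 0)
		goto bad;
	if (!mpsafe)
		giant_lock();
	error = fake_call();
bad:
	if (!mpsafe)
		giant_unlock();	/* bug: never acquired on the error path */
	return (error);
}

/* New shape: skip the invocation, and its locking, when error != 0. */
static int
syscall_skip_on_error(bool copyin_fails, bool mpsafe)
{
	int error;

	error = fake_copyin(copyin_fails);
	if (error == 0) {
		if (!mpsafe)
			giant_lock();
		error = fake_call();
		if (!mpsafe)
			giant_unlock();
	}
	return (error);
}
```

In the second form the lock and unlock are paired inside the same
branch, so an early failure cannot reach an unmatched release.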


97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


96527 13-May-2002 bde

Fixed a semantic error. va_arg(ap, u_short) is nonsense except on i386's
with 16-bit ints, since u_short is promoted when it is passed to a
varargs function. gcc now warns about this. We always pass small
integers (this is well obfuscated), so there are no conversion problems.

Fixed a related style bug (bogus cast).
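
A minimal sketch of the promotion rule behind this fix; sum_shorts() is
a hypothetical function, not the code that was changed. Arguments
narrower than int are promoted when passed through "...", so they must
be fetched with va_arg(ap, int) and narrowed afterwards.

```c
#include <stdarg.h>

/* Hypothetical variadic helper: sums n unsigned short arguments. */
static unsigned
sum_shorts(int n, ...)
{
	va_list ap;
	unsigned total = 0;

	va_start(ap, n);
	for (int i = 0; i < n; i++) {
		/* The caller's unsigned short arrives promoted to int. */
		unsigned short v = va_arg(ap, int);

		total += v;
	}
	va_end(ap);
	return (total);
}
```

Writing va_arg(ap, unsigned short) instead would name a type that no
argument can have after promotion wherever int is wider than short.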


96517 13-May-2002 bde

Fixed a syntax error (a label not followed by a statement).


96035 04-May-2002 fenner

Restore the ability to interrupt dumps on i386, based on
the old kern_shutdown.c . Other archs might be able to
use similar code but I don't have anything to test on.


95814 30-Apr-2002 phk

Don't export timecounter structures under debug. with sysctl, they
contain no truly interesting data anymore.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95579 27-Apr-2002 alc

For what it's worth, fix the compilation of an I386_CPU-only kernel
now that certain warnings are fatal.


95571 27-Apr-2002 alc

Don't call vm_map_growstack() from trapwrite() as vm_fault() now performs
this automatically.


95489 26-Apr-2002 phk

Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95320 23-Apr-2002 phk

Don't free(9) a pointer which has been modified.

Chapeau de pointe: mux
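
The general rule can be shown with a user-space analogue. This sketch
uses the C library malloc()/free() rather than the kernel's free(9),
and count_spaces() is a hypothetical function: the pointer returned by
the allocator is kept untouched, and a second pointer does the walking.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the spaces in a private copy of s. */
static int
count_spaces(const char *s)
{
	size_t len = strlen(s) + 1;
	char *buf = malloc(len);	/* must reach free() unchanged */
	char *p;			/* separate cursor that may move */
	int n = 0;

	if (buf == NULL)
		return (-1);
	memcpy(buf, s, len);
	for (p = buf; *p != '\0'; p++)
		if (*p == ' ')
			n++;
	free(buf);		/* free(p) here would be undefined behaviour */
	return (n);
}
```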


95076 19-Apr-2002 alfred

Clean up:

Comment run_filter() to explain what it does.

Remove chatty comments.

void busdma_swi() { } -> void busdma_swi(void) { }


94977 18-Apr-2002 alc

o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed
due to conditions that suggest the possible need for stack growth.
This has two beneficial effects: (1) we can
now remove calls to vm_map_growstack() from the MD trap handlers and (2)
simple page faults are faster because we no longer unnecessarily perform
vm_map_growstack() on every page fault.
o Remove vm_map_growstack() from the i386's trap_pfault().
o Remove the acquisition and release of Giant from i386's trap_pfault().
(vm_fault() still acquires it.)


94967 17-Apr-2002 tegge

Fix typo in adjusted panic message.

Submitted by: cokane


94962 17-Apr-2002 tegge

Update io_apic_ints array properly when revoking an irq mapping.
Adjust panic message.

Submitted by: David Xu <bsddiy@yahoo.com>


94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibility.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94683 14-Apr-2002 dwmalone

Make the MTRR code a bit more defensive - this should help people
trying to run X on some Athlon systems where the BIOS does odd things
(mine's an ASUS A7A266, but it seems to also help on other systems).

Here's a description of the problem and my fix:

The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.

This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.

To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.
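
The approach can be sketched as below. This is a simplified
illustration, not the committed code: the SK_* flag values and the
sk_* function names are made up for the sketch, with SK_UNKNOWN playing
the role of MDF_UNKNOWN and sk_mrt2mtrr() the role of i686_mrt2mtrr().
The table uses the architectural MTRR type encodings (0, 1, 4, 5, 6).

```c
#include <stddef.h>

/* Hypothetical flag values for this sketch; not the kernel's. */
#define SK_UNCACHEABLE	(1 << 0)
#define SK_WRITECOMBINE	(1 << 1)
#define SK_WRITETHROUGH	(1 << 2)
#define SK_WRITEPROTECT	(1 << 3)
#define SK_WRITEBACK	(1 << 4)
#define SK_UNKNOWN	(1 << 5)	/* stands in for MDF_UNKNOWN */

/* Indexed by MTRR type byte; reserved encodings map to SK_UNKNOWN. */
static const int mtrr_to_mrt[] = {
	SK_UNCACHEABLE,		/* 0 */
	SK_WRITECOMBINE,	/* 1 */
	SK_UNKNOWN,		/* 2: reserved */
	SK_UNKNOWN,		/* 3: reserved */
	SK_WRITETHROUGH,	/* 4 */
	SK_WRITEPROTECT,	/* 5 */
	SK_WRITEBACK,		/* 6 */
};
#define MTRR_NTYPES (sizeof(mtrr_to_mrt) / sizeof(mtrr_to_mrt[0]))

/* Read side: never index past the table, so no junk value appears. */
static int
sk_mtrr2mrt(unsigned char val)
{
	return (val < MTRR_NTYPES ? mtrr_to_mrt[val] : SK_UNKNOWN);
}

/* Write side: an unknown type keeps the byte already in the register. */
static unsigned char
sk_mrt2mtrr(int flags, unsigned char oldval)
{
	size_t i;

	if (flags & SK_UNKNOWN)
		return (oldval);
	for (i = 0; i < MTRR_NTYPES; i++)
		if (mtrr_to_mrt[i] == flags)
			return ((unsigned char)i);
	return (oldval);
}
```

Because the write side falls back to the old byte, an undocumented
value set by the BIOS is written back as-is instead of becoming 0xff.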

I'd like to merge this before the 4.6 code freeze, so if people
can test this with XFree 4 that would be very useful.

PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks


94383 10-Apr-2002 alc

o In osigreturn(), restore all of the registers in one place.
o Recent changes to osigreturn() and sigreturn() have made them MPSAFE. Add
a comment to this effect.

Submitted by: bde (bullet #1)
Reviewed by: jhb (bullet #2)


94275 09-Apr-2002 phk

GC various bits and pieces of USERCONFIG from all over the place.


94197 08-Apr-2002 bde

Removed ispc98 sysctl completely. Applications should understand that
ispc98 isn't set if its sysctl doesn't exist. At least make(1) already
understands this.

Approved by: nyan


94151 07-Apr-2002 phk

GC the "dumplo" variable, which is no longer used.

A lot of sys/*/*/machdep.c seems not to be.


93944 06-Apr-2002 nyan

Remove pc98 code.


93823 04-Apr-2002 dillon

Embed a struct vmmeter in the per-cpu structure and add a macro,
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occasional inaccuracy. Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scalability.

Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.

The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu. Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!
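
A user-space sketch of the idea, under stated assumptions: the sk_*
names, SK_MAXCPU and SK_LAZY_INC() are made up here, and the real
PCPU_LAZY_INC() operates on the current CPU's per-cpu structure rather
than an array indexed by a CPU number.

```c
#define SK_MAXCPU 4

/* Reduced stand-in for struct vmmeter with a single counter. */
struct sk_vmmeter {
	unsigned v_intr;
};

static struct sk_vmmeter sk_global;		/* global vmmeter */
static struct sk_vmmeter sk_pcpu[SK_MAXCPU];	/* one copy per CPU */

/* Unlocked, non-atomic increment of one CPU's private copy. */
#define SK_LAZY_INC(cpu, field) (sk_pcpu[(cpu)].field++)

/* What a sysctl handler would report: global plus all per-CPU values. */
static unsigned
sk_sum_v_intr(void)
{
	unsigned total = sk_global.v_intr;

	for (int i = 0; i < SK_MAXCPU; i++)
		total += sk_pcpu[i].v_intr;
	return (total);
}
```

Each CPU touches only its own cache line on the hot path; the cost of
combining the counters is paid only when the statistic is read.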


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93794 04-Apr-2002 brian

Back out the previous commit.

In the i386 case, options BOOTP requires options NFS_ROOT as well as
options NFSCLIENT. With *both* the NFS options, a bootpc_init()
prototype is brought in by nfsclient/nfsdiskless.h.

In the ia64 case, it just doesn't work and my change just pushes it
further away from working.

Suggested to be wrong by: bde


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93785 04-Apr-2002 brian

Pre-declare bootpc_init() so that options BOOTP doesn't break the
build in ia64 and i386 due to -Werror.


93717 03-Apr-2002 marcel

Make the kernel dump header endianness invariant by always dumping
in dump byte order (=network byte order). Swap blocksize and dumptime
to avoid extraneous padding on 64-bit architectures. Use CTASSERT
instead of runtime checks to make sure the header is 512 bytes large.
Various style(9) fixes.

Reviewed by: phk, bde, mike
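
Two of the techniques mentioned above, sketched with hypothetical
names: a header whose size is checked at compile time (C11
_Static_assert is used here in the spirit of CTASSERT), and a helper
pair that stores a field in network byte order regardless of host
endianness. The struct layout is invented for the sketch and is much
smaller in scope than the real header.

```c
#include <stdint.h>

/* Hypothetical, much-reduced header layout. */
struct sk_dumphdr {
	char	 magic[20];
	uint32_t version;	/* stored in dump (network) byte order */
	uint64_t dumplength;	/* stored in dump (network) byte order */
	char	 pad[480];
};

/* Compile-time size check instead of a runtime test. */
_Static_assert(sizeof(struct sk_dumphdr) == 512,
    "dump header must be 512 bytes");

/* Store v most-significant byte first, whatever the host order is. */
static void
sk_put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Read back a value written by sk_put_be32(). */
static uint32_t
sk_get_be32(const unsigned char *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}
```

Placing the 32-bit field directly before the 64-bit one keeps the
64-bit field naturally aligned at offset 24 without compiler padding.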


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly committed syntactical cleanups made to files
that were still under active development. Backout improperly committed program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather than in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecore's features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a user's point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93461 31-Mar-2002 alc

Implement i386's (o)sigreturn() like the alpha's: Use copyin() to read
the osigcontext or ucontext_t rather than useracc() followed by direct user-
space memory accesses. This reduces (o)sigreturn()'s execution time by 5-
50%.

Submitted by: bde


93334 28-Mar-2002 nyan

Remove unneeded pc98 hack.


93312 28-Mar-2002 obrien

style(9)

Approved by: jake


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93024 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93017 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93005 23-Mar-2002 takawata

Add bios area range check (lower side).


92890 21-Mar-2002 alc

o Use the MI vm_map_growstack() instead of grow_stack() in trap_pfault()
and trapwrite().
o On i386/pc98, remove the (now) unused grow_stack().


92860 21-Mar-2002 imp

Fix abuses of cpu_critical_{enter,exit} by converting to
intr_{disable,restore} as well as providing an implementation of
intr_{disable,restore}.

Reviewed by: jake, rwatson, jhb


92846 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decrement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removes td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92770 20-Mar-2002 alfred

Remove __P.


92765 20-Mar-2002 alfred

Remove __P.


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92548 18-Mar-2002 alc

Eliminate grow_stack() from (o)sendsig(). If the stack needs to grow,
copyout() will page fault and perform grow_stack() from trap_pfault().
These calls to grow_stack() accomplish nothing.

Reviewed by: bde


92470 17-Mar-2002 alc

o Stop calling useracc() in (o)sendsig() now that we use copyout()
to copy the sigframe to the user's stack. Useracc() takes a non-trivial
amount of time. Eliminating it speeds up signal delivery by 15% or more.
o Update some comments.

Submitted by: bde


92018 10-Mar-2002 luigi

Export a (machine dependent) kernel variable bootdev as
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).

NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".

Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.

MFC-after: 5 days


91978 10-Mar-2002 alc

Condition the compilation of trapwrite() on I386_CPU.


91893 08-Mar-2002 phk

#include <machine/smp.h> in the SMP case.
don't include <sys/smp.h> at all.

Fallout from: probably something jake did.
Hint by: jhb


91778 07-Mar-2002 jake

Add needed includes of machine/smp.h, remove nested include in sys/smp.h
so that inlines in machine/smp.h can use variables declared in sys/smp.h.


91673 05-Mar-2002 jeff

Add a new variable mp_maxid. This is used so that per cpu datastructures may
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.

Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.
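
A minimal userland sketch of why mp_maxid matters when cpu ids are sparse.
The probe_cpus() and alloc_percpu() helpers are hypothetical and exist only
for illustration; they are not the kernel's code. The point is that an array
indexed by cpu id needs mp_maxid + 1 slots, not mp_ncpus.

```c
#include <assert.h>
#include <stdlib.h>

static int mp_ncpus;	/* number of CPUs found */
static int mp_maxid;	/* largest CPU id found */

/* Hypothetical probe: record the CPU count and the largest id seen. */
static void
probe_cpus(const int *ids, int n)
{
	int i;

	mp_ncpus = n;
	mp_maxid = 0;
	for (i = 0; i < n; i++)
		if (ids[i] > mp_maxid)
			mp_maxid = ids[i];
}

/* Allocate one zeroed counter per possible cpu id (mp_maxid + 1 slots). */
static long *
alloc_percpu(void)
{
	assert(mp_maxid >= 0);
	return (calloc((size_t)mp_maxid + 1, sizeof(long)));
}
```

With ids {0, 2, 5}, mp_ncpus is 3 but the array must hold 6 entries so that
index 5 is valid.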

Reviewed by: jake
Approved by: jake


91640 04-Mar-2002 iwasaki

Add generalized power profile code.
This lets other power-management systems (APM for now) generate
power profile change events (i.e. AC-line status changes), so that
other kernel components, not only the ACPI components, can be notified
of the events.

- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add a call-back function for Crusoe LongRun control on power
profile changes, as an example


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but I
misread Bruce's patch.

Requested by: bde


91473 28-Feb-2002 arr

- trap -> trap() in panic() string.
- Translate the message into some sort of understandable English.
- Fix a couple near-by style nits.

Submitted by: bde


91471 28-Feb-2002 silby

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Luckily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


91460 28-Feb-2002 peter

Fix warnings.. bootpc_init() and related.


91422 27-Feb-2002 arr

- Insert a space in the panic() string in order to show the message
more clearly.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91368 27-Feb-2002 peter

Re-fix a pointer/integer warning.


91367 27-Feb-2002 peter

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


91358 27-Feb-2002 peter

Bandaid for the Uniprocessor kernel exploding. This makes a UP kernel
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.


91353 27-Feb-2002 alfred

clarify panic message


91344 27-Feb-2002 peter

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


91341 27-Feb-2002 dillon

didn't quite undo the last reversion. This gets it.


91329 26-Feb-2002 dillon

revert compatibility fix temporarily (thought it would not break anything
leaving it in).


91328 26-Feb-2002 dillon

revert last commit temporarily due to whining on the lists.


91322 26-Feb-2002 dillon

Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.


91315 26-Feb-2002 dillon

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not affected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather than hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Because they are per-cpu
variables, it is not currently necessary to lock the bus cycles
modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.
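
The deferral scheme in the list above can be sketched in userland roughly
as follows. This is a simulation under stated assumptions, not the i386
vector code: interrupt_arrives(), unpend() and run_handler() are
hypothetical names, the variables are plain globals rather than per-cpu
data, and only the ipending mask is modelled.

```c
#include <assert.h>
#include <stdint.h>

static int td_critnest;		/* critical section nesting count */
static int int_pending;		/* boolean: something was deferred */
static uint32_t ipending;	/* mask of deferred interrupts */
static uint32_t handled;	/* demo only: which IRQs have run */

static void
run_handler(int irq)
{
	handled |= (uint32_t)1 << irq;
}

/* Vector entry: defer if inside a critical section, else run now. */
static void
interrupt_arrives(int irq)
{
	if (td_critnest > 0) {
		ipending |= (uint32_t)1 << irq;
		int_pending = 1;
		return;
	}
	run_handler(irq);
}

/* Run every deferred interrupt, lowest IRQ number first. */
static void
unpend(void)
{
	int irq;

	while (ipending != 0) {
		for (irq = 0; (ipending & ((uint32_t)1 << irq)) == 0; irq++)
			;
		ipending &= ~((uint32_t)1 << irq);
		run_handler(irq);
	}
	int_pending = 0;
}

static void
critical_enter(void)
{
	td_critnest++;
}

static void
critical_exit(void)
{
	assert(td_critnest > 0);
	if (--td_critnest == 0 && int_pending)
		unpend();
}
```

Note that critical_enter() and critical_exit() only touch the counter; the
cost of dealing with an interrupt is paid only when one actually arrives
inside the section.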

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather than defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


91262 26-Feb-2002 peter

Fix a warning. useracc() should take a const pointer argument.


91260 25-Feb-2002 peter

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen world
speedups of a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


91250 25-Feb-2002 peter

Tidy up some warnings


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


90998 20-Feb-2002 peter

Pass me the pointy hat please. Be sure to return a value in a non-void
function. I've been running with this buried in the mountains of compiler
output for about a month on my desktop.


90960 20-Feb-2002 cjc

Fix typos in some comments.

PR: i386/35114
Submitted by: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>


90947 20-Feb-2002 peter

Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t
etc.


90776 17-Feb-2002 deischen

Use struct __ucontext in prototypes and associated functions instead of
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.

While I'm here, also hide osigcontext types from userland; suggested
by bde.

Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>


90762 17-Feb-2002 nyan

- Split the routine to initialize a bus_space_handle into the separate
function.
- Only access a bus_space_handle if the resource type is SYS_RES_MEMORY or
SYS_RES_IOPORT.
- Add the bus_space_subregion supports.


90748 17-Feb-2002 julian

If the credential on an incoming thread is correct, don't bother
reacquiring it. In the same vein, don't bother dropping the thread cred
when going to userland. We are guaranteed to need it when we come back,
(which we are guaranteed to do).

Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)


90718 16-Feb-2002 bde

Don't leave garbage in parts of fpregs in the fxsr case. All callers
(procfs and ptrace) supply kernel stack garbage, so kernel context was
leaked to userland.

Reviewed by: des


90633 13-Feb-2002 bde

Don't confuse a struct with its first member. This fixes:
./@/i386/i386/machdep.c: In function `init386':
./@/i386/i386/machdep.c:1700: warning: assignment from incompatible pointer type


90590 12-Feb-2002 dwmalone

Add an option CPU_ATHLON_SSE_HACK which attempts to enable the SSE
feature bit on newer Athlon CPUs if the BIOS has forgotten to enable
it.

This patch was constructed using some info made available by John
Clemens at http://www.deater.net/john/PavilionN5430.html

Reviewed by: -audit
MFC after: 3 weeks


90589 12-Feb-2002 dwmalone

Move do_cpuid() from identcpu.c into cpufunc.h.


90562 12-Feb-2002 alc

Remove an unused (but initialized) variable from vmapbuf().


90515 11-Feb-2002 bde

Garbage-collect the "LOCORE" version of MPLOCKED.


90468 10-Feb-2002 kato

Cosmetic changes:
- Collected i486 identification codes in one place like
586 and 686.
- Merged two cases (0x470 and 0x490) for `Enhanced Am486DX4
Write-Back.'
- Replaced `unknown' with `Unknown'.

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


90425 09-Feb-2002 kato

Recognize VIA C3 Samuel 2.

MFC after: 3 days


90372 07-Feb-2002 peter

Attempt to patch up some style bugs introduced in the previous commit


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
This is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embedded there,
but this removes all the code that assumes so, in preparation for the next
commit, which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


90132 03-Feb-2002 bde

Use osigreturn(2) instead of sigreturn(2) plus broken magic for returning
from old signal handlers. This is simpler and faster, and fixes (new)
sigreturn(2) when %eip in the new signal context happens to match the
magic value (0x1d516). 0x1d516 is below the default ELF text section,
so this probably never broke anything in practice.

locore.s:
In addition, don't build the signal trampoline for old signal handlers
when it is not used.

alpha:
Not fixed, but seems to be even less broken in practice due to more
advanced magic. A false match occurs for register #32 in mc_regs[].
Since there is no hardware register #32, a false match is only possible
for direct calls to sigreturn(2) that happen to have the magic number
in the spare mc_regs[32] field.


90128 03-Feb-2002 bde

Improve the change in the previous commit: use a stub for osigreturn()
when it is not really used instead of unconditionalizing all of it.


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub for ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up ia64 stub, since there was no stub before.


89990 30-Jan-2002 bde

Backed out the main part of revs.1.14-16. Don't disable interrupts in
the packet transfer routines, since rev.1.468 of machdep.c does this
better. I'm surprised that disabling interrupts helped much. Disabling
them in the packet receive routine is too late.

Fixed some minor style bugs in rev.1.14.


89989 30-Jan-2002 bde

Backed out the last vestiges of rev.1.51. Don't enter a critical
region in Debugger(), since rev.1.468 of machdep.c does this better.
Other cosmetic backouts.


89988 30-Jan-2002 bde

Cleaned up the 0ldSiG magic check before removing it. Just use fuword()
to fetch the magic word instead of useracc() plus a direct access.
This is more efficient as well as simpler and less incorrect:
- it was inefficient because useracc() takes much longer than just
accessing the data using a correct access method, at least on i386's.
- it was incorrect because direct access is incorrect unless the address
has been mapped. This and nearby direct accesses are mostly handled
better for other arches because they have to be (direct accesses don't
work).
- using magic in sigreturn is still fundamentally broken because false
matches are possible. On i386's, a false match occurs when %eip in a
new signal context happens to equal the magic value. This is not
handled better for other arches.


89980 30-Jan-2002 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.

This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.


89631 22-Jan-2002 peter

List bit 18 (reserved, apparently present on thunderbird cpus)
and bit 19 (athlon XP/MP rev 0x662 and later) for amd_features.

Submitted by: dwcjr


89489 18-Jan-2002 peter

Avoid __func__ string concatenation


89466 17-Jan-2002 bde

Changed the type of pcb_flags from u_char to u_int and adjusted things.
This removes the only atomic operation on a char type in the entire
kernel.


89412 16-Jan-2002 peter

Change <b28> to HTT (Hyperthreading technology). If this flag is set then
cpuid with %eax=1 will return a logical cpu count in bits 16-23 of %ebx.
Bit 29 is actually 'TM' according to AP-485. This signifies the presence
of the thermal control circuit (which I believe can slow the clock down
to reduce core temperature).
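
As a rough illustration of the decoding described above, assuming the
feature flags come from %edx and the count from %ebx of cpuid with %eax=1.
The macro and function names here are chosen for this sketch and the
register values in demo_decode() are made up.

```c
#include <assert.h>
#include <stdint.h>

#define	CPUID_HTT	0x10000000u	/* bit 28: Hyperthreading technology */
#define	CPUID_TM	0x20000000u	/* bit 29: thermal control circuit */

/* Logical cpu count: bits 16-23 of %ebx, meaningful only if HTT is set. */
static unsigned
logical_cpus(uint32_t ebx, uint32_t edx)
{
	if ((edx & CPUID_HTT) == 0)
		return (1);
	return ((ebx >> 16) & 0xff);
}

/* Nonzero if the thermal control circuit flag is present. */
static int
has_tm(uint32_t edx)
{
	return ((edx & CPUID_TM) != 0);
}

/* Demo with made-up register values. */
static void
demo_decode(void)
{
	assert(logical_cpus(0x00020800u, CPUID_HTT) == 2);
	assert(logical_cpus(0x00020800u, 0) == 1);
}
```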


89410 16-Jan-2002 peter

Ensure that we set all the %cr0 bits to a known state for the AP's before
they make it through to userland. This should fix the p5-smp problem
without affecting the other cpus (eg: cyrix, see initcpu.c and the special
cache handling for these cpu types).


89195 10-Jan-2002 bde

Clear the single-step flag for signal handlers. This fixes bogus trace
traps on the first instruction of signal handlers.

In trap.c:syscall(), fake a trace trap if the single-step flag was set
on entry to the kernel, not if it will be set on exit from the kernel.
This fixes bogus trace traps after the last instruction of signal handlers.

gdb-4.18 (the version in FreeBSD) still has problems with the program in
the PR. These seem to be due to bugs in gdb and not in FreeBSD, and are
fixed in gdb-5.1 (the distribution version).

PR: 33262
Tested by: k Macy <kip_macy@yahoo.com>
MFC after: 1 day


89175 10-Jan-2002 deischen

Use a spare slot in the machine context for a flags word to indicate
whether the machine context is valid and whether the FPU state is
valid (saved).

Mark the machine context valid before copying it out when sending a
signal.

Approved by: -arch


88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex release and swi schedule,
respectively, when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


88838 03-Jan-2002 peter

Allow a specific setting for pv entries. This avoids the need to guess
(or calculate by hand) the effect of interactions between shpgperproc,
physical ram size, maxproc, maxdsiz, etc.


88744 31-Dec-2001 dillon

Grrr. The tlb code is strewn over 3 files and I misread it. Revert
the last change (it was a NOP), and remove the XXX comments that no longer
apply.


88742 31-Dec-2001 dillon

You know those 'XXX what about SMP' comments in pmap_kenter()? Well,
they were right. Fix both kenter() and kremove() for SMP by ensuring that
the tlb is flushed on other cpu's. This will directly solve random-corruption
panic issues in -stable when it is MFC'd. Better to be safe than sorry, we
can optimize this later.

Original Suspicion by: peter
Maybe MFC: immediately on re's permission


88719 30-Dec-2001 phk

GC an alternate trap_pfault() which has rotted away behind an "#ifdef notyet"
since 21-Mar-95 .


88322 20-Dec-2001 jhb

Introduce a standard name for the lock protecting an interrupt controller
and its associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.


88245 20-Dec-2001 peter

Replace a bunch of:
for (pv = TAILQ_FIRST(&m->md.pv_list);
pv;
pv = TAILQ_NEXT(pv, pv_list)) {
with:
TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {
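
For reference, a small userland sketch showing that the two loop forms
visit the same elements. The struct here is a hypothetical stand-in for a
pv entry, and demo_sum() exists only for this example; the macros are the
ones from <sys/queue.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a pv entry; only the list linkage matters. */
struct pv_entry {
	int pv_id;
	TAILQ_ENTRY(pv_entry) pv_list;
};
TAILQ_HEAD(pv_head, pv_entry);

/* Old style: open-coded traversal with TAILQ_FIRST/TAILQ_NEXT. */
static int
sum_open_coded(struct pv_head *head)
{
	struct pv_entry *pv;
	int sum = 0;

	for (pv = TAILQ_FIRST(head); pv; pv = TAILQ_NEXT(pv, pv_list))
		sum += pv->pv_id;
	return (sum);
}

/* New style: the same traversal via TAILQ_FOREACH. */
static int
sum_foreach(struct pv_head *head)
{
	struct pv_entry *pv;
	int sum = 0;

	TAILQ_FOREACH(pv, head, pv_list)
		sum += pv->pv_id;
	return (sum);
}

/* Build a list holding ids 1, 2, 3 and sum it with the chosen form. */
static int
demo_sum(int use_foreach)
{
	struct pv_head head;
	struct pv_entry e[3];
	int i;

	TAILQ_INIT(&head);
	for (i = 0; i < 3; i++) {
		e[i].pv_id = i + 1;
		TAILQ_INSERT_TAIL(&head, &e[i], pv_list);
	}
	assert(!TAILQ_EMPTY(&head));
	return (use_foreach ? sum_foreach(&head) : sum_open_coded(&head));
}
```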


88240 20-Dec-2001 peter

Fix some whitespace nits, and a minor error that I made in some unused
#ifdef DEBUG code (VM_MAXUSER_ADDRESS vs UPT_MAX_ADDRESS).


88146 18-Dec-2001 julian

In a couple of places, we recalculated addresses we already had in local
pointer variables.


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().
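
The MI wrapper behaviour described above can be sketched in userland like
this. It is a simulation under stated assumptions: the td_critnest field
name, the critical_t type and the intr_enabled flag are stand-ins for this
example, and the MD layer is reduced to toggling one variable.

```c
#include <assert.h>

typedef int critical_t;		/* hypothetical MD saved-state type */

struct thread {
	int		td_critnest;	/* per-thread nesting count */
	critical_t	td_savecrit;	/* state saved at nesting level 0 */
};

static int intr_enabled = 1;	/* simulated interrupt-enable state */
static struct thread thread0;
static struct thread *curthread = &thread0;

/* MD layer: block interrupts and hand back the previous state. */
static critical_t
cpu_critical_enter(void)
{
	critical_t saved = intr_enabled;

	intr_enabled = 0;
	return (saved);
}

/* MD layer: restore a previously saved state. */
static void
cpu_critical_exit(critical_t saved)
{
	intr_enabled = saved;
}

/* MI wrapper: save MD state only when entering from level 0. */
static void
critical_enter(void)
{
	struct thread *td = curthread;

	if (td->td_critnest == 0)
		td->td_savecrit = cpu_critical_enter();
	td->td_critnest++;
}

/* MI wrapper: restore MD state only when returning to level 0. */
static void
critical_exit(void)
{
	struct thread *td = curthread;

	assert(td->td_critnest > 0);
	if (--td->td_critnest == 0)
		cpu_critical_exit(td->td_savecrit);
}
```

Because the saved state lives in the thread, the wrappers take no
arguments and return nothing, and nested sections restore the MD state
exactly once.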

Tested on: i386, alpha


88085 17-Dec-2001 jhb

Small cleanups to the SMP code:
- Axe inlvtlb_ok as it was completely redundant with smp_active.
- Remove references to non-existent variable and non-existent file
in i386/include/smp.h.
- Don't perform initializations local to each CPU while holding the
ap boot lock on i386 while an AP bootstraps itself.
- Reorganize the AP startup code some to unify the latter half of the
functions to bring an AP up. Eventually this might be broken out into
a MI function in subr_smp.c.


87902 14-Dec-2001 luigi

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
Sure, some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


87721 12-Dec-2001 jhb

Axe an unneeded PCPU_SET(spinlocks, NULL) that I missed earlier.


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87637 11-Dec-2001 peter

Delete some leftover code from a bygone age. We don't have an array of
IdlePTDS anymore and don't do the PTD[MPPTDI] swapping etc.


87620 10-Dec-2001 guido

Add new boot flag to i386 boot: -p.
This flag adds a pausing utility. When run with -p, during the kernel
probing phase, the kernel will pause after each line of output.
This pausing can be ended with the '.' key, and is automatically
suspended when entering ddb.

This flag comes in handy on systems without a serial port that either hang
during booting or reset.
Reviewed by: (partly by jlemon)
MFC after: 1 week


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week
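
A minimal sketch of the auto-sizing idea, assuming one "user" per 2 MB of
physical memory and a 4 KB page size; both the scaling rule and the names
below are assumptions for illustration, and only the 32..512 range comes
from the log message:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Hypothetical helper: if maxusers was configured as 0, derive it
 * from the amount of physical memory and clamp it to [32, 512].
 * A non-zero configured value is used unchanged.
 */
int
autosize_maxusers(int configured, unsigned long physpages)
{
	unsigned long n;

	assert(configured >= 0);
	if (configured != 0)
		return (configured);
	/* Assumed scaling: one user per 2 MB of RAM. */
	n = physpages / (2UL * 1024 * 1024 / SKETCH_PAGE_SIZE);
	if (n < 32)
		n = 32;
	if (n > 512)
		n = 512;
	return ((int)n);
}
```

Under these assumptions a 256 MB machine (65536 pages) gets 128, while
very small and very large machines are pinned to 32 and 512.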


87122 30-Nov-2001 peter

cpuid bit 30 is 'IA64', for when you're running in i386 mode on an ia64
cpu. (This is for either userland apps running in i386 mode on an ia64
OS, or when the cpu is in i386 legacy mode running an i386 OS).


86486 17-Nov-2001 peter

Fix the non-KSTACK_GUARD case. It has been broken since the KSE
commit. ptek was not being initialized.


86485 17-Nov-2001 peter

Start bringing i386/pmap.c into line with cleanups that were done to
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track so
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.


86443 16-Nov-2001 peter

Oops, I accidentally merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


86439 16-Nov-2001 peter

Converge/fix some debug code (#if 0'ed on alpha, but whatever)
- use NPTEPG/NPDEPG instead of magic 1024 (important for PAE)
- use pt_entry_t instead of unsigned (important for PAE)
- use vm_offset_t instead of unsigned for va's (important for x86-64)
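
To see why the magic 1024 matters for PAE, here is a small sketch,
assuming a 4 KB page and computing the entries-per-page count from the
entry size (the helper name is hypothetical; the kernel uses the
NPTEPG/NPDEPG macros):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Number of page-table entries that fit in one page for a given
 * entry size in bytes.
 */
unsigned
entries_per_page(unsigned entry_size)
{
	assert(entry_size != 0 && SKETCH_PAGE_SIZE % entry_size == 0);
	return (SKETCH_PAGE_SIZE / entry_size);
}
```

With 32-bit entries this yields 1024, but with 64-bit (PAE-style)
entries it yields 512, so a hard-coded 1024 would walk off the end of
the page.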


86408 15-Nov-2001 jhb

- Don't enable interrupts in trap() if we trapped while holding a spin
lock as this usually makes the problem worse.
- If we get a page fault while holding a spin lock, treat it as a fatal
trap and don't even bother calling into the VM since calling into the
VM will panic when trying to lock Giant before we can get a useful
message anyways.


85835 01-Nov-2001 iwasaki

Some fixes for the recent apm module changes.
- The apm loadable module can now inform other kernel components of its
existence (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
- Exchange priority of SI_SUB_CPU and SI_SUB_KLD for the above purpose.
- Add a simple arbitration mechanism for APM vs. ACPI. This prevents
the kernel from enabling both of them.
- Remove obsolete `#ifdef DEV_APM' related code.
- Add an abstracted interface for power management operations. Public apm(4)
functions, such as apm_suspend(), should be replaced with new interfaces.
Currently only power_pm_suspend (successor of apm_suspend) is implemented.

Reviewed by: peter, arch@ and audit@


85806 01-Nov-2001 peter

Skip PG_UNMANAGED pages when we're shooting everything down to try and
reclaim pv_entries. PG_UNMANAGED pages don't have pv_entries to reclaim.

Reported by: David Xu <davidx@viasoft.com.cn>


85793 31-Oct-2001 mjacob

Remove previous revision. smp_started back in subr_smp where it belongs.


85788 31-Oct-2001 mjacob

Make the actual volatile int smp_started live *somewhere*. This is
a temporary fix so that we can compile kernels. I waited 30 minutes
for a response from the person who would likely know, but any longer
is too long to wait with breakage at ToT.


85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


85761 31-Oct-2001 msmith

Don't try to probe the PnP BIOS if ACPI is active.


85711 30-Oct-2001 jhb

Fix a typo in comment and #ifdef fixes: GRAP_PRIO -> GRAB_PRIO so that
x86 SMP kernels actually boot again to single user mode.

Pointy hat to: jhb
Noticed by: jlemon


85695 29-Oct-2001 bde

Don't set CR0_NE in cpu_setregs() for the SMP case, since setting it
is npx.c's job and setting it here breaks the edit-time option of not
setting it in npx.c. (It is not set in the right places for the SMP
case, but always setting it here is harmless because there isn't even
an edit-time option to not set it.)


85627 28-Oct-2001 jhb

- More whitespace and comment cleanups.
- Remove unused sw1a label. A breakpoint can be set in choosethread() for
the same effect.

Reviewed by: bde
Submitted by: bde (partly)


85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


85491 25-Oct-2001 jhb

Currently no code does a CROSSJUMP() to sw1a, so we don't need a
CROSSJUMPTARGET() for it.

Submitted by: bde


85490 25-Oct-2001 jhb

Use %ecx instead of %ebx for the scratch register while updating %dr7 since
%ecx isn't a call safe register and thus we don't have to save and restore
it.

Submitted by: bde


85489 25-Oct-2001 jhb

- Fix typo in comment from previous revision.
- Fix a bug in the LDT changes where the wrong argument was passed to
set_user_ldt() from cpu_switch(). The bug was passing a pointer to the
ldt, but set_user_ldt() takes a pointer to the process' mdproc structure.

Submitted by: bde


85487 25-Oct-2001 jhb

Whitespace, comment, and string fixes.

Submitted by: bde (mostly)


85449 25-Oct-2001 jhb

Split the per-process Local Descriptor Table out of the PCB and into
struct mdproc.

Submitted by: Andrew R. Reiter <arr@watson.org>
Silence on: -current


85419 24-Oct-2001 jhb

- Clean up the comments slightly here to make them more readable.
- Set the type and trapframe number for the F00F workaround since type
can be used later by sv_transtrap(). Debuggers might also want to look
at the type in the trapframe.

Submitted by: bde (mostly)


85384 23-Oct-2001 jhb

Set the code and signal for the F00F hack fault directly instead of
changing the code in the trapframe and looping back to the top of trap
again.

Tested by: cjc


85373 23-Oct-2001 jlemon

Implement multiple low-level console support.


85271 21-Oct-2001 bde

MFi386:
- sys/pc98/pc98/npx.c 1.87 (2001/09/15; author: imp)
I don't think pc98 has acpi at all, so ifdef the acpi attachments for
now.

This completes merging sys/pc98/pc98/npx.c into sys/i386/isa/npx.c so
that the former can be removed.


85270 21-Oct-2001 bde

MFpc98: fundamental differences. The magic numbers for the i/o port
and the irq are different for pc98, and are not very well handled (we
use a historical mess of hard-coded values, values from header files
and values from hints).


85268 21-Oct-2001 bde

MFpc98: all changes in sys/pc98/pc98/npx.c related to FPU_ERROR_BROKEN.

- 1.58 (2000/09/01; author: kato)
Fixed FPU_ERROR_BROKEN code. It had old-isa code.
- 1.33 (1998/03/09; author: kato)
Make FPU_ERROR_BROKEN a new-style option.
- 1.7 (1996/10/09; author: asami)
Make sure FPU is recognized for non-Intel CPUs.

The log for rev.1.7 should have said something like:
Added FPU_ERROR_BROKEN option. This forces a successful probe for
exception 16, so that hardware with a broken FPU error signal can sort
of work.


85029 16-Oct-2001 bde

Deleted most of npxprobe(), and merged npxprobe1() back into npxprobe().
Use the normal interrupt handler (npx_intr()) instead of a special
probe-time interrupt handler, although this causes problems due to
the bus_teardown_intr() not actually even tearing down the interrupt
(these problems were avoided by doing interrupt attachment for the
special interrupt handler directly). Fixed minor bitrot in comments.

The reason for the npxprobe()/npxprobe1() split mostly went away at
about the same time it was made (in 1992 or 1993 just before the
beginning of history). 386BSD ran all probes with interrupts completely
masked, and I didn't want to disturb this when I added an irq probe
to npxprobe(). An irq (not necessarily npx) must be acked for at least
external npx's to take the cpu out of the wait state that it enters
when an npx error occurs, so the probe must be done with a suitable
irq unmasked. npxprobe() went to great lengths to unmask precisely
the npx irq.

Running probes with all interrupts masked was never really needed in
FreeBSD, since FreeBSD always masked interrupts well enough using
splhigh(), but it wasn't until rev.1.48 (1995/12/12) of autoconf.c
that all probes were run with CPU interrupts enabled. This permits
npxprobe() to probe its irq using normal interrupt resources. Note
that most drivers still can't depend on this. It depends on the
interrupt handler being fast and the irq not being shared.


85028 16-Oct-2001 bde

Commit my old fixes for cosmetic bugs in npxprobe() so that they aren't
lost when the buggy code goes away completely:
- don't assume that the npx irq number is >= 8. Rev.1.73 only reversed
part of the hard-coding of it to 13 in rev.1.66.
- backed out the part of rev.1.84 that added a highly confused comment
about an enable_intr() being "highly bogus". The whole reason for
existence of npxprobe() (separate from the main probe, npxprobe1())
is to handle the complications to make this enable_intr() safe.
- backed out the part of rev.1.94 that modified npxprobe(). It mainly
broke the enable_intr() to restore_intr(). Restoring the interrupt
state in a nested way is precisely what is not wanted here. It was
harmless in practice because npxprobe() is called with interrupts
enabled, so restoring the interrupt state enables interrupts. Most
of npxprobe() is a no-op for the same reason...


85009 15-Oct-2001 tegge

Explicitly initialize the fpu when SSE is enabled since this no
longer happens as a side effect of calling npxsave.

Reviewed by: peter, bde


84935 14-Oct-2001 tegge

Change vmapbuf() to use pmap_qenter() and vunmapbuf() to use pmap_qremove().

This significantly reduces the number of TLB shootdowns caused by
vmapbuf/vunmapbuf when performing many large reads from raw disk devices.

Reviewed by: dillon


84934 14-Oct-2001 tegge

Reduce the number of TLB shootdowns caused by a call to pmap_qenter()
from number of pages mapped to 1.

Reviewed by: dillon


84850 12-Oct-2001 jdp

Correct the input/output/clobber specifications for the cpuid
instruction. Stefan Keller <dres@earth.serd.org> noticed that CPU
identification was broken when compiled with -O2, and tracked it
down to the asm statement, which was storing values into memory
without specifying that memory was modified. He submitted a patch
which added "memory" as a clobber, but I refined it further to
arrive at this version.

MFC after: 3 days
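
For illustration, one way to write a cpuid wrapper so the compiler knows
exactly what the instruction writes is to use register output
constraints. This is a sketch using GCC-style extended asm, not
necessarily the form that was committed; the function name is
hypothetical, and on non-x86 targets it just reports "unsupported":

```c
#include <assert.h>

/*
 * Execute cpuid for the given leaf and store eax, ebx, ecx, edx in
 * regs[0..3].  All four registers are declared as outputs, so the
 * optimizer cannot assume their old values survive the instruction.
 * Returns 1 if cpuid was executed, 0 on unsupported targets.
 */
int
cpuid_sketch(unsigned leaf, unsigned regs[4])
{
	assert(regs != 0);
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
	__asm__ __volatile__("cpuid"
	    : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
	    : "a" (leaf), "c" (0));
	return (1);
#else
	(void)leaf;
	return (0);
#endif
}
```

If the asm instead stores through a pointer inside the template, a
"memory" clobber (or a memory output operand) is needed, which is the
kind of omission the log message describes.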


84815 11-Oct-2001 jhb

Oops, these already included sys/lock.h, they just did so after
sys/mutex.h which is too late.


84812 11-Oct-2001 jhb

Add missing includes of sys/ktr.h.


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84733 09-Oct-2001 iedowse

Remove the Xresume* labels from the i386 interrupt handlers; the
code in ipl.s and icu_ipl.s that used them was removed when the
interrupt thread system was committed. Debuggers also knew about
Xresume* because these labels hide the real names of the interrupt
handlers (Xintr*), and debuggers need to special-case interrupt
handlers to get the interrupt frame.

Both gdb and ddb will now use the Xintr* and Xfastintr* symbols to
detect interrupt frames. Fast interrupt frames were never identified
correctly before, so this fixes the problem of the running stack
frame getting lost in a ddb or gdb trace generated from a fast
interrupt - e.g. when debugging a simple infinite loop in the kernel
using a serial console, the frame containing the loop would never
appear in a gdb or ddb trace.

Reviewed by: jhb, bde


84721 09-Oct-2001 robert

Remove an unneeded variable declaration and statement.

Approved by: jake


84615 07-Oct-2001 nyan

Rewrite the pc98 bus_space stuff.

The type of bus_space_tag_t is now a pointer to bus_space_tag structure,
and the bus_space_tag structure saves pointers to functions for direct
access and relocate access.

Added bsh_bam member to the bus_space_handle structure, it saves access
method either direct access or relocate access which is called by
bus_space_* functions.

Added the mecia device support. If the bs_da and bs_ra in bus tag are set
NEPC_io_space_tag and NEPC_mem_space_tag respectively, new bus_space stuff
changes the register of mecia automatically for 16bit access.

Obtained from: NetBSD/pc98


84572 06-Oct-2001 peter

Fix a warning. (unused p if not INVARIANTS)


84553 05-Oct-2001 dfr

In in_cksumdata, len must be a signed type.
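
The actual in_cksumdata loop is not shown in this log; as a sketch of
why signedness matters, assume a countdown loop of the common
"subtract, then test for >= 0" shape (function names are hypothetical):

```c
#include <assert.h>

/*
 * Count how many 4-byte words a countdown loop consumes.  The loop
 * relies on len becoming negative to terminate, so len must be signed.
 */
int
words_consumed(int len)
{
	int n = 0;

	assert(len >= 0);
	while ((len -= 4) >= 0)
		n++;
	return (n);
}

/*
 * The same subtraction on an unsigned value wraps around to a huge
 * positive number instead of going negative.
 */
unsigned
unsigned_after_subtract(unsigned len)
{
	return (len - 4u);
}
```

With an unsigned len the ">= 0" test is always true, so a loop of this
shape would keep running past the end of the buffer.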


84381 02-Oct-2001 mjacob

Fix problem where a user buffer outside of the area being tested
will be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


83972 26-Sep-2001 rwatson

o Modify i386_set_ioperm() to use securelevel_gt() instead of
direct securelevel variable checks.

Obtained from: TrustedBSD Project


83971 26-Sep-2001 rwatson

o Modify device open access control for /dev/mem and friends to use
securelevel_gt() instead of direct securelevel variable checks.

Obtained from: TrustedBSD Project


83660 19-Sep-2001 peter

Reserve an extra 16 bytes in case we have to grow the trapframe into
a vm86trapframe for switching to vm86 [unlikely] while exiting.
I lost this when doing the pcb move that went in with the KSE commit.

Reviewed by: jake


83659 19-Sep-2001 peter

Fix a mistake I made with the pcb movement relative to the stack in the
KSE patch. We need to leave the 16 bytes here for enabling the trapframe
to be converted to a vm86trapframe if we're switching *to* a vm86 context.


83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


83640 18-Sep-2001 jhb

Whitespace fixes.


83428 14-Sep-2001 imp

s/thread'/thread's/


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the
process (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83275 10-Sep-2001 peter

gcc-3 has objections about the bluetrap6 and bluetrap13 inline asm
functions. Apparently multi-line string asm arguments are deprecated.


83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


83163 06-Sep-2001 jhb

Call sendsig() with the proc lock held and return with it held.


83093 05-Sep-2001 jlemon

Remove superfluous statement.


83051 05-Sep-2001 yokota

Rework the ISA PnP driver pnp and the PnP resource parser to fix
the following bugs.

- When constructing a resource configuration, respect the order
in which resource descriptors are read, in order to establish
the correct mapping between the descriptors and configuration
registers.
"Plug and Play ISA Specification, Version 1.0a", Sec 4.6.1, May 5,
1994. "Clarifications to the Plug and Play ISA Specification,
Version 1.0a", Sec 6.2.1, Dec. 10, 1994.

- Do not ignore null (empty) descriptors; they are valid descriptors
acting as filler.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a",
Sec 6.2.1.

- Correctly set up logical device configuration registers for null
resources.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a"

- Handle null resources properly in the resource allocator for the
ISA bus.


82971 04-Sep-2001 iwasaki

Reenable RTC interrupts after wakeup. Some laptops have a problem
with system statistics monitoring tools (such as systat, vmstat...)
because RTC interrupt generation stops.
Restore all the timers (RTC and i8254) atomically.

Reviewed by: bde
MFC after: 1 week


82957 04-Sep-2001 peter

Mostly cosmetic. Move various variables from .s files to .c files so that
gdb generates debug info for them.


82939 04-Sep-2001 peter

Zap #if 0'ed map init code that got moved to the MI area.
Convert the powerpc tree to use the common code.


82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


82624 31-Aug-2001 peter

Do a style cleanup pass for the pmap_{new,dispose,etc}_proc() functions
to get them closer to the KSE tree. I will do the other $machine/pmap.c
files shortly.


82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin MP-safe procedures with the comment:
/*
* MPSAFE
*/
This comment means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


82555 30-Aug-2001 msmith

Add ACPI attachments.


82394 27-Aug-2001 peter

There is nothing more embarrassing than having three goes at correcting
typos in the same paragraph. s/in in/in/

Submitted by: iedowse


82393 27-Aug-2001 peter

Enable hardwiring of things like tunables from embedded environments
that do not start from loader(8).


82366 26-Aug-2001 peter

I missed a typo in the last commit: s/whach/which/

Submitted by: bde


82310 25-Aug-2001 julian

Add another comment.
check for 'teh's this time..


82309 25-Aug-2001 peter

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


82308 25-Aug-2001 peter

s/teh/the/


82307 25-Aug-2001 julian

Add an explanatory note that would have saved me an hour or two
of confusion had it been there when I started reading the code..


82279 24-Aug-2001 jhb

Remove references to the old giant kernel lock in various comments.


82262 24-Aug-2001 peter

Export the actual KERNBASE to the symbol table. We can use nlist() to get
this without having to second guess it in userland.


82261 24-Aug-2001 peter

Move cpu_fxsr definition to C code (so debug info is generated) and where
it is easily #ifdef'ed so that we don't miss unintentional references to it.


82165 23-Aug-2001 peter

Fix a comment error that was fixed in the pc98 version. hw.maxmem is
really hw.physmem.


82157 23-Aug-2001 peter

Don't add UPAGES to the %cs segment limit. There is nothing there except
page tables.


82154 23-Aug-2001 peter

Don't compile in SSE fxsave/fxrstor instructions if CPU_ENABLE_SSE isn't
active.


82140 22-Aug-2001 iwasaki

Move CR4.PGE enabling code after paging is enabled via CR0.PG based on
the description (2.5. CONTROL REGISTERS) of Intel developer's manual at:
ftp://download.intel.com/design/PentiumII/manuals/24319202.pdf

Reviewed by: peter, bde, tlambert2@mindspring.com
Pointed-out by: "Shin'ya Kumabuchi" <kumabu@t3.rim.or.jp>
MFC after: 1 week


82127 22-Aug-2001 dillon

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independent area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


82121 22-Aug-2001 peter

Introduce two new sysctl's.. vm.kvm_size and vm.kvm_free. These are
purely informational and can give some advance indications of tuning
problems. These are i386 only for now as it seems that the i386 is
the only one suffering kvm pressure.


82118 21-Aug-2001 jhb

Push down Giant some in trap_pfault() so we don't grab Giant around
trap_fatal() to make restarting from panics slightly easier. Before, if
one did 'w 0 0' in ddb, the longjmp in ddb inside of trap_fatal() would
result in Giant being held (or recursed one level deeper) which led to
problems later on. You can now drop to the debugger, do 'w 0 0', and
continue w/o a problem.


82031 21-Aug-2001 dillon

Fix bug in physmem_est calculation - the kernel_map size was not being
converted into pages.

Fix bug in maxbcache calculation, nbuf must be tested against maxbcache
rather than physmem_est.

Obtained from: bde
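
The units problem in the first fix can be sketched as follows, assuming
the estimate is the smaller of physical memory and kernel_map size, both
expressed in pages, with a 4 KB page (the names and the page size are
assumptions, not the machdep.c code):

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Hypothetical estimate: take the smaller of physical memory and
 * kernel virtual memory.  physpages is already in pages; kvm_bytes
 * is in bytes and must be converted before the two are compared.
 */
unsigned long
physmem_est_pages(unsigned long physpages, unsigned long kvm_bytes)
{
	unsigned long kvm_pages, est;

	kvm_pages = kvm_bytes / SKETCH_PAGE_SIZE;
	est = (physpages < kvm_pages) ? physpages : kvm_pages;
	assert(est <= physpages);
	return (est);
}
```

Comparing pages against a raw byte count would almost always pick the
physical page count, so the KVM limit would never take effect.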


82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


81933 20-Aug-2001 dillon

Limit the amount of KVM reserved for the buffer cache and for swap-meta
information. The default limits only affect machines with > 1GB of ram
and can be overridden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.


81879 18-Aug-2001 peter

There is nothing special that requires SSE to be only on 686 class cpus.
This enables 586-only SMP kernels to compile again.

Problem reported by: Jacek Jedrzejczak <jacol@ids.gda.pl>


81711 15-Aug-2001 wpaul

Teach bus_dmamem_free() about contigfree(). This is a bit of a hack,
but it's better than the buggy behavior we have now. If we contigmalloc()
buffers in bus_dmamem_alloc(), then we must contigfree() them in
bus_dmamem_free(). Trying to free() them is wrong, and will cause
a panic (at least, it does on the alpha.)

I tripped over this when trying to kldunload my busdma-ified if_rl
driver.


81704 15-Aug-2001 jhb

Whitespace fixes to make this mostly fit in 80 columns.


81584 13-Aug-2001 bde

Use interrupt gates instead of trap gates for breakpoint and trace
traps, so that ddb can keep control (almost) no matter how it is
entered. This breaks time-critical interrupts while the system is
stopped in ddb, but I haven't noticed any significant problems except
that applications become confused about the time. Lost time will be
adjusted for later. Anyway, the half-baked disabling of interrupts in
Debugger() gives the same problems for the usual way of entering ddb.


81583 13-Aug-2001 bde

Removed the BPTTRAP() macro and its use. It was intended for restoring
bug for bug compatibility to ddb trap handlers after fixing the debugger
trap gates to be interrupt gates, but the fix was never committed. Now
I want the fix to apply to ddb.


81544 12-Aug-2001 iwasaki

Fix some trivial bugs.
- fix segment limit mis-calculation for GCODE_SEL, GDATA_SEL, GPRIV_SEL,
LUCODE_SEL and LUDATA_SEL.
- move `loader(8) metadata' related printf() after cninit().
- use atop macro (address to pages) for segment limit calculation
instead of i386_btop macro (bytes to pages).
- fix style bugs for the declarations of ints.

Reviewed by: bde, msmith (and arch & audit ML)


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


80431 27-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


80426 26-Jul-2001 peter

MASK_FPU_SW didn't do what it was expected to do.


80421 26-Jul-2001 peter

Call the early tunable setup functions as soon as kern_envp is available.
Some things depend on hz being set not long after this.


80399 26-Jul-2001 bmilekic

- Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.

This allows us to properly boot with certain CPUs deactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob


79893 19-Jul-2001 bsd

swtch.s: During context save, use the correct bit mask for clearing
the non-reserved bits of dr7.

During context restore, load dr7 in such a way as to not
disturb reserved bits.

machdep.c: Don't explicitly disallow the setting of the reserved bits
in dr7 since we now keep from setting them when we load dr7
from the PCB.

This allows one to write back the dr7 value obtained from
the system without triggering an EINVAL (one of the
reserved bits always seems to be set after taking a trace
trap).

MFC after: 7 days
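
The "load dr7 without disturbing reserved bits" step can be sketched in
C as a masked merge. The mask value below (bits 10 to 15) is an
assumption for illustration, and the function name is hypothetical; the
real code lives in swtch.s:

```c
#include <assert.h>
#include <stdint.h>

#define DR7_RESERVED_SKETCH 0x0000fc00u	/* assumed reserved-bit mask */

/*
 * Combine the current hardware dr7 value with the value saved in the
 * PCB: reserved bits are kept from the hardware value, all other bits
 * come from the PCB.
 */
uint32_t
dr7_merge(uint32_t hw_dr7, uint32_t pcb_dr7)
{
	uint32_t merged;

	merged = (hw_dr7 & DR7_RESERVED_SKETCH) |
	    (pcb_dr7 & ~DR7_RESERVED_SKETCH);
	/* Reserved bits must be exactly what the hardware reported. */
	assert((merged & DR7_RESERVED_SKETCH) ==
	    (hw_dr7 & DR7_RESERVED_SKETCH));
	return (merged);
}
```

Because reserved bits in the saved value are simply ignored on load,
there is no longer a need to reject them with EINVAL when they are
written back.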


79885 19-Jul-2001 kris

Quiet a variable format-string warning.

MFC after: 1 week


79824 17-Jul-2001 tegge

The per-cpu temporary buffers are not needed since the pcb_save areas have
the proper alignment. Change dummy variable in npxinit from stack to bss
to ensure proper alignment.

Reviewed by: bde


79781 16-Jul-2001 tegge

Use PCPU_GET(cpuid) instead of curproc->p_oncpu.
Reviewed by: peter


79662 13-Jul-2001 sobomax

Unbroke kernel if I686_CPU is not defined.


79628 12-Jul-2001 peter

Fix another missed pcb_savefpu reference (inside NPX_DEBUG)


79623 12-Jul-2001 peter

Forgot this fix from another tree. make enable_sse() a real prototype.


79611 12-Jul-2001 peter

Move init_sse() out of the "GenuineIntel" section, my AthlonMP system
has it, for example, and it works fine.


79609 12-Jul-2001 peter

Activate SSE/SIMD. This is the extra context switching support that
we are required to do if we let user processes use the extra 128 bit
registers etc.

This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html

I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.


79573 11-Jul-2001 bsd

Add 'hwatch' and 'dhwatch' ddb commands analogous to 'watch' and
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.

No objection by: audit@, hackers@

MFC after: 2 weeks


79418 08-Jul-2001 julian

A set of changes to reduce the number of include files the kernel
takes from /usr/include. I cannot check them on alpha.. (will try beast)

Briefly looked at by: Warner Losh <imp@harmony.village.org>


79265 05-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependent sections to a
machine-independent source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and acquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


79153 03-Jul-2001 tmm

Make the code to read the kernel message buffer via sysctl machine-
independent and rename the corresponding sysctls from machdep.msgbuf and
machdep.msgbuf_clear (i386 only) to kern.msgbuf and kern.msgbuf_clear.


79137 03-Jul-2001 iwasaki

Add Transmeta Crusoe LongRun support.

Submitted by: Tamotsu HATTORI <athlete@kta.att.ne.jp>
Reviewed by: arch@ folks
MFC after: 1 week


79124 03-Jul-2001 jhb

Quiet warning by removing ast() prototype.

Forgotten by: jhb (me)


79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78981 29-Jun-2001 imp

Remove cruft from old bus.


78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


78946 29-Jun-2001 jhb

Grab Giant around trap_pfault() for now.


78908 28-Jun-2001 jhb

Get kernel profiling on SMP systems closer to working by replacing the
mcount spin mutex with a very simple non-recursive spinlock implemented
using atomic operations.
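
A very simple non-recursive spinlock built on atomic operations might look like the sketch below. It uses C11 <stdatomic.h> rather than the kernel's own atomic primitives, and the type and function names are hypothetical, so treat it as an illustration of the idea rather than the mcount code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: a single atomic flag, no owner, no recursion. */
struct simple_spinlock {
	atomic_flag locked;
};

#define SIMPLE_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was acquired. */
static bool
simple_spin_trylock(struct simple_spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
	    memory_order_acquire);
}

/* Spin until the flag is observed clear and we set it. */
static void
simple_spin_lock(struct simple_spinlock *l)
{
	while (!simple_spin_trylock(l))
		;	/* busy-wait */
}

static void
simple_spin_unlock(struct simple_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Since the lock records no owner, a second acquire from the same context would spin forever, which is what "non-recursive" means here.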


78903 28-Jun-2001 bsd

Provide access to the IA32 hardware debug registers from the ddb
kernel debugger. Proper use of these registers allows setting
hardware watchpoints for use in kernel debugging.

MFC after: 2 weeks


78798 26-Jun-2001 kato

Recognize FC-PGA2 Pentium III (Tualatin).


78760 25-Jun-2001 dfr

Add code to detect Transmeta Crusoe cpus.


78636 22-Jun-2001 jhb

- Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by: tegge (signal race), bde (addupc_task a while back)


78631 22-Jun-2001 peter

Make the hw.physmem and hw.usermem variables unsigned so that they don't
come up as negative on machines with >2GB ram.


78427 18-Jun-2001 jhb

Initialize mutexes needed early on all in the same place so that the
startup routine more closely matches that of alpha and ia64. At some
point the common mutexes shared across all platforms probably should move
into sys/kern_mutex.c.


78426 18-Jun-2001 jhb

- Add support for decoding syscall names. (Brought over from the new alpha
trace code that was brought over from NetBSD.)
- Check for "syscall_with_err_pushed" as the label prior to a syscall trap
frame rather than "Xlcall_syscall" and "Xint0x80_syscall". We don't
have a valid trapframe during the short range of code that those two
symbols now cover.
- Simplify db_next_frame() to avoid duplicating the code for the different
trap frame types.
- Don't try to trace a swapped-out process. (Brought over from NetBSD via
the new alpha trace code.)


78425 18-Jun-2001 jhb

Include sys/pcpu.h to get the prototype for globaldata_register() to quiet
a warning.


78260 15-Jun-2001 peter

Fix warnings:
908: warning: long unsigned int format, unsigned int arg (arg 3)
887: warning: `timezero' defined but not used


78135 12-Jun-2001 peter

Hints overhaul:
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.


77796 06-Jun-2001 jhb

Don't hold sched_lock across addupc_task().

Reported by: David Taylor <davidt@yadt.co.uk>
Submitted by: bde


77582 01-Jun-2001 tmm

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by: bde
Reviewed by: bde (some time ago)
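
Deriving the table length from the end marker instead of a separate count can be sketched like this. In the kernel the two symbols come from machine-dependent code; here they are ordinary C objects, and NINTR and the helper names are assumptions made for the example.

```c
#include <stddef.h>

#define NINTR 4	/* hypothetical table size, only for this sketch */

static unsigned long intrcnt[NINTR];
/* End marker: points one past the last counter. */
static unsigned long *const eintrcnt = intrcnt + NINTR;

/* Number of counters, computed without any INTRCNT_COUNT-style constant. */
static size_t
intrcnt_entries(void)
{
	return (size_t)(eintrcnt - intrcnt);
}

/* Size in bytes, as an export routine would need it. */
static size_t
intrcnt_bytes(void)
{
	return intrcnt_entries() * sizeof(intrcnt[0]);
}
```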


77502 30-May-2001 jhb

Quiet warnings by adding a prototype for set_user_ldt_rv() and making it
conditional on #ifdef SMP.


77486 30-May-2001 jhb

We can't grab the sched_lock in set_user_ldt() because when it is called
from cpu_switch(), curproc has been changed, but the sched_lock owner will
not be updated until we return to mi_switch(), thus we deadlock against
ourselves. As a workaround, push the acquire and release of sched_lock out
to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in
set_user_ldt for the same reason.

Sleuthing by: tmm
Tested by: tmm, dougb


77097 23-May-2001 jhb

Don't acquire Giant just to call trap_fatal(), we are about to panic
anyway so we'd rather see the printf's than block if the system is
hosed.


77082 23-May-2001 alfred

pmap_mapdev needs the vm_mtx; acquire it if not already locked.


77015 22-May-2001 bde

Convert npx interrupts into traps instead of vice versa. This is much
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).

Submitted by: original (pre-SMPng) version by luoqi


76947 22-May-2001 jhb

Remove a few more spl's I missed earlier.

Reported by: Michael Harnois <mdharnois@home.com>
Pointy hat: me


76941 21-May-2001 jhb

Sort includes.


76939 21-May-2001 jhb

Axe unneeded spl()'s.


76906 20-May-2001 bde

Throw away the complications in npxsave() and their infrastructure.
npxsave() went to great lengths to execute fnsave with interrupts
enabled in case executing it froze the CPU. This case can't happen,
at least for Intel CPU/NPX's. Spurious IRQ13's don't imply spurious
freezes. Anyway, the complications were usually no-ops because IRQ13
is not used on i486's and newer CPUs, and because SMPng broke them in
rev.1.84. Forcible enabling of interrupts was changed to
write_eflags(old_eflags), but since SMPng usually calls npxsave() from
cpu_switch() with interrupts disabled, write_eflags() usually just
kept interrupts disabled.


76905 20-May-2001 bde

Use a critical region to protect almost everything in npxinit().
npxinit() didn't have the usual race because it doesn't save to curpcb,
but it may have had a worse form of it since it uses the npx when it
doesn't "own" it. I'm not sure if locking prevented this. npxinit()
is normally called with the proc lock but not sched_lock.

Use a critical region to protect pushing of curproc's npx state to
curpcb in npxexit(). Not doing so was harmless since it at worst
saved a wrong state to a dying pcb.


76903 20-May-2001 bde

Use a critical region to protect saving of the npx state in savectx().
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worst reset the state.

savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76770 17-May-2001 jhb

- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT.
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
after the display of the copyright message instead of doing it by hand in
three MD places.


76650 15-May-2001 jhb

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


76546 13-May-2001 bde

Use a critical region to protect pushing of the parent's npx state to the
pcb for fork(). It was possible for the state to be saved twice when an
interrupt handler saved it concurrently. This corrupted (reset) the state
because fnsave has the (in)convenient side effect of doing an implicit
fninit. Mundane null pointer bugs were not possible, because we save to
an "arbitrary" process's pcb and not to the "right" place (npxproc).

Push the parent's %gs to the pcb for fork(). Changes to %gs before
fork() were not preserved in the child unless an accidental context
switch did the pushing. Updated the list of pcb contents which is
supposed to inhibit bugs like this. pcb_dr*, pcb_gs and pcb_ext were
missing. Copying is correct for pcb_dr*, and pcb_ext is already
handled specially (although XXX'ly).

Reducing the savectx() call to an npxsave() call in rev.1.80 was a
mistake. The above bugs are duplicated in many places, including in
savectx() itself.

The arbitrariness of the parent process pointer for the fork()
subroutines, the pcb pointer for savectx(), and the save87 pointer
for npxsave(), is illusory. These functions don't work "right" unless
the pointers are precisely curproc, curpcb, and the address of npxproc's
save87 area, respectively, although the special context in which they
are called allows savectx(&dumppcb) to sort of work and npxsave(&dummy)
to work. cpu_fork() just doesn't work unless the parent process
pointer is curproc, or the caller has pushed %gs to the pcb, or %gs
happens to already be in the pcb.


76525 12-May-2001 deischen

Revert part of last commit. Instead of using %fs for KSD/TSD, we'll
follow Linux' convention and use %gs. This adds back the setting of
%fs to a sane value in sendsig(). The value of %gs remains preserved
to whatever it was in user context.


76494 11-May-2001 jhb

Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.


76440 10-May-2001 jhb

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake


76434 10-May-2001 jhb

- Use sched_lock and critical regions to ensure that LDT updates are thread
safe from preemption and concurrent access to the LDT.
- Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.

Reviewed by: silence on -hackers


76298 06-May-2001 deischen

When setting up the frame to invoke a signal handler, preserve the
%fs and %gs registers instead of setting them to known sane values.
%fs is going to be used for thread/KSE specific data by the new
threads library; we'll want it to be valid inside of signal handlers.

According to bde, Linux preserves the state of %fs and %gs when setting
up signal handlers, so there is precedent for doing this.

The same changes should be made in the Linux emulator, but when made,
they seem to break (at least one version of) the IBM JDK for Linux
(reported by drew).

Approved by: bde


76205 02-May-2001 bde

Fixed panics in npx exception handling. When using IRQ13 exception
handling, SMPng always switches the npx context away from curproc
before calling the handler, so the handler always panicked. When using
exception 16 exception handling, SMPng sometimes switches the npx
context away from curproc before calling the handler, so the handler
sometimes panicked. Also, we didn't lock the context while using it,
so we sometimes didn't detect the switch and then panicked in a less
controlled way.

Just lock the context while using it, and return without doing anything
except clearing the busy latch if the context is not for curproc. This
fixes the exception 16 case and makes the IRQ13 case harmless. In both
cases, the instruction that caused the exception is restarted and the
exception repeats. In the exception 16 case, we soon get an exception
that can be handled without doing anything special. In the IRQ13 case,
we get an easy to kill hung process.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


76089 28-Apr-2001 jhb

Add in a missing call to forward_hardclock() in the SMP case.

Submitted by: bde


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delivered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwarded_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


76031 26-Apr-2001 jake

Remove a leading underscore that prevented I386_CPU kernels from
compiling.

Submitted by: Alexander N. Kabaev <ak03@gte.com>
PR: kern/26858


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


75724 20-Apr-2001 jhb

Make the ap_boot_mtx mutex static.


75723 20-Apr-2001 jhb

Split up the db_printf's for 'show pcpu' so that we only output at most one
line for each db_printf(). Also, just use spaces to line the columns up
rather than trying to be fancy with tabs.


75570 17-Apr-2001 jhb

Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.
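
The single compare-and-set on a shared panic_cpu variable can be sketched as below, using C11 atomics in place of the kernel's atomic_cmpset(). The NOCPU value and the panic_claim() helper are assumptions for illustration; a real panic path would spin or halt where this sketch simply returns false.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NOCPU (-1)	/* assumed "no owner" value */

static atomic_int panic_cpu = NOCPU;

/*
 * Returns true if cpuid may proceed with the panic: either it won the
 * compare-and-set, or it already owns panic_cpu (a recursive panic on
 * the same CPU). Any other CPU gets false.
 */
static bool
panic_claim(int cpuid)
{
	int expected = NOCPU;

	if (atomic_compare_exchange_strong(&panic_cpu, &expected, cpuid))
		return true;
	/* On failure, expected holds the current owner. */
	return expected == cpuid;
}
```

No lock is taken at any point, so nothing in this path depends on lock-debugging code working correctly.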


75488 13-Apr-2001 jhb

People are still having problems with i586_* on UP machines and SMP
machines, so just hack it to disable them for now until it can be fixed.

Inspired by hair pulling of: asmodai


75421 11-Apr-2001 jhb

Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by: bde


75396 10-Apr-2001 jhb

Remove the old APIC I/O higher level IPI API in favor of the newer MI
API for IPI's that isn't tied to the Intel APIC. MD code can still use
the apic_ipi() function or dink with the apic directly if needed to send
MD IPI's.


75393 10-Apr-2001 jhb

Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by: peter (the idea)


75392 10-Apr-2001 jhb

Add an MI API for sending IPI's. I used the same API present on the alpha
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.


75357 09-Apr-2001 jhb

- One can now specify the decimal pid of a process to trace as a parameter.
Since pid's are not in the kernel address space, this doesn't conflict
with the functionality of specifying an arbitrary frame pointer to the
trace command.
- If the first function of a backtrace maps to fork_trampoline, then this
is a newly fork'd process that has not been executed yet, so just print
out the first frame and then return for that case.
- Lower the default count from 65535 to 1024. ddb doesn't trace into
userland, and if the stack gets hosed and starts looping it's less
annoying.


75274 06-Apr-2001 jhb

Add a new ddb command 'show pcpu' which lists some of the per-cpu data.
Specifically, the cpuid, curproc, curpcb, npxproc, and idleproc members.
Also, if witness is compiled into the kernel, then a list of all the spin
locks held by this CPU is displayed. By default the information for the
current CPU is displayed, but a decimal cpu id may be specified as a
parameter to obtain information on a specific CPU.


75256 06-Apr-2001 jhb

Axe the per-cpu variable witness_spin_check as it was replaced by the
per-cpu spinlocks list.


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatibility,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we no longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.


74903 28-Mar-2001 jhb

Switch from save/disable/restore_intr() to critical_enter/exit().


74902 28-Mar-2001 jhb

Catch up to the mtx_saveintr -> mtx_savecrit change.


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74738 24-Mar-2001 obrien

Fix a problem where we were switching npxproc from underneath processes
running in process context in order to run interrupt handlers. This
caused a big smashing of the stack on AMD K6, K5 and Intel Pentium (ie, P5)
processors because we are using npxproc as a flag to indicate whether
the state has been pushed onto the stack.

Submitted by: bde


74670 23-Mar-2001 tmm

Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and
hw.intrcnt).

Approved by: rwatson


74430 19-Mar-2001 des

Show the bzero() bandwidth in kBps instead of Bps; use u_int32_t instead
of long and int64_t; and print the result as an unsigned long. This should
make the output from the bzero() test more readable, and avoid printing a
negative bandwidth. Note that this doesn't change the decision process,
since that is based on time elapsed, not on computed bandwidth.


74283 15-Mar-2001 peter

Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>


73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


73931 07-Mar-2001 jhb

- Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


73903 07-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.
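
The calling convention described for pmap_map() in the first item can be sketched as follows. This is a toy model, not the real function: the direct_mapped flag, SKETCH_DIRECT_BASE, and the sketch_pmap_map() name are all hypothetical, and no page table entries are actually written.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t vm_offset_t;

/* Hypothetical base of a direct-mapped segment. */
#define SKETCH_DIRECT_BASE ((vm_offset_t)0x80000000u)

/*
 * Returns the address the physical range [start, end) ends up mapped at.
 * With a direct map, *virt is left alone and no KVA is consumed.
 * Without one, the range is placed at *virt and *virt is advanced to the
 * next available virtual address.
 */
static vm_offset_t
sketch_pmap_map(vm_offset_t *virt, vm_offset_t start, vm_offset_t end,
    bool direct_mapped)
{
	vm_offset_t mapped;

	if (direct_mapped)
		return SKETCH_DIRECT_BASE + start;
	mapped = *virt;
	*virt += end - start;	/* a real pmap would enter mappings here */
	return mapped;
}
```

Callers therefore use the return value as the mapped address and treat the in/out pointer only as the allocation cursor.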

Submitted by: dfr
Debugging help: peter
Tested by: me


73586 05-Mar-2001 jhb

Don't enable interrupts before calling sched_ithd for threaded interrupts.

Tested by: obrien


73017 25-Feb-2001 peter

Make the kernel actually compile and link under a.out, using
gcc -aout -mno-underscores. The bioscall.s tweak is not an a.out
requirement really, but to work around the bugs in the antique version of
gas that is used for a.out. Makefile hacks are all that is needed to
get an a.out kernel. There is no telling if it will work though.
This is little more than an academic curiosity anyway since all it is
good for is situations where the boot code is hard wired, eg: rom
bootstraps (such as the gnat box).

GENERIC:
...
size -aout kernel ; chmod 755 kernel
text data bss dec hex
3051520 368640 198688 3618848 373820


73011 25-Feb-2001 jake

Remove the leading underscore from all symbols defined in x86 asm
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.

Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.

moral support from: peter, jhb


73001 25-Feb-2001 jake

- Rename the lcall system call handler from Xsyscall to Xlcall_syscall
to be more like Xint0x80_syscall and less like c function syscall().
- Reduce code duplication between the int0x80 and lcall handlers by
shuffling the eflags into the right place, saving the sizeof the
instruction in tf_err and jumping into the common int0x80 code.

Reviewed by: peter


72930 23-Feb-2001 peter

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


72917 22-Feb-2001 jhb

The p_md.md_regs member of proc is used in signal handling to reference
the original trapframe of the syscall, trap, or interrupt that entered
the kernel. Before SMPng, ast's were handled via a pseudo trap at the
end of doreti. With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures. Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updated in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode. The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.


72911 22-Feb-2001 jhb

- Change ast() to take a pointer to a trapframe like other architectures.
- Don't use an atomic operation to update cnt.v_soft in ast(). This is
the only place the variable is written to, and sched_lock is always
held when it is written, so it is already protected and the mutex release
of sched_lock asserts a memory barrier that ensures the value will be
updated in a timely fashion.


72900 22-Feb-2001 jhb

- Use TRAPF_PC() on the alpha to access the PC in the trap frame.
- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().

Reported by: bde (2)


72746 20-Feb-2001 jhb

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.


72700 19-Feb-2001 bde

Removed all traces of T_ASTFLT (except for gaps where it was). It became
unused except in dead code when ast() was split off from trap().


72683 19-Feb-2001 bde

Changed the aston() family to operate on a specified process instead of
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).
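
A family of AST macros that operate on a specified process could look roughly like the sketch below. The p_sflag field and the PS_ASTPENDING and PS_NEEDRESCHED flag names appear elsewhere in this log; the flag values, the reduced struct proc, and the omission of sched_lock handling are assumptions made for the example.

```c
/* Reduced stand-in for the kernel's struct proc. */
struct proc {
	int p_sflag;
};

/* Flag values are hypothetical. */
#define PS_ASTPENDING	0x1
#define PS_NEEDRESCHED	0x2

/* Each macro takes the target process explicitly instead of using curproc. */
#define aston(p)		((p)->p_sflag |= PS_ASTPENDING)
#define astoff(p)		((p)->p_sflag &= ~PS_ASTPENDING)
#define astpending(p)		(((p)->p_sflag & PS_ASTPENDING) != 0)
#define need_resched(p)		((p)->p_sflag |= PS_NEEDRESCHED)
#define resched_wanted(p)	(((p)->p_sflag & PS_NEEDRESCHED) != 0)
#define clear_resched(p)	((p)->p_sflag &= ~PS_NEEDRESCHED)
```

Taking the process as an argument lets code running on behalf of one process post an AST to another, which the curproc-only form could not express.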

Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.


72678 19-Feb-2001 bde

Fixed style bugs in clock.c rev.1.164 and cpu.h rev.1.52-1.53 -- declare
tsc_present in the right places (together with other variables of the
same linkage), and don't use messy ifdefs just to avoid exporting it in
some cases.


72376 12-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propagated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propagation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propagate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at appropriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.
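
The unified run queue described above can be sketched roughly as follows. This is a minimal userland model, not the kernel code: the figure of 64 queues comes from the commit message, while the 256 priority levels, the 4-priorities-per-queue mapping, the status bitmap, and every toy_* name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RQ_NQS  64                    /* number of queues (from the commit) */
#define RQ_NPRI 256                   /* assumed number of priority levels */
#define RQ_PPQ  (RQ_NPRI / RQ_NQS)    /* assumed priorities per queue */

/* Hypothetical run queue: one array shared by all scheduling classes. */
struct toy_runq {
	uint64_t status;              /* bit i set => queue i is non-empty */
	int      count[RQ_NQS];       /* stand-in for the per-queue lists */
};

/* Map a priority level to its queue index. */
int
toy_runq_index(int pri)
{
	return (pri / RQ_PPQ);
}

/* The run queue is passed in, so a per-cpu queue could reuse this code. */
void
toy_runq_add(struct toy_runq *rq, int pri)
{
	int i = toy_runq_index(pri);

	rq->count[i]++;
	rq->status |= (uint64_t)1 << i;
}

/* Return the lowest-numbered non-empty queue, or -1 if all are empty. */
int
toy_runq_best(const struct toy_runq *rq)
{
	int i;

	for (i = 0; i < RQ_NQS; i++)
		if (rq->status & ((uint64_t)1 << i))
			return (i);
	return (-1);
}
```

In this model, changing the number of queues only means changing the constants, which is the property the commit message calls out.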


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72334 10-Feb-2001 jake

Clear the reschedule flag after finding it set in userret(). This
used to be in cpu_switch(), but I don't see any difference in
doing it here.


72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accessed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.
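
A rough illustration of per-process AST flags follows. The flag names PS_ASTPENDING and PS_NEEDRESCHED and the field name p_sflag come from the commit message; the bit values, the toy_proc structure, and the toy_* function names are assumptions, and the locking that protects p_sflag in the kernel is not modeled.

```c
#include <assert.h>

#define PS_ASTPENDING  0x0001   /* assumed bit value */
#define PS_NEEDRESCHED 0x0002   /* assumed bit value */

/* Hypothetical stand-in for struct proc; only the flag word is modeled. */
struct toy_proc {
	int p_sflag;
};

/* Post and clear an AST on a specific process rather than on a CPU. */
void toy_aston(struct toy_proc *p)  { p->p_sflag |= PS_ASTPENDING; }
void toy_astoff(struct toy_proc *p) { p->p_sflag &= ~PS_ASTPENDING; }

/* Check for a pending AST. */
int
toy_astpending(const struct toy_proc *p)
{
	return ((p->p_sflag & PS_ASTPENDING) != 0);
}

/* Request, query and clear a reschedule. */
void toy_need_resched(struct toy_proc *p)  { p->p_sflag |= PS_NEEDRESCHED; }
void toy_clear_resched(struct toy_proc *p) { p->p_sflag &= ~PS_NEEDRESCHED; }

int
toy_resched_wanted(const struct toy_proc *p)
{
	return ((p->p_sflag & PS_NEEDRESCHED) != 0);
}
```

Because the flags live in the process, they follow it if it is preempted and later resumed on another CPU.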


72240 09-Feb-2001 jhb

Catch up to changes to inthand_add().


72239 09-Feb-2001 jhb

Use the MI ithread helper functions in the x86 interrupt code.


72238 09-Feb-2001 jhb

- Catch up to the new swi API changes:
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).


72226 09-Feb-2001 jhb

Move the initialization of the proc lock for proc0 very early into the MD
startup code.


72225 09-Feb-2001 jhb

Woops, remove an obsolete reference to gd_cpu_lockid.


72220 09-Feb-2001 jhb

Remove unused forward_irq counters.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

Similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
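
The shape of the interface change can be modeled in userland as below. This is only a single-threaded toy that mirrors the naming, under the assumption that a lock's type is fixed at initialization; it performs no real locking, and all toy_* names and the `held` counter are hypothetical.

```c
#include <assert.h>

enum toy_mtx_kind { TOY_MTX_DEF, TOY_MTX_SPIN };

/* Hypothetical mutex: remembers its kind and a hold count, nothing more. */
struct toy_mtx {
	enum toy_mtx_kind kind;
	int held;
};

void
toy_mtx_init(struct toy_mtx *m, enum toy_mtx_kind kind)
{
	m->kind = kind;
	m->held = 0;
}

/* Old style: one entry point, with the type repeated at every call site. */
void
toy_mtx_enter(struct toy_mtx *m, enum toy_mtx_kind type)
{
	assert(m->kind == type);
	m->held++;
}

void
toy_mtx_exit(struct toy_mtx *m, enum toy_mtx_kind type)
{
	assert(m->kind == type);
	m->held--;
}

/* New style: the function name carries the type, so no extra argument. */
void toy_mtx_lock(struct toy_mtx *m)        { assert(m->kind == TOY_MTX_DEF);  m->held++; }
void toy_mtx_unlock(struct toy_mtx *m)      { assert(m->kind == TOY_MTX_DEF);  m->held--; }
void toy_mtx_lock_spin(struct toy_mtx *m)   { assert(m->kind == TOY_MTX_SPIN); m->held++; }
void toy_mtx_unlock_spin(struct toy_mtx *m) { assert(m->kind == TOY_MTX_SPIN); m->held--; }
```

A call site that used to read toy_mtx_enter(&m, TOY_MTX_DEF) becomes toy_mtx_lock(&m), which is the kind of mechanical conversion the commit applied across the tree.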


72148 08-Feb-2001 jhb

Don't enable interrupts for a kernel breakpoint or trace trap. Otherwise,
this negates the explicit disabling of interrupts when entering the
debugger in Debugger().


72145 07-Feb-2001 jhb

When SMPng was first committed, we removed 'cpl' from the interrupt
frame. Teach ddb about this as there is one less word for it to skip
over when finding a trapframe on the interrupt frame stack.


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

"Seperate" does not exist in the English language.


72011 04-Feb-2001 peter

Clean up some leftovers from the root mount cleanup that was done some
time ago. FFS_ROOT and CD9660_ROOT are obsolete.


71983 04-Feb-2001 dillon

This commit represents work mainly submitted by Tor and slightly modified
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundary case fix.

Make the buffer cache use a system map for buffer cache KVM rather than a
normal map.

Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundary case, hit when the buffer cache size limit is reached, that
could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge


71890 01-Feb-2001 jake

Implement preemptive scheduling of hardware interrupt threads.

- If possible, context switch to the thread directly in sched_ithd(),
rather than triggering a delayed ast reschedule.

- Disable interrupts while restoring fpu state in the trap handler,
in order to ensure that we are not preempted in the middle, which
could cause migration to another cpu.

Reviewed by: peter
Tested by: peter (alpha)


71880 31-Jan-2001 peter

Remove count for NSIO. The only places it was used were incorrect.
(alpha-gdbstub.c got sync'ed up a bit with the i386 version)


71818 30-Jan-2001 peter

Remove some leftovers from the CMAP* stuff in globaldata and the
BSP and AP startup.


71817 30-Jan-2001 peter

Remove unused GD_CPU_LOCKID, GD_OTHER_CPUS, PS_IDLESTACK and
PS_IDLESTACK_TOP


71816 30-Jan-2001 jhb

Remove unnecessary locking to protect the p_upages_obj and p_addr
pointers.


71797 29-Jan-2001 peter

Convert mca (microchannel bus support) from something that we count
(bogus) to something that we test for the presence of.


71785 29-Jan-2001 peter

Send "#if NISA > 0" to the bit-bucket and replace it with an option.
These were compile-time "is the isa code present?" tests and not
'how many isa busses' tests.


71779 29-Jan-2001 peter

Remove stray #include "isa.h"


71728 28-Jan-2001 bmilekic

Move the setting of curproc to idleproc up earlier in ap_init(). The
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.

Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.

Solved by: jake (the almighty) :-)


71727 28-Jan-2001 tegge

Defer assignment of low level interrupt handlers for PCI interrupts
described in the MP table until something asks for the interrupt number
later on.


71665 26-Jan-2001 jake

Push Giant down into the trap handlers that need it, instead of
acquiring it unconditionally.

Reviewed by: jhb


71647 25-Jan-2001 jhb

Whitespace fix: convert code indented 6 spaces to use tabs instead.


71604 24-Jan-2001 jhb

- Change fork_exit() to take a pointer to a trapframe as its 3rd argument
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71532 24-Jan-2001 jhb

- Remove all the #if 0'd code that used to implement IRQ forwarding.
- Remove #if 0'd lazy interrupt mask.


71528 24-Jan-2001 jhb

Setup the return values for a child process in the trapframe when we setup
the rest of the trapframe instead of doing it in fork_return().


71527 24-Jan-2001 jhb

- Kill the have_giant parameter to userret() along with all instances of
that name as a variable. Use mtx_owned(&Giant) where appropriate
instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enabling interrupts during trap and why this may be
bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
returning from fork. The child hasn't returned from fork through syscall
in a while.
- Remove fork_return() as it has been superseded by the MI version.


71526 24-Jan-2001 jhb

- Proc locking.
- P_INMEM -> PS_INMEM.


71525 24-Jan-2001 jhb

- Relocate portions of this file to get it into an order closer to that of
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions. (Brought over
from the alpha port)


71524 24-Jan-2001 jhb

- Proc locking.
- Setup proc0.p_heldmtx, proc0.contested, and curproc earlier so that we
can use mutexes.
- Initialize sched_lock and Giant earlier and enter Giant during init386.
- Use suser(9) instead of checking cr_uid directly.


71522 24-Jan-2001 jhb

Call fork_exit() now instead of futzing around in assembly during a fork
return.


71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


71321 21-Jan-2001 peter

Remove APIC_INTR_DIAGNOSTIC - this has been disabled for some time now.
Remove some leftovers of removed SMP options.


71320 21-Jan-2001 jasone

Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.


71318 21-Jan-2001 jake

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


71294 20-Jan-2001 jake

Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict
with the flag of the same name.


71292 20-Jan-2001 jake

Simplify the i386 asm MTX_{ENTER,EXIT} macros to just call the
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.


71287 20-Jan-2001 jake

- Make npx_intr INTR_MPSAFE and move acquiring Giant into the
function itself.
- Remove a hack to allow acquiring Giant from the npx asm trap
vector.


71262 19-Jan-2001 peter

Convert apm from a bogus 'count' into a plain option. Clean out some
other cruft from the files.alpha and files.ia64 that were related to this.


71261 19-Jan-2001 peter

Zap unused #include "apm.h"


71257 19-Jan-2001 peter

Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h


71249 19-Jan-2001 jhb

Add in a space that got lost in the previous commit in some debugging code
so that '&' becomes a binary operator and not a unary operator.


71245 19-Jan-2001 peter

Remove reference to splz_unpend - it is long gone.


71244 19-Jan-2001 peter

Catch a few alternative names for the syscall entry frame, eg: post-ELF
and int $0x80 entry methods.


71243 19-Jan-2001 peter

apic_itrace_splz[] is unused


71228 19-Jan-2001 bmilekic

Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb


71211 18-Jan-2001 jhb

Protect p_stat and p_oncpu with sched_lock in forward_signal().


71098 16-Jan-2001 peter

Stop doing runtime checking on i386 cpus for cpu class. The cpu is
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.

Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)


70954 12-Jan-2001 jake

Change return ??? to return -1 in some #if 0'ed code.


70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


70950 12-Jan-2001 bmilekic

Remove useless include of sys/mbuf.h (no longer useful since the
mbuf subsystem init was moved to a better place).


70928 11-Jan-2001 jake

- Remove compatibility macros for accessing per-cpu variables.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.


70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other than curproc.
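
As a loose sketch of what accessor macros of this kind look like, the following userland model hides the location of the per-cpu data behind three macros. The gd_ member prefix follows the naming used elsewhere in this log; the single static structure, its fields, and the TOY_ macro names are assumptions, and the real macros reach the data through a per-cpu segment rather than a global.

```c
#include <assert.h>

/* Hypothetical per-cpu data; the fields are chosen only for illustration. */
struct toy_globaldata {
	int gd_cpuid;
	int gd_switchticks;
};

/* One instance stands in for "the current CPU's" data in this model. */
struct toy_globaldata toy_gd;

/* Callers name the member without the gd_ prefix. */
#define TOY_PCPU_GET(member)      (toy_gd.gd_##member)
#define TOY_PCPU_PTR(member)      (&toy_gd.gd_##member)
#define TOY_PCPU_SET(member, val) (toy_gd.gd_##member = (val))
```

Routing every access through the macros means the way per-cpu data is located can change later without touching the callers.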


70798 08-Jan-2001 jake

Fix a warning. The type of globaldata.gd_prvspace has changed.


70714 06-Jan-2001 jake

Use %fs to access per-cpu variables in uni-processor kernels the same
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.

- always load %fs with the private segment on entry to the kernel
- change the type of the self referential pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assembly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70034 14-Dec-2000 jhb

Remove the "machine dependent" KTR trace buffer ddb commands. The code was
exactly the same on all platforms.


70006 14-Dec-2000 jake

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


69987 13-Dec-2000 jhb

If we fail to emulate a vm86 trap in kernel mode, then we use
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().


69972 13-Dec-2000 tanimura

- If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by: dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by: alfred, dillon
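
The halving strategy in the first item can be sketched as below. This is an assumption-laden model rather than the committed code: the function name, its parameters, and the use of a plain byte budget for the available KVM are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Halve the number of entries until the metadata fits in kvm_avail bytes.
 * Returns the entry count that fits, or 0 if nothing fits, which a caller
 * could treat as a failed allocation. entry_size is assumed to be non-zero.
 */
size_t
toy_fit_swblocks(size_t n_entries, size_t entry_size, size_t kvm_avail)
{
	/* Compare against a quotient so the product cannot overflow. */
	size_t max_entries = kvm_avail / entry_size;

	while (n_entries > 0 && n_entries > max_entries)
		n_entries /= 2;
	return (n_entries);
}
```

Halving converges quickly but can undershoot the largest count that would fit, which is in line with the message calling this a temporary fix.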


69971 13-Dec-2000 jake

Introduce a new potentially cleaner interface for accessing per-cpu
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
passed to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69919 12-Dec-2000 jhb

Add in symbols needed in the WITNESS_ENTER and WITNESS_EXIT macros in
i386/include/mutex.h.


69881 12-Dec-2000 jake

- Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69779 08-Dec-2000 jake

Revert the previous change I made to cpu_switch. It doesn't help as
much as I thought it would and according to bde was a pessimization.


69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


69658 06-Dec-2000 peter

This is kind of a nasty hack, but it appears to solve the Compaq DL360
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and, if so, add a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69578 04-Dec-2000 peter

Cleanup some leftover lint from the old interrupt system.
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.

Reviewed by: jhb, jakeb


69536 03-Dec-2000 jake

Change cpu_switch to explicitly popl the callers program counter and
pushl that of the new process, rather than doing a movl (%esp) and
assuming that the stack has been setup right. This makes the initial
stack setup slightly more sane, and will make it easier to stick
an interrupted process onto the run queue without its knowing.


69521 02-Dec-2000 markm

Namespace cleanup. Remove some #includes in favour of an explicit
declaration.

Asked for by: bde


69440 01-Dec-2000 jhb

Fix this slightly better by using NON_GPROF_RET instead of duplicating
hard-coded asm.

Suggested by: bde


69431 01-Dec-2000 jake

Change doreti to take a trapframe instead of an intrframe.
Remove associated pushes of dummy units to convert frame.

Reviewed by: jhb


69406 30-Nov-2000 jhb

Revert the previous change to this file. We have to hardcode in the opcode
for return because we do Evil Things(tm) with a 'ret' macro in
asmacros.h.

Noticed by: markm


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and caused a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286
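
Deciding "on the alternate stack" from the stack pointer alone can be sketched with a single range check. The function name and parameters are hypothetical; ss_sp and ss_size stand for the base and size of the registered alternate stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if sp lies within [ss_sp, ss_sp + ss_size), else 0.
 * The unsigned subtraction wraps to a huge value when sp < ss_sp,
 * so one comparison covers both the lower and the upper bound.
 */
int
toy_on_altstack(uintptr_t sp, uintptr_t ss_sp, size_t ss_size)
{
	return ((sp - ss_sp) < ss_size);
}
```

Since the answer is recomputed from the current stack pointer each time, a siglongjmp back to the normal stack needs no bookkeeping to stay correct.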


69334 28-Nov-2000 jhb

Don't wait forever for CPUs to stop or restart. Instead, give up after a
timeout. If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart. This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.


69333 28-Nov-2000 jhb

Use atomic ops to close a race condition on the in_Debugger variable used
to only allow 1 CPU at a time to (non-recursively) enter the debugger.
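
One way to close a race of this kind is a compare-and-swap on the guard variable, sketched here with C11 atomics. This is an illustration under assumptions, not the committed code: the kernel uses its own atomic primitives, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical guard: 0 = debugger free, 1 = some CPU is inside it. */
atomic_int toy_in_debugger = 0;

/*
 * Try to claim the debugger. Returns 1 if this caller won, 0 if another
 * caller already holds it. The test and the set happen as one atomic step,
 * so two CPUs cannot both observe 0 and both proceed.
 */
int
toy_debugger_enter(void)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(&toy_in_debugger, &expected, 1));
}

/* Release the guard on the way out. */
void
toy_debugger_leave(void)
{
	atomic_store(&toy_in_debugger, 0);
}
```

A plain "if (!flag) flag = 1;" leaves a window between the read and the write, which is the window the atomic operation removes.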


69147 25-Nov-2000 jlemon

Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not. Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake


69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


69006 21-Nov-2000 markm

Assembler fixes.

Fix opcodes that were typed as ".byte 0xNN, 0xMM" when an older
assembler could not recognise the newer Pentium instructions.
Reviewed by: jhb


69001 21-Nov-2000 jhb

Stop handcoding a couple of instructions since gas 2.10 can properly
assemble 16-bit code.

Noticed by: markm


68889 19-Nov-2000 jake

- Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
registering a callout.

Reviewed by: -smp@, jlemon


68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


68860 17-Nov-2000 jhb

- Change extra sanity checks in cpu_switch() to be conditional on INVARIANTS
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.


68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


68737 14-Nov-2000 jhb

Always enable interrupts during fork_trampoline() after releasing the
sched_lock. This is needed for kernel threads that are created before
interrupts are enabled, e.g. kthreads created by klds at
SI_SUB_KLD such as the random kthread.

Tested by: phk


68676 13-Nov-2000 nyan

Initialize bus_space_handle_t with zero (for PC-98).


68490 08-Nov-2000 asmodai

Fix some further English grammar and typos.


68489 08-Nov-2000 asmodai

Fix typos: UPGRADE_CPU_HW_CACHE -> CPU_UPGRADE_HW_CACHE


68441 07-Nov-2000 alfred

Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.

Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge


68063 31-Oct-2000 phk

Deprecate devsw->d_bmaj entirely.

This removes support for booting current kernels with very old bootblocks.

Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.


67899 29-Oct-2000 phk

Remove unneeded <stddef.h> #includes.


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


67759 28-Oct-2000 phk

Revert two experimental changes which escaped from my devel machine.


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernel code should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Partial reviews by: various.
Significant brucifications by: bde
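
The limitation mentioned at the top of this message can be illustrated with a type that has no struct tag at all. The sketch below uses only the standard offsetof() from <stddef.h>; the toy_rec_t type and the helper function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A typedef of an untagged struct containing a union. A macro that needs
 * a struct tag has nothing to name here, whereas offsetof() accepts any
 * type name.
 */
typedef struct {
	char tag;
	union {
		int    i;
		double d;
	} u;
} toy_rec_t;

/* Byte offset of the union payload inside toy_rec_t. */
size_t
toy_payload_offset(void)
{
	return (offsetof(toy_rec_t, u));
}
```

The exact payload offset depends on the platform's alignment rules, so callers should use the computed value rather than hard-code one.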


67694 27-Oct-2000 bde

Declare or #define per-cpu globals in <machine/globals.h> in all cases.
The i386 UP case was messily different.


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67477 23-Oct-2000 jhb

Don't dink with interrupts in vm_page_zero_idle(). This code assumed it
was being called with interrupts disabled, when it was actually being called
with them enabled.

Pointed out by: tegge


67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


67360 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use cpu_throw() instead of cpu_switch() during cpu_exit() since we don't
need to save our previous state.


67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


67357 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.


67356 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- machine/ipl.h -> sys/ipl.h
- Use MUTEX_DECLARE() for clock_lock


67346 20-Oct-2000 kato

Convert the type of bus_space_handle_t of pc98 from structure into
pointer to structure.

Reviewed by: nyan


67308 19-Oct-2000 jhb

Axe the idle_event eventhandler, and add a MD cpu_idle function used
for things such as halting CPU's, idling CPU's, etc.

Discussed with: msmith


67268 18-Oct-2000 mdodd

Use appropriate resource management accessors instead of directly
referencing structure members.

Use rman_get_size() instead of end - start + 1.
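
The point of the accessor can be shown with a toy resource whose range is inclusive at both ends. The structure, its field names, and the toy_ function are assumptions for illustration and do not reproduce the real resource manager types.

```c
#include <assert.h>

/* Hypothetical resource covering the inclusive range [r_start, r_end]. */
struct toy_resource {
	unsigned long r_start;
	unsigned long r_end;
};

/* Keep the "+ 1" of the inclusive range in one place. */
unsigned long
toy_rman_get_size(const struct toy_resource *r)
{
	return (r->r_end - r->r_start + 1);
}
```

Callers that go through the accessor cannot forget the "+ 1" and do not depend on the layout of the structure.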


67265 17-Oct-2000 jhb

- Catch up to moving headers, machine/ipl.h -> sys/ipl.h
- Fix some whitespace bogons.

Submitted by: bde (2)


67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


67095 13-Oct-2000 peter

savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().


67011 12-Oct-2000 bde

Moved the definitions of AST_PENDING and AST_RESCHED to the correct place.


66855 09-Oct-2000 bde

Unremoved used include of <machine/ipl.h>. Removing it in rev.1.95
significantly pessimized syscalls by arranging to do null rescheduling
on return from every syscall. (AST_RESCHED was not defined, and the
mask ~AST_RESCHED gets replaced by the useless mask ~0. This bug has
been fixed before, in rev.1.92.)


66716 06-Oct-2000 jhb

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. Its only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().
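The deferred-signal scheme described above can be sketched in plain C. This is a minimal userland illustration, not the committed kernel code: the flag values, struct proc_sketch, and the two function names are assumptions made for the example, and the counters stand in for the psignal() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values are hypothetical; only the names come from the log entry. */
#define P_ALRMPEND 0x1u
#define P_PROFPEND 0x2u

struct proc_sketch {
	unsigned p_flag;
	int vtalrm_delivered;	/* stands in for psignal(p, SIGVTALRM) */
	int prof_delivered;	/* stands in for psignal(p, SIGPROF) */
};

/* Clock interrupt context: must not block, so only record what is pending. */
static void
hardclock_sketch(struct proc_sketch *p, int vtimer_expired, int proftimer_expired)
{
	assert(p != NULL);
	if (vtimer_expired)
		p->p_flag |= P_ALRMPEND;
	if (proftimer_expired)
		p->p_flag |= P_PROFPEND;
}

/* AST context: blocking is allowed here, so the pending signals are delivered. */
static void
ast_sketch(struct proc_sketch *p)
{
	assert(p != NULL);
	if (p->p_flag & P_ALRMPEND) {
		p->p_flag &= ~P_ALRMPEND;
		p->vtalrm_delivered++;
	}
	if (p->p_flag & P_PROFPEND) {
		p->p_flag &= ~P_PROFPEND;
		p->prof_delivered++;
	}
}
```

Calling hardclock_sketch() with an expired virtual timer only sets P_ALRMPEND; the delivery counter changes on the next ast_sketch() call, which also clears the flag.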

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())


66713 06-Oct-2000 jhb

Various whitespace cleanups after the SMPng commit, which jumbled things
around a bit in the trap handling code.


66712 06-Oct-2000 jhb

Don't treat a kernel stack fault the same as a general protect fault or
a segment not present fault in the non-vm86 case.


66711 06-Oct-2000 jhb

Remove an unnecessary sti and spl0() in fork_trampoline. Interrupts
should be enabled by MTX_EXIT() now when it releases the sched_lock.


66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


66696 05-Oct-2000 jhb

Replace loadandclear() with atomic_readandclear_int().


66559 02-Oct-2000 peter

Fix a cosmetic sign problem on machines with 4G of ram.
0x00312000 - 0xe5fe7fff, 3855441920 bytes (4294859990 pages)
.. becomes
0x00314000 - 0xe5fe7fff, 3855433728 bytes (941268 pages)
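The bogus page count is consistent with the byte count being treated as a signed 32-bit value before the division by the page size. The sketch below reproduces the arithmetic in userland C; the function names and the 4096-byte page size constant are assumptions for illustration, and the actual kernel expression is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096	/* assumed i386 page size */

/* Buggy variant: reinterpret the size as signed 32-bit, then divide. */
static uint32_t
pages_signed(uint32_t bytes)
{
	int64_t as_signed = (int64_t)bytes;

	assert(PAGE_SIZE_SKETCH > 0);
	if (bytes > INT32_MAX)
		as_signed -= INT64_C(4294967296);	/* two's-complement view */
	return ((uint32_t)(as_signed / PAGE_SIZE_SKETCH));
}

/* Fixed variant: keep the size unsigned throughout. */
static uint32_t
pages_unsigned(uint32_t bytes)
{
	return (bytes / PAGE_SIZE_SKETCH);
}
```

With the values from the log, pages_signed(3855441920) gives 4294859990 and pages_unsigned(3855433728) gives 941268.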


66503 01-Oct-2000 peter

Fix the no-pci case of attaching isa, eisa and mca devices.
device_add_child() is meant to be called by the bus add_child method, not
to replace the bus add_child method. We could have called nexus_add_device
directly too, that would have also worked.

PR: 21657
Tested by: markm


66489 30-Sep-2000 msmith

More updates to the ACPI code:

- Move all register I/O into acpi_io.c
- Move event handling into acpi_event.c
- Reorganise headers into acpivar/acpireg/acpiio
- Move find-RSDT and find-ACPI-owned-memory into acpi_machdep
- Allocate all resources (except those detailed only by AML)
as real resources. Add infrastructure that will make adding
resource support to AML code easy.
- Remove all ACPI #ifdefs in non-ACPI code
- Removed unnecessary includes
- Minor style and commenting fixes

Reviewed by: iwasaki


66475 30-Sep-2000 bmilekic

Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accommodate the changes.

Here's a list of things that have changed (I may have left out a few); for a
relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

* Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
nobody uses anymore. It was great while it lasted, but now we're moving
onto bigger and better things (Approved by: wollman).

* Practically re-wrote the allocation macros in sys/sys/mbuf.h to accommodate
new allocations which grab the necessary lock.

* Make sure that necessary mbstat variables are manipulated with
corresponding atomic() routines.

* Changed the "wait" routines, cleaned it up, made one routine that does
the job.

* Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
are now included in the generalized "wait" routines.

* Sleep routines now use msleep().

* Free lists have locks.

* etc... probably other stuff I'm missing...

Things to look out for and work on later:

* find a better way to (dynamically) adjust EXT_COUNTERS

* move necessity to recurse on a lock from drain routines by providing
lock-free lower-level version of MFREE() (and possibly m_free()?).

* checkout include of mutex.h in sys/sys/mbuf.h - probably violating
general philosophy here.

The code has been reviewed quite a bit, but problems may arise... please,
don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?


66464 29-Sep-2000 dfr

Ansify and fix warnings.


66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but it's nice to see
the FreeBSD copyright message appear in the ia64 simulator.


66442 29-Sep-2000 peter

Fill in some more missing bits from cpu_features according to the Intel
Pentium4 cpuid docs.


66441 29-Sep-2000 peter

First shot at identifying the Pentium 4 according to our reading of the
cpu_id extensions in the Intel docs. There is more info available.
See the following URL for more details.
http://developer.intel.com/design/processor/future/manuals/CPUID_Supplement.htm

Requested by: Intel


66416 28-Sep-2000 peter

Get out the roto-rooter and clean up the abuse of nexus ivars by the
i386/isa/pcibus.c. This gets -current running again on multiple host->pci
machines after the most recent nexus commits. I had discussed this with
Mike Smith, but ended up doing it slightly differently to what we
discussed as it turned out cleaner this way. Mike was suggesting creating
a new resource (SYS_RES_PCIBUS) or something and using *_[gs]et_resource(),
but IMHO that wasn't ideal as SYS_RES_* is meant to be a global platform
property, not a quirk of a given implementation. This does use the ivar
methods but does so properly. It also now prints the physical pci bus that
a host->pci bridge (pcib) corresponds to.


66407 27-Sep-2000 asmodai

Fix spelling of Katmai [Katami].


66400 27-Sep-2000 msmith

Since the nexus is responsible for creating the I/O resources (ports, memory)
it ought to be able to deal with devices directly attached to it having
allocations of such resources. Make it so.


66383 26-Sep-2000 kato

Recognize new Pentium III Xeon (stepping A0).

PR: 21233
Submitted by: ade


66277 22-Sep-2000 ps

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


66206 22-Sep-2000 msmith

Implement halt-on-idle in the !SMP case, which should significantly
reduce power consumption on most systems.


66069 19-Sep-2000 eivind

Better error message when booting an SMP kernel on an UP system.


65856 14-Sep-2000 jhb

Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by: bde


65822 13-Sep-2000 jhb

- Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
priority out of the x86 nexus code and into a MI ithread_priority()
function in sys/kern/kern_intr.c.
- Remove now-unneeded x86-specific headers from sys/dev/ata/ata-all.c and
sys/pci/pci_compat.c.


65815 13-Sep-2000 bde

Be more careful about cleaning up the stack after function calls early
in the boot. The cleanup must be done in one of the few ways that
db_numargs() understands, so that early backtraces in ddb don't underrun
the stack. The underruns caused reboots a few years ago when there
was an unmapped page above the stack (trapping to abort the command
doesn't work early).

Cleaned up some nearby code.


65811 13-Sep-2000 bde

Fixed hang on booting with -d. mtx_enter() was called on an uninitialized
lock. The quick fix in trap.c was not quite the version tested and had no
effect; back it out.


65782 12-Sep-2000 jhb

Clean up process accounting some more. Unfortunately, it is still not
quite right on i386 as the CPU who runs statclock() doesn't have a valid
clockframe to calculate statistics with.


65781 12-Sep-2000 bde

Quick fix for hang on booting with -d. mtx_enter() was called before
curproc was initialized. curproc == NULL was interpreted as matching
the process holding Giant... Just skip mtx_enter() and mtx_exit() in
trap() if (curproc == NULL && cold) (&& cold for safety).


65713 11-Sep-2000 jhb

When doing statistics for statclock on other CPUs, use the other CPUs'
idleproc pointers instead of our own for comparisons.

Submitted by: tegge


65620 08-Sep-2000 jhb

Remove an unneeded extern declaration of cp_time.


65597 08-Sep-2000 jake

Really fix USER_LDT. (Don't use currentldt as an L-value.)


65587 07-Sep-2000 jake

Don't use currentldt as an L-value.
This should fix options USER_LDT.

Reported-by: John Hay <jhay@zibbi.mikom.csir.co.za>
Nickolay Dudorov <nnd@mail.nsk.ru>


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


65556 07-Sep-2000 jasone

Add KTR, a facility that logs kernel events in order to facilitate
debugging.

Acquired from: BSDi (BSD/OS)
Submitted by: dfr, grog, jake, jhb


65514 06-Sep-2000 phk

Introduce atomic_cmpset_int() and atomic_cmpset_long() from SMPng a
few hours earlier than the rest.

The next DEVFS commit needs these functions.
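As a rough illustration of the compare-and-set semantics (store the new value only if the current value equals the expected one, and report whether the store happened), here is a userland sketch built on C11 atomics. The name cmpset_int_sketch is hypothetical; the kernel versions are machine-dependent and are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * If *dst equals expect, store src into *dst and return nonzero;
 * otherwise leave *dst alone and return 0.
 */
static int
cmpset_int_sketch(atomic_uint *dst, unsigned expect, unsigned src)
{
	assert(dst != NULL);
	return (atomic_compare_exchange_strong(dst, &expect, src));
}
```

A caller typically loops: read the old value, compute the new one, and retry while the sketch returns 0.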

Alpha versions by: dfr
i386 versions by: jakeb

Approved by: SMPng


65500 05-Sep-2000 msmith

Teach the NFS && NFS_ROOT case how to pick up information left by the
PXE loader, and use this to build the nfs_diskless structure.


65389 03-Sep-2000 peter

Complain if we cannot find loader(8) metadata.


65292 31-Aug-2000 takawata

Merge the remaining pieces of the ACPI driver. To activate the acpi
driver, add the

device acpi

line. The merge is finished, but the driver is still in an experimental
phase. Needs more hacking!

Obtained from: ACPI for FreeBSD project


65273 31-Aug-2000 kato

Improved Cyrix 486DX support for NEC PC-98.
- Enable WB cache via CCR2 and CR0.
- Set the need_pre_dma_flush when the CPU_I486_ON_386 option is
defined.

Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>


64837 19-Aug-2000 dwmalone

Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was considered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implemented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.
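The pool-of-unions idea can be sketched as follows. This is an assumption-laden userland illustration using C11 atomics: the union name, field names, pool size, and helper functions are all hypothetical, and the free list here is not itself protected against concurrent access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One slot is either a free-list link or a live reference count. */
union ext_ref {
	union ext_ref *next_free;	/* valid while on the free list */
	atomic_int refcnt;		/* valid while tracking storage */
};

#define POOL_SIZE 8			/* hypothetical pool size */
static union ext_ref pool[POOL_SIZE];
static union ext_ref *free_head;

static void
pool_init(void)
{
	free_head = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		pool[i].next_free = free_head;
		free_head = &pool[i];
	}
}

/* Take a counter from the free list; it starts with one reference. */
static union ext_ref *
ref_alloc(void)
{
	union ext_ref *r = free_head;

	if (r == NULL)
		return (NULL);
	free_head = r->next_free;
	atomic_init(&r->refcnt, 1);
	return (r);
}

static void
ref_hold(union ext_ref *r)
{
	assert(r != NULL);
	atomic_fetch_add(&r->refcnt, 1);
}

/* Drop a reference; returns 1 when the last one went away. */
static int
ref_release(union ext_ref *r)
{
	assert(r != NULL);
	if (atomic_fetch_sub(&r->refcnt, 1) != 1)
		return (0);
	r->next_free = free_head;	/* slot becomes a free-list link again */
	free_head = r;
	return (1);
}
```

Because the counter lives outside the storage it describes, the same slot type works for clusters and for any other kind of external buffer.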

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR: 19866
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: alfred (glanced at by others on -net)


64781 17-Aug-2000 bsd

Don't let an illegal value for dr7 get set, which can lead to an
unexpected TRCTRAP.

Reported by: John W. De Boskey <jwd@FreeBSD.org>


64728 16-Aug-2000 tegge

Prepare for a cleanup of pmap module API pollution introduced by the
suggested fix in PR 12378.

Keep track of all existing pmaps independent of existing processes.

This allows for a process to temporarily connect to a different address
space without the risk of missing an update of the original address space if
the kernel grows.

pmap_pinit2() is no longer needed on the i386 platform but is left as a
stub until the alpha pmap code is updated.

PR: 12378


64592 13-Aug-2000 jhb

Include machine/cputypes.h so we get the cpu_class variable. This is needed
if I386_CPU is defined in the kernel config file.


64529 11-Aug-2000 peter

Clean up some low level bootstrap code:

- stop using the evil 'struct trapframe' argument for mi_startup()
(formerly main()). There are much better ways of doing it.
- do not use prepare_usermode() - setregs() in execve() will do it
all for us as long as the p_md.md_regs pointer is set. (which is
now done in machdep.c rather than init_main.c. The Alpha port did it
this way all along and is much cleaner).
- collect all the magic %cr0 etc register settings into one place and
have the AP's call that instead of using magic numbers (!!) that keep
changing over and over again.
- Make it safe to call kthread_create() earlier, including during the
device probe sequence. It doesn't need the callback mechanism that
NetBSD's version uses.
- kthreads created this way are root-less as they exist before the root
filesystem is mounted. init(1) is set up so that it acquires the root
pointers prior to running. If other kthreads want filesystem access
we can make this code more generic.
- set all threads start times once we have decided what time it is.
- init uses a trampoline rather than the evil prepare_usermode() hack.
- kern_descrip.c has a couple of tweaks to deal with forking when there
is no rootdir or cwd etc.
- adjust the early SYSINIT() sequence so that a few prerequisites are in
place. eg: make sure the run queue is initialized before doing forks.

With this, the USB code can easily create a kthread to do the device
tree discovery. (I have tested it, it works nicely).

There are still some open issues before this is truly useful.
- tsleep() does not like working before the clock is running. It
sort-of tries to spin wait, but it can do more useful things now.
- stopping a kthread in kld code at unload time is "interesting" but
we have a solution for that.

The Alpha code needs no changes for this. It already uses pretty much the
same strategies, but a little cleaner.


64494 10-Aug-2000 tegge

Don't skip IOAPIC id conflict detection when only one pci bus is present.
PR: 20312
Reviewed by: Steve Roome <steve@sse0691.bri.hp.com>


64325 07-Aug-2000 tegge

Add workaround for livelock problem when starting APs.

With more than 1 AP present, an AP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.

With early startup of APs, the BSP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.


64294 06-Aug-2000 ps

Change the behavior of isa_nmi to log an error message instead of
panicking and return a status so that we can decide whether to drop
into DDB or panic. If the status from isa_nmi is true, panic the
kernel based on machdep.panic_on_nmi, otherwise if DDB is
enabled, drop to DDB based on machdep.ddb_on_nmi.
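One reading of the decision described above, written as a small C sketch. The enum, struct, and function names are hypothetical, and the log does not say what happens when the relevant tunable is off, so the sketch simply continues in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum nmi_action { NMI_CONTINUE, NMI_ENTER_DDB, NMI_PANIC };

/* Mirrors the two sysctls named in the log entry. */
struct nmi_tunables {
	int panic_on_nmi;	/* machdep.panic_on_nmi */
	int ddb_on_nmi;		/* machdep.ddb_on_nmi */
};

static enum nmi_action
nmi_decide(bool isa_nmi_status, bool ddb_configured, const struct nmi_tunables *t)
{
	assert(t != NULL);
	if (isa_nmi_status)	/* isa_nmi() logged an error and returned true */
		return (t->panic_on_nmi ? NMI_PANIC : NMI_CONTINUE);
	if (ddb_configured && t->ddb_on_nmi)
		return (NMI_ENTER_DDB);
	return (NMI_CONTINUE);
}
```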

Reviewed by: peter, phk


64290 06-Aug-2000 tegge

Be more verbose when changing APIC ID on an IO APIC.

Don't allow cpu entries in the MP table to contain APIC IDs out of range.

Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.

Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:

- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.

- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.

- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.

- IDs out of range are considered to be in conflict.

During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.
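The three-step assignment rule above can be sketched in userland C. This is only an illustration under stated assumptions: the 16-entry ID space, the function name, and the array-based interface are hypothetical, and the real code operates on the MP table copy and the IO APIC registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define APIC_ID_LIMIT 16	/* assumed: valid IDs are 0..15 */

static bool
id_in_range(int id)
{
	return (id >= 0 && id < APIC_ID_LIMIT);
}

/*
 * cpu[]: APIC IDs of the CPUs.  cur[]: ID currently set in each IO APIC.
 * mp[]: ID listed in the MP table copy, rewritten with the final choice.
 * out[]: final ID per IO APIC.  Returns 0, or -1 if no ID is left.
 */
static int
assign_ioapic_ids(const int *cpu, size_t ncpu, const int *cur, int *mp,
    int *out, size_t nio)
{
	int claims[APIC_ID_LIMIT] = {0};
	bool used[APIC_ID_LIMIT] = {false};
	size_t i;

	assert(cpu != NULL && cur != NULL && mp != NULL && out != NULL);
	for (i = 0; i < ncpu; i++)
		if (id_in_range(cpu[i])) {
			used[cpu[i]] = true;
			claims[cpu[i]]++;
		}
	for (i = 0; i < nio; i++) {
		out[i] = -1;
		if (id_in_range(cur[i]))	/* out of range counts as conflict */
			claims[cur[i]]++;
	}
	/* Step 1: keep a current ID that nobody else claims. */
	for (i = 0; i < nio; i++)
		if (id_in_range(cur[i]) && claims[cur[i]] == 1) {
			out[i] = cur[i];
			used[cur[i]] = true;
		}
	/* Step 2: otherwise try the ID from the MP table. */
	for (i = 0; i < nio; i++)
		if (out[i] < 0 && id_in_range(mp[i]) && !used[mp[i]]) {
			out[i] = mp[i];
			used[mp[i]] = true;
		}
	/* Step 3: fall back to the lowest unused ID. */
	for (i = 0; i < nio; i++) {
		int id = 0;

		if (out[i] >= 0)
			continue;
		while (id < APIC_ID_LIMIT && used[id])
			id++;
		if (id == APIC_ID_LIMIT)
			return (-1);
		out[i] = id;
		used[id] = true;
	}
	for (i = 0; i < nio; i++)
		mp[i] = out[i];		/* keep the MP table copy in sync */
	return (0);
}
```

The sketch never indexes its arrays with an out-of-range ID, which is the same hazard the entry mentions for IO APIC entries in the MP table.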

PR: 20312, 18919


64063 31-Jul-2000 luoqi

Handle write page faults (whether write-only or read-modify-write) as MI vm
write-only faults. This would allow write-only mmapped regions to function
correctly.


64031 30-Jul-2000 phk

Allow use of TSC even if APM is compiled in but disabled.


63981 28-Jul-2000 peter

Fix warning - isa/isavar.h is a prerequisite for isa/pnpvar.h


63140 14-Jul-2000 ps

Change the way NMI's are handled. Before, if DDB was enabled and
an NMI occurred, you could type continue in DDB and the kernel would
not attempt to detect what type of NMI was received. Now we check
for the type of NMI first and then go to DDB if it is enabled.

This will solve the problem with having DDB enabled and getting an
NMI due to some possibly bad error and being able to continue the
operation of the kernel when you really want to panic and know
what happened.

Submitted by: jhb


62947 11-Jul-2000 tanimura

Finally merge newmidi.
(I had been busy for my own research activity until the last weekend)

Supported devices:

SB Midi Port (sbc + midi)
SB OPL3 (sbc + midi)
16550 UART (midi, needs a trick in your hint)
CS461x Midi Port (csa + midi)

OSS-compatible sequencer (seq)

Supported playing software:

playmidi (We definitely need more)

Notes:

/dev/midistat now reports installed midi drivers. /dev/sndstat reports
only pcm drivers. We need a new name (pcmstat?).

EMU8000(SB AWE) does not sound yet but does get probed so that the OPL3
synth on an AWE card works.

TODO:

MSS/PCI bridge drivers
Midi-tty interface to support general serial devices
Modules


62870 10-Jul-2000 kris

Don't call printf with no format string.

Reviewed by: msmith


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62298 01-Jul-2000 bsd

Fix my own style bugs (use of spaces instead of tabs for indentation).
This is a style-only change.


62088 25-Jun-2000 markm

Duh. Fix a fatfingered patch.


62085 25-Jun-2000 markm

Fix an uninitialised variable and a function return value.

Reported by: dillon


62057 25-Jun-2000 markm

Strip out the machine-independent parts of the memory device.
/dev/(u)random, /dev/null, /dev/zero are all moving to machine-independent
drivers.
Reviewed by: dfr


61996 23-Jun-2000 msmith

Make the PnP 'slopsucker' quiet in the !bootverbose case - the real NPX
probe happens much earlier, and may come to very different conclusions
about the system's NPX setup.


61995 23-Jun-2000 msmith

Add a stub driver to consume the PnP "system resource" items, and hide
them in the !bootverbose case.


61994 23-Jun-2000 msmith

Add PnP probe methods to some common AT hardware drivers. In each case,
the PnP probe is merely a stub as we make assumptions about some of this
hardware before we have probed it.

Since these devices (with the exception of the speaker) are 'standard',
suppress output in the !bootverbose case to clean up the probe messages
somewhat.


61992 23-Jun-2000 msmith

Stop trying to do anything funny with the interrupt resource range. The
AT PIC will consume IRQ 2 correctly in the !APIC_IO case.


61717 15-Jun-2000 phk

Add disk_enumerate() for finding names of disks. Vinum and libh will
need this RSN.

Remove a pointless warning in the root device locating code.

Remove the "wd" compatibility name from the "ad" driver.

WARNING: If you have not updated to use /dev/ad* in your /etc/fstab
and modern bootblocks, it would be a very good idea to do so BEFORE
you upgrade your kernel.


61623 13-Jun-2000 kato

Recognize Coppermine Celeron processors whose CPU ID = 0x68?. They
were recognized as "Pentium III/Pentium III Xeon."


61616 13-Jun-2000 kato

Added new options CPU_PPRO2CELERON and CPU_L2_LATENCY to support
Socket 8 to 370 converters. When (1) CPU_PPRO2CELERON option is
defined, (2) Intel CPU is found and (3) CPU ID is 0x66?, L2 cache is
enabled through MSR 0x11e. The L2 cache latency value can be
specified by CPU_L2_LATENCY option. Default value of L2 cache latency
is 5.

These options are useful if you use Socket 8 to Socket 370 converter
(e.g. Power Leap's PL-Pro/II.) Most PentiumPro BIOSs don't enable L2
cache of Mendocino Celeron CPUs because they don't know Celeron CPUs.
These options are needless if you use a Coppermine (FCPGA) Celeron or
PentiumIII, because the L2 cache enable bit is hard wired and L2 cache
is always enabled.


61533 10-Jun-2000 msmith

Don't include opt_smp.h - we don't use anything defined in it.


61531 10-Jun-2000 msmith

Correct the tests for ISA PIC/APIC so that they actually work.


61474 10-Jun-2000 peter

Unused include: #include "ether.h"


61469 10-Jun-2000 peter

Add option BROKEN_KEYBOARD_RESET to an opt_*.h file


61422 08-Jun-2000 bde

Always include the full symbol table (as specified by its start and
end values in bootinfo) in kernel space if it is loaded (i.e., if its
specified end address is nonzero), not just if it is loaded and DDB
is configured. This may be used to fix kldsym(2) for booting without
/boot/loader; currently, in this case, it just fixes unused pointers
and wastes space consistently. For booting in the normal way with
/boot/loader, the table is included and pointed to in a different way
and kldsym(2) works.


61362 07-Jun-2000 iwasaki

Fix gdt pointer for the current cpu on SMP.
This will support power-off only. Fix for suspend/resume will come later.
Also, MFC on this is scheduled for next week.

Submitted by: sumitani@bd2.hnes.nec.co.jp
Reviewed by: jlemon


61339 06-Jun-2000 dillon

INTR_TYPE_FAST / FAST_INTR interrupts (currently just serial interrupts)
have their own lock and do not need the MP lock. The SMP cleanup was
a little too conservative in MP locking fast interrupts but at least
it's trivial to fix. MFC soon.

Submitted by: bde


61220 03-Jun-2000 bde

Fixed some style bugs in the signal handling functions. This doesn't
change the object file.


61136 31-May-2000 msmith

Further fixes for multiple-IO-APIC systems from Tor Egge:

Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.

IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED

Both IOAPICs had ID 0.
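The arithmetic behind the observed vector can be checked directly. The helper names below are hypothetical; the snippet only confirms that OR-ing the two vectors from the log yields 0x7a.

```c
#include <assert.h>
#include <stdint.h>

/* Model two vectors arriving at once as a bitwise OR of the two values. */
static uint8_t
merged_vector(uint8_t a, uint8_t b)
{
	return ((uint8_t)(a | b));
}

/* 0x68 (irq 8) OR 0x32 (irq 18) is the vector seen in the T_RESERVED trap. */
static void
check_observed_vector(void)
{
	assert(merged_vector(0x68, 0x32) == 0x7a);
}
```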

Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.

The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.

Submitted by: tegge


61130 31-May-2000 bde

Pack the SWI bits to save some time and space.


61126 31-May-2000 bde

Add SWI_TQ_MASK to all interrupt masks except SWI_CLOCK_MASK. Use a
new macro SWI_LOW_MASK to give the mask for low priority SWIs instead
of hard-coding this mask as SWI_CLOCK_MASK.

Reviewed by: dfr


61081 29-May-2000 dillon

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTITIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather than swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


61075 29-May-2000 dfr

Add SWI_TQ_MASK to imask definition.


61074 29-May-2000 dfr

Brucify the pmap_enter_temporary() changes.


61036 28-May-2000 dfr

Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon


60973 27-May-2000 jhb

- Remove unnecessary 'data32' and 'addr32' prefixes and #define's.
- Go ahead and use 'lgdt' again instead of hand-assembling the instruction.
During testing this code worked fine. If for some reason a 32-bit offset
is needed, 'lgdtl' should be used instead of reverting to manual machine
code.

Tested by: peter


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60933 25-May-2000 tegge

Reintroduce a workaround for a gas bug (misassembled lgdt instruction)
Use .code16 for the real mode part of the AP bootstrap trampoline code.


60881 24-May-2000 peter

pmap_enter() masked off the page offset bits, pmap_kenter() did not.
This (I believe) is the cause of the XFree86 startup and/or mptable(8)
panics when programs were reading from /dev/mem at non-page-aligned
offsets. The offsets were being converted into random page flags in the
page tables. :-( (including PG_PS = 4MB page size)
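A sketch of why an unmasked physical address corrupts the entry: the low 12 bits of a page table entry are flag bits, so a non-page-aligned address leaks its offset into them. The function names are hypothetical and the flag values follow the i386 page table layout as I understand it; this is not the pmap code itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SK	4096u
#define PAGE_MASK_SK	(PAGE_SIZE_SK - 1)
#define PG_V		0x001u	/* valid */
#define PG_RW		0x002u	/* writable */
#define PG_PS		0x080u	/* 4MB page size */

/* Buggy variant: offset bits of pa end up as page flags. */
static uint32_t
pte_unmasked(uint32_t pa)
{
	return (pa | PG_RW | PG_V);
}

/* Fixed variant: strip the page offset before adding flags. */
static uint32_t
pte_masked(uint32_t pa)
{
	uint32_t pte = (pa & ~PAGE_MASK_SK) | PG_RW | PG_V;

	assert((pte & PG_PS) == 0);	/* a 4K mapping never carries PG_PS */
	return (pte);
}
```

For a read at physical address 0x000f5080, the unmasked entry has PG_PS set, while the masked entry is 0x000f5003.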


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60804 22-May-2000 obrien

Sort the sys includes.


60755 21-May-2000 peter

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
its shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a separate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a separate set of
structures that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon


60676 18-May-2000 grog

Correct previous commit: solve the "stopped clock" syndrome in remote
kernel debugger.


60668 17-May-2000 msmith

If we are running in APIC_IO mode, pretend that we didn't see the BIOS
reporting an AT PIC. We do this because otherwise the PIC will claim
IRQ 2 in an unshareable mode, preventing other devices from legitimately
using it.

For symmetry, in !APIC_IO mode, ignore the APIC if it's reported.

This is a hack; a better solution would have the PIC's driver release
the IRQ if it was not going to be active.


60346 11-May-2000 peter

Move <machine/ipl.h> outside #ifdef SMP because it supplies AST_RESCHED.
Without this, it shows up as an undefined symbol in /kernel. (!)
(This looks very freaky when doing a nm /kernel!)


60303 10-May-2000 obrien

1. `movl' is for use with 32-bit operands. Do NOT use it with 16-bit
operands. `movw' could be used, but instead let the assembler decide
the right instruction to use.
2. AT&T asm syntax requires a leading '*' in front of the operand for
indirect calls and jumps.


60298 10-May-2000 obrien

Do not specify the size to move. Allow the assembler to figure it out.


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bde's teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.h> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


60008 04-May-2000 wollman

Add a little do-nothing ``slopsucker'' device which gives a home
to PNP0c04 (legacy ISA coprocessor support). Tourist info.


59926 03-May-2000 dwhite

I mentioned yesterday that I could use some work, and Kelly says, "Commit my
PRs!" So here I go.

Add definitions for some of the AMD CPU feature bits. Also add a comment on
where to find the rest of them. This is a purely cosmetic change.

PR: i386/14438
Submitted by: Kelly Yancey <kbyanc@egroups.net>


59839 01-May-2000 peter

Move the MSG* and SEM* options to opt_sysvipc.h
Remove evil allocation macros from machdep.c (why was that there???) and
use malloc() instead.
Move parameters out of param.h and into the code itself.
Move a bunch of internal definitions from public sys/*.h headers (without
#ifdef _KERNEL even) into the code itself.

I had hoped to make some of this more dynamic, but the cost of doing
wakeups on all sleeping processes on old arrays was too frightening.
The other possibility is to initialize on the first use, and allow
dynamic sysctl changes to parameters right until that point. That would
allow /etc/rc.sysctl to change SEM* and MSG* defaults as we presently
do with SHM*, but without the nightmare of changing a running system.


59615 25-Apr-2000 grog

Fix a long-standing bug which caused massive character loss in remote
serial gdb: interrupts were causing either overruns or stealing
characters. Put splhigh() around the routines which transfer packets
across the line. Since this happens when the system is halted in
debug, this doesn't cause any particular problem. Now it is possible
to run the link at 115,200 bps.

PR: (not assigned yet, must be in limbo somewhere)

Add partial support for detecting non-existent gdb devices.

Add $FreeBSD$ tag.


59604 24-Apr-2000 obrien

* Use sys/sys/random.h rather than a i386 specific one.
* There was nothing that should be machine dependent about
i386/isa/random_machdep.c, so it is now sys/kern/kern_random.c.


59539 23-Apr-2000 nyan

Disable PCI BIOS on PC-98.


59495 22-Apr-2000 nyan

- PC-98 uses IRQ2 too.
- Fixed the range of DMA channels on PC-98.

Submitted by: "T.Yamaoka" <taka@windows.squares.net>


59440 20-Apr-2000 luoqi

IO apics are not necessarily page aligned, they are only required to be aligned
on a 1K boundary. Correct a typo that would cause problems for a second IO apic.

Pointed out by: Steve Passe <smp.csn.net>


59368 18-Apr-2000 phk

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


59294 16-Apr-2000 msmith

Some more i386-only BIOS-friendliness:

- Add support for using the PCI BIOS functions for configuration space
accesses, and make this the default.

- Make PNPBIOS the default (obsoletes the PNPBIOS config option).

- Add two new boot-time tunables to disable each of the above.


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


58962 03-Apr-2000 msmith

Remove the !(I386 & SMP) tests; we don't run SMP on an i386 system, and
they break the LINT build.


58941 02-Apr-2000 dillon

Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
well.

Reviewed by: msmith


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58820 30-Mar-2000 peter

Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.


58786 29-Mar-2000 kato

PC-98 BIOS copies the DX register into its work area. The value of it
shows `CPUID' and it is useful to identify CPU. So, it is copied from
BIOS work area to the cpu_id variable (PC-98 only).

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


58764 29-Mar-2000 dillon

The SMP cleanup commit broke need_resched; this fixes that and also
removes unnecessary MPLOCKED and 'lock' prefixes from the interrupt
nesting level, since (A) the MP lock is held at the time, and (B) since
the nesting level is restored prior to return any interrupted code
will see a consistent value.


58762 29-Mar-2000 kato

Added indirect pio into the bus space stuff for the NEC PC-98. bus.h
includes one of bus_at386.h and bus_pc98.h. Because only bus_pc98.h
supports indirect pio and bus_at386.h is identical to old bus.h, there
is no functional change in PC-AT's kernels. That is, it cannot cause
performance loss.

Submitted by: nyan
Reviewed by: imp
bde and luoqi provided useful comments for earlier version.


58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


58706 27-Mar-2000 dillon

Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).

The geteblk() fix solves a performance problem that DG was able to reproduce.

This patch does not completely fix the KVA fragmentation problems, but
it goes a long way.

Mostly Reviewed by: bde and others
Approved by: jkh


58377 20-Mar-2000 phk

Isolate the Timecounter internals in their own two files.

Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
kern/kern_clock.c -> kern/kern_tc.c
sys/time.h -> sys/timetc.h
(thanks peter!)


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch was machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.
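
The two points above (a command field with exactly one bit set, and the
hazard of a flag defined as zero) can be illustrated with a small
user-space sketch. The IOCMD_* and OLD_B_* names and values below are
hypothetical stand-ins chosen for the illustration, not the kernel's
actual constants, and the validity check is only one plausible way to
enforce the invariant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command values, one bit each (not the kernel's constants). */
enum {
	IOCMD_READ  = 0x1,
	IOCMD_WRITE = 0x2,
	IOCMD_FREE  = 0x4
};

/* Old-style flags where "write" is spelled as zero (illustration only). */
#define OLD_B_READ	0x1
#define OLD_B_WRITE	0x0

/* True when exactly one bit is set in cmd. */
static bool
iocmd_is_valid(unsigned int cmd)
{
	return (cmd != 0 && (cmd & (cmd - 1)) == 0);
}

/* The mistake a zero-valued flag invites: this test can never succeed. */
static bool
old_is_write_buggy(unsigned int flags)
{
	return ((flags & OLD_B_WRITE) != 0);
}

/* Name a command, insisting on the one-bit invariant first. */
static const char *
iocmd_name(unsigned int cmd)
{
	assert(iocmd_is_valid(cmd));
	switch (cmd) {
	case IOCMD_READ:
		return ("read");
	case IOCMD_WRITE:
		return ("write");
	case IOCMD_FREE:
		return ("free");
	default:
		return ("unknown");
	}
}
```

With one bit per command, a zero or multi-bit value is rejected outright
rather than being silently treated as a write.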


58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


57996 13-Mar-2000 bde

Disabled the optimization of not doing an invltlb_1pg() when changing
pte's from zero. The TLB is supposed to be invalidated when pte's are
changed _to_ zero, but this doesn't occur in all cases for global pages
(PG_G stops invltlb() from working, and invltlb_1pg() is not used
enough).

PR: 14141, 16568
Submitted by: dillon


57704 02-Mar-2000 dufault

I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority. This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by: bde


57701 02-Mar-2000 dufault

Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by: jkh, bde


57571 28-Feb-2000 bsd

Reset the hardware debug registers when exec'ing a new image.

Reviewed by: bde,jlemon
Approved by: jkh


57362 20-Feb-2000 bsd

Don't forget to reset the hardware debug registers when a process that
was using them exits.

Don't allow a user process to cause the kernel to take a TRCTRAP on a
user space address.

Reviewed by: jlemon, sef
Approved by: jkh


57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


56845 29-Jan-2000 peter

Remove a bunch of unused (NO-OP) #if NFOO > 0 type includes and some
#include "foo.h" headers.


56814 29-Jan-2000 peter

Remove a workaround for a gas bug. It couldn't assemble a certain
lgdt instruction, but the binutils based one is fine and has been
for ages.


56797 29-Jan-2000 kato

Simplify messages of Pentium II, Pentium II Xeon, Celeron, Pentium III
and Pentium III Xeon CPUs. If a CPU is one of Pentium II, Pentium II
Xeon and Celeron, the message is always "Pentium II/Pentium II
Xeon/Celeron". If a CPU is one of Pentium III and Pentium III Xeon,
the message is always "Pentium III/Pentium III Xeon".


56623 26-Jan-2000 msmith

Correctly initialise the available IRQ numbers in the APIC_IO case.
IRQ 2 was being unilaterally disallowed, which is only appropriate if
the interrupt hardware is the traditional chained PIC arrangement.

Reviewed by: tegge (in principle)


56615 25-Jan-2000 dfr

Use device_printf() instead of device_print_prettyname().


56243 18-Jan-2000 billf

Cast rman_get_virtual() to a vm_offset_t.

Submitted by: msmith


56237 18-Jan-2000 alfred

unbreak (rv -> r), afaik what Mike intended, boots fine on my machine


56213 18-Jan-2000 msmith

Don't try to map memory resources into the kernel until they're actually
activated. Some of the things that get listed as "resources" aren't
necessarily suited for this.

(This shouldn't be a problem for any driver that correctly passes
RF_ACTIVE)


56024 15-Jan-2000 tanimura

A processor with the CPUID of 0x?8? is Pentium III.
(aka Coppermine)

Noticed by: Satoshi Sawada <k-sawata@gnoc2.comminet.or.jp>
Reviewed by: Takuma Yamada <fuzzy2@st.rim.or.jp>


55891 13-Jan-2000 mdodd

Allow SMP systems with an MCA bus to work properly.

Reviewed by: peter


55823 11-Jan-2000 yokota

Add a new mechanism, cndbctl(), to tell the console driver that
ddb is entered. Don't refer to `in_Debugger' to see if we
are in the debugger. (The variable used to be static in Debugger()
and wasn't updated if ddb is entered via traps and panic anyway.)

- Don't refer to `in_Debugger'.
- Add `db_active' to i386/i386/db_interface.c (as in
alpha/alpha/db_interface.c).
- Remove cnpollc() stub from ddb/db_input.c.
- Add the dbctl function to syscons, pcvt, and sio. (The function for
pcvt and sio is noop at the moment.)

Jointly developed by: bde and me

(The final version was tweaked by me and not reviewed by bde. Thus,
if there is any error in this commit, that is entirely mine, not
his.)

Some changes were obtained from: NetBSD


55604 08-Jan-2000 bde

Compile genassym.c with ordinary ${CFLAGS}. The (small) needs for
${GEN_CFLAGS} and -U_KERNEL became negative when all the
genassym.c's were converted to be cross-built.

Makefile.*:
- Cleanups associated with the old genassym.
- Fixed deprecated spelling of ${.IMPSRC} as "$<".


55545 07-Jan-2000 marcel

Use genassym(1). The definitions of NKPDE and NKPT have been removed
because they are already defined in pmap.h, resulting in duplicate
definitions.

Reviewed by: bde


55540 07-Jan-2000 luoqi

Allow SMP && NCPU == 1 to work. From now on, there's no restriction on the
value of NCPU relative to the number of cpus physically present, the actual
number of cpus utilized will be the smaller of the two.


55420 04-Jan-2000 tegge

ISA device drivers use the ISA source interrupt number in locations where
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.

Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.

Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>


55312 02-Jan-2000 phk

Move the "sti" instruction to right before the "hlt" to close a tiny
race condition.

Obtained from: bde and/or obrien


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs, which made this change quite some time ago. More commits to come.


55117 26-Dec-1999 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.


55110 26-Dec-1999 bde

Fixed breakage of read-only opening of /dev/*mem at securelevel > 0 in
previous pair of commits.

Spell the "securelevel > 0" check consistently.

Use the proc arg instead of curproc in mmopen() and mmclose().


55098 25-Dec-1999 bde

Fixed races accessing the RTC. The races apparently caused
apm_default_resume() to sometimes set a very wrong time.
(1) Accesses to the RTC index and data registers were not atomic enough.
Interrupts were not masked. This was only good enough until an
interrupt handler (rtcintr()) started accessing the RTC in FreeBSD-2.0.
(2) Access to the block of time registers in inittodr() was not atomic
enough. inittodr() has 244us to read the time registers. Interrupts
were not masked. This was only good enough until something (apm)
started calling inittodr() after boot time in FreeBSD-2.0.
The fix for (2) also makes the timecounter update more atomic, although
this is currently unimportant due to the low resolution of the RTC.

Problem reported by: mckay


54952 21-Dec-1999 eivind

Change incorrect NULLs to 0s


54890 20-Dec-1999 peter

Remove references to register_intr() etc in comments.


54192 06-Dec-1999 luoqi

Need header <machine/smp.h> for prototype declaration of smp_rendezvous()
in my previous commit.


54188 06-Dec-1999 luoqi

User ldt sharing.


54128 04-Dec-1999 kato

The address 0x472 is used for the SCSI HDD geometry information on
PC-98. Therefore, the PC-98 kernel should not modify it.


54121 04-Dec-1999 marcel

oszsigcode -> szosigcode

Pointed out by: bde


54120 04-Dec-1999 marcel

Fix type of sf_addr.

Pointed out by: bde


54073 03-Dec-1999 mdodd

Remove the 'ivars' argument to device_add_child() and
device_add_child_ordered(). 'ivars' may now be set using the
device_set_ivars() function.

This makes it easier for us to change how arbitrary data structures are
associated with a device_t. Eventually we won't be modifying device_t
to add additional pointers for ivars, softc data etc.

Despite my best efforts I've probably forgotten something so let me know
if this breaks anything. I've been running with this change for months
and it's been quite involved actually isolating all the changes from
the rest of the local changes in my tree.

Reviewed by: peter, dfr


53888 29-Nov-1999 dillon

Make BOOTP work again.

Submitted by: Doug Ambrisko <ambrisko@whistle.com>


53745 27-Nov-1999 bde

Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by: dufault


53706 26-Nov-1999 julian

Fix out-of-date comment


53648 24-Nov-1999 archie

Change the prototype of the strto* routines to make the second
parameter a char ** instead of a const char **. This makes these
kernel routines consistent with the corresponding libc userland
routines.

Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.

Reviewed by (in concept): phk, bde
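
As an illustration of the prototype being matched, here is a minimal
user-space sketch using the standard libc strtoul(). The helper name
parse_prefix is made up for this example. Note that the end pointer is
a plain char * even though the input string is const, which is the
trade-off the message calls debatable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a decimal number from a read-only string and report how many
 * characters were consumed.  The end pointer is a plain char *, as in
 * the libc prototype, even though the input is const.
 */
static unsigned long
parse_prefix(const char *s, size_t *consumed)
{
	char *end;
	unsigned long v;

	assert(s != NULL && consumed != NULL);
	v = strtoul(s, &end, 10);
	*consumed = (size_t)(end - s);
	return (v);
}
```

A caller that declares the end pointer as const char * would need a cast
to pass its address, which is the inconvenience a const char ** prototype
was meant to avoid.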


53624 23-Nov-1999 green

Fix a confusion between osigcontext and ucontext_t in the previous commit.
Since an osigcontext is smaller, if you check for a valid (much larger sized)
ucontext_t and it fails, we bogusly would reject the osigcontext as per
rev 1.378. Instead, check for osigcontext range validity first, and
ucontext_t later. This unbreaks Netscape.

Pointed to the right commit by: peter


53504 21-Nov-1999 pho

Moved useracc() to top of sigreturn so as to avoid panic
caused by invalid arguments to routine.

Reviewed by: marcel, phk


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53433 19-Nov-1999 phk

Use LIST_FOREACH to traverse the allproc list.

Submitted by: Jake Burkholder jake@checker.org


53425 19-Nov-1999 dillon

Optimize two cases in the MP locking code. First, it is not necessary
to use a locked cmpxchg when unlocking a lock that we already hold, since
nobody else can touch the lock while we hold it. Second, it is not
necessary to use a locked cmpxchg when locking a lock that we already
hold, for the same reason. These changes will allow MP locks to be used
recursively without impacting performance.

Modify two procedures that are called only by assembly and are already
NOPROF entries to pass a critical argument in %edx instead of on the
stack, removing a significant amount of code from the critical path
as a consequence.

Reviewed by: Alfred Perlstein <bright@wintelcom.net>, Peter Wemm <peter@netplex.com.au>
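
The first optimization can be sketched in portable C11 atomics. This is
only an illustration of the idea under simplifying assumptions: the
actual code is i386 assembly, and the struct layout, the rec_lock_*
names, and the use of a separate recursion counter are all invented for
this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_FREE	(-1)

struct rec_lock {
	atomic_int owner;	/* CPU id of the holder, or LOCK_FREE */
	int count;		/* recursion depth; touched only by the holder */
};

static void
rec_lock_init(struct rec_lock *l)
{
	atomic_init(&l->owner, LOCK_FREE);
	l->count = 0;
}

static bool
rec_lock_try(struct rec_lock *l, int cpu)
{
	int expected = LOCK_FREE;

	/* Already ours: nobody else may change owner, so no locked op. */
	if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
		l->count++;
		return (true);
	}
	/* Not held by us: the atomic compare-and-exchange is required. */
	if (atomic_compare_exchange_strong(&l->owner, &expected, cpu)) {
		l->count = 1;
		return (true);
	}
	return (false);
}

static void
rec_lock_release(struct rec_lock *l, int cpu)
{
	assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
	assert(l->count > 0);
	/* We hold the lock, so a plain store is enough to give it up. */
	if (--l->count == 0)
		atomic_store_explicit(&l->owner, LOCK_FREE,
		    memory_order_release);
}
```

Only the first acquisition by a CPU pays for the atomic
read-modify-write; nested acquires and all releases take the cheap path,
which is what makes recursive use inexpensive.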


53106 12-Nov-1999 marcel

Change the type of sf_addr in struct {o}sigframe from char* to
register_t.

Fix some style bugs and bitrotted comments.

Submitted by: bde


53045 09-Nov-1999 alc

Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It
should be "VM_FAULT_NORMAL".


52968 07-Nov-1999 phk

Patch got this one wrong, we want to check securelevel in open()


52967 07-Nov-1999 phk

Remove the iskmemdev() function. Make it the responsibility of the mem.c
drivers to enforce the securelevel checks.


52805 02-Nov-1999 jhb

Remove the prototypes for two functions that were removed when the
CD9660_ROOT option was axed.


52778 01-Nov-1999 msmith

This is a complete rewrite of vfs_conf.c, which changes the way the root
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.

A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.

These changes drastically simplify the machine-dependent parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.


52720 31-Oct-1999 alc

The useracc() calls in osigreturn() and sigreturn() should specify
VM_PROT_READ rather than VM_PROT_WRITE. (This mistake predates
the B_READ/B_WRITE -> VM_PROT_READ/VM_PROT_WRITE change.)

Submitted by: bde


52669 30-Oct-1999 iwasaki

i8254_restore is called from apm_default_resume() to reload
the countdown register.
This should not be necessary, but there are broken laptops that
do not restore the countdown register on resume.
When that happens, it messes up the hardclock interval and system clock,
which leads to the infamous "calcru: negative time" problem.

Submitted by: kjc, iwasaki
Reviewed by: Steve O'Hara-Smith <steveo@eircom.net> and committers.
Obtained from: PAO3


52647 30-Oct-1999 alc

The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by: dillon


52644 30-Oct-1999 phk

Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting from the confusion caused by this.

Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


52625 29-Oct-1999 phk

Remove #ifdef notyet code for doing I/O in a way we never will do it.


52469 24-Oct-1999 alc

Add text for the Athlon's MMX and 3DNow! (DSP) instruction extensions
to print_AMD_features.


52452 24-Oct-1999 dillon

Adjust the buffer cache to better handle small-memory machines. A
slightly older version of this code was tested by BDE and me.

Also fixes a lockup situation when kva gets too fragmented.

Remove the maxvmiobufspace variable and sysctl, they are no longer
used. Also cleanup (remove) #if 0 sections from prior commits.

This code is more of a hack, but presumably the whole buffer cache
implementation is going to be rewritten in the next year so it's no
big deal.


52397 19-Oct-1999 peter

Remove pccard attachment stub, this caused pccard unit 0 to be allocated
and unusable by the pccard system since pccard doesn't attach to the
nexus any more. This was stopping my 3c589D from working as pccard unit
0 is used directly for resource allocation and this fails when unit 0
isn't actually attached to anything.


52271 15-Oct-1999 tegge

Eliminate remaining part of incorrect PCI bus numbering sanity check on systems with more than one PCI bus.


52241 14-Oct-1999 dfr

* Add some verbose logging to the PnP parser and fix a couple of bugs.
* Move pnp_eisaformat() to pnp.c, declared in <isa/pnpvar.h>.
* Turn the pnpbios code into an enumerator for the isa bus. This allows
all devices known to the bios to be probed automatically.

Currently the pnpbios code is dependent on the PNPBIOS option. As the code
is tested more and when more drivers are converted this will be made the
default. I have PnP changes in the wings for fdc, atkbd, psm, pcaudio, and
joy. Sio already works with pnpbios.


52237 14-Oct-1999 kato

Recognize Pentium II w/ CPUID = 0x6XX and Pentium III Xeon w/ CPUID =
0x7XX.

Pointed out by: Brian Somers <brian@Awfulhak.org>


52199 13-Oct-1999 marcel

Fix a security bug. eflags was copied verbatim from userland.

Submitted by: bde
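
One way to avoid trusting a flags word from userland is to allow changes
only in bits a user process may legitimately alter. The sketch below
shows that idea and is not taken from the commit itself: the FL_* names
are invented, and the "user changeable" mask is deliberately tiny (carry
and zero flags only) for illustration. The bit positions used (CF bit 0,
ZF bit 6, IOPL bits 12-13) are the architectural x86 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FL_CARRY	0x00000001u	/* CF, bit 0 */
#define FL_ZERO		0x00000040u	/* ZF, bit 6 */
#define FL_IOPL		0x00003000u	/* I/O privilege level, bits 12-13 */

/* Illustrative mask only; a real kernel permits more bits than this. */
#define FL_USER_CHANGEABLE	(FL_CARRY | FL_ZERO)

/* True if newfl differs from oldfl only in user-changeable bits. */
static bool
flags_secure(uint32_t newfl, uint32_t oldfl)
{
	return (((newfl ^ oldfl) & ~FL_USER_CHANGEABLE) == 0);
}

/* Alternative: take only the changeable bits from the request. */
static uint32_t
merge_user_flags(uint32_t oldfl, uint32_t requested)
{
	uint32_t merged;

	merged = (oldfl & ~FL_USER_CHANGEABLE) |
	    (requested & FL_USER_CHANGEABLE);
	assert(flags_secure(merged, oldfl));
	return (merged);
}
```

Either rejecting a request that fails flags_secure() or merging with
merge_user_flags() keeps privileged bits such as IOPL under kernel
control.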


52177 12-Oct-1999 green

Enable MTRR support for K7 (Athlon) processors, which happens to have the
same interface as Intel's P6 family has. Incidentally, I had disabled
it in the first place since I knew the K7s were coming out soon but
did not want to assume they'd have the same MTRR interface as Intel's
chips.

Submitted by: Ville-Pertti Keinonen <will@iki.fi>


52150 12-Oct-1999 marcel

Now that userland, including modules, doesn't use the osig* syscalls
and the kernel itself doesn't use any SYS_osig* constants, change
the syscalls to be of type COMPAT.


52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


52121 11-Oct-1999 peter

Zap unneeded #includes

Submitted by: phk


51984 07-Oct-1999 marcel

Simplification of the signal trampoline and other cleanups.

o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.

This commit contains a patch by bde.

Reviewed by: luoqi, jdp


51942 04-Oct-1999 marcel

Re-introduction of sigcontext.

struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.

For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.

NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.

This commit also fixes a lot of style bugs without hopefully
adding new ones.

NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.

Parts submitted by: bde
sigaltstack bug found by: bde


51908 03-Oct-1999 marcel

Reinstate the 4th argument to old signal handlers. Don't set it
when the handler uses siginfo_t.


51834 01-Oct-1999 marcel

Implement the use of si_addr in siginfo_t.

Suggested by: jdp


51833 01-Oct-1999 marcel

Don't check %cs *after* it has been set in sigreturn. If the check
fails, applications could end up running in kernel mode (oops).

Submitted by: bde
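
The ordering issue can be shown with a minimal validate-then-commit
sketch. Everything here is hypothetical (struct regs_sketch,
restore_regs, and the SEL_* macro names); the only fact relied on is the
x86 convention that the low two bits of a selector carry the privilege
level and that ring 3 is user mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SEL_RPL_MASK	3	/* privilege-level bits of a selector */
#define SEL_UPL		3	/* user privilege level (ring 3) */

/* Hypothetical, heavily reduced register frame. */
struct regs_sketch {
	unsigned int cs;
	unsigned int eip;
};

static int
restore_regs(struct regs_sketch *live, const struct regs_sketch *from_user)
{
	assert(live != NULL && from_user != NULL);

	/* Validate the untrusted selector first ... */
	if ((from_user->cs & SEL_RPL_MASK) != SEL_UPL)
		return (EINVAL);

	/* ... and only then commit it to the live frame. */
	*live = *from_user;
	return (0);
}
```

If the copy came before the check, a failed check would leave the live
frame already holding a kernel-privilege selector.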


51792 29-Sep-1999 marcel

sigset_t change (part 3 of 5)
-----------------------------

By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two versions of sendsig and sigreturn.

A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.

The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).

The struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that were caused by
the new sigframe.

NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.


51659 25-Sep-1999 mjacob

Fix from Tor so that if we enter the debugger in the tristate going to
SMP (other CPUs stopped but SMP mode not really started).

Obtained from: Tor.Egge@fast.no


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51561 22-Sep-1999 luoqi

Display CPU (BSP) clock speed on SMP systems.


51498 21-Sep-1999 phk

Print out flags value


51474 20-Sep-1999 dillon

Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundary in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.

The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().


51211 12-Sep-1999 green

Correction: mem.c devices are "D_MEM" (and D_MEM is added.)

Taken issue with by: phk


51207 12-Sep-1999 green

Mainly stylistic fixes:
1. return( -> return (
2. inappropriate ENODEV -> ENOTTY
3. some unreachable cases removed


51206 12-Sep-1999 green

Make the d_flags of mem devices D_DISK to signify that they are disk-like
random-seekable devices. This lets dd(1) know it can seek on them. It
also affects spec_vnopen() (IIRC), but only makes the path of execution smaller,
and does not change its behavior. This is when securelevel >= 2.


51193 12-Sep-1999 msmith

Some PnP BIOSsen return garbage in the high byte of the number-of-devices
field (or don't set the high byte at all). Clear it to avoid reporting
a silly number of devices.

Reported by: phk


51183 11-Sep-1999 peter

Make pmap_mapdev() deal with non-page-aligned requests.
Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.
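
Handling a non-page-aligned request generally means mapping whole pages
and handing back a pointer offset into the first one. The arithmetic can
be sketched as follows, assuming 4K pages and 32-bit physical addresses;
the names are invented for the sketch, overflow near the top of the
address space is ignored, and this is not the pmap code itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH	4096u
#define PAGE_MASK_SKETCH	(PAGE_SIZE_SKETCH - 1)

/* What must be mapped to cover an arbitrary [pa, pa + size) request. */
struct map_span {
	uint32_t page_base;	/* pa rounded down to a page boundary */
	uint32_t offset;	/* where pa sits inside the first page */
	uint32_t length;	/* bytes to map, a whole number of pages */
};

static struct map_span
span_for_request(uint32_t pa, uint32_t size)
{
	struct map_span s;

	assert(size > 0);
	s.offset = pa & PAGE_MASK_SKETCH;
	s.page_base = pa - s.offset;
	/* Round offset + size up to the next page boundary. */
	s.length = (s.offset + size + PAGE_MASK_SKETCH) & ~PAGE_MASK_SKETCH;
	return (s);
}
```

The caller would map length bytes starting at page_base and return the
mapped virtual address plus offset; the unmap side has to undo the same
rounding to release the full range.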


51130 10-Sep-1999 phk

The system clock doesn't update because the C6's TSC stops counting when
the HALT instruction is run.

PR: 13683
Submitted by: IMAI Takeshi <take-i@ceres.dti.ne.jp>
Reviewed by: phk


51126 10-Sep-1999 peter

Add text for the PN (Processor serial number) and XMM (extended SIMD/MMX2/
support), as well as a bunch of comments for what the various bits mean
(those that I remember anyway).


51121 10-Sep-1999 msmith

Look for the right ACPI signature.

Submitted by: dfr


51114 10-Sep-1999 msmith

Invoke smp_rendezvous_action() using the a.out compatible asnames.h
technique (bleagh).


51065 07-Sep-1999 luoqi

Save %gs in sigcontext when delivering a signal and restore it upon
return (in signal trampoline code). I plan to do the same on -stable,
so that we have a consistent interface to userland applications.

Reviewed by: bde


50992 06-Sep-1999 imp

Add pccard child to nexus. A better version would take care of this
with an identify method, but that has not been implemented.

Forgotten by: imp


50972 05-Sep-1999 peter

Set up FPU state on the AP.

Tested by: phk


50823 03-Sep-1999 mdodd

This adds the i386 specific support for systems with a MicroChannel
Architecture bus.

Reviewed by: msmith


50816 02-Sep-1999 luoqi

Some reorganization of sysarch() interface:
1. Move definitions of struct i386_*_args to the header file sysarch.h,
since they are part of the sysarch API. struct i386_get_ldt_args and
i386_set_ldt_args were identical, therefore make them into one
struct i386_ldt_args. Libc should use these definitions as well.
2. Return a more sensible EOPNOTSUPP for unknown operations.

Reviewed by: marcel


50788 02-Sep-1999 peter

Update for new pnp includes


50769 01-Sep-1999 dfr

This represents essentially a complete rewrite of the ISA PnP code. The
new system is integrated with the ISA bus code more cleanly and allows
the future addition of more enumerators such as PnPBIOS and ACPI.

This commit also enables the new pcm driver since it is somewhat tied to
the new PnP code.


50677 31-Aug-1999 msmith

Make the error return from mem_range_attr_get actually do something useful
(return an error to the caller)


50674 30-Aug-1999 msmith

Check that there is memory range support before attempting to perform such
an operation, as a kernel client may not have previously checked the CPU
type (it may not be able to).

Also correct the function declaration style for the mem_range functions to
match the rest of this file (oops).

Submitted by: gibbs


50511 28-Aug-1999 phk

We don't need to pass the diskname argument all over the diskslice/label
code, we can find the name from any convenient dev_t


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50463 27-Aug-1999 jlemon

Reference the correct gdt[] entry on SMP. Remove the `generation' flag,
and always reload the selectors for every bios call.


50379 25-Aug-1999 peter

Use .p2align to ensure consistent a.out/elf alignment. I'd have used
SUPERALIGN_TEXT, but this is inline assembler and after cpp has run.
Inspired by bde's comments on linux_locore.s.


50337 25-Aug-1999 msmith

Rename 'bios_jmp' to 'bios16_jmp' to make it clear what it's related to.


50336 25-Aug-1999 peter

Use the far jump for the base of the page arithmetic rather than the
calling function, otherwise Bad Things Happen(tm) when bios16_call is
not in the same page as bios_jmp.

Reviewed by: msmith


50313 24-Aug-1999 msmith

Work around a bad design in some PnP BIOS code whereby the BIOS can reach
off the top of our constructed stack segment while it's trying to copy a
maximally-sized PnP argument frame around.


50303 24-Aug-1999 alc

Cosmetic: Correct the Id string.

Submitted by: Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>


50271 24-Aug-1999 bde

Fixed a misplaced cast to uintptr_t. Cosmetic.

Use device_get_nameunit() instead of rolling our own.


50268 23-Aug-1999 bde

`bootdev' is an ordinary u_long, so don't cast it to a pointer to print it.
gcc warns about the cast on i386's with 64-bit longs.

Print `bootdev' in all cases when we bail out because it is unreasonable.


50257 23-Aug-1999 phk

Now that we can bind cdevsw to the individual dev_t, divorce the PERFMON
stuff from mem.c. If PERFMON is there, it will "steal" a minor from
mem.c, but mem.c doesn't need to know about this.

Fixed type of cmd argument in perfmon_ioctl().


50254 23-Aug-1999 phk

Convert DEVFS hooks in (most) drivers to make_dev().

Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this exercise is that far fewer files depend on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().


50252 23-Aug-1999 peter

The nexus_attach() code works a lot better if it's actually connected to
the device methods... Also, don't fail to add eisa/isa because a previous
device failed to attach.


50251 23-Aug-1999 alc

Modify the macros IMASK_UNLOCK, CPL_UNLOCK, and REL_FAST_INTR_LOCK
to perform the s_unlock inline.


50197 22-Aug-1999 peter

The previous fix didn't do anything if you didn't have pnp. The ICU
macros are only called in the !APIC_IO case, include icu.h there.


50196 22-Aug-1999 green

Finish unbreaking autoconf.c includes (for non-SMP.)


50185 22-Aug-1999 peter

Oops, that wasn't so clever after all. struct isa_device is still a
prerequisite for this old pnp.h.


50184 22-Aug-1999 peter

Zap a heap of unused cruft now. We don't need the ISA/EISA/PCI hooks
here any more as they are self identifying. Only PNP remains but that
will be replaced any day now.
Also reword a comment that had been XXX'ed to death to make it clear[er]
why we don't enable interrupts before probing.
PCIBIOS interrupt routing controls may make this possible to fix one day.


50183 22-Aug-1999 peter

Take advantage of the apm/npx code and let them identify themselves rather
than having explicit hooks here.
Treat the eisa/isa attach a little differently: defer the decision, and
attach eisa/isa to the motherboard directly only if the PCI probe (if it
exists) fails to turn up a PCI->EISA/ISA bridge.
This restores the original device geometry where ISA and/or EISA attach
to their bridge rather than bypassing and going to the root.


50181 22-Aug-1999 peter

Add an identify method to allow npx to arrange itself to be attached to
the nexus without explicit code in the nexus to do so.


50094 20-Aug-1999 msmith

Loosen up the constructed argument segment generation slightly; rather than
trying to size it intelligently just make it 64k and leave it up to the caller
to ensure that the arguments all fit within that range.

This should resolve the issue that some people were seeing with the PnP BIOS
scan crashing on a large PnP node.


50081 20-Aug-1999 kato

There may exist two kinds of IBM BlueLightning CPU. On one, the 5/2
test does not change the undefined flags, as on Cyrix CPUs. On the other,
the 5/2 test changes the undefined flags, as on Intel CPUs. The latter
could not be detected and was recognized as a 486DX CPU. To solve this,
finishidentcpu() calls identblue() when cpu_vendor is a null string
(that is, the CPUID instruction is not supported) and cpu == CPU_486.
Tests have been done on IBM BlueLightning CPUs, i486SX and i486DX.


50037 19-Aug-1999 peter

Update for MI switch code, and trim a heap of unused (I believe) entries.


50036 19-Aug-1999 peter

Use the MI process selection. We use a quick routine to decide whether
to get the mplock and enter the kernel to run a process in the SMP case.


49999 18-Aug-1999 alc

Create callable (non-inline) versions of the atomic_OP_TYPE functions
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.

This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).
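
The difference between an inlined and a callable atomic operation can be
sketched in userland C. This is only an illustration built on C11
<stdatomic.h>; the names sketch_atomic_add_int and sketch_count are
hypothetical and merely follow the atomic_OP_TYPE naming pattern. The
kernel's real functions are machine-dependent and are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Callable (non-inline) variant: defined once and reached through an
 * ordinary function call, the way a KLD would call the kernel's copy.
 * Hypothetical name; not the kernel's implementation.
 */
void
sketch_atomic_add_int(atomic_uint *p, unsigned int v)
{
	assert(p != NULL);
	atomic_fetch_add(p, v);
}

/* Usage: bump a local counter n times through the callable version. */
unsigned int
sketch_count(unsigned int n)
{
	atomic_uint counter = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		sketch_atomic_add_int(&counter, 1);
	return (atomic_load(&counter));
}
```

Routing the operation through one linked-in function means the caller
picks up whatever variant the running kernel was built with, at the
price of a function call.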

PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au


49996 18-Aug-1999 msmith

Remove the SMBIOS detection and definitions; this should be handled in a
loadable module (under development).


49953 17-Aug-1999 msmith

Search for and interrogate the PnP BIOS if found. This code just prints
the PnP device IDs in verbose mode; it does not (yet) save any resource
data or contribute to the PnP process nor resource management.


49952 17-Aug-1999 msmith

Mindbogglingly, many BIOS vendors expect to be able to load %ds with
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.

This change works around GPFs in some APM and PnP BIOS implementations.

Obtained from: Linux


49859 16-Aug-1999 gibbs

Fix a bug in busdma_mem_free() where we were improperly checking
the map associated with the region to free.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49635 11-Aug-1999 alc

_pmap_allocpte:
If the pte page isn't PQ_NONE, panic rather than silently
covering up the problem.


49591 10-Aug-1999 alc

pmap_remove_pages:
Add KASSERT to detect out of range access to the pv_table and
report the errant pte before it's overwritten.


49558 09-Aug-1999 phk

Merge the cons.c and cons.h to the best of my ability. alpha may or
may not compile, I can't test it.


49474 06-Aug-1999 phk

Forgot the "bsd" slice, now setrootbyname() understands "wd0s1a".


49421 04-Aug-1999 msmith

Fix typo which would have caused MTRR support on non-SMP systems to
behave in an utterly random fashion.

Submitted by: gibbs


49337 31-Jul-1999 alc

pmap_object_init_pt:
Verify that object != NULL.


49326 31-Jul-1999 alc

Change the type of vpgqueues::lcnt from "int *" to "int". The indirection
served no purpose.


49304 31-Jul-1999 alc

Add parentheses for clarity.

Submitted by: dillon


49223 29-Jul-1999 msmith

Formatting-only cleanup accidentally omitted from the patch merge in the
previous major update. Bring new code into style alignment with the
existing code. No functional changes.


49207 29-Jul-1999 peter

GBIOSSTACK_SEL is undefined, but OTOH, BSSSEL apparently isn't used either.


49204 29-Jul-1999 msmith

Remove some duplicate definitions, as suggested by Alan Cox.


49203 29-Jul-1999 msmith

Fix for vmspace sharing as per Alan Cox. Thanks!


49197 29-Jul-1999 msmith

Major update to the kernel's BIOS-calling ability.

- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code

Update APM to use this facility.

Submitted by: jlemon


49196 29-Jul-1999 green

Remove XXX from the headers (broke the build, I'm betting.)


49195 29-Jul-1999 mdodd

Alter the behavior of sys/kern/subr_bus.c:device_print_child()

- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"

Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)

Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())

The BUS_PRINT_CHILD method now returns int instead of void.

Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.

- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.
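
The header/footer split and the new return value can be sketched in
plain C. The sketch_* functions below are hypothetical stand-ins written
for this illustration; they are not the real bus_print_child_header() or
bus_print_child_footer(), and they take strings rather than device_t
arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the header: prints "foo0: <FooDevice 1.1>". */
static int
sketch_print_header(const char *nameunit, const char *desc)
{
	return (printf("%s: <%s>", nameunit, desc));
}

/* Hypothetical stand-in for the footer: devices are 'on' a bus. */
static int
sketch_print_footer(const char *busunit)
{
	return (printf(" on %s\n", busunit));
}

/* Returns the number of characters output, summed over both parts. */
int
sketch_print_child(const char *nameunit, const char *desc,
    const char *busunit)
{
	int retval = 0;

	assert(nameunit != NULL && desc != NULL && busunit != NULL);
	retval += sketch_print_header(nameunit, desc);
	retval += sketch_print_footer(busunit);
	return (retval);
}
```

Because each piece returns its own character count, a custom method can
print extra text between header and footer and still report an accurate
total.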

Reviewed by: dfr, peter


49186 28-Jul-1999 msmith

We're called too early to have any idea whether APM is going to be
active or not. The only sane thing we can do here is assume that if
APM is supported it might be active at some point, and bail.

In reality, even this isn't good enough; regardless of whether we support
APM or not, the system may well futz with the CPU's clock speed and throw
the TSC off. We need to stop using it for timekeeping except under
controlled circumstances. Curse the lack of a dependable high-resolution
timer.


49178 28-Jul-1999 msmith

Remove some droppings left over from the removal of the APM hooks.


49098 26-Jul-1999 cracauer

Various formatting fixes on my FPE trapcode commit.

Submitted by: BDE


49081 25-Jul-1999 cracauer

On FPU exceptions, pass a useful error code (one of the FPE_...
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).

A rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.

This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.


48974 22-Jul-1999 alc

Reduce the number of "magic constants" used for page coloring
by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same
thing at different places in the kernel. Drop PQ_PRIME3.


48963 21-Jul-1999 alc

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


48925 20-Jul-1999 msmith

Update of the i686 MTRR/memory range support.

- Support for setting memory range attributes on SMP systems using the
new SMP rendezvous function
- Don't print the confusing default memory type message.
- Allow legal overlapping range types.
- Turn interrupts back on after setting MTRRs in UP mode (whoops)
- Don't waste time calling invltlb() after wbinvd(); it's not
SMP-compatible (interrupts are off) and unnecessary because
wbinvd already flushes the TLB.

This code is now essentially feature-complete.


48924 20-Jul-1999 msmith

Implement an all-CPU shootdown-style rendezvous facility. This allows
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.

The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.

Reviewed by: peter (partially)


48918 19-Jul-1999 peter

Fix a page size vs. KB mixup. The allocation of extra buffers at a
reduced rate is meant to kick in at 64MB, not 256MB.
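
The factor of four between the two figures is what a pages-versus-KB
confusion produces with 4 KB pages. The sketch below works through the
arithmetic; the 4 KB page size and the threshold value 65536 are
assumptions chosen for illustration, not values taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB pages */

/* Convert a page count to kilobytes. */
uint64_t
sketch_pages_to_kb(uint64_t pages)
{
	assert(SKETCH_PAGE_SIZE % 1024 == 0);
	return (pages * (SKETCH_PAGE_SIZE / 1024));
}

/* Convert kilobytes to megabytes. */
uint64_t
sketch_kb_to_mb(uint64_t kb)
{
	return (kb / 1024);
}
```

Read as kilobytes, a threshold of 65536 is 64MB; read as a page count,
the same number is 262144 KB, i.e. 256MB.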

Reviewed by: Matthew Dillon <dillon@backplane.com>


48889 18-Jul-1999 bde

Updated acquire_timer2()'s state machine to work when the i8254 is
being used for timecounting. Fixed a race or two in it. Undisabled
it.

PR: 10455


48888 18-Jul-1999 bde

Don't let the machdep.tsc_freq sysctl proceed if the TSC is present
but broken, since tsc_timecounter is not initialised in that case,
and updating an uninitialised timecounter is fatal.

Fixed style bugs in the machdep.i8254_freq and machdep.tsc_freq
sysctls.

Reviewed by: phk


48868 17-Jul-1999 phk

Centralize dumpdev handling.


48832 16-Jul-1999 msmith

Add support for multiple PCI busses directly connected to the nexus.
This is only partially complete, but allows 450NX-based systems with
more than one PCI bus to be used again.

Submitted by: dfr


48729 10-Jul-1999 bde

Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending
bit separate from ipending, since this is simpler and/or necessary for
SMP and may even be better for UP.

Reviewed by: alc, luoqi, tegge


48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


48677 08-Jul-1999 mckusick

These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bringing the machine to a grinding halt.

* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.

* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propagate to
deeper inlines and should produce better code.

* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.

* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.

* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c

* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.

* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.

* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.

* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.

* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).

* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).

* removal of extra arguments passed to getnewbuf() that are not
necessary.

* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c

* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.

* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.

Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48636 06-Jul-1999 peter

Quieten gcc paranoia.


48632 06-Jul-1999 peter

Typo: s/0ff0/0xff0/


48621 06-Jul-1999 cracauer

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
beyond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except applications that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:si->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-safe.
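
A new-style handler can be exercised with a few lines of portable C.
The sketch below sticks to POSIX-defined pieces (sigaction(), SA_SIGINFO,
si_signo) and deliberately avoids the FreeBSD-specific si_scp member
described above; sketch_catch_once is a hypothetical helper name used
only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static volatile sig_atomic_t seen_signo;

/* New-style handler: (int, siginfo_t *, void *). */
static void
handler_si(int sig, siginfo_t *si, void *third)
{
	(void)sig;
	(void)third;
	seen_signo = si->si_signo;
}

/*
 * Install handler_si for signo, deliver the signal once with raise(),
 * and return the si_signo the handler observed (-1 on setup failure).
 */
int
sketch_catch_once(int signo)
{
	struct sigaction sa;

	assert(signo > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler_si;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) != 0)
		return (-1);
	seen_signo = 0;
	raise(signo);
	return ((int)seen_signo);
}
```

Setting SA_SIGINFO is what selects the three-argument sa_sigaction form;
without the flag the kernel calls the old-style sa_handler instead.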

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'siginfo_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE


48618 06-Jul-1999 green

Add Centaur/IDT WinChip support.

Why in the world do people put breaks at the end of a switch's default case?


48615 06-Jul-1999 green

I made some cleanups, rearranged things a bit, and made AMD Features default
printing on CPUs that have it.
If there are no objections, I'll MFC all recent changes (harmless, really)
to 3.2 and PAO.


48579 05-Jul-1999 msmith

Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).


48572 05-Jul-1999 green

Add an extra space to " AMD Features=" to make it line up well.


48571 05-Jul-1999 green

K6-III CPUs are now case:d in the appropriate switch; also, in
print_AMD_info(), L2 internal cache is shown, as are AMD's special CPUID
infos:

CPU: AMD-K6(tm) 3D processor (350.81-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x58c Stepping=12
Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
AMD Features=0x808029bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SYSCALL,PGE,MMX,3DNow!>

PR: kern/12512
Submitted by: Louis A. Mamakos <louie@TransSys.COM>


48546 04-Jul-1999 jlemon

Some cleanup and rearrangement. hw.physmem is now an absolute quantity;
we will never use more memory than this value (if specified), but will always
check memory for validity up to this amount.

Get rid of the speculative_mprobe option; the memory amount can now be
specified by hw.physmem.


48544 04-Jul-1999 mckusick

The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truly empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occurred when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48531 03-Jul-1999 peter

printf int/dev_t (pointer) warning


48521 03-Jul-1999 peter

Fix warnings in last commit (dev_t is not an int, and not even int
compatible in arg lists on the Alpha)


48512 03-Jul-1999 phk

Be more informative and try to ask the user in some instances if we can't
figure out the root device.


48505 03-Jul-1999 alc

An SMP-specific change: Add the lock prefix to RMW operations
on ipending.


48476 02-Jul-1999 msmith

Lightly overhaul the memory sizing code again.

- The kernel environment variable 'hw.physmem' can be used to set the
amount of physical memory space, based at 0, that FreeBSD will use.
Any memory detected over this limit is ignored. Documentation for
this is available under 'help set tunables' in the loader.

- In the case where system memory size can't be accurately determined,
hw.physmem is used as a best-guess memory size, but speculative
probing will be used to determine actual memory size if any of the
guesses or hints are 16M or more.

- If RB_VERBOSE, we list the memory regions as we test them.

- The compile-time option MAXMEM supplies a default value for
'hw.physmem'.
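
The capping rule in the first item can be sketched as a small pure
function. This is an illustration only: sketch_usable_physmem is a
hypothetical name, a limit of zero is assumed here to mean "tunable not
set", and the speculative-probing fallback from the second item is not
modelled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the amount of memory (in bytes, based at 0) that would be used:
 * anything detected above the limit is ignored.
 * Assumption: physmem_limit == 0 means no limit was supplied.
 */
uint64_t
sketch_usable_physmem(uint64_t detected, uint64_t physmem_limit)
{
	uint64_t usable = detected;

	if (physmem_limit != 0 && detected > physmem_limit)
		usable = physmem_limit;
	assert(usable <= detected);
	return (usable);
}
```

With 128MB detected and a 64MB limit the function yields 64MB; with only
32MB detected the limit has no effect.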


48449 02-Jul-1999 mjacob

Correct some ugly formatting. Remember to initialize the alignment tag.
Honor and pass a callers request to contigalloc if they had a non-zero
alignment constraint.


48445 02-Jul-1999 peter

Zap totally the npx0 memory size override. It only worked if statically
specified in the kernel config file - but setting options MAXMEM works
exactly the same. Userconfig overrides of this have not worked for
ages.

Also, change the getenv for the loader override to hw.physmem based on a
prior suggestion from Mike Smith. I think he still wants to change this
some, but this shouldn't get in his way. This is a forced setting of
the memory size, not a "cap". We probably should have a plain 'maxmem'
variable as well which does do a cap, without losing the bios memory
configuration data.


48405 01-Jul-1999 peter

Look up the kernel environment for MAXMEM as a final override for the
memory size. If somebody wants to change the name, fine - I used this
since it's consistent with the config variable it replaces.
This is intended to replace the npx0 msize hack (which no longer works).


48404 01-Jul-1999 peter

Move kern_envp and preload initialization a little earlier so that we
can do a getenv_int() inside the memory sizing routines to override the
memory limit.


48391 01-Jul-1999 peter

Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.


48327 28-Jun-1999 luoqi

Save common_tssd before it's loaded and the busy bit set.

Submitted by: bde


48308 28-Jun-1999 peter

Use the same -UKERNEL strategy as the alpha to avoid the inlines etc.


48288 27-Jun-1999 alc

An SMP-specific change: Remove an unnecessary lock acquire and release
from every system call. (Storing a 32-bit constant is inherently
atomic.)

Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>


48266 27-Jun-1999 peter

Shut up gcc.


48203 24-Jun-1999 jlemon

Fix warning message; that was 4GB, not 2GB. I apparently can't do
arithmetic today.


48202 24-Jun-1999 jlemon

Explicitly ignore any memory > 2GB, we don't support it yet.


48200 24-Jun-1999 jlemon

Only include AMD wt_alloc routines if I586_CPU is defined. Fixes
CPU_WT_ALLOC for cyrix chips.

Submitted by: "Brian Smith" <dbsoft@technologist.com>


48160 24-Jun-1999 green

This commit gives support for the Rise mP6 CPU. It has two changes:
1. Rise is recognized in identcpu.c.
2. The TSC is not written to. A workaround for the CPU bug is being
applied to clock.c (the bug being that the mP6 has TSC enabled
in its CPUID-capabilities, but it only supports reading it. If we
try to write to it (MSR 16), a GPF occurs.) The new behavior is that
FreeBSD will _not_ zero the TSC. Instead, we do a bit of 64-bit
arithmetic.
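
One way to avoid writing the counter is to read it twice and subtract.
The sketch below shows that idea with unsigned 64-bit arithmetic; it is
an assumption about the general technique, not the code in clock.c, and
sketch_tsc_freq is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the counter frequency in Hz from two readings taken
 * interval_us microseconds apart. Unsigned subtraction gives the right
 * delta even if the counter wrapped between the readings.
 * Note: delta * 1000000 can overflow for very long intervals.
 */
uint64_t
sketch_tsc_freq(uint64_t tsc_start, uint64_t tsc_end, uint64_t interval_us)
{
	uint64_t delta;

	assert(interval_us > 0);
	delta = tsc_end - tsc_start;
	return (delta * 1000000u / interval_us);
}
```

Since only the difference matters, the starting value of the counter is
irrelevant and it never needs to be zeroed.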

Reviewed by: msmith
Obtained from: unfurl & msmith


48145 23-Jun-1999 msmith

Changes in the way that the APs are started appear to have removed the
problem with having more CPUs than NCPU.

PR: kern/4255
Submitted by: peter


48144 23-Jun-1999 luoqi

Do not setup 4M pdir until all APs are up.


48119 22-Jun-1999 msmith

Remove an unnecessary panic when sparse PCI bus numbering is encountered.
This is found eg. on some Compaq Proliant systems.

Submitted by: peter


48008 18-Jun-1999 green

Harmless change to prevent possible problems in the future. I made
sure that i686_mem was only used when
1. CPUID had MTRR set (this was there before)
2. the CPU was GenuineIntel (not there)
3. the CPU is a 686 (also not there)

This should prevent any problems with CPUs that set MTRR but aren't
compatible with Intel's interface (none that I know of yet.)
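
The three conditions combine into a single predicate, sketched here in
standalone C. The function and macro names are hypothetical, and the
value used for the 686 class is an assumption for illustration; only the
MTRR feature bit (bit 12 of the CPUID feature flags) and the
"GenuineIntel" vendor string are standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CPUID_MTRR	0x1000u	/* CPUID feature flags, bit 12 */
#define SKETCH_CPUCLASS_686	6	/* hypothetical class encoding */

/* True only if all three conditions from the commit message hold. */
bool
sketch_use_i686_mem(unsigned int cpu_feature, const char *vendor,
    int cpu_class)
{
	assert(vendor != NULL);
	if ((cpu_feature & SKETCH_CPUID_MTRR) == 0)
		return (false);		/* 1. CPUID must advertise MTRR */
	if (strcmp(vendor, "GenuineIntel") != 0)
		return (false);		/* 2. vendor must be Intel */
	return (cpu_class == SKETCH_CPUCLASS_686);	/* 3. 686 class */
}
```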


48005 18-Jun-1999 bde

Changed the global `idt' from an array to a pointer so that npx.c
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.

Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>


47942 16-Jun-1999 tegge

Clean up bitrot in interrupt tracing code.


47926 15-Jun-1999 des

Kill option FAILSAFE.

PR: i386/12187
Approved by: bde


47892 13-Jun-1999 alc

Use pmap_kenter instead of pmap_enter to map the message buffer.


47862 10-Jun-1999 jlemon

Change variable used for calculating ending address of physical memory
from 'int' to 'vm_offset_t'.

Spotted by: Richard Cownie <tich@ma.ikos.com>


47842 08-Jun-1999 dt

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
kernel virtual address space for UPAGES.


47763 05-Jun-1999 luoqi

Fix an accounting problem when prefaulting 4M pages.

PR: kern/11948


47688 01-Jun-1999 jlemon

Unbreak memory sizing for SMP.


47680 01-Jun-1999 phk

Introduce the makebdev() function, it does the same as the makedev()
function for now, but that will change.


47679 01-Jun-1999 jlemon

Null commit; note that there is a new memory sizing routine that uses
the BIOS calls to determine the memory configuration. This should fix
problems with >64M for good.

Reviewed by: Mike Smith


47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on -current


47642 31-May-1999 dfr

Remove fd driver from its old home and change files which include rtc.h
to account for its new location.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print a message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47625 30-May-1999 phk

This commit should be an extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47611 30-May-1999 dfr

Activate/deactivate resources by calling the method, not through the
resource manager automatic handling of RF_ACTIVE.


47592 29-May-1999 phk

Stop the TSC from being used as timecounter on K5/step0 machines.


47588 28-May-1999 bde

Fixed glitches (jumps) of about 1/HZ seconds for the i8254 timecounter.
The old version only worked right when the time was read strictly
more often than every 1/HZ seconds, but we only guarantee reading
it every (1/HZ + epsilon) seconds. Part of rev.1.126-1.127 attempted
to fix this but didn't succeed. Detect counter rollover using the
heuristic from the old version of microtime() with additional
complications for supporting calls from fast interrupt handlers.
This works provided i8254 interrupts are not delayed by more than
1/(2*HZ) seconds.

This needs more comments, and cleanups for the SMP case, and more
testing of the SMP case before it is merged into RELENG_3.

Tested by: jhay


47575 28-May-1999 alc

pmap_object_init_pt:
The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.


47444 24-May-1999 jb

- Make setroot() conditional on FFS etc, to avoid a compiler warning
on systems with no FFS.
- Remove all references to mfs from cpu_rootconf(). mfs_init is
called prior to cpu_rootconf(), so it can set mountrootfsname to mfs
and (more importantly) set rootdev using the (bogus in Bruce's opinion)
special major number of 255.


47307 18-May-1999 peter

Move pcibus (host -> pci bus) probe/attach routines from nexus
to pcibus.c. pci_cfgopen() becomes static and there are no more
bus #ifdef's in nexus.c.


47292 18-May-1999 alc

pmap_qremove:
Eliminate unnecessary TLB shootdowns.


47226 15-May-1999 peter

Don't hardcode IRQ 13 for NPX. It's as good as hardwired in the hardware
though, on systems (386 mostly) that still have a separate fpu, but it
might be possible to find systems where the FPU coprocessor is wired to
a different IRQ pin.


47101 13-May-1999 bde

Renamed the private copies of strlen and strcpy to gdb_strlen and
gdb_strcpy, respectively. This saves fixing the wrong return type
of the private strlen and makes the addresses of strlen and strcpy
unambiguous.


47081 12-May-1999 luoqi

Unbreak VESA on SMP.


47080 12-May-1999 luoqi

VM86_FRAMESIZE is now the size of vm86 frame, not the number of 4-byte words.

Requested by: Bruce


47048 12-May-1999 phk

Fix dumpon. It passes a udev_t from userland to kernel, that needs a
udev2dev() before we use it.

It really should pass a name like swapon does.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to an actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few housekeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I believe this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
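
To make the "major|minor" bitmap concrete, here is a small sketch of
the u*() helpers operating on a plain integer. It assumes the
traditional layout with the major number in bits 8-15 and the minor
number in the remaining bits; the sketch_ names and the use of
uint32_t are illustrative and not copied from the headers.

```c
#include <stdint.h>

/* Assumed layout: bits 8-15 hold the major, all other bits the minor. */
typedef uint32_t sketch_udev_t;

static unsigned
sketch_umajor(sketch_udev_t d)
{
	return ((d >> 8) & 0xff);
}

static uint32_t
sketch_uminor(sketch_udev_t d)
{
	return (d & 0xffff00ff);
}

static sketch_udev_t
sketch_umakedev(unsigned maj, uint32_t min)
{
	return (((sketch_udev_t)maj << 8) | min);
}
```

A kernel dev_t, by contrast, is meant to be an opaque reference to a
device, which is why the commit provides dev2udev() and udev2dev() to
cross between the two types.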


47024 11-May-1999 luoqi

Yet another place I missed when increasing trapframe size, which causes problems
for SIGFPE handling.

Reviewed by: Bruce Evans <bde@zeta.org.au>


47022 11-May-1999 luoqi

Do not hardcode size of struct vm86frame.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


46917 10-May-1999 dfr

Add missing suspend/resume methods.


46915 10-May-1999 peter

Move the mfs_getimage() prototype to mfs_extern.h instead of duplicating it
everywhere.


46881 10-May-1999 bde

[Forgot to commit this in the batch a few days ago.]

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46847 09-May-1999 peter

For what it's worth, idelayed is declared as a volatile in the headers,
and even though it's not used in this file make it a volatile here too.


46823 09-May-1999 peter

s/main/mi_startup/ for the kernel entry point so that egcs doesn't get
upset about it (and generate things like __main() calls that are reserved
for main()). Renaming was phk's suggestion, but I'd already thought about
it too. (phk liked my suggested name tada() but I decided against it :-)

Reviewed by: phk


46808 09-May-1999 phk

Oops. If ROOTDEVNAME isn't defined, have -r call -a.


46806 09-May-1999 phk

Major lobotomy of config(8). The

config kernel mumble mumble

line has been obsoleted and removed and with it went all knowledge of
devices on the part of config.

You can still configure a root device (which is used if you give
the "-r" flag) but now with an option:

options ROOTDEVNAME=\"da0s2e\"

The string is parsed by the same code as at the "boot -a" prompt.

At the same time, make the "boot -a" prompt both more able and more
informative.

ALPHA/PC98 people: You will have to adapt a few simple changes
(defining rootdev and dumpdev somewhere else) before config works
for you again, sorry, but it's all in the name of progress.


46783 09-May-1999 phk

add some amount of sanity to the way the gdb stuff finds its device.

I'm not too happy about the result either, but at least it has less
chance of backfiring.

This particular feature could be called "a mess" without offending
anybody.


46773 09-May-1999 phk

Duh, bdevsw() takes dev_t arg.


46743 08-May-1999 dfr

Move the declaration of the interrupt type from the driver structure
to the BUS_SETUP_INTR call.


46737 08-May-1999 peter

Add some notes about the globalness of certain things like interrupts
and ISA DMA channels (ie: on most PCI systems, they are not.. they are
on the ISA side of the PCI-ISA bridge and could be duplicated if there
were multiple PCI-ISA bridges, say in a laptop docking station), while
the APIC resources would be global on SMP systems.
Also, revert a previous change, change some printfs back to panics.


46728 08-May-1999 peter

Don't print 'interrupting at irq nn' on the x86 family, it's not all
that big a deal just yet and isn't worth a whole line on the boot screen.
This could change later in the face of multi-ISA-bus (eg: laptop docking
stations with two independent ISA busses) and SMP/APIC systems. The Alpha
already has multiple interrupt destinations to deal with.


46717 08-May-1999 peter

Fix unused variable "flags". (only used if #ifdef I586_CPU)


46703 08-May-1999 peter

Make sure the mem_range_AP_init() prototype is seen where it's needed, and
#ifdef SMP around it for fun.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46624 07-May-1999 mckusick

Generalize to allow any serial port to be used as the GDB port.
Mark the GDB port in the config file with flags 0x80. Currently
only the sio driver checks these flags and sets up a GDB port,
but adding similar code to other serial drivers would be easy.
For backward compatibility, if an sio port is marked as the console
and no port is marked as the gdb port, the GDB port will be mapped
to the console port. This hack should go away at some point.


46600 06-May-1999 peter

Ensure prototype for pnp_configure() is visible.


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


46555 06-May-1999 peter

I'm not sure why the #ifdef SMP became #if 1 (this overrode the npx probe
and always succeeded as is required on SMP). Anyway, reverting this
still compiles and appears ok.


46548 06-May-1999 bde

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46539 06-May-1999 luoqi

Initialize dblfault_tss.tss_fs to the per-cpu private data segment selector.


46537 06-May-1999 luoqi

Do not set curproc until proc0 is fully initialized (in proc0_init()).


46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


46357 03-May-1999 peter

Don't deref a NULL mem_range_softc.mr_op pointer on non-MTRR systems when
starting the AP.


46245 02-May-1999 msmith

Whoops, not all SMP systems have memory range attribute support. Don't
try to set it up on an AP unless we do.

Submitted by: dave adkins <adkin003@tc.umn.edu>


46215 30-Apr-1999 msmith

Add a hook that can be called to initialise a slave processor's memory
range attributes after they have been extracted from the master.

Hook up the i686 MP code to do this for each AP.

Be more careful about printing the default memory type for the i686.

Suggestions from: luoqi


46129 28-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fast_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


46089 26-Apr-1999 peter

Register the netisr's via SYSINIT rather than linker sets.


46072 25-Apr-1999 alc

pmap_dispose_proc and pmap_copy_page:
Conditionally compile 386-specific code.

pmap_enter:
Eliminate unnecessary TLB shootdowns.

pmap_zero_page and pmap_zero_page_area:
Use invltlb_1pg instead of duplicating the code.


46054 25-Apr-1999 phk

Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by: bde, jhay


45980 24-Apr-1999 kato

1MB is not 1024 * 1024 * 1024 but 1024 * 1024.


45960 23-Apr-1999 dt

Make pmap_collect() an official pmap interface.


45900 21-Apr-1999 peter

oops, SMP was missing includes for a typedef.


45897 21-Apr-1999 peter

Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager. There is further room for optimizing
the interfaces still. All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.


45833 19-Apr-1999 alc

_pmap_unwire_pte_hold and pmap_remove_page:
Use pmap_TLB_invalidate instead of invltlb_1pg to eliminate
unnecessary IPIs.

pmap_remove, pmap_protect and pmap_remove_pages:
Use pmap_TLB_invalidate_all instead of invltlb to eliminate
unnecessary IPIs.

pmap_copy:
Use cpu_invltlb instead of invltlb when updating APTDpde.

pmap_changebit:
Rather than deleting the unused "set bit" option (which may be
useful later), make pmap_changebit an inline that is used
by the new pmap_clearbit procedure.

Collectively, the first three changes reduce the number of TLB shootdown
IPIs by 1/3 for a kernel compile.


45821 19-Apr-1999 peter

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


45808 19-Apr-1999 peter

Always create attach points for the various child busses that can be
attached to the nexus. With one exception, this (for example) allows
you to do weird things like kldload the eisa bus on the fly and then
drivers, and have it auto probe the eisa bus when the drivers come online.

The one exception being pci, it only adds the pcib after the presence of
the pci bus is detected and that's #if'ed code.

A side effect of this is that isa and eisa will be attached to the nexus
directly rather than the PCI->ISA or PCI->EISA bridges. I'm not sure if
this is good or bad at this point, but it seems to be closer to the way
things are for the i386 family... This is likely to be followed up.

This also fixes compilation without a PCI bus configured and will allow
eisa to work without PCI too.


45791 18-Apr-1999 peter

Implement an EISA new-bus framework. The old driver probe mechanism
had a quirk that made a shim rather hard to implement properly and it was
just easier to convert the drivers in one go. The changes to the
buslogic driver go beyond just this - the whole driver was new-bus'ed
including pci and isa. I have only tested the EISA part of this so far.

Submitted by: Doug Rabson <dfr@nlsystems.com>


45779 18-Apr-1999 kato

Added PC98 code.

Submitted by: Takahashi Yoshihiro <nyan@wyvern.cc.kogakuin.ac.jp>


45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatibility
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


45703 15-Apr-1999 bde

Made booting with -a work for all configurations. Previously it
only worked for configurations with "swap on generic".

usr.sbin/config/config.y:
- ignore all "swap [on] device ..." specifications except for
warning about them. They haven't done anything related to swap
for almost 4 years, and were previously silently ignored,
except for "swap on generic" which stopped swap${KERNEL}.c
from being generated. Code to support swapping is now deader
than before.

usr.sbin/config/mkswapconf.c:
- don't generate a dummy setconf() function in swap${KERNEL}.c.

sys/i386/conf/files.i386:
- swapgeneric.c is now standard. It should be merged into autoconf.c
so that it doesn't conflict with swap${KERNEL}.c for kernels named
"generic".

sys/i386/i386/autoconf.c:
- don't call setroot() for mfs roots. Since setroot() doesn't do anything
harmful, this was just a waste of time, except possibly for booting with
-a it may have helped prevent an undesirable call to setconf() by
finding a bogus rootdev.
- honor -a for ffs roots. -a now overrides all other ways of specifying
the root device. Previously, -r had precedence over -a, and the -a
handling was usually a no-op.
- don't honor -a for non-ffs roots, since it would currently just get in
the way of a clean panic.

sys/i386/i386/swapgeneric.c:
- don't declare things that are now always declared in swap${KERNEL}.c.
Don't decide things that are now decided in autoconf.c. Code to
support the "generic" case is now dead instead of useless.


45676 14-Apr-1999 bde

Generate intrnames[] dynamically. This should be new-bus friendly.

Old version reviewed by: se


45643 13-Apr-1999 tegge

Backout early start of APs since it caused some machines to hang.


45566 11-Apr-1999 tegge

Add prototype for wait_ap().


45562 10-Apr-1999 tegge

Let BSP wait until all APs are initialized.


45555 10-Apr-1999 tegge

Test CF after a btrl operation instead of testing ZF (which is undefined).
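
For readers unfamiliar with the instruction, the following sketch
models in C what btrl does to its operand and which result the caller
should branch on. The helper name sketch_btr is hypothetical; the
return value stands in for the carry flag, which receives the old
value of the tested bit.

```c
#include <stdint.h>

/*
 * Model of "bit test and reset": report the previous value of the bit
 * (what the CPU leaves in CF) and clear that bit in the word.
 */
static int
sketch_btr(uint32_t *word, unsigned bit)
{
	uint32_t mask = (uint32_t)1 << (bit & 31);
	int carry = (*word & mask) != 0;

	*word &= ~mask;
	return (carry);
}
```

Code that follows btrl with a ZF-based branch such as je is therefore
testing a flag the instruction does not define, whereas jc/jnc reflect
the old bit value.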


45524 10-Apr-1999 alc

pmap_remove_pte:
Use "loadandclear" to update the pte.

pmap_changebit and pmap_ts_referenced:
Switch to pmap_TLB_invalidate from invltlb.


45436 07-Apr-1999 peter

Disable the mtrr copy calls, it doesn't work with the i686_mem.c stuff.
This should make it compile/link again.


45405 07-Apr-1999 msmith

mem.c
Split out ioctl handler a little more cleanly, add memory
range attribute handling for both kernel and user-space
consumers.

pmap.c
Remove obsolete P6 MTRR-related code.

i686_mem.c
Map generic memory-range attribute interface to the P6 MTRR
model.


45370 06-Apr-1999 alc

Two changes to pmap_remove_all:

1. Switch to pmap_TLB_invalidate from invltlb, eliminating a full TLB
flush where a single-page flush suffices. (Also, this eliminates some
unnecessary IPIs.)

2. Use "loadandclear" to update the pte, eliminating a race condition
on SMPs.
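
The race in change 2 comes from reading a pte and then clearing it as
two separate steps: another processor can set status bits in between,
and those bits are lost. A rough user-space analogue of an atomic
"load and clear", using C11 atomics rather than the kernel's own
primitive, might look like this (sketch_pte_t and sketch_loadandclear
are hypothetical names):

```c
#include <stdatomic.h>
#include <stdint.h>

typedef _Atomic uint32_t sketch_pte_t;

/*
 * Atomically fetch the old entry and store zero in one step, so no
 * update by another CPU can land between the read and the clear.
 */
static uint32_t
sketch_loadandclear(sketch_pte_t *pte)
{
	return (atomic_exchange(pte, 0));
}
```

The caller then inspects the returned value for the modified and
referenced bits instead of re-reading the now cleared entry.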

Change #2 should be committed to -STABLE.


45347 05-Apr-1999 julian

Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


45270 03-Apr-1999 jdp

Restore support for executing BSD/OS binaries on the i386 by passing
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.

Submitted by: Thomas Stephens <tas@stephens.org>
Reviewed by: jdp


45252 02-Apr-1999 alc

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)
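
A minimal sketch of such a per-pmap processor mask follows. Only the
field name pm_active comes from the commit message; the struct, the
helper names, and the "active elsewhere" test are assumptions about
how a mask like this could be used to decide whether other CPUs need
to be notified.

```c
#include <stdint.h>

struct sketch_pmap {
	uint32_t pm_active;	/* bit N set: CPU N has this pmap activated */
};

static void
sketch_pmap_activate(struct sketch_pmap *pm, unsigned cpu)
{
	pm->pm_active |= (uint32_t)1 << cpu;
}

static void
sketch_pmap_deactivate(struct sketch_pmap *pm, unsigned cpu)
{
	pm->pm_active &= ~((uint32_t)1 << cpu);
}

/* Nonzero if some CPU other than `self` currently has the pmap active. */
static int
sketch_pmap_active_elsewhere(const struct sketch_pmap *pm, unsigned self)
{
	return ((pm->pm_active & ~((uint32_t)1 << self)) != 0);
}
```

On a uniprocessor only bit 0 is ever used, which is the "simple
Boolean" case mentioned above.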

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>


45140 30-Mar-1999 phk

Purging lint from the Bruce filter.


45100 28-Mar-1999 dt

Ifdef declaration of a conditionally defined function "timezero".


44924 21-Mar-1999 phk

Link the bb structures together as we find them.


44917 20-Mar-1999 alc

Eliminate a pointless TLB flush from the SMP idle loop.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>


44844 18-Mar-1999 jlemon

Change btrl/btsl to cmpl/movl, since each cpu now has their own copy
of private_tss, and there's no need to use a bit array. Also fixes
the problem of using `je' after btrl, since cmpl sets ZF.

Noticed by: Luoqi, on -current


44807 16-Mar-1999 msmith

Look for the right ACPI table signature.

PR: i386/10587
Submitted by: Takanori Watanabe <takawata@shidahara1.planet.sci.kobe-u.ac.jp>


44704 13-Mar-1999 alc

pmap_qenter/pmap_qremove:
Use the pmap_kenter/pmap_kremove inline functions
instead of duplicating them.

pmap_remove_all:
Eliminate an unused (but initialized) variable.

pmap_ts_reference:
Change the implementation. The new implementation is much smaller
and simpler, but functionally identical. (Reviewed by
"John S. Dyson" <dyson@iquest.net>.)


44645 10-Mar-1999 roberto

Fix two tests against hex. values for CPUID.

PR: i386/10050
Submitted by: Kevin Day <toasty@dragondata.com>


44611 09-Mar-1999 phk

Make TIMER_FREQ a normal, undocumented option. Raise confusion to
a higher level with example in LINT.

Clarify comment about PPS_SYNC. Ignore for now that it doesn't
work in FLL mode, it will in a few days.


44510 06-Mar-1999 wollman

Expose a slightly-lower-level interface to timeouts which allows callers
to manage their own memory. Tested on my machine (make buildworld).
I've made analogous changes on the alpha, but don't have a machine
to test.

Not-objected-to by: dg, gibbs


44487 05-Mar-1999 bde

The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresent
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.


44470 05-Mar-1999 alc

Fix an SMP-only TLB invalidation bug. Specifically, disable
a TLB invalidation optimization that won't work given the
limitations of our current SMP support.

This patch should be applied to -stable ASAP.

Thanks to John Capo <jc@irbs.com>,
Steve Kargl <sgk@troutmask.apl.washington.edu>, and
Chuck Robey <chuckr@mat.net>
for testing.


44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


44289 26-Feb-1999 tegge

Don't call assign_apic_irq with a value for irq that is out of range.


44256 25-Feb-1999 bde

Don't forget to update `switchticks' in corner cases (except for
the alpha fork_trampoline(), forget it because I believe it is
only necessary for the unsupported SMP case).


44215 22-Feb-1999 bde

Added a per-cpu variable `switchticks' for use in scheduling.


44168 20-Feb-1999 roberto

Bit 24 of the Feature Flag is FXSR (for Fast FP Save and Restore).

Reminded by: Francis Dupont <Francis.Dupont@inria.fr>


44157 19-Feb-1999 luoqi

Introduce machine-dependent macro pgtok() to convert page count to number
of kilobytes. Its definition for each architecture could be optimized to
avoid potential numerical overflows.
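
The overflow concern can be shown with 32-bit arithmetic. The sketch
below assumes 4096-byte pages and uses hypothetical sketch_ names; it
contrasts multiplying by the page size first with multiplying by the
pre-divided factor.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this example */

/* Naive form: the intermediate byte count can wrap in 32 bits. */
static uint32_t
sketch_pgtok_naive(uint32_t pages)
{
	return ((uint32_t)(pages * SKETCH_PAGE_SIZE) / 1024u);
}

/* Safer form: scale by kilobytes-per-page, no large intermediate. */
static uint32_t
sketch_pgtok(uint32_t pages)
{
	return (pages * (SKETCH_PAGE_SIZE / 1024u));
}
```

For 1048576 pages (4 GB) the naive version wraps to zero, while the
second form returns 4194304 kilobytes.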


44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillon <dillon@apollo.backplane.com>


44078 16-Feb-1999 dfr

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


43970 13-Feb-1999 bde

Don't pass PSL_NT to vm86 signal handlers. Some vm86/real mode
programs, including msdos, set PSL_NT in probes for old cpu types,
although PSL_NT doesn't do anything useful in vm86 or real mode.
PSL_NT is even less useful in the signal handlers. It just causes
T_TSSFLT faults on return from syscalls made by the handlers.
These faults are fixed up lazily so that Xsyscall() doesn't have
to be slowed down to prevent them. The fault handler recently
started complaining about these faults occurring "with interrupts
disabled". It should not have, but the complaints pointed to this
bug.

PR: 9211


43887 11-Feb-1999 msmith

Zero p->retval[1] when starting a process. This value ends up in %edx
when the process starts, and having it nonzero causes statically-linked
Linux binaries to fail.

PR: i386/10015
Submitted by: Marcel Moolenaar <marcel@scc.nl>


43758 08-Feb-1999 dillon

Adjust idle zero-page fill hysteresis based on tests. Use 2/3 and 4/5
zero-fill levels.

Adjust comment for ozfod in vmmeter.h - this counter represents
non-optimal ( on the fly ) zero fills, not prefills.


43752 08-Feb-1999 dillon

Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE. There is little operational difference other than the kernel
being a few kilobytes smaller and the code being more readable.

* vm_page_select_free() has been *greatly* simplified.
* The PQ_ZERO page queue and supporting structures have been removed
* vm_page_zero_idle() revamped (see below)

PG_ZERO setting and clearing has been migrated from vm_page_alloc()
to vm_page_free[_zero]() and will eventually be guaranteed to remain
tracked throughout a page's life ( if it isn't already ).

When a page is freed, PG_ZERO pages are appended to the appropriate
tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
When locating a new free page, PG_ZERO selection operates from within
vm_page_list_find() ( get page from end of queue instead of beginning
of queue ) and then only occurs in the nominal critical path case. If
the nominal case misses, both normal and zero-page allocation devolves
into the same _vm_page_list_find() select code without any specific
zero-page optimizations.

Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been
added and zero-page tracking adjusted to conform with the other changes.
Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
pages. We may wish to increase both parameters as time permits. The
hysteresis is designed to avoid silly zeroing in borderline allocation/free
situations.
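
The hysteresis can be pictured as a two-state switch: start zeroing
when the zeroed-page count falls below the low mark, and keep going
until it reaches the high mark. The sketch below uses the 1/3 and 1/2
fractions quoted above; the state struct and function name are
hypothetical and this is not the committed vm_page_zero_idle() code.

```c
struct sketch_zero_state {
	int zeroing;	/* 1 while the idle loop should keep zeroing */
};

/* Returns 1 if the idle loop should zero a page on this pass. */
static int
sketch_should_zero(struct sketch_zero_state *st, int zero_count,
    int free_count)
{
	if (st->zeroing) {
		/* Stop once the high mark (1/2 of free pages) is reached. */
		if (zero_count >= free_count / 2)
			st->zeroing = 0;
	} else {
		/* Start only after dropping below the low mark (1/3). */
		if (zero_count < free_count / 3)
			st->zeroing = 1;
	}
	return (st->zeroing);
}
```

With 300 free pages, a zeroed count of 120 yields "do not zero" when
approached from above and "keep zeroing" when approached from below,
which is what avoids flapping near a single threshold.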


43612 04-Feb-1999 kato

Recognize Pentium II Xeon, Celeron and Pentium III cpus. Because CPU
names are printed on their packages and shown by BIOS, kernel does not
need to show details.

PR: 8751, 9320 and 9463


43564 03-Feb-1999 dg

Fixed the type of target_page to vm_offset_t (unsigned). This fixes a
panic during boot on machines with >=2GB of RAM. Also changed some
incorrect printf conversion specifiers from %d to %u (signed to unsigned).
This fixes bugs when printing the amount of memory on machines with >=2GB
of RAM.
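
The printf part of this fix is easy to reproduce outside the kernel:
a byte count of 2 GB or more does not fit in a signed 32-bit int, so
it has to be passed and printed as unsigned. A small sketch, with a
hypothetical helper name and message format:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a memory size using unsigned conversions. With %d, a value of
 * 2147483648 would be interpreted as a negative number.
 */
static int
sketch_format_mem(char *buf, size_t len, uint32_t bytes)
{
	return (snprintf(buf, len, "real memory = %u (%uK bytes)",
	    (unsigned)bytes, (unsigned)(bytes / 1024)));
}
```

The same reasoning applies to variables such as target_page: once
physical addresses can exceed 2 GB, a signed type silently turns them
negative in comparisons as well as in output.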


43530 02-Feb-1999 bde

Check for signals while reading /dev/urandom. Reading 10MB from
/dev/urandom takes about 38 seconds on a P5/133. It is useful
to be able to kill such reads almost immediately. Processes
doing such reads are now scheduled so their denial of service
is no worse than that of processes looping in user mode.


43447 31-Jan-1999 kato

Use offset to _pc98_system_parameter instead of immediate value which
assumes KERNBASE=0x100000.


43434 30-Jan-1999 kato

Moved pc98_system_parameter from .text to .data to make ELF kernel
work.


43403 29-Jan-1999 dillon

More const fixes for -Wall, -Wcast-qual


43387 29-Jan-1999 dillon

More -Wall / -Wcast-qual cleanup. Also, EXEC_SET can't use
C_DECLARE_MODULE due to the linker_file_sysinit() function
making modifications to the data.


43340 28-Jan-1999 newton

Sun Bug ID 1251858 (on http://sunsolve1.sun.com) discusses the way that
Sun implemented iBCS2 compatibility on Solaris >= 2.6: The emulator
runs in user-mode, patching the LDT so that client programs making
syscalls through the old iBCS2 call gate get handled by the emulator
process. Unemulated syscalls therefore need their own call-gate that
bypasses the emulator. Sun chose LDT entry 4 to implement this, which
is what we've been using as LUDATA_SEL, so we need to change LUDATA_SEL
if we want to run Solaris executables.

Discussed with: Mike Smith


43314 28-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43309 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

This commit includes significant work to properly handle const arguments
for the DDB symbol routines.


43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42880 20-Jan-1999 jkh

Make more messages conditional on bootverbose


42817 19-Jan-1999 peter

Break configure() into a couple of stages to allow insertion of
hooks (eg: by drivers or (pre)loadable modules) into a convenient spot.


42732 16-Jan-1999 kato

There are two models of AMD K6-2 Model 8 (c.f. AMD's document), so the
CPU stepping must be checked. Also, fixed print_AMD_info.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


42705 15-Jan-1999 msmith

Fetch an override for NMBCLUSTERS from the kernel environment. Never allow
the value to be reduced below that defined when the kernel was built.


42543 12-Jan-1999 eivind

Silence warnings.


42542 12-Jan-1999 eivind

Silence warnings by removing unused convenience function and
globalizing debugging functions.


42448 09-Jan-1999 dt

Oops --<, replace 1.216 with a version that actually checks pv_entries (and
was tested for a month or two in production).

Noticed by: Stephen McKay

Stephen also suggested removing the complication altogether. I am not doing
that, as it would mean backing out a large part of 1.190 (from 1998/03/16)...


42440 09-Jan-1999 bde

Removed a stray label that broke compiling in the (elf && profiling) case.

PR: 9369
Submitted by: Assar Westerlund <assar@sics.se>


42437 09-Jan-1999 bde

Fixed switching between consoles (sc0, vt0 or sioN) in userconfig.

Broken in: rev.1.315


42428 09-Jan-1999 bde

Don't put operands in clobber lists, since this is dubious for old
versions of gcc and broken for current versions of egcs.

Cleaned up the asm statement for do_cpuid() a little.

Submitted by: "John S. Dyson" <dyson@iquest.net> but rewritten by me


42406 08-Jan-1999 bde

Moved declarations related to copying and zeroing to the right place.


42396 08-Jan-1999 luoqi

Allocate kernel page table object (kptobj) before any kmem_alloc calls.
On a system with a large amount of ram (e.g. 2G), allocation of per-page
data structures (512K physical pages) could easily bust the initial kernel
page table (36M), and growth of kernel page table requires kptobj.


42381 07-Jan-1999 dt

Make pmap_ts_referenced check more than 1 pv_entry. (One should be careful
when moving elements to the tail of a list in a loop...)


42373 07-Jan-1999 yokota

Remove a hard-coded table of kernel console I/O functions exported
from sc, vt and sio drivers. Use instead a linker_set to collect them.

Staticize ??cngetc(), ??cnputc(), etc functions in sc and vt drivers.
We must still have siocngetc() and siocnputc() as globals because they
are directly referred to by i386-gdbstub.c :-(

Oked by: bde


42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


42135 28-Dec-1998 msmith

Improved DDB_UNATTENDED behaviour. From the submitter:

There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:

1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global

Submitted by: Brian Feldman <green@unixhelp.org>


42112 27-Dec-1998 msmith

From the submitter:

CPU_WT_ALLOC does not work correctly for K6-2s of model 8+ and
probably K6-3s (when they appear on the market soon). In addition,
print_AMD_info() incorrectly printfs write allocation's size. I've
fixed them, so they now Do The Right Thing, and added a
"NO_MEMORY_HOLE" option to easily allow 15-16mb range handling for us
K6 and K6-2 users.

Submitted by: Brian Feldman <green@unixhelp.org>


41871 16-Dec-1998 bde

Removed the cast to a pointer in the definition of PS_STRINGS and
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.


41868 16-Dec-1998 bde

Removed bogus casts of USRSTACK and/or the other operand in binary
expressions involving USRSTACK.


41797 14-Dec-1998 bde

Moved the declaration of another non-SMP variable into the non-SMP section.


41794 14-Dec-1998 bde

Ifdefed the declarations of conditionally used variables.


41787 14-Dec-1998 mckay

Fix tabs that should have been spaces. Some were in kernel error messages.


41770 14-Dec-1998 dillon

Get rid of uninitialized variable warnings. No bugs found, just
preinitializing some locals to 0 to get rid of the compiler warnings.


41764 14-Dec-1998 dillon

author was assuming that nextpaddr declared *inside* the do loop would
survive within the loop. This is not guaranteed by C. I have moved
the nextpaddr declaration to outside the do loop.
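
A minimal sketch of the pattern described above, not the code from this
commit. The function, its arguments and SK_PAGE_SIZE are hypothetical; only
the placement of nextpaddr reflects the fix. An object declared inside the
loop body is created afresh on every iteration, so a value stored in it
cannot be relied on in the next pass.

```c
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL	/* hypothetical page size for the sketch */

/*
 * Count runs of physically contiguous pages in an array of page
 * addresses. nextpaddr must carry a value from one iteration to the
 * next, so it is declared outside the do loop.
 */
size_t
sk_count_contiguous_runs(const unsigned long *paddr, size_t n)
{
	unsigned long nextpaddr = 0;	/* lives across all iterations */
	size_t i = 0, runs = 0;

	if (n == 0)
		return (0);
	do {
		/*
		 * Had nextpaddr been declared here, it would be a new
		 * object with an indeterminate value on each pass.
		 */
		if (i == 0 || paddr[i] != nextpaddr)
			runs++;
		nextpaddr = paddr[i] + SK_PAGE_SIZE;
		i++;
	} while (i < n);
	return (runs);
}
```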


41763 14-Dec-1998 dillon

Change local ddb_mode variable to volatile to handle GCC warning about
the variable possibly being clobbered by setjmp/longjmp.
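
The warning can be illustrated with a small sketch, assuming nothing about
the actual ddb code; every name below is hypothetical. In standard C, a
non-volatile automatic variable that is modified between setjmp() and
longjmp() has an indeterminate value after the jump, which is why the local
is declared volatile.

```c
#include <setjmp.h>

static jmp_buf sk_jb;		/* hypothetical jump buffer */

static void
sk_fail(void)
{
	longjmp(sk_jb, 1);	/* unwind back to the setjmp() call */
}

int
sk_run_with_mode(int new_mode)
{
	/* volatile: the stored value is still defined after longjmp() */
	volatile int mode = 0;

	if (setjmp(sk_jb) == 0) {
		mode = new_mode;	/* modified after setjmp() ... */
		sk_fail();		/* ... and before longjmp() */
	}
	return (mode);
}
```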


41629 10-Dec-1998 steve

Clean up the wording for the F00F bug workaround message.

PR: 8041
Submitted by: Dan Nelson <dnelson@emsphone.com>


41599 08-Dec-1998 kato

Use CNAME macro for pc98_system_parameter, which is referenced from C
source.

Submitted by: Masanori Kanaoka <kana@saijo.mke.mei.co.jp>


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41547 06-Dec-1998 archie

Avoid compiler warning (printf arg type mismatch) when compiling #ifdef DEBUG


41541 05-Dec-1998 kato

Print out information for write-allocate of AMD CPUs.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


41454 02-Dec-1998 kato

- For some old Cyrix CPUs, %cr2 is clobbered by interrupts. This
problem is worked around by using an interrupt gate for the page
fault handler. This code was originally made for NetBSD/pc98 by
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already
been in PC98 tree. Because of this bug, trap_fatal cannot show
correct page fault address if %cr2 is obtained in this function.
Therefore, trap_fatal uses the value from trap() function.
- The trap handler always enables interruption when buggy application
or kernel code has disabled interrupts and then trapped. This code
was prepared by Bruce Evans <bde@FreeBSD.org>.

Submitted by: Bruce Evans <bde@FreeBSD.org>
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>


41370 27-Nov-1998 tegge

Don't forget to update the pmap associated with aio daemons when adding
new page directory entries for a growing kernel virtual address space.


41367 26-Nov-1998 tegge

Attempt to handle interrupts delivered to all IO APICs by using the first
IO APIC with a sufficient number of pins.


41362 26-Nov-1998 eivind

Staticize.


41318 24-Nov-1998 eivind

Move the declaration of PPro_vmtrr from the header file to pmap.c,
replacing the one in the header file with a definition. This makes it
easier to work with tools that grok ANSI C only.


41004 08-Nov-1998 dfr

* Fix a couple of places in the device pager where an address was
truncated to 32 bits.
* Change the calling convention of the device mmap entry point to
pass a vm_offset_t instead of an int for the offset allowing
devices with a larger memory map than (1<<32) to be supported
on the alpha (/dev/mem is one such).

These changes are required to allow the X server to mmap the various
I/O regions used for device port and memory access on the alpha.


40998 08-Nov-1998 msmith

Enable 686 class optimisations for all 686-class processors, not just the
Pentium Pro. This resolves the "Dog slow SMP" issue for Pentium II
systems.


40866 03-Nov-1998 msmith

Remove the USERCONFIG_BOOT option. Userconfig script data is searched
for in a loaded module of type "userconfig_script". The RB_CONFIG
flag will always result in the user being left inside userconfig at
the end of the script's execution, regardless of 'quit' commands in
the script. If the RB_CONFIG flag is not specified, the user will
never be left inside userconfig, even if the script does not have an
explicit exit command.

Add the INTRO_USERCONFIG option. This option forces the userconfig 'intro'
screen (after a script has optionally been executed). There is no longer
a need to queue an 'intro' command.


40794 31-Oct-1998 peter

Add John Dyson's SYSCTL descriptions, and an export of more stats to
a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present
in source, they do not get compiled into the binaries taking up memory.


40751 30-Oct-1998 msmith

Add the ability to specify where on the at_shutdown queue a handler is
installed.

Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).

Submitted by: Some ideas from Bruce Walter <walter@fortean.com>


40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


40658 26-Oct-1998 bde

Check the major number of the boot device more carefully. There was only
a problem if the boot blocks passed bad data.

Check the major number of the dump device consistently.


40610 23-Oct-1998 phk

Update timecounters to new interface.


40565 22-Oct-1998 bde

Initialize isa_devtab entries for interrupt handlers in individual
device drivers, not in ioconf.c. Use a different hack in isa_device.h
so that a new config(8) is not required yet.

pc98 parts approved by: kato


40545 21-Oct-1998 dg

Decrement the now unused page table page's wire_count prior to freeing it.
It will soon be required that pages have a zero wire_count when being
freed.


40435 16-Oct-1998 peter

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a separate commit as samples.
This all still works as a static a.out kernel using LKM's.


40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.
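
The second class of bug can be sketched as follows. These macros are
illustrative stand-ins, not the definitions changed by this commit, and the
sk_ names are assumptions. A cast to a 32-bit type inside the macro discards
the upper half of a 64-bit offset, whereas a cast-free macro lets the result
keep the width of its argument (at the price of casts at pointer callers).

```c
#include <stdint.h>

#define SK_PAGE_SIZE	4096
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)

/* Bogus form: the cast narrows the argument to 32 bits before masking. */
#define sk_trunc_page_bogus(x)	((uint32_t)(x) & ~(uint32_t)SK_PAGE_MASK)

/*
 * Cast-free forms: ~SK_PAGE_MASK is a negative int, so it is converted
 * to the (wider) type of x with all high bits set.
 */
#define sk_trunc_page(x)	((x) & ~SK_PAGE_MASK)
#define sk_round_page(x)	(((x) + SK_PAGE_MASK) & ~SK_PAGE_MASK)

/* Example caller working on a 64-bit byte offset. */
uint64_t
sk_pages_spanned(uint64_t start, uint64_t len)
{
	uint64_t first = sk_trunc_page(start);
	uint64_t last = sk_round_page(start + len);

	return ((last - first) / SK_PAGE_SIZE);
}
```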


40179 10-Oct-1998 kato

mp_machdep.c: Set a vector to boot code (PC-98).
locore.s: Tell the bios to warmboot next time (PC-98).


40173 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40169 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40164 10-Oct-1998 jkh

Allow more flexible use of MFS root.
Submitted by: peter


40152 09-Oct-1998 peter

Relocate the preload module info from machdep specifically rather than
trying to do it in locore. We also walk through the module table
and relocate any MODINFO_ADDR pointers so that they become KVM relative
rather than physical addresses. This means that hacks for adding
0xf0000000 in places like MFS go away.


40130 09-Oct-1998 peter

Null commit.. CVS aborted on freefall last time (readonly file).

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40129 09-Oct-1998 peter

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40089 09-Oct-1998 msmith

Initialise kernel environment and module metadata pointers.


40081 08-Oct-1998 msmith

Fix up the kernel environment and module data pointers in the bootinfo if
they are present.
If we are told where the end of the loaded kernel image is, believe it.


40067 08-Oct-1998 kato

BIOS ROM base address is 0xe8000 on PC-98.


40029 07-Oct-1998 gibbs

Fix a parent tag reference count bug during tag teardown.

Enable optimization for nobounce_dmamap clients by setting the map
held by the client to NULL. This allows the macros in bus.h to check
against a constant to avoid function calls.

Don't attempt to 'free()' contigmalloced pages in bus_dmamem_free().
We currently leak these pages, which is not ideal, but is better than
a panic. The leak will be fixed when contigmalloc is merged into the
bus dma framework after 3.0R.


40003 06-Oct-1998 kato

- Implement enabling write allocate on AMD K5/K6/K6-2 cpus.
The code was originally contributed by Kelly Yancey
<kbyanc@freedomnet.com> in PR i386/6269 and revised by Akio Morita
<amorita@meadow.scphys.kyoto-u.ac.jp> and me. Test was performed by
Akio Morita and Toshiomi Moriki <moriki@db.is.kyushu-u.ac.jp>.
- Fix stylistic bug in identcpu.c.
- Update copyright in initcpu.c
- Fix typo in LINT.

PR: 6269 and 6270


39982 05-Oct-1998 obrien

Undo most of the previous commit.


39976 05-Oct-1998 obrien

Now require *FS_ROOT to enable the ability to mount a *FS /.
Previously one could config(8) a kernel that would not link.


39760 29-Sep-1998 abial

Add sysctl 'machdep.msgbuf_clear'. Setting it to anything causes the
kernel message buffer to be cleared. It comes handy in situations when
the only logging facility you have is the msgbuf.

Reviewed by: jkh


39755 29-Sep-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just
to ensure 32-bit variables. Doing so broke ix86's with 64-bit longs.


39703 28-Sep-1998 tegge

Initialize pcb_mpnest to 1 in the child process in cpu_fork(). This should
fix the 50% idle problem that the ELF /sbin/init triggered. The problem
appeared when the last context switch before a fork() call was due to
the kernel faulting in user pages via normal page faults (e.g. copyin).
Reviewed by: Peter Wemm <peter@netplex.com.au>


39702 28-Sep-1998 tegge

Use correct virtual address when configuring the per CPU idle page directory
for a vm86 call under SMP.


39648 25-Sep-1998 peter

Goodbye BOUNCE_BUFFERS, for a hack it has served us well.

The last consumer of this code (the old SCSI system) has left us and
the CAM code does its own bouncing. The isa dma system has been
doing its own bouncing for a while too.

Reviewed by: core


39613 24-Sep-1998 bde

Don't redefine kernel. Makefile.i386 now defines it.
Removed some unused includes.


39526 20-Sep-1998 bde

Attempt to work around a bug in the previous commit related to
non-reentrancy of SMP clock locking. Depend on the giant lock
protecting clkintr().


39503 20-Sep-1998 bde

Ensure that the i8254 timecounter doesn't go backwards. It sometimes
went backwards when interrupts were masked for more than one i8254
interrupt period. It sometimes went backwards when the i8254 counter
was reprogrammed. Neither of these should happen in normal operation.

Update the i8254 timecounter support variables atomically. Calling
timecounter functions from fast interrupt handlers may actually work
in all cases now.


39243 15-Sep-1998 gibbs

autoconf.c:
Convert autoconf hooks from old SCSI system to CAM.

busdma_machdep.c:
bus_dmamap_free() should expect the nobounce map, not a NULL one.

mountroot.c:
swapgeneric.c:
da and od changes.

symbols.raw:
Nuke the old disk stat symbols.

userconfig.c:
Disable the SCSI listing code until it can be converted to CAM.


39197 14-Sep-1998 jdp

Add new functions fill_fpregs() and set_fpregs(), like fill_regs()
and set_regs() but for the floating point register state. The code
is stolen from procfs_machdep.c, and moved out of there into
machdep.c.

These functions are needed for generating ELF core dumps.


39187 14-Sep-1998 sos

Remove the SLICE code.
This clearly needs a lot more thought, and we don't need this to hunt
us down in 3.0-RELEASE.


39176 14-Sep-1998 abial

This implements retrieving the contents of message buffer via sysctl(3)
as "machdep.msgbuf". It's needed in case of using stripped kernels, where
normal dmesg (which has to use kvm) doesn't work.

The buffer is unwound, meaning that the data will be linear, possibly
with some leading NULLs.

Reviewed by: Jordan K. Hubbard <jkh@freebsd.org>


38893 06-Sep-1998 tegge

Don't go below the low water mark of free pages due to optional prefaulting
of pages.
PR: 2431


38888 06-Sep-1998 tegge

Maintain a mapping from irq number to (ioapic number, int pin) tuple,
and use this when masking/unmasking interrupts.

Maintain a mapping from (ioapic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.

Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.

Don't let an AP enter _cpu_switch before all local apics are initialized.
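
The two mappings described above might look roughly like this. It is a
sketch under assumed table sizes and hypothetical sk_ names, not the data
structures used in mp_machdep.c. The point is that the irq number is no
longer assumed to equal the pin number on ioapic 0.

```c
#define SK_NIRQ		24	/* assumed number of irqs */
#define SK_NAPIC	2	/* assumed number of IO APICs */
#define SK_NPIN		24	/* assumed pins per IO APIC */

struct sk_intpin {
	int apic;		/* -1 while unassigned */
	int pin;
};

static struct sk_intpin sk_irq_to_pin[SK_NIRQ];
static int sk_pin_to_irq[SK_NAPIC][SK_NPIN];

/* Mark every entry in both directions as unassigned. */
void
sk_map_init(void)
{
	int a, i, p;

	for (i = 0; i < SK_NIRQ; i++)
		sk_irq_to_pin[i].apic = sk_irq_to_pin[i].pin = -1;
	for (a = 0; a < SK_NAPIC; a++)
		for (p = 0; p < SK_NPIN; p++)
			sk_pin_to_irq[a][p] = -1;
}

/* Record one irq <-> (ioapic, pin) pairing in both tables. */
int
sk_map_assign(int irq, int apic, int pin)
{
	if (irq < 0 || irq >= SK_NIRQ || apic < 0 || apic >= SK_NAPIC ||
	    pin < 0 || pin >= SK_NPIN)
		return (-1);
	sk_irq_to_pin[irq].apic = apic;
	sk_irq_to_pin[irq].pin = pin;
	sk_pin_to_irq[apic][pin] = irq;
	return (0);
}

/* Lookup used when masking or unmasking an irq (irq must be in range). */
struct sk_intpin
sk_irq_lookup(int irq)
{
	return (sk_irq_to_pin[irq]);
}

/* Lookup used when configuring devices (arguments must be in range). */
int
sk_pin_lookup(int apic, int pin)
{
	return (sk_pin_to_irq[apic][pin]);
}
```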


38824 04-Sep-1998 luoqi

Make irq forwarding truly functional.


38807 04-Sep-1998 ache

PAGE_WAKEUP -> vm_page_wakeup


38779 03-Sep-1998 nsouch

Reviewed by: Doug Rabson
Submitted by: nsouch
root_bus_configure() call to initialize new bus arch in i386 env.


38717 01-Sep-1998 kato

- Fix style bug.
- hw.ispc98 -> machdep.ispc98.

Submitted by: Garrett Wollman (hw -> machdep)


38700 31-Aug-1998 luoqi

Use 16bit register in inline asm code to set segment registers.


38673 31-Aug-1998 kato

- hw.machine_arch returns cpu architecture type.
- moved definition of MACHINE_ARCH from cpu.h to param.h as alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. The hw.ispc98 is 1 in PC98 kernel and is 0 in
IBM-PC kernel.

Discussed with: John Birrell <jb@FreeBSD.ORG>


38505 24-Aug-1998 bde

Fixed printf format errors. Only one left in LINT on i386's.


38490 23-Aug-1998 des

Don't check minor number of dump device at all.

Discussed-with: Jörg Wunsch


38488 23-Aug-1998 bde

Fixed printf format errors.


38422 18-Aug-1998 msmith

Presently there is only one `currentldt' variable for all cpus
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.

The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.

PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


38349 16-Aug-1998 bde

pmap.c:
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.

mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from
possible integral promotions, although this is too much work for
machine-dependent code.

vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.


38246 11-Aug-1998 bde

Register tty software interrupt handlers at run time using register_swi()
instead of at compile time using ifdefs.

Use _swi_null instead of dummycamisr. CAM and dpt should call
register_swi() instead of hacking on ihandlers[] directly.


38244 11-Aug-1998 bde

Implemented dynamic registration of software interrupt handlers. Not
used yet.

Use dummy SWI handlers to avoid some checks for null pointers.
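
A rough sketch of the idea, with hypothetical names and signatures (the real
register_swi() interface is not shown in this log). Every slot starts out
pointing at a do-nothing handler, so the dispatcher can call through the
table without testing for a null pointer.

```c
#define SK_NSWI	8			/* assumed number of SWI slots */

typedef void sk_swi_handler_t(void);

int sk_tty_hits;			/* bumped by the example handler */

/* Dummy handler: occupies unused slots so dispatch needs no null check. */
static void
sk_swi_null(void)
{
}

/* Example handler a driver might register at run time. */
void
sk_tty_swi(void)
{
	sk_tty_hits++;
}

static sk_swi_handler_t *sk_ihandlers[SK_NSWI] = {
	sk_swi_null, sk_swi_null, sk_swi_null, sk_swi_null,
	sk_swi_null, sk_swi_null, sk_swi_null, sk_swi_null,
};

/* Install a handler; fails if the slot is invalid or already taken. */
int
sk_register_swi(int n, sk_swi_handler_t *handler)
{
	if (n < 0 || n >= SK_NSWI || sk_ihandlers[n] != sk_swi_null)
		return (-1);
	sk_ihandlers[n] = handler;
	return (0);
}

/* Dispatch without a null-pointer test (n must be in range). */
void
sk_dispatch_swi(int n)
{
	sk_ihandlers[n]();
}
```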


38233 10-Aug-1998 bde

Fixed restoring of cpl after trap handling. The wrong cpl (SWI_AST_MASK
instead of 0) was "restored" after handling a trap that occurred while
returning to user mode. This bug was most noticeable for VM86 and is
still detected and fixed up (on return from the next exception) in doreti
if VM86 is configured.


38063 03-Aug-1998 msmith

Copy in the nfs_diskless structure if NFS_ROOT is defined. A previous
change to include nfs_root.h precluded NFS from being defined.
Submitted by: Parag Patel <parag@cgt.com>


37920 28-Jul-1998 bde

Set p->p_switchtime to switchtime instead of to the current time in
fork_trampoline() if switchtime is valid. This fixes not accounting
for the time between the previous context switch and the current
time (when the forked child starts up here) in most cases - the time
is now counted in the child's runtime. I think it actually fixes
all cases, and switchtime is always valid here, since there must have
been a context switch just before the forked child starts up. Some
code should be removed if this is correct. The check that switchtime
is valid sometimes gives a false negative because the check isn't
correct until after the first context switch after the system
has been up for >= 1 second.


37919 28-Jul-1998 bde

Micro-optimized and cleaned up the clearing of switchtime in idle().

Cleaned up the conditionals in the disgusting SMP ifdef in idle().


37902 28-Jul-1998 jlemon

Fix an off-by-one error when setting the iomap bits.
Change struct i386_*_iomap to use ints instead of shorts/chars.
(pointed out by bde long ago, prodded into action by msmith)


37889 27-Jul-1998 jlemon

Re-arrange the page layout used by vm86_bioscall so that we can
potentially re-use the stack page.

Cosmetic cleanup of the code to de-obfuscate it and make it easier
to follow. There should be no functional changes in this commit.


37757 19-Jul-1998 jkh

A slap on the wrist to Dag-Erling, who plainly did not test this before
committing it. There was a large syntax error at line 404 which could
not possibly have allowed compilation. :)


37743 18-Jul-1998 des

Allow dump devices with dkpart != SWAP_PART on devfs/slice
systems. This test should probably be removed altogether.

See CVS log entries for revisions 1.97 and 1.98.


37679 15-Jul-1998 bde

%n in a comment was a poor abbreviation for Immediate-byte-signed,
especially now that %n format has almost gone away.


37651 15-Jul-1998 bde

Cast virtual addresses that happen to be represented as u_longs to
uintptr_t before casting them to pointers. Explicit u_longs should
never be used to represent virtual addresses... (vm_offset_t is
normally right).


37564 11-Jul-1998 bde

Fixed printf format errors.

Use offsetof() instead of null pointer hacks.


37561 11-Jul-1998 bde

Fixed printf format errors.


37557 11-Jul-1998 phk

Don't disable pmap_setdevram() which isn't called, but which could be,
but instead disable pmap_setvidram() which is called, but probably
shouldn't be.

PR: 7227, 7240


37555 11-Jul-1998 bde

Fixed printf format errors.


37553 11-Jul-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just to
ensure 32-bit variables. Doing so mainly bogotified some printf formats.

Fixed disorder in md_var.h.


37506 08-Jul-1998 bde

Use not-so-new printf formats %r and/or %z instead of %n and/or %+x.


37504 08-Jul-1998 bde

Fixed bogus type of valuep in struct db_variable. It was `int *' and
became `long *' for alpha, but should always have been `db_expr_t *'.
Fixed variable types to match.


37389 04-Jul-1998 julian

There is no such thing any more as "struct bdevsw".

There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw and
cdevsw entries. The bdevsw[] table still exists and is a second pointer
to the cdevsw entry of the device. Its major is in d_bmaj rather than
d_maj. Some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.


37315 30-Jun-1998 phk

Add 3 sysctl variables for future use by ps(1).


37311 30-Jun-1998 phk

Add PSE36 to the bits we know by name.


37272 30-Jun-1998 jmg

convert some nfs tunables to options, these are:
NFS_MINATTRTIMO VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY Default write gather delay (msec)
NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ and with this
NFS_MUIDHASHSIZ Tune the size of nfsmount with this
NFS_NOSERVER (already documented in LINT)
NFS_DEBUG turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....


37101 21-Jun-1998 bde

Removed unused includes.


37099 21-Jun-1998 bde

Removed unused includes.
Ifdefed conditionally used includes.


37086 21-Jun-1998 bde

Converted add_interrupt_randomness() to take a `void *' arg. Rewrote
mmioctl() to fix hundreds of style bugs and a few error handling bugs
(don't check for superuser privilege for inappropriate ioctls, don't
check the input arg for the output-only MEM_RETURNIRQ ioctl, and don't
return EPERM for null changes).


37034 17-Jun-1998 bde

Don't declare isa device structs or isa interrupt handlers in <sys/conf>,
and don't depend on them being declared there. This will cause lots of
warnings for a few minutes until config is updated. Interrupt handlers
should never have been configured by config, and the machine generated
declarations get in the way of changing the arg type from int to void *.


36810 09-Jun-1998 phk

Add a tc_ prefix to struct timecounter members.

Urged by: bde


36809 09-Jun-1998 bde

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


36766 08-Jun-1998 dfr

Fix more of my DDB breakage.


36760 08-Jun-1998 dfr

Make DDB work again after I broke it :-(.


36741 07-Jun-1998 phk

Add a member function more to the timecounters, this one is for use
with latch based PPS implementations. The client that uses it will
be committed after more testing.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36719 07-Jun-1998 phk

Add a "this" style argument and a "void *private" so timecounters can
figure out which instance to count with.


36605 03-Jun-1998 bde

Ifdefed the netisr support.

PR: 6760
Reviewed by: joerg


36596 03-Jun-1998 msmith

If vm86 services are available, use these to perform the APM BIOS
probe and initialisation. This will ultimately remove the grubby (but
functional) hack that copies a real-mode function into low memory
early in locore.s.


36441 28-May-1998 phk

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


36303 22-May-1998 des

Use switch instead of if/else chain for 686 model identification.
Add precise model identification for 586-family CPUs.


36286 21-May-1998 des

Correctly identify the precise CPU model within the 686 family: instead
of just printing "Pentium Pro", check the model (cpu_id & 0xf0) and print
the appropriate information.
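
The check described above might look roughly like the following sketch. The
function name and the model table are illustrative assumptions, not the
strings printed by identcpu.c; only the (cpu_id & 0xf0) model test comes
from the commit message.

```c
/*
 * Map the model field of a 686-family cpu_id to a name. The model
 * number sits in bits 4-7, hence the 0xf0 mask.
 */
const char *
sk_686_model_name(unsigned int cpu_id)
{
	switch (cpu_id & 0xf0) {
	case 0x10:
		return ("Pentium Pro");
	case 0x30:
	case 0x50:
		return ("Pentium II");
	default:
		return ("unknown 686-class model");
	}
}
```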


36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


36214 19-May-1998 dufault

Remove option for SCHED_FIFO. With this optional, SCHED_FIFO
is the same as RTPRIO_IDLE when it falls through to the default.


36200 19-May-1998 peter

Missing parens caused cpu features not to be printed for cyrix >= M2/MX.
Although the comments say the datasheet doesn't list the device ID
registers on the M2/MX, they seem to be there and quite alive.
(It's interesting to note that the M2/MX calls itself a 686 class cpu but
is missing a heck of a lot of features, including VME, PGE, PSE, etc)


36198 19-May-1998 phk

Change a data type internal to the timecounters, and remove the "delta"
function.

Reviewed, but not entirely approved by: bde


36179 19-May-1998 phk

Make the size of the msgbuf (dmesg) a "normal" option.


36169 19-May-1998 tegge

Back out part of revision 1.198 commit (clearing kernel stack pages).
By request from David Greenman <dg@root.com>


36168 19-May-1998 tegge

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


36138 17-May-1998 tegge

Change simple lock handling to not depend upon having a local apic
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.

Remove some unneeded manipulation of tpr register in mplock.s.

Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.


36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


36125 17-May-1998 tegge

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generates IPIs.
This should reduce the number of IPIs during heavy paging.


36121 17-May-1998 tegge

Clear kernel stack pages before usage.
Correct panic message in pmap_zero_page (s/CMAP /CMAP2 /).


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


36095 16-May-1998 kato

Some of newer PC-98 may cause "Windows Protection Fault" when booting
Windows 95 after rebooting FreeBSD without power off. In PC-98
system, reboot mode is set via I/O port 0x37 in cpu_reset(), and
accessing of this port is the reason of the problem. To avoid the
fault, current status of reboot mode should be checked before
accessing the I/O port.


36094 16-May-1998 kato

Disable local APIC in UP kernel. Intel specification update describes
that local APIC should be disabled in UP system. However, some of old
BIOS does not disable local APIC, and virtual wire mode through local
APIC may cause int 15.


36051 15-May-1998 dyson

Disable the auto-Write Combining setup for the pmap code. This
worked on a couple of machines of mine, but appears to cause problems
on others.


35977 12-May-1998 dyson

Some temporary fixes to SMP to make it more scheduling and signal friendly.
This is a result of discussions on the mailing lists. Kudos to those who
have found the issue and created work-arounds. I have chosen Tor's fix
for now, before we can all work the issue more completely.
Submitted by: Tor Egge


35974 12-May-1998 bde

Backed out previous commit. It is invalid to call d_ioctl() on
possibly non-open devices, and we don't want to restrict dumping
to swap devices anyway. It is especially invalid to call d_ioctl()
in non-process context for panics. d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale. The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.


35940 11-May-1998 dyson

Change some tests from CPU_CLASS686 to CPU_686 as appropriate, and
also correct a serious omission that would cause process failures
due to forgetting an invltlb type operation. This was just a
transcription problem.


35933 11-May-1998 dyson

Support better performance with P6 architectures and in SMP
mode. Unnecessary TLB flushes removed. More efficient
page zeroing on P6 (modify page only if non-zero.)
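
The "modify page only if non-zero" idea can be sketched in userland C as
follows. This is an illustration only, not the code from this revision: the
name zero_page_if_dirty and the 4096-byte SKETCH_PAGE_SIZE are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/*
 * Zero a page, but store only to words that are not already zero.
 * Words that are already zero are read and left alone, so no store
 * is issued for them.  Returns the number of words actually written.
 */
static size_t
zero_page_if_dirty(unsigned long *page)
{
	size_t i, written = 0;
	size_t nwords = SKETCH_PAGE_SIZE / sizeof(unsigned long);

	assert(page != NULL);
	for (i = 0; i < nwords; i++) {
		if (page[i] != 0) {
			page[i] = 0;
			written++;
		}
	}
	return (written);
}
```

On a page that is already mostly zero, the loop degenerates into reads,
which is the case the commit message is aiming at.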


35932 11-May-1998 dyson

Attempt to set write combining mode for graphics devices.


35812 06-May-1998 julian

Add dump support to the DEVFS/slice code.
now we can actually catch our crashes :-)

Submitted by: Luoqi Chen <luoqi@chen.ml.org> (the man who's everywhere)


35767 06-May-1998 gibbs

Implement bus_dmamem_* functions and correct a few nits reported by Peter Wemm.


35496 28-Apr-1998 eivind

Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under
Linux emulation. This makes Allegro Common Lisp 4.3 work under
FreeBSD!

Submitted by: Fred Gilham <gilham@csl.sri.com>
Commented on by: bde, dg, msmith, tg
Hoping he got everything right: eivind


35456 26-Apr-1998 dyson

Add the PAT cpuid feature.


35393 22-Apr-1998 tegge

Mask the interrupt before setting the corresponding bit in ipending if
the interrupt is already active.
Don't use lock prefix for operations on ipending.
Always use lock prefix for operations on iactive.


35356 20-Apr-1998 julian

Remove an LFS clause, now that it is in the history,
anyone who wants to see what was needed to revive LFS can see it.


35323 20-Apr-1998 julian

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


35319 19-Apr-1998 julian

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistence. (My head is still hurting from the last time we discussed this)
ATAPI floppies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitrary and dynamically assigned.


35303 19-Apr-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants).


35302 19-Apr-1998 bde

Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants or hard long long constants).


35296 19-Apr-1998 bde

Support compiling with gcc -pedantic (don't use a bogus, null cast).


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35215 15-Apr-1998 bde

Finish supporting compiling with `gcc -ansi'. Fix missing `volatile's
in __asm() statements while I'm here.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35203 15-Apr-1998 bde

Fixed breakage of fork accounting in previous commit. A fork benchmark
reported about 15 times as much sys time as real time. getmicroruntime()
is a confusing name.


35087 06-Apr-1998 peter

Fix VM86 compiles. a #include "opt_vm86.h" was missing, and the my_tr
variable was needed in the non-SMP case.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


35079 06-Apr-1998 peter

remove #ifdef declaration of npxproc, use globals.s and the extern always.


35077 06-Apr-1998 peter

Use real types for the SMP pages being allocated rather than arrays of
ints. Remove some no longer needed casts. Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.


35076 06-Apr-1998 peter

clean up #ifdefs, define the variables that have to be per-cpu on SMP
in globals.s only and use externs always.


35075 06-Apr-1998 peter

_curpcb is always defined in globals.s instead of here in #ifdefs


35074 06-Apr-1998 peter

Bogus casts


35072 06-Apr-1998 peter

Rather than filling this file up with SMP .sets, use those from
globals.s instead.
Initialize curproc in the same place for both UP and SMP.


35071 06-Apr-1998 peter

Generate #defines that the asm code can access for the per-cpu data
structures.


35058 06-Apr-1998 phk

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


35035 05-Apr-1998 tegge

Remove some unneeded statements that enabled interrupts.


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll take long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others
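
The timespec{add|sub|cmp} operations mentioned above can be sketched as plain
C functions. This is a hedged illustration, not the macros added to time.h
(which the message itself says may change): the names ts_add, ts_sub and
ts_cmp are hypothetical, and both operands are assumed to be normalized
(0 <= tv_nsec < 1000000000).

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* a += b, carrying nanosecond overflow into the seconds field. */
static void
ts_add(struct timespec *a, const struct timespec *b)
{
	assert(b->tv_nsec >= 0 && b->tv_nsec < NSEC_PER_SEC);
	a->tv_sec += b->tv_sec;
	a->tv_nsec += b->tv_nsec;
	if (a->tv_nsec >= NSEC_PER_SEC) {
		a->tv_sec++;
		a->tv_nsec -= NSEC_PER_SEC;
	}
}

/* a -= b, borrowing from the seconds field when nanoseconds go negative. */
static void
ts_sub(struct timespec *a, const struct timespec *b)
{
	assert(b->tv_nsec >= 0 && b->tv_nsec < NSEC_PER_SEC);
	a->tv_sec -= b->tv_sec;
	a->tv_nsec -= b->tv_nsec;
	if (a->tv_nsec < 0) {
		a->tv_sec--;
		a->tv_nsec += NSEC_PER_SEC;
	}
}

/* Returns a negative, zero or positive value for a < b, a == b, a > b. */
static int
ts_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec < b->tv_sec ? -1 : 1);
	if (a->tv_nsec != b->tv_nsec)
		return (a->tv_nsec < b->tv_nsec ? -1 : 1);
	return (0);
}
```

Comparing seconds first and nanoseconds second is only valid for normalized
values, which is why the add and sub helpers renormalize before returning.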


34990 01-Apr-1998 tegge

Add two workarounds for broken MP tables:

- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.

- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't an atomic variable, so splfoo() protection was needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now use the new variable time_second instead.

gettime() changed to getmicrotime().

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeared time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passed &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde
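
The point of a tvtohz()-style helper is to convert a relative interval
directly into clock ticks, instead of adding the current time and having
hzto() subtract it again. A minimal sketch under stated assumptions: the name
tv_to_ticks is hypothetical, hz is passed in explicitly, the result is simply
rounded up to whole ticks, and none of the kernel routine's overflow handling
is reproduced.

```c
#include <assert.h>

/*
 * Convert a relative interval (sec, usec) to clock ticks at `hz' ticks
 * per second, rounding a partial tick up so the caller never sleeps
 * for less than the requested interval.
 */
static long
tv_to_ticks(long sec, long usec, long hz)
{
	long tick_usec;

	assert(hz > 0 && hz <= 1000000L);
	assert(sec >= 0 && usec >= 0 && usec < 1000000L);
	tick_usec = 1000000L / hz;	/* microseconds per tick */
	return (sec * hz + (usec + tick_usec - 1) / tick_usec);
}
```

With hz = 100, an interval of 2.5 seconds maps to 250 ticks and an interval
of 1 microsecond still maps to one full tick.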


34925 28-Mar-1998 dufault

Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:

Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


34840 23-Mar-1998 jlemon

Add the ability to make real-mode BIOS calls from the kernel. Currently,
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.

Thanks to Tor Egge, these changes should work on SMP machines. However,
it may not be thoroughly SMP-safe.

Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.


34643 17-Mar-1998 kato

Make EPSON_BOUNCEDMA a new-style option.


34636 17-Mar-1998 msmith

Add missing entry to list of major device names. This list should not
exist.


34624 16-Mar-1998 msmith

Spell 'compatibility' like everyone else.


34622 16-Mar-1998 msmith

Use dkmakeminor() rather than magic knowledge of the size and location of
the slice field. Handle incomprehensible slice numbers slightly better.
Suggested by: bde


34617 16-Mar-1998 phk

Be less draconian about the TSC if APM is configured, use it for
timecounting if APM-BIOS isn't found.
Be just as draconian about SMP as always, but explain it better.


34611 16-Mar-1998 dyson

Some VM improvements, including elimination of a lot of Sig-11
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)

pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)

vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.

vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.

vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)

vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliability
and performance issue.)

10) Modify FFS to use vtruncbuf.

vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)

vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.

vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)

vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.


34591 15-Mar-1998 msmith

Use dsname() to generate the disk region name for the "changing root
device to" message. Suppress this message if only the slice number
has changed.


34571 14-Mar-1998 tegge

On SMP systems, initially follow the MP spec with regard to which pin
on the IOAPIC being connected to the 8254 timer interrupt.
Verify that timer interrupts are delivered. If they aren't, attempt
a fallback to mixed mode (i.e. routing the timer interrupt via the 8259 PIC).


34569 14-Mar-1998 tegge

Don't use the standard macros for disabling/enabling interrupt.
On SMP systems, this left the mpintr_lock simplelock locked, causing
further calls to disable_intr to deadlock or panic.


34507 12-Mar-1998 bde

Fixed breakage of the !SMP case in vm_page_zero_idle() in the
previous commit. Opportunities to clean pages were often missed,
and leaving of the idle state was sometimes delayed until the next
interrupt (after any that occurred while cleaning).

Fixed an unstaticization, a syntax error and a style bug in the
previous commit.


34506 12-Mar-1998 bde

Don't depend on "implicit int" or bloat the data section in the
declaration of mem_devsw_installed.

Reduced include nesting.


34440 09-Mar-1998 eivind

Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.


34392 09-Mar-1998 msmith

"Correct behaviour" involves being consistent with the canonical names of
other partitions. In this case, they appear in the first slice in the
WHOLE_DISK_SLICE case.


34390 09-Mar-1998 msmith

Merge from 2.2; behave correctly in the presence of a slice number that
doesn't directly correspond to the slice field in the device minor number.


34309 08-Mar-1998 msmith

Construct the minor number for the root device taking into account the
slice number passed in by the bootblocks. This means the kernel will
not use the compatibility slice to obtain the root filesystem when
booting from a sliced disk.

Use the extraction macros from reboot.h rather than stating them in full
again.


34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


34197 07-Mar-1998 tegge

The APs now reload the interrupt descriptor table pointer after
f00f_hack has run.

Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.

Submitted by: Partially from David A Adkins <adkin003@tc.umn.edu>


34058 05-Mar-1998 tegge

Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.


34057 05-Mar-1998 tegge

Use t_idt instead of idt inside setidt() if f00f_hack() has relocated the IDT.
Submitted by: Bruce Evans <bde@zeta.org.au>


34028 04-Mar-1998 dufault

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.


34021 03-Mar-1998 tegge

When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.


34020 03-Mar-1998 tegge

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that perform very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


34019 03-Mar-1998 tegge

Reduce timeout before assuming that forwarding of hardclock or softclock
failed. Don't complain on forwarding failure, unless
BETTER_CLOCK_DIAGNOSTIC is defined.


33983 02-Mar-1998 peter

Update the ELF image activator to use some of the exec resources rather
than rolling its own. This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win. I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.


33936 01-Mar-1998 dyson

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."


33929 28-Feb-1998 phk

Prevent the TSC from being used on APM machines, we have no idea if
it runs at a constant frequency. This was less of an issue before,
because the TSC only interpolated in the HZ intervals, but now where
the timecounter is used all the way, this becomes much more visible.

Nit: Fix a printf which triggered the bde-filter.


33817 25-Feb-1998 dyson

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


33753 23-Feb-1998 bde

Quick fix for the i8254 timecounter often gaining 10 msec.


33727 21-Feb-1998 jkh

Add missing CLOCK_UNLOCK() before write_eflags().
Submitted by: dave adkins <adkin003@tc.umn.edu>


33690 20-Feb-1998 phk

Replace TOD clock code with more systematic approach.

Highlights:
* Simple model for underlying hardware.
* Hardware basis for timekeeping can be changed on the fly.
* Only one hardware clock responsible for TOD keeping.
* Provides a real nanotime() function.
* Time granularity: .232E-18 seconds.
* Frequency granularity: .238E-12 s/s
* Frequency adjustment is continuous in time.
* Less overhead for frequency adjustment.
* Improves xntpd performance.

Reviewed by: bde, bde, bde


33676 20-Feb-1998 bde

Removed unused #includes.


33362 15-Feb-1998 bde

Removed a superstitious fnop() that broke the usefulness of the FPU's
"last instruction" pointer.


33320 13-Feb-1998 kato

Use RDMSR instruction instead of WRMSR.


33309 13-Feb-1998 bde

Update timer0_prescaler_count before calling hardclock() while timer0
is "acquired". This fixes a TSC biasing error of about 10 msec when
pcaudio is active.

Update `time' before calling hardclock() when timer0 is being released.
This is not known to be important.

Added some delays in writertc(). Efficiency is not critical here, unlike
in rtcin(), and we already use conservative delays there.

Don't touch the hardware when machdep.i8254_freq is being changed but
the maximum count wouldn't change. This fixes jitter of up to 10 msec
for most small adjustments to machdep.i8254_freq. When the maximum
count needs to change, the hardware should be adjusted more carefully.


33307 13-Feb-1998 bde

Ifdefed some npx code. npx should be optional again.


33306 13-Feb-1998 bde

Fixed missing privilege checking and off-by-1 bounds checking in
i386_set_ioperm(). Don't use a magic number for the bound.

Fixed missing bounds checking in i386_get_ioperm(). Don't use a
magic number for the bound elsewhere in this function.

Removed some bogus initializers.
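
An off-by-one-safe range check of the kind described for the ioperm calls
can be sketched as below. This is an assumption-laden illustration, not the
kernel code: port_range_ok and SKETCH_NPORTS are hypothetical names, and the
bound is taken to be the 65536-port x86 I/O space.

```c
#include <assert.h>

#define SKETCH_NPORTS 0x10000u	/* x86 I/O port space: ports 0..0xffff */

/*
 * Return 0 if the half-open range [start, start + length) lies entirely
 * inside the port space, -1 otherwise.  The comparison is written as a
 * subtraction so that start + length cannot wrap around.
 */
static int
port_range_ok(unsigned int start, unsigned int length)
{
	if (start > SKETCH_NPORTS || length > SKETCH_NPORTS - start)
		return (-1);
	return (0);
}
```

A caller could check, for example, assert(port_range_ok(0x3f8, 8) == 0).
Using a named bound rather than a literal keeps the check and the size of
the permission bitmap from drifting apart.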


33282 12-Feb-1998 bde

Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap(). Smaller kernels
worked accidentally.


33281 12-Feb-1998 bde

Only use the i586-optimized copying and zeroing functions if they are
actually faster (more than 20% faster for zeroing 1 MB at boot time).
This fixes pessimized copying and zeroing on K6's and perhaps on other
CPUs that are misclassified as i586's.
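
The selection rule can be sketched as a function-pointer choice driven by
two boot-time measurements. This is a sketch only: the names generic_zero,
tuned_zero and choose_zero are hypothetical, the timing itself is left to the
caller, and "more than 20% faster" is interpreted here as the tuned routine
taking less than 80% of the generic routine's time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*zero_fn)(void *, size_t);

/* Portable fallback. */
static void
generic_zero(void *p, size_t n)
{
	memset(p, 0, n);
}

/* Stand-in for a CPU-tuned routine; a real one would use other instructions. */
static void
tuned_zero(void *p, size_t n)
{
	unsigned char *c = p;

	while (n-- > 0)
		*c++ = 0;
}

/*
 * Pick the tuned routine only if it was measured to be clearly faster:
 * tuned_ticks < 0.8 * generic_ticks, done in integer arithmetic.
 */
static zero_fn
choose_zero(unsigned long generic_ticks, unsigned long tuned_ticks)
{
	assert(generic_ticks > 0);
	if (tuned_ticks * 5 < generic_ticks * 4)
		return (tuned_zero);
	return (generic_zero);
}
```

Requiring a clear margin rather than any measured win keeps noise in a single
boot-time measurement from selecting a routine that is not actually better.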


33221 10-Feb-1998 eivind

Fix warning after previous staticization.


33181 09-Feb-1998 eivind

Staticize.


33179 09-Feb-1998 eivind

Remove warnings from f00f_hack.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistent page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagein.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33068 04-Feb-1998 eivind

Make FAILSAFE a new-style option.


33056 03-Feb-1998 bde

Converted DISABLE_PSE to a new-style option.

Fixed some formatting in options.i386.


33051 03-Feb-1998 bde

Ifdefed some SMP and VM86 code. Note that although VM86 is not a global
option, the ifdef on it in a header works because only the name of the
VM86 extension is hidden.


32992 01-Feb-1998 bde

Declare printf() instead of including <stdio.h>, so that this doesn't
depend on anything outside of "sys".

Removed an unused include.

Don't use `extern' in a function declaration.


32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtle "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistent scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


32929 31-Jan-1998 eivind

Make the debug options new-style.

This also zaps a DPT option from lint; it wasn't referenced from
anywhere.


32925 31-Jan-1998 eivind

Make POWERFAIL_NMI, PPS_SYNC and NATM new style options.

This also fixes a couple of defunct options; submitted by bde.


32917 31-Jan-1998 eivind

Include "opt_nfs.h"

Pointed out by: Eric L. Hernes <erich@lodgenet.com>


32889 30-Jan-1998 phk

Retire LFS.

If you want to play with it, you can find the final version of the
code in the repository under the tag LFS_RETIREMENT.

If somebody makes LFS work again, adding it back is certainly
desirable, but as it is now nobody seems to care much about it,
and it has suffered considerable bitrot since its somewhat haphazard
integration.

R.I.P


32884 30-Jan-1998 dyson

Make the bounce buffer code a little more robust when space isn't
available. If there isn't bounce space available, the bounce code
is disabled. This will allow most large systems to run properly
when the bounce space is mistakenly allocated above 16MB.


32850 28-Jan-1998 phk

APM calls inittodr(0) which is stupid, but at least stop setting the
clock back to when Dennis had a good idea.


32820 27-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU.


32781 25-Jan-1998 kato

Undo previous commit. The cpuid symbol has been already used by SMP
stuff.

Pointed-out by: Manfred Antar <root@mantar.slip.netcom.com>


32771 25-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU,
and store its result into cpu_id and cpu_feature variables.

Tested by: Simon Coggins <chaos@ultra.net.au>


32765 25-Jan-1998 kato

Even though BIOS writer's guide recommends cpuid instruction of Cyrix
6x86MX CPU is enabled (BIOS should not disable it), some BIOSes disable
it via CCR4. In this case, cpu variable becomes CPU_486 and
identblue() is called. Because Cyrix 6x86MX has MSR and doesn't have
MSR1002, wrmsr instruction generates general protection fault.

Tested by: Simon Coggins <chaos@ultra.net.au>


32726 24-Jan-1998 eivind

Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related options new-style.

This introduces an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.


32724 24-Jan-1998 dyson

Add better support for larger I/O clusters, including larger physical
I/O. The support is not mature yet, and some of the underlying implementation
needs help. However, support does exist for IDE devices now.


32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneeded collapse operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


32617 19-Jan-1998 tegge

The removal of a page from the free queue in vm_page_zero_idle was
incomplete. Also set m->queue, in order to prevent vm_page_select_free
from selecting the page being zeroed.


32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtle bugs aren't fixed yet.


32518 15-Jan-1998 gibbs

Addition of splsoftvm and a VM SWI to handle bus dma related callbacks.
This SWI may be useful for other, deferred, VM tasks.


32516 15-Jan-1998 gibbs

Implementation of Bus DMA for FreeBSD-x86. This is sufficient to do
page level bounce buffering, but there are still some issues left to
address.


32464 12-Jan-1998 dyson

Adjust upwards the size of exec map in order to take into account the
additional PAGE_SIZE needed for exec operation.


32358 09-Jan-1998 eivind

Make the BOOTP family new-style options (in opt_bootp.h)


32203 03-Jan-1998 obrien

AMD calls the PR166 and PR200, models 2 and 3 respectively.


32200 03-Jan-1998 obrien

Update AMD URL for CPU recognition docs.


32199 03-Jan-1998 kato

Fix typo. Option `CPU_SUSP_HLT' didn't work on Cyrix 486DX box.

Submitted by: nyan@wyvern.cc.kogakuin.ac.jp (Takahashi Yoshihiro)


32164 01-Jan-1998 msmith

Don't try to call into BIOS32 handlers outside the normal ROM
address range. They may have been trashed earlier in the boot
process, or the directory header may simply be bogus.

PR: 5140
Submitted by: Joel Faedi <Joel.Faedi@esial.u-nancy.fr>
Brought-to-attention-by: Derek Inksetter <derek@saidev.com>, bde


32054 28-Dec-1997 phk

More cleanup relating to our use of the TSC.
Look in the cpu_feature (CPUID output) to see if we have it.


32052 28-Dec-1997 phk

wash, sort and put in order various nits from the i586_ctr -> tsc
commit.

Pointed out by: bde


32012 27-Dec-1997 peter

Back out previous commit, the so-called "unused code" was most definitely
used, and caused a reference to an uninitialised variable (state).
I think I've fixed it now, but since nothing in the tree seems to use it,
I'm not sure.


32010 27-Dec-1997 peter

#include "opt_user_ldt.h" so that the #ifdef USER_LDT checks can work, as
commented about at length in the PR audit trail.

PR: 2412


32005 26-Dec-1997 phk

Rename "i586_ctr" to "tsc" (both upper and lower case instances).
Fix a couple of printfs too.

Warning: This changes the names of a couple of kernel options!


31934 22-Dec-1997 dyson

Correct my previous fix for the UPAGES problem.


31930 22-Dec-1997 dyson

Hopefully fix the problem with the TLB not being updated correctly.
Problem tracked down by bde@freebsd.org, but this is an attempted
efficient fix.


31723 15-Dec-1997 tegge

Add support for low resolution SMP kernel profiling.

- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.

- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.

- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.


31720 15-Dec-1997 tegge

Don't forward hardclock or statclock to stopped cpus. Disable forwarding
when a panic has occurred.


31709 14-Dec-1997 dyson

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


31689 12-Dec-1997 tegge

Add needed #include.

Problem found by: Bruce Evans <bde@zeta.org.au>


31639 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrapped and enabled by the define BETTER_CLOCK (on by default in smptests.h)

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31638 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrapped and enabled by the define BETTER_CLOCK (on by default in smptests.h)

apic_vector.s also contains a small change I (smp) made to eliminate
the double level INT problem. It seems stable, but I haven't the tools
in place to prove it fixes the problem.

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


31544 04-Dec-1997 jmg

document and make the NO_F00F_HACK a proper option...

also, sort some option includes while I'm here..

Forgotten by: sef


31535 04-Dec-1997 jkh

After consultation with David, change
#ifndef NO_F00F_HACK
to
#if defined(I586_CPU) && !defined(NO_F00F_HACK)


31515 03-Dec-1997 sef

Make has_f00f_bug extern, and get rid of some unused code in the f00f
code.

Submitted by: Mikael Karpberg & Cy Schubert


31507 03-Dec-1997 sef

Work around for the Intel Pentium F00F bug; this is Intel's recommended
workaround. Note that this currently eats up two pages extra in the system;
this could be alleviated by aligning idt correctly, and then only dealing with
that (as opposed to the current method of allocating two pages and copying the
IDT table to that, and then setting that to be the IDT table).


31424 26-Nov-1997 joerg

Removed an unused line of code that caused a ``maybe used uninitialized''
warning.

Found by: Simon Shapiro


31397 24-Nov-1997 bde

Fixed multiple definitions of boothowto.


31395 24-Nov-1997 bde

Added a sysctl (machdep.cputime_clock) to select the clock used by
"high resolution" profiling. The available clocks are:
- the i8254 clock
- on non-SMP i586's and i686's: the TSC
- on systems with I586_PMC_GUPROF configured, and PERFMON configured
and available: all the performance counters.
This is unfinished (there are problems with locking out the PERFMON
device driver, and with losing calibration after switching the clock),
but better than static configuration or writing to kmem.

Changed ifdefs to avoid generating code for non-working option
combinations.


31389 24-Nov-1997 bde

Fixed some #include messes.

Hid the check of the user %cs in syscall() under `#ifdef DIAGNOSTIC'.


31338 21-Nov-1997 jlemon

Correct CPU_CYRIX_NO_LOCK fix.
PR: 5121
Pointed out by: Matthew Hunt


31337 21-Nov-1997 bde

Fixed setting of `safepri'. It should be SWI_AST_MASK most of the
time, but was left at 0. This caused the "can't happen" case in
splz_swi to happen for panics when tsleep() calls splx(safepri)
and there is a SWI_AST pending. This was harmless because the
error handling happens to be right. Debugging this was tricky
because debugger traps force SWI_AST_MASK on in `cpl'.


31336 21-Nov-1997 bde

Moved splhigh()/spl0() calls from isa_configure() to configure() so that
there is a natural place to initialize `safepri' in a future commit.
Spinoffs:
- spl0() gets called in the unlikely event that isa is not configured.
- configure() has better control over enabling interrupts.
- it is now less unclear that interrupts aren't actually enabled early.
Rev.1.48 of autoconf.c seems to have done the opposite of what was
intended - moving the isa_configure() call delayed the spl0() side
effect.
Added some comments about the bogons. Removed the splhigh() call since
it is a no-op.


31328 21-Nov-1997 peter

Previous commit refers to SWAP_PART, which is only defined if the include
file that it's in is #included...


31322 20-Nov-1997 bde

Removed a duplicate (sloppy common-style) definition.

Fixed some style bugs.


31321 20-Nov-1997 bde

Moved some extern declarations to header files (unused ones to /dev/null).


31319 20-Nov-1997 bde

Avoid passing some more `retval's.


31318 20-Nov-1997 bde

Fixed wrong limits for the kernel text in db_numargs(). The
interval [VM_MIN_KERNEL_ADDRESS, etext] was used instead of
[btext, etext). Added a comment about this being completely
wrong for LKMs. This only affects interpreting the instructions
after the return to attempt to decide the number of args. The
attempt usually fails anyway.


31317 20-Nov-1997 bde

Fixed write enabling of the kernel text section. The overlap
checking was mostly wrong at the boundaries. For the lower limit,
VM_MIN_KERNEL_ADDRESS was used instead of btext and there was an
off-by-(`size' - 1) error. For the upper limit, &etext was used
instead of etext and there was an off-by-1 error. The bugs were
harmless because `size' is not too large and some memory is mapped
just beyond the ends. We still depend on the former to avoid
having to handle the case where the memory range covers the whole
text section, and on the latter to prevent problems when we map
just beyond an end to allow writing an address range that overlaps
the end.
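
The boundary conditions described above can be illustrated with a half-open
interval test. This is only a sketch, not the code from the commit: the helper
name and its parameters are hypothetical, and it assumes a non-zero size and
no address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): does the range
 * [addr, addr + size) overlap the text section [text_start, text_end)?
 * Both ranges are half-open.  Assumes size != 0 and that addr + size
 * does not wrap around.
 */
static bool
range_overlaps_text(uintptr_t addr, uintptr_t size,
    uintptr_t text_start, uintptr_t text_end)
{
	/* Strict comparisons on both sides avoid off-by-one errors. */
	return (addr < text_end && addr + size > text_start);
}
```

A range that ends exactly at text_start, or starts exactly at text_end, does
not overlap under this convention.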

Fixed placement of a nearby comment.


31316 20-Nov-1997 bde

Don't allow setting the dump device to any partition except the
one traditionally reserved for swap devices. The restrictions
should now be the same as the ones for dumpsys(). The restriction
on the partition should be removed someday, and dumpsys() shouldn't
repeat all the checks.


31255 18-Nov-1997 bde

Removed an unused #include.

Ifdefed #includes that are not used in the SMP case.


31253 18-Nov-1997 bde

Removed unused #includes.

Added a used #include (don't depend on yet to be fixed namespace pollution).


31249 18-Nov-1997 bde

Don't #include <machine/smp.h> even in the SMP case. Fixed the one
place that depended on it. The "bazillion warnings" mentioned in the
log for rev.1.45 apparently aren't a problem any more. It is hard
to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks)
things in the SMP case.


31030 07-Nov-1997 tegge

Use UPAGES when setting up private pages for SMP (which includes idle stack).


31017 07-Nov-1997 phk

Rename some local variables to avoid shadowing other local variables.

Found by: -Wshadow


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


30976 06-Nov-1997 kato

Identify MediaGX CPU correctly. Old MediaGX CPU and GXm CPU are
distinguished. CPU-classes of MediaGX CPU and GXm CPU are 486-class
and 586-class, respectively.

PR: 4936


30964 05-Nov-1997 kato

Fix rare 6x86 CPU whose DIR0 = 0x20 - 0x28 case.


30918 04-Nov-1997 kato

Use same address for USERCONFIG_BOOT on PC-98 as IBM-PC.

Submitted by: H. Nokubi <h-nokubi@nmit.tmg.nec.co.jp>
Forgotten by: kato


30813 28-Oct-1997 bde

Removed unused #includes.


30805 28-Oct-1997 bde

Don't include <machine/cputypes.h> or declare cputype/class interfaces
in <machine/cpu.h>. Moved the declarations to <machine/cputypes.h>.
Fixed style bugs in the moved code. Fixed everything that depended on
the nested include. Don't include <machine/cpu.h> (in the changed files)
unless something in it is used directly.


30789 27-Oct-1997 bde

Moved declaration of etext from <machine/md_var.h> to <machine/cpu.h>
and fixed everything that depended on it being declared in the old
place. It is used in "machine-independent" code in subr_prof.c.

Moved declaration of btext from subr_prof.c to <machine/cpu.h>. It
is machine-dependent.


30788 27-Oct-1997 bde

Oops, <machine/psl.h> is used unconditionally in -current.


30786 27-Oct-1997 bde

Cleaned up #includes.

Ifdefed conditionally used includes.

Finished changing indentation of per-statement comments to 40.


30754 27-Oct-1997 dyson

Check to see if the pv_limits are initialized before checking.


30732 26-Oct-1997 dyson

Change the initial amount of memory allocated for pv_entries to be proportional
to the amount of system memory. Also, clean-up some of the new pv_entry
mgmt code.


30720 26-Oct-1997 nate

- Do a bunch of gratuitous changes intended to make the code easier to
follow.
* Rename/reorder all of the pccard structures, change many of the member
names to be descriptive, and follow more closely other 'bus' drivers
naming schemes.
* Rename a bunch of parameter and local variable names to be more
consistent in the code.
* Renamed the PCCARD 'crd' device to be the 'card' device
* KNF and make the code consistent where it was obvious.
* ifdef'd out some unused code


30702 25-Oct-1997 dyson

Somehow an error crept in during the previous commit.


30701 25-Oct-1997 dyson

Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no significant
overhead is added. This helps large systems with lots of processes.


30700 24-Oct-1997 dyson

Decrease the initial allocation for the zone allocations.


30623 21-Oct-1997 msmith

Reference the DMI table inside the SMBIOS table correctly, not using a variable
that won't be initialised until a later test.
Submitted by: bde via -Wunused


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30343 12-Oct-1997 peter

Try and fix some style problems


30309 11-Oct-1997 phk

Distribute and staticize a lot of the malloc M_* types.

Substantial input from: bde


30275 10-Oct-1997 peter

Compensate for pcb.h tweaks.

(Bruce pointed out the nesting)


30273 10-Oct-1997 peter

GPROC0_SEL isn't used in any *.s files it seems..


30265 10-Oct-1997 peter

Convert the VM86 option from a global option to an option only depended
on by the files that use it. Changing the VM86 option now only causes
a recompile of a dozen files or so rather than the entire kernel.


30162 06-Oct-1997 kato

Added two Cyrix 6x86/6x86MX options.

- CPU_CYRIX_NO_LOCK enables weak locking. If this option is not set and
FAILSAFE is defined, NO_LOCK bit of CCR1 is cleared.
- CPU_WT_ALLOC enables write-through allocation.


30136 06-Oct-1997 dyson

It is possible that MBs with really broken BIOSes do not set up more of
the mtrr registers. This just fills in more of the registers.


30112 05-Oct-1997 dyson

Make sure that the memory type registers are the same for each CPU
in a P6 SMP system. Some MB BIOSes don't set the registers up correctly
for the AP's. Additionally, set the memory between 0xa0000 and 0xbffff
as write combining.


30082 03-Oct-1997 kato

Call identifycyrix() when 6x86MX CPU is found. The identifycyrix()
function sets cyrix_did. Old code could not display correct variable.

Reviewed by: Hideyuki Suzuki <hideyuki@sat.t.u-tokyo.ac.jp>


29945 28-Sep-1997 gibbs

Fix a serious bug I introduced while adding in support for CAM interrupts.
It seems I didn't count my 0's properly when adding the new masks into
icu_vector.s pushing SWI_AST_MASK off the end of the array and screwing
up the indexing for SWI_CLOCK_MASK.

Fix the bug icu_vector.s and also reformat the code in both icu_vector.s and
apic_vector.s so that it will be much harder to make the same mistake in
the future.

Submitted by: Bruce Evans <bde@zeta.org.au>


29851 25-Sep-1997 dg

Fix a bug where the speculative memory probe wouldn't occur on systems that
report slightly more than 64MB of total memory. This can happen due to the
total being the sum of both base and extended memory.
Submitted by: Alan Cox <alc@cs.rice.edu>


29789 24-Sep-1997 phk

Look for another couple of magic bios things..


29742 23-Sep-1997 bde

Moved setconf() call after root configuration again. This fixes a
null pointer panic in the "generic" version of setconf().

Removed the resulting near-duplicate printf.


29702 22-Sep-1997 peter

Turn on CR4_VME on the AP's the same as the BSP. Note that we do not
[yet] probe the AP's for their cpuid/capabilities etc, so this is a fudge
at best.

Problem noted by: Jonathan Lemon <jlemon@americantv.com>


29677 21-Sep-1997 gibbs

aha1542.c aic6360.c cy.c fd.c ft.c
if_ie.c if_wl.c if_zp.c isa.c isa_device.h
labpc.c mcd.c ncr5380.c scd.c seagate.c si.c
sio.c tw.c ultra14f.c wcd.c wd.c:

Update for changes in the callout interface.

apic_vector.s icu_vector.s ipl.s ipl_funcs.c:

Add CAM software/hardware interrupt support.


29675 21-Sep-1997 gibbs

autoconf.c:
Add cpu_rootconf and cpu_dumpconf so that configuring these
two devices can be better controlled by the MI configuration
code.

machdep.c:
MD initialization code for the new callout interface.

trap.c:
Add support for printing out whether cam interrupts are masked
during a panic.


29663 21-Sep-1997 peter

Implement the parts needed for VM86 under SMP.


29655 21-Sep-1997 dyson

Add support for more than 1 page of idle process stack on SMP systems.


29639 20-Sep-1997 phk

For AMD chips, pick up the long description from the chip if
possible. (This is not really a typographical improvement in the
case of the K6 it seems, but AMD apparently want it to look
that way). Also if bootverbose, dump some more info about the
chip.


29368 14-Sep-1997 peter

Update select -> poll in drivers.


29330 13-Sep-1997 joerg

Revert the logic behind my last change, and use a function called
`is_physical_memory()' now for the decision whether to dump some
region of memory or not.

Suggested by: davidg


29280 10-Sep-1997 joerg

Do not ever try to coredump adapter memory regions.

PR: 4486
Submitted by: tegge@idi.ntnu.no (Tor Egge)

Implement a function is_adapter_memory() in order to determine what
should not be dumped at all. Currently, only populated with the ``ISA
memory hole''. Adapter regions of other busses should be added.


29243 09-Sep-1997 jmg

add necessary calls to autoconf for pnp,

also teach userconfig about the new pnp commands, for usage see pnp(4)


29213 07-Sep-1997 fsmp

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


29174 07-Sep-1997 dyson

Fix an intermittent problem during SMP code operation. Not all of the
idle page table directories for all of the processors were being updated
during kernel grow operations. The problem appears to be gone now.


29151 05-Sep-1997 peter

Argh, what was I thinking?? Don't (yet) halt the CPU in the idle loop
while waiting for an interrupt (rather than spinning on the runqueue status
bits), since the other cpu can put stuff in there and the sleeping cpu may
not get an interrupt for a while. When we have a reschedule IPI, this can
come back.

Pointed out by: fsmp


29128 05-Sep-1997 peter

Cosmetic adjustment for the trap/double fault/panic cpu id listing.
It now prints the apic id in hex rather than decimal.


29110 04-Sep-1997 dg

Cosmetic change to last commit: speculative_mtest -> speculative_mprobe.


29109 04-Sep-1997 dg

Changed the memory sizing code so that if the following conditions
are met:

1) The BIOS indicates that there is exactly 64MB of RAM, and
2) The memory size isn't specified with the MAXMEM option or
the npx0 msize hack,

...then do a speculative memory probe beyond the 64MB's until the
first bad page is encountered. This is an admitted hack, but should
nonetheless deal with detecting the correct amount of memory in nearly
all of the modern systems with >64MB of RAM.
Also made a change that will cause the list of detected memory chunks
to be printed if bootverbose is set.
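
The probing strategy described above can be sketched as a simple upward walk.
This is an illustration only, not the machdep.c code: the function names, the
callback type and the SKETCH_PAGE_SIZE constant are assumptions, and the page
test is simulated rather than touching real memory.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/* Hypothetical callback: true if the page at 'pa' behaves like RAM. */
typedef bool (*page_test_fn)(unsigned long pa, void *arg);

/*
 * Starting at 'start' (for example the 64MB reported by the BIOS), walk
 * upward one page at a time and stop at the first page that fails the
 * test, or at 'limit'.  Returns the first address not treated as memory.
 */
static unsigned long
speculative_probe_sketch(unsigned long start, unsigned long limit,
    page_test_fn test, void *arg)
{
	unsigned long pa;

	for (pa = start; pa < limit; pa += SKETCH_PAGE_SIZE) {
		if (!test(pa, arg))
			break;
	}
	return (pa);
}

/* Simulated machine whose RAM ends at *(unsigned long *)arg. */
static bool
simulated_ram_test(unsigned long pa, void *arg)
{
	return (pa < *(unsigned long *)arg);
}
```

With a simulated RAM end of 96MB and a start of 64MB, the walk stops at 96MB.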


29041 02-Sep-1997 bde

Removed unused #includes.


29000 01-Sep-1997 fsmp

General cleanup of the sub-system locking macros.
Eliminated the RECURSIVE_MPINTRLOCK.
clock.c and microtime use clock_lock.
sio.c and cy.c use com_lock.

Suggestions by: Bruce Evans <bde@zeta.org.au>


28999 01-Sep-1997 fsmp

Cleanup.


28984 01-Sep-1997 bde

Move closer to supporting VM86 under SMP.

LINT now compiles but doesn't link. Other link-time breakage for LINT
is now visible (SMP is incompatible with SIMPLELOCK_DEBUG).
Submitted by: jlemon


28981 01-Sep-1997 bde

Removed unused #includes.


28976 31-Aug-1997 bde

Fixed options SHOW_BUSYBUFS and PANIC_REBOOT_WAIT_TIME which were broken
by incomplete cutting and pasting from machdep.c to kern_shutdown.c.

PR: 3953


28951 31-Aug-1997 fsmp

Debug version of simple_lock. This will store the CPU id of the
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.
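
A minimal sketch of the holder-id check described above, written with C11
atomics rather than the assembly used in the kernel. All names here are
hypothetical, and the self-deadlock case is reported as a return value
instead of calling panic() so that the sketch stays testable.

```c
#include <stdatomic.h>

#define SL_UNLOCKED (-1)	/* assumed marker for "no holder" */

struct sl_debug_sketch {
	atomic_int holder;	/* CPU id of the holder, or SL_UNLOCKED */
};

enum sl_result { SL_ACQUIRED, SL_BUSY, SL_SELF_DEADLOCK };

static void
sl_init_sketch(struct sl_debug_sketch *l)
{
	atomic_init(&l->holder, SL_UNLOCKED);
}

/* One acquisition attempt by CPU 'cpuid'. */
static enum sl_result
sl_try_sketch(struct sl_debug_sketch *l, int cpuid)
{
	int expected = SL_UNLOCKED;

	if (atomic_compare_exchange_strong(&l->holder, &expected, cpuid))
		return (SL_ACQUIRED);
	/* On failure 'expected' now contains the current holder's id. */
	return (expected == cpuid ? SL_SELF_DEADLOCK : SL_BUSY);
}

static void
sl_unlock_sketch(struct sl_debug_sketch *l)
{
	atomic_store(&l->holder, SL_UNLOCKED);
}
```

A real debug lock would spin on SL_BUSY and panic on SL_SELF_DEADLOCK.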

Controlled by smptests.h: SL_DEBUG, ON by default.

Some minor cleanup.


28921 30-Aug-1997 fsmp

Another round of lock pushdown.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.

Help from: Bruce Evans <bde@zeta.org.au>


28910 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm, enabled.

Preliminary support for the INTR_SIMPLELOCK algorithm, disabled.
Note that this code is NOT ready.


28909 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm.
Improved interrupt handling, fewer silo overflows.

With help from: dave adkins <adkin003@gold.tc.umn.edu>


28872 28-Aug-1997 jlemon

Remove the vm86 support as an LKM, and link it directly into the kernel
if 'options "VM86"' is in the config file. The LKM was really for
development, and has probably outlived its usefulness.


28809 26-Aug-1997 peter

Correct some things I forgot about until it was too late with smp_active.
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.


28808 26-Aug-1997 peter

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


28747 25-Aug-1997 bde

Finished (?) support for DISABLE_PSE option. 2-3MB of kernel vm was sometimes
wasted.

Fixed type mismatches for functions with vm_prot_t's as args. vm_prot_t
is u_char, so the prototypes should have used promoteof(u_char) to match
the old-style function definitions. They use just vm_prot_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change vm_prot_t to u_int, to optimize for time
instead of space.

Removed a stale comment.


28743 25-Aug-1997 bde

Removed a bogus comment.


28717 25-Aug-1997 peter

s/.align/.p2align/ so that we get the same results when building elf
objects (the tools are a bit better)


28669 24-Aug-1997 fsmp

A clean fix for the spl "deadlock before smp_active" problem.

Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.


28641 24-Aug-1997 fsmp

The last of the encapsulation of cpl/spl/ipending things into a critical
region protected by the simplelock 'cpl_lock'.

Notes:

- this code is currently controlled on a section by section basis with
defines in machine/param.h. All sections are currently enabled.

- this code is not as clean as I would like, but that can wait till later.

- the "giant lock" still surrounds most instances of this "cpl region".
I still have to do the code that arbitrates setting cpl between the
top and bottom halves of the kernel.

- the possibility of deadlock exists, I am committing the code at this
point so as to exercise it and detect any such cases B4 the "giant lock"
is removed.


28551 21-Aug-1997 bde

#include <machine/limits.h> explicitly in the few places that it is required.


28496 21-Aug-1997 charnier

Revert my previous commit about using CS_SECURE macro.
Requested by: Bruce.


28487 21-Aug-1997 fsmp

Made PEND_INTS default.
Made NEW_STRATEGY default.
Removed misc. old cruft.

Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h

More cleanup in the direction of making splxx()/cpl MP-safe.


28442 20-Aug-1997 fsmp

Preparation for moving cpl into critical region access.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer checks ipending on way out of ISR.
sio made MP-safe (I hope).


28359 18-Aug-1997 charnier

Use CS_SECURE macro.
Reviewed by: John Dyson


28124 12-Aug-1997 dyson

Back out a part of the disk scheduling "improvements" :-(. Let me know
how the system works now!!!


28044 10-Aug-1997 fsmp

Oops, fix breakage to UP kernel.


28043 10-Aug-1997 fsmp

Added trap specific lock calls: get_fpu_lock, etc.
All resolve to the GIANT_LOCK at this time, it is purely a logical partitioning.


28041 10-Aug-1997 fsmp

Cheap fix for kern/4255.
If the problem is seen this fix suggests a compile-time work-around then panics.


28027 09-Aug-1997 fsmp

Some fixes towards making "default configs" work again.
Still not fixed, no idea why.

Debug help from: "Thomas D. Dean" <tomdean@ix.netcom.com>


28026 09-Aug-1997 fsmp

Minor conditionalization of XXX_MPLOCK on PEND_INTS.


28023 09-Aug-1997 fsmp

Added 'lock' instruction before 3 places that update ipending.
This may or may not fix the "high IO freezes SMP kernel" problem.
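
For illustration, the effect of a 'lock' prefix on a read-modify-write of a
shared bitmask can be approximated in C11 with an atomic OR. This is a sketch
under assumptions, not the assembly from the commit: the function name is
hypothetical, and the variable merely stands in for ipending.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically set the bit for 'irq' in a shared
 * pending mask, so that concurrent updates from other CPUs are not lost.
 * Assumes irq < 32.
 */
static void
set_pending_sketch(_Atomic uint32_t *pending, unsigned irq)
{
	atomic_fetch_or(pending, UINT32_C(1) << irq);
}
```

Without atomicity, two CPUs that each load, OR and store the mask at the same
time can overwrite one another's bit.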


28013 09-Aug-1997 dyson

Modify the scheduling policy to take into account disk I/O waits
as chargeable CPU usage. This should mitigate the problem of processes
doing disk I/O hogging the CPU. Various users have reported the
problem, and test code shows that the problem should now be gone.


27993 09-Aug-1997 dyson

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably a lot of others.
Submitted by: Jonathan Lemon <jlemon@americantv.com>


27950 07-Aug-1997 dyson

Fix the DDB breakpoint code when using the 4MB page support.


27947 07-Aug-1997 dyson

More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.


27940 06-Aug-1997 peter

printf does not understand %hd in the kernel


27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


27922 05-Aug-1997 dyson

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


27906 05-Aug-1997 msmith

memcmp -> bcmp
Submitted by: smp, bde


27904 05-Aug-1997 dyson

Modify pmap to use our new memory allocator.


27903 05-Aug-1997 dyson

Slightly reorder some operations so that the main processor gets global
mappings early on.


27902 05-Aug-1997 dyson

Remove the PMAP_PVLIST conditionals in pmap.*, and another unneeded define.


27899 05-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


27893 04-Aug-1997 fsmp

Eliminate frequent silo overflows by restoring the TEST_LOPRIO code.
This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsens latency for FAST_INTR() routines,
which can't be marked pending.

Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>


27872 04-Aug-1997 msmith

Correctly checksum the DMI signature structure. Format the BSD revision
number therein.
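
As background, BIOS table structures of this kind are conventionally
validated by summing their bytes modulo 256 and expecting zero. The sketch
below shows that general check only. The function name is hypothetical, and
which bytes the DMI signature structure covers is not stated in this log, so
the caller supplies the length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the 'len' bytes at 'p' sum to zero
 * modulo 256, the usual validity test for BIOS table structures.
 */
static bool
bytes_sum_to_zero(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + p[i]);
	return (sum == 0);
}
```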

Report from: dave adkins <adkin003@gold.tc.umn.edu>


27823 01-Aug-1997 msmith

Support functions for working with x86 PC-architecture BIOS.
Initially functionality is confined to 32-bit BIOS functions, however
it is envisioned that BIOS support may be enlisted for other
activities in the future.


27780 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
Created mplock functions that save/restore NO registers.
Minor cleanup.


27779 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
removed PEND_INTS 1st try
direct call to MPtrylock


27728 28-Jul-1997 fsmp

Modified the PEND_INTS algorithm to fix the ISA INT loss problem.

Noticed by: dave adkins <adkin003@gold.tc.umn.edu> and others.


27697 26-Jul-1997 fsmp

mpapic.c & mp_machdep:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.


27696 26-Jul-1997 fsmp

clock.c:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

apic_vector.s:
- new algorithm where a CPU uses try_mplock instead of get_mplock:
if successful continue as before.
if fail set ipending bit, mask INT (to avoid recursion), cleanup & iret.

This allows the CPU to return to successful work, while the ISR will be run
by the CPU holding the lock as part of the doreti dance.
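
The deferral scheme described above can be modelled in plain C. This is a
single-threaded sketch, not the apic_vector.s code: the struct, the function
names and the unmask-on-drain step are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state touched by the interrupt entry path. */
struct isr_sketch_state {
	bool     lock_held_by_other;	/* stands in for a failed try_mplock */
	uint32_t ipending;		/* one bit per deferred interrupt */
	uint32_t masked;		/* one bit per masked interrupt source */
	uint32_t handled;		/* one bit per handler actually run */
};

/* Entry path: run the handler if the lock was obtained, else defer. */
static void
isr_entry_sketch(struct isr_sketch_state *s, unsigned irq)
{
	uint32_t bit = UINT32_C(1) << irq;	/* assumes irq < 32 */

	if (!s->lock_held_by_other) {
		s->handled |= bit;	/* lock obtained: continue as before */
		return;
	}
	s->ipending |= bit;		/* remember the interrupt */
	s->masked |= bit;		/* mask the source to avoid recursion */
}

/* Exit path run by the lock holder: drain everything marked pending. */
static void
doreti_sketch(struct isr_sketch_state *s)
{
	while (s->ipending != 0) {
		/* Isolate the lowest set bit. */
		uint32_t bit = s->ipending & (~s->ipending + 1);

		s->ipending &= ~bit;
		s->handled |= bit;	/* run the deferred handler */
		s->masked &= ~bit;	/* assumed: unmask once serviced */
	}
}
```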


27654 24-Jul-1997 kato

Treat 6x86MX CPU as 686-class CPU instead of 586-class CPU.


27634 23-Jul-1997 fsmp

New simple_lock code in asm:
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()

Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock

Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.

It seems to work!!!


27616 22-Jul-1997 fsmp

Last commit didn't take, operator error???


27615 22-Jul-1997 fsmp

Hid the existence of imen via a dump routine.


27567 21-Jul-1997 fsmp

Made the SMP case ignore the possibility of an INT13 interface.
This eliminates all the APIC code, and thus several routines that
would otherwise need to be made MP-safe.

Reviewed by: Bruce Evans <bde@zeta.org.au>


27566 21-Jul-1997 dyson

Fix a crash that has manifested itself while running X after the 4MB
page upgrades.


27563 20-Jul-1997 fsmp

Developed a new strategy for handling the 8254/8259/APIC issue.


27561 20-Jul-1997 fsmp

Minor cleanup.
Pass string arg to apic_dump.
Moved bootverbose printing of SMP enabled INTs from clock.c to autoconf.c


27560 20-Jul-1997 fsmp

Minor cleanup.


27555 20-Jul-1997 bde

Removed unused #includes.


27535 20-Jul-1997 bde

Removed unused #includes.


27523 19-Jul-1997 fsmp

Added code to support #define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It deprecates the usage of "options SMP_TIMER_NC" in the config file.


27522 19-Jul-1997 fsmp

Added code to support #define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It deprecates the usage of "options SMP_TIMER_NC" in the config file.


27520 19-Jul-1997 fsmp

SMP or APIC_IO:
- Increased NIDT to 256.
- Moved IPI vectors up above the linux compat vector.
- Removed runtime setup of RTC vector.


27517 18-Jul-1997 fsmp

Split TEST_CPUSTOP code into CPUSTOP_ON_DDBBREAK and mainline code.


27490 18-Jul-1997 fsmp

Made the printing of the APIC INTs depend on bootverbose.


27489 18-Jul-1997 fsmp

printf cleanup.


27484 17-Jul-1997 dyson

Hopefully fix a few problems that could cause hangs in SMP mode.
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.

We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...


27464 17-Jul-1997 dyson

Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.


27462 17-Jul-1997 peter

Remove the disable for the P5 cpu class bcopy using the FPU on SMP kernels,
it is understood to work now (and has been for quite a while apparently).


27424 15-Jul-1997 kato

Oops, added popfl after trynexgen label.

PR: 4091
Submitted by: Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>


27408 15-Jul-1997 fsmp

Cleanup.


27407 15-Jul-1997 fsmp

Tighten up asm code for TEST_PRIO and other misc. things.
Use some new defines in place of "magic numbers".


27406 15-Jul-1997 fsmp

Tighten up asm code for EOI access.


27353 13-Jul-1997 fsmp

new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default

new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default


27352 13-Jul-1997 fsmp

Cleanup old stop_cpus/restart_cpus() cruft.
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu


27289 08-Jul-1997 fsmp

General cleanup of APIC code.
stop_cpus()/restart_cpus() STILL not working!


27288 08-Jul-1997 fsmp

Minor cleanup of APIC code.


27255 07-Jul-1997 fsmp

stop_cpus(), currently BROKEN! (turned off in smptests.h by default).
restart_cpus(), currently BROKEN! (turned off in smptests.h by default).


27251 06-Jul-1997 fsmp

First cut at code for handling "spurious INTerrupts".
First cut at code for handling CPU stop/restart.

Notes:
not working properly yet.


27250 06-Jul-1997 fsmp

#ifdef out debug for now...


27133 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.

Don't frob intr_nesting_level in idle() or cpu_switch(). Interrupts
are mostly disabled then, so the frobbing had little effect.


27131 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.


27007 27-Jun-1997 fsmp

apic_vector.s:
- added Xcpustop IPI code to support stop_cpus()/restart_cpus().
it is off by default, enable via smptests.h:TEST_CPUSTOP

intr_machdep.h:
- moved +ICULEN to lower level.
- added entry for Xcpustop.


27005 27-Jun-1997 fsmp

Added POST code output to various points of the startup code.

General cleanup.

New functions to stop/start CPUs via IPIs:

- int stop_cpus( u_int map );
- int restart_cpus( u_int map );

Turned off by default, enabled via smptests.h:TEST_CPUSTOP.
Current version has a BUG, perhaps a deadlock?


27004 27-Jun-1997 fsmp

Experimental calls to stop_cpus()/restart_cpus() within breakpoint calls.
Turned off by default in smptests.h.


27003 27-Jun-1997 fsmp

Added other_cpus to CPU private page.

This variable is a bitmap showing all CPUs present EXCEPT the CPU
owning the variable. In other words, it is equal to the global bitmap
'all_cpus' minus its own bit.
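
The relationship between the two bitmaps can be sketched as below. This is
only an illustration: the helper name other_cpus_mask() and the 32-bit map
width are assumptions, not taken from the committed code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the bitmap of all CPUs present except
 * the CPU numbered `self`, i.e. all_cpus with that CPU's bit cleared.
 */
static uint32_t
other_cpus_mask(uint32_t all_cpus, unsigned self)
{
	return all_cpus & ~(UINT32_C(1) << self);
}
```

With four CPUs present (all_cpus == 0xf), CPU 2 would see 0xb.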


27001 27-Jun-1997 fsmp

Program lint1 to handle NMIs.

Till now NMIs would be ignored. Now an NMI is caught by the BSP.
APs still ignore NMI, am working on code to allow a CPU to stop other CPUs
via an IPI.


26994 27-Jun-1997 fsmp

Removed '#include <machine/smptests.h>' line, no longer needed.


26985 27-Jun-1997 kato

Added CPU_DIRECT_MAPPED_CACHE option which sets L1 cache in direct
mapped mode on Cyrix 486DLC box.


26954 26-Jun-1997 tegge

Back out a bad commit.


26950 25-Jun-1997 fsmp

Merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()
- get_pci_apic_irq() -> pci_apic_pin()


26949 25-Jun-1997 fsmp

Modified to use merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()


26944 25-Jun-1997 tegge

Allow kernel configuration file to override PMAP_SHPGPERPROC. The default
value (200) is too low in some environments, causing a fatal
"panic: get_pv_entry: cannot get a pv_entry_t". The same panic might
still occur due to temporary shortage of free physical memory
(cf. PR i386/2431).


26943 25-Jun-1997 tegge

Block some interrupts during the call to pmap_zero_page in
vm_page_zero_idle. This fixes some occurrences of the problem
reported in PR kern/3216: "panic: pmap_zero_page: CMAP busy"


26896 24-Jun-1997 tegge

Ensure that the boot CPU honours write protection in kernel mode.
This fixes one of the problems noted in PR kern/3688.


26888 24-Jun-1997 kato

Recognize AMD K5 PR166 and PR200 CPUs.


26886 24-Jun-1997 fsmp

Fix calculation of initial mplock value.
We now use LOGICAL, not PHYSICAL, IDs to calculate the mplock.


26882 24-Jun-1997 fsmp

Fixed breakage for "default" configurations in mptable_pass1().


26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


26811 22-Jun-1997 peter

Kill some stale leftovers from the earlier attempts at SMP per-cpu pages


26659 15-Jun-1997 wollman

Fix another power down braino.


26657 15-Jun-1997 wollman

When APM is configured, turn off the power when halting for good.


26494 07-Jun-1997 bde

Preserve %fs and %gs across context switches. This has a relatively low
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).


26447 04-Jun-1997 pst

Document a non-standard gdbremote protocol extension (kludge, really)
that I snuck in to our GDB last year. This allows you to debug headless
machines by sharing the console port between the debugger and the system
console. It's not 100% reliable, but it works well. It's optional
and disabled by default.
Submitted by: Juniper Networks


26388 02-Jun-1997 peter

Fill in some gaps in the cpuid features list..
bit 10 is the old bit for MTRR (presumably this changed, an older P5 I
have has got it, the newer cpus have the new MTRR bit set)
bit 11 is SEP (fast syscalls), bit 23 is MMX
Fill in the other reserved ones with a stub so that we can see them if
they turn up.

Obtained from: Intel AP-485 rev.06


26379 02-Jun-1997 dfr

Change isa_device.h to intr_machdep.h


26373 02-Jun-1997 dfr

Move interrupt handling code from isa.c to a new file. This should make
isa.c (slightly) more portable and will make my life developing the really
portable version much easier.

Reviewed by: peter, fsmp


26309 31-May-1997 peter

Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add
<machine/ipl.h> to those files that were depending on getting SWI_*
implicitly via <machine/cpufunc.h>


26302 31-May-1997 peter

The SWI_NET_MASK and SWI_TTY_MASK handlers are now back adjacent to the
top of the hardware interrupt handlers. Apparently this is slightly
faster with the bit scanning instruction that looks these up - this set of
changes reverts the original change.

Reviewed by: bde


26298 31-May-1997 kato

- Use `6x86MX' instead of `M2'. Cyrix officially use `6x86MX' for the
CPU code-named `M2'.

- Use the result of cpuid instruction instead of DIR to identify
6x86MX cpu. DIR0 and DIR1 are not documented in the data sheet, and
cpuid instruction is enabled at reset time.

- Add a function, init_6x86MX() to initialize 6x86MX cpu. It supports
CPU_SUSP_HLT and CPU_IORT options. It always sets NC1 (640K - 1M is
not cached.), and enables L1 cache in write-back mode.

- Fix typo in the comment in identblue().


26270 29-May-1997 fsmp

Code such as apic_base[APIC_ID] converted to lapic__id

Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


26267 29-May-1997 peter

remove no longer needed opt_smp.h includes


26266 29-May-1997 peter

minor style police (recent divergence from KNF code)


26265 29-May-1997 peter

remove opt_smp.h and fix the reason it was needed.


26264 29-May-1997 peter

No longer need opt_smp.h here


26203 27-May-1997 fsmp

Nuke the printing of the unredirect message unless bootverbose.


26171 26-May-1997 fsmp

Fix breakage from my last commit where mp_start() was missing from UP builds.


26169 26-May-1997 fsmp

Changed inclusion of isa/icu.s to isa/ipl.s.
This is part of the breakup of UP/SMP specific INTerrupt code.


26168 26-May-1997 fsmp

Split vector.s into UP and SMP specific files:
- vector.s <- stub called by i386/exception.s
- icu_vector.s <- UP
- apic_vector.s <- SMP

Split icu.s into UP and SMP specific files:
- ipl.s <- stub called by i386/exception.s (formerly icu.s)
- icu_ipl.s <- UP
- apic_ipl.s <- SMP

This was done in preparation for massive changes to the SMP INTerrupt
mechanisms. More fine tuning, such as merging ipl.s into exception.s,
may be appropriate.


26155 26-May-1997 fsmp

Added a test called 'LATE_START'.

This is now the default, it delays most of the MP startup to the function
machdep.c:cpu_startup(). It should be possible to move the 2 functions
found there (mp_start() & mp_announce()) even further down the path once
we know exactly where that should be...

Help from: Peter Wemm <peter@spinner.dialix.com.au>


26129 25-May-1997 fsmp

Made the array vec[] a global.
This allows the APIC code to reorder the vectors at runtime.


26108 25-May-1997 fsmp

Broke up parse_mp_table() into 2 passes:
- The 1st (preparse_mp_table()) counts the number of cpus, busses, etc. and
records the LOCAL and IO APIC addresses.
- The 2nd pass (parse_mp_table()) does the actual parsing of info and recording
into the incore MP table.

This will allow us to defer the 2nd pass until malloc() & private pages
are available (but that's for another day!).


26102 24-May-1997 fsmp

Delay mp_start() till after the msgbuf is mapped. We really want to delay
it till even later but tss setup prevents that right now...


26101 24-May-1997 fsmp

Now that panic() is properly printing messages for early SMP panics all
the 'printf("..."); panic("\n")' sections are returned to 'panic("...")'.


26037 23-May-1997 charnier

typo (Cyirx -> Cyrix).


26019 22-May-1997 fsmp

Convert all:
panic( "xxxxx\n" );

to:
printf( "xxxxx\n" );
panic( "\n" );

For some as yet undetermined reason the argument to panic() is often NOT
printed, and the system sometimes hangs before reaching the panic printout.
So we hopefully at least print some useful info before the hang, as opposed to
leaving the user clueless as to what has happened.


25985 21-May-1997 jdp

This commit affects ELF kernels only.

Remove "setdefs.h" and arrange to generate it automatically at
ELF kernel build time.

"gensetdefs.c" is a utility which scans a set of ELF object files
and outputs a line ``DEFINE_SET(name, length);'' for each linker
set that it finds. When generating an ELF kernel, this is run just
before the final link to generate "setdefs.h".

Remove the init_sets() function from "setdef0.c", and its call from
"machdep.c". Since "gensetdefs.c" calculates the length of each
set, it is no longer necessary in an ELF kernel to count the set
elements at kernel initialization time. Also remove "set_of_sets"
which was used for this purpose.

Link "setdef0" and "setdef1" into the kernel only if building for
ELF. Since init_sets() is no longer used, there is no need to link
them into an a.out kernel.


25925 19-May-1997 kato

Recognize AMD 486 CPUs.


25837 15-May-1997 tegge

Ignore the supplied nfs_diskless structure from the bootstrap loader
if we want to use NFS v3 to mount root and swap.


25723 11-May-1997 tegge

Bring in some kernel bootp support. This removes the need for netboot
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.


25711 11-May-1997 bde

Fixed initialization of ldt[]. Unused entries were garbage. A comment
was stale.

Fixed initialization of gdt[] for the BDE_DEBUGGER case. APM entries
clobbered debugger entries if the debugger was loaded (APM is incompatible
with BDE_DEBUGGER) and unused entries were garbage if the debugger wasn't
loaded.


25648 10-May-1997 bde

Cleaned up #includes. Lite2 cleaned up <sys/mount.h> so no kludges
are required for NFS now.

Ifdefed SMP #defines.


25559 07-May-1997 fsmp

fix bug in get_isa_apic_mask() where EISA bus was ignored.

Submitted by: Peter Wemm <peter@spinner.DIALix.COM>


25558 07-May-1997 peter

Don't allow access to illegal addresses in /dev/kmem to panic kernel
(eg: above 0xffc00000). Programs using /dev/kmem are implicitly racing
the kernel, and can get right up high in memory. I've been running
these for some time now, but with printfs. It's saved two panics at
least that I can remember.


25557 07-May-1997 peter

clean up forked child creation. This is simplified also by having
md_regs being struct trapframe *. Do a npxsave() if needed and copy the
pcb rather than use the increasingly defunct savectx(). Copy %edi and
%ebp explicitly.

Submitted by: bde

XXX npxproc could be declared in npx.h so the externs with smp fruit
are not needed.


25556 07-May-1997 peter

md_regs is struct trapframe * now, rather than int []
Remove TF_REGP() macro and use. The original reason (address space
problems due to having UPAGES in mapped into user space) is gone. It
looks cleaner without it.


25555 07-May-1997 peter

md_regs is now a struct trapframe *


25554 07-May-1997 peter

forgotten comment


25552 07-May-1997 peter

simplify IOPL gain/remove privs code. It's easier with md_regs
being a trapframe.


25550 07-May-1997 peter

remove now redundant (struct trapframe *) cast


25499 05-May-1997 fsmp

Code to handle SMP/APIC_IO mapping of ISA INTs to APIC pins above IRQ15.

- doesn't break my system.
- NOT yet verified on the affected motherboard.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25495 05-May-1997 kato

Use `MediaGX' instead of `Gx86'.


25494 05-May-1997 kato

Use `M2' instead of `6x86 with MMX'. Cyrix seems to use `M2' officially.


25485 05-May-1997 peter

correct the order of the variables
use #ifdef where possible instead of #if defined

Submitted by: the KNF police, ie: bde :-)


25472 05-May-1997 dyson

Make sure that *fork() always returns with %edx == 1 in the
child. This was sometimes not happening correctly during my
threads code work.


25460 04-May-1997 joerg

This mega-commit brings the following:

. It makes cd9660 root f/s work again.
. It makes CD9660 a new-style option.
. It adds support to mount an ISO9660 multi-session CD-ROM as the root
filesystem (the last session actually, but that's what is expected
behaviour).

Sigh. The CDIOREADTOCENTRYS did a copyout() of its own, and thus has
been unusable for me for this work. Too bad it didn't simply stuff
the max 100 entries into the struct ioc_read_toc_entry, but relied on
a user supplied data buffer instead. :-( I now had to reinvent the
wheel, and created a CDIOREADTOCENTRY ioctl command that can be used
in a kernel context.

While doing this, i noticed the following bogosities in existing CD-ROM
drivers:

wcd: This driver is likely to be totally bogus when someone tries
two succeeding CDIOREADTOCENTRYS (or now CDIOREADTOCENTRY)
commands with requesting MSF format, since it apparently
operates on an internal table.

scd: This driver apparently returns just a single TOC entry only for
the CDIOREADTOCENTRYS command.

I have only been able to test the CDIOREADTOCENTRY command with the
cd(4) driver. I hereby request the respective maintainers of the
other CD-ROM drivers to verify my code for their driver. When it
comes to merging this CD-ROM multisession stuff into RELENG_2_2 i will
only consider drivers where i've got a confirmation that it actually
works.


25457 04-May-1997 peter

Don't remove i586_ctr_freq from scope, leave it defined as zero. This
simplifies some assumptions and stops some code compile problems.

This should fix the compile hiccup in PR#3491, but smp kernel profiling
isn't likely to be fixed by this.


25419 03-May-1997 fsmp

new function to turn an APIC pin# into an INT mask.
added missing APIC_IO define.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25361 01-May-1997 fsmp

fixed spelling error.

Submitted by: Bruce Albrecht <bruce@zuhause.mn.org>


25292 29-Apr-1997 fsmp

Enabled 'FIX_MP_TABLE_WORKS' code.
This code re-numbers PCI busses in the MP table to match PCI semantics
when the MP BIOS fails to do it properly.

Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>


25243 28-Apr-1997 fsmp

cleaned out an old FIXME.


25216 28-Apr-1997 fsmp

removed all the TEST_UPPERPRIO crud.


25215 28-Apr-1997 fsmp

remove all the SMP_INVLTLB defines, making the code default for APIC_IO.

Reviewed by: informal discussion with Peter Wemm <peter@spinner.DIALix.COM>


25204 27-Apr-1997 fsmp

informal discussion between Bruce Evans <bde@zeta.org.au>,
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>

removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.


25194 27-Apr-1997 peter

Whoops.. We forgot to turn off the 4MB Virtual==Physical mapping at address
zero from bootstrap in the non-SMP case.

Noticed by: bde


25175 26-Apr-1997 peter

Remove the curproc printing on trap/interrupt/etc. It's outlived its
usefulness, and there were problems with it anyway.

Found by: bde


25173 26-Apr-1997 peter

Back out bogus code that slipped past my read of the pre-merge diff
(Problems noted by Bruce)


25172 26-Apr-1997 peter

Fix some SMP merge bugs (from Bruce) -
#include out of order
pccard_configure() called twice
munged tab (existing problem made worse)


25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


25159 26-Apr-1997 kato

Add new cpu type, CPU_CY486DX, which shows Cyrix 486S/DX series CPUs,
and initialization routine for those CPUs.

Tested by: Bob Bishop <rb@gid.co.uk>


25083 22-Apr-1997 jdp

Make the necessary changes so that an ELF kernel can be built. I
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.

The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.

The ELF bootloader works, but it is not ready to commit quite yet.


25038 20-Apr-1997 phk

Fix up the "hlt vector" change I made.
Reviewed by: bde, bde, bde


25015 19-Apr-1997 kato

Don't disable CPU cache in init_486dlc. If BIOS supports Cyrix 486,
BIOS enables CPU cache and other registers. If BIOS does not support
it, CPU cache is disabled at reset time.

This commit closes PR/3292.

PR: 3292


24980 16-Apr-1997 kato

Use reset port before clearing page table in cpu_reset if PC98 is
defined. Clearing page table could hang some new PC-98.


24933 14-Apr-1997 phk

Forget all about APM. Instead of "hlt" call through a vector which
APM can then fiddle with. Default for the vector is to "hlt; ret"


24929 14-Apr-1997 bde

Use the same IOPL check as in syscons.
Reviewed by: pst, joerg


24925 14-Apr-1997 bde

Fixed printing of registers in dblfault_handler(). The registers
were always in a tss; that tss just changed from the one in the
pcb to common_tss (who knows where it was when there was no curpcb?).
Not using the pcb also fixed the problem that there is no pcb in
idle(), so we now always get useful register values.


24900 13-Apr-1997 bde

Don't forget to set `runtime' in fork_trampoline(). The time slice before
switching to a child for the first time was being counted twice. I think
this only affected unimportant statistics.

Simplified arg handling in fork_trampoline(). splz() doesn't actually
smash the registers of interest.


24852 13-Apr-1997 dyson

Decrease the amount of memory allocated for bouncing. This will
allow large systems to boot successfully with bounce buffers compiled
in. We are now limiting bounce space to 512K. The 8MB allocated for
a 512MB system is very bogus -- and that is now fixed.


24851 13-Apr-1997 dyson

The pmap code was too generous in the allocation of kva space for
the pv entries. This problem has become obvious due to the increase
in the size of the pv entries. We need to create a more intelligent
policy for pv entry management eventually.
Submitted by: David Greenman <dg@freebsd.org>


24848 13-Apr-1997 dyson

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads address space by an exec, so we
now automatically divorce the address space before modifying it.


24702 07-Apr-1997 peter

Lower the spl() of the new process from splhigh() right away, since
nothing else will lower it until either much later, or never(?) for
kernel processes.

This basically re-fixes what Bruce fixed in rev 1.29 of kern_fork.c,
which was broken again now the child does not execute back up the fork()
calling tree.


24693 07-Apr-1997 peter

Clean up some dead wood. Kill the page table page for mapping the
proc0/idlePTD/bootstrap stack into place in user space. We save 4K.
Remove p0upa, it is now unneeded.


24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler than switching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.
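
To illustrate why resetting a descriptor base takes more work than storing
one 32-bit pointer, here is a minimal sketch that writes a base address into
a raw 8-byte x86 segment descriptor. The function name and the byte-array
representation are assumptions made for this example; the kernel uses its
own descriptor structure.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store a 32-bit base into a raw x86 segment
 * descriptor. The base is scattered across the descriptor, so a single
 * 32-bit store is not enough.
 */
static void
set_descriptor_base(uint8_t desc[8], uint32_t base)
{
	desc[2] = base & 0xff;		/* base bits  7..0  */
	desc[3] = (base >> 8) & 0xff;	/* base bits 15..8  */
	desc[4] = (base >> 16) & 0xff;	/* base bits 23..16 */
	desc[7] = (base >> 24) & 0xff;	/* base bits 31..24 */
}
```

Bytes 5 and 6, which sit between the base fields, hold access rights, flags
and limit bits, and are left untouched here.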

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplify the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measurably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


24690 07-Apr-1997 peter

No longer use an i386tss as the basis of our pcb - it wasn't particularly
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.

Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.

This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.

Obtained from: bde


24676 06-Apr-1997 mckay

Prevent wedging of the stat clock because of missed interrupts.
This should cure the "alternate system clock has died!" problem.

Discussed with: bde, joerg


24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


24494 01-Apr-1997 bde

Removed a wrong comment of mine.

Removed unused #includes.


24437 31-Mar-1997 dg

Changed the way that the exec image header is read to be filesystem-
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.


24418 30-Mar-1997 joerg

Implement the `detach' command for remote GDB. It gets you back at DDB.


24361 29-Mar-1997 bde

Don't keep cpu interrupts enabled during the lookup in vm_page_zero_idle().
Lookup isn't done every time the system goes idle now, but it can still
take > 1800 instructions in the worst case, so if cpu interrupts are kept
disabled then it might lose 20 characters of sio input at 115200 bps.

Fixed style in vm_page_zero_idle().


24345 28-Mar-1997 bde

Added a setjmp() and a longjmp() so that an unexpected trap inside
ddb isn't necessarily fatal. You can now do silly things like
`call vprint' and `show map' without losing control.


24342 28-Mar-1997 joerg

Something long overdue: compile inb() and outb() into the kernel as
functions if DDB is available. The remaining occurrences are usually
only inlined and thus not available in DDB.

I'm sure Bruce will have 23 additions to these 30 lines of code, but
at least it's a starting point. ;-)


24283 25-Mar-1997 mpp

Change sigreturn() to return EFAULT if it is passed an
address outside of the process's address space.
Now it matches its man page :-). Closes PR# 2682.

Discussed with: bde
Submitted by: Jonathan Lemon <jlemon@americantv.com>


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24200 24-Mar-1997 kato

Fix typo.
Submitted by: Bruce Evans <bde@zeta.org.au>


24113 22-Mar-1997 kato

Oops, I forgot to `cvs add'. This file is a part of new CPU
identification and initialization routines.


24112 22-Mar-1997 kato

Improved CPU identification and initialization routines. This
supports All Cyrix CPUs, IBM Blue Lightning CPU and NexGen (now AMD)
Nx586 CPU, and initialize special registers of Cyrix CPU and msr of
IBM Blue Lightning CPU.

If revision of Cyrix 6x86 CPU < 2.7, CPU cache is enabled in
write-through mode. This can be disabled by kernel configuration
options.

Reviewed by: Bruce Evans <bde@freebsd.org> and
Jordan K. Hubbard <jkh@freebsd.org>


24099 22-Mar-1997 dyson

Decrease the latency/overhead in the prezero code when there is
an adequate number of prezeroed pages.


23409 05-Mar-1997 bde

Made FPU stuff conditional on npx as well as I586_CPU.


23393 05-Mar-1997 bde

Only print clock calibration messages if the system was booted with -v.

Submitted by: partly by gpalmer


23386 05-Mar-1997 gpalmer

Back out the patch to break up the clock probe lines. Instead, follow
Bruce's suggestion of deleting "relative to mc146818A clock ",
thus shortening the line ...


23375 04-Mar-1997 gpalmer

Split the rather long and line-wrapping clock probe messages on boot.
(2.2?)

Submitted by: Mathew Dood <winter@jurai.net>


23230 01-Mar-1997 ache

Add missing #include <machine/segments.h> for ISPL and SEL_UPL macros


23206 28-Feb-1997 bde

Print function args in the current radix instead of always in hex.

Print the stack pointer together with the frame pointer in the trap,
syscall and interrupt messages. The frame pointer is not very useful
for locating syscall args since syscall functions don't have a frame
pointer.

Print all the numbers in the trap, syscall and interrupt messages in
the default radix. The syscall number was confusing because it was
printed in decimal.

Use %#n format more and 0x%x less. 0x%x of course doesn't work with
a variable radix. ddb is now fairly consistent about using %+#n to
print all numbers. It omits the '+' for signed numbers and the '#' in a
few cases (e.g., for function args) to save space.


23070 24-Feb-1997 alex

Typo police.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22564 11-Feb-1997 bde

Restored changes from rev.1.58-1.60 which were blown away by the
previous commit.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


22130 30-Jan-1997 dg

Removed PG_N from here, too. Some machines don't like it and it's unnecessary.


22129 30-Jan-1997 dg

Removed unnecessary PG_N flag from device memory mappings. This is handled
by the CPU/chipset already and was apparently triggering a hardware bug that
causes strange parity errors.


22106 29-Jan-1997 bde

Estimate an initial overhead of 0 usec instead of 20 usec in DELAY().
I have code to calibrate the overhead fairly accurately, but there
is little point in using it since it is most accurate on machines
where an estimate of 0 works well. On slow machines, the accuracy
of DELAY() has a large variance since it is limited by the resolution
of getit() even if the initial delay is calibrated perfectly.

Use fixed point and long longs to speed up scaling in DELAY().
The old method slowed down a lot when the frequency became variable.
Assume the default frequency for short delays so that the fixed
point calculation can be exact.

Fast scaling is only important for small delays. Scaling is done
after looking at the counter and outside the loop, so it doesn't
decrease accuracy or resolution provided it completes before the
delay is up. The comment in the code is still confused about this.
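
The fixed-point idea can be sketched roughly as follows. This is not the
committed DELAY() code: the names, the 15-bit scale and the 1193182 Hz
default frequency are assumptions chosen for illustration.

```c
#include <stdint.h>

#define I8254_FREQ_HZ	1193182u	/* assumed default timer frequency */
#define SCALE_SHIFT	15		/* assumed fixed-point scale: 2^15 */

/*
 * Hypothetical helper: convert microseconds to timer ticks, rounding up,
 * using one multiply and one shift instead of a 64-bit division per call.
 */
static uint32_t
usec_to_ticks(uint32_t usec)
{
	/* ticks per microsecond, scaled by 2^SCALE_SHIFT and rounded up */
	const uint64_t mult =
	    (((uint64_t)I8254_FREQ_HZ << SCALE_SHIFT) + 999999) / 1000000;

	return (uint32_t)(((uint64_t)usec * mult +
	    (1u << SCALE_SHIFT) - 1) >> SCALE_SHIFT);
}
```

Because the multiplier is a constant for the default frequency, the
per-call cost is a multiply, an add and a shift.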


22093 29-Jan-1997 bde

Disabled logging of masked exceptions on exit. Keep the side effect of
saving the state (see rev.1.17).


21979 24-Jan-1997 bde

Fixed some formatting bugs (mostly regressions in rev.1.48). Replaced
some magic numbers by pmap constants. Cosmetic.


21975 24-Jan-1997 bde

Initialize CR0_MP in setregs() in case npx0 is disabled or not configured.
Disabling npx0 works right now.

Don't reference `npxdriver' if npx0 is not configured. Not configuring
npx0 doesn't quite work yet.

Don't clear potential non-npx pcb flags in setregs().


21974 24-Jan-1997 obrien

KNF style police.

Reported by: Bruce
Thanks to: Bruce for also providing a diff.


21953 23-Jan-1997 dyson

Remove some dead code from trapwrite.
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21944 22-Jan-1997 dyson

Fix I386 copyout support. The new page-table management code will
not lazy-fault page table pages. Update the copyout support to take
that into account. This should fix some segfault problems on such
machines.

After a short test period, we'll move this into 2.2.

Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21857 19-Jan-1997 obrien

Add bits to identify AMD K5 and K6 cpu's.
Tested only on my AMD K5 PR-133. Bit values for K6 taken from AMD document
on how to test such things.

2.2 Candidate.


21783 16-Jan-1997 bde

Guard against the i8254 timer being uninitialized if DELAY() is
called early for console i/o. The timer is usually in BIOS mode
if it isn't explicitly initialized. Then it counts twice as fast
and has a max count of 65535 instead of 11932. The larger count
tended to cause infinite loops for delays of > 20 us. Such delays
are rare. For syscons and kbdio, DELAY() is only called early
enough to matter for ddb input after booting with -d, and the delay
is too small to matter (and too small to be correct) except in the
PC98 case. For pcvt, DELAY() is not used for small delays (pcvt
uses its own broken routine instead of the standard broken one),
but some versions call DELAY() with a large arg when they unnecessarily
initialize the keyboard for doing console output. The problem is
more serious for pcvt because there is always some early console
output.

Guard against the i8254 timer being partially or incorrectly
initialized. This would have prevented the endless loop.

Should be in 2.2.


21737 15-Jan-1997 dg

Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by: wollman


21734 15-Jan-1997 bde

Fixed longstanding annoying warning about a type mismatch. pmap doesn't
really use pt_entry_t internally, so don't use it here.

Fixed range checking for writing. The partial page (if any) following
etext wasn't writable.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21568 11-Jan-1997 dyson

When we changed pmap_protect to support adding the writeable
attribute to a page range, we forgot to set the PG_WRITEABLE
flag in the vm_page_t. This fixes that problem.


21540 11-Jan-1997 nate

Moved pccard_configure() to the end of the configure() list. This
avoids problems with the PCIC controller grabbing an interrupt that
another card needs.

Closes PR: kernel/2405

Reviewed by: bde


21529 11-Jan-1997 dyson

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


21279 04-Jan-1997 bde

Reenabled i586_optimized_copyin/out yet again.


21278 04-Jan-1997 bde

Fixed context switching of FPU state after a fault in
i586_optimized_copyin/out.


21277 04-Jan-1997 bde

Fixed botched tables:
- the operands for bt, bts, arpl and `enter' were reversed.
- btr was reported as bts (with the correct operand order).
- cmpxchg was misplaced. It was misplaced differently in the
comments. It is misplaced differently again in the i486 manual.
I put it where the i586 manual and gas say it is.
- fucompp was misplaced.
- the rr table for some versions of fstp, fcom and fcomp was non-null.
This caused some invalid opcodes to be reported as "" instead of as
"<bad instruction>".
- the word and long versions of the fi* instructions were reversed.
- aaa and daa were reversed.

Fixed bugs involving unusual operand sizes:
- 32-bit registers weren't always forced for bswap or for moves to and
from special registers.
- the operand sizes weren't reported for [l]call or [l]jmp.
- displacements weren't truncated mod 2^16 when the operand size was
16-bit.
- too-large displacements and offsets were fetched, and too-large
offsets were reported, when the operand size was 16-bit.
- sign extended immediate bytes were extended too far when the operand
size was 16-bit.

Fixed bugs involving usual operand sizes:
- 8-bit source registers weren't forced for mov[sz]b[wl].
- 16-bit source registers weren't forced for mov[sz]w[wl].
- immediate bytes were sometimes reported as sign extended even for
byte operations. Same for immediate words in word operations.
- the immediate byte was not reported as sign extended for `push'.

Finished Pentium support:
- cpuid, cmpxchg8b and rsm were missing.

Finished i287 support:
- fneni, fndisi and fsetpm were missing. These are harmless nops on
later FPUs.

Improvements:
- report invalid opcodes 0xd6 and 0xf1 using .byte. They are special
in not causing invalid operand exceptions when executed.
- report the immediate byte for unusual aam and aad instructions.
Immediate bytes other than 0x0a always worked and are documented to
work on Pentiums.


21181 02-Jan-1997 se

Add code to copy the LDT, if required.

This code was sent to me by Bruce Evans, and seems to fix some
possible kernel panic in case of an execution error. It did not
cause any problems on my system, but I never observed the
problem this patch is supposed to fix, anyway.

This patch is a NOP, unless the kernel is built with "options
USER_LDT", and doesn't affect the GENERIC kernel for this reason.

I want to have it in 2.2: it fixes a bug ...

Submitted by: bde


20998 29-Dec-1996 dyson

Superficial clean-up of useracc calls. (The useracc usage of
B_READ/B_WRITE is bogus anyway.) Might as well make the call prettier
anyway.


20997 29-Dec-1996 dyson

Allow pmap_protect to increase permissions. This mod can eliminate
the need for unnecessary vm_faults.
Submitted by: Alan Cox <alc@cs.rice.edu>


20969 28-Dec-1996 bde

Disabled i586-optimized copyin and copyout again. The fault handler
is still broken - it doesn't restore the floating point state.

2.2-BETA users should disable it using npx0 flags 0x04 the same as
2.2-ALPHA users should have.


20748 21-Dec-1996 phk

Give MFS_ROOT priority over NFS as root filesystem.

2.2 candidate.


20651 18-Dec-1996 bde

Only handle copyin/out/etc faults when not in an interrupt handler.
This makes unexpected faults (in an interrupt handler) more likely
to crash properly. It could be done even better (more robustly and
more efficiently) using lazy fault handling.


20650 18-Dec-1996 bde

Fixed formatting of KERN_DUMPDEV.

Should be in 2.2.


20641 18-Dec-1996 bde

Moved the printing of the BIOS geometries from cpu_startup() into
configure() where it always belonged. It was originally slightly
misplaced after configure(). Rev.138 left it completely misplaced
before the DEVFS, DRIVERS and CONFIGURE sysinits by not moving it
together with configure().

Restored the printing of bootinfo.bi_n_bios_used now that it can
be nonzero.


20578 17-Dec-1996 dg

Fix nbuf calculation /4 -> /8. 2.2 already has it this way.

Reviewed by: dyson


20471 14-Dec-1996 jkh

Make the USERCONFIG_BOOT semantics closer to what was originally
intended.


20368 12-Dec-1996 swallace

Soften range-check for LDTs.

Reviewed by: bde


20348 12-Dec-1996 dg

Fix allocation for exech_map to be 16*PAGE_SIZE rather than 32*PAGE_SIZE
so that it is scaled the same as exec_map (16 concurrent exec'ers).


20313 11-Dec-1996 dyson

One minor mod to set the limit of nbufs to 2048 from 1536. More important
fix to exech_map, it used 32*ARG_MAX, and it should use 32*PAGE_SIZE.


20146 05-Dec-1996 dyson

Clean-up of the new buffer kva allocation code. Also, there was an
error in the !BOUNCE_BUFFERS case.


20070 01-Dec-1996 bde

Removed all references to b_cylinder (aka b_cylin). It was evil and
hasn't been used for a year or two since disksort() started sorting
on b_pblkno.


20068 01-Dec-1996 dyson

Fix a problem with the new buffer_map management code. Additionally,
decrease the size of buffer_map to approx 2/3 of what it used to be
(buffer_map can be smaller now.) The original commit of these changes
increased the size of buffer_map to the point where the system would
not boot on large systems -- now large systems with large caches will
have even fewer problems than before.


20044 30-Nov-1996 bde

Reenabled i586-optimized copyin/out.

Should be in 2.2. Don't put it there for a while.


20018 29-Nov-1996 bde

Fixed EFAULT handling in i586_copyin() and i586_copyout(). Use a
consistent stack frame in fastmove() so that only one new fault handler
is necessary.

Should be in 2.2. Harmless until the i586 versions are reenabled.


20017 29-Nov-1996 bde

Don't print bootinfo.bi_n_bios_used in cpu_startup() since it is always
zero because no drivers have had a chance to change it.


20016 29-Nov-1996 bde

Don't clobber the SIGCONT bit in the signal mask in sigreturn(). Use
the `sigcantmask' macro to get the correct set of unmaskable signals.
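
A minimal sketch of the masking rule follows. It is not the sigreturn() code
itself: the helper names and the 32-bit mask type are assumptions, and the
signal numbers are the usual BSD values written out so that the example is
self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Usual BSD signal numbers, repeated here under hypothetical names. */
#define SK_SIGKILL	9
#define SK_SIGSTOP	17
#define SK_SIGCONT	19

/* One bit per signal; signal 1 occupies bit 0. */
static uint32_t
sketch_sigmask(int sig)
{
	assert(sig >= 1 && sig <= 32);
	return ((uint32_t)1 << (sig - 1));
}

/* Only SIGKILL and SIGSTOP can never be blocked. */
static uint32_t
sketch_sigcantmask(void)
{
	return (sketch_sigmask(SK_SIGKILL) | sketch_sigmask(SK_SIGSTOP));
}

/* Mask to install on return from a handler: drop only unmaskable bits. */
static uint32_t
sketch_sigreturn_mask(uint32_t saved)
{
	return (saved & ~sketch_sigcantmask());
}
```

With this rule a saved mask that blocks SIGCONT keeps that bit, while any
attempt to block SIGKILL or SIGSTOP is silently dropped.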

Found by: NIST-PCTS.


19957 25-Nov-1996 phk

Make a kernel with DDB but without sio possible again. This is only
a stopgap measure, a more complete solution is on somebody's whiteboard
(and we all know what THAT means :-).

Reviewed by: pst


19828 17-Nov-1996 dyson

Improve the caching of small files like directories, while not
substantially increasing buffer space. Specifically, we double
the number of buffers, but allocate only half the amount of memory
per buffer. Note that VDIR files aren't cached unless instantiated
in a buffer. This will significantly improve caching.


19798 15-Nov-1996 bde

Disabled i586-optimized copyin and copyout. They usually panic if the
user supplies a bad address, because they push a lot of stuff that the
fault handler doesn't know about onto the stack. This has been broken
for more than half a year despite being tested for almost half a year
in -current.


19772 15-Nov-1996 jkh

Change this back to movl for -current since it seems to work there.
Bruce says that movl is broken in -stable, which would certainly explain
why this didn't work there.


19750 14-Nov-1996 jkh

movl instruction should have been lea (this is why userconfig didn't
work in 2.1).

Spotted-by-the-keen-eyes-of: Don Lewis <Don.Lewis@tsc.tdk.com>


19678 12-Nov-1996 bde

Removed another #include of opt_temporary.h.

YA2.2C.


19674 12-Nov-1996 bde

Removed #include of "opt_temporary.h". All the temporary options went
away, so this header is no longer generated.

This change should be in 2.2. The old version shouldn't have been in
2.2 (blush).


19653 11-Nov-1996 bde

Replaced I586_OPTIMIZED_BCOPY and I586_OPTIMIZED_BZERO with boot-time
negative-logic flags (flags 0x01 and 0x02 for npx0, defaulting to unset = on).
This changes the default from off to on. The options have been in current
for several months with no problems reported.

Added a boot-time negative-logic flag for the old I586_FAST_BCOPY option
which went away too soon (flag 0x04 for npx0, defaulting to unset = on).
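
"Negative logic" here means that a set bit turns a feature off, so a device
with no flags configured gets everything enabled. A small sketch, assuming
that 0x01 maps to bcopy and 0x02 to bzero (the entry lists them in that order
but does not say so explicitly); the macro and function names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Disable bits in the npx0 flags word (names are hypothetical). */
#define SK_NPX_DISABLE_BCOPY		0x01	/* assumed mapping */
#define SK_NPX_DISABLE_BZERO		0x02	/* assumed mapping */
#define SK_NPX_DISABLE_FAST_BCOPY	0x04

/* A feature is enabled unless its disable bit is set in the flags. */
static bool
sketch_feature_enabled(int flags, int disable_bit)
{
	assert(disable_bit != 0);
	return ((flags & disable_bit) == 0);
}
```

For example, flags 0x04 would leave the bcopy and bzero routines enabled
while turning off only the former I586_FAST_BCOPY code.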

Added a boot-time way to set the memory size (iosiz in config, iosize in
userconfig for npx0).

LINT:
Removed old options. Documented npx0's flags and iosiz.

options.i386:
Removed old options.

identcpu.c:
Don't set the function pointers here. Setting them has to be delayed
until after userconfig has had a chance to disable them and until after
a good npx0 has been detected.

machdep.c:
Use npx0's iosize instead of MAXMEM if it is nonzero.

support.s:
Added vectors and glue code for copyin() and copyout().
Fixed ifdefs for i586_bzero().
Added ifdefs for i586_bcopy().

npx.c:
Set the function pointers here.
Clear hw_float when an npx exists but is too broken to use.
Restored style from a year or three ago in npxattach().


19621 11-Nov-1996 dyson

Support the PG_G flag on Pentium-Pro processors. This pretty
much eliminates the unnecessary unmapping of the kernel during
context switches and during invtlb...


19523 08-Nov-1996 asami

Remove option I586_FAST_BCOPY. The code will be included by default
if I586_CPU is defined. Note there is a runtime check so the code
won't be run for non-Pentium CPUs anyway.

2.2 candidate, this code has been tested for almost half a year in -current.


19503 07-Nov-1996 joerg

Fix the message buffer mapping. This actually makes it possible to increase
the message buffer size in <sys/msgbuf.h>.

Reviewed by: davidg,joerg
Submitted by: bde


19346 03-Nov-1996 dyson

Fix a problem with running down processes that have left wired
mappings with mlock. This problem only occurred because of the
quick unmap code not respecting the wired-ness of pages in the
process. In the future, we need to eliminate the dependency
intrinsic to the design of the code that wired pages actually
be mapped. It is kind-of bogus not to have wired pages mapped,
but it is also a weakness for the code to fall flat because
of a missing page.

This should fix a problem that Tor Egge has been having, and also
should be included into 2.2-RELEASE.


19274 31-Oct-1996 julian

Further improved version of handling a HALT when there is no console.


19269 30-Oct-1996 asami

More merge and update.

(1) deleted #if 0

pc98/pc98/mse.c

(2) hold per-unit I/O ports in ed_softc

pc98/pc98/if_ed.c
pc98/pc98/if_ed98.h

(3) merge more files by segregating changes into headers.

new file (moved from pc98/pc98):

i386/isa/aic_98.h

deleted:

well, it's already in the commit message so I won't repeat the
long list here ;)

Submitted by: The FreeBSD(98) Development Team


19186 26-Oct-1996 bde

Removed initialization of a variable that went away. Oops.


19173 25-Oct-1996 bde

Print the clock calibration messages all on one (long) line again so
that they are easy to grep for.

Removed now-unused i586 counter variables.

Fixed some style bugs.


19119 23-Oct-1996 dyson

Account for the UPAGES in the same way as before moving the MD code
from vm_glue into pmap.c. Now RSS should appear to be the same as before.


19064 20-Oct-1996 phk

Removing old isdn stuff.


19000 17-Oct-1996 bde

Improved non-statistical (GUPROF) profiling:
- use a more accurate and more efficient method of compensating for
overheads. The old method counted too much time against leaf
functions.
- normally use the Pentium timestamp counter if available.
On Pentiums, the times are now accurate to within a couple of cpu
clock cycles per function call in the (unlikely) event that there
are no cache misses in or caused by the profiling code.
- optionally use an arbitrary Pentium event counter if available.
- optionally regress to using the i8254 counter.
- scaled the i8254 counter by a factor of 128. Now the i8254 counters
overflow slightly faster than the TSC counters for a 150MHz Pentium :-)
(after about 16 seconds). This is to avoid fractional overheads.

files.i386:
perfmon.c temporarily has to be classified as a profiling-routine
because a couple of functions in it may be called from profiling code.

options.i386:
- I586_CTR_GUPROF is currently unused (oops).
- I586_PMC_GUPROF should be something like 0x70000 to enable (but not
use unless prof_machdep.c is changed) support for Pentium event
counters. 7 is a control mode and the counter number 0 is somewhere
in the 0000 bits (see perfmon.h for the encoding).

profile.h:
- added declarations.
- cleaned up separation of user mode declarations.

prof_machdep.c:
Mostly clock-select changes. The default clock can be changed by
editing kmem. There should be a sysctl for this.

subr_prof.c:
- added copyright.
- calibrate overheads for the new method.
- documented new method.
- fixed races and machine dependencies in start/stop code.

mcount.c:
Use the new overhead compensation method.

gmon.h:
- changed GPROF4 counter type from unsigned to int. Oops, this should
be machine-dependent and/or int32_t.
- reorganized overhead counters.

Submitted by: Pentium event counter changes mostly by wollman


18963 16-Oct-1996 bde

Fixed miscounting for non-statistical (GUPROF) profiling:
- use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle()
into cpu_switch() and vice versa.
- moved badsw code to after cpu_switch().

Cosmetic changes:
- moved sw0 string to be immediately after its caller (badsw).
- removed unused #include.


18937 15-Oct-1996 dyson

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


18904 13-Oct-1996 dyson

Minor optimization for final rundown of a pmap.


18897 12-Oct-1996 dyson

Performance optimizations. One of which was meant to go in before the
previous snap. Specifically, kern_exit and kern_exec now make a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.


18896 12-Oct-1996 bde

Cleaned up:
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in their definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.


18892 12-Oct-1996 bde

Removed nested include of <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


18842 09-Oct-1996 bde

Put I*86_CPU defines in opt_cpu.h.


18837 09-Oct-1996 bde

Enable the i586-optimized bcopy if the cpu is a "586" and option
I586_OPTIMIZED_BCOPY is configured.

Similarly for bzero/I586_OPTIMIZED_BZERO.

Fake 586's had better have a hardware FPU with non-broken exception
handling (we mask exceptions, but broken exception handling may trap
on the instructions that do the masking). I guess this means that
the routines won't work on most 386's or FPUless 486's even when they
have a h/w FPU.


18835 09-Oct-1996 bde

Added i586-optimized bcopy() and bzero().

These are based on using the FPU to do 64-bit stores. They also
use i586-optimized instruction ordering, i586-optimized cache
management and a couple of other tricks. They should work on any
i*86 with a h/w FPU, but are slower on at least i386's and i486's.
They come close to saturating the memory bus on i586's. bzero()
can maintain a 3-3-3-3 burst cycle to 66 MHz non-EDO main memory
on a P133 (but is too slow to keep up with a 2-2-2-2 burst cycle
for EDO - someone with EDO should fix this). bcopy() is several
cycles short of keeping up with a 3-3-3-3 cycle for writing. For
a P133 writing to 66 MHz main memory, it just manages an N-3-3-3,
3-3-3-3 pair of burst cycles, where N is typically 6.

The new routines are not used by default. They are always configured
and can be enabled at runtime using a debugger or an lkm to change
their function pointer, or at compile time using new options (see
another log message).

Removed old, dead i586_bzero() and i686_bzero(). Read-before-write is
usually bad for i586's. It doubles the memory traffic unless the data
is already cached, and data is (or should be) very rarely cached for
large bzero()s (the system should prefer uncached pages for cleaning),
and the amount of data handled by small bzero()s is relatively small
in the kernel.

Improved comments about overlapping copies.

Removed unused #include.


18702 05-Oct-1996 jkh

Multiple changes stacked as one commit since they all depend on one another.

First, change sysinstall and the Makefile rules to not build the kernel
nlist directly into sysinstall now. Instead, spit it out as an ascii
file in /stand and parse it from sysinstall later. This solves the chicken-n-
egg problem of building sysinstall into the fsimage before BOOTMFS is built
and can have its symbols extracted. Now we generate the symbol file in
release.8.

Second, add Poul-Henning's USERCONFIG_BOOT changes. These have two
effects:

1. Userconfig is always entered, rather than only after a -c
(don't scream yet, it's not as bad as it sounds).

2. Userconfig reads a message string which can optionally be
written just past the boot blocks. This string "preloads"
the userconfig input buffer and is parsed as user input.
If the first command is not "USERCONFIG", userconfig will
treat this as an implied "quit" (which is why you don't need
to scream - you never even know you went through userconfig
and back out again if you don't specifically ask for it),
otherwise it will read and execute the following commands
until a "quit" is seen or the end is reached, in which case
the normal userconfig command prompt will then be presented.

How to create your own startup sequences, using any boot.flp image
from the next snap forward (not yet, but soon):

% dd of=/dev/rfd0 seek=1 bs=512 count=1 conv=sync <<WAKKA_WAKKA_DOO
USERCONFIG
irq ed0 10
iomem ed0 0xcc000
disable ed1
quit
WAKKA_WAKKA_DOO


Third, add an intro screen to UserConfig so that users aren't just thrown
into this strange screen if userconfig is auto-launched. The default
boot.flp startup sequence is now, in fact, this:

USERCONFIG
intro
visual

(Since visual never returns, we don't need a following "quit").

Submitted-By: phk & jkh


18548 28-Sep-1996 dyson

Essentially rename pmap_update to be invltlb. It is a very machine
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.

Additionally, fix the tlb management code for 386 machines.


18538 28-Sep-1996 bde

Restored my change in rev.1.119 which was clobbered by the previous commit.


18528 28-Sep-1996 dyson

Move pmap_update_1pg to cpufunc.h. Additionally,
use the invlpg opcode instead of the nasty looking .byte directives.
There are some other minor micro-level code improvements to pmap.c


18514 27-Sep-1996 peter

part 2 of the bsdi compat tweak attempt. I believe that BSDI use both
lcall 7,0 (ie: ldt slot 0) and lcall 0x87,0 (ldt slot 16, it's shifted
three bits to the left). I was fiddling with this so long ago, I don't
recall the specifics.
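
The "shifted three bits" remark refers to the x86 segment selector layout:
bits 0-1 hold the requested privilege level, bit 2 selects the LDT, and the
remaining bits are the table index. A small decoder makes the two values
concrete; the struct and function names are invented for this sketch.

```c
#include <assert.h>

/* Fields of an x86 segment selector (hypothetical struct for illustration). */
struct sk_selector {
	unsigned index;	/* descriptor table slot: bits 3..15 */
	unsigned ti;	/* table indicator: 0 = GDT, 1 = LDT */
	unsigned rpl;	/* requested privilege level: bits 0..1 */
};

static struct sk_selector
sketch_decode_selector(unsigned sel)
{
	struct sk_selector s;

	assert(sel <= 0xffff);
	s.index = sel >> 3;
	s.ti = (sel >> 2) & 1;
	s.rpl = sel & 3;
	return (s);
}
```

Decoding 7 gives LDT slot 0 at privilege level 3, and decoding 0x87 gives
LDT slot 16 at privilege level 3, matching the description in the entry.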


18511 27-Sep-1996 peter

I've been meaning to commit this for months. Implement select()
for /dev/random and /dev/urandom. Both are always writable, urandom is
always readable, and /dev/random is readable when >= 8 bits are in the
pool.


18428 20-Sep-1996 bde

Changed an arg name in the pseudo-prototype for bzero() to match
the prototype.

Put the jump table for i486_bzero() in the data section. This
speeds up i486_bzero() a little on Pentiums without significantly
affecting its speed on 486's.

Don't waste time falling through 14 nop's to return from do1 in
i486_bzero().

Use fastmove() for counts >= 1024 (was > 1024). Cosmetic.

Fixed profiling of fastmove().

Restored meaningful labels from the pre-1.1 version in fastmove().
Local labels are evil.

Fixed (high resolution non-) profiling of __bb_init_func().


18380 19-Sep-1996 phk

Add APM_IDLE_CPU option, that is off by default.
I maintain that it saves more power to simply "hlt" the CPU than to
spend tons of time trying to tell the APM bios to do the same.
In particular if you do it 100 times a second...


18297 14-Sep-1996 bde

Attached simple external ddb commands `show rtc', `show pgrpdump'
and `show cbstat'. The pgrpdump code was previously controlled by
`#ifdef DEBUG'.


18288 14-Sep-1996 bde

Changed cncheckc() interface so that it is 8-bit clean - return -1
instead of 0 if there is no input.

syscons.c:
Added missing spl locking in sccncheckc(). Return the same value as
sccngetc() would. It is wrong for sccngetc() to return non-ASCII, but
stripping the non-ASCII bits doesn't help.
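
Why -1 matters can be shown with a toy polling routine. This is a sketch over
an in-memory queue, not the console code: the struct and function names are
hypothetical. Returning -1 for "no input" leaves all 256 byte values,
including 0, available as real data.

```c
#include <assert.h>
#include <stddef.h>

/* Toy input queue standing in for a console device (hypothetical). */
struct sk_inq {
	const unsigned char *buf;
	size_t len;
	size_t pos;
};

/* Return the next byte as 0..255, or -1 if no input is pending. */
static int
sketch_checkc(struct sk_inq *q)
{
	assert(q != NULL);
	if (q->pos >= q->len)
		return (-1);
	return (q->buf[q->pos++]);
}
```

With the old convention of returning 0 for "no input", a caller could not
tell a pending NUL byte from an empty queue.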


18275 13-Sep-1996 bde

Made debugging code (pmap_pvdump()) compile again so that I can test LINT.
I don't know if it actually works.


18260 12-Sep-1996 dyson

Primarily a fix so that pages are properly tracked for being
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.


18252 11-Sep-1996 phk

Make userconfig two (default: on) options:
USERCONFIG to enable
VISUAL_USERCONFIG to get the gui stuff too.
Requested by: pst


18239 11-Sep-1996 dyson

A minor fix to the new pmap code. This might not fix the global problems
with the last major pmap commits.


18232 10-Sep-1996 bde

Removed bogus LARGMEM code and option. The code panicked when
biosextmem > 65536, but biosextmem is a 16-bit quantity so it is
guaranteed to be < 65536. Related cruft for biosbasemem was
mostly cleaned up in rev.1.26.


18207 10-Sep-1996 bde

Updated #includes to 4.4Lite style.


18169 08-Sep-1996 dyson

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


18163 08-Sep-1996 dyson

Improve the scalability of certain pmap operations.


18084 06-Sep-1996 phk

Remove devconf, it never grew up to be of any use.


18023 03-Sep-1996 nate

Cleaned up version of my 'extended BIOS' patch. This one is commented
better and much simpler to understand, and works just as well (better)
as a bonus.

Submitted by: bde


17986 01-Sep-1996 dg

Change an splclock that needs to be an splhigh into an splhigh.

Reviewed by: bde


17983 01-Sep-1996 nate

If the basemem value supplied by the bootblocks differs from the value
returned by the RTC, use the bootblock supplied value. Also, map the
'stolen by BIOS' memory in the same manner as the ISA-hole memory, since
it is really an extension of the BIOS. This is necessary for 32-bit
BIOS functions such as APM support on laptops, and the loss of memory
for non-necessary functions seems to be at most 4k.

Reviewed by: phk
Obtained from: email conversation with jtk@atria.com


17950 30-Aug-1996 pst

Improvements from Bruce Evans


17865 28-Aug-1996 pst

Clean up formatting and fix an & -> && bug pointed out by bde


17847 27-Aug-1996 pst

Support for GDB remote debug protocol.

Sponsored by: Juniper Networks, Inc. <pst@jnx.com>


17677 19-Aug-1996 julian

Collect all the functions concerned with rebooting into one place.
Also add the at_shutdown callout list, and change the one user of
the present (broken) method (the vn driver) to use the new scheme.


17559 12-Aug-1996 wollman

Back out mistaken local change that sneaked in on the last commit.


17558 12-Aug-1996 wollman

Don't declare the user_ldt functions unless USER_LDT is defined.
Eliminates an obnoxious warning.


17521 11-Aug-1996 dg

Add support for i686 machine check trap.


17490 10-Aug-1996 peter

Add recognition for the AMD 5x86 CPU models.

Submitted by: A JOSEPH KOSHY <koshy@india.hp.com>


17488 10-Aug-1996 peter

Trivial cosmetic tweak to make the i[56]86 CPU MHz reporting round to the
nearest .01 MHz rather than simply truncating it downwards.

This hack makes this 89.999928 MHz clock correctly round to the closer
90.00-MHz rather than 89.99-MHz:
> i586 clock: 89999928 Hz, i8254 clock: 1193152 Hz
> CPU: Pentium (90.00-MHz 586-class CPU)
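
The rounding can be sketched in integer arithmetic as follows. This is an
illustration with hypothetical function names, and the exact constant used in
the committed code may differ: adding half of the 0.01 MHz step (5000 Hz)
before dividing rounds to nearest instead of truncating.

```c
#include <assert.h>
#include <stdio.h>

/* Frequency in hundredths of a MHz, rounded to nearest. */
static unsigned long
sketch_hz_to_centi_mhz(unsigned long hz)
{
	return ((hz + 5000) / 10000);
}

/* Format as "MM.mm"; returns snprintf's result. */
static int
sketch_format_mhz(char *out, size_t outlen, unsigned long hz)
{
	unsigned long c = sketch_hz_to_centi_mhz(hz);

	assert(out != NULL && outlen > 0);
	return (snprintf(out, outlen, "%lu.%02lu", c / 100, c % 100));
}
```

For 89999928 Hz the rounded value is 9000 hundredths, which prints as 90.00,
whereas plain truncation (89999928 / 10000 = 8999) would print 89.99.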


17395 02-Aug-1996 bde

Eliminated i586_ctr_rate. Use i586_ctr_freq instead.


17371 31-Jul-1996 bde

Eliminated pcb_inl. It was always 0 because context switches don't occur
in interrupt handlers.


17366 31-Jul-1996 dg

Converted timer/run queues to 4.4BSD queue style. Removed old and unused
sleep(). Implemented wakeup_one() which may be used in the future to combat
the "thundering herd" problem for some special cases.

Reviewed by: dyson


17355 30-Jul-1996 bde

Fixed longstanding bug of not checking `dumpdev' or setting `dumplo'
early enough when the dump device is specified in the config file.

Removed stale comment about configuration root and swap devices.

Don't bother clearing dumplo when dumpdev is set to NODEV. Everything
is controlled by dumpdev.

Fixed the kern.dumpdev sysctl. Writes were handled bogusly.


17353 30-Jul-1996 bde

Fixed the machdep.i8254_freq and machdep.i586_freq sysctls. Writes were
handled bogusly.

Centralized the setting of all the frequency variables. Set these
variables atomically. Some new ones aren't used yet.


17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


17329 29-Jul-1996 dyson

Fix a problem with a DEBUG section of code.


17325 29-Jul-1996 dyson

Fix an error in statement order in pmap_remove_pages, remove the pmap
pte hint (for now), and general code cleanup.


17321 28-Jul-1996 dyson

Fix a problem that pmap update was not being done for kernel_pmap. Also
remove some (currently) gratuitous tests for PG_V... This bug could
have caused various anomalous (temporary) behavior.


17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.
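
The combined test-and-clear idea from item 6 can be sketched as a single list
walk. The types and names below are hypothetical stand-ins, not the pmap
structures: each list entry points at one mapping's flag word, and one pass
both reports whether the bit was set anywhere and clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One mapping of a physical page (hypothetical stand-in for a pv entry). */
struct sk_pv {
	unsigned *pte;		/* flag word for this mapping */
	struct sk_pv *next;
};

/*
 * Test and clear `bit' in every mapping with a single traversal.
 * Returns true if the bit was set in at least one mapping.
 */
static bool
sketch_tc_bit(struct sk_pv *head, unsigned bit)
{
	bool was_set = false;
	struct sk_pv *pv;

	assert(bit != 0);
	for (pv = head; pv != NULL; pv = pv->next) {
		if (*pv->pte & bit) {
			was_set = true;
			*pv->pte &= ~bit;
		}
	}
	return (was_set);
}
```

A separate test call followed by a clear call would walk the same list twice,
which is the doubled traversal the entry describes.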


17236 21-Jul-1996 joerg

Post-commit review by Bruce. Mostly stylistic changes.

Submitted by: bde


17231 20-Jul-1996 joerg

Major cleanup of the timerX_{acquire,release} stuff. In particular,
make it more intelligible, improve the partially bogus locking, and
allow for a ``quick re-acquiration'' from a pending release of timer 0
that happened ``recently'', so it was not processed yet by clkintr().
This latter modification now finally makes it possible to play XBoing over
pcaudio without losing sounds or getting complaints. ;-) (XBoing
opens/writes/closes the sound device all over the day.)

Correct locking for sysbeep().

Extensively (:-) reviewed by: bde


17194 17-Jul-1996 bde

Fixed adjustment of `time' when timer0 is released. 27465 was 27645 in
a comment and in code that was only used when pcaudio was closed. The
maximum error was 66 usec.


17178 15-Jul-1996 nate

Moved declaration of zbuf outside of #ifdef DEVFS code.


17174 15-Jul-1996 bde

Quick fix for previous commit: don't free zbuf on close since it may be
in use in another process that blocked in uiomove().


17166 14-Jul-1996 dyson

Almost gratuitous improvement of the performance of reading
/dev/zero.


17121 12-Jul-1996 bde

Removed "optimization" using gcc's builtin memcpy instead of bcopy.
There is little difference now since the amount copied is large,
and bcopy will become much faster on some machines.


17120 12-Jul-1996 bde

Renamed upa to p0upa to match p0upt.

Cleaned up some comments.


17118 12-Jul-1996 bde

Export `dumpmag' to utilities but not to the kernel.

Restored a truncated comment.


17117 12-Jul-1996 bde

Fixed cloned comments about npx traps to match context.


17109 12-Jul-1996 bde

Fixed operand order for shld and shrd.

Finished the constant poisoning that was begun in rev.1.14. Consts
aren't very poisonous (or useful) unless -Wcast-qual is in CFLAGS,
and it isn't in the default CFLAGS.


17108 12-Jul-1996 bde

Don't use NULL in non-pointer contexts.


17014 08-Jul-1996 wollman

Fix something that's been bugging me for a long time: move the CPU
type identification code out of machdep.c and into a new file of its
own. Hopefully other grot can be moved out of machdep.c as well
(by other people) into more descriptively-named files.


16874 01-Jul-1996 bde

Use the standard timer (interrupt) frequency while calibrating the clocks.
Testing with the high frequency of 20000 Hz (to find problems) only found
the problem that this frequency is too high for slow i386's.

Disable interrupts while setting the timer frequency. This was unnecessary
before rev.1.57 and forgotten in rev.1.57. The critical (i8254) interrupts
are disabled in another way at boot time but not in the sysctl to change
the frequency.


16747 26-Jun-1996 dyson

When page table pages were removed from process address space, the
resident page stats were not being decremented. This mod corrects
that problem.


16733 25-Jun-1996 bde

Added #include of <machine/md_var.h>. This will be needed when
some declarations are moved from <machine/cpufunc.h> to better
places.


16725 25-Jun-1996 bde

trap.c:
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.

Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.

Removed vestigial support for PROFTIMER.

switch.s:
Removed addupc().

resourcevar.h:
Removed ADDUPC() and declarations of addupc().

cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.

Obtained from: mostly from NetBSD


16723 25-Jun-1996 bde

Save John Polstra's initial fix for profiling for reference. The
multiplication in addupc() overflowed for addresses >= 256K, assuming
the usual profil(2) scale parameter of 0x8000. addupc() will go away
soon.

Submitted by: John Polstra <jdp@polstra.com>
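
The overflow described above can be reproduced with a small sketch. This is a
hypothetical model, not the original addupc() assembly: it assumes the pc
offset is halved and then multiplied by the profil(2) scale, which is one
computation whose 32-bit product first wraps at an offset of 256K when the
scale is 0x8000. The names bucket_narrow and bucket_wide are invented for
illustration.

```c
#include <stdint.h>

/*
 * Hypothetical model of the scaled-offset computation (an assumption,
 * not the kernel's code): halve the pc offset, multiply by the scale,
 * and discard the low 16 fraction bits.
 */

/* 32-bit intermediate product: wraps once (offset / 2) * scale >= 2^32. */
uint32_t bucket_narrow(uint32_t offset, uint32_t scale)
{
    uint32_t product = (offset >> 1) * scale;   /* may wrap around */
    return product >> 16;
}

/* 64-bit intermediate product: cannot wrap for 32-bit inputs. */
uint64_t bucket_wide(uint32_t offset, uint32_t scale)
{
    uint64_t product = (uint64_t)(offset >> 1) * scale;
    return product >> 16;
}
```

With a scale of 0x8000 the two functions agree for offsets below 0x40000
(256K) and diverge from that offset on, because 0x20000 * 0x8000 is exactly
2^32.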


16680 25-Jun-1996 dyson

Limit the scan for preloading pte's to the end of an object.


16532 20-Jun-1996 dg

Properly account for non-page aligned buffers.


16530 20-Jun-1996 dg

Minor KNF formatting change to vmapbuf() and vunmapbuf().


16499 19-Jun-1996 dyson

Clean up vmapbuf and vunmapbuf significantly. The previous code was
very rough.


16471 18-Jun-1996 bde

Removed unused #includes of <i386/isa/icu.h> and <i386/isa/isa.h>. icu.h
is only used by the icu support modules and by a few drivers that know
too much about the icu (most only use it to convert `n' to `IRQn'). isa.h
is only used by ioconf.c and by a few drivers that know too much about
isa addresses (a few have to, because config is deficient).


16428 17-Jun-1996 bde

In getit(), use read_eflags()/write_eflags() to preserve the interrupt
enable flag instead of enable_intr() to restore it to its usual state.
getit() is only called from DELAY() so there is no point in optimising
its speed (this wasn't so clear when it was extern), and using
enable_intr() made it inconvenient to call DELAY() from probes that need
to run with interrupts disabled.


16415 17-Jun-1996 dyson

Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
the need to scan the pv lists twice in many cases. Perhaps there
is a lot more to do here to work on minimizing pv list manipulation
3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
4) Major changes and code improvement to pmap. This code has had
several serious bugs in page table page manipulation. In order
to simplify the problem, and hopefully solve it for once and all,
page table pages are no longer "managed" with the pv list stuff.
Page table pages are only (mapped and held/wired) or
(free and unused) now. Page table pages are never inactive,
active or cached. These changes have probably fixed the
hold count problems, but if they haven't, then the code is
simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I
have taken a first (of probably many) steps. Please tell me
if you have any ideas.


16344 13-Jun-1996 asami

A fast memory copy for Pentiums using floating point registers.
It is called from copyin and copyout.

The new routine is conditioned on I586_CPU and I586_FAST_BCOPY, so you
need

options "I586_FAST_BCOPY"

(quotes essential) in your kernel config file.

Also, if you have other kernel types configured in your kernel, an
additional check to make sure it is running on a Pentium is inserted.
(It is not clear why it doesn't help on P6s, it may be just that the
Orion chipset doesn't prefetch as efficiently as Tritons and friends.)

Bruce can now hack this away. :)


16324 12-Jun-1996 dyson

Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c. Bruce Evans made me aware of this problem.


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


16300 11-Jun-1996 pst

Move warning messages under bootverbose


16299 11-Jun-1996 pst

Put clock calibration #defines in opt_clock.h to ease reconfiguration


16215 08-Jun-1996 bde

Stop using the alias `pcb_ptd' for `pcb_tcc.tss_cr3'. Use the (existing)
alias `pcb_cr3' instead. That is still one alias too many, but is convenient
for me since I've replaced the tss in the pcb by a few scalar variables in
the pcb.


16213 08-Jun-1996 bde

Removed bogus `altfmt' code. No alternative formats are supported, but
altfmt was abused to sometimes screw up the disassembly of the bytes
following unconditional jump instructions. Gas doesn't pad to a longword
boundary like the comment said - that is the programmer's responsibility.


16211 08-Jun-1996 bde

Removed recently introduced unnecessary #includes of <machine/cpu.h>
(bootverbose isn't there in -current) and nearby unnecessary #includes.


16197 08-Jun-1996 dyson

Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks! If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.


16166 07-Jun-1996 dyson

Fix a bug in the pmap_object_init_pt routine that pages aren't taken
from the cache queue before being mapped into the process.


16136 05-Jun-1996 dyson

I missed a case of the page table page dirty-bit fix.


16122 05-Jun-1996 dyson

Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.


16079 02-Jun-1996 dyson

Don't carry the modified or referenced bits through to the child
process during pmap_copy. This minimizes unnecessary swapping or creation of
swap space. If there is a hold_count flaw for page-table
pages, clear the page before freeing it to lessen the chance of a system
crash -- this is a robustness thing only, NOT a fix.


16075 02-Jun-1996 joerg

Be slightly more verbose during configure() in the bootverbose case.
This breaks the long silence after the ``npx0'' message and allows
tracking some of the problems regarding the root f/s decisions.


16057 01-Jun-1996 dyson

Fix the problem with pmap_copy that breaks X in small memory machines. Also
close some windows that are opened up by page table allocations. The
prefaulting code no longer uses hold counts, but now uses the busy
flag for synchronization.


16029 31-May-1996 peter

Jump through some hoops so that the *.s code can be run through both an
ANSI and a traditional cpp.

The nesting rules of macros are different, which required some changes.
Use __CONCAT(x,y) instead of /**/.
Redo some comments to use /* */ rather than "# comment" because the ansi
cpp cares about those, and also cares about quote matching.
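
A minimal sketch of the pasting difference, for illustration only. A
traditional cpp joins the tokens in `x/**/y` because it deletes the comment
outright, while an ANSI cpp replaces the comment with a space and needs the
`##` operator instead. PASTE is a hypothetical stand-in for __CONCAT, and the
identifier names are invented.

```c
/* ANSI token pasting; PASTE is a hypothetical stand-in for __CONCAT. */
#define PASTE(x, y) x ## y

/* Expands to a definition of the single identifier counter_ticks. */
int PASTE(counter_, ticks) = 3;

/* Refers to the pasted identifier by its full name. */
int read_counter(void)
{
    return counter_ticks;
}
```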


16026 31-May-1996 dyson

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


15977 29-May-1996 dyson

The wrong address (pindex) was being used for the page table directory. No
negative side effects right now, but just a clean-up.


15926 27-May-1996 phk

Cleanup the last of the assembly time "-KERNBASE" relocations.


15868 22-May-1996 peter

Fix harmless warning.. pmap_nw_modified was not having its arg
cast to pt_entry_t like the others inside the DIAGNOSTIC code.


15858 22-May-1996 dyson

A serious error in pmap.c(pmap_remove) is corrected by this. When
comparing the PTD pointers, they needed to be masked by PG_FRAME, and
they weren't. Also, the "improved" non-386 code wasn't really an
improvement, so I simplified and fixed the code. This might have
caused some of the panics caused by the VM megacommit.
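
Why the mask matters can be shown with a short sketch. It is not the
pmap_remove() code; it only assumes the i386 entry layout, in which the upper
20 bits hold the page frame address and the low 12 bits hold flags such as
valid (0x001) and accessed (0x020). FRAME_MASK and same_frame are hypothetical
names.

```c
#include <stdint.h>

/* i386 page directory/table entry: bits 31..12 are the frame address. */
#define FRAME_MASK 0xfffff000u

/*
 * Two entries refer to the same page when their frame bits match,
 * regardless of the flag bits in the low 12 bits.
 */
int same_frame(uint32_t a, uint32_t b)
{
    return (a & FRAME_MASK) == (b & FRAME_MASK);
}
```

For example, 0x00123001 and 0x00123061 differ as raw values because the
hardware set the accessed and modified bits in one of them, yet both point at
frame 0x00123000.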


15832 21-May-1996 dyson

To quote Stephen McKay: pmap_copy is a complex NOP at this moment :-).

With this fix from Stephen, we are getting the target fork performance
that I have been trying to attain: P5-166, before the mega-commit: 700-800usecs,
after: 600usecs, with Stephen's fix: 500usecs!!! Also, this could be the
solution of some strange panic problems...
Reviewed by: dyson@freebsd.org
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existing condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


15722 10-May-1996 wollman

Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.


15694 09-May-1996 phk

Fix braino on my part. _etext doesn't include the padding to a page
boundary, which means that it doesn't mark the start of the data
section (which is then inaccessible to the programmer ??).
Hopefully fixes recent locore reboot problems.


15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


15565 02-May-1996 phk

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulo any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


15534 02-May-1996 phk

KGDB is dead. It may come back one day if somebody does it.


15508 01-May-1996 bde

Added calibration of the i8254 and the i586 clocks against the RTC at boot
time. The results are currently ignored unless certain temporary options
are used.

Added sysctls to support reading and writing the clock frequency variables
(not the frequencies themselves). Writing is supposed to atomically
adjust all related variables.

machdep.c:
Fixed spelling of a function name in a comment so that I can log this
message which should have been with the previous commit.

Initialize `cpu_class' earlier so that it can be used in startrtclock()
instead of in calibrate_cyclecounter() (which no longer exists).

Removed range checking of `cpu'. It is always initialized to CPU_XXX
so it is less likely to be out of bounds than most variables.

clock.h:
Removed I586_CYCLECTR(). Use rdtsc() instead.

clock.c:
TIMER_FREQ is now a variable timer_freq that defaults to the old value of
TIMER_FREQ. #define'ing TIMER_FREQ should still work and may be the best
way of setting the frequency.

Calibration involves counting cycles while watching the RTC for one second.
This gives values correct to within (a few ppm) + (the inaccuracy of the
RTC) on my systems.


15507 01-May-1996 bde

i386/machdep.c
include/clock.h
isa/clock.c


15501 01-May-1996 bde

Don't return unused values in cpu_switch() or savectx().

Don't preserve unused registers in the NPX case in savectx().


15471 30-Apr-1996 phk

Remove a spurious mapping that was introduced earlier.


15428 28-Apr-1996 phk

Fix some bugs I introduced and some old ones as well.
Add BDE_DEBUGGER back.
Improve quality of comments.
Thanks Bruce!

Reviewed by: phk
Submitted by: bde


15403 26-Apr-1996 bde

Fixed a bug introduced in the previous change. ISA device memory was
mapped to semi-random place(s) depending on the content(s) of physical
address 0xA0000. This was fatal at least on my system with some
memory-mapped devices. Console syscons somehow wasn't affected. It
bogusly hardcodes the address. Sigh.


15392 26-Apr-1996 phk

A significant debogofication of locore.s. I haven't found any actual
bugs, but it is a lot easier to navigate this twisted code now.


15379 25-Apr-1996 phk

Fix cpu_fork for real.

Suggested by: bde


15345 22-Apr-1996 nate

- add apm to the GENERIC kernel (disabled by default), and add some comments
regarding apm to LINT
- Disabled the statistics clock on machines which have an APM BIOS and
have the options "APM_BROKEN_STATCLOCK" enabled (which is default
in GENERIC now)
- move around some of the code in clock.c dealing with the rtc to make
it more obvious the effects of disabling the statistics clock

Reviewed by: bde


15340 22-Apr-1996 dyson

This fixes a troubling oversight in some of the pmap code enhancements.
One of the manifestations of the problem includes the -4 RSS problem
in ps.

Reviewed by: dyson
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15304 19-Apr-1996 phk

savectx returns through cpu_switch in case of the child, so it must
return void just like cpu_switch. Fix prototype and usage from machdep.c


15301 18-Apr-1996 phk

Fix a bogon. cpu_fork & savectx expected cpu_switch to restore %eax,
they shouldn't.


15232 13-Apr-1996 bde

Use PCB_SAVEFPU_SIZE instead of a too-small size in savectx(). This
bug only affected FPU emulators. It might have caused bogus FPU states
in core dumps and in the child pcb after a fork. Emulated FPU states
in core dumps don't work for other reasons, and the child FPU state
is reinitialized by exec, so the problem might not have caused any
noticeable effects.

Cleaned up #includes.


15231 13-Apr-1996 bde

Generate #define of PCB_SAVEFPU_SIZE for use in savectx().


15215 12-Apr-1996 phk

Make alltraps a .globl so that DDB doesn't make people believe they have
an ALIGNFLT on their hands all the time.


15146 08-Apr-1996 wollman

Added a $Id$ keyword. Bruce still needs to put a copyright notice
on this file.


15123 07-Apr-1996 bde

Use breakpoint() function instead of inline assembler.


15117 07-Apr-1996 bde

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


15109 07-Apr-1996 bde

Fixed the ownership and permissions of /dev/io. Rev.1.32 broke rev.1.29.


15088 07-Apr-1996 dyson

Major cleanups for the pmap code.


15065 06-Apr-1996 dg

Switch 586/686 back to generic_bzero and #if 0'd the "optimized" code. It
turns out that it actually reduces performance in real-world cases.

Noticed by: bde


15054 05-Apr-1996 ache

Fix adjkerntz expression priority


15045 05-Apr-1996 ache

Add wall_cmos_clock sysctl variable, needed to manage adjkerntz even for
UTC cmos clocks (needed for Local Timezone FSes)


14988 01-Apr-1996 scrappy

Convert from using devfs_add_devsw() to devfs_add_devswf()

Fixed Permissions/Ownership in DEVFS to reflect /dev


14967 31-Mar-1996 dg

Change if/goto into a while loop.


14943 31-Mar-1996 bde

Moved rtcin() to clock.c.

Always delay using one inb(0x84) after each i/o in rtcin() - don't
do this conditional on the bogus option DUMMY_NOPS not being defined.
If you want an optionally slightly faster rtcin() again, then inline
it and use a better named option or sysctl variable. It only needs
to be fast in rtcintr().


14887 28-Mar-1996 wollman

Teach the disassembler about the 0f,3x family of instructions
(RDMSR, RDTSC, WRMSR, and RDPMC).


14868 28-Mar-1996 dyson

Remove a now unnecessary prototype from pmap.c. Also remove now
unnecessary vm_fault's of page table pages in trap.c.


14867 28-Mar-1996 dyson

Significant code cleanup, and some performance improvement. Also,
mlock will now work properly without killing the system.


14846 27-Mar-1996 bde

Fixed permissions of /devfs/*random.

Fixed group and permissions of /devfs/perfmon.


14837 27-Mar-1996 bde

Print stack pointer and frame pointer in trap messages.

Fixed "trace/trap" message.

Reviewed by: davidg


14836 27-Mar-1996 bde

Eliminated dependency on opt_sysvipc.h.


14835 27-Mar-1996 bde

Removed vestiges of dummy frame at top of tmpstk.

Use alignment macros where appropriate.

Cleaned up #includes.


14834 27-Mar-1996 bde

Fixed traceback for the following cases:
- legitimate null frames from idle() (traceback was aborted after a null
pointer trap)
- second instruction of normal function prologue, and last instruction of
a function (caller wasn't reported).

Reviewed by: davidg


14825 26-Mar-1996 wollman

Add support for Pentium and Pentium Pro performance counters.
(This code is as yet untested; to come after man page is written.)
This also adds inlines to cpufunc.h for the RDTSC, RDMSR, WRMSR, and RDPMC
instructions. The user-mode interface is via a subdevice of mem.c;
there is also a kernel-size interface which might be used to aid
profiling.


14773 23-Mar-1996 nate

Whoops, back out the last commit, which was accidentally committed at
the same time as the if_zp cleanup patch.

The commit that occurred was an incomplete patch for APM on my laptop
and needs more work.


14772 23-Mar-1996 nate

Now that ac->ac_ipaddr and arpwhohas() no longer exist, remove the
ifdef'd out code that used it.


14691 19-Mar-1996 nate

Always enable interrupts before calling the APM idle/busy routines.

Suggested by: phk@FreeBSD.org


14607 13-Mar-1996 dyson

Make sure that we pmap_update AFTER modifying the page table entries.
The P6 can do a serious job of reordering code, and our stuff could
execute incorrectly.


14595 12-Mar-1996 dg

Killed some historical #define cruft that we've never used in FreeBSD:

UDOT_SZ
SYSPTSIZE
USRPTSIZE
MSGBUFPTECNT
DMMIN
DMMAX
DMTEXT
USRIOSIZE
VM_PHYS_SIZE


14577 12-Mar-1996 nate

Removed undocumented and unused APM_SLOWSTART code.


14527 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: david & bde


14503 11-Mar-1996 hsu

Change type of code argument to sendsig from unsigned to u_long to make it
consistent w/ signalvar.h and kern_sig.c.
Reviewed by: davidg & bde


14470 10-Mar-1996 dyson

Improved efficiency in pmap_remove, and also remove some of the pmap_update
optimizations that were probably incorrect.


14433 09-Mar-1996 dyson

Correct some new and older lurking bugs. Hold count wasn't being
handled correctly. Fix some incorrect code that was included
to improve performance. Significantly simplify the pmap_use_pt and
pmap_unuse_pt subroutines. Add some more diagnostic code.


14348 03-Mar-1996 jkh

USER_LDT changes for the Willows TwinXPDK toolkit. Only tested with WINE
since that's the only other USER_LDT using code that I know of.
Submitted by: Gary Jennejohn <Gary.Jennejohn@munich.netsurf.de>
Obtained from: {Origin of diffs may be someone else - I only rec'd them from
Gary}


14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
interdependent to easily separate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a separate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


14328 02-Mar-1996 peter

Add more options into the conf/options and i386/conf/options.i386 files
and the #include hooks so that 'make depend' is more useful. This
covers most of the options I regularly use (but not all) and some other
easy ones.


14245 25-Feb-1996 dyson

Re-insert a missing pmap_remove operation.


14243 25-Feb-1996 dyson

Fix a problem with tracking the modified bit. Eliminate the
ugly inline-asm code, and speed up the page-table-page tracking.


14081 13-Feb-1996 phk

Correct & Update the printing of CPU features. We have printed rubbish
since version 1.117 when Garrett made the switch to %b. Updated to
reflect Intel AP-485 (241618-004).


13915 05-Feb-1996 dg

Unspam my changes in rev 1.54 that John spammed in rev 1.55.


13909 04-Feb-1996 dyson

Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.


13908 04-Feb-1996 dg

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.


13852 02-Feb-1996 dg

Killed last change - it was bogus. cpu_switch() already assumes that
return address is on the stack.


13758 30-Jan-1996 wollman

No longer use the cyclecounter to attempt to correct for late or missed
clock interrupts.

Keep a 1-in-16 smoothed average of the length of each tick. If the
CPU speed is correctly diagnosed, this should give experienced users
enough information to figure out a more suitable value for `tick'.
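
A 1-in-16 smoothed average can be sketched as below. This is one common way to
write such a filter, not necessarily the expression used in clock.c: each new
sample moves the running average by one sixteenth of its distance from that
average. The function name and the use of microsecond units are assumptions.

```c
/*
 * Exponentially smoothed average with weight 1/16 (hypothetical helper).
 * avg and sample are tick lengths in the same unit, e.g. microseconds.
 */
long smooth_tick(long avg, long sample)
{
    return avg + (sample - avg) / 16;
}
```

A single late tick of 10160 against an average of 10000 moves the average only
to 10010, so isolated outliers have little effect while a persistent drift
still shows up over time.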


13740 30-Jan-1996 dg

savectx() strikes again: the saved stack pointer wasn't properly adjusted
to remove the return address. It's only the frame pointer and luck that
allowed the code to work at all.


13729 30-Jan-1996 dg

Increase tmpstk size to 8K and make certain it is longword aligned.


13646 27-Jan-1996 bde

Allocate DMA bounce buffers only when requested by drivers. Only the
fd and wt drivers need bounce buffers, so this normally saves 32K-1K
of kernel memory.

Keep track of which DMA channels are busy. isa_dmadone() must now be
called when DMA has finished or been aborted.

Panic for unallocated and too-small (required) bounce buffers.

fd.c:
There will be new warnings about isa_dmadone() not being called after
DMA has been aborted.

sound/dmabuf.c:
isa_dmadone() needs more parameters than are available, so temporarily
use a new interface isa_dmadone_nobounce() to avoid having to worry
about panics for fake parameters. Untested.


13580 23-Jan-1996 dg

Simplified savectx() a little and fixed a bug that caused it to return
garbage in the child process rather than "1" like it is supposed to.

Reviewed by: bde


13543 21-Jan-1996 joerg

Initialize the cpu_class variable. This prevents i386 machines from
panicking with a privileged instruction fault early at boot time.
Submitted by: rock@wurzelausix.CS.Uni-SB.DE (D. Rock)


13495 19-Jan-1996 peter

Some trivial fixes to get it to compile again, plus some new lint:
- cpuclass should be cpu_class
- CPUCLASS_I386 should be CPUCLASS_386
(^^ those only show up if you compile for i386)
- two missing prototypes on new functions
- one missing static


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do a lot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13453 16-Jan-1996 ache

Since the new bcd* macros are not resistant to argument range overflow,
fix the argument overflow for years >= 2000
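
The range limit can be illustrated with a sketch of two-digit BCD conversion.
These are hypothetical functions, not the kernel's bcd* macros: a single BCD
byte holds only 0..99, so a year has to be reduced modulo 100 before it is
converted.

```c
#include <assert.h>

/* Binary 0..99 to packed BCD: tens in the high nibble, ones in the low. */
unsigned bin_to_bcd(unsigned v)
{
    assert(v < 100);    /* larger values do not fit in two BCD digits */
    return ((v / 10) << 4) | (v % 10);
}

/* Packed BCD byte back to binary. */
unsigned bcd_to_bin(unsigned b)
{
    return ((b >> 4) * 10) + (b & 0x0f);
}
```

A caller storing the year 2000 would pass 2000 % 100, which converts to 0x00;
passing 100 directly would violate the precondition.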


13446 15-Jan-1996 phk

Get rid of two and a half printf in the kernel.
Add more features to the one remaining to handle the job:
+ signed quantity.
# alternate format
- left padding
* read width as next arg.
n numeric in (argument specified) default radix.

Fix the DDB debugger to use these.
Use vprintf in debug routine in pcvt.

The warnings from gcc may become more wrong and intolerable because
of this.

Warning: I have not checked the entire source for unsupported or
changed constructs, but generally believe that there are only a few.

Suggested by: bde


13445 15-Jan-1996 phk

My wife is busy making me a new conical hat, so you don't need to
send any to me this time. Committed an old copy of this file where
the tables were swapped. Duh!.


13444 15-Jan-1996 phk

Soren called and said that I screwed up badly, so I back up until
I find out how... Sorry.


13438 15-Jan-1996 phk

Make bin2bcd and bcd2bin global macros instead of having local
implementations all over the place.


13402 12-Jan-1996 bde

Fixed handling of Feb 29 in resettodr().


13350 08-Jan-1996 ache

Replace ugly year/month calculations in resettodr to more clean
variants, idea taken from NetBSD clock.c.
At least year calculation was wrong, pointed by Bruce.
Use different strategy to store year for BIOS without RTC_CENTURY


13290 06-Jan-1996 peter

Choose a different name to hold the option definition.. The original one
was overlapping with another file, and making some undesirable behavior a
little worse - it's triggering a bug in config that appears to have been
there for some time (before the options files, anyway.)


13265 05-Jan-1996 wollman

Convert BOUNCE_BUFFERS and BOUNCEPAGES to new option scheme.


13228 04-Jan-1996 wollman

Convert DDB to new-style option.


13226 04-Jan-1996 wollman

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


13225 04-Jan-1996 wollman

convert the math emulation to use the new-style options.


13203 03-Jan-1996 wollman

Converted two options over to the new scheme: USER_LDT and KTRACE.


13130 31-Dec-1995 joerg

Restrict /devfs/io perms to 0600.

Nobody in our regular source tree, or in the non-distfile part of the
ports tree does use /dev/io anyway, so this might be replaced by
another scenario some day.


13125 30-Dec-1995 dg

In memory test, cast pointer as "volatile int *", not "int *" to make sure
that gcc doesn't cache the value used in the test. Pointed out by Erich
Boleyn <erich@uruk.org>.
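
A minimal sketch of why the qualifier matters, assuming a simple
write-then-read-back probe (the function name is hypothetical and this is not
the committed memory test): with a plain "int *", the compiler may legally
reuse the value it just stored instead of re-reading memory, so the comparison
could succeed even where no RAM is present.

```c
/*
 * Hypothetical sketch of a write/read-back probe.  Because the pointer
 * is volatile-qualified, the store and the following load must both be
 * performed on the actual location, in program order.
 * Returns 1 if the location held the pattern on read-back, else 0.
 */
static int
probe_word_sketch(volatile int *p, int pattern)
{
	int saved = *p;		/* remember the original contents */
	int ok;

	*p = pattern;		/* real store to the location */
	ok = (*p == pattern);	/* real load, not a cached copy */
	*p = saved;		/* restore the original contents */
	return (ok);
}
```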


13107 29-Dec-1995 bde

Implemented non-statistical kernel profiling. This is based on
looking at a high resolution clock for each of the following events:
function call, function return, interrupt entry, interrupt exit,
and interesting branches. The differences between the times of
these events are added at appropriate places in an ordinary histogram
(as if very fast statistical profiling sampled the pc at those
places) so that ordinary gprof can be used to analyze the times.

gmon.h:
Histogram counters need to be 4 bytes for microsecond resolutions.
They will need to be larger for the 586 clock.
The comments were vax-centric and wrong even on vaxes. Does anyone
disagree?

gprof4.c:
The standard gprof should support counters of all integral sizes
and the size of the counter should be in the gmon header. This
hack will do until then. (Use gprof4 -u to examine the results
of non-statistical profiling.)

config/*:
Non-statistical profiling is configured with `config -pp'.
`config -p' still gives ordinary profiling.

kgmon/*:
Non-statistical profiling is enabled with `kgmon -B'. `kgmon -b'
still enables ordinary profiling (and disables non-statistical
profiling) if non-statistical profiling is configured.


13085 28-Dec-1995 dg

Made bzero a function vector and added a 586/686 optimized version of
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.

The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.
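
As a rough sketch of the "function vector" idea: callers go through a function
pointer that is set once, after CPU detection, to the most suitable
implementation. All names below are hypothetical, and the unrolled loop is only
a stand-in for the real 586/686 assembly version.

```c
#include <stddef.h>

typedef void bzero_fn_sketch(void *buf, size_t len);

/* Portable byte-at-a-time fallback. */
static void
generic_bzero_sketch(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len-- > 0)
		*p++ = 0;
}

/* Stand-in for a CPU-specific version (here: simple 4x unrolling). */
static void
unrolled_bzero_sketch(void *buf, size_t len)
{
	unsigned char *p = buf;

	for (; len >= 4; len -= 4, p += 4)
		p[0] = p[1] = p[2] = p[3] = 0;
	while (len-- > 0)
		*p++ = 0;
}

/* The vector: starts at the generic version, may be switched at boot. */
bzero_fn_sketch *bzero_vector_sketch = generic_bzero_sketch;

/* Hypothetical selection hook; cpu_class values are illustrative. */
void
select_bzero_sketch(int cpu_class)
{
	if (cpu_class >= 586)
		bzero_vector_sketch = unrolled_bzero_sketch;
}
```

The indirection costs one pointer load per call but avoids testing the CPU
class inside the hot path.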


13081 28-Dec-1995 dg

Fix one more label that I overlooked with the P6 support. Sigh.


13065 27-Dec-1995 dg

Update bcopyb & bcopy to reflect changes I made in the libc version of
bcopy:

Be smarter about handling overlapped copies and only go backwards if it
is really necessary. Going backwards on a P6 is much slower than forwards
and it's a little slower on a P5. Also moved the count mask and 'std'
down a few lines - it's a couple percent faster this way on a P5.
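
The overlap rule can be sketched in C as follows (a hypothetical illustration,
not the committed assembly): a backward copy is needed only when the
destination starts inside the source range; every other case, including
overlap with the destination below the source, is safe to copy forwards.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of an overlap-aware copy with bcopy() argument
 * order (src, dst, len).  The unsigned subtraction wraps to a huge
 * value when dst is below src, so a single comparison detects the one
 * case where copying forwards would overwrite unread source bytes.
 */
void
bcopy_sketch(const void *src, void *dst, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t i;

	if ((uintptr_t)d - (uintptr_t)s < len) {
		/* dst lies within [src, src + len): copy backwards. */
		while (len-- > 0)
			d[len] = s[len];
	} else {
		/* Disjoint, or dst below src: forwards is safe. */
		for (i = 0; i < len; i++)
			d[i] = s[i];
	}
}
```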


13056 27-Dec-1995 markm

Modify the ioctl to handle revectored interrupts for the entropy gatherers.


13031 26-Dec-1995 bde

Removed almost all traces of libkern.a. The objects that were in
libkern.a are now specified by listing their source files in
files.${MACHINE}. The list is machine-dependent to save space.
All the necessary objects for each machine must be linked into the
kernel in case an lkm wants one.


13014 25-Dec-1995 dg

Fix a label goofup I made in the previous P6 support changes.


13004 25-Dec-1995 dg

Fix typo in CPUCLASS.


13000 24-Dec-1995 dg

Add Pentium Pro CPU detection and special handling. For now, all the
optimizations we have for 586s also apply to 686s...this will be fine-
tuned in the future as appropriate.


12990 23-Dec-1995 dg

Use FASTER_NOP rather than NOP in rtcin() - only one inb delay was ever
needed.
Reviewed by: bde


12978 22-Dec-1995 bde

Staticized code that was hidden by `#ifdef DEBUG'.


12977 22-Dec-1995 bde

Increased the double fault stack size from 512 to PAGE_SIZE. This is
wasteful, but better than clobbering the variables below the stack.
About 300 bytes of variables were clobbered when I examined double
faults using ddb. Perhaps a page that is known not to be accessed by
the double fault handler could be used. Such pages are not easy to
find, since the double fault handler calls panic() which calls sync()
and possibly dumpsys().


12958 22-Dec-1995 dg

Fix a small logic bug that caused the arguments of the previous frame to
be used instead of the ones for the current frame if a breakpoint had been
set at the entry to a function.


12953 21-Dec-1995 julian

Reviewed by: peter (verbally :)
Move functions specific to mem.c to mem.c from conf.c


12952 21-Dec-1995 dg

Rewrote most of the ddb stack traceback code. These changes are smarter
about decoding trap/syscall/interrupt frames and generally work better
than the previous stuff.
Removed some special (incorrect) frobbing of the frame pointer that
was messing some things up with the new traceback code.


12941 20-Dec-1995 wollman

Increase Pentium cyclecounter calibration time to 131072 us. This
experimentally seems to give better results on my machine.


12930 19-Dec-1995 dg

Corrected a typo in a comment.


12929 19-Dec-1995 dg

Implemented a (sorely needed for years) double fault handler to catch stack
overflows.
It sure would be nice if there was an unmapped page between the PCB and
the stack (and that the size of the stack was configurable!). With the
way things are now, the PCB will get clobbered before the double fault
handler gets control, making somewhat of a mess of things. Despite this,
it is still fairly easy to poke around in the overflowed stack to figure
out the cause.


12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinguishable from vm_offset_t.


12889 16-Dec-1995 peter

Catch a couple more null devsw dereferences...


12850 14-Dec-1995 bde

Added a prototype. Merged prototype lists.


12849 14-Dec-1995 bde

Added a prototype.


12848 14-Dec-1995 bde

Moved some more prototypes outside of ifdefs and grouped them together.


12844 14-Dec-1995 bde

Fixed staticization of DDB functions.


12827 14-Dec-1995 peter

GENERIC/LINT: Remove redundant quoting on some option lines.
LINT: add a couple of new/missing/undocumented options
files.i386: add linux code so that you can compile a kernel with static
linux emulation ("options LINUX")
i386/*: use #if defined(COMPAT_LINUX) || defined(LINUX) to enable static
support of linux emulation (just like "IBCS2" makes ibcs2 static)

The main thing this is going to make obvious, is that the LINUX code
(when compiled from LINT) has a lot of warnings, some of which don't look
too pleasant..


12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


12817 14-Dec-1995 phk

Make math_emulators LKMable.


12813 13-Dec-1995 julian

devsw tables are now arrays of POINTERS to struct [cb]devsw.
Seems to work here just fine, though I can't check every file
that changed due to limited h/w; however, I've checked enough to be pretty
happy with the code..

WARNING... struct lkm[mumble] has changed
so it might be an idea to recompile any lkm related programs


12791 12-Dec-1995 gibbs

Have Eisa and PCI probes occur before ISA probes. Buslogic EISA and PCI cards
can be found in ISA compatibility mode by the ISA driver, but since the
EISA and PCI probes are non-invasive, we prefer them to find the card first.
Since both EISA and PCI probes can rely on interrupts, enable them before
probing of any type is performed. All ISA probes are still "protected" by
splhigh().


12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.
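
A sketch of the arithmetic behind this, assuming 4K pages and a 32-bit page
index (the type and macro names are hypothetical): a 32-bit byte offset tops
out at 4GB, whereas a 32-bit page index can name pages well beyond 1TB.

```c
#include <stdint.h>

#define PINDEX_SHIFT_SKETCH	12	/* assumed 4K pages */

typedef uint32_t pindex_sketch_t;	/* hypothetical page-index type */

/* Byte offset within an object -> index of the page containing it. */
static pindex_sketch_t
offset_to_pindex_sketch(uint64_t offset)
{
	return ((pindex_sketch_t)(offset >> PINDEX_SHIFT_SKETCH));
}

/* Page index -> byte offset of the start of that page. */
static uint64_t
pindex_to_offset_sketch(pindex_sketch_t pindex)
{
	return ((uint64_t)pindex << PINDEX_SHIFT_SKETCH);
}
```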


12731 10-Dec-1995 bde

Removed new alias d_size_t for d_psize_t.

Removed old aliases d_rdwr_t and d_ttycv_t for d_read_t/d_write_t and
d_devtotty_t.

Sorted declarations of switch functions into switch order.

Removed duplicated comments and declarations of nonexistent switch
functions.


12724 10-Dec-1995 phk

Staticize and cleanup.


12722 10-Dec-1995 phk

Staticize and cleanup.
remove a TON of #includes from machdep.


12713 10-Dec-1995 bde

Restored used function fusword() (used by GPL math emulator).


12702 09-Dec-1995 phk

Remove various unused symbols and procedures.


12701 09-Dec-1995 phk

Move sysctl machdep.consdev to cons.c


12678 08-Dec-1995 phk

Julian forgot to make the *devsw structures static.


12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them if you
can't see what's going on..
for a laugh compare conf.c conf.h before and after... :)
I have not done DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


12673 07-Dec-1995 peter

The static prototype for setroot() was apparently accidentally moved
into a block of code that was #ifdef CD9660, meaning you got a compile
failure if you didn't have the CD9660 filesystem configured.


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12649 06-Dec-1995 peter

Moving the kern.dumpdev sysctl handler from kern_sysctl.c to swapgeneric.c
is not real helpful since swapgeneric.c doesn't seem to be used, except
perhaps on a GENERIC kernel. (Sorry Paul.. :-)

I've moved it from swapgeneric.c to autoconf.c, since autoconf.c also deals
with dumpdev things. There may be a better place.....


12623 04-Dec-1995 phk

A major sweep over the sysctl stuff.

Move a lot of variables home to their own code (In good time before xmas :-)

Introduce the string description of format.

Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.

Next is adding vars on the fly, and sysctl looking at them too.

Removed a tiny bit of defunct and #ifdefed notused code in swapgeneric.


12607 03-Dec-1995 bde

Completed function declarations and/or added prototypes.


12604 03-Dec-1995 bde

Staticized.

Completed function declarations and added prototypes.

Cleaned up prototypes.

Cleaned up #includes.

Removed unused variable `dkn'.


12533 29-Nov-1995 wollman

Fix Pentium CPU rate diagnosis:
- Don't print out meaningless iCOMP numbers, those are for droids.
- Use a shorter wait to determine clock rate to avoid deficiencies
in DELAY().
- Use a fixed-point representation with 8 bits of fraction to store
the rate and rationalize the variable name. It would be
possible to use even more fraction if it turns out to be
worthwhile (I rather doubt it).
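
The fixed-point representation mentioned above might look roughly like this
(a sketch under assumptions: the names are hypothetical and the rate is taken
to be cycles per microsecond, i.e. MHz, scaled by 2^8):

```c
#include <stdint.h>

#define RATE_FRAC_BITS_SKETCH	8	/* 8 bits of fraction */

/*
 * Cycles counted over 'usec' microseconds -> clock rate in cycles per
 * microsecond, stored as fixed point with 8 fractional bits.
 */
static uint32_t
rate_fixed_sketch(uint64_t cycles, uint32_t usec)
{
	return ((uint32_t)((cycles << RATE_FRAC_BITS_SKETCH) / usec));
}

/* Convert a cycle count to microseconds using the fixed-point rate. */
static uint32_t
cycles_to_usec_sketch(uint64_t cycles, uint32_t rate_fixed)
{
	return ((uint32_t)((cycles << RATE_FRAC_BITS_SKETCH) / rate_fixed));
}
```

With 8 fractional bits the rate resolves to 1/256 MHz, so a 90.5 MHz clock is
stored as 23168 rather than being rounded to 90 or 91.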

The question of source code arrangement remains unaddressed.


12521 29-Nov-1995 julian

If you're going to mechanically replicate something in 50 files
it's best to not have a (compiles cleanly) typo in it! (sigh)


12517 29-Nov-1995 julian

OK, that's it..
That's EVERY SINGLE driver that has an entry in conf.c..
my next trick will be to define cdevsw[] and bdevsw[]
as empty arrays and remove all those DAMNED defines as well..

Each of these drivers has a SYSINIT linker set entry
that comes in very early.. and asks the driver to add its own
entry to the two devsw[] tables.

some slight reworking of the commits from yesterday (added the SYSINIT
stuff and some usually wrong but token DEVFS entries to all these
devices).

BTW does anyone know where the 'ata' entries in conf.c actually reside?
seems we don't actually have a 'ataopen() etc...

If you want to add a new device in conf.c
please make sure I know
so I can keep it up to date too..

as before, this is all dependent on #if defined(JREMOD)
(and #ifdef DEVFS in parts)


12499 28-Nov-1995 peter

After having put on my Asbestos suit, complete the MFS_ROOT part of Terry's
mountroot changes. This means that the mfs_initminiroot functionality
into the root mfs_mount....


12471 24-Nov-1995 bde

Staticized. Moved some zero-initialized values to the bss.

Added prototypes.


12429 20-Nov-1995 phk

Mega commit for sysctl.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I was there.


12417 20-Nov-1995 phk

Remove unused vars.


12357 18-Nov-1995 bde

Fixed the type of vm_fault_quick() - don't convert types back and forth
through bogus immediate types.

Added prototypes.


12356 18-Nov-1995 bde

Fixed handling of trace traps when cons_unavail is set. Added comments
about handling of other cases.


12290 14-Nov-1995 phk

Fix a couple of printfs.


12243 12-Nov-1995 phk

The entire sysctl callback to read/write version. I haven't tested this as
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.


12223 12-Nov-1995 bde

Oops, forgot the following log message in the previous commit:

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


12220 12-Nov-1995 bde

Reviewed by:
Submitted by:
Obtained from:


12186 10-Nov-1995 phk

convert more sysctl variables.


12091 05-Nov-1995 gibbs

Modifications for the new eisaconf.


12078 04-Nov-1995 markm

Remove the #ifdef DEVRANDOM's, as promised.

/dev/random is now a part of the kernel! you will need to make
the device in /dev: sh MAKEDEV random
and take a look at some test code in src/tools/test/random.


12072 04-Nov-1995 bde

Finished(?) moving prototypes for devswitch functions to <machine/conf.h>.
One was hidden in an ifdef.

Continued cleaning up not so new init stuff.

Removed some more /*ARGSUSED*/ for devswitch functions.


12008 02-Nov-1995 peter

When the sync-on-shutdown fails to clear all buffers, this bit of code
can print them out.
I have seen that MFS can leave BUSY buffers, preventing a clean reboot...


11963 31-Oct-1995 peter

Add a simplistic netisr register routine - I need this now for ppp-2.2.


11946 30-Oct-1995 markm

Security fix - do not allow anyone but root to choose the interrupts used
in the randomising process.
(This is a change to the /dev/random ioctl().)


11940 30-Oct-1995 bde

Removed bogus statics in declarations that don't allocate storage.

Added prototypes.


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11919 29-Oct-1995 bde

Fix mmioctl() for !DEVRANDOM case. mmioctl() is a function, not a
pointer to a function.


11875 28-Oct-1995 markm

Theodore Ts'o's random number generator for Linux, ported by me.
This code will only be included in your kernel if you have
'options DEVRANDOM', but that will fall away in a couple of days.
Obtained from: Theodore Ts'o, Linux


11872 28-Oct-1995 phk

Remove unused functions and variables, make things static, and other cleanups.


11865 28-Oct-1995 phk

Sorry, the last commit screwed up for me, this is the right one (I hope!)
Please refer to the previous commit message about sysctl variables.


11703 23-Oct-1995 dg

Remove PG_W bit setting in some cases where it should not be set.

Submitted by: John Dyson <dyson>


11693 23-Oct-1995 dg

More improvements to the logic for modify-bit checking. Removed
pmap_prefault() code as we don't plan to use it at this point in time.

Submitted by: John Dyson <dyson>


11642 22-Oct-1995 dg

Simplified some expressions.


11602 21-Oct-1995 phk

A mixed bag of changes, relating to getting the state in "lsdev" right,
and pccard support to work sensibly. Better by far, but still not good.


11523 15-Oct-1995 phk

Pull all of libkern.a in (though not mcount) so the LKM's don't come
out shorthanded. Makes the idea of libkern pretty void now...


11452 12-Oct-1995 wollman

Reduce jitter of Pentium microtime() implementation by letting the counter
free-run and doing a subtract in microtime() rather than resetting the
counter to zero at every clock tick. In combination with the changes to
kern_clock.c, this should eliminate all the immediately obvious sources
of systematic jitter in timekeeping on Pentium machines.
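
The free-running approach can be sketched like this (hypothetical names;
the structure and the integer cycles-per-microsecond rate are assumptions, not
the committed code): the clock tick only records the counter value, and
microtime() derives the elapsed time by subtraction.

```c
#include <stdint.h>

/* Hypothetical state captured by the clock interrupt at each tick. */
struct tick_snapshot_sketch {
	uint64_t counter_at_tick;	/* cycle counter read at the tick */
	uint32_t cycles_per_usec;	/* calibrated clock rate */
};

/*
 * Microseconds elapsed since the last tick.  The counter is never
 * reset; unsigned subtraction gives the right delta even if the
 * counter has wrapped between the two reads.
 */
static uint32_t
usec_since_tick_sketch(const struct tick_snapshot_sketch *snap,
    uint64_t counter_now)
{
	uint64_t delta = counter_now - snap->counter_at_tick;

	return ((uint32_t)(delta / snap->cycles_per_usec));
}
```

Not writing to the counter removes the variable latency between the tick and
the reset, which is one plausible source of the jitter described.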


11390 10-Oct-1995 bde

Include <sys/sysproto.h> so that machdep.c compiles cleanly again
(the prototype for sync() moved).

KNFize and otherwise clean up printing of BIOS geometries.

Add prototypes.

Continue cleaning up new init stuff.


11343 09-Oct-1995 bde

Fix tracing of syscalls. The previous fix required the undocumented
option DDB_NO_LCALLS to stop ddb getting control and broke all ddb
tracing. Now there is no option and no way for ddb to trace at
address _Xsyscall or to _Xsyscall, but tracing everywhere else
works. The previous fix did unnecessary things for Linux syscalls.

Don't bother checking that syscall frames are for user mode.

Make debugger traps inside the kernel (except at addresses _Xsyscall
and _Xsyscall+1) fatal if ddb is not configured. They "can't happen".

Add prototypes.

Remove stupid comments, e.g., /*ARGSUSED*/ for args that are used.


11222 05-Oct-1995 phk

remove GCC divsi3 routines which are never used.


11163 04-Oct-1995 julian

Submitted by: Juergen Lock <nox@jelal.hb.north.de>
Obtained from: other people on the net ?

1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.

2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...


11116 01-Oct-1995 dg

Insert zeroed pages at the head of the zero queue rather than at the tail.
A measurable performance improvement results from the potential for the
page to be partially cached when it is eventually used.


10924 20-Sep-1995 dg

Fix rounding bug in last commit that would have caused the problem to not
be completely fixed.


10910 19-Sep-1995 bde

Fix benign type mismatches in isa interrupt handlers. Many returned int
instead of void.


10826 16-Sep-1995 pst

Our existing Cyrix cache-disable code was short-cutting the steps for
setting the control register. Make the read and write operations two
completely separate steps.

While we're at it, pull in the whole set of Cyrix cache control options
from NetBSD-current, since a few motherboards do the right thing with
the Cyrix chip.

There is no option to disable the internal cache completely (yet).

Reviewed by: pst
Obtained from: NetBSD


10812 15-Sep-1995 dg

Check for page being resident when doing I/O with /dev/kmem and return
EFAULT if it is not resident. This prevents the system from manufacturing
a zero-fill page for unused but allocated areas of the kernel's VM. This
should fix the "CMAP busy" panic that some people saw during system
startup.


10782 15-Sep-1995 dg

1) Killed 'BSDVM_COMPAT'.
2) Killed i386pagesperpage as it is not used by anything.
3) Fixed benign miscalculations in pmap_bootstrap().
4) Moved allocation of ISA DMA memory to machdep.c.
5) Removed bogus vm_map_find()'s in pmap_init() - the entire range was
already allocated kmem_init().
6) Added some comments.

virtual_avail is still miscalculated NKPT*NBPG too large, but fixing this
properly requires moving the variable initialization into locore.s.
Some other day.


10762 15-Sep-1995 dg

1) Don't double map the kernel page tables. The double mapping was never
used and went a long way toward confusing the code.
2) Fix proc0's initial stack to not be 48 bytes smaller than it needs to
be.
3) Correct comment about 'first' arg to init386().


10666 10-Sep-1995 bde

Make pcvt and syscons live in the same kernel. If both are enabled, then
the first one in the config has priority. They can be switched using
userconfig().

i386/i386/conf.c:
Initialize the shared syscons/pcvt cdevsw entry to `nx'.

Add cdevsw registration functions.

Use devsw functions of the correct type if they exist.

i386/i386/cons.c:
Add renamed syscons entry points to constab.

i386/i386/cons.h:
Declare the renamed syscons entry points.

i386/i386/machdep.c:
Repeat console initialization after userconfig() in case the current
console has become wrong. This depends on cn functions not wiring down
anything important.

sys/conf.h:
Declare new functions.

i386/isa/isa.[ch]:
Add a function to decide which display driver has priority. Should be
done better.

i386/isa/syscons.c:
Rename pccn* -> sccn*.

Initialize CRTC start address in case the previous driver has moved it.

i386/isa/syscons.c, i386/isa/pcvt/*
Initialize the bogusly shared variable Crtat dynamically in case the
stored value was changed by the previous driver.

Initialize cdevsw table from a template.

Don't grab the console if another display driver has priority.

i386/isa/syscons.h, i386/isa/pcvt/pcvt_hdr.h:
Don't externally declare now-static cdevsw functions.

i386/isa/pcvt/pcvt_hdr.h:
Set the sensitive hardware flag so that pcvt doesn't always have lower
priority than syscons. This also fixes the "stupid" detection of the
display after filling the display with text.

i386/isa/pcvt/pcvt_out.c:
Don't be confused by the off-screen cursor offset 0xffff set by syscons.

kern/subr_xxx.c:
Add enough nxio/nodev/null devsw functions of the correct type for syscons
and pcvt.


10665 10-Sep-1995 bde

cons.c:
Split off cdevsw initialization in cninit() into a new function
cninit_finish() that isn't called until all hardware device drivers
have been attached. The bdevsw entry of the driver for the physical
console needs to be hooked after the physical driver has been
attached in case the attachment modified the entry.

Rearrange cninit() to avoid changing cn_tab until the driver for the
physical console has been initialized, so that the previous driver
(if any) can be used for debugging.

Start removing half-baked lint support. bdevsw functions usually have
unused args but /*ARGSUSED*/ was used for only about 5% of them.

cons.h:
Declare cninit_finish().

autoconf.c:
Call cninit_finish().

Start adding prototypes. Functions with bogus linkage (extern where
they probably should be static) are explicitly declared as extern
so that they can be found easily (extern in a non-header is usually
wrong).

All:
Continue cleaning up init stuff: init functions shall be static;
INITs should be at the start of files...


10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


10624 08-Sep-1995 bde

Fix benign type mismatches in devsw functions. 82 out of 299 devsw
functions were wrong.


10616 08-Sep-1995 dg

1) Really print 'real' memory - use Maxmem, not physmem.
2) Output K bytes instead of pages as this means something to more people.
3) Moved printf of avail memory to after vm_bounce_init() call so that
bounce buffers are included in the figure.
4) Killed initcpu(); it's an unused vestige from the VAX.


10609 07-Sep-1995 dg

Minor cleanup and (very) small micro optimization to Xsyscall (and the
linux one)..


10594 06-Sep-1995 wpaul

Put back the "real memory =" printf() that vanished when the code to
handle holes in memory was added.


10546 03-Sep-1995 dyson

Machine dependent routines to support pre-zeroed free pages. This
significantly improves demand zero performance.


10537 03-Sep-1995 julian

devfs changes..
changes to allow devices that don't probe (e.g. /dev/mem)
to create devfs entries
this required giving 'configure' its own SYSINIT entry
so we could duck in just before it with a DEVFS init
and some device inits..
my devfs now looks like:
./misc
./misc/speaker
./misc/mem
./misc/kmem
./misc/null
./misc/zero
./misc/io
./misc/console
./misc/pcaudio
./misc/pcaudioctl
./disks
./disks/rfloppy
./disks/rfloppy/fd0.1440
./disks/rfloppy/fd1.1200
./disks/floppy
./disks/floppy/fd0.1440
./disks/floppy/fd1.1200
also some slight cleanups.. DEVFS needs a lot of work
but I'm getting back to it..


10431 30-Aug-1995 bde

Declare vfs_mountroot() in the right place.


10425 29-Aug-1995 bde

Remove relocation of Crtat. Drivers already relocate it (somewhat
bogusly). We used to undo the driver relocation here before doing
a somewhat less bogus relocation. The result was a null relocation
here.


10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


10268 25-Aug-1995 bde

Remove extra args from the calls to getit(). The bug was benign with the
default function call convention.


10157 21-Aug-1995 dg

A couple of micro optimizations to improve NULL syscall performance by
about 2%.


10126 20-Aug-1995 dg

Fixed a few bugs and annoyances with boot():

1) deal with cold flag better
2) check for key input more often
3) get rid of unused variables
4) minor formatting improvements


10092 17-Aug-1995 dg

Killed some unused stuff inherited from Bill Jolitz. Note that since
this changes the size of the pcb struct, gdb will need to be rebuilt
or debugging won't work correctly.

Reviewed by: Bruce Evans


10063 15-Aug-1995 bde

Fake a call frame for traps so that `gdb -k' can report where fatal
traps occurred. This also helps ddb backtrace through trap frames.
Backtracing through syscall and interrupt frames still doesn't work
but it is relatively unimportant and more expensive to fix.


9799 30-Jul-1995 dg

Fix a bug in my disabled version of trap_pfault()...curpcb may be NULL even
when curproc isn't. This condition occurs at system startup and perhaps
at other times.


9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuration.


9744 28-Jul-1995 dg

Fixed bug I introduced with the memory-size code rewrite that broke
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].

Submitted by: Bruce Evans, but fixed differently by me.


9578 19-Jul-1995 dg

Rewrote memory sizing code to generally deal with holes in extended memory.
This code change should allow certain Compaq machines with a 128K hole
at 16MB to work.


9550 16-Jul-1995 peter

This fixes a compiler warning, and a cosmetic problem with the linux
emul code when compiling with "options KTRACE".
ktrsyscall() was expecting an array of integers, this was passing the
address of a structure containing an array of integers..
The cosmetic problem was that it was calling the "enter syscall"
trace hook twice - this looks like a cut/paste error/typo.


9547 16-Jul-1995 phk

Reviewed by: phk
Submitted by: Andrew McRae <andrew@mega.com.au>

Some initial commits from the pcmcia stuff, to make life easier for the
testers.

We will use the name "pccard" since that is really the buzzword at present.


9546 16-Jul-1995 phk

Make the bootinfo structure visible from sysctl.
This can be used in libdisk to guess a better bios-geometry.


9545 16-Jul-1995 joerg

Include ``options POWERFAIL_NMI'' for owners of older (non-apm)
notebooks where a powerfail condition (external power drop; battery
state low) is signalled by an NMI. Makes it beep instead of panicking.

Reviewed by: davidg


9533 16-Jul-1995 dg

Truncate the fault address to a page boundary when calling vm_fault(). The
last change to fix the fault-twice bug with page tables wasn't quite
complete.
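
Truncating an address to a page boundary is a mask operation; a minimal
sketch, assuming 4K pages and using hypothetical names rather than the
kernel's own macros:

```c
#include <stdint.h>

#define FAULT_PAGE_SIZE_SKETCH	4096u	/* assumed page size */
#define FAULT_PAGE_MASK_SKETCH	(FAULT_PAGE_SIZE_SKETCH - 1)

/* Round a virtual address down to the start of its page. */
static uintptr_t
trunc_page_sketch(uintptr_t va)
{
	return (va & ~(uintptr_t)FAULT_PAGE_MASK_SKETCH);
}
```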


9524 14-Jul-1995 dg

Fixed bug that caused page tables to be faulted twice instead of once.

Submitted by: John Dyson


9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
essentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure essentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although it will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. Their marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


9345 28-Jun-1995 dg

Killed redundant vnode_pager_umount() call. This is already done at
FS unmount time.


9344 28-Jun-1995 dg

Make path to kernel absolute if it is passed in relative. This fixes
a related bug in some of the new 'foo'boot bootstrap code that has been
added over the past months. This change makes it no longer necessary
for the bootstrap to fix up the path (i.e. it can be removed).
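
A minimal sketch of the path fix-up described above, in C. It is not the
committed code: the helper name make_absolute and its buffer handling are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy `path` into `out`, prefixing '/' when the
 * path is relative.  Returns 0 on success, -1 if `out` is too small.
 */
static int
make_absolute(const char *path, char *out, size_t outlen)
{
	int n;

	assert(path != NULL && out != NULL);
	n = snprintf(out, outlen, "%s%s", path[0] == '/' ? "" : "/", path);
	return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
}
```

With this shape, a relative name such as "kernel" becomes "/kernel", and an
already absolute path is copied unchanged.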


9326 26-Jun-1995 bde

Partially fix `sysctl machdep.console_device'. The fix will be complete
when syscons stops mapping the console to minor MAXCONS. There is
usually no corresponding device in /dev, and the correct device has
minor 0.

cons.c:
Initialize cn_tty properly, so that CPU_CONSDEV can work.
Comment about too many variants of the console tty pointer.

machdep.c:
Return device NODEV and not error EFAULT when there is no console device.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8833 29-May-1995 dg

Fix setdumpdev():
- the major number wasn't checked, so accesses beyond the end of bdevsw[]
were possible. Bogus major numbers are easy to get because `sysctl -w'
doesn't handle dev_t's reasonably - it doesn't convert names to dev_t's
and it converts the number 1025 to the dev_t 0x35323031.
- Driver d_psize() functions return -1 to indicate error ENXIO or ENODEV
(the interface is too braindamaged to say which). -1 was interpreted
as a size and resulted in the bogus error ENOSPC.
- it was possible to set the dumpdev for devices without a d_psize()
function. This is equivalent to setting the dumpdev to NODEV except
it confuses sysctl.
- change a 512 to DEV_BSIZE. There is an official macro dtoc() for
converting "pages" to disk blocks but it is never used in /usr/src/sys.
There is much confusion between PAGE_SIZE sized pages and NBPG sized
pages. Maxmem consists of both.
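
The odd dev_t value quoted above is consistent with the four ASCII
characters of "1025" being stored as raw bytes and read back as a
little-endian 32-bit integer. The sketch below only reproduces that
arithmetic; the helper name pack_le32 is hypothetical and this is not a
claim about how sysctl itself was implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack the first four bytes of `s` into a 32-bit value, least
 * significant byte first, as a little-endian i386 would see them.
 */
static uint32_t
pack_le32(const char *s)
{
	uint32_t v = 0;
	size_t i;

	assert(strlen(s) >= 4);
	for (i = 0; i < 4; i++)
		v |= (uint32_t)(unsigned char)s[i] << (8 * i);
	return (v);
}
```

Here '1' is 0x31, '0' is 0x30, '2' is 0x32 and '5' is 0x35, so the packed
value is 0x35323031.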

Not fixed:
- there is nothing to invalidate the dumpdev if the media goes away.
This reduces the benefits of the early calculation of dumplo. Bounds
checking in the dump routines is relied on to reduce the risk of
damage and little would be lost by relying on the dump routines to
calculate dumplo.
- no attempt is made to stay away from the start of the device to
avoid clobbering labels.

Fix wrong && anachronistic comment about the type of bootdev.

Reviewed by: davidg
Submitted by: Bruce Evans


8748 25-May-1995 dg

Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS"
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overridden in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network hang when the mbuf cluster pool runs out.

Reviewed by: John Dyson


8590 18-May-1995 dg

Added "BROKEN_KEYBOARD_RESET" option to disable using the keyboard reset
in cpu_reset(). Some MBs don't deal with this properly.

Submitted by: Rod Grimes


8504 14-May-1995 dg

Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.


8481 12-May-1995 wollman

The death of `options NODUMP'. Now the dump area can be dynamically
configured (and unconfigured) on the fly. A sysctl(3) MIB variable is
provided to inspect and modify the dump device setting.


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


8448 11-May-1995 bde

Add variable `idelayed' and macros setdelayed() and schedsofttty()
to access it. setdelayed() actually ORs the bits in `idelayed' into
`ipending' and clears `idelayed'.

Call setdelayed() every (normal) clock tick to convert delayed
interrupts into pending ones.

Drivers can set bits in `idelayed' at any time to schedule an interrupt
at the next clock tick. This is more efficient than calling timeout().
Currently only software interrupts can be scheduled.
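
The mechanism described above can be sketched in C as follows. This is an
illustration under assumptions: the real setdelayed() is a macro and must
be safe against concurrent interrupts, which this sketch ignores, and the
sketch_ names are hypothetical.

```c
#include <assert.h>

static volatile unsigned int idelayed;	/* bits waiting for the next tick */
static volatile unsigned int ipending;	/* bits ready to be serviced */

/* Hypothetical driver-side helper: request interrupt `bit` at next tick. */
static void
sketch_schedule_delayed(unsigned int bit)
{
	assert(bit < 32);
	idelayed |= 1u << bit;
}

/* What each clock tick does: promote delayed bits to pending, then clear. */
static void
sketch_setdelayed(void)
{
	ipending |= idelayed;
	idelayed = 0;
}
```

A driver sets its bit once and the clock tick does the rest, so no
timeout() entry has to be allocated or cancelled.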


8433 11-May-1995 wpaul

If you config a kernel with 'config kernel swap generic' and try to
boot diskless with it, you get a panic because setconf() is only
called for mountroot == ffs_mountroot. It really needs to be called
no matter what manner of rootfs we have. I can't really say if
swapgeneric will work with a CD-ROM though. (I get the feeling I'm
the only one who uses swapgeneric these days anyway.)


8427 11-May-1995 wollman

Delete two debugging printfs that mistakenly crept in.


8426 11-May-1995 wollman

Make networking domains drop-ins, through the magic of GNU ld. (Some day,
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.


8214 02-May-1995 dg

Added a memcpy() routine.


8211 01-May-1995 dyson

Fixed a problem that can cause left-over pv_entries and, as
a side-effect, removed some legacy code that was necessary
when we called vm_fault inside of vm_fault_quick instead of using
the kernel/user space byte move routines.


8074 26-Apr-1995 rgrimes

Add outb to keyboard controller to do a cpu_reset, this fixes 2 known
cases of motherboards that failed to reboot.


8055 25-Apr-1995 phk

Add support for MFS root filesystem.


8015 23-Apr-1995 julian

Hmm, spotted a difference resulting from a merge I didn't examine closely enough


8014 23-Apr-1995 julian

include hooks for EISA configuration (possibly wrong :)


8007 23-Apr-1995 phk

Forgot this commit the other day. The receiving end of the "boot -C" option.


7994 22-Apr-1995 wpaul

Tiny printf formatting change: if we have no cpu_vendor or cpu_id info,
don't generate a newline. (Yeah, I'm picking nits, but that empty line
I get on my 386 just looks dumb, okay? :)


7930 18-Apr-1995 rgrimes

Reapply my fix for this:
Output the CPU features line during the probe on a separate line, for
folks with lots of features the output used to wrap and look ugly.


7908 17-Apr-1995 phk

Print the BIOS geometries in a human-readable format.


7874 16-Apr-1995 dg

Remove gratuitous waste of 2K of memory for BIOS variables. We never load
the kernel at 0-640k; we haven't had the ability to do that since before
2.0R. Furthermore, I fail to see how putting an instruction at 0 and then
doing a .org 0x500 is going to prevent the stuff from getting clobbered
in the first place; a.out is just too stupid to know about sparse address
spaces.


7818 14-Apr-1995 dufault

Add scsi target. Add "after config" call to autoconf so that scsi
targets will be configured after all scsi busses have been configured.


7814 14-Apr-1995 wpaul

Hopefully I won't get flamed for this: insert a few more #if defined(I486_CPU)
and #if defined (I586_CPU) thingies into identifycpu() so that we only
compile in what's actually needed for a given CPU. So far as I can tell,
none of my 386 machines generate a cpu_vendor code, so I made the extra vendor
and feature line conditional on I486_CPU and I586_CPU. (Otherwise we
print out a blank line which looks silly.)


7792 13-Apr-1995 wpaul

This is a subtle reminder to people that not everybody compiles their
kernels with 'options I586_CPU.'

The declaration for pentium_mhz is hidden inside an #ifdef I586_CPU,
but machdep.c refers to it whether I586_CPU is defined or not. This
temporary hack puts the offending code inside an #ifdef I586_CPU as
well so that a kernel without it will successfully compile.

I must emphasize the word 'temporary:' somebody needs to seriously
beat on the identifycpu() function with an #ifdef stick so that
I386_CPU, I486_CPU and I586_CPU will do the right things.


7780 12-Apr-1995 wollman

Add a class field to devconf and most drivers.
For those where it was easy, drivers were also fixed to call
dev_attach() during probe rather than attach (in keeping with the
new design articulated in a mail message five months ago). For
a few that were really easy, correct state tracking was added as well.
The `fd' driver was fixed to correctly fill in the description.
The CPU identify code was fixed to attach a `cpu' device. The code
was also massively reordered to fill in cpu_model with something remotely
resembling what identifycpu() prints out. A few bytes saved by using
%b to format the features list rather than lots of ifs.


7731 10-Apr-1995 phk

Changes to make FreeBSD use a CDROM as rootdev, for installation purposes.
If "BOOTCDROM" is defined, you get this pretty special case stuff.


7693 09-Apr-1995 dg

Cosmetic changes.


7680 08-Apr-1995 joerg

Implement a simple hook (or hack?) to allow graphics device console
drivers to protect DDB from being invoked while the console is in
process-controlled (i.e., graphics) mode.

Implement the logic to use this hook from within pcvt. (I'm sure
Søren will do the syscons part RSN).

I've still got one occasion where the system stalled, but my attempts
to trigger the situation artificially resulted in the expected
behaviour. It's hard to track bugs without the console and DDB
available. :-/


7644 06-Apr-1995 rgrimes

Output the CPU features line during the probe on a separate line, for
folks with lots of features the output used to wrap and look ugly.

Reviewed by: phk


7490 30-Mar-1995 dg

Made pmap_testbit a static function.


7402 26-Mar-1995 dg

Changed pmap_changebit() into a static function as it always should
have been.

Submitted by: John Dyson


7214 21-Mar-1995 dg

Added a new version of trap_pfault() that disallows kernel page faults
to the user address space unless pcb_onfault is set. The code is currently
commented out because iBCS2 and process debugging parts of the kernel
need to be changed/fixed first.


7213 21-Mar-1995 dg

Changed some #ifdef DIAGNOSTIC code that I added to be #ifdef DEBUG.


7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


7103 17-Mar-1995 dg

Call dev_shutdownall() just after unmounting filesystems.


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


6995 11-Mar-1995 phk

Moved bb stuff to support.s per Bruce's suggestion.


6981 10-Mar-1995 phk

Add a dummy ___bb_init_func for BB profiling of the kernel.
To use this: recompile src/gnu/usr.bin/cc, compile your kernel. The
files you want to profile should be compiled with '-a -g'. "strip -x"
the kernel and run. You don't need to profile all files in the kernel.
My next commit is the program to extract the data from the running kernel.


6978 10-Mar-1995 dg

kmem_alloc() returns zero-filled memory; it isn't necessary to bzero()
it.


6977 10-Mar-1995 dg

Removed unnecessary routines vm_get_pmap() and vm_put_pmap().
kmem_alloc() returns zero filled memory, so no need to explicitly
bzero() it.


6949 07-Mar-1995 dg

Increased number of buffers to 1/12 of (page_count - 1024). This makes the
cache minimum closer to 10% in the usual case.


6902 05-Mar-1995 wpaul

Changed the printf()s in npxattach a bit so you don't end up with
messages like this:

wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <ST506>
wd0: size unknown, using BIOS values: 615 cyl, 4 head, 17 sec, bytes/sec 512
npx0 at 0xf0-0xff irq 13 on motherboard
npx0: changing root device to wd0a
^^^^^^

The spurious 'npx0: ' pops up if you have a 386 with a 387 FPU.


6874 04-Mar-1995 dg

Removed obsolete vtrace() and reorganized a little.


6846 03-Mar-1995 dg

Use copyout to install the sigframe rather than directly writing to the
user's stack.


6817 01-Mar-1995 dg

Use su/fubyte instead of directly touching the user's address space.


6807 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions created (vm_object_pip_wakeup and pagedaemon_wakeup) that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjunction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


6806 01-Mar-1995 dg

Slight change to include file order to accommodate upcoming changes.


6734 26-Feb-1995 bde

Replace all remaining instances of `i386/include' by `machine' and fix
nearby #include inconsistencies.


6664 23-Feb-1995 bde

Submitted by: seb@erix.ericsson.se (Sebastian Strollo)

Remove over-cautious early fnop() synchronization. It caused the probe to
hang on systems without an FPU.


6579 20-Feb-1995 dg

Use of vm_allocate() and vm_deallocate() has been deprecated.


6547 18-Feb-1995 wpaul

Do away with 'options SWAP_GENERIC' once and for all: I get ill
just thinking about it.

Two changes need to be made to allow 'config kernel swap generic' to
work properly without requiring any compile-time flags:

/usr/src/usr.sbin/config/mkswapconf.c: we need to define a dummy stub
for the setconf() function to replace the one in swapgeneric.c that
isn't available in non-generic configurations.

/usr/src/sys/i386/i386/autoconf.c: the -a boot flag causes setroot()
to be skipped and lets setconf() prompt the user for a root device.
If you skip setroot() in a non-generic kernel, you could get severely
hosed. To avoid this, we silently ignore the -a flag if rootdev != NODEV.
(rootdev is always initialized to NODEV in swapgeneric.c, so if
we find that rootdev is something other than NODEV, we know we're
not using a generic configuration.)


6535 17-Feb-1995 bde

Undo the busy latch changes made in the previous revision. They broke
some 386/387 systems.

Don't print the IRQ number twice in the boot diagnostics.


6512 17-Feb-1995 phk

This is the latest version of the APM stuff from HOSOKAWA, I have looked
briefly over it, and see some serious architectural issues in this stuff.

On the other hand, I doubt that we will have any solution to these issues
before 2.1, so we might as well leave this in.

Most of the stuff is bracketed by #ifdef's so it shouldn't matter too much
in the normal case.

Reviewed by: phk
Submitted by: HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp>


6439 15-Feb-1995 dg

Use proc0's proc struct rather than curproc's when calling sync.


6423 15-Feb-1995 dg

Killed the pmap_use_pt and pmap_unuse_pt prototypes as they are now in
machine/pmap.h.


6380 14-Feb-1995 sos

First attempt to run linux binaries. This is only the changes needed to
the generic kernel. The actual emulator is a separate LKM. (not finished
yet, sorry).
Submitted by: sos@freebsd.org & sef@kithrup.com


6377 14-Feb-1995 phk

Removed a YF comment.


6354 14-Feb-1995 phk

Yves has sent us a ~600 Kb patch, which shuts up gcc entirely for the
entire kernel.
Unfortunately we didn't send him a copy of the style guide before he did it.
I'm trying to find all the benign and downright sound bits and will commit
them without any other explanation than "YF fix" if they are merely cosmetic.

Reviewed by: phk
Submitted by: yves@dutncp8.tn.tudelft.nl (Yves Fonk)


6327 12-Feb-1995 dg

Carefully choose the low limit for number of buffers to achieve the best
performance on small memory machines.


6325 12-Feb-1995 dg

Fixed a bogus comment and made a stylistic change (testl instead of orl
to test for zero).


6308 11-Feb-1995 phk

Intel's App Note AP-485 applied.
We will now tell a good deal more about the CPU if Intel made it.

What is a i486DX2 Write-Back Enhanced CPU ?


6301 10-Feb-1995 dg

Changed extended memory test so that it's non-destructive and not a
complete test (it never was "complete", which is why it was bogus). Now
only a single longword is checked in each page.


6299 10-Feb-1995 dg

Removed obsolete and unused vmtime() function.


6297 10-Feb-1995 dg

Removed unnecessary check for pr_scale in the AST/OWEUPC case.


6296 10-Feb-1995 dg

Check P_PROFIL flag for profiling rather than pr_scale as it makes more
sense.


6126 02-Feb-1995 dg

Mostly cosmetic changes. Use KERNBASE instead of UPT_MAX_ADDRESS in
some comparisons as it is more correct (we want the kernel page tables
included).
Reorganized some of the expressions for efficiency.
Fixed the new pmap_prefault() routine - it would sometimes pick up the
wrong page if the page in the shadow was present but the page in object
was paged out. The routine remains unused and commented out, however.
Explicitly free zero reference count page tables (rather than waiting
for the pagedaemon to do it).

Submitted by: John Dyson


6105 01-Feb-1995 se

Reviewed by: se
Submitted by: wolf (Wolfgang Stanglmeier)
PCI specific code moved to /sys/pci.


6008 29-Jan-1995 bde

Fix disassembly of `bt[crs] $Ib,E'.


5999 29-Jan-1995 ats

Correct a name of one structure member in the sigaltstack structure.
Now it matches the man page and also the only other commercial implementation
I have found so far (Solaris 2.x).
Changed the name from ss_base to ss_sp.


5943 26-Jan-1995 dg

Fix from Doug Rabson for a panic related to not initializing the kernel's
PTD.

Submitted by: John Dyson


5916 26-Jan-1995 dg

Comment out pmap_prefault for the time being (perhaps until after 2.1).
The object_init_pt routine is still enabled and used, however, and this
is where most of the 'pre-faulting' performance improvement comes from.


5914 26-Jan-1995 dg

Make sure that the pages being 'pre-faulted' are currently on a queue.


5910 25-Jan-1995 dg

Be a bit less fast and loose about setting non-cacheability of pages.


5908 25-Jan-1995 bde

Load the kernel symbol table in the boot loader and not at compile time.
(Boot with the -D flag if you want symbols.)

Make it easier to extend `struct bootinfo' without losing either forwards
or backwards compatibility.

ddb_aout.c:
Get the symbol table from wherever the loader put it.
Nuke db_symtab[SYMTAB_SPACE].

boot.c:
Enable loading of symbols. Align them on a page boundary. Add printfs
about the symbol table sizes.
Pass the memory sizes to the kernel.
Fix initialization of `unit' (it got moved out of the loop).
Fix adding the bss size (it got moved inside an ifdef).
Initialize serial port when RB_SERIAL is toggled on.
Fix comments.
Clean up formatting of recently added code.

io.c:
Clean up formatting of recently added code.

netboot/main.c, machdep.c, wd.c:
Change names of bootinfo fields.

LINT:
Nuke SYMTAB_SPACE.
Fix comment about DODUMP.

Makefile.i386:
Nuke use of dbsym.
Exclude gcc symbols from kernel unless compiling with -g.
Remove unused macro.
Fix comments and formatting.

genassym.c:
Generate defines for some new bootinfo fields. Change names of old ones.

locore.s:
Copy only the valid part of the `struct bootinfo' passed by the loader.
Reserve space for symbol table, if any.

machdep.c:
Check the memory sizes passed by the loader, if any. Don't use them yet.

bootinfo.h:
Add a size field so that we can resolve some mismatches between the loader
bootinfo and the kernel boot info. The version number is not so good for
this because of historical botches and because it's harder to maintain.
Add memory size and symbol table fields. Change the names of everything.

Hacks to save a few bytes:

asm.S, boot.c, boot2.S:
Replace `ouraddr' by `(BOOTSEG << 4)'.

boot.c:
Don't statically initialize `loadflags' to 0. Disable the "REDUNDANT"
code that skips the BIOS variables. Eliminate `total'. Combine some
more printfs.

boot.h, disk.c, io.c, table.c:
Move all statically initialized data to table.c.

io.c:
Don't put the A20 gate bits in a variable.


5837 24-Jan-1995 dg

Changed buffer allocation policy (machdep.c)
Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it. (pmap.c)
Fixed a deadlock problem with pv entry allocations (pmap.c)
Added a new, optional function 'pmap_prefault' that does clustered page
table preloading (pmap.c)
Changed the way that page tables are held onto (trap.c).

Submitted by: John Dyson


5771 21-Jan-1995 bde

Don't use mi_switch() to terminate cpu_exit(). Calling it just happened to
work (mi_switch() counted the last timeslice again but this didn't affect
the exiting process' rusage because the rusage has already been finalized).

Remove stale comment.


5770 21-Jan-1995 bde

Remove unused definitions of vm statistics counters. Most of the
counting is now done in C. There are still about 100 unused
definitions for other things.


5769 21-Jan-1995 bde

Don't count context switches here, they are already counted in mi_switch().


5722 19-Jan-1995 ats

Submitted by: Bruce Evans
Put in the much shorter and cleaner version for the calibrate_cycle_counter
for the Pentium that Bruce suggested. Tested here on my Pentium and
it works okay.


5675 17-Jan-1995 bde

The %eflags checking introduced in the previous commit was too zealous.
sigreturn() sometimes failed for ordinary returns from signal handlers.
Failures of ordinary returns "can't happen" and are badly handled.
"Temporary" fix: allow users to corrupt PSL_RF. This is fairly
harmless. A correct fix would involve saving the old %eflags (and
perhaps the old segment registers) where the user can't get at them.


5642 15-Jan-1995 dg

Fixed some page table reference count problems; these changes may not be
complete, but should be closer to correct than before.


5603 14-Jan-1995 bde

Fix security holes in sigreturn(), ptrace() and procfs. sigreturn()
attempted to check for insecure and fatal eflags and segment
selectors, but missed many cases and got the IOPL check back to
front. The other syscalls didn't check at all.

sys_process.c, machdep.c:
Only allow PT_WRITE_U to write to the registers (ordinary and FP).

psl.h, locore.s, machdep.c:
Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed
to assume anything about the reserved bits. Use PSL_USERCHANGE
and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER.

exception.s:
Define a private label for use by doreti when returning to user
mode fails.

machdep.c:
In syscalls, allow changing only the eflags that can be changed on
486's in user mode (no longer attempt to allow benign IOPL changes;
allow changing the nasty PSL_NT; don't allow changing the i586
bits).

Don't attempt to check all the cases involving invalid selectors
and %eip's. Just check for privilege violations and let the invalid
things cause a trap.

procfs_machdep.c:
Call the ptrace register functions to do all the work for reading
and writing ordinary registers and for single stepping.

trap.c:
Ignore traps caused by PSL_NT being set. Previously, users could
cause a fatal trap in user mode by setting PSL_NT and executing an
iret, and a fatal trap in kernel mode by setting PSL_NT and making
a syscall. PSL_NT was cleared too late and not in enough modes to
fix the problem.

Make all traps in user mode (except T_NMI) nonfatal.

Recover from traps caused by attempting to load invalid user
registers in doreti by restarting the traps so that they appear to
occur in user mode.
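
The eflags rule described under machdep.c above (only user-changeable
flags may differ between the old and the new value) can be sketched as a
mask comparison. The flag bit positions are the architectural x86 ones;
the exact membership of PSL_USERCHANGE at this revision is not given in
the log, so SKETCH_USERCHANGE and eflags_change_ok are assumptions for
illustration only.

```c
#include <assert.h>

#define PSL_C		0x00000001U	/* carry */
#define PSL_PF		0x00000004U	/* parity */
#define PSL_AF		0x00000010U	/* auxiliary carry */
#define PSL_Z		0x00000040U	/* zero */
#define PSL_N		0x00000080U	/* sign */
#define PSL_T		0x00000100U	/* trace */
#define PSL_I		0x00000200U	/* interrupt enable */
#define PSL_D		0x00000400U	/* direction */
#define PSL_V		0x00000800U	/* overflow */
#define PSL_IOPL	0x00003000U	/* I/O privilege level */
#define PSL_NT		0x00004000U	/* nested task */

/* Assumed set of flags a user process may change (illustrative). */
#define SKETCH_USERCHANGE \
	(PSL_C | PSL_PF | PSL_AF | PSL_Z | PSL_N | PSL_T | PSL_D | PSL_V | PSL_NT)

/* Nonzero if `newef` differs from `oldef` only in user-changeable bits. */
static int
eflags_change_ok(unsigned int newef, unsigned int oldef)
{
	return (((newef ^ oldef) & ~SKETCH_USERCHANGE) == 0);
}
```

Under this rule a request that sets PSL_NT or the carry flag is accepted,
while one that raises IOPL or clears the interrupt-enable flag is refused.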
---

Fix bogons that I noticed while fixing the above:

psl.h:
Fix some comments.

Uniformize idempotency ifdef.

exception.s, machdep.c:
Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came
out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic
non-unique) trap numbers. Replace rsvd[1-14] by rsvd.

locore.s:
Enable alignment check flag on 486's and 586's.

machdep.c:
Use a better type for kstack[].

Use TFREGP() to find the registers.

Reformat ptrace functions from SEF to something closer to KNF.

procfs_machdep.c:
The wrong pointer to the registers got fixed as a side effect.

Implement reading and writing of FP registers.

/proc/*/*regs now work (only) for processes that are in memory.

Clean up comments.

trap.c, trap.h:
Remove unused trap types.


5588 14-Jan-1995 bde

Eliminate T_KDBTRAP, which will soon go away. It was only used for an
unreachable case label in kdb_trap().

Use the correct case labels in kdb_trap() so that normal ddb entry doesn't
print a message.

Change all printf's to db_printf's. Now you can put a breakpoint at printf,
and ddb entry messages don't spam the syslog output.

Cosmetic:

Use ISPL() instead of magic numbers.

Don't compile the unused function kdb_kbd_trap().

Improve some asms.

Print the arg to Debugger().


5580 14-Jan-1995 dg

Add missing object_lock/unlock.


5516 11-Jan-1995 jkh

Change GENERIC to SWAP_GENERIC for now. I need the GENERIC kernel to
build by default again! When the furor subsides, maybe something better
can be done, but..


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a separate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5431 07-Jan-1995 ats

Work around a compiler bug in gcc2.6.3 in handling (long long) variables and
shifting. Also correct the original code as Garrett noticed it in mail.
Leave the mishandled code in to use it later if future versions of gcc
are correct. The code was part of the calibrate_cyclecounter routine to
get the speed of the pentium chip.


5413 05-Jan-1995 se

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


5351 03-Jan-1995 bde

Use sufficient parentheses in macros.

Remove bogus input operands for fnsave(), fnstcw() and fnstsw().

Change all fwait's to fnop's. This might help avoid hardware bugs.
Wait after fninit with an fnop. This should be safer now.

Fix some spelling and formatting errors.

Use natural sizes for control and status words (u_short, promotes to int).

Don't clobber the SWI_CLOCK_MASK bits in npx0_imask when using IRQ13.

Set the devconf state correctly (always busy, if configured). Improve
code for npx_registerdev() a little (gcc can't keep id->id_unit in a
register for some reason). Don't register a nonexistent npx device.

Print a useful message in npxattach() again (delete references to errors
and not the whole message). Don't print "387 emulator" if there is no
emulator in the kernel.

Use %p for pointers in error messages.

Don't clobber the FPU state when there is an FPU exception. Just clear
the exception flags (after saving the flags as before). This allows
debuggers and SIGFPE handlers to look at the full exception state.
SIGFPE handlers should normally return via longjmp(), which restores a
good FPU state (as before). Returning from a SIGFPE handler may leave
the FPU in the wrong state (as before).

Clear the busy latch _after_ clearing the exception flags so that there
is less chance of getting a bogus h/w interrupt for a control operation.

Clear the saved exception status word when the next FPU instruction is
executed so that it doesn't stick around until the next exception.

Clear the busy latch after fnsave() in npxsave() in case it was set when
npxsave() was called.


5321 31-Dec-1994 jkh

From Bill Paul:

- /sys/i386/i386/swapgeneric.c is just plain broke. But fear not, for I
have unbroken it. One thing that swapgeneric.c does is walk through the
list of configured devices searching for a boot device. The only easy
way to accomplish this in 2.0 is to use Garrett Wollman's kern_devconf
stuff. *BUT*, the head of the kern_devconf linked list (dc_list) is declared
static in /sys/kern/kern_devconf.c. This means that swapgeneric.c can't
see it at link time. I had to remove the 'static' keyword to get around
this little problem. I hope this doesn't break anything anywhere.

*Furthermore,* there's a small matter of making the call to setconf()
in swapgeneric.c disappear when 'config kernel swap generic' isn't used.
You could change /sbin/config to create a dummy setconf() function in
swapkernel.c, but that seems messy somehow. (It's also something of an
'it isn't broken, why are you fixing it' situation.) My solution was to
do what the NetBSD people did and put an #ifdef GENERIC around the call
to setconf(). If your kernel is called GENERIC or you define 'options
GENERIC,' then you can use 'config kernel swap generic' and it'll work.

That aside, the upshot is that: a) swapgeneric.c actually works, and
b) the -a boot flag now works as well. If you boot with -a, as in
"Boot: wd(0,a)/kernel -a" you will be presented with a 'root device?'
prompt after the autoconfig phase, at which point you can specify what
device you want mounted as root. Regrettably, you can't specify an NFS
filesystem. Yet. Three files are affected: /sys/i386/i386/swapgeneric.c,
/sys/i386/i386/autoconf.c and /sys/kern/kern_devconf.c.

Submitted by: wpaul


5291 30-Dec-1994 bde

icu.s:
Move definition of `stat_imask' to clock.c.

clock.c:
Rename `rtcmask' to `stat_imask' and export it. Rename `clkmask' to
`clk_imask' for consistency.

Only calculate TIMER_DIV(hz) once.

Merge debugging and "garbage" code to produce debugging code and format the
output better.

Make writertc() static inline and use it everywhere. Now all accesses to
the clock registers go through rtcin() and writertc().

Move rtc initialization to cpu_initclocks().

Merge enablertclock() with cpu_initclocks() and remove enablertclock().
The extra entry point was just a leftover from 1.1.5.


5220 24-Dec-1994 bde

Obtained from: 1.1.5

Fix single-stepping of emulated FPU instructions.

Don't panic if an FPU instruction is attempted but there is no FPU
and no FPU emulator is configured.


5153 18-Dec-1994 dg

Move page_unhold's in pmap_object_init_pt down one line to guard against
a potential race condition.


5144 18-Dec-1994 dg

Check for PG_FAKE too in pmap_object_init_pt.


5037 11-Dec-1994 dg

Removed inappropriate comment.


5036 11-Dec-1994 dg

Add additional comment.


5035 11-Dec-1994 dg

Fix bogus comment.


4929 03-Dec-1994 bde

i386/exception.s:
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.

i386/genassym.c:
Remove support for the obsolete pcb_iml and pcb_cmap2.

Add support for pcb_inl.

i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.

Remove obsolete support (commented out and otherwise) for pcb_iml.

Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.

Remove obsolete function swtch_to_inactive().

include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLF_INTR() so that accounting for the interrupt state
works.

You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.

include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.

Use part of the new padding for pcb_inl.

isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.


4829 27-Nov-1994 phk

I made a syntax error yesterday.
Submitted by: John Capo


4819 26-Nov-1994 phk

Set the bootverbose if so desired.
if (bootverbose)
Print the geometries the bios passes to us (through the bootblocks).


4600 18-Nov-1994 phk

Grab the bootinfo structure the bootblock passes us.


4517 16-Nov-1994 dg

Allow MAXMEM to be larger than the detected physical memory. This change
was supposed to have already been made, but got botched somewhere.
Don't clobber the last page of memory (where the message buffer is). Some
BIOSes don't gratuitously wipe it out on reboot.


4501 15-Nov-1994 bde

Make gdt_segs[] public again for APM.

Make ldt[] public again and restore currentldt and _default_ldt for
USER_LDT.


4478 14-Nov-1994 bde

Log processes that exit with a masked npx exception that would trap
with the current default exception (un)mask. There should be no such
processes unless you change the mask. Someday the mask should be
changed to the IEEE default of everything masked. The npx state
gets saved so that it can be checked and this may have the side effect
of fixing a bug that was reported for 1.1.5. (npx exceptions may
sometimes leak across exits and clobber another process. I can't see
how this can happen.)

Get some missing/wrong declarations from headers now that the headers
have them.


4476 14-Nov-1994 bde

Oops, the previous commit got the diff for the log message instead of
the following.

Move declarations to and from <machine/segments.h>. Make segment stuff
static if possible.

Remove unused (although initialized) global variables _default_ldt,
currentldt, _gsel_tss (rename the latter to the auto variable
gtel_tss).

Use "correct" and consistent types for interrupt handlers.

Remove a mailing address from the code.

Fix type mismatches found by adding prototypes.


4475 14-Nov-1994 bde

(Bogus several hundred line diff for a log message deleted. See rev 1.91
for the intended log message. -DG)


4463 14-Nov-1994 bde

Undo a previous change. <sys/disklabel.h> was broken, not these files.


4396 12-Nov-1994 ache

Revision 1.6 fix was lost: don't write 0 to RTC_DIAG


4370 12-Nov-1994 phk

Make a kernel sans FFS possible.


4341 10-Nov-1994 ache

Use adjkerntz in inittodr too (for APM stuff)


4221 07-Nov-1994 phk

Added a kernel variable, "dodump" defaulting to zero, which disables dumps.
Somebody should make a mib variable for it.
Just now it is pointless to dump the kernel, since we have nothing which
can read the dump.
Furthermore, it should never be the default to dump.
options DODUMP
will enable dumps.


4217 06-Nov-1994 phk

Initialize %fs and %gs from %ds.
This seems to stabilize the APM-bios on my Gateway Handbook, and it makes
sense in general too.


4201 06-Nov-1994 dg

Do a better job at preparing registers for the new process in setregs()
by setting them all to a known state.


4193 06-Nov-1994 bde

Nuke the losing version of microtime. The assembler version now works
for all reasonable HZ's. HZ > 1000 doesn't work because of sloppy
conversions in hzto() (division by (tick / 1000) == 0). This was
fixed in 1.1.5.
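
As an illustration of the hzto() problem described above, the sketch below
reproduces the integer arithmetic in user space. It assumes `tick' is
computed as 1000000 / hz (microseconds per clock tick); the helper names
tick_for_hz() and ms_per_tick() are hypothetical and are not taken from the
kernel sources.

```c
#include <assert.h>

/* Microseconds per clock tick for a given HZ (assumed definition of `tick'). */
static int tick_for_hz(int hz)
{
    assert(hz > 0);
    return 1000000 / hz;
}

/*
 * The divisor mentioned in the log message: tick / 1000 in integer
 * arithmetic. It truncates to 0 as soon as tick drops below 1000,
 * i.e. for any HZ above 1000.
 */
static int ms_per_tick(int hz)
{
    return tick_for_hz(hz) / 1000;
}
```

With these definitions, ms_per_tick(100) is 10 and ms_per_tick(1000) is 1,
while ms_per_tick(1024) is 0, so any later division by that value would be
a division by zero.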

Eliminate some extern declarations by including the appropriate header
files that now contain appropriate declarations.


4188 06-Nov-1994 bde

Public function declarations moved to <machine/npx.h>.


4180 05-Nov-1994 bde

Maintain a new variable `timer0_overflow_threshold' so that microtime()
doesn't have to calculate it every call.

Rename `timer0_prescale' to `timer0_prescaler_count' and maintain it
correctly. Previously we lost a few 8253 cycles for every "prescaled"
clock interrupt, and the lossage grows rapidly at 16 KHz. Now we
only lose a few cycles for every standard clock interrupt.

Rename `*_divisor' to `*_max_count'.

Do the calculation of TIMER_DIV(rate) only once instead of 3 times each
time the rate is changed.

Don't allow preposterously large interrupt rates. Bug fixes elsewhere
should allow the system to survive rates that saturate the system, however.

Clean up declarations.

Include <machine/clock.h> to check our own declarations.


4116 03-Nov-1994 jkh

Unconditionalize USERCONFIG. Uh, thanks, David.


4038 01-Nov-1994 ache

Implement CPU_ADJKERNTZ in a different way: call resettodr()
on writing this variable. adjkerntz pgm changes will follow.


4031 31-Oct-1994 joerg

Added hooks for an easy drop-in of the pcvt console driver.
Don't panic:-), this is simple stuff just doing exactly the same as for syscons.
(files.i386 did already contain the necessary stuff.)


4014 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for sdtossd().
Cosmetic.

Return from trap() if trap_fatal() returns. trap_fatal() isn't
fatal if you have ddb. Returning from trap() is usually the right
thing to do and much better than falling through.


4013 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for ssdtosd().
Cosmetic.


4012 30-Oct-1994 bde

locore.s:
Build a dummy frame at the top of tmpstk to help debuggers trace the stack
when the system is idle.

swtch.s: idle():
Initialize the frame pointer so that debuggers don't try to trace a bogus
stack.

Load the frame pointer, load the stack pointer and switch out the old
stack in the unique order that never leaves one of the pointers
invalid so that debuggers can trace idle(). Disabling interrupts
provides sufficient validity for normal operation, but debuggers use
(trace) traps.


3962 28-Oct-1994 jkh

From: fredriks@mcs.com (Lars Fredriksen)
...
It turns out that these files do not include <sys/dkbad.h> before
<sys/disklabel.h>.
Submitted by: fredriks


3940 27-Oct-1994 jkh

Julian Elischer's disklabel fixes.


3919 26-Oct-1994 bde

Fix compiler warnings.


3918 26-Oct-1994 bde

Move definition and initialization of video_mode_pointer to syscons.c.


3907 26-Oct-1994 jkh

Invoke userconfig() if kernel compiled with options USERCONFIG and
-c flag used.


3889 26-Oct-1994 jkh

Fix two very minor nits, one of which caused a warning (no return type for
main).


3867 25-Oct-1994 se

BEWARE: Interface change of register_intr() !

Changed the fifth parameter to register_intr() from u_int mask into
u_int *maskptr in preparation for new features (shared interrupts and
removable devices, eg. for PCMCIA).


3861 25-Oct-1994 bde

Use the correct macro for deciding whether syscons' variables should
be accessed.

Remove some unused declarations and document a bogus one.


3846 25-Oct-1994 dg

Allow MAXMEM kernel option to indicate more memory than is detected; it
previously could only be used to limit the amount of memory.


3844 25-Oct-1994 dg

Restricted maximum bufpages to 1500; this is required for machines with >64MB
of memory to work without running out of kernel VM (and increasing it to
even more than it is now (96MB) is out of the question). Changed bufpages
calculation to allocate a little less buffer cache (16% of mem-2MB instead
of 20%); this is simply a better figure for most systems.


3842 25-Oct-1994 dg

Moved initialization of tmpstk so that it immediately follows the kernel
text. Fixed rounding bug that caused the last page of kernel text to be
read/write instead of read-only. This is important now that tmpstk can
crash into it. Removed +4 bias of tmpstk because it screws up ddb's
ability to traceback correctly.


3816 23-Oct-1994 wollman

Finished device configuration database work for all ISA devices (except `ze')
and all SCSI devices (except that it's not done quite the way I want). New
information added includes:

- A text description of the device
- A ``state''---unknown, unconfigured, idle, or busy
- A generic parent device (with support in the m.i. code)
- An interrupt mask type field (which will hopefully go away) so that
. ``doconfig'' can be written

This requires a new version of the `lsdev' program as well (next commit).


3795 22-Oct-1994 phk

Autoconf is the one to realize that we are booted disk-less and start the
ball rolling. locore is just moving some data from the boot-program.


3744 21-Oct-1994 wollman

Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).


3728 20-Oct-1994 phk

Peter Dufault's comconsole changes.

Submitted by: Peter Dufault


3703 19-Oct-1994 wollman

Implement disk_externalize().


3682 18-Oct-1994 ache

Remove CPU_COLORDISP, GIO_COLOR now exists


3661 17-Oct-1994 ache

Ifdef color_display by NSC, pointed out by Rod


3627 15-Oct-1994 ache

Add CPU_COLORDISP sysctl to handle console display type


3612 15-Oct-1994 dg

1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. Page ins/outs
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted separately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.


3513 11-Oct-1994 sos

Ouch, fixed bug in errno translation (ibcs2 support).


3502 10-Oct-1994 phk

minaddr #ifdef lost in previous commit. Sorry.


3495 10-Oct-1994 sos

Hmm, only translate errno when doing an actual return.

Reviewed by: sef@freefall.cdrom.com


3489 10-Oct-1994 phk

locore.s: Made the APM stuff depend on NAPM > 0 rather than a separate
"APM" macro.
machdep.c: Made the APM-descriptors unconditional.
Bruce: if these still conflict with your debugger, please put in a reservation
for your debugger. These three desc. can be anywhere, as long as they are
contiguous, so just move them as needed.


3476 09-Oct-1994 sos

Updated to convert errno return in syscall if conversion table present.


3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


3440 08-Oct-1994 phk

A couple of prototypes moved out from here.


3436 08-Oct-1994 phk

db_disasm.c: Unused var zapped.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.


3426 08-Oct-1994 rgrimes

The correct #ifdef for nfs_diskless support is #ifdef NFS; there will be no
option DISKLESS for the 2.0 nfs diskless support. A 2.0 diskless kernel
simply needs NFS linked in statically.


3406 07-Oct-1994 dg

#ifdef DISKLESS the copying of the nfs_diskless structure. Not the best
solution, but the only one I have time for at the moment.


3384 06-Oct-1994 rgrimes

1. Eliminate unused esym global from locore, our boot code never supported
that and when it does it will be done differently.

2. The kernel now does a frame setup on entry so it ``looks'' like a
real function call. This will be needed by future boot code and
debuggers.

3. Clean up stack offsets to all be in decimal and use %ebp when copying
parameters in from the boot code.

4. Implement version 1 of the uniform boot code passing mechanism with
support for kernelname passing and nfs_diskless structure passing.

5. Document the 3 different ways the kernel is called depending on what code
is calling it.


3367 04-Oct-1994 ache

Add code to handle CPU_DISRTCSET


3366 04-Oct-1994 ache

Add disable_rtc_set variable to block resettodr() call, needed for
adjkerntz -i, per Bruce's suggestion


3355 04-Oct-1994 ache

RTC_CENTURY usage ifdefed out by USE_RTC_CENTURY compile option,
pointed out by Bruce


3314 02-Oct-1994 dg

Patch from HOSOKAWA Tatsumi to fix bug in the size of apm_current_gdt_pdesc

Submitted by: HOSOKAWA Tatsumi


3306 02-Oct-1994 phk

Unused variables, except one with an ominous comment.


3294 02-Oct-1994 rgrimes

If you are building a kernel without NFS statically linked in you
must #define NFS before including <sys/mount.h> to pick up some of
the definitions needed for struct diskless. Be sure to undef it after this
so you do not affect other code.

This is kinda sick, but it does the job. Problem found by davidg.


3291 02-Oct-1994 dg

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


3284 02-Oct-1994 rgrimes

1. Remove all references to cyloffset, it has been unused for some time.

2. New detection code so we know what boot code called us.

3. Remove old DISKLESS support code and halt if we are called by that boot
code as it will NOT work with the new nfs_diskless structure.

This is really in preparation for new boot code and new diskless support.

Reviewed by: davidg


3283 02-Oct-1994 rgrimes

Add code to generate NFSDISKLESS_SIZE for use in locore for copying the
nfs_diskless structure in from the boot code.
Reviewed by: davidg


3258 01-Oct-1994 dg

Laptop Advanced Power Management support by HOSOKAWA Tatsumi.

Submitted by: HOSOKAWA Tatsumi


3185 29-Sep-1994 sos

Updated pcaudio.c to latest from 1.1.5.1
Enabled timer reprogramming in clock.c (this could use more work).

Obtained from: FreeBSD-1.1.5.1


3156 28-Sep-1994 bde

Ensure normal selection and alignment of the text and data sections before
including files. vector.s sometimes left the data section misaligned
(depending on the configuration) so all the time-critical globals in icu.s
were sometimes misaligned.


3117 26-Sep-1994 pst

Make Cyrix CPU flush internal cache any time it goes into hold state.
(Meant to commit this a long time ago... oh well).


3102 25-Sep-1994 dg

Inlined ins/outs functions.

Obtained from: NetBSD


3047 24-Sep-1994 dg

Nuked splnet before sync. Not only is this unnecessary, but it appears
to cause problems by making it impossible to sync NFS related buffers
when rebooting.


2977 22-Sep-1994 dg

From 1.1.5:

>revision 1.8
>date: 1994/06/03 06:42:30; author: davidg; state: Exp; lines: +2 -2
>Patch from Bruce Evans: npxintr() needs to mask softclock().


2932 20-Sep-1994 bde

Don't lose the RTC interrupt in resettodr().


2913 20-Sep-1994 ache

resettodr() implemented, inittodr() fixed
Submitted by: me & chris@gnome.co.uk


2873 18-Sep-1994 bde

Remove some unnecessary #includes.

Restore the simple leap year calculation as a macro and document it so
that it doesn't become complicated again. The simple version works
for all leap years covered by 32-bit time_t's. The complicated version
doesn't work for all leap years covered by 64-bit time_t's since among
other reasons, the solar system is not stable for long enough.
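
A minimal sketch of why the simple rule is sufficient, assuming a signed
32-bit time_t (roughly 1901 through 2038). The macro name SIMPLE_LEAPYEAR
and the helper functions are hypothetical; the macro actually committed may
be spelled differently.

```c
#include <assert.h>

/* Simple rule: every fourth year is a leap year (hypothetical macro name). */
#define SIMPLE_LEAPYEAR(y) (((y) % 4) == 0)

/* Full Gregorian rule, used here only for comparison. */
static int gregorian_leapyear(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Count the years in [first, last] on which the two rules disagree. */
static int leap_rule_mismatches(int first, int last)
{
    int y, n = 0;

    assert(first <= last);
    for (y = first; y <= last; y++)
        if (SIMPLE_LEAPYEAR(y) != gregorian_leapyear(y))
            n++;
    return n;
}
```

The two rules only disagree on century years not divisible by 400 (1900,
2100, ...). None of those fall inside the 32-bit range, because 2000 is a
leap year under both rules.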

Fix declarations.

Nuke spinwait().


2858 18-Sep-1994 wollman

Redo Kernel NTP PLL support, kernel side.

This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:

1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.

2) mono_time does not participate in the PLL adjustments.

3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.


2826 16-Sep-1994 dg

Removed inclusion of pio.h and cpufunc.h (cpufunc.h is included from
systm.h). Merged functionality of pio.h into cpufunc.h. Cleaned up some
related code.


2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


2818 16-Sep-1994 ache

CPU_ADJKERNTZ added to cpu_sysctl


2802 15-Sep-1994 paul

Removed some macros that are now in cpufunc.h
Reviewed by: Bruce


2792 15-Sep-1994 dg

Brought over from 1.1.5:

From Bruce Evans:
Protect against reentering Debugger().


2789 15-Sep-1994 dg

Brought over from 1.1.5:

Fix from Bruce Evans. There were missing sets of parentheses:

1. The checks for the standard data selectors were botched, so %ss == 0
and probably %cs == 0 were allowed. A fix is enclosed. The checks
for the standard selectors could be omitted without losing anything
since the standard selectors pass the valid_ldt_sel() tests.


2783 15-Sep-1994 sos

Added support for many more videomodes, including graphic modes up to
320x200 256col VGA. This is necessary for the iBCS stuff to work right.
(And we get the benefit of more video modes). Uses the videocard BIOS
to obtain mode tables.
Added a "green" saver, switches off the syncs for "green" monitors.


2772 14-Sep-1994 wollman

Beginnings of support for loadable protocol domains. In particular,
don't hard-code netisr values in icu.s, but rather, use an array of
function pointers and set them all up in machdep.c for statically-linked
protocol families. (This will eventually be done differently.)
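
One way to picture the array-of-function-pointers approach, as a user-space
sketch. The names netisr_table, register_netisr(), dispatch_netisrs() and
the size NNETISR are assumptions made for illustration; they are not
identifiers confirmed by this log.

```c
#include <assert.h>
#include <stddef.h>

#define NNETISR 32               /* hypothetical: one slot per pending bit */

typedef void (*netisr_fn)(void);

static netisr_fn netisr_table[NNETISR];  /* NULL means "no protocol here" */

/* Stand-in handler so the sketch can be exercised; counts its calls. */
static int ip_calls;
static void fake_ipintr(void) { ip_calls++; }

/* Install a handler at run time instead of hard-coding it in assembly. */
static int register_netisr(int num, netisr_fn fn)
{
    if (num < 0 || num >= NNETISR || fn == NULL)
        return -1;
    netisr_table[num] = fn;
    return 0;
}

/* Call every registered handler whose bit is set; return how many ran. */
static int dispatch_netisrs(unsigned pending)
{
    int i, called = 0;

    for (i = 0; i < NNETISR; i++) {
        if ((pending & (1u << i)) != 0 && netisr_table[i] != NULL) {
            netisr_table[i]();
            called++;
        }
    }
    return called;
}
```

A statically linked protocol family would call register_netisr() once at
startup, and a loadable one could do the same when it is loaded.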


2770 14-Sep-1994 ache

1. adjkerntz variable added for preparation to resettodr() implementation
2. Leap year calculations fixed


2689 12-Sep-1994 dg

Eliminated a whole pile of ancient (we're talking 4.3BSD) VM system
related #define constants. Corrected incorrect VM_MAX_KERNEL_ADDRESS.

Reviewed by: John Dyson


2660 11-Sep-1994 dg

Be more careful about dereferencing curproc, p_vmspace, and curpcb,
otherwise the machine will overflow the stack in a recursive fault loop
(causing the machine to spontaneously reboot because of the stack fault
that ultimately happens).

Submitted by: Inspired by Bruce Evans, but this change is different
than what he suggested.


2631 09-Sep-1994 wollman

Define new MIB variable, hw.floatingpoint, which is true if FP hardware
is present, and false if an emulator is being used.


2578 08-Sep-1994 bde

Remove <machine/eflags.h> and all dependencies on it. eflags.h is just
the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach
version has slightly better names for many macros but is now out of
date and little used. It was originally used even less (for spelling
PSL_T as EFL_TF in <machine/db_machdep.h>).


2512 05-Sep-1994 bde

Fix comments.


2500 05-Sep-1994 dg

Don't allow I/O register access in process 1 (oops).


2495 04-Sep-1994 pst

Detect if we're running on a Cyrix 486DLC and enable automatic cache
negation whenever we access memory between 640k and 1M.

Original code from NetBSD 1.0-BETA. The exact origins are unclear but
Theo de Raadt, Charles, and Michael V. may have contributed to it.

Submitted by: pst


2493 04-Sep-1994 dg

Rewrote last vestige of code that used gs (copyinstr). The use of gs in
this routine caused problems for machines that don't set it up properly
before boot (such was the case on an EVEREX machine sitting next to me).


2492 04-Sep-1994 dg

Added pmap_mapdev() function to map device memory.


2486 04-Sep-1994 dg

Initialize eflags register - brought over from 1.1.5.


2457 02-Sep-1994 dg

Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.


2455 02-Sep-1994 dg

Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update().
Made pmap_update an inline assembly function.


2452 02-Sep-1994 dg

It's not necessary to make page tables write-through, so get rid of this
(this was an experimental change which probably shouldn't have been
committed). I/O pages are still marked non-cacheable, however.


2441 01-Sep-1994 dg

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


2430 31-Aug-1994 se

Reviewed by: Stefan Esser <se>
Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Added PCI support (call of pci_config(), if NPCI > 0).


2426 31-Aug-1994 dg

Fixed bug that surfaced with last commit for NOBOUNCE -> BOUNCE_BUFFERS by
adding appropriate #ifdefs and changing some variables to externs (as they
should have always been).


2422 31-Aug-1994 dg

Rather than exclude bounce buffers support with NOBOUNCE, include it
with BOUNCE_BUFFERS. This is more intuitive, and is better for future
multiplatform support. Added BOUNCE_BUFFERS option to the GENERIC and
LINT kernel config files.


2410 30-Aug-1994 bde

Don't define LOCORE (as nothing) in sources. It is now defined
consistently (as 1) in Makefile.i386 for all assembler sources.


2400 29-Aug-1994 ache

Fake floppy partition RAW_PART=2 now


2357 28-Aug-1994 bde

Don't test if a u_int is < 0. The remaining test is sufficient and the
extra one caused a warning.


2320 27-Aug-1994 dg

1) Changed ddb into an option rather than a pseudo-device (use options DDB
in your kernel config now).
2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its
own file.
3) Added \r handling in db_printf.
4) Added missing memory usage stats to statclock().
5) Added dummy function to pseudo_set so it will be emitted if there
are no other pseudo declarations.


2257 24-Aug-1994 sos

Changes preparing for iBCS support


2254 24-Aug-1994 sos

Changes preparing for iBCS2 support


2216 22-Aug-1994 bde

Pad `_cpu_vendor' to finish on a 32-bit boundary so that most of the
locore globals aren't misaligned.


2152 20-Aug-1994 dg

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


2138 19-Aug-1994 dg

Removed bogus save of CMAP2.


2124 19-Aug-1994 dg

Terry Lambert's loadable kernel module support w/improvements from the
NetBSD group.


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


2103 18-Aug-1994 dg

Bruce Evans' dynamic interrupt support.

/usr/src/sys/i386/isa/clock.c:
o Garrett's statclock changes.
o Wire xxxintr, not Vclk.
o Wire using register_intr(), not setidt().

/usr/src/sys/i386/isa/icu.s:
o Garrett's statclock changes.
o Removed unused variable high_imask.
o Fake int 8 for rtc as well as int 0 for clk. Required for kernel
profiling with statclock, harmless otherwise.

/usr/src/sys/i386/isa/isa.c:
o Allow isdp->id_irq and other things in *isdp to be changed by
probes. Changing interrupts later requires direct calls to
register_intr() and unregister_intr() and more care.
ALLOW_CONFLICT_* is brought over from 1.1.5, except
ALLOW_CONFLICT_IRQ is not supported. IRQ conflict checking is
delayed until after probing so that drivers can change the IRQ
to a free one; real conflicts require more cooperation between
drivers to handle.
o Too many details to list.
o This file requires splitting and a lot more work.

/usr/src/sys/i386/isa/isa_device.h:
o Declare more things more completely.

/usr/src/sys/i386/isa/sio.c:
o Prepare to register interrupt handlers as fast.

/usr/src/sys/i386/isa/vector.s:
o Generate entry code for 16 fast interrupt handlers and 16 normal
interrupt handlers. Changed some constants to variables:
# $unit is now intr_unit[intr]. Type is int. Someday it should
be a cookie suitable for the handler (e.g., a struct com_s for
sio).
# $handler is now intr_handler[intr].
# intrcnt_actv[id_num] is now *intr_countp[intr]. The indirection
is required to get a contiguous range of counters for vmstat
and so that the counts depend more on the driver than on the
interrupt number (drivers could take turns using an interrupt
and the counts would remain correct). There is a separate
counter for each device and for each stray interrupt. In
1.1.5, stray interrupt 7 clobbers the count for device 7 or
something worse if there is no device 7 :-(.
# mask is now intr_mask[intr] (was already indirect).
o Entry points are now _XintrI and _XfastintrI (I = intr = 0-15),
not _VdevU (U = unit).
o Removed BUILD_VECTORS stuff. There's a trace of it left for
the string table for vmstat but config now generates the
string in one piece because nothing more is required.
o Removed old handling of stray interrupts and older comments
about it.

Submitted by: Bruce Evans


2074 15-Aug-1994 wollman

Enable use of the RTC chip for the statistical clock. While this does
not provide the full accuracy of a randomized statistical clock, it does
provide greater accuracy than the previous method, while not significantly
increasing overhead. It also provides profiling support at 1024 Hz.

You must re-compile config before making a new kernel, or you will end
up with unresolved symbols.

Reviewed by: Bruce Evans said it worked for him.


2059 13-Aug-1994 dg

Made the kernel compile cleanly with gcc 2.6.0. Thanks go to Bruce
Evans for suggesting a method to detect various versions of gcc.


2056 13-Aug-1994 wollman

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


2017 11-Aug-1994 wollman

For Pentium machines, use a faster version of microtime with 8 usec
resolution (can probably be improved somewhat). Other machines take
a three-instruction hit if I586_CPU is defined, none otherwise.


2014 10-Aug-1994 wollman

Tell Pentium users their CPU speed. (More changes to make use of this
to come later.)


2001 10-Aug-1994 wollman

Handle NMI's in accordance with data in van Gilluwe book.


1999 10-Aug-1994 wollman

Some programs (like GNU configure programs) depend on the output of
`uname -s' to be something reasonable (traditionally, `i386') rather
than `PC-Class'. Make it so.


1998 10-Aug-1994 wollman

Add back in CPU detection code from 1.1.5. As an added bonus, the
hw.model MIB variable is now declared correctly.


1975 09-Aug-1994 dg

Removed ntohl and ntohs functions. These were already inlined assembly in
endian.h.


1896 07-Aug-1994 dg

Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that
are no longer needed because of this.


1895 07-Aug-1994 dg

Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by: John Dyson


1894 07-Aug-1994 dg

Don't kremove process VM pages (oops!). This was the cause of the instability
that was introduced last night.

Submitted by: John Dyson


1890 06-Aug-1994 dg

Fixed various prototype problems with the pmap functions and the subsequent
problems that fixing them caused.


1889 06-Aug-1994 dg

Incorporated 1.1.5 improvements to the bounce buffer code (i.e. make it
actually work), and additionally improved its performance via new pmap
routines and "pbuf" allocation policy.

Submitted by: John Dyson


1888 06-Aug-1994 dg

Made the tmpstk start at tmpstk. Not doing so causes problems for the
debugger.

Submitted by: John Dyson


1887 06-Aug-1994 dg

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routines allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


1838 04-Aug-1994 dg

Added assembly versions of ffs() and bcmp().


1829 04-Aug-1994 dg

Nuked #if 0'd _insque and _remque routines - they are now inlined in
cpufunc.h.


1825 03-Aug-1994 dg

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.
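
A minimal sketch of the TLB flush optimization described in item 1 under
"me:" above, assuming a page table entry whose bit 0 is the valid
("present") bit. The names pte_update, fake_tlbflush and PTE_VALID are
hypothetical and are not taken from pmap.c; the counter merely stands in
for a real TLB flush so that the effect can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers used in pmap.c. */
#define PTE_VALID 0x001u            /* valid ("present") bit of a PTE */

static unsigned tlb_flush_count;    /* stands in for a real TLB flush */

static void fake_tlbflush(void)
{
    tlb_flush_count++;
}

/*
 * Replace a page table entry. The flush is issued only when the old entry
 * was valid, since an entry that was never present cannot be cached in
 * the TLB. Returns true if a flush was performed.
 */
static bool pte_update(uint32_t *pte, uint32_t newval)
{
    bool was_valid = (*pte & PTE_VALID) != 0;

    *pte = newval;
    if (was_valid)
        fake_tlbflush();
    return was_valid;
}
```

Installing a mapping into an empty entry therefore costs no flush, while
changing or removing a live mapping still does.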

Reviewed by: John Dyson


1810 01-Aug-1994 dg

Removed all code related to the pagescan daemon, and changed 'act_count'
adjustments to compensate for a world without the pagescan daemon.


1705 11-Jun-1994 dg

Fix from Bruce Evans:
Set npx_exists = 0 in the case of broken error reporting.


1704 11-Jun-1994 dg

Fixed minor spelling error.


1703 11-Jun-1994 dg

Bruce found a bug in my changes to stop using the gs selector.

From Bruce Evans:

fu[i]byte() checked the wrong register. This caused interesting behaviour
in the GPL math emulator. The emulator does not check the values returned
by fu*() or su*() (:-() and it interpreted the address of -12(%ebp) as
-1(%ebp). The same probably occurs for all signed 8-bit offsets from
registers.

I cleaned up the new bzero() a bit.


1691 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1690 06-Jun-1994 dg

trap.c:
Vastly improved trap.c from me. This rewritten version has a variety of
features, among them: higher performance and much higher code quality.

support.s, cpufunc.h:
No longer use gs override to enforce range limits - compare directly
against VM_MAXUSER_ADDRESS instead. The old way caused problems in
preserving the gs selector...and this method is just as fast or faster.
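
A rough C sketch of the direct comparison described above (the real code
is assembly in support.s). The limit value below is a placeholder chosen
for illustration only, not the kernel's actual VM_MAXUSER_ADDRESS, and
user_range_ok is a hypothetical helper name.

```c
#include <stdint.h>

/* Placeholder limit for illustration; NOT the kernel's real constant. */
#define SKETCH_MAXUSER_ADDRESS 0xF0000000u

/*
 * Return 1 if the range [addr, addr + len) lies entirely below the user
 * limit, 0 otherwise. All comparisons are unsigned, so a range that
 * wraps past the top of the address space is rejected as well.
 */
static int user_range_ok(uint32_t addr, uint32_t len)
{
    uint32_t end = addr + len;

    if (end < addr)         /* wrapped around 2^32 */
        return 0;
    return end <= SKETCH_MAXUSER_ADDRESS;
}
```

The check is a compare and a branch, with no segment register load.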


1689 06-Jun-1994 dg

Back out previous change for the moment - I need to commit some other
changes first.


1688 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1678 04-Jun-1994 dg

Removed extra (bogus) declaration of Xrsvd14 that was confusing me.


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1445 03-May-1994 ats

Add two routines insl and outsl, that should do 32bit string ins and outs.
Both are completely untested at the moment. They are used from the
if_ep.c driver for the EISA card.


1442 02-May-1994 sos

Update the reprogram timer stuff, now the frequency of timer 0
can only be changed at the "right" times. Accuracy should be
assured.


1440 02-May-1994 dg

Removed some tlbflush optimizations as some of them were bogus and lead
to some strange behavior.


1431 29-Apr-1994 gclarkii

Added ifdef for GPL_MATH_EMULATE to keep the system from panicking when
using it.


1415 25-Apr-1994 dg

From John Dyson:

Fixed physio in the 386 case - write faults weren't properly implemented.


1407 23-Apr-1994 wollman

Define new option, INACCURATE_MICROTIME_IS_OK. When this is defined,
the NTP kernel PLL is disabled, and acquire_timer0() is enabled, thus
opening the door for microtime() (and hence gettimeofday()) to return
bogus timestamps. This option is necessary for the `pca' driver to
work, but is implemented to underscore the fact that accurate timekeeping
and the `pca' driver are incompatible at present. If someone writes a version
of microtime() that works when the `pca' driver is being used, this can get
junked.


1390 21-Apr-1994 sos

New support for sharing the timers
acquire_timer / release_timer

Pulled in timer related functions from isa.c


1379 20-Apr-1994 dg

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


1362 14-Apr-1994 dg

Changes from John Dyson and myself:

1) Removed all instances of disable_intr()/enable_intr() and changed
them back to splimp/splx. The previous method was done to improve
the performance, but Bruce's recent changes to inline spl* have
made this unnecessary.
2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too.
3) Added a new mechanism for collecting page statistics - now done by
a new system process "pagescan". Previously this was done by the
pageout daemon, but this proved to be impractical.
4) Improved the page usage statistics gathering mechanism - performance is
much improved in small memory machines.
5) Modified mbuf.h to enable the support for an external free routine when
using mbuf clusters. Added appropriate glue in various places to
allow this to work.
6) Adapted a suggested change to the NFS code from Yuval Yurom to take
advantage of #5.
7) Added fault/swap statistics support.


1342 07-Apr-1994 dg

Make Bruce happy: silently enter ddb on a BPT or trace trap if ddb is
configured in the kernel.


1335 05-Apr-1994 dg

from John Dyson:

1) fixed some bugs related to the bounce buffer code
2) vnode pager now supports clustered pageouts
3) experimental code for clustering all I/O via a new "cldisksort"
4) added >16MB check to Bustek driver
5) made some experimental algorithmic changes to the pageout daemon
6) fixed bugs in truncating mapped files (esp when mapped via NFS)
7) reorganized vnode pager I/O code


1321 02-Apr-1994 dg

New interrupt code from Bruce Evans. In addition to Bruce's attached
list of changes, I've made the following additional changes:

1) i386/include/ipl.h renamed to spl.h as the name conflicts with the
file of the same name in i386/isa/ipl.h.
2) changed all use of *mask (i.e. netmask, biomask, ttymask, etc) to
*_imask (net_imask, etc).
3) changed vestige of splnet use in if_is to splimp.
4) got rid of "impmask" completely (Bruce had gotten rid of netmask),
and are now using net_imask instead.
5) dozens of minor cruft to glue in Bruce's changes.

These require changes I made to config(8) as well, and thus it must
be rebuilt.

-DG

from Bruce Evans:

sio:
o No diff is supplied. Remove the define of setsofttty(). I hope
that is enough.

*.s:
o i386/isa/debug.h no longer exists. The event counters became too
much trouble to maintain. All function call entry and exception
entry counters can be recovered by using profiling kernel (the new
profiling supports all entry points; however, it is too slow to
leave enabled all the time; it also). Only BDBTRAP() from debug.h
is now used. That is moved to exception.s. It might be worth
preserving SHOW_BITS() and calling it from _mcount() (if enabled).
o T_ASTFLT is now only set just before calling trap().
o All exception handlers set SWI_AST_MASK in cpl as soon as possible
after entry and arrange for _doreti to restore it atomically with
exiting. It is not possible to set it atomically with entering
the kernel, so it must be checked against the user mode bits in
the trap frame before committing to using it. There is no place
to store the old value of cpl for syscalls or traps, so there are
some complications restoring it.

Profiling stuff (mostly in *.s):
o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet.
o All interesting labels `foo' are renamed `_foo' and all
uninteresting labels `_bar' are renamed `bar'. A small change
to gprof allows ignoring labels not starting with underscores.
o MCOUNT_LABEL() is to provide names for counters for times spent
in exception handlers.
o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception
handlers. Its arg is the pc where the exception occurred. The
new mcount() pretends that this was a call from that pc to a
suitable MCOUNT_LABEL().
o MEXITCOUNT is to turn off any timer started by MCOUNT().

/usr/src/sys/i386/i386/exception.s:
o The non-BDB BPTTRAP() macros were doing a sti even when interrupts
were disabled when the trap occurred. The sti (fixed) sti is
actually a no-op unless you have my changes to machdep.c that make
the debugger trap gates interrupt gates, but fixing that would
make the ifdefs messier. ddb seems to be unharmed by both
interrupts always disabled and always enabled (I had the branch in
the fix back to front for some time :-().
o There is no known pushal bug.
o tf_err can be left as garbage for syscalls.

/usr/src/sys/i386/i386/locore.s:
o Fix and update BDE_DEBUGGER support.
o ENTRY(btext) before initialization was dangerous.
o Warm boot shot was longer than intended.

/usr/src/sys/i386/i386/machdep.c:
o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require
other changes.
Use the following:
o Remove aston() and setsoftclock().
Maybe use the following:
o No netisr.h.
o Spelling fix.
o Delay to read the Rebooting message.
o Fix for vm system unmapping a reduced area of memory
after bounds_check_with_label() reduces the size of
a physical i/o for a partition boundary. A similar
fix is required in kern_physio.c.
o Correct use of __CONCAT. It never worked here for non-
ANSI cpp's. Is it time to drop support for non-ANSI?
o gdt_segs init. 0xffffffffUL is bogus because ssd_limit
is not 32 bits. The replacement may have the same
value :-), but is more natural.
o physmem was one page too low. Confusing variable names.
Don't use the following:
o Better numbers of buffers. Each 8K page requires up to
16 buffer headers. On my system, this results in 5576
buffers containing [up to] 2854912 bytes of memory.
The usual allocation of about 384 buffers only holds
192K of disk if you use it on an fs with a block size
of 512.
o gdt changes for bdb.
o *TGT -> *IDT changes for bdb.
o #ifdefed changes for bdb.
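
The buffer figures quoted under "Better numbers of buffers" can be checked
with a few lines of C. This is only a worked-arithmetic sketch: it assumes
the 512-byte block size mentioned in the text applies to both quoted
figures, and the helper names are made up for the example.

```c
/* Sizes taken from the text above: 8K pages, 512-byte filesystem blocks. */
enum { SKETCH_PAGE_BYTES = 8192, SKETCH_BLOCK_BYTES = 512 };

/* Buffer headers needed to cover one page with the smallest blocks. */
static unsigned long headers_per_page(void)
{
    return SKETCH_PAGE_BYTES / SKETCH_BLOCK_BYTES;
}

/* Bytes of disk data held by nbuf buffers of bsize bytes each. */
static unsigned long bytes_held(unsigned long nbuf, unsigned long bsize)
{
    return nbuf * bsize;
}
```

With these definitions, headers_per_page() gives 16, bytes_held(384, 512)
gives 196608 (192K), and bytes_held(5576, 512) gives 2854912, matching the
numbers in the message.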

/usr/src/sys/i386/i386/microtime.s:
o Use the correct asm macros. I think asm.h was copied from Mach
just for microtime and isn't used now. It certainly doesn't
belong in <sys>. Various macros are also duplicated in
sys/i386/boot.h and libc/i386/*.h.
o Don't switch to and from the IRR; it is guaranteed to be selected
(default after ICU init and explicitly selected in isa.c too, and
never changed until the old microtime clobbered it).

/usr/src/sys/i386/i386/support.s:
o Non-essential changes (none related to spls or profiling).
o Removed slow loads of %gs again. The LDT support may require
not relying on %gs, but loading it is not the way to fix it!
Some places (copyin ...) forgot to load it. Loading it clobbers
the user %gs. trap() still loads it after certain types of
faults so that fuword() etc can rely on it without loading it
explicitly. Exception handlers don't restore it. If we want
to preserve the user %gs, then the fastest method is to not
touch it except for context switches. Comparing with
VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on
a 486, while loading %gs takes 9 cycles and using it takes
another.
o Fixed a signed branch to unsigned.

/usr/src/sys/i386/i386/swtch.s:
o Move spl0() outside of idle loop.
o Remove cli/sti from idle loop. sw1 does a cli, and in the
unlikely event of an interrupt occurring and whichqs becoming
zero, sw1 will just jump back to _idle.
o There's no spl0() function in asm any more, so use splz().
o swtch() doesn't need to be superaligned, at least with the
new mcounting.
o Fixed a signed branch to unsigned.
o Removed astoff().

/usr/src/sys/i386/i386/trap.c:
o The decentralized extern decls were inconsistent, of course.
o Fixed typo MATH_EMULTATE in comments. */
o Removed unused variables.
o Old netmask is now impmask; print it instead. Perhaps we
should print some of the new masks.
o BTW, trap() should not print anything for normal debugger
traps.

/usr/src/sys/i386/include/asmacros.h:
o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros
as necessary.

/usr/src/sys/i386/include/cpu.h:
o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal
while the kernel is running.
o Don't use var++ to set boolean variables. It fails after a mere
4G times :-) and is slower than storing a constant on [3-4]86s.
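
A small sketch of why var++ is a poor way to set a flag. To keep the
wraparound cheap to observe it uses an 8-bit flag, which wraps after 256
increments; a 32-bit variable wraps after 2^32 (the "4G" above) in the
same way. The function names are hypothetical.

```c
#include <stdint.h>

/* "Set" the flag n times by incrementing it; returns the final value. */
static uint8_t set_by_increment(uint8_t flag, unsigned n)
{
    while (n--)
        flag++;             /* wraps back to 0 every 256 increments */
    return flag;
}

/* Set the flag n times by storing a constant; returns the final value. */
static uint8_t set_by_store(uint8_t flag, unsigned n)
{
    while (n--)
        flag = 1;
    return flag;
}
```

After 256 calls the incremented flag reads as false again, while the
stored constant stays at 1 no matter how often it is set.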

/usr/src/sys/i386/include/cpufunc.h:
o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of
<machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by
almost everything for the inlines.

/usr/src/sys/i386/include/ipl.h:
o New file. Defines spl inlines and SWI macros and declares most
variables related to hard and soft interrupt masks.

/usr/src/sys/i386/isa/icu.h:
o Moved definitions to <machine/ipl.h>

/usr/src/sys/i386/isa/icu.s:
o Software interrupts (SWIs) and delayed hardware interrupts (HWIs)
are now handled uniformly, and dispatching them from splx() is
more like dispatching them from _doreti. The dispatcher is
essentially (*handler[ffs(ipending & ~cpl)])().
o More care (not quite enough) is taken to avoid unbounded nesting
of interrupts.
o The interface to softclock() is changed so that a trap frame is
not required.
o Fast interrupt handlers are now handled more uniformly.
Configuration is still too early (new handlers would require
bits in <machine/ipl.h> and functions to vector.s).
o splnnn() and splx() are no longer here; they are inline functions
(could be macros for other compilers). splz() is the nontrivial
part of the old splx().
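
The dispatcher expression above can be sketched in portable C roughly as
follows. This is an illustration only: the real dispatcher is assembly in
icu.s, the array size and the two example handlers are assumptions, and
the sketch subtracts one from the ffs() result because the C library's
ffs() numbers bits from 1.

```c
#include <stddef.h>
#include <strings.h>    /* ffs() */

#define SKETCH_NHANDLERS 32

typedef void (*handler_t)(void);

static unsigned ipending;                   /* pending interrupt bits */
static unsigned cpl;                        /* currently masked bits */
static handler_t handler[SKETCH_NHANDLERS];

static int fired[SKETCH_NHANDLERS];         /* record of handlers run */
static int nfired;

/* Example handlers that only record which bit was serviced. */
static void on_bit1(void) { fired[nfired++] = 1; }
static void on_bit4(void) { fired[nfired++] = 4; }

/* Run every pending handler that is not masked by cpl, lowest bit first. */
static void dispatch_pending(void)
{
    unsigned ready;

    while ((ready = ipending & ~cpl) != 0) {
        int bit = ffs((int)ready) - 1;      /* ffs() is 1-based */

        ipending &= ~(1u << bit);           /* clear before calling */
        if (handler[bit] != NULL)
            handler[bit]();
    }
}
```

A bit that is pending but masked by cpl stays set in ipending and is
picked up by a later call once cpl is lowered, which is the job the text
assigns to splz().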

/usr/src/sys/i386/isa/ipl.h
o New file. Supposed to have only bus-dependent stuff. Perhaps
the h/w masks should be declared here.

/usr/src/sys/i386/isa/isa.c:
o DON'T APPLY ALL OF THIS DIFF. You need only things involving
*mask and *MASK and comments about them. netmask is now a pure
software mask. It works like the softclock mask.

/usr/src/sys/i386/isa/vector.s:
o Reorganize AUTO_EOI* macros.
o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust
fastintr handlers.
o fastintr handlers need to metamorphose into ordinary interrupt
handlers if their SWI bit has become set. Previously, sio had
unintended latency for handling output completions and input
of SLIP framing characters because this was not done.

/usr/src/sys/net/netisr.h:
o The machine-dependent stuff is now imported from <machine/ipl.h>.

/usr/src/sys/sys/systm.h
o DON'T APPLY ALL OF THIS DIFF. You need mainly the different
splx() prototype. The spl*() prototypes are duplicated as
inlines in <machine/ipl.h> but they need to be duplicated here
in case there are no inlines. I sent systm.h and cpufunc.h
to Garrett. We agree that spl0 should be replaced by splnone
and not the other way around like I've done.

/usr/src/sys/kern/kern_clock.c
o splsoftclock() now lowers cpl so the direct call to softclock()
works as intended.
o softclock() interface changed to avoid passing the whole frame
(some machines may need another change for profile_tick()).
o profiling renamed _profiling to avoid ANSI namespace pollution.
(I had to improve the mcount() interface and may as well fix it.)
The GUPROF variant doesn't actually reference profiling here,
but the 'U' in GUPROF should mean to select the microtimer
mcount() and not change the interface.


1314 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1313 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1312 30-Mar-1994 dg

New routine "pmap_kenter", designed to take advantage of the special
case of the kernel pmap.


1307 24-Mar-1994 dg

From John Dyson: performance improvements to the new bounce buffer
code.


1298 23-Mar-1994 dg

Bounce buffers. From John Dyson with help from me.


1291 21-Mar-1994 ache

Now printf("changing root... indicates raw partition for floppy
f.e. fd1d


1290 21-Mar-1994 ache

Fix printf for root system mounted on second floppy


1289 21-Mar-1994 ache

Fix for root system mounted on second floppy


1288 21-Mar-1994 dg

Changed dynamic stack grow code to grow by "SGROWSIZ" amount. Initially
allocate SGROWSIZ amount of stack. Also set vm_ssize to the initial
stack VM size. Increased DFLSSIZ stack rlimit default to 8MB.


1281 19-Mar-1994 wollman

Added cpu_model and machine variables.


1262 14-Mar-1994 dg

Performance improvements from John Dyson.

1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.


1247 07-Mar-1994 dg

1) enhanced in_cksum from Bruce Evans.
2) minor comment change in machdep.c
3) enhanced bzero from John Dyson (twice as fast on a 486DX/33)


1246 07-Mar-1994 dg

1) "Pre-faulting" in of pages into process address space
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.

(process startup time using in-memory segments *much* faster)

2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.

(generally more efficient pte management)

3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)

(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)

4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.

(less likely to pageout a recently used page)

5) Slight improvement to the page table page trap efficiency.

(generally faster system VM fault performance)

6) Defer creation of unnamed anonymous regions pager until needed.

(speeds up shared memory bss creation)

7) Remove possible deadlock from swap_pager initialization.

8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.

9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.

John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com


1208 24-Feb-1994 hsu

validate sigcontext before restoring it


1151 13-Feb-1994 dg

Fixed bug in handling of COW - the original code was bogus and it was
only accidental that it worked. Also, don't cache non-managed pages.


1139 10-Feb-1994 dg

Patch from John Dyson:

a pv chain was being traversed while interrupts were
fully enabled in pmap_remove_all ... this is bogus, and
has been fixed in pmap.c. (sorry for adding the splimp)


1129 08-Feb-1994 dg

From: Dave Matthews <dave@prlng.co.uk>

Description:
The integer overflow instruction (into) and the interrupt instruction with
value 4 (int #4) both give rise to SIGBUS signals rather than SIGFPE. The
problem is that overflow is a trap not a fault (unlike the BOUND instruction).


1127 08-Feb-1994 dg

Fixed bugs in stack grow code, and moved it back into a separate function
like it was originally. Also added back call to "grow" in sendsig now
that this routine actually works.


1124 08-Feb-1994 dg

Fixes from John Dyson to fix out-of-memory hangs and other problems (such
as increased swap space usage) related to (incorrectly) paging out the
page tables.


1116 07-Feb-1994 dg

Fixed calculation of physmem when the special MAXMEM kernel config override
is used. This bug caused the buffer cache to be WAY too big when memory
was being restricted - resulting in hangs and other out of memory problems.


1104 06-Feb-1994 dg

At the suggestion of Bruce Evans, don't zero RTC diag register. Doing so
was causing problems for some machines.


1072 01-Feb-1994 dg

Minor cleanup. Decode state information better in the case of a fatal
trap.


1066 01-Feb-1994 dg

Bug fix from previous WINE commit. From Jeffrey Hsu.


1058 01-Feb-1994 dg

Removed all uses of "USE_486_WRITE_PROTECT" and made this automatic.
Reordered and removed some NOP's.


1055 31-Jan-1994 dg

Added four pattern memory test routine that is done at startup.


1051 31-Jan-1994 dg

WINE/user LDT support from John Brezak, ported to FreeBSD by Jeffrey Hsu
<hsu@soda.berkeley.edu>.


1046 31-Jan-1994 dg

Make I/O memory explicitly non-cacheable. This is purely an aesthetic
change.


1045 31-Jan-1994 dg

VM system performance improvements from John Dyson and myself. The
following is a summary:

1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...


1028 27-Jan-1994 dg

Made pmap_is_managed a static inline function.


991 21-Jan-1994 dg

Remove some old, unused, major UGLY code.


990 21-Jan-1994 dg

System V IPC code from Danny Boulet, chewed on a bit by the NetBSD group
and then some more by Jeffrey Hsu (who provided this port for FreeBSD).


989 20-Jan-1994 dg

Pointed out by Wolfgang Solfrank:
Correct parameters of sync


988 20-Jan-1994 dg

Removed some more old unused code/comments. Added hack to "fix" the
problem with some chipsets (UMC) remapping the 'hole' memory even when
you've got 16MB. People were led to believe that since there was only
16MB of memory in the machine, that they were okay wrt the ISA DMA
limit. This hack simply causes the extra memory to be ignored if it
appears around the 16MB limit.


987 20-Jan-1994 dg

Improved algorithm that calculates the pages in the base memory - If the
BIOS says that the amount is *between* 0-640K, believe it. Cleaned up
the comments a bit, removed some old cruft, etc.


981 17-Jan-1994 dg

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


975 16-Jan-1994 martin

NFS Diskless booting support added.


974 14-Jan-1994 dg

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any affected file in the sys/vm
directory (swap_pager.c for instance).


924 03-Jan-1994 dg

Convert syscall to trapframe. Based on work done by John Brezak.


911 22-Dec-1993 dg

Raised minimum buffer cache from 128k to 256k.


879 19-Dec-1993 wollman

Make everything compile with -Wtraditional. Make it easier to distribute
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.

NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.


856 13-Dec-1993 dg

added some panics to catch the condition where pmap_pte returns null
- indicating that the page table page is non-resident.


849 12-Dec-1993 dg

1) Added proc file system from Paul Kranenburg with changes from
John Dyson to make it reliably work under FreeBSD.
2) Added and enabled PROCFS in the GENERICxx and LINT kernels.
3) New execve() from me. Still work to be done here, but this version
works well and is needed before other changes can be made. For
a description of the design behind this, see freebsd-arch or
ask me.
4) Rewrote stack fault code; made user stack VM grow as needed rather
than all up front; improves performance a little and reduces
process memory requirements.
5) Incorporated fix from Gene Stark to fault/wire a user page table
page to fix a problem in copyout. This is a temporary fix and
is not appropriate for pageable page tables. For a description
of the problem, see Gene's post to the freebsd-hackers mailing
list.
6) Tighten up vm_page struct to reduce memory requirements for it. ifdef
pager page lock code as it's not being used currently.
7) Introduced new element to vmspace struct - vm_minsaddr; initial
(minimum) stack address. Complement to vm_maxsaddr.
8) Added a panic if the allocation for process u-pages fails.
9) Improve performance and accuracy of kernel profiling by putting in
a little inline assembly instead of spl().
10) Made serial console with sio driver work. Still has problems with
serial input, but is almost useable.
11) Added -Bstatic to SYSTEM_LD in Makefile.i386 so that kernels will
build properly with the new ld.


827 03-Dec-1993 alm

From: Jeffrey Hsu <hsu@soda.berkeley.edu>

The following patch adds the addr argument to signal handlers.

The kernel with the patch is no more and no less in compliance or in
violation of POSIX and ANSI C than the kernel before the patch.

The added functionality this addr argument provides is quite useful. It
enables an entire class of algorithms which use mprotect to trace memory
references. Beside garbage collectors, I have heard of this technique being
applied to debuggers and profilers. The only benchmarking I've performed is
using akcl to compile maxima: without the kernel patch, it takes 7 hours to
compile maxima, while with stratified garbage collection, it only takes 50
minutes.

Basically, I can't think of a reason not to add the addr argument and there
is a compelling need for it.

If you find the patch acceptable, please let me know so I can send my
FreeBSD akcl config files to wfs for inclusion in the core akcl release.
The old 386BSD config files there won't work on either NetBSD or FreeBSD.


806 28-Nov-1993 dg

Patch from Gene Stark:

Subject: Page fault in PTE area fails in copyout
Index: sys/i386/i386/trap.c FreeBSD-1.0.2

Description:
Reading files of several megabytes into Emacs, or many small
files all at once, would fail with "IO error - bad address".

Repeat-By:
The bug can be exercised by a test program that malloc()'s
a 5MB chunk of memory, and then, without accessing the memory
first, filling it with data from a file using read().
(I read 64k chunks from /dev/wd0d into successive 64k regions
of the 5MB chunk.) The read() will fail with EFAULT at the first
virtual address boundary that is a multiple of 0x400000.
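
A sketch of such a test program, written as a reusable function. It is
not Gene Stark's original program: the function name and parameters are
hypothetical, and the caller supplies the file to read from (the report
used /dev/wd0d).

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate `total` bytes and, without touching the memory first, fill it
 * from `path` in `chunk`-sized read() calls. Returns the number of bytes
 * read before end of file, or -1 if setup or any read() fails (errno is
 * then set by the failing call, e.g. to EFAULT).
 */
static long fill_untouched(const char *path, size_t total, size_t chunk)
{
    char *buf = malloc(total);
    size_t off = 0;
    long result;
    int fd;

    if (buf == NULL)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    while (off < total) {
        size_t want = (total - off < chunk) ? total - off : chunk;
        ssize_t n = read(fd, buf + off, want);

        if (n < 0) {                /* read failed */
            close(fd);
            free(buf);
            return -1;
        }
        if (n == 0)                 /* end of file */
            break;
        off += (size_t)n;
    }
    result = (long)off;
    close(fd);
    free(buf);
    return result;
}
```

Calling fill_untouched("/dev/wd0d", 5UL * 1024 * 1024, 64UL * 1024) would
correspond to the case described; on a fixed kernel it should return the
full 5MB.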

Fix:
The problem was code in sys/i386/i386/trap.c that tries to
figure out what kind of trap occurred and to handle it appropriately.
It was interpreting any page fault with virtual address
>= vm->vm_maxsaddr as being a user stack segment fault.
In fact, addresses >= USRSTACK are in the user structure/PTE area,
and if they are handled as stack faults, the proper PTE will
not be paged in when it is supposed to be. This situation comes
up in copyout() and copyoutstr(), if PTE's are accessed for the
first time ever. The page fault on accessing the nonexistent PTE
is mishandled as a stack fault, and then the fault that occurs on
the subsequent access to the page itself causes copyout to fail
with EFAULT.


798 25-Nov-1993 wollman

Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and
add same (sans -Werror) to Makefile for future compilations.


790 22-Nov-1993 dg

patches from Julian Elischer -
Added support for mmapping /dev/mem


778 17-Nov-1993 wollman

Fixed comments that start within a comment, so code compiles cleanly with
-Wcomment.


774 16-Nov-1993 dg

new process tracing code from Sean Eric Fagen (sef@kithrup.com).
...also, fixed up the syscall args to make GCC happy.


760 14-Nov-1993 rgrimes

Add _bde_exists: label so that the global is really defined. Fix spelling
error (mount -> amount)


757 13-Nov-1993 dg

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a separate file 'support.s'
15) split swtch and related routines into a separate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a separate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait
forever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.
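
The PANIC_REBOOT_WAIT_TIME rules in item 1 can be summarized as a small
decision function. This is a sketch of the described behaviour, not the
code in boot(); the enum and function names are invented, and treating
every negative value like -1 is an assumption (the message only defines
-1).

```c
enum reboot_action {
    REBOOT_NOW,             /* reboot without waiting */
    REBOOT_AFTER_DELAY,     /* wait wait_time seconds, then reboot */
    WAIT_FOREVER            /* never reboot automatically */
};

/*
 * wait_time:   configured PANIC_REBOOT_WAIT_TIME, in seconds
 * key_pressed: nonzero if a key was typed on the console while waiting
 */
static enum reboot_action reboot_policy(int wait_time, int key_pressed)
{
    if (wait_time == 0)
        return REBOOT_NOW;          /* no window in which to press a key */
    if (wait_time < 0)
        return WAIT_FOREVER;        /* -1; other negatives are an assumption */
    if (key_pressed)
        return WAIT_FOREVER;        /* user aborted the countdown */
    return REBOOT_AFTER_DELAY;
}
```

With the default of 15 and no key typed, the result is
REBOOT_AFTER_DELAY.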

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lots of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


724 07-Nov-1993 wollman

Get rid of WFJ's use of sleep() for more user-friendly tsleep().


718 07-Nov-1993 wollman

Made all header files idempotent and moved incorrect common data from
headers into a related source file. (This is the only change to locore.s).
Also fixed pg() to be properly declared and use stdargs.


701 04-Nov-1993 dg

splnone()'s in the trap code can be deadly. Save/restore previous priority
instead.


700 04-Nov-1993 ache

DST offset calculation removed, it is wrong in any case.


695 03-Nov-1993 paul

Restored comments that were removed from npx.c using # comment
format rather than /* */, as per advice from Jordan.


690 03-Nov-1993 paul

Removed comments from within asm block.

New gas fails to parse comments within asm blocks properly. Simply
remove them until gas gets fixed.


689 01-Nov-1993 chmr

Modified the "rude stack hack" so that it only applies to addresses within
the stack area and not memory above VM_MAXUSER_ADDRESS.
That way, copyout and friends now work for pages whose page table entries
have not yet been allocated/been paged out.


683 29-Oct-1993 dg

Whoops, the algorithm I last used was messed up - I left off parens, and
should have used PGSHIFT instead of PAGE_SHIFT.


682 29-Oct-1993 dg

Change filesystem buffer cache size calculation to be less for 4MB
machines (now 20% of all memory after the first 3MB). This is necessary
in order for a 4MB machine to be able to rebuild the entire source tree
and not run out of physical memory because of fixed memory requirements
of processes and kernel VM.


620 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


619 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


608 15-Oct-1993 rgrimes

genassym.c:
Remove NKMEMCLUSTERS, it is no longer defined or used.

locore.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the calculation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this calculation.

It seems that we were erroneously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.
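
The buffer cache sizing rule described above for machdep.c (20% of all memory
above 2MB, capped at 2/5 of VM_KMEM_SIZE) can be sketched as follows. This is
an illustration in bytes, not the kernel code, which works in pages; the
function name, the parameter names and the result of 0 for machines with 2MB
or less are assumptions.

```c
#include <assert.h>

#define MB (1024UL * 1024UL)

/*
 * mem_bytes:  total physical memory.
 * kmem_bytes: stand-in for VM_KMEM_SIZE, whose value the log does not state.
 */
static unsigned long
bufcache_bytes(unsigned long mem_bytes, unsigned long kmem_bytes)
{
	unsigned long size, limit;

	assert(kmem_bytes > 0);
	if (mem_bytes <= 2 * MB)
		return 0;			/* assumption: nothing above 2MB */
	size = (mem_bytes - 2 * MB) / 5;	/* 20% of memory above 2MB */
	limit = kmem_bytes / 5 * 2;		/* 2/5 of VM_KMEM_SIZE */
	return size < limit ? size : limit;
}
```

For a 12MB machine the uncapped result is (12MB - 2MB) / 5 = 2MB; the cap only
matters once 20% of the memory above 2MB exceeds 2/5 of the kernel memory
size.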

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you were compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completely invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completely missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE000000 constants with KERNBASE


604 14-Oct-1993 rgrimes

From David Greenman

Bruce Evans had limited the kernel virtual address space to not include the
last 4MB since it was not being used. Other changes are being made that will
relocate the Alternate Page Directory Table (APDT) into this area so the limit
is being fixed to be the last virtual address. (In fact with this patch you
can now do that relocation)


593 13-Oct-1993 rgrimes

ALL:

Removed patch kit headers and rcsid strings, add $Id$.

isa.c:

Removed old #ifdef notyet isa_configure code, since it will never be
used, and I have done 90% of what it attempted to.

Add conflict checking code that searches back through the devtabs looking
for any device that has already been found that may conflict with what
we are about to probe. Checks are made for I/O address, memory address,
IRQ, and DRQ. This should stop the screwing up of any device that has
already been found by other device probes.
Print out messages when we are not going to probe a device due to
a conflict so the user knows WHY something was not found. For example:

aha0 not probed due to irq conflict with ahb0 at 11

Now print out a message when a device is not found so the user knows
that it was probed for, but could not be found. For example:

ed1 not found at 0x320

For devices that have I/O address < 0x100 say that they are on the
motherboard, not on isa! The 0x100 magic number is per ISA spec. It
may seem funny that pc0 and sc0 report as being on the motherboard, but
this is due to the fact that the I/O address used is that of the keyboard
controller which IS on the motherboard. We really need to split the
keyboard probe from the display probe. It is completely legal to build
a pc without one or the other, or even without both!
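
The conflict check described for isa.c above can be sketched as a comparison
of a candidate device against one that has already been found. The struct,
its field names and the RES_UNUSED marker are hypothetical stand-ins for
devtab entries, and resources are compared by simple equality, which is a
simplification (a real check may compare address ranges or IRQ masks).

```c
#include <assert.h>
#include <stddef.h>

#define RES_UNUSED (-1L)	/* hypothetical: device does not use this resource */

struct dev_res {		/* hypothetical stand-in for a devtab entry */
	const char *name;
	long iobase;		/* I/O address */
	long maddr;		/* memory address */
	long irq;
	long drq;
};

static int
res_clash(long a, long b)
{
	return a != RES_UNUSED && a == b;
}

/* Return the name of the first conflicting resource, or NULL if none. */
static const char *
dev_conflict(const struct dev_res *found, const struct dev_res *cand)
{
	assert(found != NULL && cand != NULL);

	if (res_clash(found->iobase, cand->iobase))
		return "io address";
	if (res_clash(found->maddr, cand->maddr))
		return "memory address";
	if (res_clash(found->irq, cand->irq))
		return "irq";
	if (res_clash(found->drq, cand->drq))
		return "drq";
	return NULL;
}
```

A probe loop would call dev_conflict() against every device already found and
skip the probe on the first non-NULL result, which is the point at which a
message such as "aha0 not probed due to irq conflict with ahb0 at 11" can be
printed.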

npx.c:

Return -1 from the probe routine if we are using the Emulator so
that the i/o addresses are not printed, this is the same trick used
for 486's.

Do not print the ``Errors reported via Exception 16'', and
``Errors reported via IRQ 13'' messages any more, since these just lead
to more user confusion than anything. It still prints the message
``Error reporting broken, using 387 emulator'' so that the person is
aware that their motherboard is ill.


592 13-Oct-1993 rgrimes

Removed hack that did the R_SHIFT of unsigned numbers, no longer need
to do this as I have changed to using PDTI's as the bases for the vm
system layout.

Eliminate constants SYSPDROFF and SYSPDREND, now use NKPTE to control the size
of the kernel virtual space.

Eliminate constant PDRPDROFF, now use PTDPTDI to control location of PTD,
PTDmap and PTDpde

Eliminate constant APDRPDROFF, now use APTDPTDI to control location of APTD,
APTDmap and APTDpde.

Still need to fix Sysmap location (it is still a constant).

.globl statements are now consistent with respect to <comma><space>, the
<space> being removed from all .globl statements.

Document the fillkpt macro as to what registers control what.

Fix some comments that went past column 80, and clean/line some others up.

Remove constant for _Crtat, now use KERNBASE+constant, this still needs work.

Replace constants for offsets of sigcode parameters with symbolic names
from assym.s

Mark the sigreturn() call with XXX since we use the hardcoded constant
for the system call number, this is bogus and should be a #define or
something some place!

The kernel before and after this change was verified with cmp, not one
byte changed. These are all cosmetic clean up changes that make the
code more correct and make it easier to move and resize the kernel's virtual
address space.


590 12-Oct-1993 rgrimes

Add Page Table Directory Indexes (NKPDE, KPTDI, PTDPTDI, APTDPTDI) to
be used to replace more constants in locore.


589 12-Oct-1993 rgrimes

KPTDI_LAST renamed to KPTDI


587 12-Oct-1993 rgrimes

Eliminate definition of I386_PAGE_SIZE and use NBPG instead
Replace 0xFE000000 constants with KERNBASE
Use new definition NKPDE in place of a first-last+1 calculation.


570 10-Oct-1993 rgrimes

SYSPDROFF and SYSPDREND are now calculated using KERNBASE, KERNSIZE and
PDRSHIFT.

The SYSTEM constant that was defined in this file has been replaced
with KERNBASE from param.h.

Changed almost all # style comments to /* */ C style comments. Several
comments cleaned up so that they make a little more sense.

In the comments that describe C calling conventions to assembler routines
used a comma space sequence to separate arguments and removed the space
between the function name and the argument list.

Removed useless comments like /* clr eax */.

Changed all comma space sequences on assembler instructions to just be comma.

Removed spaces after $ operators to make the file consistent, this may need
to change again (ie: $KERNBASE should probably be $(KERNBASE), but for now
it all seems to work just fine.) This may become a problem with the C
pre-processor.

Changed several double blank lines to single blank lines that were used
to separate the I/O routines, these routines are blocked enough that we
don't need double blank lines between them.

Changed sequence of I/O routines to be all input functions, all output
functions instead of just the opposite.

Moved the SHOW_A_LOT debug stuff to near the end of the file.

Changed two occurrences of the constant 0xfff to NBPG-1.


569 10-Oct-1993 rgrimes

Added a compile time #error so that if the user does not specify one of
the proper I_X86CPU in the config file the following error will occur
while building the kernel: (had to line wrap the error for this message)

../../i386/i386/machdep.c:343: #error This kernel is not configured for one \
of the supported CPUs
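
A compile-time guard of this kind can be sketched as below. The option names
I386_CPU and I486_CPU are assumptions standing in for the I_X86CPU options
mentioned above, and the #define at the top stands in for what the config
file would normally provide.

```c
#include <assert.h>

/* Stand-in for a definition that would come from the kernel config file. */
#define I486_CPU 1

/* Assumed option names; the log only gives the pattern I_X86CPU. */
#if !defined(I386_CPU) && !defined(I486_CPU)
#error This kernel is not configured for one of the supported CPUs
#endif

/* Count how many CPU classes this build was configured for. */
static int
configured_cpu_classes(void)
{
	int n = 0;

#ifdef I386_CPU
	n++;
#endif
#ifdef I486_CPU
	n++;
#endif
	assert(n > 0);	/* guaranteed by the #error check above */
	return n;
}
```

Removing the #define makes the preprocessor stop the build at the #error
line, so a misconfigured kernel fails at compile time rather than at boot.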


567 10-Oct-1993 rgrimes

Added PDRSHIFT and KERNSIZE so that the PDR offsets can be calculated in
locore.s instead of being constants (3F8, 3FA).
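
The arithmetic behind replacing the 3F8 and 3FA constants can be sketched as
follows. KERNBASE uses the 0xFE000000 value that this log replaces elsewhere;
PDRSHIFT is assumed to be 22 (one page directory entry per 4MB on i386), and
KERNSIZE is assumed to be 8MB purely so that the end index comes out at 0x3FA,
since the log does not state its value. The helper function is hypothetical.

```c
#include <assert.h>

#define KERNBASE	0xFE000000UL	/* constant the log replaces with KERNBASE */
#define PDRSHIFT	22		/* assumption: 4MB per page directory entry */
#define KERNSIZE	0x00800000UL	/* assumption: 8MB, not stated in the log */

/* Page directory indexes derived from the definitions above. */
#define SYSPDROFF	(KERNBASE >> PDRSHIFT)
#define SYSPDREND	((KERNBASE + KERNSIZE) >> PDRSHIFT)

/* Number of page directory entries covering the kernel region. */
static unsigned long
kernel_pde_count(void)
{
	assert(SYSPDREND > SYSPDROFF);
	return SYSPDREND - SYSPDROFF;
}
```

Under these assumptions SYSPDROFF evaluates to 0x3F8 and SYSPDREND to 0x3FA,
so changing KERNBASE or KERNSIZE moves both offsets without editing locore.s.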


556 08-Oct-1993 rgrimes

All:
removed patch kit headers and sccsids, add $Id$. This is a general
clean up and realignment with NetBSD-current where possible.

genassym.c:
removed extraneous include of reg.h
removed old FP_* defines that have been ifdefed out since the patch kit
removed PCB_SIGC that is not referenced anywhere
add trapframe and sigframe defines
add KERNBASE define for use in locore.s

locore.s:
include npx.h and use NNPX for turning on and off FPU
include machine/cputypes.h for the types of cpu (used in cpu_identify)
change SYSPDREND to be one higher, this is really the base of the
next area, and will be changing again next time I revise the file
Reverse the NOP defines, you now get slow NOP's by default, this
may be what is causing us trouble with some systems. If you want
the NOPS to be null you now need to have options DUMMY_NOPS.
Now get esym from the boot blocks which don't pass it yet, and
it is not used, but this will be changing.
Move the bit_colors stuff to be in with the rest of Bruce's SHOW_A_LOT
things for debugging.
Added NetBSD's CPU type probe code, we now know what type of CPU
we are running on.
Adjust kernel pde calculation to correct for change in SYSPDREND, no
longer need the +1.

machdep.c
include npx.h and use NNPX for turning on and off FPU
include isa.h, map.h(new file), exec.h in preparation for
changes that are still in process.
Add some of the code for MACHINE_NONCONTIG that will allow us
to better map around the BIOS memory area.
Now print the version, cpu id, real memory and available memory
during boot.
Correct the calculation of bufpages, the code was mixing pages
and bytes, it now does the right things. Removed Bill's hack
for limiting the erroneous calculation.
add the identifycpu print out code from NetBSD.
remove the definition of the sigframe struct, it belongs in
frame.h
put in printf's about syncing disks on a halt/reboot.
Change the halted message to be a little easier reading.
Clean up of the dump messages, makes the source and the output
much more readable.
Change 0,0 in several places to have spaces after the commas.


549 08-Oct-1993 rgrimes

Removed patch kit headers, and rcsid, add $Id$, relocate Terry Lambert's
copyright to match the location that it is in NetBSD.

Remove the __main() {} dummy function, it belongs in kern/init_main.c


528 30-Sep-1993 rgrimes

This is a fix for the 32K DMA buffer region that was not accounted for,
it relocates it to be after the BIOS memory hole instead of right below
the 640K limit.
THANK YOU CHRIS!!!

From: <cgd@postgres.Berkeley.EDU>
Date: Wed, 29 Sep 93 18:49:58 -0700
basically, reserve a new 32k space right after firstaddr,
and put the buffer space there...

the diffs are below, and are in ~cgd/sys/i386/i386 (in machdep.c)
on freefall. i obviously can't test them, so if some of you would
look the diffs over and try them out...


519 29-Sep-1993 rgrimes

Add symbolic name for system page directory end, and change constant to
a calculation for the system page directory tables.


504 24-Sep-1993 rgrimes

From: rich@id.slip.bcm.tmc.edu.cdrom.com (Rich Murphey)
Date: Sun, 12 Sep 1993 18:19:05 -0500
This will allow you to compile and run a freebsd kernel with shared
memory support. I haven't tested the shm*() calls yet.

You run out of page table descriptors if you specify 4Mb of sharable
memory (SHMMAXPGS=1024). I don't know what the limit is, but
SHMMAXPGS=64 works. Rich


434 10-Sep-1993 nate

Removed volatile functions which were causing grief in the system, since
volatile functions are undefined, and there is no reason to have them
in our kernel.


433 10-Sep-1993 rgrimes

This is just to shut the compiler up
===================================================================
RCS file: /a/cvs/386BSD/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.3
diff -c -r1.3 vm_machdep.c
*** 1.3 1993/07/27 10:52:21
--- vm_machdep.c 1993/09/10 20:12:53
***************
*** 179,184 ****
--- 179,186 ----
#endif
splclock();
swtch();
+ /*NOTREACHED*/
+ for(;;);
}

cpu_wait(p) struct proc *p; {


424 09-Sep-1993 rgrimes

Changed the pg("ptdi> %x") to a printf and then a panic, since we are
going to panic shortly after this anyway. Destroys less state, and
keeps the machine from waiting for someone to smash the return key
a few times before it panics!


351 28-Aug-1993 rgrimes

Changed trap.c so that a panic will occur if we do not have hardware
FP and we try to call the emulator when it is not compiled in.
Removed the #if defined(i486) || defined(i387) that used to call the
panic if we did not have a math emulator.
Removed an extraneous include of i386/i386/math_emu.h from math_emulate.c.


269 10-Aug-1993 rgrimes

Removed one more reference to the old Adaptec 1542b as.c driver, one less
dependent for autoconf.c.


259 09-Aug-1993 rgrimes

From guido@gvr.win.tue.nl Sat Aug 7 06:58:04 1993

I posted some patches on the 386bsd_patchkit list to prohibit io access.
Because of an uninitialised field in the tss, this was possible.
It is included below as the patch to machdep.c
However, when you do this *necessary* fix (security), it will be
impossible from within user space to do io.

therefore, I included another fix: when you open /dev/io, you
get the access. Of course you can rewrite it to use another minor
and thus giving access to the iospace when /dev/mem is opened, e.g.

NOTE: The /dev/io entry has not been added to /dev/MAKEDEV yet.
The patch is in NetBSD.


256 08-Aug-1993 rgrimes

Removed the asking for a root file system when booting from floppy as that
is now handled by the new boot blocks immediately after the kernel is loaded.


200 27-Jul-1993 dg

* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading,
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h in via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c
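
The static inline helpers mentioned in the list above could look roughly like
this sketch. The k_ prefix is hypothetical and is used only to avoid clashing
with libc names when the example is compiled in userland; the log itself names
the functions min, max, ffs and strlen.

```c
#include <assert.h>
#include <stddef.h>

static inline int
k_min(int a, int b)
{
	return a < b ? a : b;
}

static inline int
k_max(int a, int b)
{
	return a > b ? a : b;
}

/* Find first set bit: 1 for the lowest bit, 0 if no bit is set. */
static inline int
k_ffs(unsigned int mask)
{
	int bit;

	if (mask == 0)
		return 0;
	for (bit = 1; (mask & 1U) == 0; bit++)
		mask >>= 1;
	return bit;
}

static inline size_t
k_strlen(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	while (*p != '\0')
		p++;
	return (size_t)(p - s);
}
```

Because the definitions live in a header, every file that calls them has to
include that header, which matches the note above about systm.h.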


140 18-Jul-1993 paul

Added volatile void to cpu_exit() in the hope that it would
stop gcc from warning about the function returning.

It hasn't but the declaration is still correct.


134 16-Jul-1993 dg

New locore from Christoph Robitschko.


132 16-Jul-1993 dg

Updated kernel files to move occurrences of "struct args" syscall
argument definitions outside of the function parameter list. This is
to reduce the copious warning messages that (non-Jolitz) gcc produces.
Also fixed some bogus variable declarations and casts to make the
compiler happy.


126 15-Jul-1993 dg

Modified attach printf's so that the output is compatible with the "new"
way of doing things. There still remain several drivers that need to
be updated. Also added a compile-time option to pccons to switch the
control and caps-lock keys (REVERSE_CAPS_CTRL) - added for my personal
sanity.


118 12-Jul-1993 rgrimes

Fixed two occurrences of ldos which should have been lods.
(From Christoph Robitschko)


8 18-Jun-1993 paul

Upgrade to GCC 2.X


5 12-Jun-1993 rgrimes

This commit was generated by cvs2svn to compensate for changes in r4,
which included commits to RCS files with non-trunk default branches.