History log of /freebsd-10.1-release/sys/i386/i386/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


271999 22-Sep-2014 jhb

MFC 270850,271053,271192,271717:
Save and restore FPU state across suspend and resume on i386.
- Create a separate structure for per-CPU state saved across suspend and
resume that is a superset of a pcb.
- Store the FPU state for suspend and resume in the new structure
(for amd64, this moves it out of the PCB)
- On both i386 and amd64, all of the FPU suspend/resume handling is now
done in C.

Approved by: re (hrs)


271905 20-Sep-2014 kib

MFC r271716:
Presence of any VM_PROT bits in the permission argument on x86 implies
that the entry is readable and valid.

Approved by: re (gjb)


271541 13-Sep-2014 pfg

MFC r271149:
Apply known workarounds for less modern MacBooks.

The legacy USB circuit tends to give trouble on older MacBooks.
While the original report covered MacBook4, extend the fix
preemptively for the newer MacBookPro4 too.

PR: 191693
Reviewed by: emaste
Approved by: re


271191 06-Sep-2014 jhb

MFC 270826:
MFamd64: Add a machdep.bootmethod sysctl to inform the installer which
firmware method was used for booting. This is hardcoded to BIOS on i386.

PR: 192962
Approved by: re (gjb)


271071 04-Sep-2014 pfg

MFC r270844:
Minor space/tab cleanups.

Most of them were ripped from the GSoC 2104
SMAP + kpatch project (but unrelated).
Only cosmetic changes.

Taken from: Oliver Pinter (op@)


270920 01-Sep-2014 kib

Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped
wired region. Rework the handling of unwire to do the it in batch,
both at pmap and object level.

All commits below are by alc.

MFC r268327:
Introduce pmap_unwire().

MFC r268591:
Implement pmap_unwire() for powerpc.

MFC r268776:
Implement pmap_unwire() for arm.

MFC r268806:
pmap_unwire(9) man page.

MFC r269134:
When unwiring a region of an address space, do not assume that the
underlying physical pages are mapped by the pmap. This fixes a leak
of the wired pages on the unwiring of the region mapped with no access
allowed.

MFC r269339:
In the implementation of the new function pmap_unwire(), the call to
MOEA64_PVO_TO_PTE() must be performed before any changes are made to the
PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.

MFC r269365:
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)

MFC r269433:
Handle wiring failures in vm_map_wire() with the new functions
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.

MFC r269438:
Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable
"rv" is uninitialized.

MFC r269485:
Retire pmap_change_wiring().

Reviewed by: alc


270441 24-Aug-2014 kib

MFC r270038:
Complete r254667, do not destroy pmap lock if KVA allocation failed.


270439 24-Aug-2014 kib

Merge the changes to pmap_enter(9) for sleep-less operation (requested
by flag). The ia64 pmap.c changes are direct commit, since ia64 is
removed on head.

MFC r269368 (by alc):
Retire PVO_EXECUTABLE.

MFC r269728:
Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused).

MFC r269759 (by alc):
Update the text of a KASSERT() to reflect the changes in r269728.

MFC r269822 (by alc):
Change {_,}pmap_allocpte() so that they look for the flag
PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether
to sleep on page table page allocation.

MFC r270151 (by alc):
Replace KASSERT that no PV list locks are held with a conditional
unlock.

Reviewed by: alc
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation


269752 09-Aug-2014 markj

MFC r266826, r266827
Move some duplicated hook definitions from machine-dependent files to
kern_dtrace.c.


269253 29-Jul-2014 markj

MFC r263329:
Only invoke fasttrap hooks for traps from user mode, and ensure that they're
called with interrupts enabled. Calling fasttrap_pid_probe() with interrupts
disabled can lead to deadlock if fasttrap writes to the process' address
space.


269235 29-Jul-2014 marius

MFC: r269050

- Copying and zeroing pages via temporary mappings involves updating the
corresponding page tables followed by accesses to the pages in question.
This sequence is subject to the situation exactly described in the "AMD64
Architecture Programmer's Manual Volume 2: System Programming" rev. 3.23,
"7.3.1 Special Coherency Considerations" [1, p. 171 f.]. Therefore, issuing
the INVLPG right after modifying the PTE bits is crucial.
For pmap_copy_page(), this has been broken in r124956 and later on carried
over to pmap_copy_pages() derived from the former, while all other places
in the i386 PMAP code use the correct order of instructions in this regard.
Fixing the latter breakage solves the problem of data corruption seen with
unmapped I/O enabled when running at least bare metal on AMD R-268D APUs.
However, this might also fix similar corruption reported for virtualized
environments.
- In pmap_copy_pages(), correctly set the cache bits on the source page being
copied. This change is thought to be a NOP for the real world, though. [2]

1: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf

Submitted by: kib [2]
Reviewed by: alc, kib
Sponsored by: Bally Wulff Games & Entertainment GmbH


269072 24-Jul-2014 kib

MFC r267213 (by alc):
Add a page size field to struct vm_page.

Approved by: alc


268661 15-Jul-2014 kib

MFC r268383:
Correct si_code for the SIGBUS signal generated by the alignment trap.


267964 27-Jun-2014 jhb

MFC 261781:
Don't waste a page of KVA for the boot-time memory test on x86. For amd64,
reuse the first page of the crashdumpmap as CMAP1/CADDR1. For i386,
remove CMAP1/CADDR1 entirely and reuse CMAP3/CADDR3 for the memory test.


267714 22-Jun-2014 kib

MFC r267492:
Fix some cosmetic issues with the use of kmem_malloc() in the i386 LDT
sysarch(2) code.


266312 17-May-2014 ian

MFC 263036, 263059: delete advertising clause in licenses, renumber.


265606 07-May-2014 scottl

Merge r264984

Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Obtained from: Netflix, Inc.


265476 07-May-2014 alc

MFC r262338
When the kernel is running in a virtual machine, it cannot rely upon the
processor family to determine if the workaround for AMD Family 10h Erratum
383 should be enabled. To enable virtual machine migration among a
heterogeneous collection of physical machines, the hypervisor may have
been configured to report an older processor family with a reduced feature
set. Effectively, the reported processor family and its features are like
a "least common denominator" for the collection of machines.

Therefore, when the kernel is running in a virtual machine, instead of
relying upon the processor family, we now test for features that prove
that the underlying processor is not affected by the erratum. (The
features that we test for are unlikely to ever be emulated in software
on an affected physical processor.)

PR: 186061


264147 05-Apr-2014 kib

MFC r263912:
Clear the kernel grab of the FPU state on fork.


264118 04-Apr-2014 royger

MFC r263001

Move asm IPIs handlers to C code, so both Xen and native IPI handlers
share the same code.

Approved by: gibbs
Sponsored by: Citrix Systems R&D


262042 17-Feb-2014 avg

MFC r257417: Remove references to an unused fasttrap probe hook


258996 05-Dec-2013 royger

MFC 258176:

Fix accounting for hw.realmem on the i386 and amd64 platforms.

sys/i386/i386/machdep.c:
sys/amd64/amd64/machdep.c:
The value reported by FreeBSD as "real memory" when booting
doesn't match what is later reported by sysctl as hw.realmem.
This is due to the fact that the value printed during the
boot process is fetched from smbios data (when possible),
and accounts for holes in physical memory. On the other
hand, the value of hw.realmem is unconditionally set to be
one larger than the highest page of the physical address
space.

Fix this by setting hw.realmem to the same value printed
during boot, this makes hw.realmem honour it's name and
account properly for physical memory present in the system.

Submitted by: Roger Pau Monné
Reviewed by: gibbs
Approved by: gibbs (mentor)
Approved by: re (gjb)


258559 25-Nov-2013 emaste

MFC r258135: x86: Allow users to change PSL_RF via ptrace(PT_SETREGS...)

Debuggers may need to change PSL_RF. Note that tf_eflags is already stored
in the signal context during signal handling and PSL_RF previously could
be modified via sigreturn, so this change should not provide any new
ability to userspace.

For background see the thread at:
http://lists.freebsd.org/pipermail/freebsd-i386/2007-September/005910.html

Reviewed by: jhb, kib

Sponsored by: DARPA, AFRL
Approved by: re (gjb)


258159 15-Nov-2013 kib

MFC r257856:
Add bits for the AMD features from CPUID function 0x80000001 ECX,
described in the rev. 3.0 of the Kabini BKDG, document 48751.pdf.

Approved by: re (gjb)


258158 15-Nov-2013 kib

MFC r257858:
Fix signal delivery for the iBCS2 binaries.

Approved by: re (gjb)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255827 23-Sep-2013 kib

Free both KVA and backing pages when freeing TSS memory.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)


255786 22-Sep-2013 glebius

- Create kern.ipc.sendfile namespace, and put the new "readhead" OID
there as "kern.ipc.sendfile.readahead".
- Push all nsfbuf related tunables into MD code. Don't move them
to new namespace in favor of POLA.

Reviewed by: scottl
Approved by: re (gjb)


255744 20-Sep-2013 gibbs

Merge Xen PVHVM support into the GENERIC kernel config for both
amd64 and i386.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
- Introduce two new CPU hooks for initialization and resume
purposes. This allows us to get rid of the XENHVM ifdefs in
mp_machdep, and also sets some hooks into common code that can be
used by other hypervisor implementations.

sys/amd64/conf/XENHVM:
sys/i386/conf/XENHVM:
- Remove these configs now that GENERIC has builtin support for Xen
HVM.

sys/kern/subr_smp.c:
- Make sure there are no pending IPIs when suspending a system.

sys/x86/xen/hvm.c:
- Add cpu init and resume vectors that are called from mp_machdep
using the new hooks.
- Only clear the vcpu_info mapping data on resume. It is already
clear for the BSP on a cold boot and is set correctly as APs
are started.
- Gate xen_hvm_init_cpu only to systems running under Xen.

sys/x86/xen/xen_intr.c:
- Gate the setup of event channels only to systems running under Xen.


255726 20-Sep-2013 gibbs

Add support for suspend/resume/migration operations when running as a
Xen PVHVM guest.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that are no MMU related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.

sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.

sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.

sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.

sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.

sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.

sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.

sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.

sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255677 18-Sep-2013 pjd

Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by: Daniel Peyrolon
Tested by: Daniel Peyrolon
Approved by: re (marius)


255620 16-Sep-2013 kib

Merge the change r255607 from amd64 to i386.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (gjb)


255474 11-Sep-2013 alc

Prior to r254304, we only began scanning the active page queue when the
amount of free memory was close to the point at which we would begin
reclaiming pages. Now, we continuously scan the active page queue,
regardless of the amount of free memory. Consequently, we are continuously
calling pmap_ts_referenced() on active pages.

Prior to this change, pmap_ts_referenced() would always demote superpage
mappings in order to obtain finer-grained reference information. This made
sense because we were coming under memory pressure and would soon have to
begin reclaiming pages. Now, however, with continuous scanning of the
active page queue, these demotions are taking a toll on performance. To
address this problem, I have replaced the demotion with a heuristic for
periodically clearing the reference flag on superpage mappings.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255331 06-Sep-2013 gibbs

Implement PV IPIs for PVHVM guests and further converge PV and HVM
IPI implmementations.

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Submitted by: gibbs (misc cleanup, table driven config)
Reviewed by: gibbs
MFC after: 2 weeks

sys/amd64/include/cpufunc.h:
sys/amd64/amd64/pmap.c:
Move invltlb_globpcid() into cpufunc.h so that it can be
used by the Xen HVM version of tlb shootdown IPI handlers.

sys/x86/xen/xen_intr.c:
sys/xen/xen_intr.h:
Rename xen_intr_bind_ipi() to xen_intr_alloc_and_bind_ipi(),
and remove the ipi vector parameter. This api allocates
an event channel port that can be used for ipi services,
but knows nothing of the actual ipi for which that port
will be used. Removing the unused argument and cleaning
up the comments surrounding its declaration helps clarify
its actual role.

sys/amd64/amd64/mp_machdep.c:
sys/amd64/include/cpu.h:
sys/i386/i386/mp_machdep.c:
sys/i386/include/cpu.h:
Implement a generic framework for amd64 and i386 that allows
the implementation of certain CPU management functions to
be selected at runtime. Currently this is only used for
the ipi send function, which we optimize for Xen when running
on a Xen hypervisor, but can easily be expanded to support
more operations.

sys/x86/xen/hvm.c:
Implement Xen PV IPI handlers and operations, replacing native
send IPI.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
sys/i386/include/smp.h:
Remove NR_VIRQS and NR_IPIS from FreeBSD headers. NR_VIRQS
is defined already for us in the xen interface files.
NR_IPIS is only needed in one file per Xen platform and is
easily inferred by the IPI vector table that is defined in
those files.

sys/i386/xen/mp_machdep.c:
Restructure to more closely match the HVM implementation by
performing table driven IPI setup.


255040 29-Aug-2013 gibbs

Implement vector callback for PVHVM and unify event channel implementations

Re-structure Xen HVM support so that:
- Xen is detected and hypercalls can be performed very
early in system startup.
- Xen interrupt services are implemented using FreeBSD's native
interrupt delivery infrastructure.
- the Xen interrupt service implementation is shared between PV
and HVM guests.
- Xen interrupt handlers can optionally use a filter handler
in order to avoid the overhead of dispatch to an interrupt
thread.
- interrupt load can be distributed among all available CPUs.
- the overhead of accessing the emulated local and I/O apics
on HVM is removed for event channel port events.
- a similar optimization can eventually, and fairly easily,
be used to optimize MSI.

Early Xen detection, HVM refactoring, PVHVM interrupt infrastructure,
and misc Xen cleanups:

Sponsored by: Spectra Logic Corporation

Unification of PV & HVM interrupt infrastructure, bug fixes,
and misc Xen cleanups:

Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D

sys/x86/x86/local_apic.c:
sys/amd64/include/apicvar.h:
sys/i386/include/apicvar.h:
sys/amd64/amd64/apic_vector.S:
sys/i386/i386/apic_vector.s:
sys/amd64/amd64/machdep.c:
sys/i386/i386/machdep.c:
sys/i386/xen/exception.s:
sys/x86/include/segments.h:
Reserve IDT vector 0x93 for the Xen event channel upcall
interrupt handler. On Hypervisors that support the direct
vector callback feature, we can request that this vector be
called directly by an injected HVM interrupt event, instead
of a simulated PCI interrupt on the Xen platform PCI device.
This avoids all of the overhead of dealing with the emulated
I/O APIC and local APIC. It also means that the Hypervisor
can inject these events on any CPU, allowing upcalls for
different ports to be handled in parallel.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
Map Xen per-vcpu area during AP startup.

sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
Increase the FreeBSD IRQ vector table to include space
for event channel interrupt sources.

sys/amd64/include/pcpu.h:
sys/i386/include/pcpu.h:
Remove Xen HVM per-cpu variable data. These fields are now
allocated via the dynamic per-cpu scheme. See xen_intr.c
for details.

sys/amd64/include/xen/hypercall.h:
sys/dev/xen/blkback/blkback.c:
sys/i386/include/xen/xenvar.h:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/xen/gnttab.c:
Prefer FreeBSD primatives to Linux ones in Xen support code.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
sys/dev/xen/balloon/balloon.c:
sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/console/xencons_ring.c:
sys/dev/xen/control/control.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/xenpci/xenpci.c:
sys/i386/i386/machdep.c:
sys/i386/include/pmap.h:
sys/i386/include/xen/xenfunc.h:
sys/i386/isa/npx.c:
sys/i386/xen/clock.c:
sys/i386/xen/mp_machdep.c:
sys/i386/xen/mptable.c:
sys/i386/xen/xen_clock_util.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/xen_rtc.c:
sys/xen/evtchn/evtchn_dev.c:
sys/xen/features.c:
sys/xen/gnttab.c:
sys/xen/gnttab.h:
sys/xen/hvm.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstore.c:
sys/xen/xenstore/xenstore_dev.c:
sys/xen/xenstore/xenstorevar.h:
Pull common Xen OS support functions/settings into xen/xen-os.h.

sys/amd64/include/xen/xen-os.h:
sys/i386/include/xen/xen-os.h:
sys/xen/xen-os.h:
Remove constants, macros, and functions unused in FreeBSD's Xen
support.

sys/xen/xen-os.h:
sys/i386/xen/xen_machdep.c:
sys/x86/xen/hvm.c:
Introduce new functions xen_domain(), xen_pv_domain(), and
xen_hvm_domain(). These are used in favor of #ifdefs so that
FreeBSD can dynamically detect and adapt to the presence of
a hypervisor. The goal is to have an HVM optimized GENERIC,
but more is necessary before this is possible.

sys/amd64/amd64/machdep.c:
sys/dev/xen/xenpci/xenpcivar.h:
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/sys/kernel.h:
Refactor magic ioport, Hypercall table and Hypervisor shared
information page setup, and move it to a dedicated HVM support
module.

HVM mode initialization is now triggered during the
SI_SUB_HYPERVISOR phase of system startup. This currently
occurs just after the kernel VM is fully setup which is
just enough infrastructure to allow the hypercall table
and shared info page to be properly mapped.

sys/xen/hvm.h:
sys/x86/xen/hvm.c:
Add definitions and a method for configuring Hypervisor event
delievery via a direct vector callback.

sys/amd64/include/xen/xen-os.h:
sys/x86/xen/hvm.c:

sys/conf/files:
sys/conf/files.amd64:
sys/conf/files.i386:
Adjust kernel build to reflect the refactoring of early
Xen startup code and Xen interrupt services.

sys/dev/xen/blkback/blkback.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkfront/block.h:
sys/dev/xen/control/control.c:
sys/dev/xen/evtchn/evtchn_dev.c:
sys/dev/xen/netback/netback.c:
sys/dev/xen/netfront/netfront.c:
sys/xen/xenstore/xenstore.c:
sys/xen/evtchn/evtchn_dev.c:
sys/dev/xen/console/console.c:
sys/dev/xen/console/xencons_ring.c
Adjust drivers to use new xen_intr_*() API.

sys/dev/xen/blkback/blkback.c:
Since blkback defers all event handling to a taskqueue,
convert this task queue to a "fast" taskqueue, and schedule
it via an interrupt filter. This avoids an unnecessary
ithread context switch.

sys/xen/xenstore/xenstore.c:
The xenstore driver is MPSAFE. Indicate as much when
registering its interrupt handler.

sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusvar.h:
Remove unused event channel APIs.

sys/xen/evtchn.h:
Remove all kernel Xen interrupt service API definitions
from this file. It is now only used for structure and
ioctl definitions related to the event channel userland
device driver.

Update the definitions in this file to match those from
NetBSD. Implementing this interface will be necessary for
Dom0 support.

sys/xen/evtchn/evtchnvar.h:
Add a header file for implemenation internal APIs related
to managing event channels event delivery. This is used
to allow, for example, the event channel userland device
driver to access low-level routines that typical kernel
consumers of event channel services should never access.

sys/xen/interface/event_channel.h:
sys/xen/xen_intr.h:
Standardize on the evtchn_port_t type for referring to
an event channel port id. In order to prevent low-level
event channel APIs from leaking to kernel consumers who
should not have access to this data, the type is defined
twice: Once in the Xen provided event_channel.h, and again
in xen/xen_intr.h. The double declaration is protected by
__XEN_EVTCHN_PORT_DEFINED__ to ensure it is never declared
twice within a given compilation unit.

sys/xen/xen_intr.h:
sys/xen/evtchn/evtchn.c:
sys/x86/xen/xen_intr.c:
sys/dev/xen/xenpci/evtchn.c:
sys/dev/xen/xenpci/xenpcivar.h:
New implementation of Xen interrupt services. This is
similar in many respects to the i386 PV implementation with
the exception that events for bound to event channel ports
(i.e. not IPI, virtual IRQ, or physical IRQ) are further
optimized to avoid mask/unmask operations that aren't
necessary for these edge triggered events.

Stubs exist for supporting physical IRQ binding, but will
need additional work before this implementation can be
fully shared between PV and HVM.

sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
sys/i386/xen/mp_machdep.c
sys/x86/xen/hvm.c:
Add support for placing vcpu_info into an arbritary memory
page instead of using HYPERVISOR_shared_info->vcpu_info.
This allows the creation of domains with more than 32 vcpus.

sys/i386/i386/machdep.c:
sys/i386/xen/clock.c:
sys/i386/xen/xen_machdep.c:
sys/i386/xen/exception.s:
Add support for new event channle implementation.


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254671 22-Aug-2013 gibbs

Rename definition of HYPERVISOR_VIRT_START to avoid conflict with
upstream Xen definition found in xen/interface/arch-x86/xen-x86_32.h.

Submitted by: Roger Pau Monné
Reviewed by: gibbs
MFC after: 2 weeks


254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254619 21-Aug-2013 jkim

Reimplement atomic_load_acq_64() and atomic_store_rel_64() for i386. These
functions are now real functions rather than function pointers. Supposedly,
it is faster for modern processors.

Suggested by: bde


254384 15-Aug-2013 jkim

Simplify check for CMPXCHG8B instruction. Note CMPXCHG8B instruction is
always available for Rise mP6 processors although it is not set by CPUID.


254182 10-Aug-2013 kib

Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


254141 09-Aug-2013 attilio

On all the architectures, avoid to preallocate the physical memory
for nodes used in vm_radix.
On architectures supporting direct mapping, also avoid to pre-allocate
the KVA for such nodes.

In order to do so make the operations derived from vm_radix_insert()
to fail and handle all the deriving failure of those.

vm_radix-wise introduce a new function called vm_radix_replace(),
which can replace a leaf node, already present, with a new one,
and take into account the possibility, during vm_radix_insert()
allocation, that the operations on the radix trie can recurse.
This means that if operations in vm_radix_insert() recursed
vm_radix_insert() will start from scratch again.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc (older version)
Reviewed by: jeff
Tested by: pho, scottl


254138 09-Aug-2013 attilio

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253949 05-Aug-2013 jeff

- Introduce a specific function, pmap_remove_kernel_pde, for removing
huge pages in the kernel's address space. This works around several
asserts from pmap_demote_pde_locked that did not apply and gave false
warnings.

Discovered by: pho
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division


253747 28-Jul-2013 avg

x86: detect mwait capabilities and extensions, when present

Reviewed by: kib (earlier amd64-only version)
MFC after: 2 weeks


253685 26-Jul-2013 jeff

- Use kmem_malloc rather than kmem_alloc() for GDT/LDT/tss allocations etc.
This eliminates some unusual uses of that API in favor of more typical
uses of kmem_malloc().

Discussed with: kib/alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253367 15-Jul-2013 ae

Include sys/systm.h after sys/param.h.

Suggested by: pluknet


253361 15-Jul-2013 glebius

Nuke mbstat. It wasn't used for mbuf statistics since FreeBSD 5.

Now that r253351 moved sendfile() stats to a separate struct, the
last field used in mbstat is m_mcfail, which is updated, but never
read or obtained from userland.


253351 15-Jul-2013 ae

Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with: glebius


253328 13-Jul-2013 kib

Create a proper stack frame for i386 version of bcopy(), despite the
function is leaf. The frame allows ddb to not loose the direct caller
of bcopy() in backtrace.

Other functions from support.s would benefit from the same change as
well, but for now bcopy() is the most frequent offender.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


253186 11-Jul-2013 kib

Explicitely panic instead of possibly doing undefined things when
ptelist KVA is exhausted. Currently this cannot happen, the added
panic serves as assert.

Discussed with: alc
Sponsored by: The FreeBSD Foundation


253185 11-Jul-2013 kib

MFamd64 r253140:
Clear m->object for the page taken from the delayed free list in
pmap_pv_reclaim().

Noted by: alc


251988 19-Jun-2013 kib

Some clarifications and updates for the comments, mostly retrieved
from Bruce Evans. Trim the trailing spaces.

MFC after: 1 week


251703 13-Jun-2013 jeff

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


251324 03-Jun-2013 kib

Assert that interrupts are enabled in the trap handlers on x86 before
calling generic code to deliver signals.

Discussed with: bde
Tested by: pho
MFC after: 1 week


251283 03-Jun-2013 kib

MFamd64: when printing the trap information, show the %esp value.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251039 27-May-2013 kib

Use slightly more idiomatic expression to get the address of array.

Tested by: dim, pgj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


251033 27-May-2013 kib

When handling an exception from the attempt from loading the faulting
context on return from the trap handler, re-enable the interrupts on
i386 and amd64. The trap return path have to disable interrupts since
the sequence of loading the machine state is not atomic. The trap()
function which transfers the control to the special handler would
enable the interrupt, but an iret loads the previous eflags with PSL_I
clear. Then, the special handler calls trap() on its own, which now
sees the original eflags with PSL_I set and does not enable
interrupts.

The end result is that signal delivery and process exiting code could
be executed with interrupts disabled, which is generally wrong and
triggers several assertions.

For amd64, the interrupts are enabled conditionally based on PSL_I in
the eflags of the outer frame, as it is already done for
doreti_iret_fault. For i386, the interrupts are enabled
unconditionally, the ast loop could have opened a window with
interrupts enabled just before the iret anyway.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


250840 21-May-2013 marcel

Add basic support for FDT to i386 & amd64. This change includes:
1. Common headers for fdt.h and ofw_machdep.h under x86/include
with indirections under i386/include and amd64/include.
2. New modinfo for loader provided FDT blob.
3. Common x86_init_fdt() called from hammer_time() on amd64 and
init386() on i386.
4. Split-off FDT specific low-level console functions from FDT
bus methods for the uart(4) driver. The low-level console
logic has been moved to uart_cpu_fdt.c and is used for arm,
mips & powerpc only. The FDT bus methods are shared across
all architectures.
5. Add dev/fdt/fdt_x86.c to hold the fdt_fixup_table[] and the
fdt_pic_table[] arrays. Both are empty right now.

FDT addresses are I/O ports on x86. Since the core FDT code does
not handle different address spaces, adding support for both I/O
ports and memory addresses requires some thought and discussion.
It may be better to use a compile-time option that controls this.

Obtained from: Juniper Networks, Inc.


250624 13-May-2013 ed

Improve readability of static assertions for OFFSET_* macros.

Instead of doing all sorts of weird casting of constants to
pointer-pointers, simply use the standard C offsetof() macro to obtain
the offset of the respective fields in the structures.


249439 13-Apr-2013 kib

Fix the name of the pcb member in the comments.

Submitted by: Oliver Pinter <oliver.pntr@gmail.com>
MFC after: 3 days


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248449 18-Mar-2013 attilio

Sync back vmcontention branch into HEAD:
Replace the per-object resident and cached pages splay tree with a
path-compressed multi-digit radix trie.
Along with this, switch also the x86-specific handling of idle page
tables to using the radix trie.

This change is supposed to do the following:
- Allowing the acquisition of read locking for lookup operations of the
resident/cached pages collections as the per-vm_page_t splay iterators
are now removed.
- Increase the scalability of the operations on the page collections.

The radix trie does rely on the consumers locking to ensure atomicity of
its operations. In order to avoid deadlocks the bisection nodes are
pre-allocated in the UMA zone. This can be done safely because the
algorithm needs at maximum one new node per insert which means the
maximum number of the desired nodes is the number of available physical
frames themselves. However, not all the times a new bisection node is
really needed.

The radix trie implements path-compression because UFS indirect blocks
can lead to several objects with a very sparse trie, increasing the number
of levels to usually scan. It also helps in the nodes pre-fetching by
introducing the single node per-insert property.

This code is not generalized (yet) because of the possible loss of
performance by having much of the sizes in play configurable.
However, efforts to make this code more general and then reusable in
further different consumers might be really done.

The only KPI change is the removal of the function vm_page_splay() which
is now reaped.
The only KBI change, instead, is the removal of the left/right iterators
from struct vm_page, which are now reaped.

Further technical notes broken into mealpieces can be retrieved from the
svn branch:
http://svn.freebsd.org/base/user/attilio/vmcontention/

Sponsored by: EMC / Isilon storage division
In collaboration with: alc, jeff
Tested by: flo, pho, jhb, davide
Tested by: ian (arm)
Tested by: andreast (powerpc)


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


248140 10-Mar-2013 alc

The kernel pmap is statically allocated, so there is really no need to
explicitly initialize its pm_root field to zero.

Sponsored by: EMC / Isilon Storage Division


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247622 02-Mar-2013 attilio

Merge from vmc-playground branch:
Rename the pv_entry_t iterator from pv_list to pv_next.
Besides being more correct technically (as the name seems to suggest
this is a list while it is an iterator), it will also be needed by
vm_radix work to avoid a nameclash on macro expansions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc, jeff
Tested by: flo, pho, jhb, davide


247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.


247400 27-Feb-2013 attilio

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


246855 15-Feb-2013 jkim

Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is
no functional change.


246248 02-Feb-2013 avg

cpususpend_handler: mark AP as resumed only after fully setting up lapic

Reviewed by: jhb
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
MFC after: 12 days


245640 19-Jan-2013 jhb

Fix build with SMP disabled.`

Reported by: bf


245577 17-Jan-2013 jhb

Don't attempt to use clflush on the local APIC register window. Various
CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on
Pentium-M, and trashing of the local APIC registers on a VIA C7). The
local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't
necessary anyway.

MFC after: 2 weeks


243836 03-Dec-2012 kib

Print the frame addresses for the backtraces on i386 and amd64. It
allows both to inspect the frame sizes and to manually peek into the
frames from ddb, if needed.

Reviewed by: dim
MFC after: 2 weeks


242534 03-Nov-2012 attilio

Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by: alc, jimharris


242010 24-Oct-2012 kib

Add missed sched_pin().

Submitted by: Svatopluk Kraus <onwahe@gmail.com>
Reviewed by: alc
MFC after: 3 days


241880 22-Oct-2012 eadler

The 'testing memory' patch gets printed too many times

Approved by: cperciva (implicit)


241850 22-Oct-2012 eadler

Explain the upcoming delay by printing a message when the kernel
is about to begin testing memory.

Reviewed by: dteske, adri
Approved by: cperciva
MFC after: 1 week


241550 14-Oct-2012 kib

MFamd64: add machdep.uprintf_signal.

MFC after: 1 week


241371 09-Oct-2012 attilio

Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by: marius


241356 08-Oct-2012 kib

Add several asserts to i386 pmap, which mostly state that pv entry shall
have corresponding pte.

Reviewed by: alc
Tested by: pho
MFC after: 3 days


241353 08-Oct-2012 alc

In a few places, like the implementation of ptrace(), a thread may call
upon pmap_enter() to create a mapping within a different address space,
i.e., not the thread's own address space. On i386, this entails the
creation of a temporary mapping to the affected page table page (PTP). In
general, pmap_enter() will read from this PTP, allocate a PV entry, and
write to this PTP. The trouble comes when the system is short of memory.
In order to allocate a new PV entry, an older PV entry has to be
reclaimed. Reclaiming a PV entry involves destroying a mapping, which
requires access to the affected PTP. Thus, the PTP mapped at the
beginning of pmap_enter() is no longer mapped at the end of pmap_enter(),
which leads to pmap_enter() modifying the wrong PTP. To address this
problem, pmap_pv_reclaim() is changed to use an alternate method of
mapping PTPs.

Update a related comment.

Reported by: pho
Diagnosed by: kib
MFC after: 5 days


241020 28-Sep-2012 alc

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


240773 21-Sep-2012 dim

After r205013, amd64 and i386 CPU family and model IDs were printed out
in hexadecimal, but without any 0x prefix, which can be very misleading.

MFC after: 3 days


240317 10-Sep-2012 alc

Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(),
pmap_unmapdev()'s own direct efforts to destroy the page table entries are
redundant, so eliminate them.

Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS.
Setting PTE_W on MIPS is inconsistent with the implementation of this
function on other architectures. Moreover, PTE_W should not be set, unless
the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}()
doesn't do.

MFC after: 10 days


240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


240126 05-Sep-2012 alc

Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the
comment describing them. Both the function names and the comment had grown
stale. Quite some time has passed since these pmap implementations last
used the page's hold count to track the number of valid mapping within a
page table page. Also, returning TRUE from pmap_unwire_ptp() rather than
_pmap_unwire_ptp() eliminates a few instructions from callers like
pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used
directly by a conditional statement.


239241 13-Aug-2012 jhb

Remove the deassert INIT IPI from the IPI startup sequence for APs.
It is not listed in the boot sequence in the MP specification (1.4),
and it is explicitly ignored on modern CPUs. It was only ever required
when bootstrapping systems with external APICs (that is, SMP machines
with 486s), which FreeBSD has never supported (and never will).

While here, tidy some comments and remove some banal ones.


239235 13-Aug-2012 jhb

Add a 10 millisecond delay after sending the initial INIT IPI. This
matches the algorithm in the MP specification (1.4). Previously we
were sending out the deassert INIT IPI immediately after the initial
INIT IPI was sent.


238792 26-Jul-2012 kib

MFamd64 r238623:
Introduce curpcb magic variable.

Requested and reviewed by: bde
MFC after: 3 weeks


238678 21-Jul-2012 kib

MFCamd64 r238598:
Provide siginfo.si_code for floating point errors when error occurs
using the SSE math processor.

MFC after: 3 weeks


238675 21-Jul-2012 kib

MFamd64 r238669:
Force clean FPU state in PCB user FPU save area for PT_I386_{GET,SET}XMMREGS.

Reported by: bde
MFC after: 1 week


238310 09-Jul-2012 jhb

Partially revert r217515 so that the mem_range_softc variable is always
present on x86 kernels. This fixes the build of kernels that include
'device acpi' but do not include 'device mem'.

MFC after: 1 month


237996 02-Jul-2012 brueffer

Fix XEN build, broken in r237924.

Reported by: gcooper
Pointy hat: brueffer


237924 01-Jul-2012 brueffer

Replace an unreachable panic() in vm86_getptr (been there for 13 years) with
a KASSERT() behind the functions's only consumer.

Suggested by: kib
Reviewed by: kib
CID: 4494
Found with: Coverity Prevent(tm)
MFC after: 2 weeks


237623 27-Jun-2012 alc

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


237445 22-Jun-2012 kib

Commit changes missed from r237435. Properly calculate the signal
trampoline addresses after the shared page is enabled. Handle FreeBSD
ABIs without shared page support too.

Reported and tested by: David Wolfskill <david catwhisker org>
(previous version)
Pointy hat to: kib
MFC after: 1 month


237435 22-Jun-2012 kib

Enable shared page on i386, now it has a use for vdso_timehands.

MFC after: 1 month


237027 13-Jun-2012 jkim

- Fix resumectx() prototypes to reflect reality.
- For i386, simply jump to resumectx() with PCB in %ecx.
- Fix a style(9) nit while I am here.


236938 12-Jun-2012 iwasaki

Share IPI init and startup code of mp_machdep.c with acpi_wakeup.c
as ipi_startup().


236830 10-Jun-2012 iwasaki

Some fixes for r236772.

- Remove cpuset stopped_cpus which is no longer used.
- Add a short comment for cpuset suspended_cpus clearing.
- Fix the un-ordered x86/acpica/acpi_wakeup.c in conf/files.amd64 and i386.

Pointed-out by: attilio@


236772 09-Jun-2012 iwasaki

Add x86/acpica/acpi_wakeup.c for amd64 and i386. Difference of
suspend/resume procedures are minimized among them.

common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().

amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().

i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.

Reviewed by: attilio@, acpi@


236534 04-Jun-2012 alc

Various small changes to PV entry management:

Constify pc_freemask[].

pmap_pv_reclaim()
Eliminate "freemask" because it was a pessimization. Add a comment about
the resident count adjustment.

free_pv_entry() [i386 only]
Merge an optimization from amd64 (r233954).

get_pv_entry()
Eliminate the move to tail of the pv_chunk on the global pv_chunks list.
(The right strategy needs more thought. Moreover, there were unintended
differences between the amd64 and i386 implementation.)

pmap_remove_pages()
Eliminate unnecessary ()'s.


236503 03-Jun-2012 avg

free wdog_kern_pat calls in post-panic paths from under SW_WATCHDOG

Those calls are useful with hardware watchdog drivers too.

MFC after: 3 weeks


236494 02-Jun-2012 alc

Isolate the global pv list lock from data and other locks to prevent false
sharing within the cache.


236378 01-Jun-2012 alc

Eliminate code duplication in free_pv_entry() and pmap_remove_pages() by
introducing free_pv_chunk().


236291 30-May-2012 alc

Eliminate some purely stylistic differences among the amd64, i386 native,
and i386 xen PV entry allocators.


236213 29-May-2012 kevlo

Make sure that each va_start has one and only one matching va_end,
especially in error cases.


236190 28-May-2012 alc

Update a comment in get_pv_entry() to reflect the changes to the
synchronization of pv_vafree in r236158.


236158 27-May-2012 alc

Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the PV lists. However, it will be used in a somewhat unconventional
way. As finer-grained PV list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

X-MFC after: r236045


236045 26-May-2012 alc

Rename pmap_collect() to pmap_pv_reclaim() and rewrite it such that it no
longer uses the active and inactive paging queues. Instead, the pmap now
maintains an LRU-ordered list of pv entry pages, and pmap_pv_reclaim() uses
this list to select pv entries for reclamation.

Note: The old pmap_collect() tried to avoid reclaiming mappings for pages
that have either a hold_count or a busy field that is non-zero. However,
this isn't necessary for correctness, and the locking in pmap_collect() was
insufficient to guarantee that such mappings weren't reclaimed. The new
pmap_pv_reclaim() doesn't even try.

MFC after: 5 weeks


235912 24-May-2012 alc

MF amd64 r233097, r233122

With the changes over the past year to how accesses to the page's dirty
field are synchronized, there is no need for pmap_protect() to acquire
the page queues lock unless it is going to access the pv lists or
PMAP1/PADDR1.

Style fix to pmap_protect().


235683 20-May-2012 iwasaki

Remove cpususpend IDT vector for XEN.
This broke XEN kernel building.


235622 18-May-2012 iwasaki

Add SMP/i386 suspend/resume support.
Most part is merged from amd64.

- i386/acpica/acpi_wakecode.S
Replaced with amd64 code (from realmode to paging enabling code).

- i386/acpica/acpi_wakeup.c
Replaced with amd64 code (except for wakeup_pagetables stuff).

- i386/include/pcb.h
- i386/i386/genassym.c
Added PCB new members (CR0, CR2, CR4, DS, ED, FS, SS, GDT, IDT, LDT
and TR) needed for suspend/resume, not for context switch.

- i386/i386/swtch.s
Added suspendctx() and resumectx().
Note that savectx() was not changed and used for suspending (while
amd64 code uses it).
BSP and AP execute the same sequence, suspendctx(), acpi_wakecode()
and resumectx() for suspend/resume (in case of UP system also).

- i386/i386/apic_vector.s
Added cpususpend().

- i386/i386/mp_machdep.c
- i386/include/smp.h
Added cpususpend_handler().

- i386/include/apicvar.h
- kern/subr_smp.c
- sys/smp.h
Added IPI_SUSPEND and suspend_cpus().

- i386/i386/initcpu.c
- i386/i386/machdep.c
- i386/include/md_var.h
- pc98/pc98/machdep.c
Moved initializecpu() declarations to md_var.h.

MFC after: 3 days


234723 26-Apr-2012 attilio

Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by: gibbs
Reviewed by: jhb, marius
MFC after: 1 week


234350 16-Apr-2012 jkim

- When interrupt is not requested for VM86 call, make a fake exit point and
push the address onto stack as we do for INTn emulation. This avoids stack
underflow when we encounter RETF instruction in VM86 mode. Lack of this
exit point actually caused page fault in VM86 mode with VESA module when we
resume from suspend state[1].
- Remove unnecessary CLI and STI instructions from BIOS interrupt emulation.
INTn and IRET must be able to emulate the flag correctly.

Reported by: gavin [1]
Tested by: gavin (early revision)
MFC after: 3 days


234208 13-Apr-2012 avg

add actual interrupt counters to back ipi_invlcache_counts

Otherwise one could run into a panic with COUNT_IPIS when cache
invalidation actually happened.

Reviewed by: jhb
MFC after: 1 week


234105 10-Apr-2012 marius

Fix !SMP build after r234074.

Reviewed by: attilio, jhb


234074 09-Apr-2012 attilio

BSP is not added to the mask of valid target CPUs for interrupts
in set_apic_interrupt_ids(). Besides, set_apic_interrupts_ids() is not
called in the !SMP case too.
Fix this by:
- Adding the BSP as an interrupt target directly in cpu_startup().
- Remove an obsolete optimization where the BSP are skipped in
set_apic_interrupt_ids().

Reported by: jh
Reviewed by: jhb
MFC after: 3 days
X-MFC: r233961
Pointy hat to: me


234059 09-Apr-2012 jhb

Recognize the RDRAND instruction feature.

Submitted by: Michael Fuckner michael fuckner net
MFC after: 3 days


233781 02-Apr-2012 jhb

Make machine check exception logging more readable. On newer Intel systems,
an uncorrected ECC error tends to fire on all CPUs in a package
simultaneously and the current printf hacks are not sufficient to make
the messages legible. Instead, use the existing mca_lock spinlock to
serialize calls to mca_log() and change the machine check code to panic
directly when an unrecoverable error is encoutered rather than falling
back to a trap_fatal() call in trap() (which adds nearly a screen-full of
logging messages that aren't useful for machine checks).

MFC after: 2 weeks


233707 30-Mar-2012 jhb

Move the legacy(4) driver to x86.


233676 29-Mar-2012 jhb

Use a more proper fix for enabling HT MSI mapping windows on Host-PCI
bridges. Rather than blindly enabling the windows on all of them, only
enable the window when an MSI interrupt is enabled for a device behind
the bridge, similar to what already happens for HT PCI-PCI bridges.

To implement this, each x86 Host-PCI bridge driver has to be able to
locate it's actual backing device on bus 0. For ACPI, use the _ADR
method to find the slot and function of the device. For the non-ACPI
case, the legacy(4) driver already scans bus 0 looking for Host-PCI
bridge devices. Now it saves the slot and function of each bridge that
it finds as ivars that the Host-PCI bridge driver can then use in its
pcib_map_msi() method.

This fixes machines where non-MSI interrupts were broken by the previous
round of HT MSI changes.

Tested by: bapt
MFC after: 1 week


233628 28-Mar-2012 fabient

Add software PMC support.

New kernel events can be added at various location for sampling or counting.
This will for example allow easy system profiling whatever the processor is
with known tools like pmcstat(8).

Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at the lock acquire failure, page fault while sampling on
instructions.

Sponsored by: NETASQ
MFC after: 1 month


233433 24-Mar-2012 alc

Disable detailed PV entry accounting by default. Add a config option
to enable it.

MFC after: 1 week


233291 22-Mar-2012 alc

Handle spurious page faults that may occur in no-fault sections of the
kernel.

When access restrictions are added to a page table entry, we flush the
corresponding virtual address mapping from the TLB. In contrast, when
access restrictions are removed from a page table entry, we do not
flush the virtual address mapping from the TLB. This is exactly as
recommended in AMD's documentation. In effect, when access
restrictions are removed from a page table entry, AMD's MMUs will
transparently refresh a stale TLB entry. In short, this saves us from
having to perform potentially costly TLB flushes. In contrast,
Intel's MMUs are allowed to generate a spurious page fault based upon
the stale TLB entry. Usually, such spurious page faults are handled
by vm_fault() without incident. However, when we are executing
no-fault sections of the kernel, we are not allowed to execute
vm_fault(). This change introduces special-case handling for spurious
page faults that occur in no-fault sections of the kernel.

In collaboration with: kib
Tested by: gibbs (an earlier version)

I would also like to acknowledge Hiroki Sato's assistance in
diagnosing this problem.

MFC after: 1 week


233168 19-Mar-2012 kib

If we ever allow for managed fictitious pages, the pages shall be
excluded from superpage promotions. At least one of the reason is
that pv_table is sized for non-fictitious pages only.

Consistently check for the page to be non-fictitious before accesing
superpage pv list.

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
MFC after: 2 weeks


233031 16-Mar-2012 nyan

- Fix to build a native i386 kernel without the SMP and atpic.
- Merge r232744 changes to pc98.
(Allow a kernel to be built with 'nodevice atpic'.)
- Move ICU related defines from x86/isa/atpic.c to x86/isa/icu.h and
use them in x86/x86/intr_machdep.c.

Reviewed by: jhb


232851 12-Mar-2012 alc

Simplify the error checking in one branch of trap_pfault() and update
the nearby comment.

Correct the style of two return statements in trap_pfault().

Merge a comment from amd64's trap_pfault().


232747 09-Mar-2012 jhb

Move i386's intr_machdep.c to the x86 tree and share it with amd64.


232744 09-Mar-2012 jhb

Allow a native i386 kernel to be built with 'nodevice atpic'. Just as on
amd64, if 'device isa' is present quiesce the 8259A's during boot and
resume from suspend.

While here, be more selective on amd64 about which kernel configurations
need elcr.c.

MFC after: 2 weeks


232231 27-Feb-2012 jhb

MFamd64: Don't whine about interrupts being disabled for an NMI.


230767 30-Jan-2012 kib

Finally, try to enable the nxstacks on amd64 and powerpc64 for both 64bit
and 32bit ABIs. Also try to enable nxstacks for PAE/i386 when supported,
and some variants of powerpc32.

MFC after: 2 months (if ever)


230426 21-Jan-2012 kib

Add support for the extended FPU states on amd64, both for native
64bit and 32bit ABIs. As a side-effect, it enables AVX on capable
CPUs.

In particular:

- Query the CPU support for XSAVE, list of the supported extensions
and the required size of FPU save area. The hw.use_xsave tunable is
provided for disabling XSAVE, and hw.xsave_mask may be used to
select the enabled extensions.

- Remove the FPU save area from PCB and dynamically allocate the
(run-time sized) user save area on the top of the kernel stack,
right above the PCB. Reorganize the thread0 PCB initialization to
postpone it after BSP is queried for save area size.

- The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as
well. FPU state is only useful for suspend, where it is saved in
dynamically allocated suspfpusave area.

- Use XSAVE and XRSTOR to save/restore FPU state, if supported and
enabled.

- Define new mcontext_t flag _MC_HASFPXSTATE, indicating that
mcontext_t has a valid pointer to out-of-struct extended FPU
state. Signal handlers are supplied with stack-allocated fpu
state. The sigreturn(2) and setcontext(2) syscall honour the flag,
allowing the signal handlers to inspect and manipilate extended
state in the interrupted context.

- The getcontext(2) never returns extended state, since there is no
place in the fixed-sized mcontext_t to place variable-sized save
area. And, since mcontext_t is embedded into ucontext_t, makes it
impossible to fix in a reasonable way. Instead of extending
getcontext(2) syscall, provide a sysarch(2) facility to query
extended FPU state.

- Add ptrace(2) support for getting and setting extended state; while
there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries.

- Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to
consumers, making it opaque. Internally, struct fpu_kern_ctx now
contains a space for the extended state. Convert in-kernel consumers
of fpu_kern KPI both on i386 and amd64.

First version of the support for AVX was submitted by Tim Bird
<tim.bird am sony com> on behalf of Sony. This version was written
from scratch.

Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org>
MFC after: 1 month


229085 31-Dec-2011 gavin

Default to not performing the early-boot memory tests when we detect we
are booting inside a VM. There are three reasons to disable this:

o It causes the VM host to believe that all the tested pages or RAM are
in use. This in turn may force the host to page out pages of RAM
belonging to other VMs, or otherwise cause problems with fair resource
sharing on the VM cluster.
o It adds significant time to the boot process (around 1 second/Gig in
testing)
o It is unnecessary - the host should have already verified that the
memory is functional etc.

Note that this simply changes the default when in a VM - it can still be
overridden using the hw.memtest.tests tunable.

MFC after: 4 weeks


228962 29-Dec-2011 jhb

Use curthread rather than PCPU_GET(curthread). 'curthread' uses
special-case optimizations on several platforms and is preferred.

Reported by: dim (indirectly)
MFC after: 2 weeks


228935 28-Dec-2011 alc

Fix a bug in the Xen pmap's implementation of pmap_extract_and_hold():
If the page lock acquisition is retried, then the underlying thread is
not unpinned.

Wrap nearby lines that exceed 80 columns.


228923 27-Dec-2011 alc

Eliminate many of the unnecessary differences between the native and
paravirtualized pmap implementations for i386. This includes some
style fixes to the native pmap and several bug fixes that were not
previously applied to the paravirtualized pmap.

Tested by: sbruno
MFC after: 3 weeks


228535 15-Dec-2011 alc

Simplify the implementation of the identity mapping in start_all_aps().
Since mpboot.s enables processor support for PG_PS before enabling
paging, there is no reason that the identity must use 4 KB page mappings.

Discussed with: jhb


228513 15-Dec-2011 alc

Create large page mappings in pmap_map().

MFC after: 6 weeks


227843 22-Nov-2011 marius

- There's no need to overwrite the default device method with the default
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.


227442 11-Nov-2011 kib

Weaken the part of assertions added in the r227394. Only check that the
process state is stopped.

MFC after: 1 week


227399 09-Nov-2011 kib

Attempt to improve formatting and content of several comments for
amd64 and i386 MD code.

Based on suggestions by: bde
MFC after: 1 week


227394 09-Nov-2011 kib

Stopped process may legitimately have some threads sleeping and not
suspended, if the sleep is uninterruptible.

Reported and tested by: pho
MFC after: 1 week


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227290 07-Nov-2011 rstone

Fix the DTrace pid return trap interrupt vector. Previously we were using
31, but that vector is reserved.

Without this fix, running dtrace -p <pid> would either cause the target
process to crash or the kernel to page fault.

Obtained from: rpaulo
MFC after: 3days


226925 30-Oct-2011 marcel

Revert rev. 226893: subr_syscall.c is being included from C files and
on amd64 with FREEBSD32 enabled, this means that systrace_probe_func
gets defined twice.


226893 29-Oct-2011 marcel

Define systrace_probe_func in subr_syscall.c where it's used, instead
of defining it in MD code. This eliminates porting to other architectures.


226843 27-Oct-2011 alc

Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to
vm_page_alloc(). While I'm here, for the sake of consistency, always
specify the allocation class, such as VM_ALLOC_NORMAL, as the first of
the flags.


226498 18-Oct-2011 des

Trace attempts to call restricted MD syscalls.


225943 03-Oct-2011 kib

Do not allow the kernel to access usermode pages without installed
fault handler. Panic immediately in such situation, on i386 and amd64.

Reviewed by: avg, jhb
MFC after: 1 week


225936 03-Oct-2011 attilio

Add some improvements in the idle table callbacks:
- Replace instances of manual assembly instruction "hlt" call
with halt() function calling.
- In cpu_idle_mwait() avoid races in check to sched_runnable() using
the same pattern used in cpu_idle_hlt() with the 'hlt' instruction.
- Add comments explaining the logic behind the pattern used in
cpu_idle_hlt() and other idle callbacks.

In collabouration with: jhb, mav
Reviewed by: adri, kib
MFC after: 3 weeks


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225474 11-Sep-2011 kib

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


225048 20-Aug-2011 bz

In HEAD when doing no further checkes there is no reason use the
temporary variable and check with if as TUNABLE_*_FETCH do not
alter values unless successfully found the tunable.

Reported by: jhb, bde
MFC after: 3 days
X-MFC with: r224516
Approved by: re (kib)


224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


224516 30-Jul-2011 bz

Introduce a tunable to disable the time consuming parts of bootup
memtesting, which can easily save seconds to minutes of boot time.
The tunable name is kept general to allow reusing the code in
alternate frameworks.

Requested by: many
Discussed on: arch (a while a go)
Obtained from: Sandvine Incorporated
Reviewed by: sbruno
Approved by: re (kib)
MFC after: 2 weeks


224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is previewed for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


223732 02-Jul-2011 alc

When iterating over a paging queue, explicitly check for PG_MARKER, instead
of relying on zeroed memory being interpreted as an empty PV list.

Reviewed by: kib


223692 30-Jun-2011 jonathan

Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)


223677 29-Jun-2011 alc

Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing this
option to vm_object_page_remove() asserts that the specified range of pages
is not mapped, or more precisely that none of these pages have any managed
mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on
the pages.

This change not only saves time by eliminating pointless calls to
pmap_remove_all(), but it also eliminates an inconsistency in the use of
pmap_remove_all() versus related functions, like pmap_remove_write(). It
eliminates harmless but pointless calls to pmap_remove_all() that were being
performed on PG_UNMANAGED pages.

Update all of the existing assertions on pmap_remove_all() to reflect this
change.

Reviewed by: kib


223668 29-Jun-2011 jonathan

We may split today's CAPABILITIES into CAPABILITY_MODE (which has
to do with global namespaces) and CAPABILITIES (which has to do with
constraining file descriptors). Just in case, and because it's a better
name anyway, let's move CAPABILITIES out of the way.

Also, change opt_capabilities.h to opt_capsicum.h; for now, this will
only hold CAPABILITY_MODE, but it will probably also hold the new
CAPABILITIES (implying constrained file descriptors) in the future.

Approved by: rwatson
Sponsored by: Google UK Ltd


222929 10-Jun-2011 jhb

Implement BUS_ADJUST_RESOURCE() for the x86 drivers that sit between the
Host-PCI bridge drivers and nexus.


222853 08-Jun-2011 avg

remove code for dynamic offlining/onlining of CPUs on x86

The code has definitely been broken for SCHED_ULE, which is a default
scheduler. It may have been broken for SCHED_4BSD in more subtle ways,
e.g. with manually configured CPU affinities and for interrupt devilery
purposes.
We still provide a way to disable individual CPUs or all hyperthreading
"twin" CPUs before SMP startup. See the UPDATING entry for details.

Interaction between building CPU topology and disabling CPUs still
remains fuzzy: topology is first built using all availble CPUs and then
the disabled CPUs should be "subtracted" from it. That doesn't work
well if the resulting topology becomes non-uniform.

This work is done in cooperation with Attilio Rao who in addition to
reviewing also provided parts of code.

PR: kern/145385
Discussed with: gcooper, ambrisko, mdf, sbruno
Reviewed by: attilio
Tested by: pho, pluknet
X-MFC after: never


222813 07-Jun-2011 attilio

etire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle with cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are target to be removed soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu datas is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernland members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additively note that no MAXCPU is bumped in this patch, but
private testing has been done until to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revision of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222756 06-Jun-2011 avg

don't use cpuid level 4 in x86 cpu topology detection if it's not supported

This regression was introduced in r213323.
There are probably no Intel cpus that support amd64 mode, but do not
support cpuid level 4, but it's better to keep i386 and amd64 versions
of this code in sync.

Discovered by: pho
Tested by: pho
MFC after: 2 weeks


222043 17-May-2011 jkim

Update CPUID bits to reflect AMD Bulldozer and Intel Sandy Bridge features.
Note AMD dropped SSE5 extensions in order to avoid ISA overlap with Intel
AVX instructions. The SSE5 bit was recycled as XOP extended instruction
bit, CVT16 was deprecated in favor of F16C (half-precision float conversion
instructions for AVX), and the remaining FMA4 (4-operand FMA instructions)
gained a separate CPUID bit. Replace non-existent references with today's
CPUID specifications.


221835 13-May-2011 mav

Refactor Xen PV code to use new event timers subsystem. That uses one-shot
Xen timer and time counter to provide one-shot and periodic time events.

On my tests this reduces idle interruts rate down to about 30Hz, and accor-
ding to Xen VM Manager reduces host CPU load by three times comparing to
the previous periodic 100Hz clock. Also now, when needed, it is possible to
increase HZ rate without useless CPU burning during idle periods.

Now only ia64 and some ARMs left not migrated to the new event timers.


221599 07-May-2011 mav

Don't use MWAIT for short sleeps under XEN, as it was before r212541.
This fixes panic during boot in PV mode on Xen 3.2.


221527 06-May-2011 avg

prepare code that does topology detection for amd cpus for bulldozer

This also introduces a new detection path for family 10h and newer
pre-bulldozer cpus, pre-10h hardware should not be affected.

Tested by: Gary Jennejohn <gljennjohn@googlemail.com>
(with pre-10h hardware)
MFC after: 2 weeks


221188 28-Apr-2011 jkim

Define "Hypervisor Present" bit. This bit is used by several hypervisors to
identify CPUs running under emulation. Currently QEMU-KVM, Xen-HVM, VMware,
and MS Hyper-V are known to set this bit.

MFC after: 3 days


221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


221032 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.
Move the files used for a diskless NFS root from sys/nfsclient
to sys/nfs in preparation for them to be used by both NFS
clients. Also, move the declaration of the three global data
structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c
so that they are defined when either client uses them.

Reviewed by: jhb
MFC after: 2 weeks


220803 18-Apr-2011 kib

Make pmap_invalidate_cache_range() available for consumption on amd64.

Add pmap_invalidate_cache_pages() method on x86. It flushes the CPU
cache for the set of pages, which are not neccessary mapped. Since its
supposed use is to prepare the move of the pages ownership to a device
that does not snoop all CPU accesses to the main memory (read GPU in
GMCH), do not rely on CPU self-snoop feature.

amd64 implementation takes advantage of the direct map. On i386,
extract the helper pmap_flush_page() from pmap_page_set_memattr(), and
use it to make a temporary mapping of the flushed page.

Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks


220584 13-Apr-2011 jkim

Reduce errors in effective frequency calculation.


220583 12-Apr-2011 jkim

Reinstate cpu_est_clockrate() support for P-state invariant TSC if APERF and
MPERF MSRs are available. It was disabled in r216443. Remove the earlier
hack to subtract 0.5% from the calibrated frequency as DELAY(9) is little
bit more reliable now.


220579 12-Apr-2011 jkim

Probe capability to find effective frequency. When the TSC is P-state
invariant, APERF/MPERF ratio can be used to find effective frequency.


220453 08-Apr-2011 rstone

Add tunables that mirror the functionality of sysctls machdep.panic_on_nmi
and machdep.kdb_on_nmi.

Approved by: emaste (mentor)
MFC after: 1 week


220433 07-Apr-2011 jkim

Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now. More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).


220404 07-Apr-2011 jkim

Implement atomic_load_acq_64(9) and atomic_store_rel_64(9) for i386. These
functions are implemented with CMPXCHG8B instruction where it is available,
i. e., all Pentium-class and later processors. Note this instruction is
also used for atomic_store_rel_64() because a simple XCHG-like instruction
for 64-bit memory access does not exist, unfortunately. If the processor
lacks the instruction, i. e., 80486-class CPUs, two 32-bit load/store are
performed with interrupt temporarily disabled, assuming it does not support
SMP. Although this assumption may be little naive, it is true in reality.
This implementation is inspired by Linux.


220018 26-Mar-2011 jkim

Improve CPU identifications of various IDT/Centaur/VIA, Rise and Transmeta
CPUs. These CPUs need explicit MSR configuration to expose ceratin CPU
capabilities (e.g., CMPXCHG8B) to work around compatibility issues with
ancient software. Unfortunately, Rise mP6 does not set the CX8 bit in CPUID
and there is no MSR to expose the feature although all mP6 processors are
capable of CMPXCHG8B according to datasheets I found from the Net. Clean up
and simplify VIA PadLock detection while I am in the neighborhood.


219673 15-Mar-2011 jkim

Deprecate tsc_present as the last of its real consumers finally disappeared.


219473 11-Mar-2011 jkim

Add a tunable "machdep.disable_tsc" to turn off TSC. Specifically, it turns
off boot-time CPU frequency calibration, DELAY(9) with TSC, and using TSC as
a CPU ticker. Note tsc_present does not change by this tunable.


219467 10-Mar-2011 jkim

Detect NSC/AMD Geode SC1100 properly, not just Stepping 0. Although it is
unclear that "TSC stops ticking with HLT instruction" problem is present
with other steppings, it is limited to Stepping 0 for now.


219461 10-Mar-2011 jkim

Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


219134 01-Mar-2011 rwatson

Continue to introduce Capsicum capability mode:

White list sysarch calls allowed in capability mode; arguably, there
should be some link between the capability mode model and the privilege
model here. Sysarch is a morass similar to ioctl, in many senses.

Submitted by: anderson
Discussed with: benl, kris, pjd
Sponsored by: Google, Inc.
Obtained from: Capsicum Project
MFC after: 3 months


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218744 16-Feb-2011 dchagin

To avoid excessive code duplication create wrapper for fill regs
from stack frame. Change the trap() code to use newly created function
instead of explicit regs assignment.


218329 05-Feb-2011 kib

Fix linking of the kernel without device npx.

MFC after: 2 weeks


218327 05-Feb-2011 kib

Clear the padding when returning context to the usermode, for
MI ucontext_t and x86 MD parts.
Kernel allocates the structures on the stack, and not clearing
reserved fields and paddings causes leakage.

Noted and discussed with: bde
MFC after: 2 weeks


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


217886 26-Jan-2011 mdf

Set td_kstack_pages for thread0. This was already being done for most
architectures, but i386 and amd64 were missing it.

Submitted by: Mohd Fahadullah <mfahadullah AT isilon DOT com>


217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


217587 19-Jan-2011 jkim

Fix yet another fallout from r208833. VM86 BIOS call may cause page fault
when FPU is in use.

Reported by: Marc UBM Bocklet (ubm dot freebsd at googlemail dot com)
Tested by: b. f. (bf1783 at googlemail dot com)
MFC after: 3 days


217561 18-Jan-2011 kib

For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by: az
Reviewed by: alc, jhb
MFC after: 1 week


217543 18-Jan-2011 jhb

- Remove some always-true checks (checking for unsigned < 0).
- Only check largs->num against max_ldt_segment on amd64 for I386_SET_LDT
when descriptors are provided. Specifically, allow the 'start == 0'
and 'num == 0' special case used to free all LDT entries that previously
failed with EINVAL.

Submitted by: clang via rdivacky (some of 1)
Reviewed by: kib


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217506 17-Jan-2011 jkim

Avoid preemption while manipulating CRs and MTRRs.

Tested by: ariff


217360 13-Jan-2011 jhb

If an interrupt on an I/O APIC is moved to a different CPU after it has
started to execute, it seems that the corresponding ISR bit in the "old"
local APIC can be cleared. This causes the local APIC interrupt routine
to fail to find an interrupt to service. Rather than panic'ing in this
case, simply return from the interrupt without sending an EOI to the
local APIC. If there are any other pending interrupts in other ISR
registers, the local APIC will assert a new interrupt.

Tested by: steve


216847 31-Dec-2010 cperciva

Make i386_set_ldt work on i386/XEN, step 5/5.

When cleaning up a thread, reset its LDT to the default LDT.

Note: Casting the LDT pointer to an int and storing it in pc_currentldt is
wildly bogus, but is harmless since pc_currentldt is a write-only variable.

MFC after: 3 days


216846 31-Dec-2010 cperciva

Make i386_set_ldt work on i386/XEN, step 4/5.

Use xen_update_descriptor to update the LDT rather than bcopy. Under Xen,
pages used for holding LDTs must be read-only, so we can't make the change
ourselves.

Ths obvious alternative of "remap the page read-write, make the change, then
map it read-only again" doesn't work since Xen won't allow an LDT page to be
remapped as R/W. An arguably better solution is used by NetBSD: They don't
modify LDTs in-place at all, but instead copy the entire LDT, modify the new
version, then atomically swap.

MFC after: 3 days


216845 31-Dec-2010 cperciva

Make i386_set_ldt work on i386/XEN, step 3/5.

Synchronize reality with comment: The user_ldt_alloc function is supposed to
return with dt_lock held. Due to broken locking in i386/xen/pmap.c, we drop
dt_lock during the call to pmap_map_readonly and then pick it up again; this
can be removed once the Xen pmap locking is fixed.

MFC after: 3 days


216555 19-Dec-2010 alc

Redo some parts of r216333, specifically, the locking changes to
pmap_extract_and_hold(), and undo the rest. In particular, I forgot
that PG_PS and PG_PTE_PAT are the same bit.


216516 18-Dec-2010 kib

In pmap_extract(), unlock pmap lock earlier. The calculation does not need
the lock when operating on local variables.

Reviewed by: alc


216443 14-Dec-2010 jkim

Stop lying about supporting cpu_est_clockrate() when TSC is invariant. This
function always returned the nominal frequency instead of current frequency
because we use RDTSC instruction to calculate difference in CPU ticks, which
is supposedly constant for the case. Now we support cpu_get_nominal_mhz()
for the case, instead. Note it should be just enough for most usage cases
because cpu_est_clockrate() is often times abused to find maximum frequency
of the processor.


216333 09-Dec-2010 alc

When r207410 eliminated the acquisition and release of the page queues
lock from pmap_extract_and_hold(), it didn't take into account that
pmap_pte_quick() sometimes requires the page queues lock to be held.
This change reimplements pmap_extract_and_hold() such that it no
longer uses pmap_pte_quick(), and thus never requires the page queues
lock.

For consistency, adopt the same idiom as used by the new
implementation of pmap_extract_and_hold() in pmap_extract() and
pmap_mincore(). It also happens to make these functions shorter.

Fix a style error in pmap_pte().

Reviewed by: kib@


216316 09-Dec-2010 cperciva

Replace i386/i386/busdma_machdep.c and amd64/amd64/busdma_machdep.c
(which are identical) with a single x86/x86/busdma_machdep.c.


216312 08-Dec-2010 jkim

Do not subtract 0.5% from estimated frequency if DELAY(9) is driven by TSC.
Remove a confusing comment about converting to MHz as we never did.


216308 08-Dec-2010 cperciva

On amd64, we have (since r1.72, in December 2005) MAX_BPAGES=8192,
while on i386 we have MAX_BPAGES=512. Implement this difference via
'#ifdef __i386__'.

With this commit, the i386 and amd64 busdma_machdep.c files become
identical; they will soon be replaced by a single file under sys/x86.


216283 08-Dec-2010 jkim

Merge sys/amd64/amd64/tsc.c and sys/i386/i386/tsc.c and move to sys/x86/x86.

Discussed with: avg


216279 07-Dec-2010 jkim

Use int for 'tsc_present' instead of u_int. It is just a boolean.


216276 07-Dec-2010 jkim

Remove stale comments about P-state invariant TSC and fix style(9) nits.


216275 07-Dec-2010 jkim

Do not register a event handler for CPU freqency changes when it is found
P-state invariant. This is continuation of r216274.


216274 07-Dec-2010 jkim

Now the P-state invariant TSC is probed early enough, do not register event
handlers for CPU freqency changes when it is found P-state invariant.
Adjust a comment about non-existent tsc_freq_max() while I am here.


216272 07-Dec-2010 jkim

Probe P-state invariant TSC from rightful place.


216194 05-Dec-2010 cperciva

MFamd64 r204214: Enforce stronger alignment semantics (require that the
end of segments be aligned, not just the start of segments) in order to
allow Xen's blkfront driver to operate correctly.

PR: kern/152818
MFC after: 3 days


216191 04-Dec-2010 cperciva

Remove gratuitous i386/amd64 inconsistency in favour of the less verbose
version of declaring a variable initialized to zero.


216190 04-Dec-2010 cperciva

Remove unnecessary #includes which seem to have been accidentally added
as part of CVS r1.76 (in January 2006).


216163 03-Dec-2010 jkim

Revert r216161. It is not necessary because we zero-fill BSS anyway.

Requested by: jhb


216161 03-Dec-2010 jkim

Explicitly initialize TSC frequency. To calibrate TSC frequency, we use
DELAY(9) and it may use TSC in turn if TSC frequency is non-zero.

MFC after: 3 days


216159 03-Dec-2010 jkim

Do not change CPU ticker frequency if TSC is P-state invariant. Note this
change was meant to be committed with r184102 (and its subsequent MFCs) but
it fell off somehow.

Pointyhat to: jkim
MFC after: 3 days


216041 29-Nov-2010 cperciva

Fix bug introduced by r194784: Under XEN, the page(s) allocated to dpcpu
for CPU #0 weren't being properly reserved. Under VM pressure this would
cause problems when the dpcpu structures were overwritten by arbitrary
data; the most common symptom was a panic when netisr attempted to lock a
mutex.

For some reason the XEN code keeps track of the start of available memory
in the variables 'first', 'physfree', and 'init_first'; as far as I can
tell, we always have first == physfree == init_first * PAGE_SIZE. The
earlier commit adjusted 'first' (which, on !XEN, is the only variable
which tracks this value) but not the other two variables.

Exercise for reader: Eliminate two of these three variables.


216012 28-Nov-2010 kib

Calling fill_fpregs() for curthread is legitimate, and ELF coredump
does this.

Reported and tested by: pho
MFC after: 5 days


215865 26-Nov-2010 kib

Remove npxgetregs(), npxsetregs(), fpugetregs() and fpusetregs()
functions, they are unused. Remove 'user' from npxgetuserregs()
etc. names.

For {npx,fpu}{get,set}regs(), always use pcb->pcb_user_save for FPU
context storage. This eliminates the need for ugly copying with
overwrite of the newly added and reserved fields in ucontext on i386
to satisfy alignment requirements for fpusave() and fpurstor().

pc98 version was copied from i386.

Suggested and reviewed by: bde
Tested by: pho (i386 and amd64)
MFC after: 1 week


215854 26-Nov-2010 uqs

Remove kernel support for BB profiling, now that kernbb(8) is gone, too.

PR: bin/83558
Reviewed by: jkim


215754 23-Nov-2010 jkim

Remove a stale tunable introduced in r215703.


215703 22-Nov-2010 jkim

- Disable caches and flush caches/TLBs when we update PAT as we do for MTRR.
Flushing TLBs is required to ensure cache coherency according to the AMD64
architecture manual. Flushing caches is only required when changing from a
cacheable memory type (WB, WP, or WT) to an uncacheable type (WC, UC, or
UC-). Since this function is only used once per processor during startup,
there is no need to take any shortcuts.
- Leave PAT indices 0-3 at the default of WB, WT, UC-, and UC. Program 5 as
WP (from default WT) and 6 as WC (from default UC-). Leave 4 and 7 at the
default of WB and UC. This is to avoid transition from a cacheable memory
type to an uncacheable type to minimize possible cache incoherency. Since
we perform flushing caches and TLBs now, this change may not be necessary
any more but we do not want to take any chances.
- Remove Apple hardware specific quirks. With the above changes, it seems
this hack is no longer needed.
- Improve pmap_cache_bits() with an array to map PAT memory type to index.
This array is initialized early from pmap_init_pat(), so that we do not need
to handle special cases in the function any more. Now this function is
identical on both amd64 and i386.

Reviewed by: jhb
Tested by: RM (reuf_m at hotmail dot com)
Ryszard Czekaj (rychoo at freeshell dot net)
army.of.root (army dot of dot root at googlemail dot com)
MFC after: 3 days


215523 19-Nov-2010 avg

specialreg.h: add AMD-specific "Hardware Configuration Register" MSR

It seems that this MSR has been available in a range of AMD processors
families for quite a while now.

Note1: not all AMD MSRs that are found in amd64 specialreg.h are also in
the i386 version.
Note2: perhaps some additional name component is needed to distinguish
AMD-specific MSRs.

MFC after: 5 days


215415 16-Nov-2010 jkim

Restore CR0 after MTRR initialization for correctness sakes. There will be
no noticeable change because we enable caches before we enter here for both
BSP and AP cases. Remove another pointless optimization for CR4.PGE bit
while I am here.


215414 16-Nov-2010 jkim

Invalidate TLBs explicitly. r1.4 of sys/i386/i386/i686_mem.c removed this
code but probably it only worked by chance because modifying CR4.PGE bit
causes invlidation of entire TLBs. Since these are very rare events, this
micro-optimization seems useless.

Reviewed by: jhb


215321 14-Nov-2010 kib

Do not use __FreeBSD_version prefix for the special osrel version.
The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot
grok several constants with the prefix.

Reported and tested by: swell.k gmail com
MFC after: 1 week


215309 14-Nov-2010 kib

Use symbolic names instead of hardcoding values for magic p_osrel constants.

MFC after: 1 week


215009 08-Nov-2010 jhb

Sync the APIC startup sequence with amd64:
- Register APIC enumerators at SI_SUB_TUNABLES - 1 instead of SI_SUB_CPU - 1.
- Probe CPUs at SI_SUB_TUNABLES - 1. This allows i386 to set a truly
accurate mp_maxid value rather than always setting it to MAXCPU - 1.


215008 08-Nov-2010 jhb

Remove stub symbols for APIC-related functions when 'device apic' is not
included in a kernel config. These stubs had existed previously so that
acpi.ko could always include the MADT parsing code and still link with a
kernel that did not include 'device apic'.


214938 07-Nov-2010 alc

Eliminate a possible race between pmap_pinit() and pmap_kenter_pde() on
superpage promotion or demotion.

Micro-optimize pmap_kenter_pde().

Reviewed by: kib, jhb (an earlier version)
MFC after: 1 week


214835 05-Nov-2010 jhb

Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger. Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock. However, trap interrupts for single-stepping
can still occur even when interrupts are disabled. Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased. Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero. To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with: bde
MFC after: 1 month


214774 04-Nov-2010 avg

x86 topo_probe: do not probe smp topology if only one cpu is visible

This could lead to a division by zero if hardware is multi-core and/or
multi-threaded, but for some (quite unusual) reason FreeBSD sees only
one logical processor. This could happen, for example, if neither MADT
nor MP Table are presented by BIOS.

Also:
- assert in topo_probe_0x4 that BSP is accounted for
- neither cpu_cores nor cpu_logical should be zero after successful
probing, so either being zero is an indication of failed probing

Reported by: vwe, Dan Allen <danallen46@airwired.net>
Tested by: Dan Allen <danallen46@airwired.net>
MFC after: 3 days


214681 02-Nov-2010 jhb

Further tweaks to the ram_attach() routine:
- Use > 2^32 - 1 instead of >= when checking for memory regions above 4G.
- Skip SMAP entries > 4G on i386 rather than breaking out of the loop
since SMAP entries are not guaranteed to be in order.
- Remove 'i' and loop over 'rid' directly in the dump_avail[] case.
- Only check for 4G regions in the dump_avail[] case on i386 if PAE is
enabled since vm_paddr_t is 32-bit in the !PAE case.

Submitted by: alc


214631 01-Nov-2010 jhb

Move <machine/apicreg.h> to <x86/apicreg.h>.


214630 01-Nov-2010 jhb

Move the <machine/mca.h> header to <x86/mca.h>.


214457 28-Oct-2010 attilio

Merge nexus.c from amd64 and i386 to x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214448 28-Oct-2010 jhb

Use 'PCPU_GET(apic_id)' to determine the BSP's APIC ID on a UP machine
when routing interrupts instead of cpu_apic_ids[0] since cpu_apic_ids[]
is only populated for multiple-CPU machines. This also matches what the
code does when SMP is not enabled.

PR: bin/151616
Tested by: "Damian S. Kolodziejczyk" damkol | gmail
Submitted by: avg
MFC after: 1 week


214446 28-Oct-2010 attilio

Merge the mptable support from MD bits to x86 subtree.

Sponsored by: Sandvine Incorporated
Discussed with: jhb


214373 26-Oct-2010 attilio

Merge dump_machdep.c i386/amd64 under the x86 subtree.

Sponsored by: Sandvine Incorporated
Tested by: gianni


214347 25-Oct-2010 jhb

Use 'saveintr' instead of 'savecrit' or 'eflags' to hold the state returned
by intr_disable().

Requested by: bde


214346 25-Oct-2010 jhb

Use intr_disable() and intr_restore() instead of frobbing the flags register
directly to disable interrupts.

Reviewed by: bde (earlier version)
MFC after: 2 weeks


213748 12-Oct-2010 jkim

Remove trailing ", " from `sysctl machdep.idle_available' output.


213455 05-Oct-2010 alc

Initialize KPTmap in locore so that vm86.c can call vtophys() (or really
pmap_kextract()) before pmap_bootstrap() is called.

Document the set of pmap functions that may be called before
pmap_bootstrap() is called.

Tested by: bde@
Reviewed by: kib@
Discussed with: jhb@
MFC after: 6 weeks


213452 05-Oct-2010 kib

Display PCID capability of CPU and add CPUID define for it.

MFC after: 1 week


213323 01-Oct-2010 avg

i386 and amd64 mp_machdep: improve topology detection for Intel CPUs

This patch is significantly based on previous work by jkim.
List of changes:
- added comments that describe topology uniformity assumption
- added reference to Intel Processor Topology Enumeration article
- documented a few global variables that describe topology
- retired weirdly set and used logical_cpus variable
- changed fallback code for mp_ncpus > 0 case, so that CPUs are treated
as being different packages rather than cores in a single package
- moved AMD-specific code to topo_probe_amd [jkim]
- in topo_probe_0x4() follow Intel-prescribed procedure of deriving SMT
and core masks and match APIC IDs against those masks [started by
jkim]
- in topo_probe_0x4() drop code for double-checking topology parameters
by looking at L1 cache properties [jkim]
- in topo_probe_0xb() add fallback path to topo_probe_0x4() as
prescribed by Intel [jkim]

Still to do:
- prepare for upcoming AMD CPUs by using new mechanism of uniform
topology description [pointed by jkim]
- probe cache topology in addition to CPU topology and probably use that
for scheduler affinity topology; e.g. Core2 Duo and Athlon II X2 have
the same CPU topology, but Athlon cores do not share L2 cache while
Core2's do (no L3 cache in both cases)
- think of supporting non-uniform topologies if they are ever
implemented for platforms in question
- think how to better described old HTT vs new HTT distinction, HTT vs
SMT can be confusing as SMT is a generic term
- more robust code for marking CPUs as "logical" and/or "hyperthreaded",
use HTT mask instead of modulo operation
- correct support for halting logical and/or hyperthreaded CPUs, let
scheduler know that it shouldn't schedule any threads on those CPUs

PR: kern/145385 (related)
In collaboration with: jkim
Tested by: Sergey Kandaurov <pluknet@gmail.com>,
Jeremy Chadwick <freebsd@jdc.parodius.com>,
Chip Camden <sterling@camdensoftware.com>,
Steve Wills <steve@mouf.net>,
Olivier Smedts <olivier@gid0.org>,
Florian Smeets <flo@smeets.im>
MFC after: 1 month


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb


213256 29-Sep-2010 davidxu

Remove a redundant instruction for casuword.


213226 27-Sep-2010 jhb

Rewrite the i386 memory probe:
- Check for SMAP data from the loader first. If it exists, don't bother
doing any VM86 calls at all. This will be more friendly for non-BIOS
boot environments such as EFI, etc.
- Move the base memory setup into a new basemem_setup() routine instead
of duplicating it.
- Simplify the XEN case by removing all of the VM86/SMAP parsing code rather
than just jumping over it.
- Adjust some comments to better explain the code flow.

MFC after: 2 weeks


212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


212413 10-Sep-2010 avg

bus_add_child: change type of order parameter to u_int

This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to: r212213
MFC after: 10 days


211924 28-Aug-2010 rpaulo

Register an interrupt vector for DTrace return probes. There is some
code missing in lapic to make sure that we don't overwrite this entry,
but this will be done on a sequent commit.

Sponsored by: The FreeBSD Foundation


211839 26-Aug-2010 rpaulo

Sync DTrace bits with amd64 and fix the build.

Sponsored by: The FreeBSD Foundation


211804 25-Aug-2010 rpaulo

Call the necessary DTrace function pointers when we have different kinds
of traps.

Sponsored by: The FreeBSD Foundation


211518 19-Aug-2010 attilio

Revert part of the r211149 as I erroneously ported the logical_cpus from
Yahoo! patchset as a mask (and according manipulating variables) while
it is actually a CPU count.

Submitted by: neel
MFC after: 1 month
X-MFC: 211149


211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


211424 17-Aug-2010 gahr

- The iMac9,1 needs the PAT workaround as well

Approved by: cognet


211220 12-Aug-2010 attilio

Revert r211176:
As long as interrupts are disabled and there is not explicit call to
sched_add() there can't be any preemption there, thus the calls may be
consistent.

Reported by: kib, jhb


211197 11-Aug-2010 jhb

Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int. Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.


211176 11-Aug-2010 attilio

IPI handlers may run generally with interrupts disabled because they
are served via an interrupt gate.

However, that doesn't explicitly prevent preemption and thread
migration thus scheduler pinning may be necessary in some handlers.
Fix that.

Tested by: gianni
MFC after: 1 month


211151 10-Aug-2010 attilio

Fix a typo due to a stale version of the patch.

Reported by: gianni, rdivacky
MFC after: 1 month
X-MFC: 211149


211149 10-Aug-2010 attilio

Fix some places that may use cpumask_t while they still use 'int' types.
While there, also fix some places assuming cpu type is 'int' while
u_int is really meant.

Note: this will also fix some possible races in per-cpu data accessings
to be addressed in further commits.

In collabouration with: Yahoo! Incorporated (via sbruno and peter)
Tested by: gianni
MFC after: 1 month


211117 09-Aug-2010 attilio

Simplify the logic for handling ipi_selected() and ipi_cpu() in the
amd64/i386 case.

Reviewed by: jhb
Tested by: gianni
MFC after: 1 month
X-MFC: 210939


211082 08-Aug-2010 dwmalone

Don't pass sizeof(u_int) to an argument of SYSCLT_PROC that ends up not
being used.


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month


210868 05-Aug-2010 jhb

Change the MPTable and $PIR PCI-PCI bridge drivers to inherit from the
generic PCI-PCI bridge driver and only override specific methods. This
should fix suspend/resume of PCI-PCI bridges using these drivers.


210774 02-Aug-2010 jhb

Tweak the logic to disable CLFLUSH in virtual environments to work around
problems with flushing the local APIC register range so that it checks
vm_guest directly.

Reviewed by: kib, alc
MFC after: 2 weeks


210617 29-Jul-2010 jkim

MFamd64: r210615

Fix another fallout from r208833. savectx() is used to save CPU context
for crash dump (dumppcb) and kdb (stoppcbs). For both cases, we cannot
have a valid pointer in pcb_save. This should restore the previous
behaviour.


210294 20-Jul-2010 tijl

Store fsbase and gsbase in the right fields of the mcontext. They were
switched.

PR: i386/148344
Approved by: kib (mentor)
MFC after: 1 week


209887 10-Jul-2010 alc

Reduce the number of global TLB shootdowns generated by pmap_qenter().
Specifically, teach pmap_qenter() to recognize the case when it is being
asked to replace a mapping with the very same mapping and not generate
a shootdown. Unfortunately, the buffer cache commonly passes an entire
buffer to pmap_qenter() when only a subset of the mappings are changing.
For the extension of buffers in allocbuf() this was resulting in
unnecessary shootdowns. The addition of new pages to the end of the
buffer need not and did not trigger a shootdown, but overwriting the
initial mappings with the very same mappings was seen as a change that
necessitated a shootdown. With this change, that is no longer so.

For a "buildworld" on amd64, this change eliminates 14-15% of the
pmap_invalidate_range() shootdowns, and about 4% of the overall
shootdowns.

MFC after: 3 weeks


209862 09-Jul-2010 kib

For both i386 and amd64 pmap,
- change the type of pm_active to cpumask_t, which it is;
- in pmap_remove_pages(), compare with PCPU(curpmap), instead of
dereferencing the long chain of pointers [1].
For amd64 pmap, remove the unneeded checks for validity of curpmap
in pmap_activate(), since curpmap should be always valid after
r209789.

Submitted by: alc [1]
Reviewed by: alc
MFC after: 3 weeks


209649 02-Jul-2010 mav

Revert r209638. After commit, there appeared to be more people who liked
previous name of stray interrupt counters, then responded to the list.


209638 01-Jul-2010 mav

Make stray irq counters have format alike to other counters. Unified format
makes string processing (for example by `systat -vm`) easier.


209613 30-Jun-2010 jhb

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.


209483 23-Jun-2010 kib

Clear DF bit in eflags/rflags on the kernel entry. The i386 and amd64
ABI specifies the DF should be zero, and newer compilers do not clear
DF before using DF-sensitive instructions.

The DF clearing for signal handlers was done some time ago.

MFC after: 1 week


209463 23-Jun-2010 kib

Fix bugs on pc98, use npxgetuserregs() instead of npxgetregs() for
get_fpcontext(), and npxsetuserregs() for set_fpcontext). Also,
note that usercontext is not initialized anymore in fpstate_drop().

Systematically replace references to npxgetregs() and npxsetregs()
by npxgetuserregs() and npxsetuserregs() in comments.

Noted by: bde


209462 23-Jun-2010 kib

After the FPU use requires #MF working due to INT13 FPU exception handling
removal, MFi386 r209198:
Use critical sections instead of disabling local interrupts to ensure
the consistency between PCPU fpcurthread and the state of FPU.

Reviewed by: bde
Tested by: pho


209461 23-Jun-2010 kib

Remove the support for int13 FPU exception reporting on i386. It is
believed that all 486-class CPUs FreeBSD is capable to run on, either
have no FPU and cannot use external coprocessor, or have FPU on the
package and can use #MF.

Reviewed by: bde
Tested by: pho (previous version)


209460 23-Jun-2010 kib

Remove unused i586 optimized bcopy/bzero/etc implementations that utilize
FPU registers for copying. Remove the switch table and jumps from
bcopy/bzero/... to the actual implementation.
As a side-effect, i486-optimized bzero is removed.

Reviewed by: bde
Tested by: pho (previous version)


209432 22-Jun-2010 mav

Some style fixes for r209371.

Submitted by: jhb@


209371 20-Jun-2010 mav

Implement new event timers infrastructure. It provides unified APIs for
writing event timer drivers, for choosing best possible drivers by machine
independent code and for operating them to supply kernel with hardclock(),
statclock() and profclock() events in unified fashion on various hardware.

Infrastructure provides support for both per-CPU (independent for every CPU
core) and global timers in periodic and one-shot modes. MI management code
at this moment uses only periodic mode, but one-shot mode use planned for
later, as part of tickless kernel project.

For this moment infrastructure used on i386 and amd64 architectures. Other
archs are welcome to follow, while their current operation should not be
affected.

This patch updates existing drivers (i8254, RTC and LAPIC) for the new
order, and adds event timers support into the HPET driver. These drivers
have different capabilities:
LAPIC - per-CPU timer, supports periodic and one-shot operation, may
freeze in C3 state, calibrated on first use, so may be not exactly precise.
HPET - depending on hardware can work as per-CPU or global, supports
periodic and one-shot operation, usually provides several event timers.
i8254 - global, limited to periodic mode, because same hardware used also
as time counter.
RTC - global, supports only periodic mode, set of frequencies in Hz
limited by powers of 2.

Depending on hardware capabilities, drivers preferred in following orders,
either LAPIC, HPETs, i8254, RTC or HPETs, LAPIC, i8254, RTC.
User may explicitly specify wanted timers via loader tunables or sysctls:
kern.eventtimer.timer1 and kern.eventtimer.timer2.
If requested driver is unavailable or unoperational, system will try to
replace it. If no more timers available or "NONE" specified for second,
system will operate using only one timer, multiplying it's frequency by few
times and uing respective dividers to honor hz, stathz and profhz values,
set during initial setup.


209248 17-Jun-2010 mav

Merge COUNT_XINVLTLB_HITS and COUNT_IPIS kernel options from i386 to amd64.
This information can be very valuable for CPU sleep-time (and respectively
idle power consumption) optimization.

Add counters for timer-related IPIs.

Reviewed by: jhb@ (previous version)


209155 14-Jun-2010 mav

Fix bug introduced in SVN rev 194985. When calling pic_assign_cpu()
for pre-bound IRQs during boot, submit there LAPIC ID, same as in other
places, not CPU ID.


209103 12-Jun-2010 mav

Check general TSC presence before doing more specific checks and printfs.


209059 11-Jun-2010 jhb

Update several places that iterate over CPUs to use CPU_FOREACH().


209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


208922 08-Jun-2010 jhb

Move the MD support for PCI message signalled interrupts to the x86 tree
as it is identical for i386 and amd64.


208921 08-Jun-2010 jhb

Move the machine check support code to the x86 tree since it is identical
on i386 and amd64.

Requested by: alc


208919 08-Jun-2010 jhb

Move the I/O APIC code to the x86 tree since it is identical on i386 and
amd64.


208915 08-Jun-2010 jhb

- Use a bit more care when moving I/O APIC interrupts between CPUs. Mask
the interrupt followed by a brief delay if it is not currently masked
before moving the interrupt.
- Move the icu_lock out of ioapic_program_intpin() and into callers. This
closes a race where ioapic_program_intpin() could use a stale value of
the masked state to compute the masked bit in the register.

Reviewed by: mav
MFC after: 2 weeks


208833 05-Jun-2010 kib

Introduce the x86 kernel interfaces to allow kernel code to use
FPU/SSE hardware. Caller should provide a save area that is chained
into the stack of the areas; pcb save_area for usermode FPU state is
on top. The pcb now contains a pointer to the current FPU saved area,
used during FPUDNA handling and context switches. There is also a
facility to allow the kernel thread to use pcb save_area.

Change the dreaded warnings "npxdna in kernel mode!" into the panics
when FPU usage is not registered.

KPI discussed with: fabient
Tested by: pho, fabient
Hardware provided by: Sentex Communications
MFC after: 1 month


208765 03-Jun-2010 alc

In the unlikely event that pmap_ts_referenced() demoted five superpage
mappings to the same underlying physical page, the calling thread would be
left forever pinned to the same processor.

MFC after: 3 days


208667 31-May-2010 alc

Eliminate a stale comment.


208657 30-May-2010 alc

Simplify the inner loop of pmap_collect(): While iterating over the page's
pv list, there is no point in checking whether or not the pv list is empty.
Instead, wait until the loop completes.


208645 29-May-2010 alc

When I pushed down the page queues lock into pmap_is_modified(), I created
an ordering dependence: A pmap operation that clears PG_WRITEABLE and calls
vm_page_dirty() must perform the call first. Otherwise, pmap_is_modified()
could return FALSE without acquiring the page queues lock because the page
is not (currently) writeable, and the caller to pmap_is_modified() might
believe that the page's dirty field is clear because it has not seen the
effect of the vm_page_dirty() call.

When I pushed down the page queues lock into pmap_is_modified(), I
overlooked one place where this ordering dependence is violated:
pmap_enter(). In a rare situation pmap_enter() can be called to replace a
dirty mapping to one page with a mapping to another page. (I say rare
because replacements generally occur as a result of a copy-on-write fault,
and so the old page is not dirty.) This change delays clearing PG_WRITEABLE
until after vm_page_dirty() has been called.

Fixing the ordering dependency also makes it easy to introduce a small
optimization: When pmap_enter() used to replace a mapping to one page with a
mapping to another page, it freed the pv entry for the first mapping and
later called the pv entry allocator for the new mapping. Now, pmap_enter()
attempts to recycle the old pv entry, saving two calls to the pv entry
allocator.

There is no point in setting PG_WRITEABLE on unmanaged pages, so don't.
Update a comment to reflect this.

Tidy up the variable declarations at the start of pmap_enter().


208621 28-May-2010 jhb

Defer initializing machine checks for the boot CPU until the local APIC is
fully configured.

MFC after: 1 month


208609 28-May-2010 alc

Defer freeing any page table pages in pmap_remove_all() until after the
page queues lock is released. This may reduce the amount of time that the
page queues lock is held by pmap_remove_all().


208604 27-May-2010 kib

Clarify a potential issue in get_fpcontext() use.

MFC after: 1 week


208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


208556 25-May-2010 jhb

Only enable CMCI on i386 if 'device apic' is enabled in the kernel since
it requires the local APIC to work.


208507 24-May-2010 jhb

Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl). The threshold
is also adjusted on each hourly poll which will lower the threshold once
events stop occurring.

Tested by: Sailaja Bangaru sbappana at yahoo com
MFC after: 1 month


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208494 24-May-2010 mav

- Implement MI helper functions, dividing one or two timer interrupts with
arbitrary frequencies into hardclock(), statclock() and profclock() calls.
Same code with minor variations duplicated several times over the tree for
different timer drivers and architectures.
- Switch all x86 archs to new functions, simplifying the code and removing
extra logic from timer drivers. Other archs are also welcome.


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208452 23-May-2010 mav

Unify local_apic.c for x86 archs,


208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


208111 15-May-2010 phk

Apply a patch that has been lingering in my inbox for far too long:

On a soekris Net5501, if you do a watchdog -t 16, followed by a watchdog
-t 0 to disable the watchdog, and then after some time (16s) re-enable
the watchdog the box reboots immediatly. This prevents also to stop and
restart watchdogd(8).

This is because when you stop the watchdog, the timer is not stoped,
only the hard reset is disabled. So when the timer has elapsed, the C2
event of the timer is set.

But when the hard reset is re-enabled, the event is not cleared and the
box reboots.

The attached patch stops and resets the counter when the watchdog is
disabled and do not disable the hard reset of the timer (if the timer
has elapsed it's too late).

Submitted by: Patrick Lamaizière


207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


207702 06-May-2010 alc

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


207676 05-May-2010 kib

Add definitions for Intel AESNI CPUID bits and print the capabilities
on boot.

Hardware provided by: Sentex Communications
MFC after: 1 week


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207329 28-Apr-2010 attilio

- Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by: Sandvine Incorporated
PR: threads/116181
Reviewed by: emaste, marcel
MFC after: 3 weeks


207205 25-Apr-2010 alc

Clearing a page table entry's accessed bit (PG_A) and setting the
page's PG_REFERENCED flag in pmap_protect() can't really be justified.
In contrast to pmap_remove() or pmap_remove_all(), the mapping is not
being destroyed, so the notion that the page was accessed is not lost.
Moreover, clearing the page table entry's accessed bit and setting the
page's PG_REFERENCED flag can throw off the page daemon's activity
count calculation. Finally, in my tests, I found that 15% of the
atomic memory operations being performed by pmap_protect() were only
to clear PG_A, and not change protection. This could, by itself, be
fixed, but I don't see the point given the above argument.

Remove a comment from pmap_protect_pde() that is no longer meaningful
after the above change.


207163 24-Apr-2010 kmacy

- fix style issues on i386 as well

requested by: alc@


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


207081 22-Apr-2010 jkim

If a conditional jump instruction has the same jt and jf, do not perform
the test and jump unconditionally.


206901 20-Apr-2010 rpaulo

Rename the cyclic global variable lapic_cyclic_clock_func to just
cyclic_clock_func. This will make more sense when we start developing non
x86 cyclic version.


206553 13-Apr-2010 kib

Change printf() calls to uprintf() for sigreturn() and trap() complaints
about inacessible or wrong mcontext, and for dreaded "kernel trap with
interrupts disabled" situation. The later is changed when trap is
generated from user mode (shall never be ?).

Normalize the messages to include both pid and thread name.

MFC after: 1 week


206379 07-Apr-2010 joel

Switch to our preferred 2-clause BSD license.

Approved by: jfv


205851 29-Mar-2010 jhb

Add a handler for the local APIC error interrupt. For now it just prints
out the current value of the local APIC error register when the interrupt
fires.

MFC after: 1 week


205778 28-Mar-2010 alc

Correctly handle preemption of pmap_update_pde_invalidate().

X-MFC after: r205573


205773 27-Mar-2010 alc

Simplify pmap_growkernel(), making the i386 version more like the amd64
version.

MFC after: 3 weeks


205652 25-Mar-2010 alc

A ptrace(2) by one processor may trigger a promotion in the address space
of another process. Modify pmap_promote_pde() to handle this. (This is
not a problem on amd64 due to implementation differences.)

Reported by: jh@
MFC after: 1 week


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205573 24-Mar-2010 alc

Adapt r204907 and r205402, the amd64 implementation of the workaround for
AMD Family 10h Erratum 383, to i386.

Enable machine check exceptions by default, just like r204913 for amd64.

Enable superpage promotion only if the processor actually supports large
pages, i.e., PG_PS.

MFC after: 2 weeks


205444 22-Mar-2010 emaste

Merge r197455 from amd64:

Add a backtrace to the "fpudna in kernel mode!" case, to help track down
where this comes from.

Reviewed by: bde


205334 19-Mar-2010 avg

pmap amd64/i386: fix a typo in a comment

MFC after: 3 days


205214 16-Mar-2010 jhb

- Extend the machine check record structure to include several fields useful
for parsing model-specific and other fields in machine check events
including the global machine check capabilities and status registers,
CPU identification, and the FreeBSD CPU ID.
- Report these added fields in the console log of a machine check so that
a record structure can be reconstituted from the console messages.
- Parse new architectural errors including memory controller errors.

MFC after: 1 week


205063 12-Mar-2010 jhb

Fix the previous attempt to fix kernel builds of HEAD on 7.x. Use the
__gnu_inline__ attribute for PMAP_INLINE when using the 7.x compiler to
match what 7.x uses for PMAP_INLINE.


205013 11-Mar-2010 jhb

Print out the family and model from the cpu_id. This is especially useful
given the advent of the extended family and extended model fields. The
values are printed in hex to match their common usage in documentation.

Submitted by: Alexander Best
MFC after: 1 week


204972 10-Mar-2010 jhb

Make NKPT a kernel option on i386 so that it can be set to a non-default
value from kernel config files.

Tested by: Charles Sprickman spork of bway net
MFC after: 2 weeks


204957 10-Mar-2010 kib

Fall back to wbinvd when region for CLFLUSH is >= 2MB.

Submitted by: Kevin Day <toasty dragondata com>
Reviewed by: jhb
MFC after: 2 weeks


204641 03-Mar-2010 attilio

Improving the clocks auto-tunning by firstly checking if the atrtc may be
correctly initialized and just then assign to softclock/profclock.
Right now, some atrtc seems reporting strange diagnostic error* making the
current pattern bogus.

In order to do that cleanly, lapic_setup_clock(), on both ia32 and amd64,
now accepts as arguments the desired sources to handle, and returns the
actual ones (LAPIC_CLOCK_NONE is forbidden because otherwise there is no
meaning in calling such function).
This allows to bring out into commont x86 code the handling part for
machdep.lapic_allclocks tunable, which is retained.

Sponsored by: Sandvine Incorporated
Tested by: yongari, Richard Todd
<rmtodd at ichotolot dot servalan dot com>
MFC: 3 weeks
X-MFC: r202387, 204309


204518 01-Mar-2010 jhb

Print the contents of the miscellaneous (MISC) register to the console if
it is valid along with the other register values when a machine check is
encountered.

MFC after: 1 week


204420 27-Feb-2010 alc

When running as a guest operating system, the FreeBSD kernel must assume
that the virtual machine monitor has enabled machine check exceptions.
Unfortunately, on AMD Family 10h processors the machine check hardware
has a bug (Erratum 383) that can result in a false machine check exception
when a superpage promotion occurs. Thus, I am disabling superpage
promotion when the FreeBSD kernel is running as a guest operating system
on an AMD Family 10h processor.

Reviewed by: jhb, kib
MFC after: 3 days


204309 25-Feb-2010 attilio

Introduce the new kernel sub-tree x86 which should contain all the code
shared and generalized between our current amd64, i386 and pc98.

This is just an initial step that should lead to a more complete effort.
For the moment, a very simple porting of cpufreq modules, BIOS calls and
the whole MD specific ISA bus part is added to the sub-tree but ideally
a lot of code might be added and more shared support should grow.

Sponsored by: Sandvine Incorporated
Reviewed by: emaste, kib, jhb, imp
Discussed on: arch
MFC: 3 weeks


204041 18-Feb-2010 ed

Allow the pmap code to be built with GCC from FreeBSD 7 again.

This patch basically gives us the best of both worlds. Instead of
forcing the compiler to emulate GNU-style inline semantics even though
we're using ISO C99, it will only use GNU-style inlining when the
compiler is configured that way (__GNUC_GNU_INLINE__).

Tested by: jhb


203351 01-Feb-2010 alc

Change the default value for the flag enabling superpage mapping and
promotion to "on".


203289 31-Jan-2010 rnoland

Enable MTRR on all VIA CPUs that claim support.

This may not be entirely correct either, but the existing check is
bogus. I have both a C3 and a C7 that fail this check, but work fine.

MFC after: 2 weeks


203160 29-Jan-2010 avg

add static qualifier to definition of a function already declared static

This is for improving code readibility only.

MFC after: 1 week


203085 27-Jan-2010 alc

Optimize pmap_demote_pde() by using the new KPTmap to access a kernel
page table page instead of creating a temporary mapping to it.

Set the PG_G bit on the page table entries that implement the KPTmap.

Locore initializes the unused portions of the NKPT kernel page table
pages that it allocates to zero. So, pmap_bootstrap() needn't zero
the page table entries referenced by CMAP1 and CMAP3.

Simplify pmap_set_pg().

MFC after: 10 days


202894 23-Jan-2010 alc

Handle a race between pmap_kextract() and pmap_promote_pde(). This race is
known to cause a kernel crash in ZFS on i386 when superpage promotion is
enabled.

Tested by: netchild
MFC after: 1 week


202882 23-Jan-2010 kib

For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

For i386, amd64 and ia32 on amd64 MD syscall(), reread syscall number
and arguments after ptracestop(), if debugger modified anything in the
process environment. Since procfs stopeven requires number of syscall
arguments in p_xstat, this cannot be solved by moving stop/trace point
before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reported by: Ali Polatel <alip exherbo org>
Briefly discussed with: jhb
MFC after: 3 weeks


202628 19-Jan-2010 ed

Recommit r193732:

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc

It was backed out, because it prevented us from building kernels using a
7.x compiler. Now that most people use 8.x, there is nothing that holds
us back. Even if people run 7.x, they should be able to build a kernel
if they run `make kernel-toolchain' or `make buildworld' first.


202387 15-Jan-2010 attilio

Handling all the three clocks (hardclock, softclock, profclock) with the
LAPIC may lead to aliasing for softclock and profclock because frequencies
are sized in order to fit mainly hardclock.
atrtc used to take care of the softclock and profclock and it does still
do, if the LAPIC can't handle the clocks properly.

Revert the change when the LAPIC started taking charge of all three of
them and let atrtc handle softclock and profclock if not explicitly
requested. Such request can be made setting != 0 the new tunable
machdep.lapic_allclocks or if the new device ATPIC is not present
within the i386 kernel config (atrtc is linked to atpic presence).

Diagnosed by: Sandvine Incorporated
Reviewed by: jhb, emaste
Sponsored by: Sandvine Incorporated
MFC: 3 weeks


202161 12-Jan-2010 gavin

Spell "Hz" correctly wherever it is user-visible.

PR: bin/142566
Submitted by: N.J. Mann njm njm.me.uk
Approved by: ed (mentor)
MFC after: 2 weeks


202097 11-Jan-2010 marcel

Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after: 1 week


202085 11-Jan-2010 alc

Simplify pmap_init(). Additionally, correct a harmless misbehavior on i386.
Specifically, where locore had created large page mappings for the kernel,
the wrong vm page array entries were being initialized. The vm page array
entries for the pages containing the kernel were being initialized instead
of the vm page array entries for page table pages.

MFC after: 1 week


201934 09-Jan-2010 alc

Long ago, in r120654, the rounding of KERNend and physfree in locore
was changed from a small page boundary to a large page boundary. As
a consequence pmap_kmem_choose() became a pointless waste of address
space. Eliminate it.


201751 07-Jan-2010 alc

Make pmap_set_pg() static.


201716 07-Jan-2010 alc

Eliminate unused variables (see r137912).


201223 29-Dec-2009 rnoland

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


200352 10-Dec-2009 kmacy

for PV XEN translate page table entries from machine (real) to physical (logical) addresses so that kgdb can
translate them to the correct coredump offsets


200346 10-Dec-2009 kmacy

- revert pmap_kenter_temporary to taking a physical address
- make minidump work


200288 09-Dec-2009 kmacy

make PV core dump actually dump memory - still need to fix program header initialization


200064 03-Dec-2009 avg

mca: small enhancements related to cpu quirks

- use utility macros for CPU family/model checking
- limit Intel P6 quirk to pre-Nehalem models (taken from OpenSolaris)
- add AMD GartTblWkEn quirk for families 0Fh and 10h; I haven't experienced
any problems without the quirk but both Linux and OpenSolaris do this
- slightly re-arrange quirk code to provide for the future generalization
and separation of vendor-specific quirk functions

Reviewed by: jhb
MFC after: 1 week


200033 02-Dec-2009 avg

mca: improve status checking, recording and reporting

- directly print mca information in case we fail to allocate memory
for a record
- include bank number into mca record
- print raw mca status value for extended information

Reviewed by: jhb
MFC after: 10 days


199968 30-Nov-2009 avg

x86 cpu features: add MOVBE reporting and flag

The check is glimpsed from Linux and OpenSolaris.
MOVBE instruction is found in Intel Atom processors.


199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


199721 23-Nov-2009 jkim

- Add more aggressive BPF JIT optimization. This is in more favor of i386
while the previous commit was more amd64-centric.
- Use calloc(3) instead of malloc(3)/memset(3) in user land[1].

Submitted by: ed[1]


199619 21-Nov-2009 jkim

Add an experimental and rudimentary JIT optimizer to reduce unncessary
overhead from short BPF filter programs such as "get the first 96 bytes".


199615 20-Nov-2009 jkim

General style cleanup, no functional change.


199603 20-Nov-2009 jkim

- Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].

Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]


199531 19-Nov-2009 jkim

Fix tinderbox build for i386 and sync amd64 with it.


199498 18-Nov-2009 jkim

- Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.


199492 18-Nov-2009 jkim

- Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.


199219 12-Nov-2009 nyan

Fix cpu model for PODP5V83. It is P24T, not P54T.
Also remove redundant 'Overdrive' word.

Pointed out by: SATOU Tomokazu (tomo1770 at maple ocn ne jp)
MFC after: 1 week


199215 12-Nov-2009 kuriyama

- Style nits.
- Remove unneeded TUNABLE_INT().

Suggested by: avg, kib


199184 11-Nov-2009 avg

reflect that pg_ps_enabled is a tunable, not just a read-only sysctl

Nod from: jhb


199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sunv reviewed
MIPS tested by: gonzo
MFC after: 1 month


199067 09-Nov-2009 kuriyama

- Add hw.clflush_disable loader tunable to avoid panic (trap 9) at
map_invalidate_cache_range() even if CPU is not Intel.
- This tunable can be set to -1 (default), 0 and 1. -1 is same as
current behavior, which automatically disable CLFLUSH on Intel CPUs
without CPUID_SS (should be occured on Xen only). You can specify 1
when this panic happened on non-Intel CPUs (such as AMD's). Because
disabling CLFLUSH may reduce performance, you can try with setting 0
on Intel CPUs without SS to use CLFLUSH feature.

Reviewed by: kib
Reported by: karl, kuriyama
Related to: kern/138863


198950 05-Nov-2009 attilio

Strip from messages for users external URLs the project cannot directly
control.

Requested by: kib, rwatson


198868 04-Nov-2009 attilio

Opteron rev E family of processor expose a bug where, in very rare
ocassions, memory barriers semantic is not honoured by the hardware
itself. As a result, some random breakage can happen in uninvestigable
ways (for further explanation see at the content of the commit itself).

As long as just a specific familly is bugged of an entire architecture
is broken, a complete fix-up is impratical without harming to some
extents the other correct cases.
Considering that (and considering the frequency of the bug exposure)
just print out a warning message if the affected machine is identified.

Pointed out by: Samy Al Bahra <sbahra at repnop dot org>
Help on wordings by: jeff
MFC: 3 days


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


198170 16-Oct-2009 kib

Move intr_describe() out of #ifdef SMP; the function is always required.

Reviewed by: jhb


198134 15-Oct-2009 jhb

Add a facility for associating optional descriptions with active interrupt
handlers. This is primarily intended as a way to allow devices that use
multiple interrupts (e.g. MSI) to meaningfully distinguish the various
interrupt handlers.
- Add a new BUS_DESCRIBE_INTR() method to the bus interface to associate
a description with an active interrupt handler setup by BUS_SETUP_INTR.
It has a default method (bus_generic_describe_intr()) which simply passes
the request up to the parent device.
- Add a bus_describe_intr() wrapper around BUS_DESCRIBE_INTR() that supports
printf(9) style formatting using var args.
- Reserve MAXCOMLEN bytes in the intr_handler structure to hold the name of
an interrupt handler and copy the name passed to intr_event_add_handler()
into that buffer instead of just saving the pointer to the name.
- Add a new intr_event_describe_handler() which appends a description string
to an interrupt handler's name.
- Implement support for interrupt descriptions on amd64 and i386 by having
the nexus(4) driver supply a custom bus_describe_intr method that invokes
a new intr_describe() MD routine which in turn looks up the associated
interrupt event and invokes intr_event_describe_handler().

Requested by: many
Reviewed by: scottl
MFC after: 2 weeks


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197693 01-Oct-2009 kmacy

make read_eflags and write_eflags accomplish the same effect on PVM as native,
simplifying interrupt handling


197663 01-Oct-2009 kib

As a workaround, for Intel CPUs, do not use CLFLUSH in
pmap_invalidate_cache_range() when self-snoop is apparently not reported
in cpu features. We get a reserved trap when clflushing APIC registers
window.

XEN in full system virtualization mode removes self-snoop from CPU
features, making this a problem.

Tested by: csjp
Reviewed by: alc
MFC after: 3 days


197410 22-Sep-2009 jhb

- Split the logic to parse an SMAP entry out into a separate function on
amd64 similar to i386. This fixes a bug on amd64 where overlapping
entries would not cause the SMAP parsing to stop.
- Change the SMAP parsing code to do a sorted insertion into physmap[]
instead of an append to support systems with out-of-order SMAP entries.

PR: amd64/138220
Reported by: James R. Van Artsdalen james of jrv org
MFC after: 3 days


197389 21-Sep-2009 kib

If CPU happens to be in usermode when a T_RESERVED trap occured,
then trapsignal is called with ksi.ksi_signo = 0. For debugging kernels,
that should end up in panic, for non-debugging kernels behaviour is
undefined.

Do panic regardeless of execution mode at the moment of trap.

Reviewed by: jhb
MFC after: 1 month


197317 18-Sep-2009 alc

When superpages are enabled, add the 2 or 4MB page size to the array of
supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


197070 10-Sep-2009 jkim

Consolidate CPUID to CPU family/model macros for amd64 and i386 to reduce
unnecessary #ifdef's for shared code between them.


196829 04-Sep-2009 kib

Add missing ';'.


196771 02-Sep-2009 jkim

Fix confusing comments about default PAT entries.


196769 02-Sep-2009 jkim

- Work around ACPI mode transition problem for recent NVIDIA 9400M chipset
based Intel Macs. Since r189055, these platforms started freezing when
ACPI is being initialized for unknown reason. For these platforms, we just
use the old PAT layout. Note this change is not enough to boot fully on
these platforms because of other problems but it makes debugging possible.
Note MacBook5,2 may be affected as well but it was not added here because
of lack of hardware to test.
- Initialize PAT MSR fully instead of reading and modifying it for safety.

Reported by: rpaulo, hps, Eygene Ryabinkin (rea-fbsd at codelabs dot ru)
Reviewed by: jhb


196745 02-Sep-2009 jhb

Don't attempt to bind the current thread to the CPU an IRQ is bound to
when removing an interrupt handler from an IRQ during shutdown. During
shutdown we are already bound to CPU 0 and this was triggering a panic.

MFC after: 3 days


196707 31-Aug-2009 jhb

Simplify pmap_change_attr() a bit:
- Always calculate the cache bits instead of doing it on-demand.
- Always set changed to TRUE rather than only doing it if it is false.

Discussed with: alc
MFC after: 3 days


196705 31-Aug-2009 jhb

Improve pmap_change_attr() so that it is able to demote a large (2/4MB)
page into 4KB pages as needed. This should be fairly rare in practice
on i386. This includes merging the following changes from the amd64 pmap:
180430, 180485, 180845, 181043, 181077, and 196318.
- Add basic support for changing attributes on PDEs to pmap_change_attr()
similar to the support in the initial version of pmap_change_attr() on
amd64 including inlines for pmap_pde_attr() and pmap_pte_attr().
- Extend pmap_demote_pde() to include the ability to instantiate a new page
table page where none existed before.
- Enhance pmap_change_attr(). Use pmap_demote_pde() to demote a 2/4MB page
mapping to 4KB page mappings when the specified attribute change only
applies to a portion of the 2/4MB page. Previously, in such cases,
pmap_change_attr() gave up and returned an error.
- Correct a critical accounting error in pmap_demote_pde().

Reviewed by: alc
MFC after: 3 days


196653 30-Aug-2009 bz

Make sure FreeBSD binaries without .note.ABI-tag section work
correctly and do not match a colliding Debian GNU/kFreeBSD
brandinfo statements.
For this mark the Debian GNU/kFreeBSD brandinfo that it must have
an .note.ABI-tag section and ignore the old EI_OSABI brandinfo
when comparing a possibly colliding set of options.

Due to SYSINIT we add the brandinfo in a non-deterministic order,
so native FreeBSD is not always first. We may want to consider
to force native FreeBSD to come first as well.

The only way a problem could currently be noticed is when running an
i386 binary without the .note.ABI-tag on amd64 and the Debian GNU/kFreeBSD
brandinfo was matched first, as the fallback to ld-elf32.so.1 does
not exist in that case.

Reported and tested by: ticso
In collaboration with: kib
MFC after: 3 days


196643 29-Aug-2009 rnoland

Swap the start/end virtual addresses in pmap_invalidate_cache_range().

This fixes the functionality on non SelfSnoop hardware.

Found by: rnoland
Submitted by: alc
Reviewed by: kib
MFC after: 3 days


196512 24-Aug-2009 bz

Fix handling of .note.ABI-tag section for GNU systems [1].
Handle GNU/Linux according to LSB Core Specification 4.0,
Chapter 11. Object Format, 11.8. ABI note tag.

Also check the first word of desc, not only name, according to
glibc abi-tags specification to distinguish between Linux and
kFreeBSD.

Add explicit handling for Debian GNU/kFreeBSD, which runs
on our kernels as well [2].

In {amd64,i386}/trap.c, when checking osrel of the current process,
also check the ABI to not change the signal behaviour for Linux
binary processes, now that we save an osrel version for all three
from the lists above in struct proc [2].

These changes make it possible to run FreeBSD, Debian GNU/kFreeBSD
and Linux binaries on the same machine again for at least i386 and
amd64, and no longer break kFreeBSD which was detected as GNU(/Linux).

PR: kern/135468
Submitted by: dchagin [1] (initial patch)
Suggested by: kib [2]
Tested by: Petr Salinger (Petr.Salinger seznam.cz) for kFreeBSD
Reviewed by: kib
MFC after: 3 days


196412 20-Aug-2009 jkim

Check whether the SMBIOS reports reasonable amount of memory. If it is
less than "avail memory", fall back to Maxmem to avoid user confusion.
We use SMBIOS information to display "real memory" since r190599 but
some broken SMBIOS implementation reported only half of actual memory.

Tested by: bz
Approved by: re (kib)


196390 19-Aug-2009 ed

Make the MacBookPro3,1 hardware boot again.

Tested by: Patrick Lamaiziere <patfbsd davenulle org>
Approved by: re (kib)


196224 14-Aug-2009 jhb

Adjust the handling of the local APIC PMC interrupt vector:
- Provide lapic_disable_pmc(), lapic_enable_pmc(), and lapic_reenable_pmc()
routines in the local APIC code that the hwpmc(4) driver can use to
manage the local APIC PMC interrupt vector.
- Do not enable the local APIC PMC interrupt vector by default when
HWPMC_HOOKS is enabled. Instead, the hwpmc(4) driver explicitly
enables the interrupt when it is succesfully initialized and disables
the interrupt when it is unloaded. This avoids enabling the interrupt
on unsupported CPUs which may result in spurious NMIs.

Reported by: rnoland
Reviewed by: jkoshy
Approved by: re (kib)
MFC after: 2 weeks


196196 13-Aug-2009 attilio

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


196033 02-Aug-2009 ed

Make the MacBook3,1 boot again.

Approved by: re (kib)


195940 29-Jul-2009 kib

As was done in r195820 for amd64, use clflush for flushing cache lines
when memory page caching attributes changed, and CPU does not support
self-snoop, but implemented clflush, for i386.

Take care of possible mappings of the page by sf buffer by utilizing
the mapping for clflush, otherwise map the page transiently. Amd64
used direct map.

Proposed and reviewed by: alc
Approved by: re (kensmith)


195907 27-Jul-2009 rpaulo

Refine the MacBook hack to only match early models that have Intel ICH.

Discussed with: kjim
Approved by: re (kib)


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195836 23-Jul-2009 alc

Eliminate unnecessary cache and TLB flushes by pmap_change_attr(). (This
optimization was implemented in the amd64 version roughly 1 year ago.)

Approved by: re (kensmith)


195774 19-Jul-2009 alc

Change the handling of fictitious pages by pmap_page_set_memattr() on
amd64 and i386. Essentially, fictitious pages provide a mechanism for
creating aliases for either normal or device-backed pages. Therefore,
pmap_page_set_memattr() on a fictitious page needn't update the direct
map or flush the cache. Such actions are the responsibility of the
"primary" instance of the page or the device driver that "owns" the
physical address. For example, these actions are already performed by
pmap_mapdev().

The device pager needn't restore the memory attributes on a fictitious
page before releasing it. It's now pointless.

Add pmap_page_set_memattr() to the Xen pmap.

Approved by: re (kib)


195749 18-Jul-2009 alc

An addendum to r195649, "Add support to the virtual memory system for
configuring machine-dependent memory attributes...":

Don't set the memory attribute for a "real" page that is allocated to
a device object in vm_page_alloc(). It is a pointless act, because
the device pager replaces this "real" page with a "fake" page and sets
the memory attribute on that "fake" page.

Eliminate pointless code from pmap_cache_bits() on amd64.

Employ the "Self Snoop" feature supported by some x86 processors to
avoid cache flushes in the pmap.

Approved by: re (kib)


195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


195415 06-Jul-2009 jhb

After the per-CPU IDT changes, the IDT vector of an interrupt could change
when the interrupt was moved from one CPU to another. If the interrupt was
enabled, then the old IDT vector needs to be disabled and the new IDT vector
needs to be enabled. This was mostly masked prior to the recent MSI changes
since in the older code almost all allocated IDT vectors were already enabled
and the enabled vectors on the BSP during boot covered enough of the IDT
range. However, after the MSI changes, MSI interrupts that were allocated
but not enabled (e.g. DRM with MSI) during boot could result in an allocated
IDT vector that wasn't enabled. The round-robin at the end of boot could
place another interrupt at the same IDT vector without enabling the IDT
vector causing trap 30 faults.

Fix this by explicitly disabling/enabling the old and new IDT vectors for
enabled interrupt sources when moving an interrupt between CPUs via the
pic_assign_cpu() method. While here, fix a bug in my earlier changes so
that an I/O APIC interrupt pin is left unchanged if ioapic_assign_cpu()
fails to allocate a new IDT vector and returns ENOSPC.

Approved by: re (kensmith)


195385 05-Jul-2009 alc

PAE adds another level to the i386 page table. This level is a small
4-entry table that must be located within the first 4GB of RAM. This
requirement is met by defining an UMA zone with a custom back-end
allocator function. This revision makes two changes to this back-end
allocator function: (1) It replaces the use of contigmalloc() with the
use of kmem_alloc_contig(). This eliminates "double accounting", i.e.,
accounting by both the UMA zone and malloc tags. (I made the same
change for the same reason to the zones supporting jumbo frames a week
ago.) (2) It passes through the "wait" parameter, i.e., M_WAITOK,
M_ZERO, etc. to kmem_alloc_contig() rather than ignoring it.
pmap_init() calls uma_zalloc() with both M_WAITOK and M_ZERO. At the
moment, this is harmless only because the default behavior of
contigmalloc()/kmem_alloc_contig() is to wait and because pmap_init()
doesn't really depend on the memory being zeroed.

The back-end allocator function in the Xen pmap is dead code. I am
changing it nonetheless because I don't want to leave any "bad examples"
in the source tree for someone to copy at a later date.

Approved by: re (kib)


195249 01-Jul-2009 jhb

Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

Approved by: re (kib)


195202 30-Jun-2009 dfr

Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.

Approved by: re


195104 27-Jun-2009 rwatson

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


195002 25-Jun-2009 jhb

Fix kernels compiled without SMP support. Make intr_next_cpu() available
for UP kernels but as a stub that always returns the single CPU's local
APIC ID.

Reported by: kib


194985 25-Jun-2009 jhb

- Restore the behavior of pre-allocating IDT vectors for MSI interrupts.
This is mostly important for the multiple MSI message case where the
IDT vectors for the entire group need to be allocated together. This
also restores the assumptions made by the PCI bus code that it could
invoke PCIB_MAP_MSI() once MSI vectors were allocated.
- To avoid whiplash with CPU assignments, change the way that CPUs are
assigned to interrupt sources on activation. Instead of assigning the
CPU via pic_assign_cpu() before calling enable_intr(), allow the
different interrupt source drivers to ask the MD interrupt code which
CPU to use when they allocate an IDT vector. I/O APIC interrupt pins
do this in their pic_enable_intr() routines giving the same behavior as
before. MSI sources do it when the IDT vectors are allocated during
msi_alloc() and msix_alloc().
- Change the intr_table_lock from an sx lock to a mutex.

Tested by: rnoland


194889 24-Jun-2009 jhb

Whitespace fix.


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


194237 15-Jun-2009 mav

Forbid multi-vector MSI interrupt vectors migration to another CPU once
allocated. MSI have strict vectors allocation requirements, which are not
satisfied now during reallocation. This is not the best possible solution,
but better then just broken, as it was.

No objections: current@, arch@, jhb@


194209 14-Jun-2009 alc

Long, long ago in r27464 special case code for mapping device-backed
memory with 4MB pages was added to pmap_object_init_pt(). This code
assumes that the pages of a OBJT_DEVICE object are always physically
contiguous. Unfortunately, this is not always the case. For example,
jhb@ informs me that the recently introduced /dev/ksyms driver creates
a OBJT_DEVICE object that violates this assumption. Thus, this
revision modifies pmap_object_init_pt() to abort the mapping if the
OBJT_DEVICE object's pages are not physically contiguous. This
revision also changes some inconsistent if not buggy behavior. For
example, the i386 version aborts if the first 4MB virtual page that
would be mapped is already valid. However, it incorrectly replaces
any subsequent 4MB virtual page mappings that it encounters,
potentially leaking a page table page. The amd64 version has a bug of
my own creation. It potentially busies the wrong page and always an
insufficent number of pages if it blocks allocating a page table page.

To my knowledge, there have been no reports of these bugs, hence,
their persistance. I suspect that the existing restrictions that
pmap_object_init_pt() placed on the OBJT_DEVICE objects that it would
choose to map, for example, that the first page must be aligned on a 2
or 4MB physical boundary and that the size of the mapping must be a
multiple of the large page size, were enough to avoid triggering the
bug for drivers like ksyms. However, one side effect of testing the
OBJT_DEVICE object's pages for physical contiguity is that a dubious
difference between pmap_object_init_pt() and the standard path for
mapping devices pages, i.e., vm_fault(), has been eliminated.
Previously, pmap_object_init_pt() would only instantiate the first
PG_FICTITOUS page being mapped because it never examined the rest.
Now, however, pmap_object_init_pt() uses the new function
vm_object_populate() to instantiate them all (in order to support
testing their physical contiguity). These pages need to be
instantiated for the mechanism that I have prototyped for
automatically maintaining the consistency of the PAT settings across
multiple mappings, particularly, amd64's direct mapping, to work.
(Translation: This change is also being made to support jhb@'s work on
the Nvidia feature requests.)

Discussed with: jhb@


193804 09-Jun-2009 ariff

Move C1E workaround into its own idle function. Previous workaround works
only during initial booting process, while there are laptops/BIOSes that
tend to act 'smarter' by force enabling C1E if the main power adapter
being pulled out, rendering previous workaround ineffective. Given the
fact that we still rely on local APIC to drive timer interrupt, this
workaround should keep all Turion (probably Phenom too) X\d+ alive whether
its on battery power or not.

URL: http://lists.freebsd.org/pipermail/freebsd-acpi/2008-April/004858.html
http://lists.freebsd.org/pipermail/freebsd-acpi/2008-May/004888.html

Tested by: Peter Jeremy <peterjeremy at optushome d com d au>


193734 08-Jun-2009 ed

Revert my change; reintroduce __gnu89_inline.

It turns out our compiler in stable/7 can't build this code anymore.
Even though my opinion is that those people should just run `make
kernel-toolchain' before building a kernel, I am willing to wait and
commit this after we've branched stable/8.

Requested by: rwatson


193732 08-Jun-2009 ed

Remove __gnu89_inline.

Now that we use C99 almost everywhere, just use C99-style in the pmap
code. Since the pmap code is the only consumer of __gnu89_inline, remove
it from cdefs.h as well. Because the flag was only introduced 17 months
ago, I don't expect any problems.

Reviewed by: alc


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192440 20-May-2009 jhb

Don't bother reading the initial value of the machine check banks during
startup on Pentium 4 CPUs. This wasn't safe to do on APs during AP startup,
was of limited value, and won't be used for future processors.


192343 18-May-2009 jhb

- Add a tunable 'hw.mca.enabled' that can be used to enable/disable the
machine check code. Disable it by default for now.
- When computing the mask of bits that determines a non-restartable event
during a machine check exception, or-in the overflow flag rather than
replacing the other flags.

PR: i386/134586 [2]
Submitted by: Andi Kleen andi-fbsd firstfloor.org


192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


192114 14-May-2009 attilio

FreeBSD right now support 32 CPUs on all the architectures at least.
With the arrival of 128+ cores it is necessary to handle more than that.
One of the first thing to change is the support for cpumask_t that needs
to handle more than 32 bits masking (which happens now). Some places,
however, still assume that cpumask_t is a 32 bits mask.
Fix that situation by using always correctly cpumask_t when needed.

While here, remove the part under STOP_NMI for the Xen support as it
is broken in any case.

Additively make ipi_nmi_pending as static.

Reviewed by: jhb, kmacy
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


192050 13-May-2009 jhb

Implement simple machine check support for amd64 and i386.
- For CPUs that only support MCE (the machine check exception) but not MCA
(i.e. Pentium), all this does is print out the value of the machine check
registers and then panic when a machine check exception occurs.
- For CPUs that support MCA (the machine check architecture), the support is
a bit more involved.
- First, there is limited support for decoding the CPU-independent MCA
error codes in the kernel, and the kernel uses this to output a short
description of any machine check events that occur.
- When a machine check exception occurs, all of the MCx banks on the
current CPU are scanned and any events are reported to the console
before panic'ing.
- To catch events for correctable errors, a periodic timer kicks off a
task which scans the MCx banks on all CPUs. The frequency of these
checks is controlled via the "hw.mca.interval" sysctl.
- Userland can request an immediate scan of the MCx banks by writing
a non-zero value to "hw.mca.force_scan".
- If any correctable events are encountered, the appropriate details
are stored in a 'struct mca_record' (defined in <machine/mca.h>).
The "hw.mca.count" is a count of such records and each record may
be queried via the "hw.mca.records" tree by specifying the record
index (0 .. count - 1) as the next name in the MIB similar to using
PIDs with the kern.proc.* sysctls. The idea is to export machine
check events to userland for more detailed processing.
- The periodic timer and hw.mca sysctls are only present if the CPU
supports MCA.

Discussed with: emaste (briefly)
MFC after: 1 month


192035 13-May-2009 alc

Correct a rare use-after-free error in pmap_copy(). This error was
introduced in amd64 revision 1.540 and i386 revision 1.547. However, it
had no harmful effects until after a recent change, r189698, on amd64.
(In other words, the error is harmless in RELENG_7.)

The error is triggered by the failure to allocate a pv entry for the one
and only mapping in a page table page. I am addressing the error by
changing pmap_copy() to abort if either pv entry allocation or page
table page allocation fails. This is appropriate because the creation of
mappings by pmap_copy() is optional. They are a (possible) optimization,
and not a requirement.

Correct a nearby whitespace error in the i386 pmap_copy().

Crash reported by: jeff@
MFC after: 6 weeks


191803 05-May-2009 mav

Do not try to initialize LAPIC timer if we are not going to use it.
It solves assertion, when kernel built with INVARIANTS configured
to use i8254 timer.


191788 04-May-2009 jkim

Unlock the largest standard CPUID on Intel CPUs for both amd64 and i386 and
fix SMP topology detection. On i386, we extend it to cover Core, Core 2,
and Core i7 processors, not just Pentium 4 family, and move it to better
place. On amd64, all supported Intel CPUs should have this MSR.


191745 02-May-2009 mav

Add support for using i8254 and rtc timers as event sources for i386 SMP
system. Redistribute hard-/stat-/profclock events to other CPUs using IPI.


191730 01-May-2009 mav

Small addition to r191720.

Restore previous behaviour for the case of unknown interrupt. Invocation
of IRQ -1 crashes my system on resume. Returning 0, as it was, is not
perfect also, but at least not so dangerous.


191720 01-May-2009 mav

Use value -1 instead of 0 for marking unused APIC vectors. This fixes
IRQ0 routing on LAPIC-enabled systems.

Add hint.apic.0.clock tunable. Setting it 0 disables using LAPIC timers
as hard-/stat-/profclock sources falling back to using i8254 and rtc timers.

On modern CPUs LAPIC is a part of CPU core which is shutting down when CPU
enters C3 or deeper power state. It makes no problems for interrupt
processing, as chipset wakes up CPU on interrupt triggering. But entering
C3 state kills LAPIC timer and freezes system time, making C3 and deeper
states practically unusable. Using i8254 timer allows to avoid this
problem.

By using i8254 timer my T7700 C2D CPU with UP kernel successfully enters
C3 state, saving more then a Watt of total idle power (>10%) in addition to
all other power-saving techniques.

This technique is not working for SMP yet, as only one CPU receives
timer interrupts. But I think that problem could be fixed by forwarding
interrupts to other CPUs with IPI.


191708 30-Apr-2009 jkim

- Fix divide-by-zero panic when SMP kernel is used on UP system[1].
- Avoid possible divide-by-zero panic on SMP system when the CPUID is
disabled, unsupported, or buggy.

Submitted by: pluknet (pluknet at gmail dot com)[1]


191648 29-Apr-2009 jeff

- Add support for cpuid leaf 0xb. This allows us to determine the
topology of nehalem/corei7 based systems.
- Remove the cpu_cores/cpu_logical detection from identcpu.
- Describe the layout of the system in cpu_mp_announce().

Sponsored by: Nokia


191438 23-Apr-2009 jhb

Reduce the number of bounce zones (and thus the number of bounce pages
used in some cases):
- Ignore DMA tag boundaries when allocating bounce pages. The boundaries
don't determine whether or not parts of a DMA request bounce. Instead,
they are just used to carve up segments.
- Allow tags with sub-page alignment to share bounce pages since bounce
pages are always page aligned.

Reviewed by: scottl (amd64)
MFC after: 1 month


191405 22-Apr-2009 jhb

Adjust the way we number CPUs on x86 so that we attempt to "group" all
logical CPUs in a package. We do this by numbering the non-boot CPUs
by starting with the first CPU whose APIC ID is after the boot CPU and
wrapping back around to APIC ID 0 if needed rather than always starting
at APIC ID 0. While here, adjust the cpu_mp_announce() routine to list
CPUs based on the mapping established by assign_cpu_ids() rather than
making assumptions about the algorithm assign_cpu_ids() uses.

MFC after: 1 month


191201 17-Apr-2009 jhb

Restore bus DMA bounce pages to an offset of 0 when they are released by
a tag that has BUS_DMA_KEEP_PG_OFFSET set. Otherwise the page could be
reused with a non-zero offset by a tag that doesn't have
BUS_DMA_KEEP_PG_OFFSET leading to data corruption.

Sleuthing by: avg
Reviewed by: scottl


191011 13-Apr-2009 kib

The bus_dmamap_load_uio(9) shall use pmap of the thread recorded in the
uio_td to extract pages from, instead of unconditionally use kernel
pmap.

Submitted by: Jason Harmening <jason.harmening gmail com> (amd64 version)
PR: amd64/133592
Reviewed by: scottl (original patch), jhb
MFC after: 2 weeks


190919 11-Apr-2009 ed

Simplify in/out functions (for i386 and AMD64).

Remove a hack to generate more efficient code for port numbers below
0x100, which has been obsolete for at least ten years, because GCC has
an asm constraint to specify that.

Submitted by: Christoph Mallon <christoph mallon gmx de>


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


190617 01-Apr-2009 kib

Fill the fsbase and gsbase fields of the mcontext structure on i386.

In collaboration with: pho
Discussed with: peter
Reviewed by: jhb


190600 31-Mar-2009 jkim

Fix an uninitialized variable from the previous commit.


190599 31-Mar-2009 jkim

Probe size of installed memory modules from loader and display it
as 'real memory' instead of Maxmem if the value is available.
Note amd64 displayed physmem as 'usable memory' since machdep.c r1.640
to unconfuse users. Now it is consistent across amd64 and i386 again.
While I am here, clean up smbios.c a bit and update copyright date.

Reviewed by: jhb


190447 26-Mar-2009 kib

Convert gdt_segs and ldt_segs initialization to C99 style.

Reviewed by: jhb


190237 22-Mar-2009 alc

Eliminate the recomputation of pcb_cr3 from cpu_set_upcall(). The
bcopy()ed value from the old thread is the correct value because the new
thread and the old thread will share a page table.


189903 17-Mar-2009 jkim

Initial suspend/resume support for amd64.

This code is heavily inspired by Takanori Watanabe's experimental SMP patch
for i386 and large portion was shamelessly cut and pasted from Peter Wemm's
AP boot code.


189795 14-Mar-2009 alc

MFamd64 r189785
Update the pmap's resident page count when a page table page is freed in
pmap_remove_pde() and pmap_remove_pages().

MFC after: 6 weeks


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


189572 09-Mar-2009 rwatson

Trim comments about the MP-safety of various bits of the amd64/i386
system call entry path and i386 IP checksum generation: we now assume
all code is MPSAFE unless explicitly marked otherwise. Remove XXX
Giant comments along similar lines: the code by the comments either
doesn't need or doesn't want Giant (especially the NMI handler).

MFC after: 3 days


189509 08-Mar-2009 sobomax

Small comment nit: "run time" -> "run-time".

Submitted by: rwatson


189423 05-Mar-2009 jhb

A better fix for handling different FPU initial control words for different
ABIs:
- Store the FPU initial control word in the pcb for each thread.
- When first using the FPU, load the initial control word after restoring
the clean state if it is not the standard control word.
- Provide a correct control word for Linux/i386 binaries under
FreeBSD/amd64.
- Adjust the control word returned for fpugetregs()/npxgetregs() when a
thread hasn't used the FPU yet to reflect the real initial control
word for the current ABI.
- The Linux/i386 ABI for FreeBSD/i386 now properly sets the right control
word instead of trashing whatever the current state of the FPU is.

Reviewed by: bde


189418 05-Mar-2009 jhb

Some cleanups to the i386 FPU support:
- Remove the control word parameter to npxinit(). It was always set
to __INITIAL_NPXCW__.
- Remove npx_cleanstate_ready as the cleanstate is always initalized
when it is used.
- Improve the handling of the case when the FPU isn't present. Now
the npx0 device no longer succeeds in its probe so all of npx_attach()
is skipped. Also, we allow this case with SMP (though that shouldn't
actually occur as all i386 systems that support SMP have FPUs) now.
SMP was only an issue back when we had an FPU emulator which was not
per-CPU.
- MFamd64: Clear some of the state in npx_cleanstate rather than leaving
it as garbage.
- MFamd64: When a user thread first uses the FPU, use npx_cleanstate for
the initial FPU state.

Reviewed by: bde


189057 25-Feb-2009 sobomax

Fix typo in comments in r189023.


189055 25-Feb-2009 jkim

Enable support for PAT_WRITE_PROTECTED and PAT_UNCACHED cache modes
unconditionally on amd64. On i386, we assume PAT is usable if the CPU
vendor is not Intel or CPU model is newer than Pentium IV.

Reviewed by: alc, jhb


189023 25-Feb-2009 sobomax

Make machdep.hyperthreading_enabled tunable working with the SCHED_ULE.
Unlike with SCHED_BSD, however, it can only be set to 0 at boot time,
it's not possible to change it at runtime.

Reviewed by: jhb
MFC after: 1 month


189004 24-Feb-2009 rdivacky

Change the functions to ANSI in those cases where it breaks promotion
to int rule. See ISO C Standard: SS6.7.5.3:15.

Approved by: kib (mentor)
Reviewed by: warner
Tested by: silence on -current


188932 23-Feb-2009 alc

Optimize free_pv_entry(); specifically, avoid repeated TAILQ_REMOVE()s.

MFC after: 1 week


188904 21-Feb-2009 jeff

- Resolve an issue where we may clear an idt while an interrupt on a
different cpu is still assigned to that vector by never clearing idt
entries. This was only provided as a debugging feature and the bugs
are caught by other means.
- Drop the sched lock when rebinding to reassign an interrupt vector
to a new cpu so that pending interrupts have a chance to be delivered
before removing the old vector.

Discussed with: tegge, jhb


188608 14-Feb-2009 alc

Remove unnecessary page queues locking around vm_page_busy() and
vm_page_wakeup(). (This change is applicable to RELENG_7 but not
RELENG_6.)

MFC after: 1 week


188403 09-Feb-2009 cognet

The bounce zone sees its page number increased if multiple dma maps use it in
the same dma tag. However, it can happen multiple dma tags share the same
bounce zone too, so add a per-bounce zone map counter, and check it instead of
the dma tag map counter, to know if we have to alloc more pages.

Reported by: miwi
Reviewed by: scottl


188350 08-Feb-2009 imp

When bouncing pages, allow a new option to preserve the intra-page
offset. This is needed for the ehci hardware buffer rings that assume
this behavior.

This is an interim solution, and a more general one is being worked
on. This solution doesn't break anything that doesn't ask for it
directly. The mbuf and uio variants with this flag likely don't work
and haven't been tested.

Universe builds with these changes. I don't have a huge-memory
machine to test these changes with, but will be happy to work with
folks that do and hps if this changes turns out not to be sufficient.

Submitted by: alfred@ from Hans Peter Selasky's original


188200 05-Feb-2009 kmacy

halt APs on reboot


188199 05-Feb-2009 kmacy

reboot instance on reset


187948 31-Jan-2009 obrien

Change some movl's to mov's. Newer GAS no longer accept 'movl' instructions
for moving between a segment register and a 32-bit memory location.

Looked at by: jhb


187880 29-Jan-2009 jeff

- Allocate apic vectors on a per-cpu basis. This allows us to allocate
more irqs as we have more cpus. This is principally useful on systems
with msi devices which may want many irqs per-cpu.

Discussed with: jhb
Sponsored by: Nokia


187598 22-Jan-2009 jkim

VIA Nano processor has a special MSR (CENT_HARDWARECTRL3) bit 32 to determine
whether TSC is P-state invariant or not. In fact, this MSR is writable but
we just leave it at the BIOS default for now.


187157 13-Jan-2009 jkim

Enable MSI support for VIA Nano processors on i386 (missing in r187118).


187118 12-Jan-2009 jkim

Add basic i386 support for VIA Nano processors.


187117 12-Jan-2009 jkim

Replace more strcmp(cpu_vendor, "foo") with cpu_vendor_id.


186557 29-Dec-2008 kmacy

merge 186535, 186537, and 186538 from releng_7_xen

Log:
- merge in latest xenbus from dfr's xenhvm
- fix race condition in xs_read_reply by converting tsleep to mtx_sleep

Log:
unmask evtchn in bind_{virq, ipi}_to_irq

Log:
- remove code for handling case of not being able to sleep
- eliminate tsleep - make sleeps atomic


186037 13-Dec-2008 jkoshy

- Bug fix: prevent a thread from migrating between CPUs between the
time it is marked for user space callchain capture in the NMI
handler and the time the callchain capture callback runs.

- Improve code and control flow clarity by invoking hwpmc(4)'s user
space callchain capture callback directly from low-level code.

Reviewed by: jhb (kern/subr_trap.c)
Testing (various patch revisions): gnn,
Fabien Thomas <fabien dot thomas at netasq dot com>,
Artem Belevich <artemb at gmail dot com>


186009 12-Dec-2008 jkim

Add more CPUID bits from AMD CPUID Specification Rev. 2.28.


185991 12-Dec-2008 jkoshy

Expose symbol `PMC_FN_USER_CALLCHAIN' to assembler code.


185933 11-Dec-2008 jhb

Add constants for fields in the local APIC error status register and a
routine to read it.


185461 30-Nov-2008 mav

According to "Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B: System Programming Guide, Part 2", CPUs with family 0x6 and model
above or 0xE and CPUs with family 0xF and model above or 0x3 have invariant
TSC.


185343 26-Nov-2008 jkim

Use newly introduced cpu_vendor_id to make invariant TSC detection more
clearer and merge r185295 to amd64.


185341 26-Nov-2008 jkim

Introduce cpu_vendor_id and replace a lot of strcmp(cpu_vendor, "...").

Reviewed by: jhb, peter (early amd64 version)


185306 25-Nov-2008 ganbold

Remove unused variable.

Found with: Coverity Prevent(tm)
CID: 3685

Approved by: scottl


185295 25-Nov-2008 takawata

Core i7 supports invaliant TSC and the presense is presented on
this CPUID information, according to recently updated AP485.


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


184564 02-Nov-2008 imp

MFp4:

Make the ISA bus keep track of more PNP details. Plus a minor style
fix while I'm here. More could be done here, but except for some SBCs
that don't have ACPI, there's limited value to anybody in doing so.


184500 31-Oct-2008 kib

The file was inadvertently excluded from r184499.


184499 31-Oct-2008 kib

Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by: jhb
Retested by: pho
MFC after: 3 days (+ 176304 + 184136)


184372 27-Oct-2008 sobomax

Fix r184323 - set stathz to be the same as lapic_timer_hz when lapic_timer_hz
is less than 128. Remove extra {} to match existing style.


184293 26-Oct-2008 sobomax

Fix division by zero panic if kern.hz less than 32.

MFC after: 1 day


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


184181 22-Oct-2008 jkim

Really fix i386 test this time.
A whole stack of pointyhat to me, please.


184169 22-Oct-2008 jkim

Add AMD Family 0Fh, Model 6Bh, Stepping 2 to the list of invariant TSCs
and fix i386 test.


184161 22-Oct-2008 ache

Fix compiler error with missing/unneded ')'


184146 22-Oct-2008 jkim

Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher
even if BIOS does not advertise it.


184114 21-Oct-2008 kmacy

remove gratuitous XEN define


184113 21-Oct-2008 kmacy

don't globally define ipi_bitmap_handler on xen


184108 21-Oct-2008 jkim

Fix 'kern.timeconter.invariant_tsc' tunable and back out a redundant hack.
Somehow incomplete version was committed. :-(


184102 21-Oct-2008 jkim

Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.


184101 21-Oct-2008 jkim

Detect Advanced Power Management Information for AMD CPUs.


184041 19-Oct-2008 kmacy

GC gratuitous XEN defines


183615 05-Oct-2008 davidxu

If the current thread has the trap bit set (i.e. a debugger had
single stepped the process to the system call), we need to clear
the trap flag from the new frame. Otherwise, the new thread will
receive a (likely unexpected) SIGTRAP when it executes the first
instruction after returning to userland.


183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


183413 27-Sep-2008 kib

Frames created by the Xcpustop, Xrendezvous, Xipi_intr_bitmap_handler
and Xlazypmap differ from the frame for Xtimerint. The Xtimerint puts
pointer to the frame between return address and frame body, while rest
of the functions listed above do not. Correct offset calculation to
allow the ddb backtrace to step over such frames.

Noted and reviewed by: tegge
Tested by: pho
MFC after: 1 week


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183299 23-Sep-2008 obrien

The kernel implemented 'memcmp' is an alias for 'bcmp'. However, memcmp
and bcmp are not the same thing. 'man bcmp' states that the return is
"non-zero" if the two byte strings are not identical. Where as,
'man memcmp' states that the return is the "difference between the
first two differing bytes (treated as unsigned char values" if the
two byte strings are not identical.

So provide a proper memcmp(9), but it is a C implementation not a tuned
assembly implementation. Therefore bcmp(9) should be preferred over memcmp(9).


183207 20-Sep-2008 alc

MFamd64 SVN rev 179749 CVS rev 1.620
Reverse the direction of pmap_promote_pde()'s traversal over the specified
page table page. The direction of the traversal can matter if
pmap_promote_pde() has to remove write access (PG_RW) from a PTE that
hasn't been modified (PG_M). In general, if there are two or more such
PTEs to choose among, it is better to write protect the one nearer the
high end of the page table page rather than the low end. This is because
most programs access memory in an ascending direction. The net result of
this change is a sometimes significant reduction in the number of failed
promotion attempts and the number of pages that are write protected by
pmap_promote_pde().

MFamd64 SVN rev 179777 CVS rev 1.621
Tweak the promotion test in pmap_promote_pde(). Specifically, test PG_A
before PG_M. This sometimes prevents unnecessary removal of write access
from a PTE. Overall, the net result is fewer demotions and promotion
failures.


183169 19-Sep-2008 alc

MFamd64 SVN rev 179471 CVS rev 1.619
Correct an error in pmap_promote_pde() that may result in an errant
promotion within the kernel's address space.


183151 18-Sep-2008 stas

- Recognize SAVE and OSXSAVE extended processor features.

Approved by: kib (mentor)
MFC after: 1 month


183133 18-Sep-2008 kmacy

Don't do round robin assignment of interrupts on xen

MFC after: 1 month


183128 17-Sep-2008 jhb

MFamd64: More CPUID feature flags: SSE4, X2APIC, POPCNT, DTES64, and 1GB
large pages.

MFC after: 1 month


182961 12-Sep-2008 kib

When doing rfork(0), i.e. separating curproc VM from any other user of
the same vmspace, decrement the reference count of the shared LDT instead
of a newly-made copy. Code factually removed LDT from the process that
did rfork(0).

Introduce user_ldt_deref() function that does decrement of refcount for
the struct proc_ldt, and call it in the rfork(0) case on the shared LDT.

Reviewed by: jhb
MFC after: 1 week


182960 12-Sep-2008 kib

The user_ldt_alloc() function shall return with dt_lock locked.
The user_ldt_free() function shall return with dt_lock unlocked.
Error handling code in both functions do not handle this, fix it by
doing necessary lock/unlock.

While there, fix minor style nits.

MFC after: 1 week


182959 12-Sep-2008 kib

Remove warning about static LDT segment allocation. Applications
continue using it after ~7 years since warning was introduced, and there
is no reason to discourage them.

MFC after: 1 week


182936 11-Sep-2008 jhb

Update the comments above the 0xcf9 register reset attempt to match the
code. We only attempt a single reset using this method (a "hard" reset),
and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to
force a reset.

MFC after: 1 week


182902 10-Sep-2008 kmacy

Get initial bootstrap of APs working under xen.
Note that the APs still blow up in sched_throw().

MFC after: 1 month


182220 26-Aug-2008 jkim

Move empty filter handling to MI source.

MFC after: 3 days


182173 25-Aug-2008 jkim

Fix a typo in copyrights.


182046 23-Aug-2008 jhb

Adjust the handling the various timer frequencies when using the lapic
timer. Previously, the various divisors were fixed which meant that while
it gave somewhat reasonable stathz, etc. at hz=1000, it went off the rails
with any other hz value. With these changes, we now pick a lapic timer hz
based on the value of hz. If hz is >= 1500, then the lapic timer runs at
hz. If 1500 hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we
run at hz * 4. We compute a divider at runtime to make stathz run as close
to 128 as we can since stathz really wants to be run at something close to
that frequency. Profiling just runs on every clock tick. So some examples:

With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz
will be 133, and profhz = 400. With hz = 1000 (default), the lapic timer
is still at 2000 (as it is now), stathz is at 133 (as it is now), and
profhz will be 2000 (previously 666).

MFC after: 2 weeks


182021 22-Aug-2008 kmacy

Don't try enumerating APICs when running on top of xen
(fixes boot on 64-bit dom0s)

MFC after: 1 month


181938 20-Aug-2008 kmacy

fix typo in previous commit breaking bootup

pointed out by: Takahashi Yoshihiro nyan@


181911 20-Aug-2008 kmacy

- clean up interrupt handling for xen a tiny bit
- parse the command line in to kenv
- defer shutdown watcher until later in boot

MFC after: 1 month


181894 20-Aug-2008 kmacy

don't use cpu_idle_acpi under xen

MFC after: 1 month


181853 18-Aug-2008 jkim

MFamd64: Correctly check unsignedness of all registers used
for load instructions with direct or indirect offsets.


181846 18-Aug-2008 jkim

- Make these files compilable on user land.
- Update copyrights and fix style(9).


181809 17-Aug-2008 kmacy

remove code in XEN version of init386 causing initialization failure

MFC after: 1 month


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181792 16-Aug-2008 kmacy

Call in to xen for privileged aspects of context switching

MFC after: 1 month


181775 15-Aug-2008 kmacy

Integrate support for xen in to i386 common code.

MFC after: 1 month


181700 13-Aug-2008 jkim

Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does.


181697 13-Aug-2008 jkim

- Remove unnecessary jump instruction(s) when offset(s) is/are zero(s).
- Constantly use conditional jumps for unsigned integers.


181696 13-Aug-2008 attilio

In the case of POWERFAIL_NMI, remove the Giant acquisitions because they
can lead to a deadlock if the thread owning the Giant lock is interrupted
by the NMI.
Instead, tollerate a small race on the x86 architecture.


181649 12-Aug-2008 jkim

MFamd64: Remove unused macros.


181648 12-Aug-2008 jkim

Update copyrights and fix style(9).


181645 12-Aug-2008 jkim

Reduce number of stack usages with unused %edi.


181606 11-Aug-2008 jhb

Decode some more "exotic" instructions including: fxsave, fxrstor, ldmxcsr,
stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter,
sysexit, pause, monitor, mwait, and swapgs (amd64 only).

MFC after: 1 week


181603 11-Aug-2008 jhb

MFamd64: Decode "cmov*" instructions.

MFC after: 1 week


181430 08-Aug-2008 stas

- Add cpuctl(4) pseudo-device driver to provide access to some low-level
features of CPUs like reading/writing machine-specific registers,
retrieving cpuid data, and updating microcode.
- Add cpucontrol(8) utility, that provides userland access to
the features of cpuctl(4).
- Add subsequent manpages.

The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
is created for each cpu present in the systems. The pseudo-device minor
number corresponds to the cpu number in the system. The cpuctl(4) pseudo-
device allows a number of ioctl to be preformed, namely RDMSR/WRMSR/CPUID
and UPDATE. The first pair alows the caller to read/write machine-specific
registers from the correspondent CPU. cpuid data could be retrieved using
the CPUID call, and microcode updates are applied via UPDATE.

The permissions are inforced based on the pseudo-device file permissions.
RDMSR/CPUID will be allowed when the caller has read access to the device
node, while WRMSR/UPDATE will be granted only when the node is opened
for writing. There're also a number of priv(9) checks.

The cpucontrol(8) utility is intened to provide userland access to
the cpuctl(4) device features. The utility also allows one to apply
cpu microcode updates.

Currently only Intel and AMD cpus are supported and were tested.

Approved by: kib
Reviewed by: rpaulo, cokane, Peter Jeremy
MFC after: 1 month


181284 04-Aug-2008 alc

Make pmap_kenter_attr() static.


181129 01-Aug-2008 jhb

Adjust comment. This stack is only used for booting now and not as an
idle stack.


180873 28-Jul-2008 alc

Correct an off-by-one error in the previous change to pmap_change_attr().
Change the nearby comment to mention the recursive map.


180871 28-Jul-2008 alc

Don't allow pmap_change_attr() to be applied to the recursive mapping.


180846 27-Jul-2008 alc

Style fixes to several function definitions.


180601 18-Jul-2008 alc

Correct an error in pmap_change_attr()'s initial loop that verifies that the
given range of addresses are mapped. Previously, the loop was testing the
same address every time.

Submitted by: Magesh Dhasayyan


180600 18-Jul-2008 alc

Simplify pmap_extract()'s control flow, making it more like the related
functions pmap_extract_and_hold() and pmap_kextract().


180533 15-Jul-2008 alc

Update bus_dmamem_alloc()'s first call to malloc() such that M_WAITOK is
specified when appropriate.

Reviewed by: scottl


180352 07-Jul-2008 alc

In FreeBSD 7.0 and beyond, pmap_growkernel() should pass VM_ALLOC_INTERRUPT
to vm_page_alloc() instead of VM_ALLOC_SYSTEM. VM_ALLOC_SYSTEM was the
logical choice before FreeBSD 7.0 because VM_ALLOC_INTERRUPT could not
reclaim a cached page. Simply put, there was no ordering between
VM_ALLOC_INTERRUPT and VM_ALLOC_SYSTEM as to which "dug deeper" into the
cache and free queues. Now, there is; VM_ALLOC_INTERRUPT dominates
VM_ALLOC_SYSTEM.

While I'm here, teach pmap_growkernel() to request a prezeroed page.

MFC after: 1 week


180255 04-Jul-2008 alc

Eliminate an unused declaration. (In fact, the declaration is bogus
because the variable is defined static to pmap.c on i386.)

Found by: CScout


179978 24-Jun-2008 jkim

Emit opcodes closer to GNU as(1) generated codes and micro-optimize.


179968 23-Jun-2008 jkim

Rehash and clean up BPF JIT compiler macros to match AT&T notations.


179304 25-May-2008 attilio

style fix for newly introduced macro.


179292 24-May-2008 bz

Restore buildable state. Style ignored.
Leave IDTVEC(ill) where it was unless we compile with KDTRACE_HOOKS[1].
Hide the with DTRACE case case under #ifdef KDTRACE_HOOKS.

Suggested by: attilio [1]
Reviewed by: attilio


179277 24-May-2008 jb

Add the DTrace hooks for exception handling (Function boundary trace
-fbt- provider), cyclic clock and syscalls.


179229 23-May-2008 alc

The VM system no longer uses setPQL2(). Remove it and its helpers.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


179049 16-May-2008 attilio

Removed unused assembly offsets for structures digging.


178947 11-May-2008 alc

Correct an error in pmap_align_superpage(). Specifically, correctly
handle the case where the mapping is greater than a superpage in size
but the alignment of the physical pages spans a superpage boundary.


178875 09-May-2008 alc

Introduce pmap_align_superpage(). It increases the starting virtual
address of the given mapping if a different alignment might result in more
superpage mappings.


178493 25-Apr-2008 alc

Always use PG_PS_FRAME to extract the physical address of a 2/4MB page
from a PDE.


178471 25-Apr-2008 jeff

- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.

Sponsored by: Nokia


178429 22-Apr-2008 phk

Now that all platforms use genclock, shuffle things around slightly
for better structure.

Much of this is related to <sys/clock.h>, which should really have
been called <sys/calendar.h>, but unless and until we need the name,
the repocopy can wait.

In general the kernel does not know about minutes, hours, days,
timezones, daylight savings time, leap-years and such. All that
is theoretically a matter for userland only.

Parts of kernel code does however care: badly designed filesystems
store timestamps in local time and RTC chips almost universally
track time in a YY-MM-DD HH:MM:SS format, and sometimes in local
timezone instead of UTC. For this we have <sys/clock.h>

<sys/time.h> on the other hand, deals with time_t, timeval, timespec
and so on. These know only seconds and fractions thereof.

Move inittodr() and resettodr() prototypes to <sys/time.h>.
Retain the names as it is one of the few surviving PDP/VAX references.

Move startrtclock() to <machine/clock.h> on relevant platforms, it
is a MD call between machdep.c/clock.c. Remove references to it
elsewhere.

Remove a lot of unnecessary <sys/clock.h> includes.

Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs.
XXX: should be kern.disable_rtc_set really, it's not MD.


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


178072 10-Apr-2008 takawata

Don't break identity mapping set up for ACPI resume path.
With this change, BSP processor context seems to be recovered.


178070 10-Apr-2008 alc

Correct pmap_copy()'s method for extracting the physical address of a
2/4MB page from a PDE. Specifically, change it to use PG_PS_FRAME,
not PG_FRAME, to extract the physical address of a 2/4MB page from a
PDE.

Change the last argument passed to pmap_pv_insert_pde() from a
vm_page_t representing the first 4KB page of a 2/4MB page to the
vm_paddr_t of the 2/4MB page. This avoids an otherwise unnecessary
conversion from a vm_paddr_t to a vm_page_t in pmap_copy().


177967 07-Apr-2008 alc

Update pmap_page_wired_mappings() so that it counts 2/4MB page mappings.


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177921 04-Apr-2008 alc

Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks


177702 29-Mar-2008 alc

Eliminate an #if 0/#endif that was unintentionally introduced
by the previous revision.


177690 28-Mar-2008 emaste

If we're returning successfully from bus_dmamem_alloc, don't record a KTR
of error = ENOMEM.


177684 28-Mar-2008 brooks

Use ; instead of : to end a line.

Submitted by: Niclas Zeising <niclas dot zeising at gmail dot com>


177680 28-Mar-2008 ps

Add support to mincore for detecting whether a page is part of a
"super" page or not.

Reviewed by: alc, ups


177659 27-Mar-2008 alc

MFamd64 with few changes:

1. Add support for automatic promotion of 4KB page mappings to 2MB page
mappings. Automatic promotion can be enabled by setting the tunable
"vm.pmap.pg_ps_enabled" to a non-zero value. By default, automatic
promotion is disabled. Tested by: kris

2. To date, we have assumed that the TLB will only set the PG_M bit in a
PTE if that PTE has the PG_RW bit set. However, this assumption does
not hold on recent processors from Intel. For example, consider a PTE
that has the PG_RW bit set but the PG_M bit clear. Suppose this PTE
is cached in the TLB and later the PG_RW bit is cleared in the PTE,
but the corresponding TLB entry is not (yet) invalidated.
Historically, upon a write access using this (stale) TLB entry, the
TLB would observe that the PG_RW bit had been cleared and initiate a
page fault, aborting the setting of the PG_M bit in the PTE. Now,
however, P4- and Core2-family processors will set the PG_M bit before
observing that the PG_RW bit is clear and initiating a page fault. In
other words, the write does not occur but the PG_M bit is still set.

The real impact of this difference is not that great. Specifically,
we should no longer assert that any PTE with the PG_M bit set must
also have the PG_RW bit set, and we should ignore the state of the
PG_M bit unless the PG_RW bit is set.


177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


177525 23-Mar-2008 kib

Prevent the overflow in the calculation of the next page directory.
The overflow causes the wraparound with consequent corruption of the
(almost) whole address space mapping.

As Alan noted, pmap_copy() does not require the wrap-around checks
because it cannot be applied to the kernel's pmap. The checks there are
included for consistency.

Reported and tested by: kris (i386/pmap.c:pmap_remove() part)
Reviewed by: alc
MFC after: 1 week


177467 20-Mar-2008 jhb

Implement a BUS_BIND_INTR() method in the bus interface to bind an IRQ
resource to a CPU. The default method is to pass the request up to the
parent similar to BUS_CONFIG_INTR() so that all busses don't have to
explicitly implement bus_bind_intr. A bus_bind_intr(9) wrapper routine
similar to bus_setup/teardown_intr() is added for device drivers to use.
Unbinding an interrupt is done by binding it to NOCPU. The IRQ resource
must be allocated, but it can happen in any order with respect to
bus_setup_intr(). Currently it is only supported on amd64 and i386 via
nexus(4) methods that simply call the intr_bind() routine.

Tested by: gallatin


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177160 14-Mar-2008 jhb

Fix a silly bogon which prevented all the CPUs that are tagged as interrupt
receivers from being given interrupts if any CPUs in the system were not
tagged as interrupt receivers that I introduced when switching the x86
interrupt code to track CPUs via FreeBSD CPU IDs rather than local APIC
IDs. In practice this only affects systems with Hyperthreading (though
disabling HTT in the BIOS would workaround the issue) as that is the only
case currently where one can have CPUs that aren't tagged as interrupt
receivers. On a Dell SC1425 test box with 2 x Xeon w/ HTT (so 4 logical
CPUs of which 2 were interrupt receivers) the result was that all
device interrupts were sent to CPU 0.

MFC after: 1 week
Pointy hat to: jhb


177157 13-Mar-2008 jhb

Rework how the nexus(4) device works on x86 to better handle the idea of
different "platforms" on x86 machines. The existing code already handles
having two platforms: ACPI and legacy. However, the existing approach was
rather hardcoded and difficult to extend. These changes take the approach
that each x86 hardware platform should provide its own nexus(4) driver (it
can inherit most of its behavior from the default legacy nexus(4) driver)
which is responsible for probing for the platform and performing
appropriate platform-specific setup during attach (such as adding a
platform-specific bus device). This does mean changing the x86 platform
busses to no longer use an identify routine for probing, but to move that
logic into their matching nexus(4) driver instead.
- Make the default nexus(4) driver in nexus.c on i386 and amd64 handle the
legacy platform. It's probe routine now returns BUS_PROBE_GENERIC so it
can be overriden.
- Expose a nexus_init_resources() routine which initializes the various
resource managers so that subclassed nexus(4) drivers can invoke it from
their attach routine.
- The legacy nexus(4) driver explicitly adds a legacy0 device in its
attach routine.
- The ACPI driver no longer contains an new-bus identify method. Instead
it exposes a public function (acpi_identify()) which is a probe routine
that the MD nexus(4) drivers can use to probe for ACPI. All of the
probe logic in acpi_probe() is now moved into acpi_identify() and
acpi_probe() is just a stub.
- On i386 and amd64, an ACPI-specific nexus(4) driver checks for ACPI via
acpi_identify() and claims the nexus0 device if the probe succeeds. It
then explicitly adds an acpi0 device in its attach routine.
- The legacy(4) driver no longer knows anything about the acpi0 device.
- On ia64 if acpi_identify() fails you basically end up with no devices.
This matches the previous behavior where the old acpi_identify() would
fail to add an acpi0 device again leaving you with no devices.

Discussed with: imp
Silence on: arch@


177155 13-Mar-2008 jhb

Use the SMAP data from the loader if it is provided instead of using
virtual 86 mode to query the BIOS directly. This is needed for certain
HP machines whose BIOS only provide an SMAP when invoked from real mode.
On such machines the loader will be able to query the SMAP successfully
due to the recent BTX changes, but the kernel will not.

One thing I'm not sure of is if we can skip the INT 12h probe altogether
if we have the SMAP from the loader as it seems that we do the INT 12h
probe to setup enough state so we can use vm86 to call the BIOS.

MFC after: 1 week


177145 13-Mar-2008 kib

Since version 4.3, gcc changed its behaviour concerning the i386/amd64
ABI and the direction flag, that is it now assumes that the direction
flag is cleared at the entry of a function and it doesn't clear once
more if needed. This new behaviour conforms to the i386/amd64 ABI.

Modify the signal handler frame setup code to clear the DF {e,r}flags
bit on the amd64/i386 for the signal handlers.

jhb@ noted that it might break old apps if they assumed DF == 1 would be
preserved in the signal handlers, but that such apps should be rare and
that older versions of gcc would not generate such apps.

Submitted by: Aurelien Jarno <aurelien aurel32 net>
PR: 121422
Reviewed by: jhb
MFC after: 2 weeks


177137 13-Mar-2008 kib

Add missed parentheses


177125 12-Mar-2008 jhb

The variable MTRR registers actually have variable-sized PhysBase and
PhysMask fields based on the number of physical address bits supported
by the current CPU. The old code assumed 36 bits on i386 and 40 bits on
amd64. In truth, all Intel CPUs up until recently used 36 bits (a newer
Intel CPU uses 38 bits) and all the Opteron CPUs used 40 bits.

In at least one case (the new Intel CPU) having the size of the mask field
wrong resulted in writing questionable values into the MTRR registers on
the application processors (BSP as well if you modify the MTRRs via
memcontrol or running X, etc.). The result of the questionable physmask
was that all of memory was apparently treated as uncached rather than
write-back resulting in a very significant performance hit.

Fix this by constructing a run-time mask for the PhysBase and PhysMask
fields based on the number of physical address bits supported by the CPU.
All 64-bit capable CPUs provide a count of PA bits supported via the
0x80000008 extended CPUID feature, so use that if it is available. If that
feature is not available, then assume 36 PA bits.

While I'm here, expand the (now-unused) macros for the PhysBase and
PhysMask fields to the current largest possible value (52 PA bits).

MFC after: 1 week
PR: i386/120516
Reported by: Nokia


177124 12-Mar-2008 jhb

MFamd64: Break up the probe logic in the mem_drvinit routines so it's
a bit easier to parse.


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


177070 11-Mar-2008 jhb

Style(9) these files. No changes in the compiled code. (Verified by
diff'ing objdump -d output).


177069 11-Mar-2008 jhb

Add constants for the various fields in MTRR registers.

MFC after: 1 week
Verified by: md5(1)


177041 10-Mar-2008 jhb

Probe CPUs after the PCI hierarchy on i386, amd64, and ia64. This allows
the cpufreq drivers to reliably use properties of PCI devices for quirks,
etc.
- For the legacy drivers, add CPU devices via an identify routine in the
CPU driver itself rather than in the legacy driver's attach routine.
- Add CPU devices after Host-PCI bridges in the acpi bus driver.
- Change the ichss(4) driver to use pci_find_bsf() to locate the ICH and
check its device ID rather than having a bogus PCI attachment that only
checked for the ID in probe and always failed. As a side effect, you
can now kldload ichss after boot.
- Fix the ichss(4) driver to use the correct device_t for the ICH (and not
for ichss0) when doing PCI config space operations to enable SpeedStep.

MFC after: 2 weeks
Reviewed by: njl, Andriy Gapon avg of icyb.net.ua


177006 10-Mar-2008 jeff

- Rather than repeating the same preemption code everywhere call the scheduler
specific sched_preempt() routine.


176734 02-Mar-2008 jeff

- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176664 29-Feb-2008 jhb

With the recent change to enable CPU brands from the VIA chips, the
code to add padlock features to the CPU model on VIA CPUs was no longer
effective. Change the code to instead output a separate printf during
dmesg for VIA Padlock features similar to other cpuid feature bitmasks.

MFC after: 1 week


176647 28-Feb-2008 jhb

- Check for the extended CPUID registers on VIA CPUs so we can get the
brand string.
- Fix a nit in the previous commit. "Eden" is a product name, not a core
name. The new ID is still for an "Esther" core.


176571 25-Feb-2008 jhb

Support the VIA C7 Eden CPU and treat it just like a C7 Esther. We may
want to adjust this code to just assume that all CPUs >= Esther should
be checked for the extended cpuid flags register.

MFC after: 3 days
PR: i386/119491


176304 15-Feb-2008 scottl

Teach the dump and minidump code to respect the maxioszie attribute of
the disk; the hard-coded assumption of 64K doesn't work in all cases.


176206 12-Feb-2008 scottl

If busdma is being used to realign dynamic buffers and the alignment is set to
PAGE_SIZE or less, the bounce page counting logic was flawed and wouldn't
reserve any pages. Adjust to be correct. Review of other architectures is
forthcoming.

Submitted by: Joseph Golio


176153 10-Feb-2008 phk

Add support for PC Engines ALIX boards.

Style cleanup.

Hide some messages behind bootverbose.


175768 28-Jan-2008 ru

Add a wrapper function that bound checks writes to the dump device.


175404 17-Jan-2008 alc

Retire PMAP_DIAGNOSTIC. Any useful diagnostics that were conditionally
compiled under PMAP_DIAGNOSTIC are now KASSERT()s. (Note: The kernel
option DIAGNOSTIC still disables inlining of certain pmap functions.)

Eliminate dead code from pmap_enter(). This code implemented an assertion.
On i386, an equivalent check is already implemented. However, on amd64,
a small change is required to implement an equivalent check.

Eliminate \n from a nearby panic string.

Use KASSERT() to reimplement pmap_copy()'s two assertions.


175328 14-Jan-2008 peter

Add a CTASSERT that KERNBASE is valid. This is usually messed up by an
invalid KVA_PAGES, so add a pointer to there.


175155 08-Jan-2008 alc

Convert a PMAP_DIAGNOSTIC to a KASSERT.


175119 06-Jan-2008 alc

Shrink the size of struct vm_page on amd64 and i386 by eliminating
pv_list_count from struct md_page. Ever since Peter rewrote the pv
entry allocator for amd64 and i386 pv_list_count has been correctly
maintained but otherwise unused.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


175056 02-Jan-2008 alc

Provide a legitimate pindex to vm_page_alloc() in pmap_growkernel()
instead of writing apologetic comments. As it turns out, I need every
kernel page table page to have a legitimate pindex to support superpage
promotion on kernel memory.

Correct a nearby style error: Pointers should be compared to NULL.


175020 31-Dec-2007 jhb

Include a "pae" feature if an i386 kernel is built with PAE support.

Obtained from: Yahoo!


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174677 16-Dec-2007 rpaulo

Fix previous commit. The code ended up in the wrong function.

Approved by: njl (mentor)


174557 12-Dec-2007 rpaulo

Disallow the legacy USB circuit to generate an SMI# via an ICH
register (MacBooks only).
This allows MacBooks to boot in SMP mode without any trick and solves
the timer problems with HZ=1000.

MFC after: 1 week

Reviewed by: njl (mentor), jhb
Approved by: njl (mentor), jhb


174496 09-Dec-2007 alc

Eliminate compilation warnings due to the use of non-static inlines
through the introduction and use of the __gnu89_inline attribute.

Submitted by: bde (i386)
MFC after: 3 days


174395 07-Dec-2007 jkoshy

Kernel and hwpmc(4) support for callchain capture.

Sponsored by: FreeBSD Foundation and Google Inc.


174254 04-Dec-2007 kib

Fix the ABI change of the signal delivered on the access to the page
with insufficient protection mode.

For the i386 and amd64, create the tunable, machdep.prot_fault_translation,
with the following behaviour:
0 = autodetect the signal to be delivered on KERN_PROTECTION_FAILURE
from vm_fault based on the ELF OSABI note:
no note or __FreeBSD_version < 700004 - SIGBUS/BUS_PAGE_FAULT
note, and __FreeBSD_version >= 700004 - SIGSEGV/SEGV_ACCERR
1 = always SIGBUS/BUS_PAGE_FAULT
2 = always SIGSEGV/SEGV_ACCERR

This would do mostly automatic correction of ABI breakage, with the exception
of the untaged binaries for 7-CURRENT/RELENG_7 before the note is fixed. For
them, sysctl would allow to run the binary with manual settings.

Discussed with: portmgr (kris)
PR: kern/118304
MFC after: 3 days


174250 04-Dec-2007 alc

Correct an error under COUNT_IPIS within pmap_lazyfix_action(): Increment
the counter that the pointer refers to, not the pointer.

MFC after: 3 days


174196 02-Dec-2007 rwatson

Remove duplicate $FreeBSD$ tag.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


174104 30-Nov-2007 alc

Improve get_pv_entry()'s handling of low-memory conditions. After page
allocation fails and pv entries are reclaimed, there may be an unused pv
entry in a pv chunk that survived the reclamation. However, previously,
after reclamation, get_pv_entry() did not look for an unused pv entry in
a surviving pv chunk; it simply retried the page allocation. Now, it
does look for an unused pv entry before retrying the page allocation.

Note: This only applies to RELENG_7. Earlier branches use a different
pv entry allocator.

MFC after: 6 weeks


173988 27-Nov-2007 jhb

Remove the 'needbounce' variable from the _bus_dmamap_load_buffer()
routine. It is not needed as the existing tests for segment coalescing
already handle bounced addresses and it prevents legal segment coalescing
in certain edge cases.

MFC after: 1 week
Reviewed by: scottl


173855 23-Nov-2007 jkoshy

MFP4: Add assembly language symbols used by hwpmc(4)'s callchain capture.


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173601 14-Nov-2007 julian

A bunch more files that should probably print out a thread name
instead of a process name.


173600 14-Nov-2007 julian

generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.


173592 13-Nov-2007 peter

Drastically simplify the i386 pcpu backend by merging parts of the
amd64 mechanism over. Instead of page table hackery that isn't
actually needed, just use 'struct pcpu __pcpu[MAXCPU]' for backing like
all the other platforms do. Get rid of 'struct privatespace' and a
while mess of #ifdef SMP garbage that set it up. As a bonus, this
returns the 4MB of KVA that we stole to implement it the old way.
This also allows you to read the pcpu data for each cpu when reading a
minidump.

Background information: Originally, pcpu stuff was implemented as having
per-cpu page tables and magic to make different data structures appear
at the same actual address. In order to share page tables, we switched
to using the GDT and %fs/%gs to access it. But we still did the evil
magic to set it up for the old way. The "idle stacks" are not used
for the idle process anymore and are just used for a few functions during
bootup, then ignored. (excercise for reader: free these afterwards).


173370 05-Nov-2007 alc

Add comments explaining why all stores updating a non-kernel page table
must be globally performed before calling any of the TLB invalidation
functions.

With one exception, on amd64, this requirement was already met. Fix this
one case. Also, as a clarification, change an existing atomic op into a
release. (Suggested by: jhb)

Reported and reviewed by: ups
MFC after: 3 days


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


173296 03-Nov-2007 alc

Eliminate spurious "Approaching the limit on PV entries, ..."
warnings. Specifically, whenever vm_page_alloc(9) returned NULL to
get_pv_entry(), we issued a warning regardless of the number of pv
entries in use. (Note: The older pv entry allocator in RELENG_6 does
not have this problem.)

Reported by: Jeremy Chadwick

Eliminate the direct call to pagedaemon_wakeup() by get_pv_entry().
This was a holdover from earlier times when the page daemon was
responsible for the reclamation of pv entries.

MFC after: 5 days


173118 28-Oct-2007 jhb

- Add constants for the different memory types in the SMAP table.
- Use the SMAP types and constants from <machine/pc/bios.h> in the boot
code rather than duplicating it.


172937 24-Oct-2007 jhb

Update copyright attribution.

MFC after: 3 days


172835 20-Oct-2007 bz

Fold multiple asm statements into one so that the compiler at a certain
optimization level (-march=pentium-mmx for example) does not insert
intermediate ops which would trash the carry.

Change both sys/i386/i386/in_cksum.c[1] and sys/i386/include/in_cksum.h.

To my best understanding the same problem was addressed in rev. 1.16
of src/sys/i386/include/in_cksum.h for just a single function 3y ago.

Reviewed by: jhb
Submitted by: Zhouyi ZHOU <zhouzhouyi FreeBSD.org> (intial version of [1])
MFC after: 5 days
PR: 115678, 69257


172394 30-Sep-2007 marius

Make the PCI code aware of PCI domains (aka PCI segments) so we can
support machines having multiple independently numbered PCI domains
and don't support reenumeration without ambiguity amongst the
devices as seen by the OS and represented by PCI location strings.
This includes introducing a function pci_find_dbsf(9) which works
like pci_find_bsf(9) but additionally takes a domain number argument
and limiting pci_find_bsf(9) to only search devices in domain 0 (the
only domain in single-domain systems). Bge(4) and ofw_pcibus(4) are
changed to use pci_find_dbsf(9) instead of pci_find_bsf(9) in order
to no longer report false positives when searching for siblings and
dupe devices in the same domain respectively.
Along with this change the sole host-PCI bridge driver converted to
actually make use of PCI domain support is uninorth(4), the others
continue to use domain 0 only for now and need to be converted as
appropriate later on.
Note that this means that the format of the location strings as used
by pciconf(8) has been changed and that consumers of <sys/pciio.h>
potentially need to be recompiled.

Suggested by: jhb
Reviewed by: grehan, jhb, marcel
Approved by: re (kensmith), jhb (PCI maintainer hat)


172216 18-Sep-2007 phk

Recognize the Soekris NET5501 and configure the error led.

Add watchdog(4) support by using the MFGPT0 in the Geode LX CX5536.
(Supported range: 2^30 .. 2^44 ns = 1s ... 5h)

Approved by: re (bmah)


172212 17-Sep-2007 peter

Fix an undefined symbol that as/ld neglected to flag as a problem. It
was used in assembler code in such a way that no unresolved relocation
records were generated, so ld didn't flag the problem. You can see
this with an 'nm' of the kernel. There will be 'U MAXCPU' on SMP systems.

The impact of this is that the intrcount/intrnames arrays do not have
the intended amount of space reserved. This could lead to interesting
problems due to the arrays being present in the middle of kernel code.
An overflow would be rather interesting as executable code would be used
as per-cpu incrementing interrupt counters.

This fixes it for now by exporting MAXCPU to the assembler. A better fix
might be to define these data structures in C - they're only referenced
in the kernel from C code these days anyway.

Approved by: re (kensmith)


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


172144 11-Sep-2007 attilio

This is a follow-up, cleaning-up commit about recent changes involving
topology foo functions.
Working at the patch for topology problems in ia32/amd64 evicted some
problems regarding functions ordering in the SI_SUB_CPU family of
SYSINIT'ed subsystems.
In order to avoid problems with new modified to involved functions, a
correct ordering is not semantically specified for SI_SUB_CPU functions
(for a larger view of the issue please visit:
http://lists.freebsd.org/pipermail/freebsd-current/2007-July/075409.html )

Discussed with: peter
Tested by: kris, Rui Paulo <rpaulo@FreeBSD.org>
Approved by: jeff
Approved by: re


171916 22-Aug-2007 jkoshy

Assign sizes to assembly language support functions.

Approved by: re (kensmith)


171907 21-Aug-2007 alc

In general, when we map a page into the kernel's address space, we no
longer create a pv entry for that mapping. (The two exceptions are
mappings into the kernel's exec and pipe submaps.) Consequently, there is
no reason for get_pv_entry() to dig deep into the free page queues, i.e.,
use VM_ALLOC_SYSTEM, by default. This revision changes get_pv_entry() to
use VM_ALLOC_NORMAL by default, i.e., before calling pmap_collect() to
reclaim pv entries.

Approved by: re (kensmith)


171797 09-Aug-2007 njl

Add "show sysregs" command to ddb. On i386, this gives gdt, idt, ldt,
cr0-4, etc. Support should be added for other platforms that have a
different set of registers for system use.

Loosely based on: OpenBSD
Approved by: re


171702 02-Aug-2007 peter

Move mp_topology() from apic_init(i386) and apic_setup_local(amd64) to
cpu_start_mp(). This is after we have read the cpuid registers to
calculate the hyperthreading_cpus value for the sysctl that enables or
disables hyperthread cores. Change mp_topology() to use that information
rather than trying to do it itself.

This solves the problem of ULE being incorrectly told that dual core
Athlon64 X2 or Operton cpus are hyperthreading cores. At the very least,
we now have a single piece of code to identify hyperthreading.

Obtained from: jhb
Approved by: re (kensmith)


171597 26-Jul-2007 jhb

If the trap number stored in the trapframe is corrupted into a negative
value, then we would use a negative index into the trap_msg[] array
resulting in a nested page fault. Make the 'type' variable holding the
trap number unsigned to avoid this.

MFC after: 2 weeks
Approved by: re (rwatson)


171480 17-Jul-2007 jeff

- Add support for blocking and releasing threads to i386 cpu_switch(). This
is required for per-cpu scheduler lock support.

Obtained from: attilio
Tested by: current@ many users
Approved by: re


171309 08-Jul-2007 attilio

NULL_LDT_BASE is used in !SMP kernels too and set_user_ldt() is not
properly called. Address these two issues.

Reported by: Tinderbox
Tested by: le
Approved by: jeff (mentor)
Approved by: re


171295 07-Jul-2007 attilio

Actual code shows several problems in ia32 LDT handling:
- When a LDT entry changes, the old one is freed while it is still
referenced by gdt and ldtr. This can lead to disruptive behaviours in
particular on SMP machines.
- When a LDT entry changes, it is assumed that the only one entity sharing
the same LDT are threads in the same proc. It doesn't take in account
edge cases where two processes share the same VM (rfork'ed ones, for
example).

This patch addresses these two problems and addictionally it fixes the
usage of refcount switching back it to the old manually-grown refcount
(since in this case would be faster).

Diagnosed by: tegge
Tested by: pho (a former version)
Reviewed by: kib
Approved by: jeff (mentor)
Approved by: re


171128 01-Jul-2007 alc

Pages that do belong to an object and page queue can now be freed without
holding the page queues lock. Thus, the page table pages released by
pmap_remove() and pmap_remove_pages() can be freed after the page queues
lock is released.

Approved by: re (kensmith)


170688 13-Jun-2007 jhb

Don't clobber tf_err with the eva from a page fault as the page fault
address is saved in ksi_addr already.

PR: i386/101379
Submitted by: Tijl Coosemans : tijl ulyssis org


170564 11-Jun-2007 mjacob

Check against maxsegsz being zero in bus_dma_tag_create and return EINVAL
if it is.

Reviewed by: scott long


170517 10-Jun-2007 attilio

Optimize vmmeter locking.
In particular:
- Add an explicative table for locking of struct vmmeter members
- Apply new rules for some of those members
- Remove some unuseful comments

Heavily reviewed by: alc, bde, jeff
Approved by: jeff (mentor)


170368 06-Jun-2007 davidxu

Backout experimental adaptive-spin umtx code.


170340 05-Jun-2007 jhb

Move a warning under bootverbose as no machines that trigger it have ended
up being broken.


170319 05-Jun-2007 alc

Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] and dump_avail[] using one of these
definitions.

Approved by: re


170307 05-Jun-2007 jeff

Commit 14/14 of sched_lock decomposition.
- Use thread_lock() rather than sched_lock for per-thread scheduling
sychronization.
- Use the per-process spinlock rather than the sched_lock for per-process
scheduling synchronization.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170304 04-Jun-2007 jeff

Commit 11/14 of sched_lock decomposition.
- There is no globally visible scheduler lock any longer. For now the
watchdog can only check Giant. This model of checking particular locks
is flawed and should be revisited. Other metrics should be considered.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170303 04-Jun-2007 jeff

Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170289 04-Jun-2007 dwmalone

Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported. In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170114 29-May-2007 des

Add descriptive comment to PDCM entry.


170112 29-May-2007 des

Remove a pointless bootverbose message.

MFC after: 3 days


170111 29-May-2007 des

Add feature name for features2 bit 15.

PR: i386/113133
Submitted by: Pankov Pavel <pankov_p@mail.ru>
MFC after: 3 days


170110 29-May-2007 attilio

Fix some problems introduced with the last descriptors tables locking
patch:
- Do the correct test for ldt allocation
- Drop dt_lock just before to call kmem_free (since it acquires blocking
locks inside)
- Solve a deadlock with smp_rendezvous() where other CPU will wait
undefinitively for dt_lock acquisition.
- Add dt_lock in the WITNESS list of spinlocks

While applying these modifies, change the requirement for user_ldt_free()
making that returning without dt_lock held.

Tested by: marcus, tegge
Reviewed by: tegge
Approved by: jeff (mentor)


170086 29-May-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by: scottl


170028 27-May-2007 rwatson

Remove "XXX Giant" comments before calls to kdb_trap() -- the kernel
debugger is quite capable of handling Giant-free execution at this
point. Several other similar comments remain in trap.c on both i386
and amd64 awaiting analysis.


170021 27-May-2007 rwatson

Don't check curproc->p_comm for NULL as it is an array allocated as part of
struct proc.

Found with: Coverity Prevent(tm)
CID: 2096


169895 23-May-2007 kib

Move futex support code from <arch>/support.s into linux compat directory.
Implement all futex atomic operations in assembler to not depend on the
fuword() that does not allow to distinguish between -1 and failure return.
Correctly return 0 from atomic operations on success.

In collaboration with: rdivacky
Tested by: Scot Hetzel <swhetzel gmail com>, Milos Vyletel <mvyletel mzm cz>
Sponsored by: Google SoC 2007


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169805 20-May-2007 jeff

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


169802 20-May-2007 jeff

- Move GDT/LDT locking into a seperate spinlock, removing the global
scheduler lock from this responsibility.

Contributed by: Attilio Rao <attilio@FreeBSD.org>
Tested by: jeff, kkenn


169799 20-May-2007 mjacob

Initializae lastaddr to 0 in bus_dmamap_load_uio so that
_bus_dmamap_load_buffer won't (potentially) be confused.

Discovered by: gcc 4.2

MFC after: 3 days


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169395 08-May-2007 jhb

Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses
an APIC ID of 38 for its second CPU):
- Add a new MAX_APIC_ID constant for the highest valid APIC ID for modern
systems.
- Size the various arrays in the MADT, MP Table, and SMP code that are
indexed by APIC IDs to allow for up to MAX_APIC_ID.
- Explicitly go through and assign logical cpu ids to local APICs before
starting any of the APs up rather than doing it while starting up the
APs. This step is now where we honor MAXCPU.

MFC after: 1 week


169391 08-May-2007 jhb

Minor fixes and tweaks to the x86 interrupt code:
- Split the intr_table_lock into an sx lock used for most things, and a
spin lock to protect intrcnt_index. Originally I had this as a spin lock
so interrupt code could use it to lookup sources. However, we don't
actually do that because it would add a lot of overhead to interrupts,
and if we ever do support removing interrupt sources, we can use other
means to safely do so w/o locking in the interrupt handling code.
- Replace is_enabled (boolean) with is_handlers (a count of handlers) to
determine if a source is enabled or not. This allows us to notice when
a source is no longer in use. When that happens, we now invoke a new
PIC method (pic_disable_intr()) to inform the PIC driver that the
source is no longer in use. The I/O APIC driver frees the APIC IDT
vector when this happens. The MSI driver no longer needs to have a
hack to clear is_enabled during msi_alloc() and msix_alloc() as a result
of this change as well.
- Add an apic_disable_vector() to reset an IDT vector back to Xrsvd to
complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector.
- Add a new nexus hook: nexus_add_irq() to ask the nexus driver to add an
IRQ to its irq_rman. The MSI code uses this when it creates new
interrupt sources to let the nexus know about newly valid IRQs.
Previously the msi_alloc() and msix_alloc() passed some extra stuff
back to the nexus methods which then added the IRQs. This approach is
a bit cleaner.
- Change the MSI sx lock to a mutex. If we need to create new sources,
drop the lock, create the required number of sources, then get the lock
and try the allocation again.


169320 06-May-2007 piso

Bring in the reminaing bits to make interrupt filtering work:

o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.

Approved by: re (implicit?)


169221 02-May-2007 jhb

Revamp the MSI/MSI-X code a bit to achieve two main goals:
- Simplify the amount of work that has be done for each architecture by
pushing more of the truly MI code down into the PCI bus driver.
- Don't bind MSI-X indicies to IRQs so that we can allow a driver to map
multiple MSI-X messages into a single IRQ when handling a message
shortage.

The changes include:
- Add a new pcib_if method: PCIB_MAP_MSI() which is called by the PCI bus
to calculate the address and data values for a given MSI/MSI-X IRQ.
The x86 nexus drivers map this into a call to a new 'msi_map()' function
in msi.c that does the mapping.
- Retire the pcib_if method PCIB_REMAP_MSIX() and remove the 'index'
parameter from PCIB_ALLOC_MSIX(). MD code no longer has any knowledge
of the MSI-X index for a given MSI-X IRQ.
- The PCI bus driver now stores more MSI-X state in a child's ivars.
Specifically, it now stores an array of IRQs (called "message vectors" in
the code) that have associated address and data values, and a small
virtual version of the MSI-X table that specifies the message vector
that a given MSI-X table entry uses. Sparse mappings are permitted in
the virtual table.
- The PCI bus driver now configures the MSI and MSI-X address/data
registers directly via custom bus_setup_intr() and bus_teardown_intr()
methods. pci_setup_intr() invokes PCIB_MAP_MSI() to determine the
address and data values for a given message as needed. The MD code
no longer has to call back down into the PCI bus code to set these
values from the nexus' bus_setup_intr() handler.
- The PCI bus code provides a callout (pci_remap_msi_irq()) that the MD
code can call to force the PCI bus to re-invoke PCIB_MAP_MSI() to get
new values of the address and data fields for a given IRQ. The x86
MSI code uses this when an MSI IRQ is moved to a different CPU, requiring
a new value of the 'address' field.
- The x86 MSI psuedo-driver loses a lot of code, and in fact the separate
MSI/MSI-X pseudo-PICs are collapsed down into a single MSI PIC driver
since the only remaining diff between the two is a substring in a
bootverbose printf.
- The PCI bus driver will now restore MSI-X state (including programming
entries in the MSI-X table) on device resume.
- The interface for pci_remap_msix() has changed. Instead of accepting
indices for the allocated vectors, it accepts a mini-virtual table
(with a new length parameter). This table is an array of u_ints, where
each value specifies which allocated message vector to use for the
corresponding MSI-X message. A vector of 0 forces a message to not
have an associated IRQ. The device may choose to only use some of the
IRQs assigned, in which case the unused IRQs must be at the "end" and
will be released back to the system. This allows a driver to use the
same remap table for different shortage values. For example, if a driver
wants 4 messages, it can use the same remap table (which only uses the
first two messages) for the cases when it only gets 2 or 3 messages and
in the latter case the PCI bus will release the 3rd IRQ back to the
system.

MFC after: 1 month


169042 25-Apr-2007 ariff

Disable C1 Enhanced mode on AMD K8 Family Revision F and above to keep
local APIC timer alive.

Reviewed by: jhb
PR: i386/104678
MFC after: 3 days


169030 24-Apr-2007 jhb

Fix the triple fault used as a last resort during a reboot to actually
fault. The previous method zero'd out the page tables, invalidated the
TLB, and then entered a spin loop. The idea was that the instruction after
the TLB invalidate would result in a page fault and the page fault and
subsequent double fault wouldn't be able to determine the physical page
for their fault handlers' first instruction. This stopped working when
PGE (PG_G PTE/PDE bit) support was added as a TLB invalidate via %cr3
reload doesn't clear TLB entries with PG_G set. Thus, the CPU was still
able to map the virtual address for the spin loop and happily performed
its infinite loop.

The triple fault now uses a much more deterministic sledge-hammer approach
to generate a triple fault. First, the IDT descriptor is set to point to
an empty IDT, so any interrupts (including a double fault) will instantly
fault. Second, we trigger a int 3 breakpoint to force an interrupt and
kick off a triple fault.

MFC after: 3 days


169020 24-Apr-2007 jhb

Update comments for the 0xcf9 and 0x92 reset methods to explain what we are
actually doing and what the various bits mean.


168996 23-Apr-2007 jhb

Tweak printf string.


168951 22-Apr-2007 rwatson

Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting. These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies. These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from: TrustedBSD Project


168930 21-Apr-2007 ups

Modify TLB invalidation handling.

Reviewed by: alc@, peter@
MFC after: 1 week


168857 19-Apr-2007 phk

style nit


168837 18-Apr-2007 phk

On AMD's Geode LX: Force the TSC to run through core-suspension so we can
use it as a timecounter.

Sponsored by: Soekris Engineering


168822 17-Apr-2007 jhb

Honor the BUS_DMA_NOCACHE flag to bus_dmamem_alloc() on amd64 and i386 by
mapping the pages as UC (uncacheable) using pmap_change_attr().

MFC after: 1 week
Requested by: ariff
Reviewed by: scottl


168691 13-Apr-2007 alc

Eliminate the misuse of PG_FRAME to truncate a virtual address to a virtual
page boundary.

Reviewed by: ru@


168661 12-Apr-2007 ru

Fix PAE on SMP by enabling EFER_NXE on all APs.

Reported by: kris
Diagnosed by: alc


168439 06-Apr-2007 ru

Add the PG_NX support for i386/PAE.

Reviewed by: alc


168109 31-Mar-2007 jkim

Correct BB-profiling and adjust comments.

Pointed out by: bde
Reviewed by: bde


168102 30-Mar-2007 jkim

Fix off-by-4 error in address validation for i386, reduce PCB reloading, and
fix more style(9) nits.

Pointed out by: bde
Discussed with: kib
Reviewd by: bde


168088 30-Mar-2007 jkim

Fix more style(9) nits[1] and remove unnecessary use of '#if !defined(_KERNEL)'.

Pointed out by: bde[1]


168078 30-Mar-2007 jkim

Use the same wisdom of sys/i386/i386/support.s 1.97 to remove obfuscation.

Pointed out by: bde


168037 30-Mar-2007 jkim

MFP4: Linux futex support for amd64.

Initial patch was submitted by kib and additional work was done
by Divacky Roman.

Tested by: emulation


167950 27-Mar-2007 n_hibma

Revisit the watchdogs: Resetting the error to EINVAL after failing to set the
watchdog might hide the succesful arming of an earlier one. Accept that on
failing to arm any watchdog (because of non-supported timeouts) EOPNOTSUPP is
returned instead of the more appropriate EINVAL.

MFC after: 3 days


167912 26-Mar-2007 kris

Remove unnecessary giant acquisition around panic in #ifdef DIAGNOSTIC
code.

# There is some question about whether this code is even relevant any
# longer (it dates back to prehistoric times, i.e. present in r1.1),
# especially on amd64.

Reviewed by: jhb


167905 26-Mar-2007 njl

Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change. cpufreq_post_change provides the results of the
change (success or failure). cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed. Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value. When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq. This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by: bde, phk
MFC after: 1 month


167869 24-Mar-2007 alc

In order to satisfy ACPI's need for an identity mapping, modify the
temporary mapping created by locore so that the lowest two to four
megabytes can become a permanent identity mapping. This implementation
avoids any use of a large page mapping.


167767 21-Mar-2007 jhb

Change the amd64, i386, and ia64 nexus drivers to setup bus space tags and
handles when activating a resource via bus_activate_resource() rather than
doing some of the work in bus_alloc_resource() and some of it in
bus_activate_resource().

One note is that when using isa_alloc_resourcev() on PC-98, drivers now
need to just use bus_release_resource() without explicitly calling
bus_deactivate_resource() first. nyan@ has already fixed all of the PC-98
drivers.


167747 20-Mar-2007 jhb

Add a new apic0 psuedo-device to claim memory resources for the memory
address ranges used by local and I/O APICs in the system. Some systems
also reserve these ranges as system resources via either PnPBIOS or
ACPI, so this device currently attaches after acpi0 and legacy0 so that
the system resources are given precedence.


167745 20-Mar-2007 jhb

Add a new ram0 pseudo-device that claims memory resouces for physical
addresses corresponding to system RAM. On amd64 ram0 uses the SMAP
and claims all the type 1 SMAP regions. On i386 ram0 uses the
dump_avail[] array. Note that on i386 we have to ignore regions above
4G in PAE kernels since bus resources use longs.


167744 20-Mar-2007 jkim

- Add macros for newly added CPUID bits in the corresponding header files.
- Use correct capticalization in xTPR as Intel uses in their documents.
- Use proper description instead of vendor code name in comment.


167742 20-Mar-2007 jhb

Tweak the probe/attach order of devices on the x86 nexus devices.
Various BIOS-related psuedo-devices are added at an order of 5. acpi0 is
added at an order of 10, and legacy0 is added at an order of 11.


167691 19-Mar-2007 sam

display two new Intel feature bits

Submitted by: "Rui Paulo" <rpaulo@gmail.com>
MFC after: 2 weeks


167668 17-Mar-2007 alc

Eliminate an unused parameter.


167578 14-Mar-2007 njl

Create an identity mapping (V=P) super page for the low memory region on
boot. Then, just switch to the kernel pmap when suspending instead of
allocating/freeing our own mapping every time. This should solve a panic
of pmap_remove() being called with interrupts disabled. Thanks to Alan
Cox for developing this patch.

Note: this means that ACPI requires super page (PG_PS) support in the CPU.
This has been present since the Pentium and first documented in the
Pentium Pro. However, it may need to be revisited later.

Submitted by: alc
MFC after: 1 month


167493 12-Mar-2007 jkim

Add another CPUID for AMD CPUs and fix style(9) while I am here.


167364 09-Mar-2007 jhb

Defer calling lapic_init() until we've completed the 'MPTable: <...>'
printf. Otherwise, printfs inside of lapic_init() (such as during a
verbose boot) can uglify the output.


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


167277 06-Mar-2007 scottl

Don't increment total_bounced when doing no-op dmamap_sync ops.


167273 06-Mar-2007 jhb

Change the x86 interrupt code to use FreeBSD CPU IDs (i.e. PCPU_GET(cpuid))
rather than local APIC IDs to keep track of CPUs which can handle
interrupts.


167253 05-Mar-2007 jhb

Trim trailing whitespace.


167250 05-Mar-2007 alc

Acquiring smp_ipi_mtx on every call to pmap_invalidate_*() is wasteful.
For example, during a buildworld more than half of the calls do not
generate an IPI because the only TLB entry invalidated is on the calling
processor. This revision pushes down the acquisition and release of
smp_ipi_mtx into smp_tlb_shootdown() and smp_targeted_tlb_shootdown() and
instead uses sched_pin() and sched_unpin() in pmap_invalidate_*() so that
thread migration doesn't lead to a missed TLB invalidation.

Reviewed by: jhb
MFC after: 3 weeks


167247 05-Mar-2007 jhb

Use vm_paddr_t rather than uintptr_t when passing the physical address of
APICs to lapic_init() and ioapic_create().


167240 05-Mar-2007 jhb

Add a simple device driver to "eat" any I/O APICs that show up as PCI
devices.

MFC after: 1 week


166922 23-Feb-2007 jhb

Use ih_filter instead of ih_handler in a couple of places. This fixes
most INTR_FAST handlers on i386.

Reviewed by: piso


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166825 19-Feb-2007 kib

Unbreak ddb stepping over special frames after the following commit:
Revision Changes Path
1.113 +4 -2 src/sys/i386/i386/apic_vector.s
1.117 +7 -1 src/sys/i386/i386/exception.s
1.36 +7 -7 src/sys/i386/i386/local_apic.c
1.298 +61 -63 src/sys/i386/i386/trap.c
1.62 +15 -22 src/sys/i386/i386/vm86.c
1.32 +4 -2 src/sys/i386/i386/vm86bios.s
1.21 +2 -2 src/sys/i386/include/apicvar.h
1.27 +2 -2 src/sys/i386/isa/atpic.c
1.50 +2 -1 src/sys/i386/isa/atpic_vector.s
1.35 +1 -1 src/sys/i386/isa/icu.h

Tested by: kris, Peter Holm
No objections from: kmacy


166810 18-Feb-2007 alc

Eliminate some acquisitions and releases of the page queues lock that are
no longer necessary.


166776 15-Feb-2007 jhb

Add bootverbose printfs to indicate which IDT vectors are assigned to MSI
interrupts.


166569 08-Feb-2007 jhb

Don't send interrupts to CPUs disabled via lapic hints.

Reported by: Ludger Bolmerg <lbolmerg ! web.de>
MFC after: 3 days
Pointy hat to: jhb


166315 28-Jan-2007 rwatson

As we now have an SFB_NOWAIT flag, change 'will' to 'may' where the
comment for sf_buf_alloc(9) talks about sleeping.


166187 23-Jan-2007 jeff

- Allow the schedulers to IPI_PREEMPT idlethread. This puts the decision
for this behavior on the initiator side.


166186 23-Jan-2007 bde

Cleaned up declaration and initialization of clock_lock. It is only
used by clock code, so don't export it to the world for machdep.c to
initialize. There is a minor problem initializing it before it is
used, since although clock initialization is split up so that parts
of it can be done early, the first part was never done early enough
to actually work. Split it up a bit more and do the first part as
late as possible to document the necessary order. The functions that
implement the split are still bogusly exported.

Cleaned up initialization of the i8254 clock hardware using the new
split. Actually initialize it early enough, and don't work around it
not being initialized in DELAY() when DELAY() is called early for
initialization of some console drivers.

This unfortunately moves a little more code before the early debugger
breakpoint so that it is harder to debug. The ordering of console and
related initialization is delicate because we want to do as little as
possible before the breakpoint, but must initialize a console.


166176 22-Jan-2007 jhb

Expand the MSI/MSI-X API to address some deficiencies in the MSI-X support.
- First off, device drivers really do need to know if they are allocating
MSI or MSI-X messages. MSI requires allocating powerof2() messages for
example where MSI-X does not. To address this, split out the MSI-X
support from pci_msi_count() and pci_alloc_msi() into new driver-visible
functions pci_msix_count() and pci_alloc_msix(). As a result,
pci_msi_count() now just returns a count of the max supported MSI
messages for the device, and pci_alloc_msi() only tries to allocate MSI
messages. To get a count of the max supported MSI-X messages, use
pci_msix_count(). To allocate MSI-X messages, use pci_alloc_msix().
pci_release_msi() still handles both MSI and MSI-X messages, however.
As a result of this change, drivers using the existing API will only
use MSI messages and will no longer try to use MSI-X messages.
- Because MSI-X allows for each message to have its own data and address
values (and thus does not require all of the messages to have their
MD vectors allocated as a group), some devices allow for "sparse" use
of MSI-X message slots. For example, if a device supports 8 messages
but the OS is only able to allocate 2 messages, the device may make the
best use of 2 IRQs if it enables the messages at slots 1 and 4 rather
than default of using the first N slots (or indicies) at 1 and 2. To
support this, add a new pci_remap_msix() function that a driver may call
after a successful pci_alloc_msix() (but before allocating any of the
SYS_RES_IRQ resources) to allow the allocated IRQ resources to be
assigned to different message indices. For example, from the earlier
example, after pci_alloc_msix() returned a value of 2, the driver would
call pci_remap_msix() passing in array of integers { 1, 4 } as the
new message indices to use. The rid's for the SYS_RES_IRQ resources
will always match the message indices. Thus, after the call to
pci_remap_msix() the driver would be able to access the first message
in slot 1 at SYS_RES_IRQ rid 1, and the second message at slot 4 at
SYS_RES_IRQ rid 4. Note that the message slots/indices are 1-based
rather than 0-based so that they will always correspond to the rid
values (SYS_RES_IRQ rid 0 is reserved for the legacy INTx interrupt).
To support this API, a new PCIB_REMAP_MSIX() method was added to the
pcib interface to change the message index for a single IRQ.

Tested by: scottl


166074 17-Jan-2007 delphij

Use FOREACH_PROC_IN_SYSTEM instead of using its unrolled form.


166001 14-Jan-2007 kmacy

exclude the icu and clock lock from LOCK_PROFILING


165947 11-Jan-2007 jhb

Remove magic from rman_activate_resource() that uses the direct map at
KERNBASE for the first 1 MB of RAM instead of calling pmap_mapdev().
pmap_mapdev() knows how to handle the first 1 MB (and has known for a
while now) and properly maps the memory as UC to boot.

MFC after: 2 weeks


165929 11-Jan-2007 jeff

- Use the correct test in the ipi bitmask handler for IPI_PREEMPT so that
we actually issue preemptions.
- Remove the #ifdef IPI_PREEMPTION so it is always compiled in. Leave
the option which optionally enables support in sched_4bsd. sched_ule.c
will soon use this functionality as a run time rather than compile time
option.
- Compare against the idlethread rather than the priority. There are some
idle prio tasks that we can preempt.

Discussed with: ups
Tested on: i386, amd64


165918 09-Jan-2007 jkim

Add SSSE3 extensions and correct CNXT-ID spelling for Intel processors.


165369 20-Dec-2006 davidxu

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


165302 17-Dec-2006 kmacy

Evidently FreeBSD has long relied on the compiler to treat structures
passed by value (trap frames) as if they were in fact being passed by
reference. For better or worse, this incorrect behaviour is no longer
present in gcc 4.1. In this patch I convert all trapframe arguments to
be explicitly pass by reference. I also remove vm86_initflags, pushing
the very little work that it actually does up into vm86_prepcall.

Reviewed by: kan
Tested by: kan


165300 17-Dec-2006 kmacy

vm86_initflags was causing gcc41 and even gcc346 to get rather confused
- de-obfuscate

Suggested by: kan
Reviewed by: kan
Tested by: kan


165260 15-Dec-2006 n_hibma

Align the interfaces for the various watchdogs and make the interface
behave as expected.

Also:
- Return an error if WD_PASSIVE is passed in to the ioctl as only
WD_ACTIVE is implemented at the moment. See sys/watchdog.h for an
explanation of the difference between WD_ACTIVE and WD_PASSIVE.
- Remove the I_HAVE_TOTALLY_LOST_MY_SENSE_OF_HUMOR define. If you've
lost your sense of humor, than don't add a define.

Specific changes:

i80321_wdog.c
Don't roll your own passive watchdog tickle as this would defeat the
purpose of an active (userland) watchdog tickle.

ichwd.c / ipmi.c:
WD_ACTIVE means active patting of the watchdog by a userland process,
not whether the watchdog is active. See sys/watchdog.h.

kern_clock.c:
(software watchdog) Remove a check for WD_ACTIVE as this does not make
sense here. This reverts r1.181.


165128 12-Dec-2006 jhb

Give Host-PCI bridge drivers their own pcib_alloc_msi() and
pcib_alloc_msix() methods instead of using the method from the generic
PCI-PCI bridge driver as the PCI-PCI methods will be gaining some PCI-PCI
specific logic soon.


165125 12-Dec-2006 jhb

Add a function to return the MD interrupt source cookie associated with
an interrupt event. Use this in the x86 code to fixup the intrcnt names
when an interrupt handler is removed.


164951 06-Dec-2006 sobomax

Allow machdep.cpu_idle_hlt to be set from the loader. This should allow
to workaround the problem with SMP kernels on Turion64 X2 processors
described in kern/104678 and may be useful in other situations too.

MFC after: 3 days


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


164607 25-Nov-2006 ru

Tweak the comment about mapping a kernel using large pages.


164413 19-Nov-2006 alc

The global variable avail_end is redundant and only used once. Eliminate
it. Make avail_start static to the pmap on amd64. (It no longer exists
on other architectures.)


164362 17-Nov-2006 jhb

- Add macro constants for the various fields in %dr7 and use them in place
of various scattered magic values.
- Pretty print the address of hardware watchpoints in 'show watch' rather
than just displaying hex.
- Expand address field width on amd64 for 64-bit pointers.


164358 17-Nov-2006 jhb

Trim some noise from bootverbose:
- Drop the printf in intr_machdep.c when we assign an interrupt souce to
a CPU. Each source already has a more detailed printf.
- Don't output a line for each ioapic pin showing its initial state, this
has outlived its usefulness.
- When an APIC enumerator sets the bus, polarity, or trigger mode of an
ioapic pin, just return success without printing anything if the new
value matches the current one.

MFC after: 2 weeks


164357 17-Nov-2006 jhb

A few more style fixes.


164332 16-Nov-2006 maxim

o Make pv_maxchunks no less than maxproc. This helps to survive a
forkbomb explosion.

Reviewed by: alc
Security: local DoS
X-MFC atfer: RELENG_6 is not affected due to a different pv_entry
allocation code.


164303 15-Nov-2006 jhb

Various whitespace and style fixes.


164301 15-Nov-2006 jhb

Fix a typo that broke MSI (MSI-X worked fine) in the later revisions of
the MSI patches.


164265 13-Nov-2006 jhb

MD support for PCI Message Signalled Interrupts on amd64 and i386:
- Add a new apic_alloc_vectors() method to the local APIC support code
to allocate N contiguous IDT vectors (aligned on a M >= N boundary).
This function is used to allocate IDT vectors for a group of MSI
messages.
- Add MSI and MSI-X PICs. The PIC code here provides methods to manage
edge-triggered MSI messages as x86 interrupt sources. In addition to
the PIC methods, msi.c also includes methods to allocate and release
MSI and MSI-X messages. For x86, we allow for up to 128 different
MSI IRQs starting at IRQ 256 (IRQs 0-15 are reserved for ISA IRQs,
16-254 for APIC PCI IRQs, and IRQ 255 is reserved).
- Add pcib_(alloc|release)_msi[x]() methods to the MD x86 PCI bridge
drivers to bubble the request up to the nexus driver.
- Add pcib_(alloc|release)_msi[x]() methods to the x86 nexus drivers that
ask the MSI PIC code to allocate resources and IDT vectors.

MFC after: 2 months


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


164078 07-Nov-2006 ru

Spelling.


164065 07-Nov-2006 jhb

Remove old XXX comment about possibly adding a print_Intel_info() function
to dump CPUID level=2 stuff. A print_INTEL_info() function that does just
that was added a while ago.


164064 07-Nov-2006 jhb

Remove duplicate IDTVEC macro definition, it's already defined in
<machine/intr_machdep.h>.


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163603 22-Oct-2006 alc

Eliminate unnecessary PG_BUSY tests.


163534 20-Oct-2006 bde

Don't show debug registers in "show registers". Special registers should
be displayed specially, and debug registers are among of the least
interesting special registers (far behind %cr3). The debug registers
are still accessible as variables and displayed in another bogus place
("show watches").


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163219 10-Oct-2006 jhb

Change the x86 interrupt code to suspend/resume interrupt controllers
(PICs) rather than interrupt sources. This allows interrupt controllers
with no interrupt pics (such as the 8259As when APIC is in use) to
participate in suspend/resume.
- Always register the 8259A PICs even if we don't use any of their pins.
- Explicitly reset the 8259As on resume on amd64 if 'device atpic' isn't
included.
- Add a "dummy" PIC for the local APIC on the BSP to reset the local APIC
on resume. This gets suspend/resume working with APIC on UP systems.
SMP still needs more work to bring the APs back to life.

The MFC after is tentative.

Tested by: anholt (i386)
Submitted by: Andrea Bittau <a.bittau at cs.ucl.ac.uk> (3)
MFC after: 1 week


162958 02-Oct-2006 phk

Second part of a little cleanup in the calendar/timezone/RTC handling.

Split subr_clock.c in two parts (by repo-copy):
subr_clock.c contains generic RTC and calendaric stuff. etc.
subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c. They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


162839 30-Sep-2006 phk

Remove the no longer relevant or correct bootinfo sysctls.


162713 27-Sep-2006 sobomax

Extend comment explaining why code is conditional at !defined(SCHED_ULE).

Suggested by: ru


162708 27-Sep-2006 sobomax

Since ULE doesn't honor hlt_cpus_mask don't compile code that prevents
timer interrupt servicing for disabled HTT cores in ULE case. Should be
probably fixed in ULE code instead, but we have no real maintainer for
ULE to do it.

PR: 103697


162673 26-Sep-2006 scottl

The need to run a filter also implies that bouncing could be possible, so
just use the COULD_BOUNCE flag for both and retire the USE_FILTER flag.
This fixes the problem that rev 1.81 introduced with the if_bfe driver
(and possibly others).


162607 24-Sep-2006 imp

Add a newline to the printf.


162275 13-Sep-2006 scottl

Remove duplicated code. Declare functions non-static that shouldn't be
inlined.


162233 11-Sep-2006 jhb

Add a new ddb command 'show lapic' to dump details about the local APIC
registers for the current CPU.

MFC after: 3 days


162232 11-Sep-2006 jhb

Actually hook up the IPI_INVLCACHE IDT vectors backing
pmap_invalidate_cache() in the SMP case so pmap_mapdev() in multiuser
doesn't panic with a trap 30. I broke this many months ago when I
added pmap_invalidate_cache() as early parts of the PAT work.

Patience from: jmg
Pointy hat: jhb


162224 11-Sep-2006 jhb

- Fix rman_manage_region() to be a lot more intelligent. It now checks
for overlaps, but more importantly, it collapses adjacent free regions.
This is needed to cope with BIOSen that split up ports for system devices
(like IPMI controllers) across multiple system resource entries.
- Now that rman_manage_region() is not so dumb, remove extra logic in the
x86 nexus drivers to populate the IRQ rman that manually coalesced the
regions.

MFC after: 1 week


162211 11-Sep-2006 scottl

The run_filter() procedure is a means of working around DMA engine bugs in
old/broken hardware. Unfortunately, it adds cache pressure and possible
mispredicted branches to the fast path of the bus_dmamap_load collection of
functions. Since it's meant for slow path exception processing, de-inline
it and allow its conditions to be pre-computed at tag_create time and thus
short-circuited at runtime.

While here, cut down on the size of _bus_dmamap_load_buffer() by pushing the
bounce page logic into a non-inlined function. Again, this helps with
cache pressure and mispredicted branches.

According to the TSC, this shaves off a few cycles on average. Unfortunately,
the data varies quite a bit due to interrupts and preemption, so it's hard to
get a good measurement. Real world measurements of network PPS are welcomed.
A merge to amd64 and other arches is pending more testing.


162175 09-Sep-2006 rwatson

Audit sysarch() operation argument.

MFC after: 3 days


162112 07-Sep-2006 jhb

Use a single constant to define the sizes of the physmap[], phys_avail[],
and dump_avail[] arrays so they are in sync (previously it was possible
to store more entries in the physmap[] then we could store in phys_avail[],
which was pointless). While I'm here, bump up the length of these tables
to hold 30 entries on amd64 and 16 on i386. This allows machines with
fairly fragmented memory maps to boot ok (at least one machine would
not boot FreeBSD/i386 but would boot FreeBSD/amd64 because amd64 allowed
for more fragments).

MFC after: 3 days


162087 06-Sep-2006 sobomax

Unbreak in the case when device apic is compiled into non-SMP kernel.

Reported by: jhay
MFC after: 2 weeks


162042 05-Sep-2006 sobomax

The FreeBSD by default "disables" hyper-threading cores, by not scheduling
any threads to them. However, it still counts those cores as "active but
permanently idle" when calculating system-wide CPUs statistics. It is
incorrect, since it skews statistics quite a bit and creates real problems
for certain types of applications (monitoring applications for example),
by making them believe that the system does have enough idle CPU resources,
while in fact it does not.

Correct the problem by not calling performance counting routines on "disabled"
cores. The cleaner solution would be to just disable APIC timer interrupts on
those cores completely, but ENOTIME here and it is not clear if the
additional complexity really worth minor performance gain.

Reviewed by: ssouhlal
Sponsored by: Sippy Software, Inc.
MFC after: 2 weeks


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.


161318 15-Aug-2006 netchild

Remove the include of opt_global.h. It's included globally by a command
line switch. Other files which may make the same mistake (according to
fxr.watson.org) but aren't fixed in this commit (people with more clue
about those files should fix this):
- i386/xbox/xbox.c
- arm/arm/elf_trampoline.c
- arm/arm/mem.c

Noticed by: cognet


161314 15-Aug-2006 netchild

Add include of opt_global.h, else the futex operations aren't locked on
SMP systems.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


161303 15-Aug-2006 netchild

- Add some ASM stuff needed for futexes (linuxolator).

Sponsored by: Google SoC 2006
Submitted by: rdivacky
With help from: kib


161292 14-Aug-2006 alc

Eliminate an unnecessary initialization from trap_pfault() that also
happens to contain a style error.


161285 14-Aug-2006 jhb

Don't try to preserve PAT bits in pmap_enter(). We currently on pages that
aren't mapped via pmap_enter() (KVA). We will eventually support PAT bits
on user pages, but those will require some sort of MI caching mode stored
in the vm_page.

Reviewed by: alc


161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


161140 09-Aug-2006 imp

Minor style(9) nit.


161016 06-Aug-2006 alc

Eliminate the acquisition and release of the page queues lock around a call
to vm_page_sleep_if_busy().


160970 04-Aug-2006 mr

Dont overwrite cpu_model in the case of Via's C3-CPU.

Noticed by: Mike Tancsa
MFC after: 2 days


160964 04-Aug-2006 yar

Commit the results of the typo hunt by Darren Pilgrim.
This change affects documentation and comments only,
no real code involved.

PR: misc/101245
Submitted by: Darren Pilgrim <darren pilgrim bitfreak org>
Tested by: md5(1)
MFC after: 1 week


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160869 01-Aug-2006 obrien

Correct spelling of 3DNow!.


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160772 27-Jul-2006 jhb

Argh, fix compile with XBOX enabled. Somehow I missed a LINT compile. :(


160763 27-Jul-2006 jhb

Don't allow MAXMEM or hw.physmem to extend the top of memory if our memory
map was obtained from the SMAP. SMAP is trustworthy, and the memory
extending feature is a band-aid for older systems where FreeBSD's methods
of detecting memory were not always trustworthy. This fixes the issue
where using hw.physmem could result in the ACPI tables getting trashed
breaking ACPI.

MFC after: 3 days
Tested on: i386


160525 20-Jul-2006 alc

Add pmap_clear_write() to the interface between the virtual memory
system's machine-dependent and machine-independent layers. Once
pmap_clear_write() is implemented on all of our supported
architectures, I intend to replace all calls to pmap_page_protect() by
calls to pmap_clear_write(). Why? Both the use and implementation of
pmap_page_protect() in our virtual memory system has subtle errors,
specifically, the management of execute permission is broken on some
architectures. The "prot" argument to pmap_page_protect() should
behave differently from the "prot" argument to other pmap functions.
Instead of meaning, "give the specified access rights to all of the
physical page's mappings," it means "don't take away the specified
access rights from all of the physical page's mappings, but do take
away the ones that aren't specified." However, owing to our i386
legacy, i.e., no support for no-execute rights, all but one invocation
of pmap_page_protect() specifies VM_PROT_READ only, when the intent
is, in fact, to remove only write permission. Consequently, a
faithful implementation of pmap_page_protect(), e.g., ia64, would
remove execute permission as well as write permission. On the other
hand, some architectures that support execute permission have
basically ignored whether or not VM_PROT_EXECUTE is passed to
pmap_page_protect(), e.g., amd64 and sparc64. This change represents
the first step in replacing pmap_page_protect() by the less subtle
pmap_clear_write() that is already implemented on amd64, i386, and
sparc64.

Discussed with: grehan@ and marcel@


160461 18-Jul-2006 alc

MFamd64
pmap_clear_ptes() is already convoluted. This will worsen with the
implementation of superpages. Eliminate it and add pmap_clear_write().

There are no functional changes. Checked by: md5


160419 17-Jul-2006 alc

Now that free_pv_entry() accesses the pmap, call free_pv_entry() in
pmap_remove_all() before rather than after the pmap is unlocked. At
present, the page queues lock provides sufficient sychronization. In the
future, the page queues lock may not always be held when free_pv_entry() is
called.


160412 16-Jul-2006 alc

MFamd64
Make three simplifications to pmap_ts_referenced():
Eliminate an initialized but otherwise unused variable.
Eliminate an unnecessary test.
Exit the loop in a shorter way.


160408 16-Jul-2006 alc

Eliminate the remaining uses of "register".

Convert the remaining K&R-style function declarations to ANSI-style.

Eliminate excessive white space from pmap_ts_referenced().


160379 15-Jul-2006 alc

Make pc_freemask an array of uint32_t, rather than uint64_t. (I believe
that the use of the latter is simply an oversight in porting the new pv
entry code from amd64.)


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


160309 12-Jul-2006 mr

Initialise (if necessary) the VIA C3/C7 features.
Store the capabilities for further use by random(4), padlock(4), ...

Obtained from: mostly OpenBSD
MFC after: 1 week


160305 12-Jul-2006 mr

fix typo in identcpu.c and add one define to specialreg.h.

MFC after: 1 week


160298 12-Jul-2006 mr

First step to identify and initialize the newer VIA C7 CPU
as found in a VIA EPIA EN-15000 board.

Obtained from: large parts from OpenBSD


160286 12-Jul-2006 jkim

Add two new CPUID bits for AMD CPUs, i. e., SVM and extended APIC register.


160073 02-Jul-2006 alc

Correct an error in the new pmap_collect(), thus only affecting HEAD.
Specifically, the pv entry was always being freed to the caller's pmap
instead of the pmap to which the pv entry belongs.


159970 27-Jun-2006 alc

Correct a very old and very obscure bug: vmspace_fork() calls
pmap_copy() if the mapping is VM_INHERIT_SHARE. Suppose the mapping
is also wired. vmspace_fork() clears the wiring attributes in the vm
map entry but pmap_copy() copies the PG_W attribute in the PTE. I
don't think this is catastrophic. It blocks pmap_remove_pages() from
destroying the mapping and corrupts the pmap's wiring count.

This revision fixes the problem by changing pmap_copy() to clear the
PG_W attribute.

Reviewed by: tegge@


159933 25-Jun-2006 alc

Eliminate a comment that became stale after revision 1.547.


159803 20-Jun-2006 alc

Change get_pv_entry() such that the call to vm_page_alloc() specifies
VM_ALLOC_NORMAL instead of VM_ALLOC_SYSTEM when try is TRUE. In other
words, when get_pv_entry() is permitted to fail, it no longer tries as
hard to allocate a page.

Change pmap_enter_quick_locked() to fail rather than wait if it is
unable to allocate a page table page. This prevents a race between
pmap_enter_object() and the page daemon. Specifically, an inactive
page that is a successor to the page that was given to
pmap_enter_quick_locked() might become a cache page while
pmap_enter_quick_locked() waits and later pmap_enter_object() maps
the cache page violating the invariant that cache pages are never
mapped. Similarly, change
pmap_enter_quick_locked() to call pmap_try_insert_pv_entry() rather
than pmap_insert_entry(). Generally speaking,
pmap_enter_quick_locked() is used to create speculative mappings. So,
it should not try hard to allocate memory if free memory is scarce.

Add an assertion that the object containing m_start is locked in
pmap_enter_object(). Remove a similar assertion from
pmap_enter_quick_locked() because that function no longer accesses the
containing object.

Remove a stale comment.

Reviewed by: ups@


159790 20-Jun-2006 yar

We no longer need to disable interrupts in MD trap machinery
when we're about to call kdb_trap() because the latter MI
function can disable interrupts by itself now.

Pointed out by: bde
X-MFC remark: depends on kern/subr_kdb.c#1.18
Sponsored by: RiNet (Cronyx Plus LLC)


159763 19-Jun-2006 davidxu

Clear bit 22 in MSR IA32_MISC_ENABLE, according to Intel document,
when the bit 22 is set to 1, CPUID with EAX=0 returns a maximum
value in EAX[7..0] of 3, when set to 0(default), CPUID with EAX=0
returns the number corresponding to the maximum standard function
supported. On my machine, BIOS sets the bit to 1 to make it to be
compatible with old OS, this causes dual-core Pentium-D (two
physical cores) to be identified as hyperthreading (two logical
cores) by function mp_topology().


159724 18-Jun-2006 yar

Fix style while I'm here.


159723 18-Jun-2006 yar

The i386 "call" instruction works as follows: it pushes
the return address on the stack and only then "dereferences" %pc.
Therefore, in the case of a call to an invalid address, we arrive
to the trap handler with the invalid value in tf_eip. This used
to prevent db_backtrace() from assigning the most recent and interesting
frame on the stack to the right spot in the right function, from
which the invalid call was attempted.

Try to detect and work around that by recovering the return address
from the stack.

The work-around requires the fault address be passed to db_backtrace().
Smuggle it as tf_err.

MFC after: 1 month
Sponsored by: RiNet (Cronyx Plus LLC)


159659 16-Jun-2006 yar

Return -1 from db_numargs() if number of args couldn't be guessed.
Use this later to indicate in backtrace output that args shown are
uncertain.

Sponsored by: RiNet (Cronyx Plus LLC)


159657 16-Jun-2006 yar

Guess the number of arguments to a function somewhat better.
Now GCC likes to stick a "mov %eax, %FOO" instruction before
"addl $BAR, %esp" if the function just called returns an int,
which is a very common case in the kernel.

Sponsored by: RiNet (Cronyx Plus LLC)


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159546 12-Jun-2006 alc

Don't invalidate the TLB in pmap_qenter() unless the old mapping was valid.
Most often, it isn't.

Reviewed by: tegge@


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


159293 05-Jun-2006 emaste

Fix cut-n-pasteo: use the i386 version #define for i386 dumps, not the amd64 one.


159244 05-Jun-2006 alc

MFamd64
Eliminate unnecessary, recursive acquisitions and releases of the page
queues lock by free_pv_entry() and pmap_remove_pages().

Reduce the scope of the page queues lock in pmap_remove_pages().


159130 01-Jun-2006 silby

After much discussion with mjacob and scottl, change bus_dmamem_alloc so
that it just warns the user with a printf when it misaligns a piece
of memory that was requested through a busdma tag.

Some drivers (such as mpt, and probably others) were asking for alignments
that could not be satisfied, but as far as driver operation was concerned,
that did not matter. In the theory that other drivers will fall into
this same category, we agreed that panicing or making the allocation
fail will cause more hardship than is necessary. The printf should
be sufficient motivation to get the driver glitch fixed.


159092 31-May-2006 mjacob

Turn the panic on not being able to meet alignment constraints
in bus_dmamem_alloc into the more reasonable EINVAL return.

Also, reclaim memory allocated but then not used if we had
an error return.


159089 31-May-2006 davidxu

Clear invalid bits only if CPU supports SSE, otherwise, some fields in
struct save87 will be cleared unexpectly.


159087 30-May-2006 davidxu

Use the method described in IA-32 Intel Architecture Software Developer's
Manual chapter 11.6.6 to get valid mxcsr bits, use the mxcsr mask to clear
invalid bits passed by user code.

Reviewed by: bde


159027 29-May-2006 davidxu

Backout changes trying to inherit floating-point environment, although
POSIX (susv3) requires this, but it is unclear what should be inherited,
duplicating whole 387 stack for new thread seems to be unnecessary and
dangerous. Revert to previous code, force a new thread to be started with
clean FP state.


159011 28-May-2006 silby

Add a quick hack to ensure that bus_dmamem_alloc properly aligns
small allocations with large alignment requirements.

Add a panic to detect cases where we've still failed to properly align.


159000 28-May-2006 davidxu

Clear high 16 bits of mxcsr register, according to Intel document, if
the high 16 bits is non-zero, fxrstor instruction will generate GP fault,
resulting kernel crash, this bug can be triggered by setcontext and
ptrace(PT_SETXMMREGS).


158997 28-May-2006 davidxu

PCB_NPXINITDONE is cleared by npx_fork_thread.


158995 28-May-2006 davidxu

When creating a new thread, inherit floating-point environment from
current thread, this is required by POSIX pthread_create document.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158445 11-May-2006 phk

Clean out sysctl machdep.* related defines.

The cmos clock related stuff should really be in MI code.


158264 03-May-2006 scottl

Allow bus_dmamap_load() to pass ENOMEM back to the caller. This puts it into
conformance with the mbuf and uio load routines. ENOMEM can only happen
with BUS_DMA_NOWAIT is passed in, thus the deferals are disabled. I don't
like doing this, but fixing this fixes assumptions in other important drivers,
which is a net benefit for now.


158238 01-May-2006 jhb

Add various constants for the PAT MSR and the PAT PTE and PDE flags.
Initialize the PAT MSR during boot to map PAT type 2 to Write-Combining
(WC) instead of Uncached (UC-).

MFC after: 1 month


158236 01-May-2006 jhb

Add a new 'pmap_invalidate_cache()' to flush the CPU caches via the
wbinvd() instruction. This includes a new IPI so that all CPU caches on
all CPUs are flushed for the SMP case.

MFC after: 1 month


158235 01-May-2006 peter

Using an idea from Stephan Uphoff, use the empty pte's that correspond
to the unused kva in the pv memory block to thread a freelist through.
This allows us to free pages that used to be used for pv entry chunks
since we can now track holes in the kva memory block.

Idea from: ups


158226 01-May-2006 peter

Fix missing changes required for the amd64->i386 conversion. Add the
missing VM_ALLOC_WIRED flags to vm_page_alloc() calls I added.

Submitted by: alc


158120 28-Apr-2006 peter

Interim fix for pmap problems I introduced with my last commit.
Remove the code to dyanmically change the pv_entry limits. Go back
to a single fixed kva reservation for pv entries, like was done
before when using the uma zone. Go back to never freeing pages
back to the free pool after they are no longer used, just like
before.

This stops the lock order reversal due to aquiring the kernel map
lock while pmap was locked.

This fixes the recursive panic if invariants are enabled.

The problem was that allocating/freeing kva causes vm_map_entry
nodes to be allocated/freed. That can recurse back into pmap as
new pages are hooked up to kvm and hence all the problem.
Allocating/freeing kva indirectly allocate/frees memory.

So, by going back to a single fixed size kva block and an index,
we avoid the recursion panics and the LOR.

The problem is that now with a linear block of kva, we have no
mechanism to track holes once pages are freed. UMA has the same
problem when using custom object for a zone and a fixed reservation
of kva. Simple solutions like having a bitmap would work, but would
be very inefficient when there are hundreds of thousands of bits
in the map. A first-free pointer is similarly flawed because pages
can be freed at random and the first-free pointer would be rewinding
huge amounts. If we could allocate memory for tree strucures or
an external freelist, that would work. Except we cannot allocate/free
memory here because we cannot allocate/free address space to use
it in. Anyway, my change here reverts back to the UMA behavior of
not freeing pages for now, thereby avoiding holes in the map.

ups@ had a truely evil idea that I'll investigate. It should allow
freeing unused pages again by giving us a no-cost way to track the
holes in the kva block. But in the meantime, this should get people
booting with witness and/or invariants again.

Footnote: amd64 doesn't have this problem because of the direct map
access method. I'd done all my witness/invariants testing there. I'd
never considered that the harmless-looking kmem_alloc/kmem_free calls
would cause such a problem and it didn't show up on the boot test.


158097 28-Apr-2006 sobomax

Unbreak pc98. Sorry...


158088 27-Apr-2006 alc

In general, bits in the page directory entry (PDE) and the page table
entry (PTE) have the same meaning. The exception to this rule is the
eighth bit (0x080). It is the PS bit in a PDE and the PAT bit in a
PTE. This change avoids the possibility that pmap_enter() confuses a
PAT bit with a PS bit, avoiding a panic().

Eliminate a diagnostic printf() from the i386 pmap_enter() that serves
no current purpose, i.e., I've seen no bug reports in the last two
years that are helped by this printf().

Reviewed by: jhb


158068 27-Apr-2006 sobomax

In the case when reset via keyboard controller doesn't work for some reason
(i.e. no keyboard controller present), try two other common methods for
resetting i386 machine - pci reset and port 0x92 fast reset. Only if neither
works warn user and resort to "unmap entire address space and hope for good"
hack. This makes my MacBook Pro rebooting just fine and should also help
other legacy-free hardware out there.

Also, disable interrupts unconditionally in cpu_reset_real(), since we don't
want any interference.

MFC after: 1 week


158067 27-Apr-2006 delphij

Fix build on i386


158060 26-Apr-2006 peter

MFamd64: shrink pv entries from 24 bytes to about 12 bytes. (336 pv entries
per page = effectively 12.19 bytes per pv entry after overheads).
Instead of using a shared UMA zone for 24 byte pv entries (two 8-byte tailq
nodes, a 4 byte pointer, and a 4 byte address), we allocate a page at a
time per process. This provides 336 pv entries per process (actually, per
pmap address space) and eliminates one of the 8-byte tailq entries since
we now can track per-process pv entries implicitly. The pointer to
the pmap can be eliminated by doing address arithmetic to find the metadata
on the page headers to find a single pointer shared by all 336 entries.
There is an 11-int bitmap for the freelist of those 336 entries.

This is mostly a mechanical conversion from amd64, except:
* i386 has to allocate kvm and map the pages, amd64 has them outside of kvm
* native word size is smaller, so bitmaps etc become 32 bit instead of 64
* no dump_add_page() etc stuff because they are in kvm always.
* various pmap internals tweaks because pmap uses direct map on amd64 but
on i386 it has to use sched_pin and temporary mappings.

Also, sysctl vm.pmap.pv_entry_max and vm.pmap.shpgperproc are now
dynamic sysctls. Like on amd64, i386 can now tune the pv entry limits
without a recompile or reboot.

This is important because of the following scenario. If you have a 1GB
file (262144 pages) mmap()ed into 50 processes, that requires 13 million
pv entries. At 24 bytes per pv entry, that is 314MB of ram and kvm, while
at 12 bytes it is 157MB. A 157MB saving is significant.

Test-run by: scottl (Thanks!)


158007 25-Apr-2006 jkim

Check if reported HTT cores are physical cores. This commit does not
affect AMD CPUs at all because HTT bit is disabled earlier. Intel
multicore CPUs and ULE scheduler may be affected.


158004 24-Apr-2006 jkim

Add another Intel CPU feature flag, xTPR (Send Task Priority Messages).


158003 24-Apr-2006 jkim

Check if deterministic cache parameters leaf is valid before use.


158000 24-Apr-2006 cperciva

Adjust dangerous-shared-cache-detection logic from "all shared data
caches are dangerous" to "a shared L1 data cache is dangerous". This
is a compromise between paranoia and performance: Unlike the L1 cache,
nobody has publicly demonstrated a cryptographic side channel which
exploits the L2 cache -- this is harder due to the larger size, lower
bandwidth, and greater associativity -- and prohibiting shared L2
caches turns Intel Core Duo processors into Intel Core Solo processors.

As before, the 'machdep.hyperthreading_allowed' sysctl will allow even
the L1 data cache to be shared.

Discussed with: jhb, scottl
Security: See FreeBSD-SA-05:09.htt for background material.


157909 21-Apr-2006 peter

Merge minidumps from amd64 where they were originally developed.

Major differences:
* since there is no direct map region, there is no custom uma memory
allocator to modify to include its pages in the dumps.
* Various data entries are reduced from 64 bit to 32 bit to match the
native size.

dump_add_page() and dump_drop_page() are still present in case one wants to
arrange for arbitary pages to be dumped. This is of marginal use though
because libkvm+kgdb cannot address physical memory that isn't mapped into
kvm.


157890 20-Apr-2006 imp

Set the rid of the resource we're about to return to the user.


157680 12-Apr-2006 alc

Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.

Reviewed by: tegge


157568 06-Apr-2006 jhb

- Don't set CR0_NE and CR0_MP in npx_probe() as they are already set
earlier in cpu_setregs().
- If we know this CPU has a FPU via cpuid, then just assume the INT16
interface and make the npx device quiet to not clutter the dmesg. This
is true for all Pentium and later CPUs and even some of the later 486dx
CPUs.

Reviewed by: bde
Tested by: ps
MFC after: 1 week


157541 05-Apr-2006 jhb

Cache the value of the lower half of each I/O APIC redirection table entry
so that we only have to do an ioapic_write() instead of an ioapic_read()
followed by an ioapic_write() every time we mask and unmask level triggered
interrupts. This cuts the execution time for these operations roughly in
half.

Profiled by: Paolo Pisati <p.pisati@oltrelinux.com>
MFC after: 1 week


157453 04-Apr-2006 jkoshy

Freshen a comment.

Reviewed by: jhb


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


157394 02-Apr-2006 alc

Introduce pmap_try_insert_pv_entry(), a function that conditionally creates
a pv entry if the number of entries is below the high water mark for pv
entries.

Use pmap_try_insert_pv_entry() in pmap_copy() instead of
pmap_insert_entry(). This avoids possible recursion on a pmap lock in
get_pv_entry().

Eliminate the explicit low-memory checks in pmap_copy(). The check that
the number of pv entries was below the high water mark was largely
ineffective because it was located in the outer loop rather than the
inner loop where pv entries were allocated. Instead of checking, we
attempt the allocation and handle the failure.

Reviewed by: tegge
Reported by: kris
MFC after: 5 days


156963 21-Mar-2006 alc

Eliminate unnecessary invalidations of the entire TLB by pmap_remove().
Specifically, on mappings with PG_G set pmap_remove() not only performs
the necessary per-page invlpg invalidations but also performs an
unnecessary invalidation of the entire set of non-PG_G entries.

Reviewed by: tegge


156930 21-Mar-2006 davidxu

Remove stale KSE code.

Reviewed by: alc


156920 20-Mar-2006 jhb

Drop some unneeded casts since we program the kernel in C rather than C++.


156706 14-Mar-2006 jhb

Don't allow userland to set hardware watch points on kernel memory at all.
Previously, we tried to allow this only for root. However, we were calling
suser() on the *target* process rather than the current process. This
means that if you can ptrace() a process running as root you can set a
hardware watch point in the kernel. In practice I think you probably have
to be root in order to pass the p_candebug() checks in ptrace() to attach
to a process running as root anyway. Rather than fix the suser(), I just
axed the entire idea, as I can't think of any good reason _at all_ for
userland to set hardware watch points for KVM.

MFC after: 3 days
Also thinks hardware watch points on KVM from userland are bad: bde, rwatson


156523 10-Mar-2006 davidxu

It is not necessary to read %gs twice.


156522 10-Mar-2006 davidxu

Fix stack offset to allow gcc's stack aligment code to work correctly.

MFC after: 3 days


156504 09-Mar-2006 jhb

Flip the switch and don't route interrupts to hyperthreads in a HT system.
In at least one benchmark this showed around a 20% performance increase.
If other workloads do benefit from having hyperthreads service interrupts,
we can always make this a loader tunable.

MFC after: 3 days
Tested by: ps


156335 06-Mar-2006 phk

Improve the advantech watchdog.


156254 03-Mar-2006 netchild

- use a more common style to print memory sizes
- add some more cache sizes (2nd and 3rd level) [1]

Submitted by: HATANOU Tomomi <hatanou@infolab.ne.jp> [1]
PR: 91328 [1]


156124 28-Feb-2006 jhb

Rework how we wire up interrupt sources to CPUs:
- Throw out all of the logical APIC ID stuff. The Intel docs are somewhat
ambiguous, but it seems that the "flat" cluster model we are currently
using is only supported on Pentium and P6 family CPUs. The other
"hierarchy" cluster model that is supported on all Intel CPUs with
local APICs is severely underdocumented. For example, it's not clear
if the OS needs to glean the topology of the APIC hierarchy from
somewhere (neither ACPI nor MP Table include it) and setup the logical
clusters based on the physical hierarchy or not. Not only that, but on
certain Intel chipsets, even though there were 4 CPUs in a logical
cluster, all the interrupts were only sent to one CPU anyway.
- We now bind interrupts to individual CPUs using physical addressing via
the local APIC IDs. This code has also moved out of the ioapic PIC
driver and into the common interrupt source code so that it can be
shared with MSI interrupt sources since MSI is addressed to APICs the
same way that I/O APIC pins are.
- Interrupt source classes grow a new method pic_assign_cpu() to bind an
interrupt source to a specific local APIC ID.
- The SMP code now tells the interrupt code which CPUs are avaiable to
handle interrupts in a simpler and more intuitive manner. For one thing,
it means we could now choose to not route interrupts to HT cores if we
wanted to (this code is currently in place in fact, but under an #if 0
for now).
- For now we simply do static round-robin of IRQs to CPUs when the first
interrupt handler just as before, with the change that IRQs are now
bound to individual CPUs rather than groups of up to 4 CPUs.
- Because the IRQ to CPU mapping has now been moved up a layer, it would
be easier to manage this mapping from higher levels. For example, we
could allow drivers to specify a CPU affinity map for their interrupts,
or we could allow a userland tool to bind IRQs to specific CPUs.

The MFC is tentative, but I want to see if this fixes problems some folks
had with UP APIC kernels on 6.0 on SMP machines (an SMP kernel would work
fine, but a UP APIC kernel (such as GENERIC in RELENG_6) would lose
interrupts).

MFC after: 1 week


155771 16-Feb-2006 tegge

Rounding addr upwards to next 4M or 2M boundary in pmap_growkernel() could
cause addr to become 0, resulting in an early return without populating
the last PDE.

Reviewed by: alc


155720 15-Feb-2006 dwmalone

It seems bit 5 of cpu_feature2 is the VMX (Virtual Machine Extensions)
bit. While I'm here, delete a comment that was cut and past from the
cpu_features code that doesn't belong here.


155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


155510 10-Feb-2006 rink

Cleaned the memory initialization up, moved some defines from the framebuffer
to an include file.

Reviewed by: imp
Approved by: imp (mentor)


155466 09-Feb-2006 yar

Avoid calling CPUID function 0x02 if the CPU reports no support for
it. The former code used to hang older Intel CPUs by trying to get
non-existent TLB info 2^32 times.

Reduce code duplication around the calls to CPUID 0x02 by using
do-while loops.

PR: i386/92977
Tested by: cy


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.


155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


155299 04-Feb-2006 wsalamon

Hook up the audit system to system call entry and exit. System calls will
now be audited.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


155238 03-Feb-2006 davidxu

Clear carry flag in get_mcontext so that setcontext does not
return a bogus error.

PR: misc/92110


155219 02-Feb-2006 davidxu

Under verbose mode, correctly report L2 cache information
for CPU which supports CPUID function 8000_0006h.

Tested on: Pentum-M 750


155218 02-Feb-2006 davidxu

Fix bug in L2 cache size detection code for CPU which supports CPUID
function 8000_0006h.

Tested on: Pentum-M 750


155203 02-Feb-2006 davidxu

Correctly report L2 cache size according to its code comment.
Tested on my Dual PIII machine.


154935 27-Jan-2006 jhb

Call WITNESS_CHECK() in the page fault handler and immediately assume it
is a fatal fault if we are holding any non-sleepable locks. This should
cut down on the number of bogus LORs we currently get when the kernel
panics due to a NULL (or bogus) pointer dereference that goes wandering
off into the VM system which tries to acquire locks and then kicks off
the spurious LORs. This should probably be ported to all the archs at
some point.

Tested on: i386


154721 23-Jan-2006 ups

Fix race conditions.

Tested by: kris@
MFC after: 3 days


154502 18-Jan-2006 davidxu

Eliminate a stale instruction introduced in revision 1.136.


154367 14-Jan-2006 scottl

Free the newtag if we exit with a failure from alloc_bounce_zone().

Found by: Coverity Prevent(tm)


154079 06-Jan-2006 jhb

- Make pcib_devclass private to sys/dev/pci/pci_pci.c and change all the
various pcib drivers to use their own private devclass_t variables for
their modules.
- Use the DEFINE_CLASS_0() macro to declare drivers for the various pcib
drivers while I'm here.


154074 06-Jan-2006 jhb

Fix various places that were testing td_critnest to see if interrupts
should remain disabled during a trap or not to check
td_md.md_spinlock_count instead.


154039 04-Jan-2006 netchild

We don't support I386_CPU in 6.0 and later. This file can be cleaned
up some to assume that '#if defined(I486_CPU) || defined(I586_CPU) ||
defined(I686_CPU)' is true.

Suggested by: jhb
Reviewed by: jhb


154022 04-Jan-2006 netchild

- Make sure the cpu_exthigh variable is initialized (page coloring case). [1]
- Remove a conditional in the AMD cache detection, it's always false. [2]
- Don't try to detect a cache if only compiled for i386.

Analyzed by: Antoine Brodin <antoine.brodin@laposte.net> [1]
Submitted by: Antoine Brodin <antoine.brodin@laposte.net> [2]


153995 03-Jan-2006 jkim

- Explicitly validate an empty filter to match bpf_filter() comment[1].
- Do not use BPF JIT compiler for an empty filter.

[1] Pointed out by: darrenr


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153836 29-Dec-2005 davidxu

Remove pcb_switchout, it has not been used for a long time.


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153726 26-Dec-2005 davidxu

Move global variable private_tss into per-cpu area.

Reviewed by: jhb


153694 23-Dec-2005 jeff

- Improve the INKERNEL macro such that it can no longer give false positives.
This fixes the stack(9) functionality.

Submitted by: Antoine Brodin <antoine.brodin@laposte.net>


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153426 14-Dec-2005 jhb

Fix stale comment.


153383 13-Dec-2005 jhb

Revert previous commit. The BIOS braindamage is even worse than I
originally thought. The BIOS that cleared CPUID_APIC actually managed
to disable the local APIC entirely and even Windows 64 doesn't boot on
it.

Reported by: bz


153377 13-Dec-2005 jhb

Don't check the CPUID_APIC bit in the cpu_features flags field to determine
if the boot CPU has a local APIC because some BIOS vendors are not
competent enough to set this bit. Instead, just assume that we always have
a local APIC on amd64. For i386 the check is a bit more subtle. FreeBSD
requires either an MP Table or an ACPI MADT table to enumerate APICs. The
only systems that have one of those tables that don't have local APICs are
some presumably rare (and old) SMP 486 systems using external APICs. Thus,
instead of checking the CPUID_APIC flag, check the CPU class and abort if
we are running on a 486.

MFC after: 1 week
Reported by: bz


153177 06-Dec-2005 jkim

Fix ZERO_EDX() macro from the previous commit. It was emitting
`xor %ecx, %ecx', not `xor %edx, %edx'.


153157 06-Dec-2005 jkim

s/M_WAITOK/M_NOWAIT/ while mutex is held.

Pointed out by: csjp


153156 06-Dec-2005 jkim

- Micro-optimize `mov $0, %edx' -> `xor %edx, %edx'.
- Correct amd64 macro style (no functional change).


153151 06-Dec-2005 jkim

Add experimental BPF Just-In-Time compiler for amd64 and i386.

Use the following kernel configuration option to enable:

options BPF_JITTER

If you want to use bpf_filter() instead (e. g., debugging), do:

sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).

Obtained from: WinPcap 3.1 (for i386)


153146 05-Dec-2005 jhb

Change the i386 code to pass the interrupt vector as a separate argument
rather than embedding it in the intrframe as if_vec. This reduces diffs
with amd64 somewhat.
- Remove cf_vec from clockframe (it wasn't used anyway) and stop pushing
dummy vector arguments for ipi_bitmap_handler() and lapic_handle_timer()
since clockframe == trapframe now.
- Fix ddb to handle stack traces across interrupt entry points that just
have a trapframe on their stack and not a trapframe + vector.
- Change intr_execute_handlers() to take a trapframe rather than an
intrframe pointer.
- Change lapic_handle_intr() and atpic_handle_intr() to take a vector and
trapframe rather than an intrframe.
- GC struct intrframe now that nothing uses it anymore.
- GC CLOCK_TO_TRAPFRAME() and INTR_TO_TRAPFRAME().

Reviewed by: bde
Requested by: peter


153141 05-Dec-2005 jhb

- Move the code to deal with handling an IPI_STOP IPI out of
ipi_nmi_handler() and into a new cpustop_handler() function. Change
the Xcpustop IPI_STOP handler to call this function instead of
duplicating all the same logic in assembly.
- EOI the local APIC for the lapic timer interrupt in C rather than
assembly.
- Bump the lazypmap IPI counter if COUNT_IPIS is defined in C rather than
assembly.


153135 05-Dec-2005 jhb

- Move PUSH_FRAME and POP_FRAME into machine/asmacros.h.
- Add a new SET_KERNEL_SREGS macro that sets up %ds and %es to point to
kernel data and %fs to point to per-CPU data and use the new macro
in several kernel entry points including trap and interrupt handlers.
- Convert the IPI_STOP handler Xcpustop to push a standard trap frame
rather than an application frame.
- Make the TRAP() macro private to exception.s since it is only used
there.
- Move the PCPU_*() macros in asmacros.h out of the middle of the
profiling macros.

Reviewed by: bde
Requested by: bde (4, 5)


153115 05-Dec-2005 ru

Prepare for MACHINE and hw.machine switching to "pc98" on FreeBSD/pc98.

Reviewed by: nyan


152906 28-Nov-2005 jhb

If we get a stray interrupt, return after logging it. In the extremely
rare case of a stray interrupt to an unregistered source (such as a stray
interrupt from the 8259As when using APIC), this could result in a page
fault when it tried to walk the list of interrupt handlers to execute
INTR_FAST handlers. This bug was introduced with the intr_event changes,
so it's not present in 5.x or 6.x.

Submitted by: Mark Tinguely tinguely at casselton dot net


152775 24-Nov-2005 le

Fix typo.


152753 24-Nov-2005 ru

Add missing "struct" in i386/i386/machdep.c,v 1.497 by deischen@.


152701 22-Nov-2005 jhb

Make COUNT_IPIS and COUNT_XINVLTLB_HITS real kernel options and take
them out of machine/smptests.h.


152699 22-Nov-2005 jhb

Garbage collect the code to store diagnostics codes in a CMOS register
during SMP startup. We haven't had any issues with starting up the APs
on i386 in quite a while now which is all this code is really useful for.
If someone ever does really need it they can always dig it up out of the
attic.


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152588 18-Nov-2005 jhb

- Always print the trap number so that we have something to start with for
mystery traps. If we don't have a message for a given trap, just use
UNKNOWN for the message.
- Add trap messages for T_XMMFLT and T_RESERVED.

MFC after: 1 week


152537 17-Nov-2005 obrien

Fix spelling mistake.

Submitted by: kris


152531 16-Nov-2005 jhb

Revert a part of the previous commits to these files that made the NMI
IPI_STOP handling code use atomic_readandclear() to execute the restart
function on the first CPU to resume and restore the behavior of always
executing the restart function on the BSP since this is in fact what the
non-NMI IPI_STOP handler does. I did add back in a statement to clear
the restart function pointer after it is executed to match the behavior
of the non-NMI IPI_STOP handler.


152529 16-Nov-2005 jhb

Revert previous commit to these files. There isn't a race necessitating
an xchg instruction as we only try to execute the startup function if
the CPU ID is 0 (i.e. the BSP). I missed this earlier.


152528 16-Nov-2005 jhb

Fix a typo in the check for an invalid APIC. If we are told about an
I/O APIC that doesn't exist, then a read of the version register is going
to return -1 which is 0xffffffff not 0xffffff.

Tested on: i386
Tested by: Nikos Ntarmos ntarmos at ceid dot upatras dot gr
MFC after: 1 week


152461 15-Nov-2005 andre

Provide a link to the documentation of the I/O APIC at Intel.


152404 14-Nov-2005 imp

Provide a dummy NO_XBOX option that lives in opt_xbox.h for pc98.
This allows us to eliminate a three ifdef PC98 instances.


152359 13-Nov-2005 alc

In get_pv_entry() use PMAP_LOCK() instead of PMAP_TRYLOCK() when deadlock
cannot possibly occur.


152238 09-Nov-2005 nyan

Fix pc98 build.


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


152219 09-Nov-2005 imp

Add support for XBOX to the FreeBSD port. The xbox architecture is
nearly identical to wintel/ia32, with a couple of tweaks. Since it is
so similar to ia32, it is optionally added to a i386 kernel. This
port is preliminary, but seems to work well. Further improvements
will improve the interaction with syscons(4), port Linux nforce driver
and future versions of the xbox.

This supports the 64MB and 128MB boxes. You'll need the most recent
CVS version of Cromwell (the Linux BIOS for the XBOX) to boot.

Rink will be maintaining this port, and is interested in feedback.
He's setup a website http://xbox-bsd.nl to report the latest
developments.

Any silly mistakes are my fault.

Submitted by: Rink P.W. Springer rink at stack dot nl and
Ed Schouten ed at fxq dot nl


152089 05-Nov-2005 phk

Unbreak !SMP kernels


152042 04-Nov-2005 alc

Begin and end the initialization of pvzone in pmap_init().
Previously, pvzone's initialization was split between pmap_init() and
pmap_init2(). This split initialization was the underlying cause of
some UMA panics during initialization. Specifically, if the UMA boot
pages was exhausted before the pvzone was fully initialized, then UMA,
through no fault of its own, would use an inappropriate back-end
allocator leading to a panic. (Previously, as a workaround, we have
increased the UMA boot pages.) Fortunately, there is no longer any
reason that pvzone's initialization cannot be completed in
pmap_init().

Eliminate a check for whether pv_entry_high_water has been initialized
or not from get_pv_entry(). Since pvzone's initialization is
completed in pmap_init(), this check is no longer needed.

Use cnt.v_page_count, the actual count of available physical pages,
instead of vm_page_array_size to compute the maximum number of pv
entries.

Introduce the vm.pmap.pv_entries tunable on alpha and ia64.

Eliminate some unnecessary white space.

Discussed with: tegge (item #1)
Tested by: marcel (ia64)


151979 02-Nov-2005 jhb

Change the x86 code to allocate IDT vectors on-demand when an interrupt
source is first enabled similar to how intr_event's now allocate ithreads
on-demand. Previously, we would map IDT vectors 1:1 to IRQs. Since we
only have 191 available IDT vectors for I/O interrupts, this limited us
to only supporting IRQs 0-190 corresponding to the first 190 I/O APIC
intpins. On many machines, however, each PCI-X bus has its own APIC even
though it only has 1 or 2 devices, thus, we were reserving between 24 and
32 IRQs just for 1 or 2 devices and thus 24 or 32 IDT vectors. With this
change, a machine with 100 IRQs but only 5 in use will only use up 5 IDT
vectors. Also, this change provides an API (apic_alloc_vector() and
apic_free_vector()) that will allow a future MSI interrupt source driver to
request IDT vectors for use by MSI interrupts on x86 machines.

Tested on: amd64, i386


151910 31-Oct-2005 alc

Instead of a panic()ing in pmap_insert_entry() if get_pv_entry()
fails, reclaim a pv entry by destroying a mapping to an inactive
page.

Change the format strings in many of the assertions that were recently
converted from PMAP_DIAGNOSTIC printf()s so that they are compatible
with PAE. Avoid unnecessary differences between the amd64 and i386
format strings.


151897 31-Oct-2005 rwatson

Normalize a significant number of kernel malloc type names:

- Prefer '_' to ' ', as it results in more easily parsed results in
memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion. Similar changes are required for UMA zone names.


151889 30-Oct-2005 alc

Replace diagnostic printf()s by assertions. Use consistent style for
similar assertions.


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


151634 24-Oct-2005 jhb

Rename the KDB_STOP_NMI kernel option to STOP_NMI and make it apply to all
IPI_STOP IPIs.
- Change the i386 and amd64 MD IPI code to send an NMI if STOP_NMI is
enabled if an attempt is made to send an IPI_STOP IPI. If the kernel
option is enabled, there is also a sysctl to change the behavior at
runtime (debug.stop_cpus_with_nmi which defaults to enabled). This
includes removing stop_cpus_nmi() and making ipi_nmi_selected() a
private function for i386 and amd64.
- Fix ipi_all(), ipi_all_but_self(), and ipi_self() on i386 and amd64 to
properly handle bitmapped IPIs as well as IPI_STOP IPIs when STOP_NMI is
enabled.
- Fix ipi_nmi_handler() to execute the restart function on the first CPU
that is restarted making use of atomic_readandclear() rather than
assuming that the BSP is always included in the set of restarted CPUs.
Also, the NMI handler didn't clear the function pointer meaning that
subsequent stop and restarts could execute the function again.
- Define a new macro HAVE_STOPPEDPCBS on i386 and amd64 to control the use
of stoppedpcbs[] and always enable it for i386 and amd64 instead of
being dependent on KDB_STOP_NMI. It works fine in both the NMI and
non-NMI cases.


151633 24-Oct-2005 jhb

When restarting the BSP during cpu_reset() use a membar to ensure that
the updated cpustop_restartfunc is seen when the BSP resumes execution.
This matches the membar already present in restart_cpus().


151632 24-Oct-2005 jhb

Use xchg in Xcpustop to close a race and make cpustop_restartfunc truly
one-shot in the SMP case (before using the simple mov / cmp / mov sequence
could allow multiple CPUs to execute the restart function on resume).


151631 24-Oct-2005 jhb

- Various small whitespace and style nits.
- Use PCPU_GET(cpumask) in preference to 1 << PCPU_GET(cpuid) in a few
places.


151543 21-Oct-2005 ade

Specifically panic() in the case where pmap_insert_entry() fails to
get a new pv under high system load where the available pv entries
have been exhausted before the pagedaemon has a chance to wake up
to reclaim some.

Prior to this, the NULL pointer dereference ended up causing
secondary panics with rather less than useful resulting tracebacks.

Reviewed by: alc, jhb
MFC after: 1 week


151431 17-Oct-2005 jkim

Redo physical/logical CPU count.

Suggested by: jhb


151418 17-Oct-2005 jkim

Split displaying number of physical and logical cores.


151375 16-Oct-2005 obrien

For AMD processors, nullify CPUID.HTT. FreeBSD has no need for the
information it conveys, and it is only confusing people.
This fixes incorrect output in the previous commit.


151348 14-Oct-2005 jkim

- Print number of physical/logical cores and more CPUID info.
- Add newer CPUID definitions for future use.

Many thanks to Mike Tancsa <mike at sentex dot net> for providing test
cases for Intel Pentium D and AMD Athlon 64 X2.

Approved by: anholt (mentor)


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


151302 13-Oct-2005 alc

Restore the UP optimization to reduce the number of TLB invalidations. The
previous revision only restored the MP optimization.

Describe the optimization strategy for TLB invalidations in a comment.

Reviewed by: ups@
MFC after: 3 days


151274 13-Oct-2005 ups

Restore optimizations to reduce TLB shootdowns.
Alan Cox pointed out that they are really useful for
sendfile().

MFC after: 3 days


151247 12-Oct-2005 ups

Ensure that a thread stays on same CPU when calculating per CPU
TLB shootdown requirements. Otherwise a CPU may not get the needed
TLB invalidation.

The PTE valid and access flags can not be used here to avoid TLB
shootdowns unless sf->cpumask == all_cpus.
( Otherwise some CPUs may still hold an even older entry in the TLB)
Since sf_buf_alloc mappings are normally always used this is
also not really useful and presetting accessed and modified
allows the CPU to speculatively load the entry into the TLB.

Both bugs can cause random data corruption.

MFC after: 3 days


150789 01-Oct-2005 glebius

Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.

Reviewed by: ru, sam, jhb


150697 28-Sep-2005 jhb

Add interrupt counters for IPIs. By default they are disabled, but they
can be enabled by enabling COUNT_IPIS in smptests.h. When enabled, each
CPU provides an interrupt counter for nearly all of the IPIs it receives
(IPI_STOP currently doesn't have a counter) that can be examined using
vmstat -i, etc.

MFC after: 3 days
Requested by: rwatson


150696 28-Sep-2005 jhb

Rename the lapic timer interrupt counters from lapicX: timer to cpuX: timer
since it's not always obvious that lapic == cpu.

MFC after: 3 days


150663 28-Sep-2005 rwatson

Back out alpha/alpha/trap.c:1.124, osf1_ioctl.c:1.14, osf1_misc.c:1.57,
osf1_signal.c:1.41, amd64/amd64/trap.c:1.291, linux_socket.c:1.60,
svr4_fcntl.c:1.36, svr4_ioctl.c:1.23, svr4_ipc.c:1.18, svr4_misc.c:1.81,
svr4_signal.c:1.34, svr4_stat.c:1.21, svr4_stream.c:1.55,
svr4_termios.c:1.13, svr4_ttold.c:1.15, svr4_util.h:1.10,
ext2_alloc.c:1.43, i386/i386/trap.c:1.279, vm86.c:1.58,
unaligned.c:1.12, imgact_elf.c:1.164, ffs_alloc.c:1.133:

Now that Giant is acquired in uprintf() and tprintf(), the caller no
longer leads to acquire Giant unless it also holds another mutex that
would generate a lock order reversal when calling into these functions.
Specifically not backed out is the acquisition of Giant in nfs_socket.c
and rpcclnt.c, where local mutexes are held and would otherwise violate
the lock order with Giant.

This aligns this code more with the eventual locking of ttys.

Suggested by: bde


150335 19-Sep-2005 rwatson

Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),
as they both interact with the tty code (!MPSAFE) and may sleep if the
tty buffer is full (per comment).

Modify all consumers of uprintf() and tprintf() to hold Giant around
calls into these functions. In most cases, this means adding an
acquisition of Giant immediately around the function. In some cases
(nfs_timer()), it means acquiring Giant higher up in the callout.

With these changes, UFS no longer panics on SMP when either blocks are
exhausted or inodes are exhausted under load due to races in the tty
code when running without Giant.

NB: Some reduction in calls to uprintf() in the svr4 code is probably
desirable.

NB: In the case of nfs_timer(), calling uprintf() while holding a mutex,
or even in a callout at all, is a bad idea, and will generate warnings
and potential upset. This needs to be fixed, but was a problem before
this change.

NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having
non-MPSAFE tty code.

MFC after: 1 week


150264 17-Sep-2005 imp

Expose legacy_pcib_alloc_resource, and use it in the mptable pci bus
implementation, like other routines in the legacy bus.

This should fix problems with resource allocation on MP systems without
ACPI enabled.


150176 15-Sep-2005 jhb

- Adjust a comment, we do program the performance counter LVT entry now
if hwpmc(4) is included.
- Don't recursively panic if we are unable to send an IPI, just bail and
hope for the best.

MFC after: 1 week


150173 15-Sep-2005 jhb

Explicitly switch to the new TSS by updating the current CPU's TSS selector
and reloading it in i386_extend_pcb() rather than trying to force a context
switch to reload the TSS via the TDF_NEEDRESCHED flag. Optimizations to
avoid calling cpu_switch() when the new thread was identical to the old
thread defeated the attempt to force a TSS reload. Explicitly loading the
new TSS is what we really want to do anyway.

PR: i386/84842
Reported by: Alexander Best arundel at h3c dot de
MFC after: 1 week
Reviewed by: bde (mostly)


150041 12-Sep-2005 nyan

opt_pc98.h is not needed.


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149786 04-Sep-2005 alc

Eliminate unnecessary TLB invalidations by pmap_enter(). Specifically,
eliminate TLB invalidations when permissions are relaxed, such as when a
read-only mapping is changed to a read/write mapping. Additionally,
eliminate TLB invalidations when bits that are ignored by the hardware,
such as PG_W ("wired mapping"), are changed.

Reviewed by: tegge


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


149532 27-Aug-2005 alc

MFamd64 revision 1.526
When pmap_allocpte() destroys a 2/4MB "superpage" mapping it does not
reduce the pmap's resident count accordingly. It should.


149300 19-Aug-2005 pjd

Avoid code duplication and implement bitcount32() function in systm.h only.

Reviewed by: cperciva
MFC after: 3 days


149142 16-Aug-2005 jhb

Clarify a comment.


149058 14-Aug-2005 alc

Simplify the page table page reference counting by pmap_enter()'s change of
mapping case.

Eliminate a stale comment from pmap_enter().

Reviewed by: tegge


148976 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Eliminate an unused #include. (Kernel stack allocation and deallocation
long ago migrated to the machine-independent code.)


148971 11-Aug-2005 alc

Eliminate unneeded diagnostic code.

Reviewed by: tegge


148952 11-Aug-2005 alc

Decouple the unrefing of a page table page from the removal of a pv entry.
In other words, change pmap_remove_entry() such that it no longer unrefs
the page table page. Now, it only removes the pv entry.

Reviewed by: tegge


148835 07-Aug-2005 alc

When support for 2MB/4MB pages was added in revision 1.148 an error was
made in pmap_protect(): The pmap's resident count should not be reduced
unless mappings are removed.

The errant change to the pmap's resident count could result in a later
pmap_remove() failing to remove any mappings if the errant change has set
the pmap's resident count to zero.


148694 04-Aug-2005 tobez

Make kernel build suceed when with "options CPU_DISABLE_SSE".

PR: 84010
Submitted by: Sergey Gluschenko <deen@freebsd.org.ua>
MFC after: 1 week


148666 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


148539 29-Jul-2005 jhb

Fix a bug in pmap_protect() in the PAE case where it would try to look up
the vm_page_t associated with a pte using only the lower 32-bits of the pte
instead of the full 64-bits.

Submitted by: Greg Taleck greg at isilon dot com
Reviewed by: jeffr, alc
MFC after: 3 days


148538 29-Jul-2005 jhb

Add a tunable 'hw.apic.enable_extint' that can be set from the loader to
not mask the ExtINT pin on the first I/O APIC as at least one PIII chipset
seems to need this even though all of the pins in the 8259A's are masked.
The default is still to mask the ExtINT pin.

Reported by: Mike Tancsa mike at sentex.net
MFC after: 3 days


148231 21-Jul-2005 phk

Make the facility for recognizing BIOS-signatures more general
and return a printable representation.

This fixes recognition of the PC Engines WRAP and improves the
recognition of the Soekris boards (Bios version can now be
seen in the dmesg output for instance).

Also, add watchdog support for PCM-582x platforms.

Submitted by: Adrian Steinmann <ast@marabu.ch>
Slightly changed by: phk
PR: 81360


147950 13-Jul-2005 jkoshy

Use an interrupt gate for the NMI handler and prevent too-early
enabling of interrupts inside of trap(). Fix a typo in a comment.

Revert rev 1.113 of "sys/i386/i386/exception.s" as it is no longer
needed.

Reviewed by: bde
MFC after: 3 days


147889 10-Jul-2005 davidxu

Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)


147865 09-Jul-2005 jkoshy

Have the NMI handler call the C language trap() routine and directly
exit via 'doreti_exit'.

Since the NMI interrupt may be taken at any time, including when
the processor has masked external interrupts, it is not safe to
call ast() as is done for normal interrupts.

Approved by: re (scottl)


147741 02-Jul-2005 delphij

Remove the CPU_ENABLE_SSE option from the i386 and pc98 architectures,
as they are already default for I686_CPU for almost 3 years, and
CPU_DISABLE_SSE always disables it. On the other hand, CPU_ENABLE_SSE
does not work for I486_CPU and I586_CPU.

This commit has:
- Removed the option from conf/options.*
- Removed the option and comments from MD NOTES files
- Simplified the CPU_ENABLE_SSE ifdef's so they don't
deal with CPU_ENABLE_SSE from kernel configuration. (*)

For most users, this commit should be largely no-op. If you used to
place CPU_ENABLE_SSE into your kernel configuration for some reason,
it is time to remove it.

(*) The ifdef's of CPU_ENABLE_SSE are not removed at this point, since
we need to change it to !defined(CPU_DISABLE_SSE) && defined(I686_CPU),
not just !defined(CPU_DISABLE_SSE), if we really want to do so.

Discussed on: -arch
Approved by: re (scottl)


147740 02-Jul-2005 marcel

Fix a buglet that was present in the ia64 code and that got inherited
by amd64 and i386: For buffered writes we collect data and write it
out a ${DEV_BSIZE}-sized block at a time. The fragsz variable is used
to keep track of how much data we have collected in the buffer so far
and it's reset to zero immediately after writing a block to the dump
device.
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such would stop reflecting reality. Since we
currently only need to do buffered writes once, this isn't a problem.
However, when kernel dumps are made by hand (say by callling doadump
from within DDB), the improperly cleared state from the first call to
dumpsys causes the next call to dumpsys to create an invalid code file.
This change resets fragsz after flushing the partially filled buffer so
that it fixes the two problems at once.

Approved by: re (scottl)


147691 30-Jun-2005 peter

Begin promoting the AMD-originated feature flags to first class flags, now
that newer Intel cpu hardware implements them too. This includes things
like the NX (pte no-execute) flag for execute protection. We'll need to
reference this for implementing no-exec in pmap.c at some point.

Some feature flags are duplicated in both the Intel-orignated bits and
the AMD bits. Suppress the the duplicates correctly - the old code
assumed they were a 1:1 mapping which is not correct. We can't just mask
off the bits present in cpu_feature.

Converge with amd64 where this originated from.

Intel cpu's that implement any AMD features will report them in dmesg now.

Approved by: re


147674 29-Jun-2005 peter

Move the KDB_STOP_NMI option from opt_global.h to opt_kdb.h

Approved by: re


147671 29-Jun-2005 peter

Switch AMD64 and i386 platforms to using ELF as their kernel crash
dump format. The key reason to do this is so that we can dump sparse
address space. For example, we need to be able to skip the PCI hole
just below the 4GB boundary. Trying to destructively dump MMIO device
registers is Really Bad(TM). The frequent result of trying to do a
crash dump on a machine with 4GB or more ram was ugly (lockup or reboot).

This code has been taken directly from the IA64 dump_machdep.c code,
with just a few (mostly minor) mods.

Introduce a dump_avail[] array in the machdep.c code so that we have a
source of truth for what memory is present in a machine that needs to be
dumped. We can't use phys_avail[] because all sorts of things slice
memory out of it that we really need to dump. eg: the vm page array
and the dmesg buffer. dump_avail[] is pretty much an unmolested version
of phys_avail[]. It does have Maxmem correction.

Bump the i386 and amd64 dump format to version 2, but nothing actually
uses this. amd64 was actually using the i386 dump version number.

libkvm support to follow.

Approved by: re


147604 25-Jun-2005 ups

Disable the interrupts in trap_fatal before calling kdb_trap.
(required now that critical sections no longer block interrupts)

Reviewed by: jhb@
Approved by: re (scottl)
Tested by: kris@,glebius@


147565 24-Jun-2005 peter

Move HWPMC_HOOKS into its own opt_hwpmc_hooks.h file. It doesn't merit
being in opt_global.h and forcing a global recompile when only a few files
reference it.

Approved by: re


147558 23-Jun-2005 jhb

Various and sundry style fixes and comment cleanups.

Approved by: re (scottl)


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


147181 09-Jun-2005 ups

Add IPI support for preempting a thread on another CPU.

MFC after: 3 weeks


146959 04-Jun-2005 dfr

Add the proper logic so that we don't try to do SSE stuff unless its
enabled.


146818 31-May-2005 dfr

Add support for XMM registers in GDB for x86 processors that support
SSE (or its successors).

Reviewed by: marcel, davidxu
MFC After: 2 weeks


146799 30-May-2005 jkoshy

Kernel hooks to support PMC sampling modes.

Reviewed by: alc


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146767 29-May-2005 schweikh

Chop a '>' in a feature name (RSVD2>) that snuck in;
this now balances the <> flags displayed at boot, e.g. without this
Features2=0x41d<SSE3,RSVD2>,MON,DS_CPL,CNTX-ID>

MFC after: 1 week


146551 23-May-2005 obrien

Sync the style of these two files.


146263 16-May-2005 obrien

Add the 2nd word of IA32 feature flags. This includes things such as SSE3.

Obtained from: sys/amd64/amd64/identcpu.


146172 13-May-2005 nectar

Default hyperthreading on in -CURRENT. No seatbelts in CURRENT (^_^)

Requested by: peter, jhb


146170 13-May-2005 nectar

Add a knob for disabling/enabling HTT, "machdep.hyperthreading_allowed".
Default off due to information disclosure on multi-user systems.

Submitted by: cperciva
Reviewed by: jhb


146049 10-May-2005 nyan

Change a directory layout for pc98.
- Move MD files into <arch>/<arch>.
- Move bus dependent files into <arch>/<bus>.
Rename some files to more suitable names.

Repo-copied by: peter
Discussed with: imp


145950 06-May-2005 cperciva

Correctly validate inputs to the i386_get_ldt syscall.

Security: FreeBSD-SA-05:07.ldt


145913 05-May-2005 peter

Move the pcb variable initialization earlier. This is cosmetic here, but
in as-yet uncommitted code for 32 bit binary compatability on 64 bit
kernels, some of the 32 bit registers come from the pcb. Moving the
initialization here means fill_regs32() etc are laid out the same.


145727 30-Apr-2005 dwhite

Implement an alternate method to stop CPUs when entering DDB. Normally we use
a regular IPI vector, but this vector is blocked when interrupts are disabled.
With "options KDB_STOP_NMI" and debug.kdb.stop_cpus_with_nmi set, KDB will
send an NMI to each CPU instead. The code also has a context-stuffing
feature which helps ddb extract the state of processes running on the
stopped CPUs.

KDB_STOP_NMI is only useful with SMP and complains if SMP is not defined.
This feature only applies to i386 and amd64 at the moment, but could be
used on other architectures with the appropriate MD bits.

Submitted by: ups


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.


145383 21-Apr-2005 alc

Eliminate an unpredictable branch from bcmp().

Reviewed by: bde


145343 20-Apr-2005 ps

Don't enter the debugger if KDB_UNATTENDED is set or if
debug.debugger_on_panic=0.

MFC after: 2 weeks


145276 19-Apr-2005 davidxu

Further narrow down critical region of FSBASE code.


145274 19-Apr-2005 davidxu

Use critical section functions rather than scheduler lock to protect
critical region.


145256 19-Apr-2005 jkoshy

Bring a working snapshot of hwpmc(4), its associated libraries, userland utilities
and documentation into -CURRENT.

Bump FreeBSD_version.

Reviewed by: alc, jhb (kernel changes)


145080 14-Apr-2005 jhb

Remove support for mixed mode altogether now that we no longer use IRQ 0
when using an APIC. This simplifies the APIC code somewhat and also allows
us to be pedantically more compliant with ACPI which mandates no use of
mixed mode.


145057 14-Apr-2005 jhb

Bah, add a missing cast.


145055 14-Apr-2005 jhb

Always use the local APIC timer, even on UP machines.


145054 14-Apr-2005 jhb

If an I/O APIC returns 0xffffffff for its version register after we map it,
assume it is bogus and return NULL instead of trying to parse it as an
APIC.

Inspired by: linux bug reports via njl


145041 14-Apr-2005 peter

Allow user processes to completely empty out their LDT, now that user
processes run from segment selectors that live in the GDT. Doing this
used to be equivalent to committing suicide, but now this is a NOP.


145034 13-Apr-2005 peter

Change the segment limits to 4GB, we set the user accessible bit on all
of the kernel address space already. Intel recommend this anyway, because
using a non-4GB limit adds an additional clock cycle to address generation.
We were able to install 4GB segments into the LDT, so any limits we imposed
on %cs and %ds were academic anyway. More importantly, this allows us to
make a page in the kernel readable to user applications, for holding things
like the signal trampoline and other fun things.

Move the user %cs/%ds segments from the LDT to the GDT. There was no good
reason for them to be there anyway. The old LDT entries are still there
but we can now relax the restriction that prevented users from emptying
the default LDT entries.

Putting user and kernel %cs and %ds together allows us to access the fast
sysenter/sysexit/syscall/sysret instructions. syscall/sysret in particular
require that the user/kernel segments be laid out this way. Reserve a slot
specifically for NDIS while here.

Create two user controllable slots in the GDT that are context switched
with the (kernel) thread. This allows user applications to set two
user privilige selectors to arbitary values. Create
i386_set_fsbase(void *base) and friends. (get/set, fs/gs). For i386,
%gs is used by tls and the thread libraries and this means that user
processes no longer have to have the cost of having a custom LDT, and
we will no longer to do a ldt switch when activating a kthread/ithread in
the usual case any more.

In other words, we can now set the base address for %fs and %gs to arbitary
addresses without the pain of messing with ldt segments.


145025 13-Apr-2005 peter

Fix an evil bug that appeared in September 2003. VM86 bios calls use two
of the __pcb_spare longs. Except that fields were changed and one of the
spare values was used and the __pcb_spare field was reduced from two to one
long. Now VM86 bios calls can trash the first 4 bytes of the next page
following the kernel stack/pcb. This Is Bad(TM). This bug has been
present in 5.2-release and onwards, and is still in RELENG_5.

Instead of tempting fate and trying to use "spare" fields, explicitly
reserve them.


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144969 12-Apr-2005 jhb

Use NULL rather than 0 in a couple of places.


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more


144265 29-Mar-2005 sam

handle malloc failure and don't proceed when the bios call to
get parameters passed to malloc fails

Noticed by: Coverity Prevent analysis tool (malloc failure)


144013 23-Mar-2005 das

Bounds check the length parameter to i386_set_ldt() before passing it
to kmem_alloc(). Failure to do this made it possible for user
processes to cause a hard lock on i386 kernels. I believe this only
affects 6-CURRENT on or after 2005-01-26.

Found by: Coverity Prevent analysis tool
Security: Local DOS


143785 18-Mar-2005 imp

Use STAILQ in preference to SLIST for the resources. Insert new resources
last in the list rather than first.

This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x). This
also means that the pci code will once again print the resources in BAR
ascending order.


143449 12-Mar-2005 scottl

Guard against an integer underflow that could cause busdma to eat up all
available RAM. This also results in the global bounce page limit being
applied to zones instead of globally.

Submitted by: Petr Lampa (in part)


143293 08-Mar-2005 mux

Oops, CTR*() macros are not varadic macros, and the number indicates
the number of parameters. Fix my previous commit to use the correct
CTR*() macros.

Pointy hat to: mux


143284 08-Mar-2005 mux

Use __func__ in the KTR_BUSDMA traces. This avoids copy and paste
errors like in the bus_dmamap_load_mbuf_sg() case where we were wrongly
displaying the function name as bus_dmamap_load_mbuf.


143202 07-Mar-2005 scottl

Remove dead code.


143157 05-Mar-2005 des

Use TUNABLE_ULONG_FETCH to retrieve hw.physmem; getenv_quad() will take
care of the multiplier suffix.


143156 05-Mar-2005 des

Replace goto with continue.


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago


143034 02-Mar-2005 jhb

Tweak the lapic timer code to get the performance closer to the pre-lapic
timer case:
- Remove the virtual fooclock interrupt counters as they have served their
purpose.
- Adjust the dividers for the different clock such that profhz is now a
multiple of stathz as in the non-lapic case, and the timer now runs at
hz * 2 rather than hz * 3. With the new divisors, the default clock
rates are:

kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }


142834 28-Feb-2005 wes

Add a sysctl that records the amount of physical memory in the machine.

Submitted by: Nicko Dehaine <nicko@stbernard.com>
MFC after: 1 day


142723 27-Feb-2005 pjd

Fix typo.


142312 23-Feb-2005 njl

Remove the old p4tcc.


142256 22-Feb-2005 jhb

If mixed mode is not enabled by the APIC enumerator (MPTable always does,
ACPI MADT only does if the PC-AT flag is set), then don't assume that pin 0
on the first I/O APIC is an ExtINT pin. Instead, assume that it is ISA
IRQ 0.


141942 15-Feb-2005 njl

Correct a few bugs in the legacy cpu attachment. Get the unit from the
parent cpu device before passing it to pcpu_find(). Get the ivars from the
child, not parent cpu device. These bugs would cause a panic when
dereferencing the pcpu ivar, but weren't present in the acpi attachment
which it seems most people are using.


141848 13-Feb-2005 alc

Request a CPU private mapping from sf_buf_alloc().


141784 13-Feb-2005 alc

Implement support for CPU private mappings within sf_buf_alloc().


141616 10-Feb-2005 phk

Make a bunch of malloc types static.

Found by: src/tools/tools/kernxref


141544 08-Feb-2005 pjd

- Add debug.watchdog tunable, so we can specify watchdog CPU from loader
which will help to debug hangs on boot.
- Remove 'U' from debug.watchdog sysctl definition, so if we set it to '-1'
it really shows '-1'.
- Fix comment.

Reviewed by: rwatson


141538 08-Feb-2005 jhb

Use the local APIC timer to drive the various kernel clocks on SMP machines
rather than forwarding interrupts from the clock devices around using IPIs:
- Add an IDT vector that pushes a clock frame and calls
lapic_handle_timer().
- Add functions to program the local APIC timer including setting the
divisor, and setting up the timer to either down a periodic countdown
or one-shot countdown.
- Add a lapic_setup_clock() function that the BSP calls from
cpu_init_clocks() to setup the local APIC timer if it is going to be
used. The setup uses a one-shot countdown to calibrate the timer. We
then program the timer on each CPU to fire at a frequency of hz * 3.
stathz is defined as freq / 23 (hz * 3 / 23), and profhz is defined as
freq / 2 (hz * 3 / 2). This gives the clocks relatively prime divisors
while keeping a low LCM for the frequency of the clock interrupts.
Thanks to Peter Jeremy for suggesting this approach.
- Remove the hardclock and statclock forwarding code including the two
associated IPIs. The bitmap IPI handler has now effectively degenerated
to just IPI_AST.
- When the local APIC timer is used we don't turn the RTC on at all, but
we still enable interrupts on the ISA timer 0 (i8254) for timecounting
purposes.


141455 07-Feb-2005 sobomax

Fix the problem with incorrect throttling level reported immediately after
reboot. Safter the reboot the TCC is usually in the Automatic mode, in which
reading current performance level is likely to produce bogus results make sure
to switch it to the On-Demand mode and set to some known performance level.
Unfortunately there is no reliable way to check that TCC is in the Automatic
mode. Reading bit 4 of ACPI Thermal Monitor Control Register produces 0
regardless of the current mode.

MFC after: 1 week


141380 06-Feb-2005 njl

Staticize the legacy cpu devclasses and revert the name for the acpi_cpu
devclass. As pointed out by dfr@, devclasses don't have to share the same
linkage if multiple drivers have the same name. Newbus should match the
devclasses based on name and allocate non-conflicting unit numbers.


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141374 05-Feb-2005 njl

Make cpu_est_clockrate() more accurate by disabling interrupts for the
millisecond it is calibrating. Suggested by jhb@ and bde@. Don't clobber
the tsc_freq with the new value since it isn't accurate enough for
timecounters and the timecounter system as a whole needs support for
changing rates before we do this. Subtract 0.5% from our measurement
to account for overhead in DELAY. Note that this interface is for
estimating the clockrate and needs to work well at runtime so doing a full
calibration including disabling interrupts for a second is not feasible.


141367 05-Feb-2005 alc

Implement proper handling of PG_G mappings in pmap_protect(). (I don't
believe that this omission mattered before the introduction of MemGuard.)

Reviewed by: tegge@
MFC after: 1 week


141278 04-Feb-2005 nyan

Fix pc98 support (broken by previous change).


141238 04-Feb-2005 njl

Update the CPU attachments to return CPU_IVAR_PCPU as well as pass on
appropriate requests to any children.


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


140862 26-Jan-2005 sobomax

o Move copyin()/copyout() out of i386_{get,set}_ldt() and
i386_{get,set}_ioperm() and make those APIs visible in the kernel namespace;

o use i386_{get,set}_ldt() and i386_{get,set}_ioperm() instead of sysarch()
in the linuxlator, which allows to kill another two stackgaps.

MFC after: 2 weeks


140452 18-Jan-2005 jhb

If a valid ELCR was found, consult it for the trigger mode of ISA
interrupts that have a trigger mode of conforming. This fixes problems on
some older machines that still route PCI devices via ISA interrupts when
using an I/O APIC.

Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 month


140451 18-Jan-2005 jhb

Tweak the ELCR support slightly. Explicitly probe the ELCR during boot
instead of burying that in the atpic(4) code as atpic(4) is not the only
user of elcr(4). Change the elcr(4) code to export a global elcr_found
variable that other code can check to see if a valid ELCR was found.

MFC after: 1 month


140401 18-Jan-2005 jhb

Unbreak stack traces across double faults. In a particular edge case
(calling a __dead2 function such as panic() at the end of a function), the
saved %eip on the stack will actually not be part of the function that
executed a call instruction but instead will be the first instruction of
the next function in the text. This happens with dblfault_handler() and
syscall() for example. Work around this in the one place it matters by
looking at the saved %eip - 1 to determine the calling function when we
check for "magic" frames.

MFC after: 2 weeks


140260 14-Jan-2005 jhb

Bah, another whitespace fix.


140259 14-Jan-2005 jhb

Remove an extraneous space.


140257 14-Jan-2005 jhb

Remove redundant code to drop per-thread debug register state from
cpu_exit() as this is already performed in cpu_thread_exit() and the
debug state is per-thread rather than per-process.


140254 14-Jan-2005 jhb

Drop the 'active-' prefix from the polarity printf to be consistent with
the rest of the interrupt code.


140129 12-Jan-2005 jhb

Try harder to work with MP table interrupt entries that claim that an
interrupt is wired up to all the I/O APICs in the system. If the system
has only one I/O APIC, then just act as if the entry specified that APIC.
We still don't try to handle global entries in a system with multiple I/O
APICs.

Tested by: Peter Trifonov pvtrifonov at mail dot ru
MFC after: 1 week


139864 07-Jan-2005 jhb

Fix support for machines with default MP Table configurations:
- Fix the MP Table pci bridge drivers to not probe the configuration table
unless we actually have one. Machines using a default configuration do
not have such a table.
- Only allow default configuration types of 5 (ISA + PCI) and 6 (EISA +
PCI) as the others are not likely to work. Types 1 through 4 use an
external APIC (probably with 80486 processors) which we certainly do not
support, and type 7 uses an MCA bus which has not been tested with the
new MP Table code.
- Correct the fact that the single I/O APIC in a default configuration has
an ID of 2, not 0.
- Fix off by one errors in setting the bus types from the default_data[]
arrays for default configurations.
- Explicitly configure each of the 16 interrupt pins on the sole I/O APIC
when using a default configuration. This is especially helpful for type
6 (EISA + PCI) since the EISA interrupts need to have their polarity
programmed based on the values in the ELCR.

Much thanks to the submitter and tester who endured several rounds of
testing to get this fixed.

MFC after: 1 week
Tested by: Georg Schwarz georg dot schwarz at freenet dot de


139840 07-Jan-2005 scottl

Introduce bus_dmamap_load_mbuf_sg(). Instead of taking a callback arg, this
cuts to the chase and fills in a provided s/g list. This is meant to optimize
out the cost of the callback since the callback doesn't serve much purpose for
mbufs since mbuf loads will never be deferred. This is just for amd64 and
i386 at the moment, other arches will be coming shortly.


139724 05-Jan-2005 imp

Start all license/copyright notice comments with /*-, per tradition


139450 30-Dec-2004 jhb

Use NULL instead of 0 in a few places as well as various whitespace fixes.


139448 30-Dec-2004 jhb

Small whitespace fixes.


139343 27-Dec-2004 njl

Restore the cpu_reset proxy code. It is needed if you want to reset the
system from an AP at runtime (i.e., calling cpu_reset from ddb). Someday,
if we move to an NMI for stopping cpus instead, we can do away with this.

Requested by: jhb


139245 23-Dec-2004 jhb

- Give the timer, thermal, and error LVT entries an interrupt vector even
though these aren't used yet.
- Add missing function prototypes for some static functions.
- Allow lvt_mode() to handle an LVT entry with a delivery mode of fixed.
- Consolidate code duplicated in lapic_init() and lapic_setup() to program
the spurious vector register of a local APIC in a static lapic_enable()
function.
- Dump the timer, thermal, error, and performance counter LVT entries
during lapic_dump().
- Program LVT pins (currently only LINT0 and LINT1) after the local
APIC has been software enabled via lapic_enable() since otherwise the
LVT programming will not be able to unmask LVT sources.


139244 23-Dec-2004 jhb

Some small style fixes.


139242 23-Dec-2004 jhb

Add a simple 'intrcnt_add' function that other MD code can use to add a
single named counter to the interrupt counts without having to fake up an
entire interrupt source.


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


139240 23-Dec-2004 jhb

- Add a function to set the Task Priority Register (TPR) of the local APIC.
Currently this is only used to initiailize the TPR to 0 during initial
setup.
- Reallocate vectors for the local APIC timer, error, and thermal LVT
entries. The timer entry is allocated from the top of the I/O interrupt
range reducing the number of vectors available for hardware interrupts
to 191. Linux happens to use the same exact vector for its timer
interrupt as well. If the timer vector shared the same priority queue
as the IPI handlers, then the frequency that the timer vector will
eventually be firing at can interact badly with the IPIs resulting in
the queue filling and the dreaded IPI stuck panics, hence it being located
at the top of the previous priority queue instead.
- Fixup various minor nits in comments.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138722 12-Dec-2004 njl

Move the author's copyright notice to match the initial LongRun import
now that we have split out this support into longrun.c


138592 08-Dec-2004 kbyanc

If the parent process has the trap bit set (i.e. a debugger had single
stepped the process to the system call), we need to clear the trap flag
from the new frame unless the debugger had set PF_FORK on the parent.
Otherwise, the child will receive a (likely unexpected) SIGTRAP when it
executes the first instruction after returning to userland.

Reviewed by: bde
MFC after: 3 days


138528 07-Dec-2004 ups

Avoid more than two pending IPI interrupt vectors per local APIC
as this may cause deadlocks.

This should fix kern/72123.

Discussed with: jhb
Tested by: Nik Azim Azam, Andy Farkas, Flack Man, Aykut KARA
Izzet BESKARDES, Jens Binnewies, Karl Keusgen
Approved by: sam (mentor)


138520 07-Dec-2004 imp

NEC PC-98 machines do not have and cannot have an EISA bus. They have
only C-Bus and PCI busses. Therefore, don't create an eisa0 node on
the legacy bus that can never attach.

PC-98 info verified by: nyan-san


138506 07-Dec-2004 imp

PNP BIOS devices are fundamentally different than ISA PNP devices.
These devices should be probed first because they are at fixed
locations and cannot be turned off. ISA PNP devices, on the other
hand, can be turned off and often can be flexible in the resources
they use. Probe them last, as always.


138504 07-Dec-2004 ups

Move reading the current CPU mask in pmap_lazyfix() to where the thread
is protected from migrating to another CPU.

Approved by: sam (mentor)
MFC after: 4 weeks


138498 06-Dec-2004 ups

Allow fast interrupts to cause preemption.

Reviewed by: jhb, scottl
Approved by: sam (mentor)


138303 02-Dec-2004 alc

For efficiency move the call to pmap_pte_quick() out of pmap_protect()'s
and pmap_remove()'s inner loop.

Reviewed by: peter@, tegge@


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.

Tested on: alpha, amd64, i386, ia64, sparc64


138236 30-Nov-2004 njl

Remove now unused variable.

Pointy hat: njl from nskyline_r35 at yahoo com


138216 30-Nov-2004 njl

MFamd64: Remove the cpu_reset_proxy cruft now that we run boot() on
cpu 0. Also, restructure cpu_reset to be cleaner (no functional change.)


138194 29-Nov-2004 scottl

Don't flag alignment constraints as a reason for bouncing. This fixes the
trigger for other misbehaviour in the sym driver that was causing freezes at
boot. Thanks to phk@ for reporting and testing this.


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137966 21-Nov-2004 scottl

Remove an extra #include


137965 21-Nov-2004 scottl

MFC amd64:
Consolidate all of the bounce tests into the BUS_DMA_COULD_BOUNCE flag.
Allocate the bounce zone at either tag creation or map creation to help
avoid null-pointer derefs later on. Track total pages per zone so that
each zone can get a minimum allocation at tag creation time instead of
being defeated by mis-behaving tags that suck up the max amount.


137917 20-Nov-2004 das

Remove references to U area and garbage collect includes.

Reviewed by: arch@


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137894 19-Nov-2004 scottl

Revert part of rev 1.57. The tag boundary is honored by splitting the
segment, not by bouncing.


137784 16-Nov-2004 jhb

Initiate deorbit burn sequence for 80386 support in FreeBSD: Remove
80386 (I386_CPU) support from the kernel.


137528 10-Nov-2004 nyan

cosmetic changes.


137460 09-Nov-2004 scottl

Zero the tag when it's allocated. Also fix a printf format problem. This
should fix the problems introduced several hours ago.


137445 09-Nov-2004 scottl

First pass at replacing the single global bounce pool with sub-pools that are
appropriate for different tag requirements. With the former global pool,
bounce pages might get allocated that are appropriate for one tag, but not
appropriate for another, but the system had no way to distinguish between them.
Now zones with distinct attributes are created to hold pages, and each tag
that requires bouncing is associated with a zone. New zones are created as
needed if no existing zones can meet the requirements of the tag. Stats for
each zone are tracked via the hw.busdma sysctl node.

This should help drivers that are failing with mysterious data corruption.

MFC After: 1 week


137372 08-Nov-2004 alc

Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc().
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" for i386 will be committed at a later date.


137165 03-Nov-2004 scottl

Don't use atomic ops to increment interrupt stats. This was only done on
amd64 and i386 anyways. The stats are only kept for informational purposes.


137142 02-Nov-2004 scottl

Streamline busdma a bit. Inline _bus_dmamap_load_buffer, optimize some
tests, replace a passed td with a passed pmap to eliminate some deferences.


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


137116 01-Nov-2004 jhb

Allow individual application processors to be disabled from the loader
via hints for 'lapicX'. For example, to disable the CPU with the local
APIC ID of 7, use 'hint.lapic.7.disabled=1'.

MFC after: 1 month


137052 29-Oct-2004 alc

Implement per-CPU SYSMAPs, i.e., CADDR* and CMAP*, to reduce lock
contention within pmap_zero_page() and pmap_copy_page().


136897 24-Oct-2004 simokawa

Preserve dcons(4) buffer passed by loader(8).


136805 23-Oct-2004 rwatson

Add some basic KTR tracing to busdma on i386. This is likely not
the final set of traces -- someone with more busdma background
will probably want to review and expand this, as well as port to
other platforms. This tracing is sufficient to identify key
busdma events on i386, and in particular to draw attention to
bounce buffering events that may have a substantial performance
impact.


136601 16-Oct-2004 alc

When sf_buf_alloc() replaces a virtual-to-physical mapping, it needn't
invalidate the TLB(s) if the old mapping wasn't used by the CPU. With
network interfaces that implement checksum off-loading, the old mapping is
almost never used by the CPU, only by the device driver for setting up the
DMA operation.

Reviewed by: tegge@


136521 14-Oct-2004 njl

Print flags in the nexus for child devices.


136419 12-Oct-2004 phk

Add zero flags argument to sysctl calls.


136252 08-Oct-2004 alc

Make pte_load_store() an atomic operation in all cases, not just i386 PAE.

Restructure pmap_enter() to prevent the loss of a page modified (PG_M) bit
in a race between processors. (This restructuring assumes the newly atomic
pte_load_store() for correct operation.)

Reviewed by: tegge@
PR: i386/61852


136101 03-Oct-2004 alc

Undo revision 1.251. This change was a performance pessimizing work-around
that is no longer required. (In fact, it is not clear that it was ever
required in HEAD or RELENG_4, only RELENG_3 required a work-around.) Now,
as before revision 1.251, if the preexisting PTE is invalid, pmap_enter()
does not call pmap_invalidate_page() to update the TLB(s).

Note: Even with this change, the handling of a copy-on-write fault is
inefficient, in such cases pmap_enter() calls pmap_invalidate_page() twice.

Discussed with: bde@
PR: kern/16568


136070 03-Oct-2004 alc

The physical address stored in the vm_page is page aligned. There is no
need to mask off the page offset bits. (This operation made some sense
prior to i386/i386/pmap.c revision 1.254 when we passed a physical address
rather than a vm_page pointer to pmap_enter().)


136050 02-Oct-2004 alc

Eliminate unnecessary uses of PHYS_TO_VM_PAGE() from pmap_enter(). These
uses predate the change in the pmap_enter() interface that replaced the
page's physical address by the address of its vm_page structure. The
PHYS_TO_VM_PAGE() was being used to compute the address of the same vm_page
structure that was being passed in.


136028 01-Oct-2004 nyan

Fix BIOS default geometry on pc98.

PR: kern/72225
Submitted by: Hirokazu WATANABE <wnabe@par.odn.ne.jp>


135939 29-Sep-2004 alc

Prevent the unexpected deallocation of a page table page while performing
pmap_copy(). This entails additional locking in pmap_copy() and the
addition of a "flags" parameter to the page table page allocator for
specifying whether it may sleep when memory is unavailable. (Already,
pmap_copy() checks the availability of memory, aborting if it is scarce.
In theory, another CPU could, however, allocate memory between
pmap_copy()'s check and the call to the page table page allocator,
causing the current thread to release its locks and sleep. This change
makes this scenario impossible.)

Reviewed by: tegge@


135754 24-Sep-2004 jhb

Improve the panic message for a busted MP table with conflicting entries
for the same PCI interrupt.

Tested by: Pavel Gubin pg at ie dot tusur dot ru
MFC after: 3 days


135621 23-Sep-2004 rik

Invalidate cache after changing pte entry.

Discussed with: jhp and njl
MFC after: 5 days


135565 22-Sep-2004 alc

Correct a long-standing error in _pmap_unwire_pte_hold() affecting
multiprocessors. Specifically, the error is conditioning the call to
pmap_invalidate_page() on whether the pmap is active on the current CPU.
This call must be unconditional. Regardless of whether the pmap is active
on the CPU performing _pmap_unwire_pte_hold(), it could be active on another
CPU. For example, a call to pmap_remove_all() by the page daemon could
result in a call to _pmap_unwire_pte_hold() with the pmap inactive on the
current CPU and active on another CPU. In such circumstances, failing to
call pmap_invalidate_page() results in a stale TLB entry on the other CPU
that still maps the now deallocated page table page. What happens next is
typically a mysterious panic in pmap_enter() by the other CPU, either
"pmap_enter: attempted pmap_enter on 4MB page" or "pmap_enter: pte vanished,
va: 0x%lx". Both occur because the former page table page has been recycled
and allocated to a new purpose. Consequently, it no longer contains zeroes.

See also Peter's i386/i386/pmap.c revision 1.448 and the related e-mail
thread last year.

Many thanks to the engineers at Sandvine for providing clear and concise
information until all of the pieces of the puzzle fell into place and
for testing an earlier patch.

MT5 Candidate


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64


135479 19-Sep-2004 alc

Simplify the reference counting of page table pages. Specifically, use
the page table page's wired count rather than its hold count to contain
the reference count. My rationale for this change is based on several
factors:

1. The machine-independent and pmap layers used the same hold count field
in subtly different ways. The machine-independent layer uses the hold
count to implement a form of ephemeral wiring that is used by pipes,
physio, etc. In other words, subsystems where we wish to temporarily
block a page from being swapped out while it is mapped into the kernel's
address space. Such pages are never removed from the page queues.
Instead, the page daemon recognizes a non-zero hold count to mean "hands
off this page." In contrast, page table pages are never in the page
queues; they are wired from birth to death. The hold count was being
used as a kind of reference count, specifically, the number of valid
page table entries within the page. Not surprisingly, these two
different uses imply different synchronization rules: in the machine-
independent layer access to the hold count requires the page queues
lock; whereas in the pmap layer the pmap lock is required. Thus,
continued use by the pmap layer of vm_page_unhold(), which asserts that
the page queues lock is held, made no sense.

2. _pmap_unwire_pte_hold() was too forgiving in its handling of the wired
count. An unexpected wired count on a page table page was ignored and
the underlying page leaked.

3. In a word, microoptimization. Using the wired count exclusively, rather
than a combination of the wired and hold counts, makes the code slightly
smaller and faster.

Reviewed by: tegge@


135452 19-Sep-2004 alc

Remove an outdated assertion from _pmap_allocpte(). (When vm_page_alloc()
succeeds, the page's queue field is unconditionally set to PQ_NONE by
vm_pageq_remove_nowakeup().)


135443 18-Sep-2004 alc

Release the page queues lock earlier in pmap_protect() and pmap_remove() in
order to reduce contention.


135283 15-Sep-2004 julian

Fix breakpoint handling for i386.
not sure yet about 5.x... MFC if needed.
Also fixes small problems with examining some registers and
some specific gdb transfer problems.

As the patch says:
This is not a pretty patch and only meant as a temporary
fix until a better solution is committed.

PR: i386/71715
Submitted by: Stephan Uphoff <ups@tree.com>
MFC after: 1 week


135123 12-Sep-2004 alc

Use an atomic op to update the pte in pmap_protect(). This is to prevent
the loss of a page modified (PG_M) bit in a race between processors.

Quoting Tor:
One scenario where the old code could cause a lost PG_M bit is a
multithreaded linux program (or FreeBSD program using the
linuxthreads port) where one thread was starting a subprocess.
The thread doing fork() would call vmspace_fork(), which would then
call vm_map_copy_entry() which would call pmap_protect() on an area
possibly accessed by other threads.

Additionally, make the clearing of PG_M by pmap_protect() unconditional if
write permission is removed. Previously, PG_M could persist on a read-only
unmanaged page. That seems inconsistent and confusing.

In collaboration with: tegge@

MT5 candidate
PR: 61852


135076 11-Sep-2004 scottl

Revert the previous round of changes to td_pinned. The scheduler isn't
fully initialed when the pmap layer tries to call sched_pini() early in the
boot and results in an quick panic. Use ke_pinned instead as was originally
done with Tor's patch.

Approved by: julian


135056 10-Sep-2004 julian

Make up my mind if cpu pinning is stored in the thread structure or the
scheduler specific extension to it. Put it in the extension as
the implimentation details of how the pinning is done needn't be visible
outside the scheduler.

Submitted by: tegge (of course!) (with changes)
MFC after: 3 days


135008 09-Sep-2004 jhb

Teach the stack trace code how to step across a double fault when stepping
across frames. Basically, if the current frame is for the
'dblfault_handler' function, then get the next %eip and %ebp values to use
from the original TSS of the thread that has the saved state when the
double fault triggered.

MFC after: 4 days


134960 08-Sep-2004 alc

Use atomic ops in pmap_clear_ptes() to prevent SMP races that could
result in the loss of an accessed or modified bit from the pte.

In collaboration with: tegge@

MT5 candidate


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritence that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134611 01-Sep-2004 alc

Correction to the previous revision: I forgot to apply the ones complement
to a constant. This didn't show in testing because the broken expression
produced the same result in my tests as the correct expression.


134607 01-Sep-2004 alc

Modify pmap_pte() to support its use on non-current, non-kernel pmaps
without holding Giant.


134591 01-Sep-2004 julian

Give the 4bsd scheduler the ability to wake up idle processors
when there is new work to be done.

MFC after: 5 days


134571 31-Aug-2004 julian

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134509 30-Aug-2004 alc

Remove unnecessary check for curthread == NULL.


134416 28-Aug-2004 obrien

s/smp_rv_mtx/smp_ipi_mtx/g

Requested by: jhb


134393 27-Aug-2004 alc

The machine-independent parts of the virtual memory system always pass a
valid pmap to the pmap functions that require one. Remove the checks for
NULL. (These checks have their origins in the Mach pmap.c that was
integrated into BSD. None of the new code written specifically for
FreeBSD included them.)


134227 23-Aug-2004 peter

Commit Doug White and Alan Cox's fix for the cross-ipi smp deadlock.
We were obtaining different spin mutexes (which disable interrupts after
aquisition) and spin waiting for delivery. For example, KSE processes
do LDT operations which use smp_rendezvous, while other parts of the
system are doing things like tlb shootdowns with a different mutex.

This patch uses the common smp_rendezvous mutex for all MD home-grown
IPIs that spinwait for delivery. Having the single mutex means that
the spinloop to aquire it will enable interrupts periodically, thus
avoiding the cross-ipi deadlock.

Obtained from: dwhite, alc
Reviewed by: jhb


134216 23-Aug-2004 njl

Add a BUS_GET_RESOURCE_LIST method for nexus.

MFC after: 3 days


134200 23-Aug-2004 sobomax

o Fix whitespace bug introduced in the previous commit.

Submitted by: ru

o Simplify p4tcc_power_profile().

Submitted by: maxim


134199 23-Aug-2004 sobomax

o Extend boot output: print out mimimum/maximum performance value and number
of performance steps available;

o similarly to Enhanced SpeedStep driver, export list of all available steps
via hw.p4tcc.cpuperf_levels sysctl.


134127 21-Aug-2004 alc

Properly free the temporary sf_buf in uiomove_fromphys() if a copyin or
copyout fails.

Obtained from: DragonFlyBSD


133886 16-Aug-2004 gibbs

Modify the "legacy bus" to pass all resource allocations through to its
parent rather than track resources locally. The original code
was incomplete in that it would only honor requests for resources
that already exist in its resource list. This prevented many ISA
identify routines from allocating temporary resources. Passing
the requests up to legacy's parent losing no functionality and
allows these requests to succeed.

Reviewed by: imp, jhb
Approved by: RE


133770 15-Aug-2004 rwatson

Preemptive anti-footshooting: cause a #error if MP_WATCHDOG is compiled
with SCHED_ULE.


133766 15-Aug-2004 rwatson

Spell MP_WATCHDIG right: I fixed the build without MP_WATCHDOG after
testing MP_WATCHDOG, and used an incorrect ifdef.


133759 15-Aug-2004 rwatson

Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond. A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog. If the callout fails to reset the timer for ten
seconds, the watchdog will fire. The sysctl allows you to select which
CPU will run the watchdog.

A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog. On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.

This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.

On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133292 08-Aug-2004 alc

With the advent of pmap locking it makes sense for pmap_copy() to be less
forgiving about inconsistencies in the source pmap. Also, remove a new-
line character terminating a nearby panic string.


133231 06-Aug-2004 rwatson

Generate KTR trace records for syscall enter and exit in i386 system
calls. Note that the information included is a bit different from the
existing KTR traces generated on powerpc, as I'm primarily interested
in kernel context (thread, syscall #, proc, etc), not the user
arguments to the system call. Some convergence would be useful here.


133145 05-Aug-2004 rwatson

Move definition of mem_range_softc from mp_machdep.c to machdep.c so
that it is defined for non-SMP builds, not just SMP ones.


133139 04-Aug-2004 jhb

Remove a potential deadlock on i386 SMP by changing the lazypmap ipi and
spin-wait code to use the same spin mutex (smp_tlb_mtx) as the TLB ipi
and spin-wait code snippets so that you can't get into the situation of
one CPU doing a TLB shootdown to another CPU that is doing a lazy pmap
shootdown each of which are waiting on each other. With this change, only
one of the CPUs would do an IPI and spin-wait at a time.


133129 04-Aug-2004 markm

Fix module builds for i386 and amd64.


133124 04-Aug-2004 alc

Post-locking clean up/simplification, particularly, the elimination of
vm_page_sleep_if_busy() and the page table page's busy flag as a
synchronization mechanism on page table pages.

Also, relocate the inline pmap_unwire_pte_hold() so that it can be used
to shorten _pmap_unwire_pte_hold() on alpha and amd64. This places
pmap_unwire_pte_hold() next to a comment that more accurately describes
it than _pmap_unwire_pte_hold().


133034 02-Aug-2004 markm

Sort includes; minor whitespace.


133017 02-Aug-2004 scottl

Optimize intr_execute_handlers() by combining the pic_disable_source() and
pic_eoi_source() into one call. This halves the number of spinlock operations
and indirect function calls in the normal case of handling a normal (ithread)
interrupt. Optimize the atpic and ioapic drivers to use inlines where
appropriate in supporting the intr_execute_handlers() change.

This knocks 900ns, or roughly 1350 cycles, off of the time spent servicing an
interrupt in the common case on my 1.5GHz P4 uniprocessor system. SMP systems
likely won't see as much of a gain due to the ioapic being more efficient than
the atpic. I'll investigate porting this to amd64 soon.

Reviewed by: jhb


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132919 31-Jul-2004 alc

Add pmap locking to pmap_object_init_pt().


132852 29-Jul-2004 alc

Advance the state of pmap locking on alpha, amd64, and i386.

- Enable recursion on the page queues lock. This allows calls to
vm_page_alloc(VM_ALLOC_NORMAL) and UMA's obj_alloc() with the page
queues lock held. Such calls are made to allocate page table pages
and pv entries.
- The previous change enables a partial reversion of vm/vm_page.c
revision 1.216, i.e., the call to vm_page_alloc() by vm_page_cowfault()
now specifies VM_ALLOC_NORMAL rather than VM_ALLOC_INTERRUPT.
- Add partial locking to pmap_copy(). (As a side-effect, pmap_copy()
should now be faster on i386 SMP because it no longer generates IPIs
for TLB shootdown on the other processors.)
- Complete the locking of pmap_enter() and pmap_enter_quick(). (As of now,
all changes to a user-level pmap on alpha, amd64, and i386 are performed
with appropriate locking.)


132808 28-Jul-2004 phk

Move a relic to its correct location(s): Put nfs diskless initialization
calls with the code they call. (Yet another example of mindless copy&paste).


132545 22-Jul-2004 scottl

Arg! Revert local changes that were accidentlly included in the previous
version.


132544 22-Jul-2004 scottl

Don't count needed bounce pages if loading a buffer that was created with
bus_dmamem_alloc()

Submitted by: harti


132505 21-Jul-2004 cognet

Using NULL as a malloc type when calling contigmalloc() is wrong, so introduce
a new malloc type, and use it.


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64


132422 20-Jul-2004 davidxu

Make end of frames for KSE thread, for system scope thread, without this
change, debugger will dump a weird stack backtrace.


132408 19-Jul-2004 jhb

As a temporary hack, turn off deferred preemptions that are the result of
a fast interrupt handler doing an swi_sched(). This fixed the lockups I
saw on my laptop when using xmms in KDE and on rwatson's MySQL benchmarks
on SMP. This will eventually be removed and/or modified when I figure out
what the root cause is and fix that.


132376 19-Jul-2004 silby

Add a #error requiring KDB if DDB is specified. (This can probably be
relocated to a better place, if one exists.)


132365 18-Jul-2004 alc

Utilize pmap_pte_quick() rather than pmap_pte() in pmap_protect(). The
reason being that pmap_pte_quick() requires the page queues lock, which is
already held, rather than Giant.


132318 17-Jul-2004 alc

Remedy my omission of one change in the prevision revision: pmap_remove()
must pin the current thread in order to call pmap_pte_quick().


132316 17-Jul-2004 alc

- Utilize pmap_pte_quick() rather than pmap_pte() in pmap_remove() and
pmap_remove_page(). The reason being that pmap_pte_quick() requires
the page queues lock, which is already held, rather than Giant.
- Assert that the page queues lock is held in pmap_remove_page() and
pmap_remove_pte().


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132213 15-Jul-2004 jhb

Fix a typo in a comment.


132156 14-Jul-2004 jhb

Correct bounds check in lapic_create().

Submitted by: "Ted Unangst" tedu at coverity.com


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(), alpha already has it for years, the function
will be used by ptrace to clear a thread's single step state.


132082 13-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_remove_pages(). (The implementation of pmap_remove_pages() is
optional. If pmap_remove_pages() is unimplemented, the acquisition and
release of the page queues lock is unnecessary.)

Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().


131960 11-Jul-2004 marcel

Remove the now unused GDB stubs. See src/sys/gdb/* for the new KDB
backend.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131944 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Declare ksym_start and ksym_end as extern and initialize them.
This was previously and bogusly handled by DDB itself.
o Call kdb_enter() instead of Debugger().
o Remove implementation of Debugger().


131936 10-Jul-2004 marcel

Update for the KDB framework:
o s/ddb_on_nmi/kdb_on_nmi/g
o Rename sysctl machdep.ddb_on_nmi to machdep.kdb_on_nmi
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_reenter() when kdb_active is non-zero.
o Call kdb_trap() to enter the debugger when not already active.
o Update comments accordingly.
o Remove misplaced prototype of kdb_trap().


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.


131840 08-Jul-2004 brian

Change the following environment variables to kernel options:

bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO

- i.e. back out the previous commit. It's already possible to
pxeboot(8) with a GENERIC kernel.

Pointed out by: dwmalone


131814 08-Jul-2004 brian

Change the following kernel options to environment variables:

BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to

This lets you PXE boot with a GENERIC kernel by putting this sort of thing
in loader.conf:

bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"

or even setting the variables manually from the OK prompt.


131755 07-Jul-2004 phk

Fix an oversight in r1.26: remove #ifdef around necessary variable.

Spotted by: philip


131744 07-Jul-2004 alc

Simplify the control flow in pmap_extract(), enabling the elimination of a
PMAP_UNLOCK() call.


131730 07-Jul-2004 alc

White space and style changes only.


131666 06-Jul-2004 alc

Style changes to pmap_extract().


131534 03-Jul-2004 imp

Don't define __RMAN_RESOURCE_VISISBLE. They aren't needed here after
I've converted the direct accessing of struct resource members to the
preferred interface.


131529 03-Jul-2004 scottl

Commit the first of half of changes that allow busdma to transparently
honor the alignment and boundary constraints in the dma tag when loading
buffers. Previously, these constraints were only honored when allocating
memory via bus_dmamem_alloc(). Now, bus_dmamap_load() will automatically
use bounce buffers when needed.

Also add a set of sysctls to monitor the global busdma stats. These are:

hw.busdma.free_bpages
hw.busdma.reserved_bpages
hw.busdma.active_bpages
hw.busdma.total_bpages
hw.busdma.total_bounced
hw.busdma.total_deferred


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


131398 01-Jul-2004 jhb

Trim a few things from the dmesg output and stick them under bootverbose to
cut down on the clutter including PCI interrupt routing, MTRR, pcibios,
etc.

Discussed with: USENIX Cabal


131344 30-Jun-2004 imp

Hide struct resource and struct rman. You must define
__RMAN_RESOURCE_VISIBLE to see inside these now.

Reviewed by: dfr, njl (not njr)


131301 30-Jun-2004 peter

Fix leftover argument to pmap_unuse_pt(). I committed the wrong diff.

Submmitted by: Jon Noack <noackjr@alumni.rice.edu>


131272 29-Jun-2004 peter

Reduce the size of pv entries by 15%. This saves 1MB of KVA for mapping
pv entries per 1GB of user virtual memory. (eg: if we had 1GB file was
mmaped into 30 processes, that would theoretically reduce the KVA demand by
30MB for pv entries. In reality though, we limit pv entries so we don't
have that many at once.)

We used to store the vm_page_t for the page table page. But we recently
had the pa of the ptp, or can calculate it fairly quickly. If we wanted
to avoid the shift/mask operation in pmap_pde(), we could recover the
pa but that means we have to store it for a while.

This does not measurably change performance.

Suggested by: alc
Tested by: alc


131225 28-Jun-2004 imp

bde points out that this can't do anything useful. The full patch has
other parts that I can't locat at the moment, so back it out until I can.


131220 28-Jun-2004 imp

When opening /dev/io, preserve iopl properly. Otherwise, if you open
/dev/io multiple times, the first close remove the privs.


131150 26-Jun-2004 alc

In case pmap_extract_and_hold() is ever performed on a different pmap than
the current one, we need to pin the current thread to its CPU.

Submitted by: tegge@


130984 23-Jun-2004 jhb

Finally implement bus_config_intr() support for I/O APIC interrupt sources.
This should fix problems with older SMP systems that only have ISA/EISA
IRQs when routing virgin PCI interrupts as well as on other boxes whose
MADT does not have any interrupt override entries for ISA IRQs that are
used to route PCI interrupts even in APIC mode.


130983 23-Jun-2004 jhb

Fetch the actual acpi0 device_t and use device_is_attached() to see if
it's alive rather than trying to fetch its softc pointer via its devclass.

Glanced at by: imp, njl


130980 23-Jun-2004 jhb

Various cleanups in support of a future ioapic_config_intr() function:
- Allow ioapic_set_{nmi,smi,extint}() to be called multiple times on the
same pin so long as the pin's mode is the same as the mode being
requested.
- Add a notion of bus type for the interrupt associated with interrupt pin.
This is needed so that we can force all EISA interrupts to be active high
in the forthcoming ioapic_config_intr().
- Fix a bug for EISA systems that didn't remap IRQs. This would have broken
EISA systems that tried to disable mixed mode for IRQ 0.


130963 23-Jun-2004 bms

MFamd64: Document the machdep.hlt_cpus sysctl MIB variable.

PR: i386/65729
Submitted by: Craig Rodrigues


130932 22-Jun-2004 alc

Implement the protection check required by the pmap_extract_and_hold()
specification. This enables the elimination of Giant from that function.

Reviewed by: tegge@


130814 20-Jun-2004 alc

- Simplify pmap_remove_pages(), eliminating unnecessary indirection.
- Simplify the locking of pmap_is_modified() by converting control flow to
data flow.


130765 20-Jun-2004 alc

Add pmap locking to pmap_is_prefaultable().


130738 19-Jun-2004 alc

Remove unused pt_entry_ts. Remove an unneeded semicolon.


130722 19-Jun-2004 bde

Removed foot-shooting setting of CR0_TS in exec_setregs(). It is
unnecessary because cpu_setregs() and/or npxinit() always sets CR0_TS
during system initialization, and CR0_TS is set in the next statement
(fpstate_drop()) if necessary after system initialization. Setting
it unnecessarily was less than a pessimization since it broke the
invariant that the npx can be used without an npxdna() trap if
fpucurthread is non-null. The broken invariant became harmful when I
added an fnclex to npxdrop().

Removed setting of CR0_MP in exec_setregs(). This was similarly
unnecessary but was harmless.

Updated comments (mainly by removing them). Things are simpler now
that we have cpu_setregs() and don't support a math emulator or pretend
to support not having either a math emulator or an npx.

Removed the ifdef for avoiding setting CR0_NE in the !SMP case in
cpu_setregs(). npx_probe() should reverse the setting if it wants to
force IRQ13 exception handling for testing.


130641 17-Jun-2004 njl

Revert last change. If acpi is loaded or compiled into the kernel, its
devclass will be present even if the driver was disabled by a hint. Using
device_get_softc() provides the right info even if it's overkill.

Explained by: jhb


130626 17-Jun-2004 alc

Do not preset PG_BUSY on VM_ALLOC_NOOBJ pages. Such pages are not
accessible through an object. Thus, PG_BUSY serves no purpose.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130573 16-Jun-2004 alc

MFamd64
Introduce pmap locking to many of the pmap functions.


130560 16-Jun-2004 alc

MFamd64
Remove dead or unneeded code, e.g., spl calls.


130539 15-Jun-2004 alc

Remove a stale comment.


130510 15-Jun-2004 njl

We only need the devclass_find() result, not the softc.


130433 13-Jun-2004 alc

Prevent the loss of a PG_M bit through an SMP race in pmap_ts_referenced().


130386 12-Jun-2004 alc

In a multiprocessor, the PG_W bit in the pte must be changed atomically.
Otherwise, the setting of the PG_M bit by one processor could be lost if
another processor is simultaneously changing the PG_W bit.

Reviewed by: tegge@


130344 11-Jun-2004 phk

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


130313 10-Jun-2004 jhb

- Use the correct devclass name ("acpi" vs "ACPI") to detect if acpi0 is
present and thus that the PnPBIOS probe should be skipped instead of
having ACPI zero out the PnPBIOStable pointer.
- Make the PnPBIOStable pointer static to i386/i386/bios.c now that that is
the only place it is used.


130312 10-Jun-2004 jhb

Remove atdevbase and replace it's remaining uses with direct references to
KERNBASE instead.


130164 06-Jun-2004 phk

Remove filename+line number from panic messages.


130041 03-Jun-2004 phk

Automatically recognize the WRAP.1C and Soekris 4801 platforms and configure
LEDS accordingly.


130040 03-Jun-2004 phk

Add new bios_string() which will hunt for a string inside a given range
of the BIOS. This can be used for finding arbitrary magic in the BIOS
in order to recognize particular platforms.


130036 03-Jun-2004 phk

The NatSemi (now AMD) Geode SC1100 needs special treatment here and there
because it is an embedded gadget. Give it it's own value for the "cpu"
variable and add code to reset it lacking a keyboard controller.


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129989 02-Jun-2004 tjr

Move TDF_SA from td_flags to td_pflags (and rename it accordingly)
so that it is no longer necessary to hold sched_lock while
manipulating it.

Reviewed by: davidxu


129964 01-Jun-2004 jhb

- Add a function ioapic_program_intpin() that completely programs an I/O
APIC interrupt pin based on the settings in the corresponding interrupt
source structure.
- Use ioapic_program_intpin() in place of manual frobbing of the intpin
configuration in ioapic_program_destination() and ioapic_register().
- Use ioapic_program_intpin() to implement suspend/resume support for I/O
APICs.


129961 01-Jun-2004 jhb

Fix legacy_add_child() to properly handle the case where
device_add_child_ordered() fails (due to a duplicate device add for
example) and properly cleanup and return NULL.


129910 01-Jun-2004 njl

Remove debugging printf that never triggered because acpi is the first
user of nexus::bus_get_resource.


129906 31-May-2004 bmilekic

Bring in mbuma to replace mballoc.

mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)


129882 30-May-2004 phk

Add missing #include <sys/module.h>


129876 30-May-2004 phk

Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>


129872 30-May-2004 phk

struct cpu_nameclass is a private to identcpu.c, move it there.


129817 28-May-2004 alc

Remove a broken micro-optimization from pmap_enter(). The ill effect
of this micro-optimization occurs when we call pmap_enter() to wire an
already mapped page. Because of the micro-optimization, we fail to
mark the PTE as wired. Later, on teardown of the address space,
pmap_remove_pages() destroys the PTE before vm_fault_unwire() has
unwired the page. (pmap_remove_pages() is not supposed to destroy
wired PTEs. They are destroyed by a later call to pmap_remove().)
Thus, the page becomes lost.

Note: The page is not lost if the application called munlock(2), only
if it relies on teardown of the address space to unwire its pages.

For the historically inclined, this bug was introduced by a
megacommit, revision 1.182, roughly six years ago.

Leak observed by: green@ and dillon independently
Patch submitted by: dillon at backplane dot com
Reviewed by: tegge@
MFC after: 1 week


129816 28-May-2004 jhb

Reenable ithread preemption for interrupts that occur while executing in
the kernel. I accidentally broke this with the new interrupt code that
came in prior to 5.2.

Submitted by: bde


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129742 26-May-2004 bde

MFamd64:

Fixed profiling of trap, syscall and interrupt handlers and some
ordinary functions, essentially by backing out half of rev.1.106 of
i386/exception.s. The handlers must be between certain labels for
the purposes of profiling, and this was broken by scattering them in
separately compiled .s files, especially for ordinary functions that
ended up between the labels. Merge the files by #including them as
before, except with different pathnames and better comments and
organization. Changes to the scattered files are minimal -- just
move the labels to the file that does the #includes.

This also partly fixes profiling of IPIs -- all IPI handlers are now
correctly classified as interrupt handlers, but many are still missing
mcount calls.

vm86bios.s is included as before, but it is now between the labels for
interrupt handlers again, which seems to be wrong since half of it is
for a non-interrupt handler.


129663 24-May-2004 jhb

Revert part of rev 1.230 and assume that all EISA IRQs use active high
polarity rather than assuming that level triggered IRQs use active low and
edge triggered IRQs use active high. Both the MultiProcessor 1.4
and ACPI 2.0 Specifications state in their examples that level triggered
EISA IRQs are active low, but in practice they seem to be active high.

Reported by: Nik Azim Azam nskyline_r35 at yahoo dot com


129624 23-May-2004 bde

MFamd64 (1.117: made the FAKE_MCOUNT() in doreti work non-accidentally,
and removed buggy unnecessary FAKE_MCOUNT() in calltrap).


129620 23-May-2004 bde

MFamd64 (put TF_EIP in assym.s and use it instead of a magic offset in
FAKE_MCOUNT()s).


129615 23-May-2004 bde

MFamd64 (1.111: fixed missing call to .mexitcount in lgdt()).


129550 21-May-2004 bde

Updated and reorganized the comments for the fetch and store families of
functions.


129549 21-May-2004 bde

Fixed high resoultion profiling of fuword32() and suword32(). Use the
standard macro ALTENTRY() instead of a home made incomplete version
of it.


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


129129 11-May-2004 jhb

- Remove a spurious blank line.
- Add a missing static keyword.


129097 10-May-2004 jhb

Rework the APIC mixed mode support a bit:
- Require the APIC enumerators to explicitly enable mixed mode by calling
ioapic_enable_mixed_mode(). Calling this function tells the apic driver
that the PC-AT 8259A PICs are present and routable through the first I/O
APIC via an ExtINT pin. The mptable enumerator always calls this
function for now. The MADT enumerator only enables mixed mode if the
PC-AT compatability flag is set in the MADT header.
- Allow mixed mode to be enabled or disabled via a 'hw.apic.mixed_mode'
tunable. By default this tunable is set to 1 (true). The kernel option
NO_MIXED_MODE changes the default to 0 to preserve existing behavior, but
adding 'hw.apic.mixed_mode=0' to loader.conf achieves the same effect.
- Only use mixed mode to route IRQ 0 if it is both enabled by the APIC
enumerator and activated by the loader tunable. Note that both
conditions must be true, so if the APIC enumerator does not enable mixed
mode, then you can't set the tunable to try to override the enumerator.


129012 06-May-2004 njl

Move the CPU newbus attachment to i386 legacy. The acpi_cpu device will
become just "cpu" and provide attachments in the !legacy case.

Tested by: des


129008 06-May-2004 nyan

Disable an EISA support on PC98.


128958 05-May-2004 mux

Unbreak kernel build in the !apic case. We moved to using enums for
setting the polarity and the trigger mode of interrupts.

Tested by: Mark Santcroos, Russell Jackson, Christian Hiris


128931 04-May-2004 jhb

- Add a new pic method pic_config_intr() to set the trigger mode and
polarity for a specified IRQ. The intr_config_intr() function wraps
this pic method hiding the IRQ to interrupt source lookup.
- Add a config_intr() method to the atpic(4) driver that reconfigures
the interrupt using the ELCR if possible and returns an error otherwise.
- Add a config_intr() method to the apic(4) driver that just logs any
requests that would change the existing programming under bootverbose.
Currently, the only changes the apic(4) driver receives are due to bugs
in the acpi(4) driver and its handling of link devices, hence the reason
for such requests currently being ignored.
- Have the nexus(4) driver on i386 implement the bus_config_intr() function
by calling intr_config_intr().


128930 04-May-2004 jhb

- Change the APIC code to mostly use the recently added intr_trigger
and intr_polarity enums for passing around interrupt trigger modes and
polarity rather than using the magic numbers 0 for level/low and 1 for
edge/high.
- Convert the mptable parsing code to use the new ELCR wrapper code rather
than reading the ELCR directly. Also, use the ELCR settings to control
both the trigger and polarity of EISA IRQs instead of just the trigger
mode.
- Rework the MADT's handling of the ACPI SCI again:
- If no override entry for the SCI exists at all, use level/low trigger
instead of the default edge/high used for ISA IRQs.
- For the ACPI SCI, use level/low values for conforming trigger and
polarity rather than the edge/high values we use for all other ISA
IRQs.
- Rework the tunables available to override the MADT. The
hw.acpi.force_sci_lo tunable is no longer supported. Instead, there
are now two tunables that can independently override the trigger mode
and/or polarity of the SCI. The hw.acpi.sci.trigger tunable can be
set to either "edge" or "level", and the hw.acpi.sci.polarity tunable
can be set to either "high" or "low". To simulate hw.acpi.force_sci_lo,
set hw.acpi.sci.trigger to "level" and hw.acpi.sci.polarity to "low".
If you are having problems with ACPI either causing an interrupt storm
or not working at all (e.g., the power button doesn't turn invoke a
shutdown -p now), you can try tweaking these two tunables to find the
combination that works.


128873 03-May-2004 jhb

Use a private attach method for the MP Table host-PCI bridge driver rather
than using legacy_pcib_attach(). The MP Table drivers don't use the $PIR,
and the legacy_pcib_attach() function probes and parses the $PIR in
addition to adding the pci bus child device.


128677 27-Apr-2004 phk

Make sure we are all set up before creating the LED instance.


128328 16-Apr-2004 jhb

Use %eax rather than %ax when loading segment registers to avoid partial
register stalls.

Reviewed by: bde (a while ago, and I think an earlier version)


128100 11-Apr-2004 alc

- is_physical_memory()'s parameter, which is a physical address, should be
a vm_paddr_t not a vm_offset_t.


128098 10-Apr-2004 alc

- pmap_kenter_temporary()'s first parameter, which is a physical address,
should be declared as vm_paddr_t not vm_offset_t.


128063 09-Apr-2004 markm

I hate noticing bugs after committing. :-(
ALWAYS set up the CPU base identity string. THEN optionally
add features.


128056 09-Apr-2004 markm

Add extra output to show when VIA C3 Nehemiah CPUs have hardware
Random Number Generator (RNG) and/or Advanced Cryptography Engine
(ACE).


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127889 05-Apr-2004 dfr

Print cpu features for crusoe processors.


127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127813 03-Apr-2004 marcel

Move the definition of rss() from db_interface.c to cpufunc.h where
it belongs. Change the implementation to match those of rfs() and
rgs() for consistency and irrespective of whether the original was
more correct or not (technically speaking).


127801 03-Apr-2004 phk

Unbreak LED support on Elan cpus.


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127582 29-Mar-2004 peter

Finish tidying up a couple of leftovers from the KSTACK_PAGES stuff. Some
files still #included the opt_ file. powerpc hadn't been updated yet.


127283 22-Mar-2004 wpaul

The kthread_create() API is supposed to allow you to create threads
with more than the normal amount of stack pages, however the stack
pointer always wound up being initialized using KSTACK_PAGES. It
should be using td->td_kstack_pages instead. This means that although
the vm subsystem would give you all the stack pages you asked for,
%esp would always be initialized as if you had just 2 pages, and
the rest would go to waste.

I wanted to use the 'give me more stack pages' feature of kthread_create()
because the Intel 2200BG NDIS driver does an alloca() of about 5000 bytes,
which wrecks the stack with the default 2 page size, and I was baffled
that no matter how much code I shoved into thread contexts with
allegedly larger stacks, the thing would still crash unless I changed
KSTACK_PAGES.

Note: this bug is present in _ALL_ arches at this point. Peter has
promised to merge this fix into all of them.


127282 21-Mar-2004 alc

Add an implementation of uiomove_fromphys() for i386. This implementation
uses sf_buf_alloc() and sf_buf_free() to create and destroy the necessary
ephemeral mappings.


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.


127039 15-Mar-2004 phk

The PPS code needs to be much more brutal to avoid synchronism on
hardware with non-sucky clocks.


126942 14-Mar-2004 alc

Simplify sf_buf_alloc().


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126891 12-Mar-2004 trhodes

These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
- in_cksum.[ch]:
* use a generic C version instead of the assembly version in the !gcc
case (ASM code breaks with the optimizations icc does)
-> no bad checksums with an icc compiled kernel
Help from: andre, grehan, das
Stolen from: alpha version via ppc version
The entire checksum code should IMHO be replaced with the DragonFly
version (because it isn't guaranteed future revisions of gcc will
include similar optimizations) as in:
---snip---
Revision Changes Path
1.12 +1 -0 src/sys/conf/files.i386
1.4 +142 -558 src/sys/i386/i386/in_cksum.c
1.5 +33 -69 src/sys/i386/include/in_cksum.h
1.5 +2 -0 src/sys/netinet/igmp.c
1.6 +0 -1 src/sys/netinet/in.h
1.6 +2 -0 src/sys/netinet/ip_icmp.c

1.4 +3 -4 src/contrib/ipfilter/ip_compat.h
1.3 +1 -2 src/sbin/natd/icmp.c
1.4 +0 -1 src/sbin/natd/natd.c
1.48 +1 -0 src/sys/conf/files
1.2 +0 -1 src/sys/conf/files.amd64
1.13 +0 -1 src/sys/conf/files.i386
1.5 +0 -1 src/sys/conf/files.pc98
1.7 +1 -1 src/sys/contrib/ipfilter/netinet/fil.c
1.10 +2 -3 src/sys/contrib/ipfilter/netinet/ip_compat.h
1.10 +1 -1 src/sys/contrib/ipfilter/netinet/ip_fil.c
1.7 +1 -1 src/sys/dev/netif/txp/if_txp.c
1.7 +1 -1 src/sys/net/ip_mroute/ip_mroute.c
1.7 +1 -2 src/sys/net/ipfw/ip_fw2.c
1.6 +1 -2 src/sys/netinet/igmp.c
1.4 +158 -116 src/sys/netinet/in_cksum.c
1.6 +1 -1 src/sys/netinet/ip_gre.c
1.7 +1 -2 src/sys/netinet/ip_icmp.c
1.10 +1 -1 src/sys/netinet/ip_input.c
1.10 +1 -2 src/sys/netinet/ip_output.c
1.13 +1 -2 src/sys/netinet/tcp_input.c
1.9 +1 -2 src/sys/netinet/tcp_output.c
1.10 +1 -1 src/sys/netinet/tcp_subr.c
1.10 +1 -1 src/sys/netinet/tcp_syncache.c
1.9 +1 -2 src/sys/netinet/udp_usrreq.c

1.5 +1 -2 src/sys/netinet6/ipsec.c
1.5 +1 -2 src/sys/netproto/ipsec/ipsec.c
1.5 +1 -1 src/sys/netproto/ipsec/ipsec_input.c
1.4 +1 -2 src/sys/netproto/ipsec/ipsec_output.c

and finally remove
sys/i386/i386 in_cksum.c
sys/i386/include in_cksum.h
---snip---
- endian.h:
* DTRT in C++ mode
- quad.h:
* we don't use gcc v1 anymore, remove support for it
Suggested by: bde (long ago)
- assym.h:
* avoid zero-length arrays (remove dependency on a gcc specific
feature)
This change changes the contents of the object file, but as it's
only used to generate some values for a header, and the generator
knows how to handle this, there's no impact in the gcc case.
Explained by: bde
Submitted by: Marius Strobl <marius@alchemy.franken.de>
- aicasm.c:
* minor change to teach it about the way icc spells "-nostdinc"
Not approved by: gibbs (no reply to my mail)
- bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by: -arch
Submitted by: netchild


126828 11-Mar-2004 marcel

Remove stale or broken call to kdb_trap() and protected by the non-
option KDB. Besides being wrong, it also interferes with ongoing
work.


126793 10-Mar-2004 alc

- Make the acquisition of Giant in vm_fault_unwire() conditional on the
pmap. For the kernel pmap, Giant is not required. In general, for
other pmaps, Giant is required by i386's pmap_pte() implementation.
Specifically, the use of PMAP2/PADDR2 is synchronized by Giant.
Note: In principle, updates to the kernel pmap's wired count could be
lost without Giant. However, in practice, we never use the kernel
pmap's wired count. This will be resolved when pmap locking appears.
- With the above change, cpu_thread_clean() and uma_large_free() need
not acquire Giant. (The first case is simply the revival of
i386/i386/vm_machdep.c's revision 1.226 by peter.)


126785 09-Mar-2004 jb

Remove duplicate code.

Requested by: bde


126762 09-Mar-2004 jb

Add #ifdef CPU_SOEKRIS in the missing places around the led_* code
that is specific to those boards.

This allows this file to compile again with CPU_ELAN enabled, but not
CPU_SOEKRIS, for a Compulab board.


126761 09-Mar-2004 jb

AMD's ELAN documentation says that you write to the SYS_RST register
in the Memory Mapped Configuration Region (MMCR) to reset the CPU.
If CPU_ELAN is set, try this first to reset the CPU before the
traditional way.

Without this change, my Compulab board powers down on 'reset' instead
of rebooting.


126738 08-Mar-2004 peter

Add back Giant locks around kmem_free() call from user_ldt cleanup path
during exit. Apparently it isn't safe after all. See uma_large_free().

Pointed out by: alc


126736 08-Mar-2004 peter

Other parts of the tree do not protect calls to kmem_free() with Giant,
so remove it from here. The most notable examples include vm_mmap().
This removes one more Giant event from exit(2).


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126653 05-Mar-2004 bde

Include <machine/psl.h> for the definition of PSL_I instead of depending
on namespace pollution in <machine/cpufunc.h>.


126412 29-Feb-2004 maxim

o Typo: Ternal -> Thermal.


126387 28-Feb-2004 phk

Add support for the watchdog in Geode SC1100 which is used in embedded
systems like the Soekris NET4801


126370 28-Feb-2004 phk

Add a generic watchdog facility which through a single device entry
in /dev controls all available watchdog implementations.


126349 28-Feb-2004 phk

Add support for /dev/led/error on Soekris Net4801.


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


126076 21-Feb-2004 phk

Device megapatch 1/6:

Free approx 86 major numbers with a mostly automatically generated patch.

A number of strategic drivers have been left behind by caution, and a few
because they still (ab)use their major number.


125487 05-Feb-2004 kan

Rename cn_unavailable to cnunavailable for little more consistency.
Garbage collect unused cndebug() function.

Suggested by: bde


125467 05-Feb-2004 kan

Eliminate global cons_unavailable flag and replace it by the status
bit maintained on a per-device basis. Single variable is inadequate
on machines running with multiple consoles enabled.


125405 03-Feb-2004 jhb

Revert the skipping of segment register reloads as it appears to actually
be a pessimization on non Pentium4 CPUs. More importantly, it is buggy as
it can cause GPF's when using APM or vm86.


125375 03-Feb-2004 bde

Removed bogus checks that (PCPU_GET(curpcb) != NULL). Rev.1.586 of
machdep.c fixed the missing early initialization of curpcb, so curpcb
is now always set together with curthread and it cannot be NULL except
before the IDT has been set up (so trap() is unreachable) or after a
memory error. In any case, it was often used without checking.

curcpb shouldn't exist anyway. It doesn't exist for most non-i386 arches.
It just caches curthread->td_pcb in a global. This was a better idea
before it was per-cpu. trap() and some other places can get at it more
efficiently using td->td_pcb instead of PCPU_GET(curpcb). The main
exception is support.s which mostly wants only curpcb->pcb_onfault.


125361 02-Feb-2004 jhb

Set PCPU_GET(curpcb) for the BSP to thread0's pcb. Otherwise, the boot CPU
doesn't have a pcb until after it's first context switch. This can cause
secondary panics if a page fault happens during bootup.


125317 02-Feb-2004 jeff

- Make sure the apic is idle before sending an IPI. This is required on
non-X-APIC machines. Previously this was only done in the
DETECT_DEADLOCK case when really it is needed in all cases.

Reminded by: jhb


125308 01-Feb-2004 alc

Eliminate all TLB shootdowns by pmap_pte_quick(): By temporarily pinning
the thread that calls pmap_pte_quick() and by virtue of the page queues
lock being held, we can manage PADDR1/PMAP1 as a CPU private mapping.

The most common effect of this change is to reduce the overhead of the page
daemon on multiprocessors.

In collaboration with: tegge


125276 31-Jan-2004 shiba

Compiled longrun.c when defined options CPU_ENABLE_LONGRUN,
and fixed wrong comparation in cpu vendor. Longrun function
was re-enabled.


125161 28-Jan-2004 jhb

Optimize the i386 interrupt entry code to not reload the segment registers
if they already contain the correct kernel selectors.

Reviewed by: peter
Suggested by: peter


125133 28-Jan-2004 rwatson

Remove process lock XXX's, fixed in src/sys/sys/proc.h:1.366.


125063 27-Jan-2004 rwatson

Stick two XXX's in the syscall() code: we call STOPEVENT() twice for
every system call, and that grabs and release the process lock each
time. Don't fix it (yet), but document it so we know to fix it.
Also should be a 5.3-RELEASE todo item.


124961 25-Jan-2004 sobomax

Move LongRun support out of identcpu.c, where it hardly belongs, into its
own file and make it opt-in, not mandatory, depending on CPU_ENABLE_LONGRUN
config(8) option.

Discussed with: nate
MFC after: 2 weeks


124956 25-Jan-2004 jeff

- Now that both schedulers support temporary cpu pinning use this rather
than the switchin functions to guarantee that we're operating with the
correct tlb entry.
- Remove the post copy/zero tlb invalidations. It is faster to invalidate
an entry that is known to exist and so it is faster to invalidate after
use. However, some architectures implement speculative page table
prefetching so we can not be guaranteed that the invalidated entry is still
invalid when we re-enter any of these functions. As a result of this we
must always invalidate before use to be safe.


124945 25-Jan-2004 jeff

- Don't define DETECT_DEADLOCK. I don't know that this code has detected
a deadlock in several years. Furthermore, the IPI code is currently
protected by a seperate spinlock. This only served to make IPIs twice as
expensive as they had to be which severely slowed down the IPI heavy ULE
scheduler.


124930 24-Jan-2004 sobomax

- Move performance-controlling sysctls into hw.p4tcc.* tree;

Suggested by: nate

- get rid of "magick" values in code and make sysctl's reflecting reality
on processor versions which have one or another frequency "forbidden"
due to errata.

MFC after: 2 weeks


124925 24-Jan-2004 jeff

- Move smp_topology to subr_smp.c so that it is defined on all architectures.


124732 19-Jan-2004 phk

Add linenumber and source filename to panic(9) output.

Ideally a traceback should be printed too, any takers ?


124684 18-Jan-2004 sobomax

Add new CPU_ENABLE_TCC option, from NOTES:

CPU_ENABLE_TCC enables Thermal Control Circuitry (TCC) found in some
Pentium(tm) 4 and (possibly) later CPUs. When enabled and detected,
TCC allows to restrict power consumption by using machdep.cpuperf*
sysctls. This operates independently of SpeedStep and is useful on
systems where other mechanisms such as apm(4) or acpi(4) don't work.

Given the fact that many, even modern, notebooks don't work properly
with Intel ACPI, this is indeed very useful option for notebook owners.

Obtained from: OpenBSD
MFC after: 2 weeks


124360 11-Jan-2004 alc

Include "opt_cpu.h" and related #ifdef's for SSE so that pagezero()
actually includes the call to sse2_pagezero().


124144 05-Jan-2004 phk

Add struct definition of the Elan MMCR registers (from jb@)

Put a CTASSERT() on the size of the struct.

Use the struct where it is easy to do so in elan_mmcr.c

Add the Elan specific hardware reset code (also from jb@).


124092 03-Jan-2004 davidxu

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


123955 29-Dec-2003 bde

Sorted includes.


123945 28-Dec-2003 alc

Don't bother clearing PG_ZERO on the page table page in _pmap_allocpte();
it serves no purpose.


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123925 28-Dec-2003 alc

Don't bother clearing and setting PG_BUSY on page table directory pages.


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123710 22-Dec-2003 alc

- Significantly reduce the number of preallocated pv entries in
pmap_init(). Such a large preallocation is unnecessary and wastes
nearly eight megabytes of kernel virtual address space per gigabyte
of managed physical memory.
- Increase UMA_BOOT_PAGES by two. This enables the removal of
pmap_pv_allocf(). (Note: this function was only used during
initialization, specifically, after pmap_init() but before
pmap_init2(). During pmap_init2(), a new allocator is installed.)


123650 18-Dec-2003 jhb

MFamd64: Remove i386_protection_init() and the protection_codes[] array
and replace them with a simple if test to turn on PG_RW. i386 != vax.


123432 11-Dec-2003 jeff

- Call mp_topology() after all CPUs have been probed.


123431 11-Dec-2003 jeff

- Add the mp_topology() function to mp_machdep.c. This function builds up
the smp_topology structure to reflect the layout of HTT enabled machines.
- Add a prototype for mp_topology() in smp.h


123403 10-Dec-2003 jhb

Use NAPICID for the maximum number of local APICs rather than MAXCPU when
doing the HTT fixup. This is a step closer to possibly having an apic.ko
module someday.


123402 10-Dec-2003 jhb

Correct usage of MAXCPU. The MAXCPU value itself is not a valid CPU ID
value as it is a count of maximum values.

Reported by: bde


123266 07-Dec-2003 alc

Don't remove the virtual-to-physical mapping when an sf_buf is freed.
Instead, allow the mapping to persist, but add the sf_buf to a free list.
If a later sendfile(2) or zero-copy send resends the same physical page,
perhaps with the same or different contents, then the mapping overhead is
avoided and the sf_buf is simply removed from the free list.

In other words, the i386 sf_buf implementation now behaves as a cache of
virtual-to-physical translations using an LRU replacement policy on
inactive sf_bufs. This is similar in concept to a part of
http://www.cs.princeton.edu/~yruan/debox/ patch, but much simpler in
implementation. Note: none of this is required on alpha, amd64, or ia64.
They now use their direct virtual-to-physical mapping to avoid any
emphemeral mapping overheads in their sf_buf implementations.


123135 03-Dec-2003 jhb

- Remove the hack to prevent the acpi module from loading.
- Add a really, really, nasty hack to provide stub versions of all of
the 'device apic' functions used by the ACPI MADT APIC enumerator if
'device apic' is not compiled into the kernel. This is gross but is
the best we can do with the current kernel linker implementation.

Approved by: re (scottl / blanket)


123133 03-Dec-2003 jhb

- Reorder the APIC enumerator SYSINIT's to register enumeators at
SI_SUB_CPU - 1 and probe enumerators, probe CPUs, and setup the local
APIC programming all at SI_SUB_CPU / SI_ORDER_FIRST. This is needed to
help get the ACPI module working again as it moves the APIC enumeration
code after SI_SUB_KLD.
- In the MADT parser, use mp_maxid rather than MAXCPU to terminate a loop
when assigning per-cpu ACPI IDs to avoid a dependency on 'options SMP'.
- Allow the apic device to be disabled via 'hint.apic.0.disabled' from the
loader. Note that since this is done in the local APIC code, it works
for both the ACPI and non-ACPI cases.

Approved by: re (scott / blanket)


123126 03-Dec-2003 jhb

Fix all users of mp_maxid to use the same semantics, namely:

1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by: re (scottl)
Tested on: i386, amd64, alpha


123106 02-Dec-2003 bde

Fixed panics in apic interrupt handlers if kernel profiling is turned
on. MCOUNT and FAKE_MCOUNT() may clobber all the call-used registers,
and one FAKE_MCOUNT() was placed so that an active %eax was clobbered.
The fix is to move this FAKE_MCOUNT() earlier where it should have
been anyway.

Fixed 3 layers of bitrot in the comment about why this FAKE_MCOUNT()
was where it was by removing the comment. (mcount() should be called
as early as possible after entering a new level, but an implementation
detail got in the way until 3 layers of changes ago.)

Kernel profiling still gives wrong results because the new interrupt
code rearranged object files too much. mcount() depends on trap,
syscall and interrupt handlers being between certain magic labels with
interrupt handlers last, and on nothing else being there. Splitting
up exception.o moved the magic labels to effectively random places
relative to what they are supposed to delimit. This mainly broke the
call graph; the flat profile is still usable.


123015 27-Nov-2003 phk

Refactor AMD Elan 520 CPU support.

Make it possible to configure GPIO pins as led(4) devices, PPS inputs
and PPS-echo outputs with a sysctl. Led(4) and PPS-echo can be configured
for active-high or active-low.

Be more complete in initialization of timecounter hardware.

Approved by: re@


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122931 20-Nov-2003 peter

MFamd64: use a less compiler-intensive MD implementation of 'curthread'
so that the compiler doesn't have to do so much work.

Approved by: re (jhb)


122860 17-Nov-2003 alc

- Change the i386's sf_buf implementation so that it never allocates
more than one sf_buf for one vm_page. To accomplish this, we add
a global hash table mapping vm_pages to sf_bufs and a reference
count to each sf_buf. (This is similar to the patches for RELENG_4
at http://www.cs.princeton.edu/~yruan/debox/.)

For the uninitiated, an sf_buf is nothing more than a kernel virtual
address that is used for temporary virtual-to-physical mappings by
sendfile(2) and zero-copy sockets. As such, there is no reason for
one vm_page to have several sf_bufs mapping it. In fact, using more
than one sf_buf for a single vm_page increases the likelihood that
sendfile(2) blocks, hurting throughput.
(See http://www.cs.princeton.edu/~yruan/debox/.)


122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.


122771 16-Nov-2003 bde

Localized the cy driver's locking.


122713 14-Nov-2003 peter

Minor source sync with amd64. Use int as the type for the width
field of %.*s rather than size_t.


122700 14-Nov-2003 jhb

If an interrupt source doesn't have an ithread, treat it as a stray
interrupt. This can only happen if an unregistered interrupt source
triggers an interrupt.


122697 14-Nov-2003 peter

basemem is in K, not bytes. I think I tricked jhb into making the same
mistake I did and then committing it to cvs.


122690 14-Nov-2003 jhb

Shuffle the APIC interrupt vectors around a bit:
- Move the IPI and local APIC interrupt vectors up into the 0xf0 - 0xff
range. The pmap lazyfix IPI was reordered down next to the TLB
shootdowns to avoid conflicting with the spurious interrupt vector.
- Move the base of APIC interrupts up 16 so that the first 16 APIC
interrupts do not overlap the vectors used by the ATPIC.
- Remove bogus interrupt vector reservations for LINT[01].
- Now that 0xc0 - 0xef are available, use them for device interrupts.
This increases the number of APIC device interrupts to 191.
- Increase the system-wide number of global interrupts to 191 to catch up
to more APIC interrupts.

Requested by: peter (2)


122688 14-Nov-2003 jhb

Fix a typo. We need opt_acpi.h not opt_apic.h for DEV_ACPI.


122620 13-Nov-2003 jhb

Whitespace.


122573 12-Nov-2003 jhb

Garbage collect unused values.


122572 12-Nov-2003 jhb

- Move manipulation of td_intr_nesting_level out of assembly interrupt
vector stubs and into the C functions they call.
- Move disabling and EOIing of interrupt sources out of PIC driver entry
points and into intr_execute_handlers(). Intr_execute_handlers() only
disables a source for an interrupt if it is a stray interrupt or has
threaded handlers. Sources with fast handlers no longer disable (mask)
the source while executing the handlers.
- Move the setting of clkintr_pending into intr_execute_handlers() and set
the variable for any interrupt source with a vector of 0. (Should only
be true for IRQ 0.) This fixes clkintr_pending in the NO_MIXED_MODE
case.
- Implement lapic_eoi() and use it to implement ioapic_eoi_source().
- Rename atpic_sched_ithd() to atpic_handle_intr() since it is used to
handle all atpic interrupts and not just threaded ones.

Inspired by: peter's changes to amd64 in p4 (1)
Requested by: bde (2)


122511 11-Nov-2003 jhb

Don't probe busses in the MP Table for the MP Table PCI bridge drivers
if the bus number doesn't correspond to a PCI bus in the MP Table.

Reported by: jhay


122491 11-Nov-2003 jhb

Enable HTT CPUs by default instead of halting them by default. Users
should now only have HTT CPUs if they have explicitly asked for them
either by enabling HyperThreading in the BIOS or by using the
MPTABLE_FORCE_HTT kernel option.


122490 11-Nov-2003 jhb

Disable probing of HTT CPUs by default for the MP Table case. HTT CPUs
should only be used if they are enabled in the BIOS. Now that we support
enumerating CPUs using the ACPI MADT, any HTT machine using ACPI should
respect the BIOS setting. For HTT machines with ACPI disabled in the
kernel, the MPTABLE_FORCE_HTT kernel option can be used to try to probe HTT
CPUs like have done in the past for the MP Table case. This option should
only be enabled if HTT is enabled in the BIOS.


122438 10-Nov-2003 jhb

MFamd64 (via P4, not in CVS yet):
- Use the static boot_address variable directly rather than passing it
around to several functions.
- Clean up a couple of magic numbers.


122434 10-Nov-2003 jhb

Bump APIC ID limits up to 32 since a machine with 16 CPUs will have APIC
IDs for the I/O APICs that are greater than 16.

Reported by: John Cagle <john.cagle@hp.com>


122425 10-Nov-2003 jhb

Update a comment.

Requested by: bde


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().


122284 08-Nov-2003 alc

- Similar to post-PAE RELENG_4 split pmap_pte_quick() into two cases,
pmap_pte() and pmap_pte_quick(). The distinction being based upon the
locks that are held by the caller. When the given pmap is not the
current pmap, pmap_pte() should be used when Giant is held and
pmap_pte_quick() should be used when the vm page queues lock is held.
- When assigning to PMAP1 or PMAP2, include PG_A anf PG_M.
- Reenable the inlining of pmap_is_current().

In collaboration with: tegge


122268 07-Nov-2003 jhb

Dump the trigger and polarity of each intpin's default setting in the
bootverbose output.


122156 06-Nov-2003 peter

OK, this might be a bit silly, but add another popcnt() candidate.


122150 05-Nov-2003 jhb

Instead of marking all 159 interrupts as available in the IRQ resource
manager, only add interrupts that have an associated source in the
interrupt table to the resource manager.


122149 05-Nov-2003 jhb

When remapping an ISA interrupt from one intpin to another, disable the
pin that is used by the default identity mapping if it still maps to the
old vector. The ACPI case might need some tweaking for the SCI interrupt
case since ACPI likes to address the intpin using both the IRQ remapped to
it as well as the previous existing PCI IRQ mapped to it.

Reported by: kan


122148 05-Nov-2003 jhb

Two style nits.


122124 05-Nov-2003 jhb

- Adjust some of the bitfields in the ioapic_intsrc struct to be unsigned
rather than signed. This fixes some cosmetics such as verbose printf's
for IRQs greater than 127.
- The calculation for next_ioapic_base was also adjusted so that it will
only complain once for each hole in the IRQs provided by ACPI for IO
APICs.

Reported by: Michal Mertl <mime@traveller.cz>


122123 05-Nov-2003 jhb

Add a workaround for MP Tables that list the same PCI IRQ twice with
the same APIC / pin destination in both cases.

Reported by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>


122064 04-Nov-2003 jhb

Tweak the version string output for ioapic devices.


122010 03-Nov-2003 jhb

Remove remaining bits of old interrupt and APIC code.


122000 03-Nov-2003 jhb

Update includes for new interrupt code.


121998 03-Nov-2003 jhb

- Always allocate the maximum size for the IRQ resource manager. Ideally
we would manage this better by having the interrupt code add each
interrupt vector to the resource map when each source is registered.
- Use the new interrupt code API for registering and tearing down interrupt
handlers.


121997 03-Nov-2003 jhb

Catch up to i386 interrupt and SMP code changes.


121996 03-Nov-2003 jhb

New i386 SMP code:

- The MP code no longer knows anything specific about an MP Table.
Instead, the local APIC code adds CPUs via the cpu_add() function when
a local APIC is enumerated by an APIC enumerator.
- Don't divide the argument to mp_bootaddress() by 1024 just so that we
can turn around and mulitply it by 1024 again.
- We no longer panic if SMP is enabled but we are booted on a UP machine.
- init_secondary(), the asm code between init_secondary() and ap_init()
in mpboot.s and ap_init() have all been merged together in C into
init_secondary().
- We now use the cpuid feature bits to determine if we should enable
PSE, PGE, or VME on each AP.
- Due to the change in the implementation of critical sections, acquire
the SMP TLB mutex around a slightly larger chunk of code for TLB
shootdowns.
- Remove some of the debug code from the original SMP implementation
that is no longer used or no longer applies to the new APIC code.
- Use a temporary hack to disable the ACPI module until the SMP code has
been further reorganized to allow ACPI to work as a module again.
- Add a DDB command to dump the interesting contents of the IDT.


121995 03-Nov-2003 jhb

Don't probe PnP BIOS devices for PICs for now to avoid problems with those
devices claiming resources that they don't actually use. The PIC drivers
only register valid interrupt sources, so we don't need to rely on these
drivers to claim invalid IRQs to prevent their use by other drivers.


121994 03-Nov-2003 jhb

- Remove explicit enabling of the BSP's APIC in the APIC_IO case and the
slave pin on the master PIC in the !APIC_IO case. The PIC drivers now
manage these details internally.
- Remove an spl0() that hasn't done anything since SMPng was first
committed.
- Update some comments that have rotted since SMPng.


121991 03-Nov-2003 jhb

Add the MP Table APIC enumerator. This code uses the BIOS MP Table to
enumerate I/O APICs as well as local APICs. It also provides Host-PCI
and PCI-PCI bridge drivers to use the MP Table to route PCI interrupts.


121990 03-Nov-2003 jhb

- Export doreti as a global symbol.
- Don't include isa/vector.s. Each PIC driver's entry points now live in
their own standalone files.


121989 03-Nov-2003 jhb

Update names of entry points for interrupt frames.


121986 03-Nov-2003 jhb

New APIC support code:

- The apic interrupt entry points have been rewritten so that each entry
point can serve 32 different vectors. When the entry is executed, it
uses one of the 32-bit ISR registers to determine which vector in its
assigned range was triggered. Thus, the apic code can support 159
different interrupt vectors with only 5 entry points.
- We now always to disable the local APIC to work around an errata in
certain PPros and then re-enable it again if we decide to use the APICs
to route interrupts.
- We no longer map IO APICs or local APICs using special page table
entries. Instead, we just use pmap_mapdev(). We also no longer
export the virtual address of the local APIC as a global symbol to
the rest of the system, but only in local_apic.c. To aid this, the
APIC ID of each CPU is exported as a per-CPU variable.
- Interrupt sources are provided for each intpin on each IO APIC.
Currently, each source is given a unique interrupt vector meaning that
PCI interrupts are not shared on most machines with an I/O APIC.
That mapping for interrupt sources to interrupt vectors is up to the
APIC enumerator driver however.
- We no longer probe to see if we need to use mixed mode to route IRQ 0,
instead we always use mixed mode to route IRQ 0 for now. This can be
disabled via the 'NO_MIXED_MODE' kernel option.
- The npx(4) driver now always probes to see if a built-in FPU is present
since this test can now be performed with the new APIC code. However,
an SMP kernel will panic if there is more than one CPU and a built-in
FPU is not found.
- PCI interrupts are now properly routed when using APICs to route
interrupts, so remove the hack to psuedo-route interrupts when the
intpin register was read.
- The apic.h header was moved to apicreg.h and a new apicvar.h header
that declares the APIs used by the new APIC code was added.


121983 03-Nov-2003 jhb

Allocate space for the intrcnt array. This array is managed in the
interrupt code layer as interrupt sources are added and handlers added
to those sources.


121982 03-Nov-2003 jhb

New device interrupt code. This defines an interrupt source abstraction
that provides methods via a PIC driver to do things like mask a source,
unmask a source, enable it when the first interrupt handler is added, etc.
The interrupt code provides a table of interrupt sources indexed by IRQ
numbers, or vectors. These vectors are what new-bus uses for its IRQ
resources and for bus_setup_intr()/bus_teardown_intr(). The interrupt
code then maps that vector a given interrupt source object. When an
interrupt comes in, the low-level interrupt code looks up the interrupt
source for the source that triggered the interrupt and hands it off to
this code to execute the appropriate handlers.

By having an interrupt source abstraction, this allows us to have different
types of interrupt source providers within the shared IRQ address space.
For example, IRQ 0 may map to pin 0 of the master 8259A PIC, IRQs 1
through 60 may map to pins on various I/O APICs, and IRQs 120 through
128 may map to MSI interrupts for various PCI devices.


121977 03-Nov-2003 jhb

Revert the critical section implementation to disable interrupts via
cli/sti now that we support many more than 32 interrupt sources.


121955 03-Nov-2003 obrien

Add AMD Features NX and LM.


121944 03-Nov-2003 phk

Change /dev/soekris-errled to be /dev/led/error and make it conditional
on CPU_SOEKRIS.

Note the subtle change in semantfics for 'f%d' flash instruction and the
new morse facility (see details in dev/led/led.c)


121942 03-Nov-2003 phk

Free major#100


121823 31-Oct-2003 jhb

For physical address regions between 0 and KERNLOAD, allow pmap_mapdev()
to use the direct mapped KVA at KERNBASE to service the request. This also
allows pmap_mapdev() to be used for such addresses very early during the
boot process and might provide some small savings on KVA.

Reviewed by: peter


121803 31-Oct-2003 jhb

- Finish externing of r_idt in the f00f hack code.
- Miscellaneous style fixes in the f00f hack code and some nearby code.

Submitted by: bde


121758 30-Oct-2003 peter

Change the pmap_invalidate_xxx() functions so they test against
pmap == kernel_pmap rather than pmap->pm_active == -1. gcc's inliner
can remove more code that way. Only kernel_pmap has a pm_active of -1.


121755 30-Oct-2003 jhb

Include "opt_pmap.h" so that the DISABLE_P* options are honored.


121754 30-Oct-2003 jhb

Always export r_gdt and r_idt and give them extern declarations in
machine/segments.h.


121721 30-Oct-2003 davidxu

Try to fetch thread mailbox address in page fault trap, so when thread
blocks in page fault hanlder, and upcall thread can be scheduled. It is
useful if process is doing lots of mmap based I/O.


121621 27-Oct-2003 jhb

Fix pmap_unmapdev() to call pmap_kremove() instead of implementing it
directly so that it more closely mirrors pmap_mapdev() which calls
pmap_kenter().


121512 25-Oct-2003 peter

For the SMP case, flush the TLB at the beginning of the page zero/copy
routines. Otherwise we run into trouble with speculative tlb preloads
on SMP systems. This effectively defeats Jeff's revision 1.438
optimization (for his pentium4-M laptop) in the SMP case. It breaks
other systems, particularly athlon-MP's.


121494 25-Oct-2003 peter

GC workaround code for detecting pentium4's and disabling PSE and PG_G.
It's been ifdef'ed out for ages.


121481 24-Oct-2003 jhb

A few whitespace and comment tweaks.


121480 24-Oct-2003 jhb

- Fail to probe if acpi0 probed ok as this driver basically tries to probe
the ACPI timer and we shouldn't do that if ACPI is already around to do
that for us.
- Set a description and tweak the order of checks in the probe function
to more closely match other PCI drivers.

This should probably be moved to sys/dev/piix/piix.c at some point and
turned on for all i386 kernels rather than just SMP ones.


121307 21-Oct-2003 silby

Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.


121235 19-Oct-2003 davidxu

Use npxdrop in cpu_thread_exit to save some cycles.
Clear FPU pcb flags for new upcall thread, these flags needn't
be inherited, the new thread should start from clean FPU status.


121228 18-Oct-2003 njl

Add the cpu_idle_hook() function pointer so that other idlers can be
hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This
prepares the way for further ACPI sleep states.


121133 16-Oct-2003 bde

Don't forget to load %es with the kernel data segment selector in
Xcpustop(). %es is used in at least the call to savectx() when savectx()
calls bcopy(), so not loading it was fatal if a stop IPI interrupts
user mode.

This reduces bugs starting and stopping CPUs for debuggers. CPUs are
stopped mainly in kdb_trap() and cpu_reset(). At reset time there is
a good chance that all the CPUs are in the kernel, so the bug was
probably harmless then.


121102 15-Oct-2003 peter

Get some more data if we hit the pmap_enter() thing.


121087 14-Oct-2003 peter

Fix just about as many bugs in my last commit here as there were lines that
I changed. That is never a good sign.
1) only map 1 page at address zero, not 4096 pages
2) page 1 starts at address 4096 (PAGE_SIZE) not 4095 (PAGE_MASK). I
don't even want to think what the pte's looked like.
3) subtract the r/o page group start address from the end before
converting it to a count. Otherwise an extra page is mapped.

If you were affected by this, the symptoms of this was a hang at boot
after the spinner. Sorry folks. :-(

"You broke my laptop!" by: sam


121055 13-Oct-2003 alc

- Modify pmap_is_current() to return FALSE when a pmap's page table is in
use because a kernel thread is borrowing it. The borrowed page table
can change spontaneously, making any dependence on its continued use
subject to a race condition.
- _pmap_unwire_pte_hold() cannot use pmap_is_current(): If a change is
made to a page table page mapping for a borrowed page table, the TLB
must be updated.

In collaboration with: tegge


121024 12-Oct-2003 phk

Initialize CMAP3 to 0


120992 10-Oct-2003 peter

Set page zero read/write right from the start rather than trying to
change it later on.


120975 10-Oct-2003 peter

Move the pmap_kenter(KERNBASE, 0) a bit earlier so that it works for
the hasbrokenint12 tunable case too. (with some related and unrelated
style fixes)

Submitted by: bde


120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


120772 05-Oct-2003 alc

Don't bother setting a page table page's valid field. It is unused and
not setting it is consistent with other uses of VM_ALLOC_NOOBJ pages.


120767 04-Oct-2003 peter

Fix the apm problem for real. We leave the first 4K page for the bios to
work in, but we had it mapped read-only. While this has always been the
case, the PG_PS enable hack hid it and the apm bios code ended up taking
advantage of it.


120734 04-Oct-2003 jeff

- The proper test is CPU_ENABLE_SSE and not CPU_ENABLED_SSE. This
effectively disabled the sse2_pagezero() code.

Spotted by: bde


120728 04-Oct-2003 peter

Emulate bugs in the old PSE code so that apm works again.

I do not yet understand why, but apm *depended* on the fact that the old
PSE code caused the first 1MB of ram to be mapped read/write because it
was in the same 4MB page as the kernel text+data+bss blob.

If anybody ever tried DISABLE_PSE before, apm would not work.

If your cpu did not have PSE, apm would not work there either (eg: 486).

This bug has been around for a Very Long Time.

The Pentium-4-fix commits did not emulate this unintended side effect of
the PSE post-early-boot fixup, and thus apm blew up. I've added a hack to
emulate the bug until either apm is fixed or we set fire to our bridges.

This is bad though because it gives kernel mode code the opportunity
to accidently write to the first few megs of the general page pool
which is remapped at KERNBASE. It needs to be fixed properly.


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120690 03-Oct-2003 peter

Add #include "opt_pmap.h" so locore picks up DISABLE_PSE etc options.


120654 01-Oct-2003 peter

Commit Bosko's patch to clean up the PSE/PG_G initialization to and
avoid problems with some Pentium 4 cpus and some older PPro/Pentium2
cpus. There are several problems, some documented in Intel errata.
This patch:
1) moves the kernel to the second page in the PSE case. There is an
errata that says that you Must Not point a 4MB page at physical
address zero on older cpus. We avoided bugs here due to sheer luck.
2) sets up PSE page tables right from the start in locore, rather than
trying to switch from 4K to 4M (or 2M) pages part way through the boot
sequence at the same time that we're messing with PG_G.

For some reason, the pmap work over the last 18 months seems to tickle
the problems, and the PAE infrastructure changes disturb the cpu
bugs even more.

A couple of people have reported a problem with APM bios calls during
boot. I'll work with people to get this resolved.

Obtained from: bmilekic


120627 01-Oct-2003 jeff

- Add a memory barrier before the sse2_pagezero() function returns. This
code uses write combining which must be committed to memory prior to
other uses of this page.

Spotted by: alc


120623 01-Oct-2003 jeff

- Hide more #ifdef logic in a new invlcaddr inline. This function flushes
the full tlb if you're on an I386or does an invlpg otherwise.

Glanced at by: peter


120621 01-Oct-2003 jeff

- Define an inline pagezero() to select the appropriate full-page zeroing
function from one of bzero, i686_pagezero, or sse2_pagezero.
- Use pagezero() in the three pmap functions that need to zero full pages.


120620 01-Oct-2003 jeff

- Add ss2_pagezero() for zeroing pages using the movnti instruction. This
instruction is enabled with SSE2 but does not use SSE registers. It is a
"non-temporal" move which bypasses the cache and does not dirty lines.


120613 01-Oct-2003 jeff

- Correct a problem with the last commit. The CMAP ptes need to be zeroed
prior to invalidating the TLB to be certain that the processor doesn't
keep a cached copy.

Discussed with: pete
Paniced: tegge
Pointy Hat: The usual spot


120602 30-Sep-2003 jeff

- On my Pentium4-M laptop, invalpg takes ~1100 cycles if the page is found in
the TLB and ~1600 if it is not. Therefore, it is more effecient to
invalidate the TLB after operations that use CMAP rather than before.
- So that the tlb is invalidated prior to switching off of a processor, we
must change the switchin functions to switchout functions.
- Remove td_switchout from the thread and move it to the x86 pcb.
- Move the code that calls switchout into swtch.s. These changes make this
optimization truely x86 specific.


120594 30-Sep-2003 jeff

- Correct a typo in a comment.


120506 27-Sep-2003 phk

The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.


120499 27-Sep-2003 alc

Addendum to the previous revision: If vm_page_alloc() for the page
table page fails, perform a VM_WAIT; update some comments in
_pmap_allocpte().


120424 25-Sep-2003 alc

- Eliminate the pte object.
- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
KVA space for the page directory page(s). Submitted by: tegge


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120322 21-Sep-2003 alc

Allocate the page table directory page(s) as "no object" pages. (This
leaves one explicit use of the pte object.)


120307 20-Sep-2003 alc

Reimplement pmap_release() such that it uses the page table rather than the
pte object to locate the page table directory pages. (This is another step
toward the elimination of the pte object.)


120187 18-Sep-2003 bde

Don't forget to reenable interrupts after a breakpoint and trace traps from
user mode. This goes with rev.1.468 of machdep.c which changed the gates
for these traps to interrupt gates. Having the interrupts disabled for
these traps from user mode is just an unwanted side effect.

This fixes at least 1 case of "panic: absolutely cannot call
smp_ipi_shootdown with interrupts already disabled". Too much code was
run with interrupts disabled, and it sometimes hit a sanity check.

Fix verified by: deischen


120040 13-Sep-2003 alc

Simplify (and micro-optimize) pmap_unuse_pt(): Only one caller,
pmap_remove_pte(), passed NULL instead of the required page table
page to pmap_unuse_pt(). Compute the necessary page table page
in pmap_remove_pte(). Also, remove some unreachable code from
pmap_remove_pte().


119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


119948 10-Sep-2003 jhb

Whitespace.


119941 10-Sep-2003 jhb

Remove an XXX comment by using the per CPU mask added after this comment
was added.


119937 10-Sep-2003 jhb

Add comments to the members of the timecounter struct similar to other
timecounters.


119935 10-Sep-2003 jhb

Add constants for entries in the IDT and use those instead of magic
numbers.


119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


119847 07-Sep-2003 bde

clock.c:
Quick fix for calling DELAY() for ddb input in some (atkbd-based)
console drivers. ddb must not use any normal locks, but DELAY()
normally calls getit() which needs clock_lock. One problem with using
normal locks in ddb is that deadlock is possible, but deadlock on
clock_lock is unlikely becaluse clock_lock is bogusly recursive,
apparently just to hide the problem of ddb using it. The i8254 clock
hardware has mostly write-only registers so it is important for it to
use a lock that gives exclusive access. (atkbd hardware is also
unfriendly to reentrant software but that problem is more local and
already solved.) I mostly saw the symptoms of the bug caused by
unlocking in getit() running cpu_unpend(). cpu_unpend() should not
be called while in ddb and Debugger() calls for failing assertions
about this caused a breakpoint within ddb.

ddb must also not call getit() because ddb may be being used to step
through clock initialization code that has stopped or otherwise mangled
the clock. If the clock is stopped, then getit() always returns the
same value and DELAY() takes forever if it trusts getit().

The quick fix is implement DELAY(n) as (n * timer_freq / 1000000)
inb(0x84)'s if ddb is active.

machdep.c:
Don't permit recursion on clock_lock.


119844 07-Sep-2003 bde

Moved stop/start code for other CPUs to near the beginning/end of
kdb_trap(). Stopping the other CPUs acts like locking them out, but
it wasn't done early enough or held long enough to prevent concurrent
accesses to shared data. In particular, the saved regs could be
clobbered.


119825 07-Sep-2003 davidxu

Turning on warning for static LDT allocation.


119715 03-Sep-2003 phk

Give the ELAN timecounter better quality than i8254


119608 31-Aug-2003 phk

Detect Geode CPUs and initialize the 27MHz timecounter "Geode".

This timecounter is 2usec faster than the i8254 and has 22 times
better resolution.


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space. On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119452 25-Aug-2003 obrien

Fix copyright comment & FBSDID style nits.

Requested by: bde


119276 22-Aug-2003 alc

Eliminate the last (direct) use of vm_page_lookup() on the pte object.


119139 19-Aug-2003 alc

Eliminate a possible race condition for multithreaded applications in
_pmap_allocpte(): Guarantee that the page table page is zero filled before
adding it to the directory. Otherwise, a 2nd, 3rd, etc. thread could
access a nearby virtual address and use garbage for the address
translation.

Discussed with: peter, tegge


119133 19-Aug-2003 sam

remove #define no longer used


119087 18-Aug-2003 jhb

Add missing header include for MSR macros.

Submitted by: bde


119054 17-Aug-2003 alc

Acquire the pte object's mutex when performing vm_page_grab(). Note: It is
my long-term objective to eliminate the pte object. In the near term, this
does, however, enable the addition of some vm object locking assertions.


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119006 17-Aug-2003 alc

In pmap_copy(), since we have the page table page's physical address
in hand, use PHYS_TO_VM_PAGE() rather than vm_page_lookup().


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118987 16-Aug-2003 phk

Give timecounters a numeric quality field.

A timecounter will be selected when registered if its quality is
not negative and no less than the current timecounters.

Add a sysctl to report all available timecounters and their qualities.

Give the dummy timecounter a solid negative quality of minus a million.

Give the i8254 zero and the ACPI 1000.

The TSC gets 800, unless APM or SMP forces it negative.

Other timecounters default to zero quality and thereby retain current
selection behaviour.


118955 15-Aug-2003 jhb

- Fix a typo in a comment.
- Use macros for MSR register indexes as well as the bitfields in the
APICBASE MSR.


118952 15-Aug-2003 jhb

- Remove redundant <sys/sysctl.h> include.
- Move the <machine/vm86.h> include up to the other <machine/*> includes.


118951 15-Aug-2003 jhb

Adjust the style of the #ifdef SMP in casuptr() so that the #ifdef SMP
just covers the lock prefix to match the existing style in other asm files
in i386.


118950 15-Aug-2003 jhb

- Update location of PCI headers.
- Use macros for PCI config registers instead of magic numbers.
- Small whitespace nits.


118936 15-Aug-2003 imp

Improve the C3 CPU identification. I didn't notice that the CPU id
was masked. However KIMURA Yasuhiro-san noticed my mistake and was
kind enough to provide a better patch in PR 55581. I've merged that
into the routine. Hopefully I've not overlooked anything this time.

MFC After: 5 days


118905 14-Aug-2003 imp

Add many new VIA C3 CPU types now that they appear to be available in
machines (at least in Japan).

Submitted by: Masahiko KIMOTO-san
PR: 55578


118894 14-Aug-2003 alc

Eliminate pmap_page_lookup() and its uses. Instead, use PHYS_TO_VM_PAGE()
to convert the pte's physical address into a vm page.

Reviewed by: peter


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118839 12-Aug-2003 jhb

Fixup comment.


118832 12-Aug-2003 ps

Halted CPU's should not accumulate time.

Reviewed by: jhb


118741 10-Aug-2003 alc

Rename pmap_changebit() to pmap_clear_ptes() and remove the last
parameter. The new name better reflects what the function does and
how it is used. The last parameter was always FALSE.

Note: In theory, gcc would perform constant propagation and dead code
elimination to achieve the same effect as removing the last parameter,
which is always FALSE. In practice, recent versions do not. So, there
is little point in letting unused code pessimize execution.


118573 07-Aug-2003 imp

Remove trailing newlines (from the right branch this time)


118562 06-Aug-2003 alc

Correct a mistake in the previous revision: Reduce the scope of the page
queues lock such that it isn't held around the call to get_pv_entry(),
which calls uma_zalloc(). At the point of the call to get_pv_entry(), the
lock isn't necessary and holding it could lead to recursive acquisition,
which isn't allowed.


118561 06-Aug-2003 alc

Acquire the page queues lock in pmap_insert_entry(). (I used to believe
that the page's busy flag could be relied upon to synchronize access to the
pv list. I don't any longer. See, for example, the call to
pmap_insert_entry() from pmap_copy().)


118550 06-Aug-2003 phk

Dont initialize a TSC timecounter until we know if it is broken or not.


118549 06-Aug-2003 phk

Update to recognize Geode and note that the TSC seems broken.


118451 04-Aug-2003 scottl

In _bus_dmamap_load_buffer(), only count the number of bounce pages needed if
they haven't been counted before. This test was ommitted when bus_dmamap_load()
was merged into this function, and results in the pagesneeded field growing
without bounds when multiple deferrals happen.

Thanks to Paul Saab for beating his head against this for a few hours =-)


118444 04-Aug-2003 jhb

- GC unused cpu_thread_link().
- Move the enabling of interrupts out of assembly and into C a few
instructions later at cpu_critical_fork_exit(). This puts more of the
MD critical section implementation under the MD critical section API
making it easier to test and develop alternative implementations.


118440 04-Aug-2003 julian

Allow foot shooting as Linux emulation needs it.
Also change "Auto mode" to use a "special" value
instead of 0, and define and document it.
I had thought libpthread had already been switched to use auto mode but
it appears that patch hasn't been committed yet.

Discussed with: Davidxu


118360 02-Aug-2003 julian

fix braino in last commit.

Beaten with clue-stick by: Davidxu


118345 02-Aug-2003 julian

Relax the check for bad LDTE allocations. It turns out that
there is code that blindly allocates LDTEs starting at slot 6
and I quess it doesn't really matter to us if they overwrite the BSDI
syscall slot, since it isn't a BSDI binary. Also add some code to help track
down other such users (commented out for now).

Reviewed by: deischen@


118344 02-Aug-2003 alc

- Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in
pmap_mapdev(). See revision 1.140 of kern/sys_pipe.c for a detailed
rationale. Submitted by: tegge
- Remove GIANT_REQUIRED from pmap_mapdev().


118253 31-Jul-2003 julian

Have a go at unbreaking the tinderbox by fixing a debug printf.
The other option would be to remove it, but I can imagine it may be useful
for the forseeable future as we fiddle with segments in KSE and thr libraries,


118246 31-Jul-2003 scottl

Allocate the S/G list in the tag, not on the stack. The enforces the rule
that while many maps can exist and be loaded per tag, bus_dmamap_load() and
friends can only be called on one map at a time from the tag. This is
enforced via the mutex arguments in the tag.

Fixing this bug means that s/g lists can be arbitrarily long in length, and
also removes an ugly GNU-ism from the code. No API or ABI change is
incurred. Similar changes for other platforms is forthcoming.


118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


118242 31-Jul-2003 davidxu

Enhance i386_set_ldt to allow application to dynamic allocate
or free a LDT entry. The function has following prototype:
int i386_set_ldt(int start_sel, union descriptor *descs, int num_sels);

Added Features:
o If start_sel is 0, num_sels is 1 and the descriptor pointed to by descs
is legal, then i386_set_ldt() will allocate a descriptor and return its
selector numbe

o If num_descs is 1, start_sels is valid, and descs is NULL, then
i386_set_ldt() will free that descriptor (making it available to be real-
located again later).

o If num_descs is 0, start_sels is 0 and descs is NULL then, as a special
case, i386_set_ldt() will free all descriptors.

Reviewed by: julian


118235 31-Jul-2003 peter

Cosmetic: fix disorder of opt_kstack_pages.h include.


118226 30-Jul-2003 bde

Fixed style bugs in rev.1.94 before MFCing it (for large C asm statements,
use "\n\" instead of "\" at the end of each source line, and don't use
semicolons). Fixed some older style bugs on the same lines (mainly
English errors in comments).


118186 29-Jul-2003 bde

Restored clearing of the bss, except for putting it in a correct place
with up to date comments. This fixes booting kernels with boot2
(except for loss of the features provided by loader) and is suitable
for MFC. Contrary to the old comments, most loaders don't clear the bss.
biosboot lost clearing of the bss in a code crunch in 1997, and boot2
never did it.

kan didn't notice the problem with gcc-3.3 putting variables that are
initialized to 0 in the bss until after committing gcc-3.3 because he
was already using essentially this patch. Before gcc-3.3, only the
non-critical `bootdev' variable was clobbered by clearing the bss.

MFC after: 3 days


118154 29-Jul-2003 bde

Don't hide the name of tmpstk, since there is no need to do so and the
HIDENAME() macro seems to be unimplementable in C. (HIDENAME() used
to use invalid token pasting using ## for the STDC case until gcc
started rejecting that; now it uses unportable token pasting using
juxtaposition in all cases.) This reduces use of HIDENAME() in the
kernel to only i386 and amd64 profiling code so that it doesn't bite
most kernels whenever gcc becomes stricter. Problems with HIDENAME()
in userland are smaller because userland mostly doesn't use strict
flags yet. There are some advantages to hiding the name of mcount,
but newer arches shouldn't do it; only amd64 does.

MFC after: 3 days

On second thoughts hide tmpstk better by staticizing it.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


117929 23-Jul-2003 alc

Annotate pmap_changebit() as __always_inline. This function was
written as a template that when inlined is specialized for the caller
through constant value propagation and dead code elimination. Thus,
the specialized code that is generated for pmap_clear_reference() et
al. avoids several conditional branches inside of a loop.


117928 23-Jul-2003 jhb

Use macros from apic.h to when writing to the ICR to send IPIs to startup
APs rather than magic numbers.

Tested by: scottl


117927 23-Jul-2003 jhb

Add a new macro APIC_ICRLO_RESV_MASK that contains all of the reserved
fields in the low 32 bits of the local APIC ICR register. Use this macro
in place of APIC_RESV2_MASK when masking off existing bits from the ICR
when writing to it to send an IPI.

Tested by: scottl


117870 22-Jul-2003 peter

Initiate de-orbit burn for fpu-less operation. 386+387 is still
theoretically supportable, but you'd really be happier with FreeBSD 2.1.8
on it.


117691 17-Jul-2003 scottl

Now that the dust has settled, make dflt_lock() always panic.


117600 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo.

Suggested by: jhb


117459 11-Jul-2003 peter

Fix the gcc-3.3 boot problem. Gcc now optimizes 'int foo = 0' by moving
it to the bss section and skips the initialization. This causes all
sorts of havoc because the bogus bss zero code clobbered previously set
variables. All our supported boot loaders already zero the bss, even
kgzip for the elf case. Since we dont generate a.out kernels, the old
a.out bootblocks and the a.out kgzip are not a factor anymore.


117385 10-Jul-2003 markm

Protect lint(1) from a #error.


117372 10-Jul-2003 peter

unifdef -DLAZY_SWITCH and start to tidy up the associated glue.


117340 08-Jul-2003 alc

In pmap_object_init_pt(), the pmap_invalidate_all() should be performed on
the caller-provided pmap, not the kernel_pmap. Using the kernel_pmap
results in an unnecessary IPI for TLB shootdown on SMPs.

Reviewed by: jake, peter


117321 08-Jul-2003 davidxu

Fix LOR between scheduler lock and SMP rendezvous lock.

Reviewed by: jhb


117285 06-Jul-2003 bsd

Woops - don't forget to declare locals needed for that last commit.


117284 06-Jul-2003 bsd

Don't unconditionally reset the hardware debug registers in cpu_exit(),
reset them only if they were previously in use. Unconditionally
resetting the registers wipes them out frequently, which interferes
with their use for kernel debugging.

While I'm here, be less verbose in the associated comment of a
neighboring function.

Noticed by: bde


117242 04-Jul-2003 alc

Add vm object locking to pmap_prefault().


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117136 01-Jul-2003 mux

Sync more things with other backends.


117129 01-Jul-2003 mux

Honor the boundary of the busdma tag when allocating bounce pages.
This was fixed in revision 1.5 of alpha/alpha/busdma_machdep.c and
was never fixed in other busdma backends using bounce pages.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


117006 28-Jun-2003 jeff

- Construct a cpu topology map for Hyper Threading systems so that ULE may
take advantage of them.


116958 28-Jun-2003 davidxu

Add a machine depended function thread_siginfo, SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian


116930 27-Jun-2003 peter

Tidy up leftover lazy_switch instrumentation that is no longer needed.
This cleans up some #ifdef hell.


116929 27-Jun-2003 peter

*groan*. I can't win today. Fix manual transcription error so that the
PAE ifdef is correct.

Pointy hat assigned by: kan


116928 27-Jun-2003 peter

Make LAZY_SWITCH work with PAE


116926 27-Jun-2003 peter

Fix the false IPIs on smp when using LAZY_SWITCH caused by pmap_activate()
not releasing the pm_active bit in the old pmap.


116907 27-Jun-2003 scottl

Do the first and mostly mechanical step of adding mutex support to the
bus_dma async callback scheme. Note that sparc64 does not seem to do
async callbacks. Note that ia64 callbacks might not be MPSAFE at the
moment. Note that powerpc doesn't seem to do async callbacks due to
the implementation being incomplete.

Reviewed by: mostly silence on arch@


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116304 13-Jun-2003 alc

Add vm object locking to pmap_object_init_pt().


116279 13-Jun-2003 alc

Add vm object locking to various pagers' "get pages" methods, i386 stack
management functions, and a u area management function.


116277 13-Jun-2003 mdodd

Conditionally attach the MCA bus device.


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115907 06-Jun-2003 jhb

- Use IDTVEC() to declare IPI handlers since they are also IDT vectors.
- Make handlers for IPI's used by SMP kernels #ifdef SMP.


115858 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115728 02-Jun-2003 tegge

Initialize td->td_pcb->pcb_ext in cpu_thread_setup() since a garbage
value (e.g. 0xd0d0d0d0) can cause a kernel panic.


115683 02-Jun-2003 obrien

Use __FBSDID().


115546 31-May-2003 phk

Avoid unbalancing the { } count in the source file with #ifdef by
putting the opening { after the #ifdef ... #endif sequence.

Found by: FlexeLint


115545 31-May-2003 phk

Remove break after return;

Found by: FlexeLint


115502 31-May-2003 phk

We cannot use the normal strlen() and strcpy, but don't #define strlen and
strcpy to get private versions, use private versions directly.

Found by: FlexeLint


115501 31-May-2003 phk

Add missing break;

Found by: FlexeLint


115500 31-May-2003 phk

Protect macro with do { ... } while (0)

Found by: FlexeLint


115499 31-May-2003 phk

Remove unused variable.

Found by: FlexeLint


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115316 26-May-2003 scottl

De-orbit bus_dmamem_alloc_size(). It's a hack and was never used anyways.
No need for it to pollute the 5.x API any further.

Approved by: re (bmah)


115184 20-May-2003 jhb

The per-CPU spinlocks list is only maintained when WITNESS is enabled.
Thus, treat all page faults while in a critical section as fatal rather
than just those that occur with a non-empty spinlocks list. All such page
faults are fatal anyways. Calling trap_fatal() earlier increases the
chances of getting more useful panic messages and a possible DDB prompt.

Approved by: re (scottl)


115044 16-May-2003 tmm

In cpu_fork(), initialize pcb_psl for the new process to PSL_KERNEL,
instead of taking the (userland) eflags from the trap frame and masking
out PSL_I. There is no need to inherit any flags from the forking process;
the old method however can cause flags set in userland for the forking
process to be bogusly set in kernel mode when the newly forked process
runs for the first time (in particular PSL_T, which is set for userland
when the process is single-stepped; this would cause trace traps in
kernel mode).

Approved by: re (jhb)


115016 15-May-2003 alc

Initialize logical_cpus_mask when the logical CPUs are enumerated in
the mptable. (Previously, logical_cpus_mask was only initialized if
the hyperthreading fixup was executed.)

Approved by: re (jhb)
Reviewed by: ps


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho


114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


114291 30-Apr-2003 markm

Warns fixing. Protect against inappropriate linting, and mark
GCC-specific assemble code as such (in #ifdefs). Fix an easy
static variable warning while I'm here.


114177 28-Apr-2003 jake

Use inlines for loading and storing page table entries. Use cmpxchg8b for
the PAE case to ensure idempotent 64 bit loads and stores.

Sponsored by: DARPA, Network Associates Laboratories


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


114013 25-Apr-2003 jake

Remove harmless invalid cast.

Sponsored by: DARPA, Network Associates Laboratories


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by: jake
Reviewed by: bde (months ago)


113953 24-Apr-2003 davidxu

Don't print anything for fault at cpu_switch_load_gs, just like other
code to recover fault in doreti because of invalid segment registers,
silently push error to userland.


113845 22-Apr-2003 davidxu

Move down intr level testing code a bit, cpu_switch_load_gs fault can be at
interrupt nested time.


113843 22-Apr-2003 davidxu

Fix some problems for cpu_switch_load_gs. when fault address is at
cpu_switch_load_gs, cpu is in context switch, so don't enable interrupt.
because it is in context switch, it is expected sched_lock was held,
so don't PROC_LOCK(p) and psignal, it is LOR, probably we can
set a P_XSIGBUS like flag in p_sflags, and set TDF_ASTPENDING in
td_flags, in ast(), post a SIGBUS to process if P_XSIGBUS was set.


113833 22-Apr-2003 davidxu

Remove single threading detecting code, these code really should be
replaced by thread_user_enter(), but current we don't want to enable
this in trap.


113796 21-Apr-2003 davidxu

Reset pcb_gs and %gs before possibly invalidating it.


113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


113682 18-Apr-2003 jhb

Hold the proc lock for curproc around sigonstack().


113621 17-Apr-2003 jhb

Remove a couple of unused symbols.


113492 15-Apr-2003 mux

style(9)


113472 14-Apr-2003 simokawa

Restore delayed load support for the resource shortage case.
It was missed in the previous change.
Now, _bus_dmamap_load_buffer() accepts BUS_DMA_WAITOK/BUS_DMA_NOWAIT flags.

Original idea from: jake


113459 14-Apr-2003 simokawa

* Use _bus_dmamap_load_buffer() and respect maxsegsz in bus_dmamap_load().
Ignoring maxsegsz may lead to fatal data corruption for some devices.
ex. SBP-2/FireWire
We should apply this change to other platforms except for sparc64.

MFC after: 1 week


113364 11-Apr-2003 davidxu

Copy %gs from current CPU not from a stale PCB backup.


113363 11-Apr-2003 davidxu

set_user_ldt_rv() should check same proc not thread,
this commit fixes an user LDT smp rendezvous bug.


113348 10-Apr-2003 des

Convert the SMP_TSC kernel option into a loader tunable. Also enable
the TSC timecounter on single-CPU systems even when they are running
an SMP kernel.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.


113339 10-Apr-2003 julian

Move the _oncpu entry from the KSE to the thread.
The entry in the KSE still exists but it's purpose will change a bit
when we add the ability to lock a KSE to a cpu.


113321 10-Apr-2003 wes

Add a sysctl that records and reports the CPU clock rate calculated
at boot. Funny how often this trivial piece of information crops up
in embedded boxen.

Sponsored by: St. Bernard Software


113228 07-Apr-2003 jake

Add support for bounce buffers to _bus_dmamap_load_buffer, which is the
backend for bus_dmamap_load_mbuf and bus_dmamap_load_uio.

- Increaes MAX_BPAGES to 512. Less than this causes fxp to quickly runs out
of bounce pages.
- Add an argument to reserve_bounce_pages indicating wether this operation
should fail or be queued for later processing if we run out of memory.
The EINPROGRESS return value is not handled properly by consumers of
bus_dmamap_load_mbuf.
- If bounce buffers are required allocate minimum 1 bounce page at map
creation time. If maxsize was small previously this could get truncated
to 0 and the drivers would quickly run out of bounce pages.
- Fix a bug handling the return value of alloc_bounce_pages at map creation
time. It returns the number of pages allocated, not 0 on success.
- Use bus_addr_t for physical addresses to avoid truncation.
- Assert that the map is non-null and not the no bounce map in
add_bounce_pages.

Sponsored by: DARPA, Network Associates Laboratories


113173 06-Apr-2003 des

Initialize the PIIX timecounter in piix_attach(), which is called only
once, instead of doing it in piix_probe(), which is called every time
the PCI bus is rescanned.


113148 05-Apr-2003 peter

Unbreak the !LAZY_SWITCH case. I #ifdef'ed too much when I added
the ifdefs prior to commit and killed the same-address-space test.

Submitted by: bde


113100 04-Apr-2003 tegge

Add SMP_TSC option, which can be used on SMP systems where the TSCs
are synchronized to reduce context switch cost.


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


113040 03-Apr-2003 jake

- Removed APTD and associated macros, it is no longer used.

BANG BANG BANG etc.

Sponsored by: DARPA, Network Associates Laboratories


112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup. Currently its under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually seperate to the lazy switch stuff.

Glanced at by: jake, jhb


112967 02-Apr-2003 jake

- Make casuptr return the old value of the location we're trying to update,
and change the umtx code to expect this.

Reviewed by: jeff


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112886 31-Mar-2003 jeff

- Fix two calls to trapsignal() that were still passing in 'struct proc'.
These were missed in my last commit.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112841 30-Mar-2003 jake

- Add support for PAE and more than 4 gigs of ram on x86, dependent on the
kernel opition 'options PAE'. This will only work with device drivers which
either use busdma, or are able to handle 64 bit physical addresses.

Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine
with 6 gigs of ram.

Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems


112837 30-Mar-2003 jake

- Remove invalid casts.

Sponsored by: DARPA, Network Associates Laboratories


112836 30-Mar-2003 jake

- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When
accessing an alternate address space this causes 1 page table page at
a time to be mapped in, rather than using the recursive mapping technique
to map in an entire alternate address space. The recursive mapping
technique changes large portions of the address space and requires global
tlb flushes, which seem to cause problems when PAE is enabled. This will
also allow IPIs to be avoided when mapping in new page table pages using
the same technique as is used for pmap_copy_page and pmap_zero_page.

Sponsored by: DARPA, Network Associates Laboratories


112687 26-Mar-2003 ps

Nuke options HTT infavor of machdep.hlt_logical_cpus tunable/sysctl.
This keeps the logical cpu's halted in the idle loop. By default
the logical cpu's are halted at startup. It is also possible to
halt any cpu in the idle loop now using machdep.hlt_cpus.

Examples of how to use this:
machdep.hlt_cpus=1 halt cpu0
machdep.hlt_cpus=2 halt cpu1
machdep.hlt_cpus=4 halt cpu2
machdep.hlt_cpus=3 halt cpu0,cpu1

Reviewed by: jhb, peter


112686 26-Mar-2003 peter

Halt the cpus in the idle loop for SMP as well for several reasons:
1) Its critical for HTT. There's less foot-shooting opportunity.
2) I've seen significant improvements in interactive response to commands
over ssh sessions. I assume this is less lock contention.
3) As incentive to finish the idle cpu IPI wakeup stuff.
4) The machine on my desk was blowing hot air in my general direction
because somebody forgot to turn the hlt on, and it saves 50 watts per
cpu..

The machdep.cpu_idle_hlt sysctl is still available, but now the default
is the same as on UP kernels.


112569 25-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


112526 24-Mar-2003 bde

Disable interrupts while in kdb_trap() to handle cases where the caller
doesn't do it. This fixes all known causes of "Context switches not
allowed in the debugger" in mi_switch(). The main cause was trap_fatal()
calling kdb_trap() with interrupts enabled. Switching to ithreads for
interrupt handling then made fatal traps more fatal and harder to debug.
The problem was limited in -current because most interrupt handlers are
blocked by Giant, but it occurred almost deterministically for me because
my clock interrupt handlers are non-fast and not blocked by Giant.


112445 20-Mar-2003 dwmalone

Extend CPU_ATHLON_SSE_HACK to cover a few more revisions of Athlon CPUs.

Submitted by: Jon Kuster <kwsn@earthlink.net>
MFC after: 2 weeks


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared accross
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.


112367 18-Mar-2003 phk

Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.


112346 17-Mar-2003 mux

- Lock down the bounce pages structures. We use the same locking scheme
as with the alpha backend because both implementations of bounce pages
are identical.
- Remove useless splhigh()/splx() calls.


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


112133 12-Mar-2003 jake

- Added support for multiple page directory pages to pmap_pinit and
pmap_release.
- Merged pmap_release and pmap_release_free_page. When pmap_release is
called only the page directory page(s) can be left in the pmap pte object,
since all page table pages will have been freed by pmap_remove_pages and
pmap_remove. In addition, there can only be one reference to the pmap and
the page directory is wired, so the page(s) can never be busy. So all there
is to do is clear the magic mappings from the page directory and free the
page(s).

Sponsored by: DARPA, Network Associates Laboratories


111975 08-Mar-2003 davidxu

Initialize eflags in fake frame to default value rather than random one.
The random value sometimes causes macro CLKF_USERMODE to return true
because PSL_VM bit is set and really shoudn't be, this causes statclock()
to execute in wrong path, and further breaks KSE code and kernel crashes
when executing threaded program.


111939 06-Mar-2003 rwatson

Instrument sysarch() MD privileged I/O access interfaces with a MAC
check, mac_check_sysarch_ioperm(), permitting MAC security policy
modules to control access to these interfaces. Currently, they
protect access to IOPL on i386, and setting HAE on Alpha.
Additional checks might be required on other platforms to prevent
bypass of kernel security protections by unauthorized processes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111878 04-Mar-2003 jhb

Wrap the hyperthreading support code with the HTT kernel option.
Hyperthreading support is now off unless the HTT option is added.

MFC-after: 3 days


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


111647 27-Feb-2003 phk

Add support for the Elan CPU hardware watchdog used in "active" mode.


111638 27-Feb-2003 jhb

Expand some #ifdef's to fix I386_CPU compile.

Reported by: Andy Farkas <andyf@speednet.com.au>


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111535 26-Feb-2003 davidxu

Better to not know anything about KSE.


111524 26-Feb-2003 mux

Correctly set BUS_SPACE_MAXSIZE in all the busdma backends.
It was bogusly set to 64 * 1024 or 128 * 1024 because it was
bogusly reused in the BUS_DMAMAP_NSEGS definition.


111493 25-Feb-2003 jake

- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate
for testing and setting the current and alternate address spaces.
- Changed PTDpde and APTDpde to arrays to support multiple page directory
pages.

ponsored by: DARPA, Network Associates Laboratories


111477 25-Feb-2003 davidxu

Remove an unsafe KASSERT.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


111428 24-Feb-2003 nyan

The mpbiosreason variable does not used for pc98.


111385 24-Feb-2003 jake

Use the direct mapping of IdlePTD setup in locore for proc0's page directory,
instead of allocating another page of kva and mapping it in again. This was
likely an oversight in revision 1.174 (cut and paste from pmap_pinit).

Discussed with: peter, tegge
Sponsored by: DARPA, Network Associates Laboratories


111382 23-Feb-2003 tegge

Allow machines with one CPU and a valid mp table to boot an SMP kernel.


111372 23-Feb-2003 jake

Previous commit missed a 1 that should be NGPTD, and an NPDEPG that should
be NPDEPTD. Grumble.

Sponsored by: DARPA, Network Associates Laboratories


111363 23-Feb-2003 jake

- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the
page directory.
- Use these instead of the magic constants 1 or PAGE_SIZE where appropriate.
There are still numerous assumptions that the page directory is exactly
1 page.

Sponsored by: DARPA, Network Associates Laboratories


111299 23-Feb-2003 jake

- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants
in locore.
- Removed the macros PTESIZE and PDESIZE, use sizeof instead in C.

Sponsored by: DARPA, Network Associates Laboratories


111272 22-Feb-2003 alc

The root of the splay tree maintained within the pm_pteobj always refers
to the last accessed pte page. Thus, the pm_ptphint is redundant and can
be removed.


111271 22-Feb-2003 jake

unsigned -> pt_entry_t.

Sponsored by: DARPA, Network Associates Laboratories


111167 20-Feb-2003 peter

Fix fumble in rev 1.525. pmap_kenter()'s second argument is a physical
address, not a page index.

Laughed at by: jake


111138 19-Feb-2003 phk

#include "opt_cpu.h" so we notice our options.


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


111017 16-Feb-2003 phk

Change "dev_t gdbdev" to "void *gdb_arg", some possible paths for GDB
will not have a dev_t.


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110955 15-Feb-2003 alc

Assert that the kernel map's system mutex is held in pmap_growkernel().


110845 14-Feb-2003 alc

- Add a mutex for synchronizing the use of CMAP/CADDR 1 and 2.
- Eliminate small style differences between pmap_zero_page(),
pmap_copy_page(), etc.


110781 13-Feb-2003 peter

Oops. I mis-remembered about the P4 problems. It was 5.0-DP2 that
was shipped with DISABLE_PG_G and DISABLE_PSE, not 5.0-REL. *blush*
Disable the code - but still leave it there in case its still lurking.


110780 13-Feb-2003 peter

Turn of PG_PS and PG_G for Pentium-4 cpus at boot time. This is so
that we can stop turning off PG_G and PG_PS globally for releases.


110747 12-Feb-2003 alc

Remove kptobj. Instead, use VM_ALLOC_NOOBJ.


110532 08-Feb-2003 alc

MF alpha
- Synchronize access to the allpmaps list with a mutex.


110480 07-Feb-2003 peter

Commit some cosmetic changes I had laying around and almost included
with another commit. Unwrap a line. Unexpand a pmap_kenter().


110379 05-Feb-2003 phk

This file has no longer any content from the original Berkeley file so
replace the UCB copyright with a FreeBSD 2 clause thing.

Remove some no longer relevant comments.


110370 05-Feb-2003 phk

i386/i386/tsc.c was repo-copied from i386/isa/clock.c.

Remove all the stuff that does not relate to the TSC.

Change the calibration to use DELAY(1000000) rather than trying to check
it against the CMOS RTC, this drastically increases precision:

Using 25 samples on a Athlon 700MHz UP machine I find:

stddev min max average
CMOS 22200 Hz -74980 Hz 34301 Hz 704928721 Hz
DELAY 1805 Hz -1984 Hz 2678 Hz 704937583 Hz

(The difference between the two averages is not statistically significant.)

expressed in PPM of the frequency:
stddev min max
CMOS 31.49 PPM -106.37 PPM 48.66 PPM
DELAY 2.56 PPM 2.81 PPM 3.80 PPM

This code will not be used until a followup commit to sys/isa/clock.c
and sys/pc98/pc98/clock.c which will only happen after some field testing.


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is create by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110299 03-Feb-2003 phk

Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by: tjr


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110254 03-Feb-2003 alc

- Make allpmaps static.
- Use atomic subtract to update the global wired pages count. (See
also vm/vm_page.c revision 1.233.)
- Assert that the page queue lock is held in pmap_remove_entry().


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110039 29-Jan-2003 phk

Make tsc_freq a 64bit quantity.

Inspired by: http://www.theinquirer.net/?article=7481


110030 29-Jan-2003 scottl

Implement bus_dmamem_alloc_size() and bus_dmamem_free_size() as
counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.

This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.

Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.

Reviewed by: jake (sparc64)


109994 28-Jan-2003 jake

Remove BDE_DEBUGGER.

Discussed with: bde


109964 28-Jan-2003 alc

Merge pmap_testbit() and pmap_is_modified(). The latter is the only caller
of the former.


109898 26-Jan-2003 julian

Fix KSE related patch.
Make it compile for the SMP case..
statclock_process() has changed prototypes.


109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


109715 23-Jan-2003 peter

Now that TPR isn't bogusly raised at boot, there is no need to clear
it at context switch.


109713 23-Jan-2003 peter

Dont raise the TPR register at initialization time. It only causes
problems and we only ever clear it.


109700 22-Jan-2003 jhb

- Move enable_sse()'s prototype to machine/md_var.h.
- Sort definition of cpu_* variables appropriately.
- Move cpu_fxsr out of the magic non-BSS set of variables and stick it in
the BSS along with hw_instruction_sse (make the latter static as well).

Submitted by: bde (partially)


109696 22-Jan-2003 jhb

Rename cpuid_cpuinfo to cpu_procinfo. bde requested that I rename this
variable to something in the cpu_* namespace since that's what all the
other cpuid variables were named and cpu_procinfo is what I came up with.

Requested by: bde


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


109327 15-Jan-2003 phk

Add machdep.elan_freq sysctl which can be used to set the CPU clock
frequency in Hz. The default is still 33.333 MHz. Please notice
that the number is round to a multiple of four internally so it may
not read back exactly the same as written.

Add compile time ELAN_XTAL option to override the 33.333 MHz default.

Add compile time ELAN_PPS option to enable code for high precision
(250 nanoseconds) timestamping of external signals.


109027 09-Jan-2003 jhb

Remove earlysetcpuclass() as it has been OBE.

Suggested by: bde


109026 09-Jan-2003 jhb

Rework part of the previous processor name changes so that we read
cpu_exthigh and cpu_brand in printcpuinfo() instead of in identify_cpu().
We also only do it for known-good values of cpu_vendor which is a bit more
conservative.

Reviewed by: bde (mostly)


108961 08-Jan-2003 jhb

Consistently use spaces in between arguments to strcmp(). Whitespace
only.


108948 08-Jan-2003 jhb

- Use cpu_exthigh instead of executing cpuid again to retrieve it for the
print_AMD_foo() functions.
- Add a brand name table for the brand index provided on Intel CPU's in
%ebx after cpuid 1.
- For Intel CPUs, if we don't get a processor name from the extended cpuid
then use the brand index in cpuid_cpuinfo to pick a name from the brand
table and copy that name into cpu_brand.
- Replace the duplicated code to use the extended cpuid to replace
cpu_model with the processor name in the AMD and Transmeta sections of
printcpuinfo() with generic code that replaces cpu_model with
cpu_brand if cpu_brand is not an empty string. We also trim leading
spaces from cpu_brand prior to doing this since at least some processor
names (notably those of Intel CPUs) have leading spaces in the name.
- Give print_AMD_features() its own private regs[] array since
printcpuinfo() doesn't use the one it has anymore.


108947 08-Jan-2003 jhb

- Add a cpu_exthigh variable to hold the highest extended cpuid value
returned from cpuid 0x80000000.
- Add a cpu_brand char array to hold the processor name returned by
cpuid 0x80000002-0x80000004 on AMD, Intel, Transmeta, and possibly
other CPUs.
- Use cpuid to set cpu_exthigh and read the processor name if it is present
in identify_cpu().


108946 08-Jan-2003 jhb

Bah, get the test for more than one logical CPU right so we don't bogusly
claim a CPU has HT support when it lists 0 or 1 logical CPU's per physical
processor.


108914 08-Jan-2003 jhb

Enumerate logical hyperthread CPUs manually if they aren't already listed
in the mptable. The way this works is that we determine if the system
has hyperthreading and how many logical CPU's should be in each physical
CPU by using the information returned by cpuid. During the first pass of
the mptable, we build a bitmask of the APIC IDs of the CPUs listed in the
mptable. We then scan that bitmask to see if the CPUs are already listed
by the mptable, or if there are any APIC IDs already in use that would
conflict with the APIC IDs of the logical CPUs. If that test succeeds,
then we fixup the count of application processors. Later on during the
second pass of the mptable we create fake processor entries for logical
CPUs and add them to the system.

We only need this type of fixup hack when using the mptable to enumerate
CPUs. The ACPI MADT table properly enumerates all logical CPUs.


108913 08-Jan-2003 jhb

If the boot processor supports hyperthreading and contains more than one
logical CPU, display the number of logical CPUs per physical processor
underneath the list of CPU features.


108911 08-Jan-2003 jhb

Add a cpuid_cpuinfo variable to hold the results of %ebx from cpuid with
%eax of 1 and set it in identify_cpu().


108608 03-Jan-2003 jhb

Document bit 31 of the cpuid features word as PBE (Pending Break Enable).


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108517 31-Dec-2002 njl

Return an error when r/w is requested on an unsupported device instead of
looping.

Submitted by: Sean Kelly <smkelly@zombie.org>
Pointed out by: bde


108338 28-Dec-2002 julian

Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.


108337 28-Dec-2002 alc

Assert that the page queues lock is held in pmap_testbit().


108253 24-Dec-2002 alc

- Hold the page queues lock around calls to vm_page_wakeup() and
vm_page_flag_clear().


107965 17-Dec-2002 njl

Back out 1.19 to rethink approach

Requested by: julian@


107962 17-Dec-2002 njl

Automatically issue a "continue" along with the "detach" command. This
fixes the problem of cleanly restarting a target after entering gdb mode.

Reviewed by: archie@


107957 16-Dec-2002 julian

Reformat last change
Requested by: nate@


107955 16-Dec-2002 julian

Don't dump core into a partition that is too small for it.
If we do, we usually wrote backwareds into the proceeding partititon
which is usually the root partition.


107867 14-Dec-2002 phk

Only dump the BIOS geometry table from bootinfo on PC98, we don't use
the contents on i386 anymore.


107853 14-Dec-2002 alc

Add page locking to pmap_mincore().

Submitted (in part) by: tjr@


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107576 04-Dec-2002 phk

Use the correct value when writing the Day Of Week byte in the CMOS.
The correct range is [1...7] with Sunday=1, but we have been writing
[0...6] with Sunday=0.

The Soekris computers flagged the zero, zapped the date, so if you
rebooted your soekris on a sunday, it would come up with a wrong
date.

Bruce has a more extensive rework of this code, but we will stick with
the minimalist fix for now.

Spotted by: Soren Kristensen <soren@soekris.com>
Thanks to: Michael Sierchio <kudzu@tenebras.com>.
Confirmed by: bde
Approved by: re


107538 03-Dec-2002 alc

Avoid recursive acquisition of the page queues lock in pmap_unuse_pt().

Approved by: re


107521 02-Dec-2002 deischen

Align the FPU state in the ucontext and sigcontext to 16 bytes
to accomodate the new SSE/XMM floating point save/restore
instructions.

This commit is mostly from bde and includes some style nits.

Approved by: re (jhb)


107490 02-Dec-2002 alc

Hold the page queues lock when calling pmap_unwire_pte_hold() or
pmap_remove_pte(). Use vm_page_sleep_if_busy() in
_pmap_unwire_pte_hold() so that the page queues lock is released
when sleeping.

Approved by: re (blanket)


107434 01-Dec-2002 alc

Assert that the page queues lock is held in pmap_changebit()
and pmap_ts_referenced().

Approved by: re (blanket)


107410 30-Nov-2002 alc

Assert that the page queues lock is held in pmap_page_exists_quick().

Approved by: re (blanket)


107217 25-Nov-2002 alc

Assert that the page queues lock is held in pmap_remove_pages().

Approved by: re (blanket)


107212 24-Nov-2002 alc

Add page queues locking to vunmapbuf(); reduce differences with respect
to the sparc64 implementation. (Note: With modest effort on the alpha and
ia64 this function could migrate to the MI part of the kernel.)

Approved by: re (blanket)


107184 23-Nov-2002 alc

- Assert that the page queues lock is held in pmap_remove_all().
- Fix a diagnostic message and comment in pmap_remove_all().
- Eliminate excessive white space from pmap_remove_all().

Approved by: re


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106865 13-Nov-2002 mux

Remove a commented out #include "opt_pci.h", it doesn't
exist anymore.


106842 13-Nov-2002 mdodd

Loader tunable 'machdep.disable_mtrrs'.
Sysctl of same name to reflect status.

Submitted by: jhb
Approved by: re (murray)
MFC after: 1 day


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106753 11-Nov-2002 alc

- Clear the page's PG_WRITEABLE flag in the i386's pmap_changebit()
if we're removing write access from the page's PTEs.
- Export pmap_remove_all() on alpha, i386, and ia64. (It's already
exported on sparc64.)


106707 09-Nov-2002 iwasaki

Add a new loader tunable, hw.hasbrokenint12, to indicate that BIOS
has broken int 12H.
If hw.hasbrokenint12="1" in loader environment, kernel never use BIOS
INT 12 call to determine base memory size.
Otherwise, kernel use INT 12 in old behaviour.
This should fix kernel panic problem caused by 1.544 changes.

MFC after: 1 day


106697 09-Nov-2002 des

Print real / avail memory in megabytes rather than kilobytes.


106609 08-Nov-2002 davidxu

adjust critical section to only wrap around vm86_bioscall().


106607 08-Nov-2002 davidxu

use critical_enter/exit to add a critical section around BIOS call,
this unbreaks WITNESS.

Pointed out by: jhb


106605 07-Nov-2002 tmm

Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.


106567 07-Nov-2002 alc

Simplify and optimize pmap_object_init_pt(). More specifically,
take advantage of the fact that the vm object's list of pages is
now ordered to reduce the overhead of finding the desired set of
pages to be mapped. (See revision 1.215 of vm/vm_page.c.)


106542 07-Nov-2002 davidxu

1.Fix smp race between kernel vm86 BIOS calling and userland vm86 mode code,
remove global variable in_vm86call, set vm86 calling flag in PCB flags.

2.Fix vm86 BIOS calling preempted problem by changing vm86_lock mutex type
from MTX_DEF to MTX_SPIN. vm86pcb is not remembered in thread struct,
when the thread calling vm86 BIOS is preempted by interrupt thread,
and later switching back to the thread would cause incorrect context be
loaded into CPU registers, this leads to kernel crash.


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105949 25-Oct-2002 iwasaki

Change method to determine base memory size.
Try INT 15H/E820H first, then fall back to the old compatibility
method (INT 12H).
This is a workaround for newer machines which have broken INT 12H BIOS
service implementation.

Reviewed by: -current ML
MFC after: 3 days


105900 24-Oct-2002 julian

Extract out KSE specific code from machine specific code
so that there is ony one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitmatly happen).

Submitted by: (parts) davidxu


105554 20-Oct-2002 phk

Change the definition of the debugging registers to be an array, so
that we can index into it, rather than do pointer gymnastics on a
structure containing 8 elements.

Verified by: MD5 hash on the produced .o files.


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


105328 17-Oct-2002 iwasaki

1. Fix a comment. Locking _is_ needed (but not done).
2. Update a comment. We now restore much more than RTC updates and
interrupts.
3. Order change. Stop interrupts by writing to RTC_STATUSB,
restore rate bits for the interrupts by writing to RTC_STATUSA,
then enable interrupts again.
This seems to be done perfectly backwards in startrtclock().
Otherwise, the idea for this change was obtained from
startrtclock().
4. Don't stop the clock (RTCB_HALT). We only program some control bits
and don't want to stop the clock.
5. (Not really related.) Add caveats to the comment about timer_restore().
The update is non-atomic since locking is not done.

On locking:
6. rtcin() and writertc() are locked() adequately by splhigh() in RELENG_4,
but this locking is null in -current.
7. Doing things in the correct order in (3) combined with (6) is probably
enough locking for rtcrestore() in RELENG_4. In -current, the
writertc()'s race with rtcintr() unless the BIOS disables RTC interrupts.

Submitted by: bde (including commit message)
MFC after: 1 week


105216 16-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint.


104964 12-Oct-2002 jeff

- Create a new scheduler api that is defined in sys/sched.h
- Begin moving scheduler specific functionality into sched_4bsd.c
- Replace direct manipulation of scheduler data with hooks provided by the
new api.
- Remove KSE specific state modifications and single runq assumptions from
kern_switch.c

Reviewed by: -arch


104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


104695 09-Oct-2002 julian

Round out the facilty for a 'bound' thread to loan out its KSE
in specific situations. The owner thread must be blocked, and the
borrower can not proceed back to user space with the borrowed KSE.
The borrower will return the KSE on the next context switch where
teh owner wants it back. This removes a lot of possible
race conditions and deadlocks. It is consceivable that the
borrower should inherit the priority of the owner too.
that's another discussion and would be simple to do.

Also, as part of this, the "preallocatd spare thread" is attached to the
thread doing a syscall rather than the KSE. This removes the need to lock
the scheduler when we want to access it, as it's now "at hand".

DDB now shows a lot mor info for threaded proceses though it may need
some optimisation to squeeze it all back into 80 chars again.
(possible JKH project)

Upcalls are now "bound" threads, but "KSE Lending" now means that
other completing syscalls can be completed using that KSE before the upcall
finally makes it back to the UTS. (getting threads OUT OF THE KERNEL is
one of the highest priorities in the KSE system.) The upcall when it happens
will present all the completed syscalls to the KSE for selection.


104513 05-Oct-2002 deischen

Fix building of minimal kernels without npx by rearranging ifdefs.
Also fix some style bugs in surrounding code, and add a comment
about FP state restoral that seems questionable.

Submitted by: bde


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Test on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104474 04-Oct-2002 jhb

Fix a bogon in previous commit. bcopy() from the malloc'd memory that we
already copied into, rather than doing the bcopy() from the userland
pointer. "Oops."


104460 04-Oct-2002 deischen

Add another temporary hack to allow running older i386 binaries.
This will be removed when new versions of syscalls sigreturn()
and sigaction() are added (mini is working on this but is in
the middle of a move).

This should fix the problem of cvsupd dying.


104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


104319 01-Oct-2002 phk

The pmap_prefault_pageorder[] array was initialize with wrong values
due to a missing comma.

I have no idea what trouble, if any, this may have caused.

Pointed out by: FlexeLint


104224 30-Sep-2002 jhb

- Give legacy an identify routine that always adds 'legacy0' at an order
of 1 so that it is not probed until after acpi0 is probed and attached.
- In legacy_probe(), return ENXIO if acpi0 is around and alive.
- nexus_attach() is now much simpler and just lets its child drivers do
all the work.


104215 30-Sep-2002 obrien

Turn back on the "SMP: AP CPU #N Launched!" message on normal boots.
Peter's rev 1.189 should fix the lost console on SCSI-based systems due
to this message.


104175 30-Sep-2002 obrien

Only print out the "SMP: AP CPU #N Launched!" message on verbose boots.
The kernel printf() isn't race-free


104174 30-Sep-2002 obrien

Save the FP state in the PCB as that is compatable with releng4 binaries.

This is a band-aid until the KSE pthread committers get back on the ground
and have their machines setup.

Submitted by: eischen


104118 28-Sep-2002 peter

Deal with some SMP races by doing the entire copyin at once rather
than doing the checks piecemeal and then doing a second copyin later.

PR: 38021
Submitted by: davidx (I've tweaked the patch a bit)


104106 28-Sep-2002 peter

Repair range checking for reading the ldt list.

PR: 38016
Submitted by: davidx


104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


103870 23-Sep-2002 alfred

use __packed.


103865 23-Sep-2002 jhb

Update the nexus driver for the addition of the legacy driver:
- nexus no longer has PCI bridges as direct children, so the PCI bus
ivar is no longer used and is removed.
- Don't attach default EISA, ISA, or MCA busses. Instead, if we do not
have an acpi0 device after bus_generic_probe(), add a legacy0 child
device.
- Remove machine/nexusvar.h.


103862 23-Sep-2002 jhb

Add a new legacy(4) device driver for use on machines that do not have
ACPI or for when ACPI support is disabled or not present in the kernel.
Basically, the nexus device is now split into two with some parts
(such as adding default ISA, MCA, and EISA busses if they aren't found
as well as support for PCI bus device ivars) being moved to the legacy
driver.


103778 22-Sep-2002 peter

Create inlines for ltr(sel), lldt(sel), lidt(addr) rather than
functions that have one instruction.


103772 22-Sep-2002 mdodd

- Move the init of %gs and pcb_gs before user_ldt_free().
- Always call load_gs()
- Trim comments.

This addresses some of the issues raised by BDE.


103770 22-Sep-2002 jake

Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicing later on. Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.


103756 21-Sep-2002 markm

Use a function instead of a non-portable, GCC-specific asm() entry.


103755 21-Sep-2002 markm

A good dose of style.9. No functional change.


103753 21-Sep-2002 markm

Code tidy-up. ISOfy, turn a macro into an inline for lint(1) (perhaps
this needs to go to cpufunc.h?), de-register.


103733 21-Sep-2002 phk

Fix a 3 year old oversight: Remove the #ifdef/#endif pair now that there
is nothing between them anymore.

Spotted by: peter.


103706 20-Sep-2002 phk

We need neither <sys/diskslice.h> nor <sys/disklabel.h> here.

Sponsored by: DARPA & NAI Labs.


103703 20-Sep-2002 phk

For reasons now lost in historical fog, the bounds_check_with_label()
function were put in i386/i386/machdep.c from where it has been
cut and pasted to other architectures with only minor corruption.

Disklabel is really a MI format in many ways, at least it certainly
is when you operate on struct disklabel.

Put bounds_check_with_label() back in subr_disklabel.c where it belongs.

Sponsored by: DARPA & NAI Labs.


103682 20-Sep-2002 jhb

fork_trampoline() marks a trap frame.

Submitted by: bde


103681 20-Sep-2002 jhb

Use proper type for a variable used as a DDB symbol.


103680 20-Sep-2002 jhb

Trim includes.

Submitted by: bde


103679 20-Sep-2002 jhb

Various style fixes, including moving db_print_backtrace() out of the
middle of the watchpoint code.

Submitted by: bde


103649 19-Sep-2002 mdodd

This patch enables FreeBSD i686 MTRR support on Intel Pentium
4/XEON processors, which are not currently recognized.

Submitted by: Christian Zander <zander@minion.de>


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103645 19-Sep-2002 mdodd

From Christian Zander:

This patch addresses a bug that can cause a GPF in the kernel - if a
process makes use of i386_set_ldt to install a LDT entry, then loads
a corresponding segment descriptor into %gs, forks, and if the child
execs.

In this scenario, setregs executes user_ldt_free and then determines
how to reset the %gs register:

/* reset %gs as well */
if (pcb == curpcb)
load_gs(_udatasel);
else
pcb->pcb_gs = _udatasel;

This is insufficient in the fork/exec case, since pcb will be equal
to curpcb when the child execs; load_gs will reset %gs to _udatasel
but it doesn't reset pcb->pcb_gs; upon return from the system call,
cpu_switch_load_gs will thus attempt to restore %gs from pcb->pcb_gs
and trigger a GPF since all LDT entries have already been cleared.

The fix is to always reset pcb->pcb_gs to _udatasel.

Submitted by: Christian Zander <zander@minion.de>
Reviewed by: jake


103527 18-Sep-2002 iwasaki

Restore status register A of RTC at resume time.
This should fix the 'too many RTC interrupts and statclock seems
broken after resume' problem.

MFC after: 1 week


103482 17-Sep-2002 phk

Add /dev/soekris-errled device to control the Error-LED on Soekris cards/boxes.

# turn LED off
echo '0' > /dev/soekris-errled

# turn LED on
echo '1' > /dev/soekris-errled

# flash LED (5 hz)
echo 'f' > /dev/soekris-errled

# flash LED (4/2 = 2 hz), syntax: "f[1-9]" -> .5 -> 4.5 Hz
echo 'f4' > /dev/soekris-errled

# flash digits 1,3 and 7, syntax: "d[1-9]*"
echo 'd137' > /dev/soekris-errled

Characters not understood are ignored.


103477 17-Sep-2002 sobomax

Don't reference cpu_fxsr unless CPU_ENABLE_SSE is defined. This fixes kernel
in !CPU_ENABLE_SSE case.


103436 17-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


103407 16-Sep-2002 mini

Add kernel support needed for the KSE-aware libpthread:
- Maintain fpu state across signals.
- Use ucontext_t's to store KSE thread state.
- Synthesize state for the UTS upon each upcall, rather than
saving and copying a trapframe.
- Save and restore FPU state properly in ucontext_t's.

Reviewed by: deischen, julian
Approved by: -arch


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103346 15-Sep-2002 dwmalone

Some BIOSs are using MTRR values that are only documented under NDA
to control the mapping of things like the ACPI and APM into memory.

The problem is that starting X changes these values, so if something
was using the bits of BIOS mapped into memory (say ACPI or APM),
then next time they access this memory the machine would hang.

This patch refuse to change MTRR values it doesn't understand,
unless a new "force" option is given. This means X doesn't change
them by accident but someone can override that if they really want
to.

PR: 28418
Tested by: Christopher Masto <chris@netmonger.net>,
David Bushong <david@bushong.net>,
Santos <casd@myrealbox.com>
MFC after: 1 week


103168 10-Sep-2002 sam

move some printfs under bootverbose

Reviewed by: phk


103081 07-Sep-2002 jmallett

Fill out two fields (si_pid, si_uid) in the siginfo structure handed back
to userland in the signal handler that were not being iflled out before, but
should and can be.

This part of sendsig could be slightly refactored to use an MI interface, or
ideally, *sendsig*() would have an API change to accept a siginfo_t, which
would be filled out by an MI function in the level above sendsig, and said MI
function would make a small call into MD code to fill out the MD parts (some
of which may be bogus, such as the si_addr stuff in some places). This would
eventually make it possible for parts of the kernel sending signals to set up
a siginfo with meaningful information.

Reviewed by: mux
MFC after: 2 weeks


103076 07-Sep-2002 jmallett

Match the more modern ports and comment the filling of POSIX parts of siginfo
with 'Fill in POSIX parts'. (Diff reduction.)


103064 07-Sep-2002 peter

Automatically enable CPU_ENABLE_SSE (detect and enable SSE instructions)
if compiling with I686_CPU as a target. CPU_DISABLE_SSE will prevent
this from happening and will guarantee the code is not compiled in.

I am still not happy with this, but gcc is now generating code that uses
these instructions if you set CPUTYPE to p3/p4 or athlon-4/mp/xp or higher.


103049 07-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


102974 05-Sep-2002 jhb

Move some variables to the BSS instead of explicitly zero'ing them. This
also makes all of the PCIbios variable be zero'd, not just the entry field.


102935 04-Sep-2002 phk

On the ElanSC520 CPU use general purpose timer#2 as timecounter.

This is a vast improvement over the i8254, since it is a simple
memory load rather than a comples sequence of interrupt blocking,
multiple input/output instructions, and wrap-around detection.

I have not bothered to time the fundamental timecounter get routine,
but gettimeofday(2) is 10% faster with the ELAN timecounte.

The downside is that HZ=100 is not enough, 150 or more recommended,
I use 250 myself.


102934 04-Sep-2002 phk

Change the support for AMDs ElanSC520 CPU from being a device driver to
be
options CPU_ELAN
(NB: Soekris.com users!)

It is cleaner this way. We still recognize the cpu on the host-pci bridge.


102920 04-Sep-2002 jhb

Use resource_list_print_type() instead of duplicating the code in
nexus_print_resources().


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102667 31-Aug-2002 bde

db_ps.c:
Don't attempt to follow null pointers for zombie processes in db_ps().

Style fix: use explicit an comparison with NULL for all null pointer
checks in db_ps() instead of for half of them.

db_interface.c:
Fixed ddb's handling of traps from with ddb on i386's only.

This was mostly fixed in rev.1.27 (by longjmp()'ing back to the top
level) but was completly broken in rev.1.48 (by not unwinding the new
state (mainly db_active) either before or after the longjmp(). This
mostly never worked for other arches, since rev.1.27 has not been ported
and lower level longjmp()'s only handle traps for memory accesses. All
cases should be handled at a lower level to provided better control and
simplify unwinding of state.

Implementation details: don't pretend to maintain db_active in a nested
way -- ddb cannot be reentered in a nested way. Use db_active instead
of the db_global_jmpbuf_valid flag and longjmp()'s return value for things
related to reentering ddb. [re]entering is still not atomic enough.


102666 31-Aug-2002 peter

Take a shot at fixing up a whole stack of style and other embarresing
unforced errors that Bruce identified. I have not yet addressed all of
his concerns.


102603 30-Aug-2002 ache

Unbreak kernel build by printing Maxmem using %ld instead of old (now changed)
%u


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102543 28-Aug-2002 peter

OK, I have had it with losing my console because the AP's print their "I am
alive!" message right as the scsi probe messages happen. This is a bit
nasty, but it seems to work. At the point that we unlock the AP's, briefly
wait till they are all done while we hold the console on their behalf.


102412 25-Aug-2002 charnier

Replace various spelling with FALLTHROUGH which is lint()able


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


102329 23-Aug-2002 peter

Ok, somebody please shoot me. The asm I wrote for the ranged IPI shootdown
was wrong. It only ever invalidated one page due to me getting the loop
terminator wrong. This explains the DISABLE_PG_G effect on SMP.


102241 21-Aug-2002 archie

Don't use "NULL" when "0" is really meant.


102041 18-Aug-2002 alc

o Simplify the ptphint test in pmap_release_free_page(). In other words,
make it just like the test in _pmap_unwire_pte_hold().


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101879 14-Aug-2002 jmallett

Document why the has_f00f_bug variable is initialised rather than placed into
the BSS (so that it can be binary-patched).

Inspired by: bde


101770 13-Aug-2002 alc

o Remove an unnecessary vm_page_flash() from _pmap_unwire_pte_hold().

Reviewed by: peter


101751 12-Aug-2002 alc

o Convert three instances of vm_page_sleep_busy() into vm_page_sleep_if_busy()
with page queue locking.


101721 12-Aug-2002 iedowse

Use roundup2() to avoid a problem where pmap_growkernel was unable
to extend the kernel VM to the maximum possible address of 4G-4M.

PR: i386/22441
Submitted by: Bill Carpenter <carp@world.std.com>
Reviewed by: alc


101635 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


101352 05-Aug-2002 peter

Revert rev 1.356 and 1.352 (pmap_mapdev hacks). It wasn't worth the
pain.


101322 04-Aug-2002 peter

Fix a mistake in 1.352 - I was returning a pointer to the rounded down
address. I expect this will fix acpica.


101294 04-Aug-2002 alc

o Request a wired page from vm_page_grab() in _pmap_allocpte().


101280 03-Aug-2002 alc

o Ask for a prezeroed page in pmap_pinit() for the page directory page.


101254 03-Aug-2002 alc

o Don't set PG_MAPPED on the page allocated and mapped in _pmap_allocpte().
(Only set this flag if the mapping has a corresponding pv list entry,
which this mapping doesn't.)


101249 03-Aug-2002 peter

Take advantage of the fact that there is a small 1MB direct mapped region
on x86 in between KERNBASE and the kernel load address. pmap_mapdev()
can return pointers to this for devices operating in the isa "hole".


101248 03-Aug-2002 peter

Take a shot at fixing a nasty bug in the pmap changes that I did. I
missed the pmap_kenter/kremove in this file, which leads to read()/write()
of /dev/mem using stale TLB entries. (gah!) Fortunately, mmap of /dev/mem
wasn't affected, so it wasn't as bad as it could have been. This throws
some light on the 'X server affects stability' thread....

Pointed out by: bde


101229 02-Aug-2002 phk

SYSINIT needs to be SI_SUB_PSEUDO. Add a printf to tell we are here.


101225 02-Aug-2002 phk

Add the minimalist elan-mmcr device driver.

This driver allows a userland program to mmap the MMCR of the AMD
Elan sc520 CPU.


101197 02-Aug-2002 alc

o Lock page queue accesses by vm_page_deactivate().


101105 31-Jul-2002 alc

o Setting PG_MAPPED and PG_WRITEABLE on pages that are mapped and unmapped
by pmap_qenter() and pmap_qremove() is pointless. In fact, it probably
leads to unnecessary pmap_page_protect() calls if one of these pages is
paged out after unwiring.

Note: setting PG_MAPPED asserts that the page's pv list may be
non-empty. Since checking the status of the page's pv list isn't any
harder than checking this flag, the flag should probably be eliminated.
Alternatively, PG_MAPPED could be set by pmap_enter() exclusively
rather than various places throughout the kernel.


101054 31-Jul-2002 phk

The Elan SC520 MMCR is actually 16bit wide, so u_char is inconvenient.


100912 30-Jul-2002 alc

o Lock page queue accesses by pmap_release_free_page().


100862 29-Jul-2002 alc

o Pass VM_ALLOC_WIRED to vm_page_grab() rather than calling vm_page_wire()
in pmap_new_thread(), pmap_pinit(), and vm_proc_new().
o Lock page queue accesses by vm_page_free() in pmap_object_init_pt().


100781 28-Jul-2002 peter

Unwind the syscall_with_err_pushed tweak that jake did some time back.

OK'ed by: jake


100646 24-Jul-2002 julian

Add some locking asserts and some comments


100432 21-Jul-2002 peter

Move SWTCH_OPTIM_STATS related code out of cpufunc.h. (This sort of stat
gathering is not an x86 cpu feature)


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100378 19-Jul-2002 alc

o Use vm_page_alloc(... | VM_ALLOC_WIRED) in place of vm_page_wire().


100321 18-Jul-2002 phk

Add initialization code for the AMD Elan sc520 which maps the MMCR
into KVM and sets the i8254 frequency to the correct value.


100275 18-Jul-2002 peter

Use pmap_kenter() rather than vtopte() and bashing the page tables
directly.


100264 17-Jul-2002 peter

Avoid trying to set PG_G on the first 4MB when we set up the 4MB page.
This solves the SMP panic for at least one system. I'd still like to know
why my xeon works though.

Tested by: bmilekic


100220 17-Jul-2002 dillon

Qualify comment on machdep.cpu_idle_hlt. Turning this on on a SMP
machine will result in approximately a 4.2% loss of performance (buildworld)
and approximately a 5% reduction in power consumption (when idle). Add XXX
note on how to really make hlt work (send an IPI to wakeup HLTed cpus on
a thread-schedule event? Generate an interrupt somehow?).


100152 15-Jul-2002 peter

The pmap_invalidate_all() here is definately not a good idea. We are
running with interrupts disabled, other cpus locked down, and only
making a temporary local mapping that we immediately back out again.

Tested by: gallatin


99987 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


99931 13-Jul-2002 peter

Two invlpg's slipped through that were not protected from I386_CPU

Pointed out by: dillon


99930 13-Jul-2002 peter

invlpg() does not work too well on i386 cpus. Add token i386 support
back in to the pmap_zero_page* stuff.


99929 13-Jul-2002 peter

Do global shootdowns when switching to/from 4MB pages. I believe we can
do a shootdown on a 4MB "page" though, but this should be safer for now.

Noticed by: tegge


99928 13-Jul-2002 peter

Bandaid for SMP. Changing APTDpde without a global shootdown is not
safe yet. We used to do a global shootdown here anyway so another day
or so shouldn't hurt.


99925 13-Jul-2002 alc

o Lock some page queue accesses, in particular, those by vm_page_unwire().


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99890 12-Jul-2002 dillon

Re-enable the idle page-zeroing code. Remove all IPIs from the idle
page-zeroing code as well as from the general page-zeroing code and use a
lazy tlb page invalidation scheme based on a callback made at the end
of mi_switch.

A number of people came up with this idea at the same time so credit
belongs to Peter, John, and Jake as well.

Two-way SMP buildworld -j 5 tests (second run, after stabilization)
2282.76 real 2515.17 user 704.22 sys before peter's IPI commit
2266.69 real 2467.50 user 633.77 sys after peter's commit
2232.80 real 2468.99 user 615.89 sys after this commit

Reviewed by: peter, jhb
Approved by: peter


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99864 12-Jul-2002 peter

Be specific about which reason caused vm86_addpages to panic


99862 12-Jul-2002 peter

Revive backed out pmap related changes from Feb 2002. The highlights are:
- It actually works this time, honest!
- Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive,
so try and optimize things where possible.
- Introduce ranged shootdowns that can be done as a single IPI.
- PG_G support for i386
- Specific-cpu targeted shootdowns. For example, there is no sense in
globally purging the TLB cache for where we are stealing a page from
the local unshared process on the local cpu. Use pm_active to track
this.
- Add some instrumentation for the tlb shootdown code.
- Rip out SMP code from <machine/cpufunc.h>
- Try and fix some very bogus PG_G and PG_PS interactions that were bad
enough to cause vm86 bios calls to break. vm86 depended on our existing
bugs and this was the cause of the VESA panics last time.
- Fix the silly one-line error that caused the 'panic: bad pte' last time.
- Fix a couple of other silly one-line errors that should have caused more
pain than they did.

Some more work is needed:
- pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we
have a hook in cpu_switch.
- The IPI handlers need some cleanup. I have a bogus %ds load that can
be avoided.
- APTD handling is rather bogus and appears to be a large source of
global TLB IPI shootdowns for no really good reason.

I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop.
I expect to see a bigger difference when there is significant pageout
activity or the system otherwise has memory shortages.

I have backed out a few optimizations that I had been using over the last
few days in order to be a little more conservative. I'll revisit these
again over the next few days as the dust settles.

New option: DISABLE_PG_G - In case I missed something.


99852 12-Jul-2002 peter

Unexpand a couple of 8-space indents that I added in rev 1.285.


99766 11-Jul-2002 peter

Bah, move the invltlb counter to C code and hook a debug sysctl onto it.


99765 11-Jul-2002 peter

s/NCPU/MAXCPU/ to try and get this to compile.


99746 10-Jul-2002 julian

fix a comment and note a problem with XXXSMP


99742 10-Jul-2002 dillon

Remove the critmode sysctl - the new method for critical_enter/exit (already
the default) is now the only method for i386.

Remove the paraphanalia that supported critmode. Remove td_critnest, clean
up the assembly, and clean up (mostly remove) the old junk from
cpu_critical_enter() and cpu_critical_exit().


99741 10-Jul-2002 obrien

Consistently line-up /**/ comments so they don't cause line wrappage.


99703 10-Jul-2002 julian

Include all of isa/ipl.s into exception.s as there is now nothing left in
ipl.s except doreti which really belongs in with the exceptions as it's
just the other side of the same coin. Will remove ipl.s in a separate commit.

Agreed by: several including bde@freebsd.org


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99567 08-Jul-2002 peter

s/procrunnable/kserunnable/ in a comment


99561 08-Jul-2002 peter

Fix a hideous TLB bug. pmap_unmapdev neglected to remove the device
mappings from the page tables, which were mapped with PG_G! We could
reuse the page table entry for another mapping (pmap_mapdev) but it
would never have cleared any remaining PG_G TLB entries.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99537 07-Jul-2002 mux

One #include <sys/lock.h> is enough.

Submitted by: Olivier Houchard <cognet@ci0.org>


99415 04-Jul-2002 peter

Diff reduction (microoptimization) with another WIP. Move the frame
calculation in get_ptbase() to a little later on.


99398 04-Jul-2002 julian

Don't free pages we never allocated..

My eyes openned by: Matt


99397 04-Jul-2002 julian

Slight restatement of the code and remove some unused variables.


99381 03-Jul-2002 julian

Add comments and slightly rearrange the thread stack assignment code
to try make it less obscure.


99380 03-Jul-2002 julian

Remove vestiges of old code...
These functions are always called on new memory so they can
not already be set up, so don't bother testing for that.
(This was left over from before we used UMA (which is cool))


99095 29-Jun-2002 julian

Fix reverse ordering of locks. add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


98902 27-Jun-2002 arr

Fix for the problem stated below by Tor Egge:
(from: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832566+0+ \
current/freebsd-current)

"Too many pages were prefaulted in pmap_object_init_pt, thus
the wrong physical page was entered in the pmap for the virtual
address where the .dynamic section was supposed to be."

Submitted by: tegge
Approved by: tegge's patches never fail


98892 26-Jun-2002 iedowse

Avoid using the 64-bit vm_pindex_t in a few places where 64-bit
types are not required, as the overhead is unnecessary:

o In the i386 pmap_protect(), `sindex' and `eindex' represent page
indices within the 32-bit virtual address space.
o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary
variable to store the low few bits of a vm_pindex_t that gets used
as an array index.
o vm_uiomove() uses `osize' and `idx' for page offsets within a
map entry.
o In vm_object_split(), `idx' is a page offset within a map entry.


98824 25-Jun-2002 iedowse

Complete the initial set of VM changes required to support full
64-bit file sizes. This step simply addresses the remaining overflows,
and does attempt to optimise performance. The details are:

o Use a 64-bit type for the vm_object `size' and the size argument
to vm_object_allocate().
o Use the correct type for index variables in dev_pager_getpages(),
vm_object_page_clean() and vm_object_page_remove().
o Avoid an overflow in the i386 pmap_object_init_pt().


98778 24-Jun-2002 peter

Compile in the cpu halt code even on SMP, instead just default the
sysctl (machdep.cpu_idle_hlt) to off in the SMP case. This allows you to
turn it on if you wish and do not particularly care about the small window
where a cpu will remain halted even when a job is placed on the run queue
(until the next clock tick).


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98728 24-Jun-2002 mini

userout -> out. These two labels are now identical.

Approved by: alfred


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98648 22-Jun-2002 jdp

Fix several bugs in the i386 asm statements used to speed up Internet
checksumming. These bugs could possibly cause bad code to be
generated at elevated optimization levels.

First, eliminate the use of preprocessor magic to form the address
fields of asm instructions. It hid the actual addresses being
referenced from the compiler. Without knowledge of all the data
dependencies, the compiler might possibly use optimizations which
would result in incorrect code.

Use "__asm __volatile" rather than "__asm" for instruction sequences
that pass information through the condition codes (the carry bit, in
this case). Without __volatile, the compiler might add unrelated
code between consecutive __asm instructions, modifying the condition
codes. I have seen GCC insert stack pointer adjustments in this
way, for example. Unfortunately, GCC doesn't provide a way to
specify dependencies on the condition codes. You can specify that
they are clobbered, but not that you are going to use them as input.

Finally, simplify the LOAD macro. This macro is used as a poor
man's prefetch. The simpler version gives the compiler more leeway
about just how it performs the prefetch.

MFC after: 1 week


98618 22-Jun-2002 mp

Clock frequencies reported by sysctl should be unsigned values. Discovered
when machdep.tsc_freq returned a negative number on a 2.2GHz Xeon.

Submitted by: Brian Harrison <bharrison@ironport.com>
Reviewed by: phk
MFC after: 1 week


98482 20-Jun-2002 peter

Use suword16/fuword16 instead of susword/fusword - this has two different
definitions so far.. 16 bit on x86 and appears to be 32 bit on sparc64.
Be explicit to avoid suprises.


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98361 17-Jun-2002 jeff

- Introduce the new M_NOVM option which tells uma to only check the currently
allocated slabs and bucket caches for free items. It will not go ask the vm
for pages. This differs from M_NOWAIT in that it not only doesn't block, it
doesn't even ask.

- Add a new zcreate option ZONE_VM, that sets the BUCKETCACHE zflag. This
tells uma that it should only allocate buckets out of the bucket cache, and
not from the VM. It does this by using the M_NOVM option to zalloc when
getting a new bucket. This is so that the VM doesn't recursively enter
itself while trying to allocate buckets for vm_map_entry zones. If there
are already allocated buckets when we get here we'll still use them but
otherwise we'll skip it.

- Use the ZONE_VM flag on vm map entries and pv entries on x86.


98145 12-Jun-2002 bde

If trap() is called when ddb is active, then go directly to trap_fatal();
do not blunder around enabling interrupts and running trap handlers.
trap_pfault() will normally pass control to ddb's fault handler which
will normally do the right thing.

This bug is very old. but in old versions of FreeBSD it is probably only
serious for trap handling that involves sleeping. In -current, attempting
to examine unmapped memory while stopped at a breakpoint at mi_switch()
was always fatal.


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha were we
skip the actual syscall invocation if error != 0. This fixes a bug
where if we the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


96527 13-May-2002 bde

Fixed a semantic error. va_arg(ap, u_short) is nonsense except on i386's
with 16-bit ints, since u_short is promoted when it is passed to a
varargs function. gcc now warns about this. We always pass small
integers (this is well obuscated), so there are no conversion problems.

Fixed a related style bug (bogus cast).


96517 13-May-2002 bde

Fixed a syntax error (a label not followed by a statement).


96487 13-May-2002 jake

These were repo-copied to dump_machdep.c.


96035 04-May-2002 fenner

Restore the ability interrupt dumps on i386, based on
the old kern_shutdown.c . Other archs might be able to
use similar code but I don't have anything to test on.


95814 30-Apr-2002 phk

Don't export timecounter structures under debug. with sysctl, they
contain no truly interesting data anymore.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95579 27-Apr-2002 alc

For what it's worth, fix the compilation of an I386_CPU-only kernel
now that certain warnings are fatal.


95571 27-Apr-2002 alc

Don't call vm_map_growstack() from trapwrite() as vm_fault() now performs
this automatically.


95489 26-Apr-2002 phk

Remove the tc_update() function. Any frequency change to the
timecounter will be used starting at the next second, which is
good enough for sysctl purposes. If better adjustment is needed
the NTP PLL should be used.


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95320 23-Apr-2002 phk

Don't free(9) a pointer which has been modified.

Chapeau de pointe: mux


95076 19-Apr-2002 alfred

Clean up:

Comment run_filter() to explain what it does.

Remove chatty comments.

void busdma_swi() { } -> void busdma_swi(void) { }


94977 18-Apr-2002 alc

o Call vm_map_growstack() from vm_fault() if vm_map_lookup() has failed
due to conditions that suggest the possible need for stack growth.
This has two beneficial effects: (1) we can
now remove calls to vm_map_growstack() from the MD trap handlers and (2)
simple page faults are faster because we no longer unnecessarily perform
vm_map_growstack() on every page fault.
o Remove vm_map_growstack() from the i386's trap_pfault().
o Remove the acquisition and release of Giant from i386's trap_pfault().
(vm_fault() still acquires it.)


94967 17-Apr-2002 tegge

Fix typo in adjusted panic message.

Submitted by: cokane


94962 17-Apr-2002 tegge

Update io_apic_ints array properly when revoking an irq mapping.
Adjust panic message.

Submitted by: David Xu <bsddiy@yahoo.com>


94936 17-Apr-2002 mux

Rework the kernel environment subsystem. We now convert the static
environment needed at boot time to a dynamic subsystem when VM is
up. The dynamic kernel environment is protected by an sx lock.

This adds some new functions to manipulate the kernel environment :
freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be
called after every getenv() when you have finished using the string.
testenv() only tests if an environment variable is present, and
doesn't require a freeenv() call. setenv() and unsetenv() are self
explanatory.

The kenv(2) syscall exports these new functionalities to userland,
mainly for kenv(1).

Reviewed by: peter


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94683 14-Apr-2002 dwmalone

Make the MTRR code a bit more defensive - this should help people
trying to run X on some Athlon systems where the BIOS does odd things
(mines an ASUS A7A266, but it seems to also help on other systems).

Here's a description of the problem and my fix:

The problem with the old MTRR code is that it only expects
to find documented values in the bytes of MTRR registers.
To convert the MTRR byte into a FreeBSD "Memory Range Type"
(mrt) it uses the byte value and looks it up in an array.
If the value is not in range then the mrt value ends up
containing random junk.

This isn't an immediate problem. The mrt value is only used
later when rewriting the MTRR registers. When we finally
go to write a value back again, the function i686_mtrrtype()
searches for the junk value and returns -1 when it fails
to find it. This is converted to a byte (0xff) and written
back to the register, causing a GPF as 0xff is an illegal
value for a MTRR byte.

To work around this problem I've added a new mrt flag
MDF_UNKNOWN. We set this when we read a MTRR byte which
we do not understand. If we try to convert a MDF_UNKNOWN
back into a MTRR value, then the new function, i686_mrt2mtrr,
just returns the old value of the MTRR byte. This leaves
the memory range type unchanged.

I'd like to merge this before the 4.6 code freeze, so if people
can test this with XFree 4 that would be very useful.

PR: 28418, 25958
Tested by: jkh, Christopher Masto <chris@netmonger.net>
MFC after: 2 weeks


94383 10-Apr-2002 alc

o In osigreturn(), restore all of the registers in one place.
o Recent changes to osigreturn() and sigreturn() have made them MPSAFE. Add
a comment to this effect.

Submitted by: bde (bullet #1)
Reviewed by: jhb (bullet #2)


94275 09-Apr-2002 phk

GC various bits and pieces of USERCONFIG from all over the place.


94197 08-Apr-2002 bde

Removed ispc98 sysctl completely. Applications should understand that
ispc98 isn't set if its sysctl doesn't exist. At least make(1) already
understands this.

Approved by: nyan


94151 07-Apr-2002 phk

GC the "dumplo" variable, which is no longer used.

A lot of sys/*/*/machdep.c seems not to be.


93944 06-Apr-2002 nyan

Remove pc98 code.


93823 04-Apr-2002 dillon

Embed a struct vmmeter in the per-cpu structure and add a macro,
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occassional inaccuracy. Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scaleability.

Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.

The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu. Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93794 04-Apr-2002 brian

Back out the previous commit.

In the i386 case, options BOOTP requires options NFS_ROOT as well as
options NFSCLIENT. With *both* the NFS options, a bootpc_init()
prototype is brought in by nfsclient/nfsdiskless.h.

In the ia64 case, it just doesn't work and my change just pushes it
further away from working.

Suggested to be wrong by: bde


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93785 04-Apr-2002 brian

Pre-declare bootpc_init() so that options BOOTP doesn't break the
build in ia64 and i386 due to -Werror.


93717 03-Apr-2002 marcel

Make the kernel dump header endianness invariant by always dumping
in dump byte order (=network byte order). Swap blocksize and dumptime
to avoid extraneous padding on 64-bit architectures. Use CTASSERT
instead of runtime checks to make sure the header is 512 bytes large.
Various style(9) fixes.

Reviewed by: phk, bde, mike


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93496 31-Mar-2002 phk

Here follows the new kernel dumping infrastructure.

Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc. (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed. The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev. To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented. All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record. The header record
is dumped in readable format in the .info file. The kernel
is not saved. Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by: DARPA, NAI Labs


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93461 31-Mar-2002 alc

Implement i386's (o)sigreturn() like the alpha's: Use copyin() to read
the osigcontext or ucontext_t rather than useracc() followed by direct user-
space memory accesses. This reduces (o)sigreturn()'s execution time by 5-
50%.

Submitted by: bde


93334 28-Mar-2002 nyan

Remove unneeded pc98 hack.


93312 28-Mar-2002 obrien

style(9)

Approved by: jake


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93017 23-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93005 23-Mar-2002 takawata

Add bios area range check (lower side).


92890 21-Mar-2002 alc

o Use the MI vm_map_growstack() instead of grow_stack() in trap_pfault()
and trapwrite().
o On i386/pc98, remove the (now) unused grow_stack().


92860 21-Mar-2002 imp

Fix abuses of cpu_critical_{enter,exit} by converting to
intr_{disable,restore} as well as providing an implemenation of
intr_{disable,restore}.

Reviewed by: jake, rwatson, jhb


92846 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removse td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92770 20-Mar-2002 alfred

Remove __P.


92765 20-Mar-2002 alfred

Remove __P.


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92651 19-Mar-2002 alc

Add #include so that the previous change compiles.


92649 19-Mar-2002 alfred

fix perfmon for DEVFS.

PR: kern/36008


92548 18-Mar-2002 alc

Eliminate grow_stack() from (o)sendsig(). If the stack needs to grow,
copyout() will page fault and perform grow_stack() from trap_pfault().
These calls to grow_stack() accomplish nothing.

Reviewed by: bde


92470 17-Mar-2002 alc

o Stop calling useracc() in (o)sendsig() now that we use copyout()
to copy the sigframe to the user's stack. Useracc() takes a non-trivial
amount of time. Eliminating it speeds up signal delivery by 15% or more.
o Update some comments.

Submitted by: bde


92018 10-Mar-2002 luigi

Export a (machine dependent) kernel variable bootdev as
machdep.guessed_bootdev, and add code to sysctl to parse its value
and give a (not necessarily correct) name to the device we booted
from (the main motivation for this code is to use the info in the
PicoBSD boot scripts, and the impact on the kernel is minimal).

NOTE: the information available in bootdev is not always reliable,
so you should not trust it too much. The parsing code is the same
as in boot2.c, and cannot cover all cases -- as it is, it seems to
work fine with floppies and IDE disks recognised by the BIOS. It
_should_ work as well with SCSI disks recognised by the BIOS.
Booting from a CDROM in floppy emulation will return /dev/fd0 (because
this is what the BIOS tells us).
Booting off the network (e.g. with etherboot) leaves bootdev unset so
the value will be printed as "invalid (0xffffffff)".

Finally, this feature might go away at some point, hopefully when we
have a more reliable way to get the same information.

MFC-after: 5 days


91978 10-Mar-2002 alc

Condition the compilation of trapwrite() on I386_CPU.


91893 08-Mar-2002 phk

#include <machine/smp.h> in the SMP case.
don't include <sys/smp.h> at all.

Fallout from: probably something jake did.
Hint by: jhb


91778 07-Mar-2002 jake

Add needed includes of machine/smp.h, remove nested include in sys/smp.h
so that inlines in machine/smp.h can use variables declared in sys/smp.h.


91673 05-Mar-2002 jeff

Add a new variable mp_maxid. This is used so that per cpu datastructures may
be allocated as arrays indexed by the cpu id. Previously the only reliable
way to know the max cpu id was through MAXCPU. mp_ncpus isn't useful here
because cpu ids may be sparsely mapped, although x86 and alpha do not do this.

Also, call cpu_mp_probe much earlier so the max cpu id is known before the VM
starts up. This is intended to help support per cpu queues for the new
allocator, but may be useful elsewhere.

Reviewed by: jake
Approved by: jake


91640 04-Mar-2002 iwasaki

Add generalized power profile code.
This makes other power-management system (APM for now) to be able to
generate power profile change events (ie. AC-line status changes), and
other kernel components, not only the ACPI components, can be notified
the events.

- move subroutines in acpi_powerprofile.c (removed) to kern/subr_power.c
- call power_profile_set_state() also from APM driver when AC-line
status changes
- add call-back function for Crusoe LongRun controlling on power
profile changes for a example


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but
misviewed Bruce's patch.

Requested by: bde


91473 28-Feb-2002 arr

- trap -> trap() in panic() string.
- Translate the message into some sort of understandable english.
- Fix a couple near-by style nits.

Submitted by: bde


91471 28-Feb-2002 silby

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Lukcily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


91460 28-Feb-2002 peter

Fix warnings.. bootpc_init() and related.


91422 27-Feb-2002 arr

- Insert a space in the panic() string in order more clearly show the
message.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91368 27-Feb-2002 peter

Re-fix a pointer/integer warning.


91367 27-Feb-2002 peter

Back out all the pmap related stuff I've touched over the last few days.
There is some unresolved badness that has been eluding me, particularly
affecting uniprocessor kernels. Turning off PG_G helped (which is a bad
sign) but didn't solve it entirely. Userland programs still crashed.


91358 27-Feb-2002 peter

Bandaid for the Uniprocessor kernel exploding. This makes a UP kernel
boot and run (and indeed I am committing from it) instead of exploding
during the int 0x15 call from inside the atkbd driver to get the keyboard
repeat rates.


91353 27-Feb-2002 alfred

clarify panic message


91344 27-Feb-2002 peter

Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
shootdowns in a couple of key places. Do the same for i386. This also
hides some physical addresses from higher levels and has it use the
generic vm_page_t's instead. This will help for PAE down the road.

Obtained from: jake (MI code, suggestions for MD part)


91341 27-Feb-2002 dillon

didn't quite undo the last reversion. This gets it.


91329 26-Feb-2002 dillon

revert compatibility fix temporarily (thought it would not break anything
leaving it in).


91328 26-Feb-2002 dillon

revert last commit temporarily due to whining on the lists.


91322 26-Feb-2002 dillon

Make peter's commit compatible with interrupt-enabled critical_enter()
and exit(), which has already solved the problem in regards to deadlocked
IPI's.


91315 26-Feb-2002 dillon

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


91262 26-Feb-2002 peter

Fix a warning. useracc() should take a const pointer argument.


91260 25-Feb-2002 peter

Work-in-progress commit syncing up pmap cleanups that I have been working
on for a while:
- fine grained TLB shootdown for SMP on i386
- ranged TLB shootdowns.. eg: specify a range of pages to shoot down with
a single IPI, since the IPI is very expensive. Adjust some callers
that used to trigger this inside tight loops to do a ranged shootdown
at the end instead.
- PG_G support for SMP on i386 (options ENABLE_PG_G)
- defer PG_G activation till after we decide what we are going to do with
PSE and the 4MB pages at the start of the kernel. This should solve
some rumored strangeness about stale PG_G entries getting stuck
underneath the 4MB pages.
- add some instrumentation for the fine TLB shootdown
- convert some asm instruction wrappers from functions to inlines. gcc
seems to do a fair bit better with this.
- [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix
this again shortly.

This has been working fairly well for me for a while, but I have tweaked
it again prior to commit since my last major testing round. The only
outstanding problem that I know of is PG_G related, which is why there
is an option for it (not on by default for SMP). I have seen a world
speedups by a few percent (as much as 4 or 5% in one case) but I have
*not* accurately measured this - I am a bit sceptical of these numbers.


91250 25-Feb-2002 peter

Tidy up some warnings


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


90998 20-Feb-2002 peter

Pass me the pointy hat please. Be sure to return a value in a non-void
function. I've been running with this buried in the mountains of compiler
output for about a month on my desktop.


90960 20-Feb-2002 cjc

Fix typos in some comments.

PR: i386/35114
Submitted by: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>


90947 20-Feb-2002 peter

Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t
etc.


90776 17-Feb-2002 deischen

Use struct __ucontext in prototypes and associated functions instead of
ucontext_t. Forward declare struct __ucontext in <sys/signal.h> and
remove reliance on <sys/ucontext.h> being included.

While I'm here, also hide osigcontext types from userland; suggested
by bde.

Namespace pollution noticed by: Kevin Day <toasty@shell.dragondata.com>


90762 17-Feb-2002 nyan

- Split the routine to initialize a bus_space_handle into the separate
function.
- Only access a bus_space_handle if the resource type is SYS_RES_MEMORY or
SYS_RES_IOPORT.
- Add the bus_space_subregion supports.


90748 17-Feb-2002 julian

If the credential on an incoming thread is correct, don't bother
reaquiring it. In the same vein, don't bother dropping the thread cred
when goinf ot userland. We are guaranteed to nned it when we come back,
(which we are guaranteed to do).

Reviewed by: jhb@freebsd.org, bde@freebsd.org (slightly different version)


90718 16-Feb-2002 bde

Don't leave garbage in parts of fpregs in the fxsr case. All callers
(procfs and ptrace) supply kernel stack garbage, so kernel context was
leaked to userland.

Reviewed by: des


90633 13-Feb-2002 bde

Don't confuse a struct with its first member. This fixes:
./@/i386/i386/machdep.c: In function `init386':
./@/i386/i386/machdep.c:1700: warning: assignment from incompatible pointer type


90590 12-Feb-2002 dwmalone

Add an option CPU_ATHLON_SSE_HACK which attempts to enable the SSE
feature bit on newer Athlon CPUs if the BIOS has forgotten to enable
it.

This patch was constructed using some info made available by John
Clemens at http://www.deater.net/john/PavilionN5430.html

Reviewed by: -audit
MFC after: 3 weeks


90589 12-Feb-2002 dwmalone

Move do_cpuid() from a identcpu.c into cpufunc.h.


90562 12-Feb-2002 alc

Remove an unused (but initialized) variable from vmapbuf().


90515 11-Feb-2002 bde

Garbage-collect the "LOCORE" version of MPLOCKED.


90468 10-Feb-2002 kato

Cosmetic changes:
- Collected i486 identification codes in one place like
586 and 686.
- Merged two cases (0x470 and 0x490) for `Enhanced Am486DX4
Write-Back.'
- Replaced `unknown' into `Unknown'.

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


90425 09-Feb-2002 kato

Recognize VIA C3 Samuel 2.

MFC after: 3 days


90372 07-Feb-2002 peter

Attempt to patch up some style bugs introduced in the previous commit


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


90132 03-Feb-2002 bde

Use osigreturn(2) instead of sigreturn(2) plus broken magic for returning
from old signal handlers. This is simpler and faster, and fixes (new)
sigreturn(2) when %eip in the new signal context happens to match the
magic value (0x1d516). 0x1d516 is below the default ELF text section,
so this probably never broken anything in practice.

locore.s:
In addition, don't build the signal trampoline for old signal handlers
when it is not used.

alpha:
Not fixed, but seems to be even less broken in practice due to more
advanced magic. A false match occurs for register #32 in mc_regs[].
Since there is no hardware register #32, a false match is only possible
for direct calls to sigreturn(2) that happen to have the magic number
in the spare mc_regs[32] field.


90128 03-Feb-2002 bde

Improve the change in the previous commit: use a stub for osigreturn()
when it is not really used instead of unconditionalizing all of it.


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up the ia64 stub, since there was no stub before.


89990 30-Jan-2002 bde

Backed out the main part of revs.1.14-16. Don't disable interrupts in
the packet transfer routines, since rev.1.468 of machdep.c does this
better. I'm surprised that disabling interrupts helped much. Disabling
them in the packet receive routine is too late.

Fixed some minor style bugs in rev.1.14.


89989 30-Jan-2002 bde

Backed out the last vestiges of rev.1.51. Don't enter a critical
region in Debugger(), since rev.1.468 of machdep.c does this better.
Other cosmetic backouts.


89988 30-Jan-2002 bde

Cleaned up the 0ldSiG magic check before removing it. Just use fuword()
to fetch the magic word instead of useracc() plus a direct access.
This is more efficient as well as simpler and less incorrect:
- it was inefficent because useracc() takes much longer than just
accessing the data using a correct access method, at least on i386's.
- it was incorrect because direct access is incorrect unless the address
has been mapped. This and nearby direct accesses are mostly handled
better for other arches because they have to be (direct accesses don't
work).
- using magic in sigreturn is still fundamentally broken because false
matches are possible. On i386's, a false match occurs when %eip in a
new signal context happens to equal the magic value. This is not
handled better for other arches.


89980 30-Jan-2002 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.

This was fixed in 1999 but was broken by unconditionalizing PNPBIOS.


89631 22-Jan-2002 peter

List bit 18 (reserved, apparently present on thunderbird cpus)
and bit 19 (athlon XP/MP rev 0x662 and later) for amd_features.

Submitted by: dwcjr


89489 18-Jan-2002 peter

Avoid __func__ string concatenation


89466 17-Jan-2002 bde

Changed the type of pcb_flags from u_char to u_int and adjusted things.
This removes the only atomic operation on a char type in the entire
kernel.


89412 16-Jan-2002 peter

Change <b28> to HTT (Hyperthreading technology). If this flag is set then
cpuid with %eax=1 will return a logical cpu count in bits 16-23 of %ebx.
Bit 29 is actually 'TM' according to AP-485. This signifies the presence
of the thermal control circuit (which I believe can slow the clock down
to reduce core temperature).


89410 16-Jan-2002 peter

Ensure that we set all the %cr0 bits to a known state for the AP's before
they make it through to userland. This should fix the p5-smp problem
without affecting the other cpus (eg: cyrix, see initcpu.c and the special
cache handling for these cpu types).


89195 10-Jan-2002 bde

Clear the single-step flag for signal handlers. This fixes bogus trace
traps on the first instruction of signal handlers.

In trap.c:syscall(), fake a trace trap if the single-step flag was set
on entry to the kernel, not if it will be set on exit from the kernel.
This fixes bogus trace traps after the last instruction of signal handlers.

gdb-4.18 (the version in FreeBSD) still has problems with the program in
the PR. These seem to be due to bugs in gdb and not in FreeBSD, and are
fixed in gdb-5.1 (the distribution version).

PR: 33262
Tested by: k Macy <kip_macy@yahoo.com>
MFC after: 1 day


89175 10-Jan-2002 deischen

Use a spare slot in the machine context for a flags word to indicate
whether the machine context is valid and whether the FPU state is
valid (saved).

Mark the machine context valid before copying it out when sending a
signal.

Approved by: -arch


88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


88900 05-Jan-2002 jhb

Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe. Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer. This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs. Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called. (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha. I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine. PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken. Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by: peter
Tested on: i386, alpha


88896 05-Jan-2002 peter

GC unfinished function selected_proc_ipi(). It is a duplicate of
apic_ipi_singledest() anyway.


88838 03-Jan-2002 peter

Allow a specific setting for pv entries. This avoids the need to guess
(or calculate by hand) the effect of interactions between shpgperproc,
physical ram size, maxproc, maxdsiz, etc.


88744 31-Dec-2001 dillon

Grrr. The tlb code is strewn over 3 files and I misread it. Revert
the last change (it was a NOP), and remove the XXX comments that no longer
apply.


88742 31-Dec-2001 dillon

You know those 'XXX what about SMP' comments in pmap_kenter()? Well,
they were right. Fix both kenter() and kremove() for SMP by ensuring that
the tlb is flushed on other cpu's. This will directly solve random-corruption
panic issues in -stable when it is MFC'd. Better to be safe then sorry, we
can optimize this later.

Original Suspicion by: peter
Maybe MFC: immediately on re's permission


88719 30-Dec-2001 phk

GC an alternate trap_pfault() which has rotted away behind an "#ifdef notyet"
since 21-Mar-95 .


88322 20-Dec-2001 jhb

Introduce a standard name for the lock protecting an interrupt controller
and it's associated state variables: icu_lock with the name "icu". This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP. This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.


88245 20-Dec-2001 peter

Replace a bunch of:
for (pv = TAILQ_FIRST(&m->md.pv_list);
pv;
pv = TAILQ_NEXT(pv, pv_list)) {
with:
TAILQ_FOREACH(pv, &m->md.pv_list, pv_list) {


88240 20-Dec-2001 peter

Fix some whitespace nits, and a minor error that I made in some unused
#ifdef DEBUG code (VM_MAXUSER_ADDRESS vs UPT_MAX_ADDRESS).


88146 18-Dec-2001 julian

In a couple of places, we recalculated addresses we already had in local
pointer variables.


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


88085 17-Dec-2001 jhb

Small cleanups to the SMP code:
- Axe inlvtlb_ok as it was completely redundant with smp_active.
- Remove references to non-existent variable and non-existent file
in i386/include/smp.h.
- Don't perform initializations local to each CPU while holding the
ap boot lock on i386 while an AP bootstraps itself.
- Reorganize the AP startup code some to unify the latter half of the
functions to bring an AP up. Eventually this might be broken out into
a MI function in subr_smp.c.


87902 14-Dec-2001 luigi

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


87721 12-Dec-2001 jhb

Axe an unneeded PCPU_SET(spinlocks, NULL) that I missed earlier.


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87637 11-Dec-2001 peter

Delete some leftover code from a bygone age. We dont have an array of
IdlePTDS anymore and dont to the PTD[MPPTDI] swapping etc.


87620 10-Dec-2001 guido

Add new boot flag to i386 boot: -p.
This flag adds a pausing utility. When ran with -p, during the kernel
probing phase, the kernel will pause after each line of output.
This pausing can be ended with the '.' key, and is automatically
suspended when entering ddb.

This flag comes in handy at systems without a serial port that either hang
during booting or reser.
Reviewed by: (partly by jlemon)
MFC after: 1 week


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


87122 30-Nov-2001 peter

cpuid bit 30 is 'IA64', for when you're running in i386 mode on an ia64
cpu. (This is for either userland apps running in i386 mode on an ia64
OS, or when the cpu is in i386 legacy mode running an i386 OS).


87018 28-Nov-2001 jhb

Clean up some of the gross macros whitespace wise before I fix the asm
constraints.


86486 17-Nov-2001 peter

Fix the non-KSTACK_GUARD case.. It has been broken since the KSE
commit. ptek was not been initialized.


86485 17-Nov-2001 peter

Start bringing i386/pmap.c into line with cleanups that were done to
alpha pmap. In particular -
- pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer.
This is to enable cleaner PAE and x86-64 support down the track sor
that we can change the pd_entry_t/pt_entry_t types to 64 bit entities.
- Terminate "unsigned *ptep, pte" with extreme prejudice and use the
correct pt_entry_t/pd_entry_t types.
- Various other cosmetic changes to match cleanups elsewhere.
- This eliminates a boatload of casts.
- use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places
where we're testing user address space limits. Assuming the page tables
start directly after the end of user space is not a safe assumption.
There is still more to go.


86443 16-Nov-2001 peter

Oops, I accidently merged a whitespace error from the original commit.
(whitespace at end of line in rev 1.264 pmap.c). Fix them all.


86439 16-Nov-2001 peter

Converge/fix some debug code (#if 0'ed on alpha, but whatever)
- use NPTEPG/NPDEPG instead of magic 1024 (important for PAE)
- use pt_entry_t instead of unsigned (important for PAE)
- use vm_offset_t instead of unsigned for va's (important for x86-64)


86408 15-Nov-2001 jhb

- Don't enable interrupts in trap() if we trapped while holding a spin
lock as this usually makes the problem worse.
- If we get a page fault while holding a spin lock, treat it as a fatal
trap and don't even bother calling into the VM since calling into the
VM will panic when trying to lock Giant before we can get a useful
message anyways.


86332 13-Nov-2001 keramida

Replace use of "0" constraints in inline asm with "+" constraints,
when an operand is used both for input and output.

Reviewed by: jhb


86307 12-Nov-2001 keramida

Change constraints to use "+" in inline asm instead of mapping input
to output parameters with "0".

Reviewed by: jhb


86108 05-Nov-2001 phk

GC userconfig after Peter axed it 15 months ago.


85835 01-Nov-2001 iwasaki

Some fix for the recent apm module changes.
- Now that apm loadable module can inform its existence to other kernel
components (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
- Exchange priority of SI_SUB_CPU and SI_SUB_KLD for above purpose.
- Add simple arbitration mechanism for APM vs. ACPI. This prevents
the kernel enables both of them.
- Remove obsolete `#ifdef DEV_APM' related code.
- Add abstracted interface for Powermanagement operations. Public apm(4)
functions, such as apm_suspend(), should be replaced new interfaces.
Currently only power_pm_suspend (successor of apm_suspend) is implemented.

Reviewed by: peter, arch@ and audit@


85806 01-Nov-2001 peter

Skip PG_UNMANAGED pages when we're shooting everything down to try and
reclaim pv_entries. PG_UNMANAGED pages dont have pv_entries to reclaim.

Reported by: David Xu <davidx@viasoft.com.cn>


85793 31-Oct-2001 mjacob

Remove previous revision. smp_started back in subr_smp where it belongs.


85788 31-Oct-2001 mjacob

Make the actual volatile int smp_started live *somewhere*. This is
a temporary fix so that we can compile kernels. I waited 30 minutes
for a response from the person who would likely know, but any longer
is too long to wait with breakage at ToT.


85762 31-Oct-2001 dillon

Don't let pmap_object_init_pt() exhaust all available free pages
(allocating pv entries w/ zalloci) when called in a loop due to
an madvise(). It is possible to completely exhaust the free page list and
cause a system panic when an expected allocation fails.


85761 31-Oct-2001 msmith

Don't try to probe the PnP BIOS if ACPI is active.


85711 30-Oct-2001 jhb

Fix a typo in comment and #ifdef fixes: GRAP_PRIO -> GRAB_PRIO so that
x86 SMP kernels actually boot again to single user mode.

Pointy hat to: jhb
Noticed by: jlemon


85695 29-Oct-2001 bde

Don't set CR0_NE in cpu_setregs() for the SMP case, since setting it
is npx.c's job and setting it here breaks the edit-time option of not
setting it in npx.c. (It is not set in the right places for the SMP
case, but always setting it here is harmless because there isn't even
an edit-time option to not set it.)


85627 28-Oct-2001 jhb

- More whitespace and comment cleanups.
- Remove unused sw1a label. A breakpoint can be set in choosethread() for
the same effect.

Reviewed by: bde
Submitted by: bde (partly)


85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


85491 25-Oct-2001 jhb

Currently no code does a CROSSJUMP() to sw1a, so we don't need a
CROSSJUMPTARGET() for it.

Submitted by: bde


85490 25-Oct-2001 jhb

Use %ecx instead of %ebx for the scratch register while updating %dr7 since
%ecx isn't a call safe register and thus we don't have to save and restore
it.

Submitted by: bde


85489 25-Oct-2001 jhb

- Fix typo in comment from previous revision.
- Fix a bug in the LDT changes where the wrong argument was passed to
set_user_ldt() from cpu_switch(). The bug was passing a pointer to the
ldt, but set_user_ldt() takes a pointer to the process' mdproc structure.

Submitted by: bde


85487 25-Oct-2001 jhb

Whitespace, comment, and string fixes.

Submitted by: bde (mostly)


85449 25-Oct-2001 jhb

Split the per-process Local Descriptor Table out of the PCB and into
struct mdproc.

Submitted by: Andrew R. Reiter <arr@watson.org>
Silence on: -current


85419 24-Oct-2001 jhb

- Clean up the comments slightly here to make them more readable.
- Set the type and trapframe number for the F00F workaround since type
can be used later by sv_transtrap(). Debuggers might also want to look
at the type in the trapframe.

Submitted by: bde (mostly)


85384 23-Oct-2001 jhb

Set the code and signal for the F00F hack fault directly instead of
changing the code in the trapframe and looping back to the top of trap
again.

Tested by: cjc


85373 23-Oct-2001 jlemon

Implement multiple low-level console support.


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


84935 14-Oct-2001 tegge

Change vmapbuf() to use pmap_qenter() and vunmapbuf() to use pmap_qremove().

This significantly reduces the number of TLB shootdowns caused by
vmapbuf/vunmapbuf when performing many large reads from raw disk devices.

Reviewed by: dillon


84934 14-Oct-2001 tegge

Reduce the number of TLB shootdowns caused by a call to pmap_qenter()
from number of pages mapped to 1.

Reviewed by: dillon


84850 12-Oct-2001 jdp

Correct the input/output/clobber specifications for the cpuid
instruction. Stefan Keller <dres@earth.serd.org> noticed that CPU
identification was broken when compiled with -O2, and tracked it
down to the asm statement, which was storing values into memory
without specifying that memory was modified. He submitted a patch
which added "memory" as a clobber, but I refined it further to
arrive at this version.

MFC after: 3 days


84815 11-Oct-2001 jhb

Oops, these already included sys/lock.h, they just did so after
sys/mutex.h which is too late.


84812 11-Oct-2001 jhb

Add missing includes of sys/ktr.h.


84811 11-Oct-2001 jhb

Add missing includes of sys/lock.h.


84733 09-Oct-2001 iedowse

Remove the Xresume* labels from the i386 interrupt handlers; the
code in ipl.s and icu_ipl.s that used them was removed when the
interrupt thread system was committed. Debuggers also knew about
Xresume* because these labels hide the real names of the interrupt
handlers (Xintr*), and debuggers need to special-case interrupt
handlers to get the interrupt frame.

Both gdb and ddb will now use the Xintr* and Xfastintr* symbols to
detect interrupt frames. Fast interrupt frames were never identified
correctly before, so this fixes the problem of the running stack
frame getting lost in a ddb or gdb trace generated from a fast
interrupt - e.g. when debugging a simple infinite loop in the kernel
using a serial console, the frame containing the loop would never
appear in a gdb or ddb trace.

Reviewed by: jhb, bde


84721 09-Oct-2001 robert

Remove an unneeded variable declaration and statement.

Approved by: jake


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84615 07-Oct-2001 nyan

Rewrite the pc98 bus_space stuff.

The type of bus_space_tag_t is now a pointer to bus_space_tag structure,
and the bus_space_tag structure saves pointers to functions for direct
access and relocate access.

Added bsh_bam member to the bus_space_handle structure, it saves access
method either direct access or relocate access which is called by
bus_space_* functions.

Added the mecia device support. If the bs_da and bs_ra in bus tag are set
NEPC_io_space_tag and NEPC_mem_space_tag respectively, new bus_space stuff
changes the register of mecia automatically for 16bit access.

Obtained from: NetBSD/pc98


84572 06-Oct-2001 peter

Fix a warning. (unused p if not INVARIANTS)


84381 02-Oct-2001 mjacob

Fix problem where a user buffer outside of the area being tested
will be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


83972 26-Sep-2001 rwatson

o Modify i386_set_ioperm() to use securelevel_gt() instead of
direct securelevel variable checks.

Obtained from: TrustedBSD Project


83971 26-Sep-2001 rwatson

o Modify device open access control for /dev/mem and friends to use
securelevel_gt() instead of direct securelevel variable checks.

Obtained from: TrustedBSD Project


83660 19-Sep-2001 peter

Reserve an extra 16 bytes in case we have to grow the trapframe into
a vm86trapframe for switching to vm86 [unlikely] while exiting.
I lost this when doing the pcb move that went in with the KSE commit.

Reviewed by: jake


83659 19-Sep-2001 peter

Fix a mistake I made with the pcb movement relative to the stack in the
KSE patch. We need to leave the 16 bytes here for enabling the trapframe
to be converted to a vm86trapframe if we're switching *to* a vm86 context.


83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83275 10-Sep-2001 peter

gcc-3 has objections about the bluetrap6 and bluetrap13 inline asm
functions. Apparently multi-line string asm arguments are deprecated.


83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


83163 06-Sep-2001 jhb

Call sendsig() with the proc lock held and return with it held.


83093 05-Sep-2001 jlemon

Remove superfluous statement.


83051 05-Sep-2001 yokota

Rework the ISA PnP driver pnp and the PnP resource parser to fix
the following bugs.

- When constructing a resource configuration, respect the order
in which resource descriptors are read, in order to establish
the correct mapping between the descriptors and configuration
registers.
"Plug and Play ISA Specification, Version 1.0a", Sec 4.6.1, May 5,
1994. "Clarifications to the Plug and Play ISA Specification,
Version 1.0a", Sec 6.2.1, Dec. 10, 1994.

- Do not ignore null (empty) descriptors; they are valid descriptors
acting as filler.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a",
Sec 6.2.1.

- Correctly set up logical device configuration registers for null
resources.
"Clarifications to the Plug and Play ISA Specification, Version 1.0a"

- Handle null resources properly in the resource allocator for the
ISA bus.


82971 04-Sep-2001 iwasaki

Reenable RTC interrupts after wakeup. Some laptops have a problem
with system statistics monitoring tools (such as systat, vmstat...)
because of stopping RTC interrupts generation.
Restore all the timers (RTC and i8254) atomically.

Reviewed by: bde
MFC after: 1 week


82957 04-Sep-2001 peter

Mostly cosmetic. Move various variables from .s files to .c files so that
gdb generates debug info for them.


82939 04-Sep-2001 peter

Zap #if 0'ed map init code that got moved to the MI area.
Convert the powerpc tree to use the common code.


82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


82624 31-Aug-2001 peter

Do a style cleanup pass for the pmap_{new,dispose,etc}_proc() functions
to get them closer to the KSE tree. I will do the other $machine/pmap.c
files shortly.


82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


82555 30-Aug-2001 msmith

Add ACPI attachments.


82394 27-Aug-2001 peter

There is nothing more embarresing than having three goes at correcting
typos in the same paragraph. s/in in/in/

Submitted by: iedowse


82393 27-Aug-2001 peter

Enable hardwiring of things like tunables from embedded enironments
that do not start from loader(8).


82366 26-Aug-2001 peter

I missed a typo in the last commit: s/whach/which/

Submitted by: bde


82310 25-Aug-2001 julian

Add another comment.
check for 'teh's this time..


82309 25-Aug-2001 peter

Optionize UPAGES for the i386. As part of this I split some of the low
level implementation stuff out of machine/globaldata.h to avoid exposing
UPAGES to lots more places. The end result is that we can double
the kernel stack size with 'options UPAGES=4' etc.

This is mainly being done for the benefit of a MFC to RELENG_4 at some
point. -current doesn't really need this so much since each interrupt
runs on its own kstack.


82308 25-Aug-2001 peter

s/teh/the/


82307 25-Aug-2001 julian

Add an explanatory note that would have saved me an hour or two
of confusion had it been there when I started reading the code..


82279 24-Aug-2001 jhb

Remove references to the old giant kernel lock in various comments.


82262 24-Aug-2001 peter

Export the actual KERNBASE to the symbol table. We can use nlist() to get
this without having to second guess it in userland.


82261 24-Aug-2001 peter

Move cpu_fxsr definition to C code (so debug info is generated) and where
it is easily #ifdef'ed so that we dont miss unintentional references to it.


82165 23-Aug-2001 peter

Fix a comment error that was fixed in the pc98 version. hw.maxmem is
really hw.physmem.


82157 23-Aug-2001 peter

Dont add UPAGES to the %cs segment limit. There is nothing there except
page tables.


82140 22-Aug-2001 iwasaki

Move CR4.PGE enabling code after paging is enabled via CR0.PG based on
the description (2.5. CONTROL REGISTERS) of Intel developer's manual at:
ftp://download.intel.com/design/PentiumII/manuals/24319202.pdf

Reviewed by: peter, bde, tlambert2@mindspring.com
Pointed-out by: "Shin'ya Kumabuchi" <kumabu@t3.rim.or.jp>
MFC after: 1 week


82127 22-Aug-2001 dillon

Move most of the kernel submap initialization code, including the
timeout callwheel and buffer cache, out of the platform specific areas
and into the machine independant area. i386 and alpha adjusted here.
Other cpus can be fixed piecemeal.

Reviewed by: freebsd-smp, jake


82121 22-Aug-2001 peter

Introduce two new sysctl's.. vm.kvm_size and vm.kvm_free. These are
purely informational and can give some advance indications of tuning
problems. These are i386 only for now as it seems that the i386 is
the only one suffering kvm pressure.


82118 21-Aug-2001 jhb

Push down Giant some in trap_pfault() so we don't grab Giant around
trap_fatal() to make restarting from panic's slightly easier. Before if
one did 'w 0 0' in ddb, the longjmp in ddb inside of trap_fatal() would
result in Giant being held (or recursed one level deeper) which led to
problems later on. You can now drop to teh debugger, do 'w 0 0', and
continue w/o a problem.


82031 21-Aug-2001 dillon

Fix bug in physmem_est calculation - the kernel_map size was not being
converted into pages.

Fix bug in maxbcache calculation, nbuf must be tested against maxbcache
rather then physmem_est.

Obtained from: bde


82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


81933 20-Aug-2001 dillon

Limit the amount of KVM reserved for the buffer cache and for swap-meta
information. The default limits only effect machines with > 1GB of ram
and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX
and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and
kern.maxbcache. This has the effect of leaving more KVM available for
sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad
adds memory to a machine and then sees the kernel panic on boot due to
running out of KVM.

Also change the default swap-meta auto-sizing calculation to allocate half
of what it was previously allocating. The prior defaults were way too high.
Note that we cannot afford to run out of swap-meta structures so we still
stay somewhat conservative here.


81879 18-Aug-2001 peter

There is nothing special that requires SSE to be only on 686 class cpus.
This enables 586-only SMP kernels to compile again.

Problem reported by: Jacek Jedrzejczak <jacol@ids.gda.pl>


81711 15-Aug-2001 wpaul

Teach bus_dmamem_free() about contigfree(). This is a bit of a hack,
but it's better than the buggy behavior we have now. If we contigmalloc()
buffers in bus_dmamem_alloc(), then we must configfree() them in
bus_dmamem_free(). Trying to free() them is wrong, and will cause
a panic (at least, it does on the alpha.)

I tripped over this when trying to kldunload my busdma-ified if_rl
driver.


81704 15-Aug-2001 jhb

Whitespace fixes to make this mostly fit in 80 columns.


81584 13-Aug-2001 bde

Use interrupt gates instead of trap gates for breakpoint and trace
traps, so that ddb can keep control (almost) no matter how it is
entered. This breaks time-critical interrupts while the system is
stopped in ddb, but I haven't noticed any significant problems except
that applications become confused about the time. Lost time will be
adjusted for later. Anyway, the half-baked disabling of interrupts in
Debugger() gives the same problems for the usual way of entering ddb.


81583 13-Aug-2001 bde

Removed he BPTTRAP() macro and its use. It was intended for restoring
bug for bug compatibility to ddb trap handlers after fixing the debugger
trap gates to be interrupt gates, but the fix was never committed. Now
I want the fix to apply to ddb.


81544 12-Aug-2001 iwasaki

Fix some trivial bugs.
- fix segment limit mis-calculation for GCODE_SEL, GDATA_SEL, GPRIV_SEL,
LUCODE_SEL and LUDATA_SEL.
- move `loader(8) metadata' related printf() after cninit().
- use atop macro (address to pages) for segment limit calculation
instead of i386_btop macro (bytes to pages).
- fix style bugs for the declarations of ints.

Reviewed by: bde, msmith (and arch & audit ML)


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


80431 27-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


80421 26-Jul-2001 peter

Call the early tunable setup functions as soon as kern_envp is available.
Some things depend on hz being set not long after this.


80399 26-Jul-2001 bmilekic

- Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.

This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob


79995 19-Jul-2001 jlemon

Rename vm86pcb_lock -> vm86_lock to correctly reflect its function.

Requested by: jhb


79964 19-Jul-2001 jlemon

Expand the range of the vm86pcb_lock so that it protects manipulations
to the global vm86 page table as well.

Spotted by: Kazu


79893 19-Jul-2001 bsd

swtch.s: During context save, use the correct bit mask for clearing
the non-reserved bits of dr7.

During context restore, load dr7 in such a way as to not
disturb reserved bits.

machdep.c: Don't explicitly disallow the setting of the reserved bits
in dr7 since we now keep from setting them when we load dr7
from the PCB.

This allows one to write back the dr7 value obtained from
the system without triggering an EINVAL (one of the
reserved bits always seems to be set after taking a trace
trap).

MFC after: 7 days


79885 19-Jul-2001 kris

Quiet a variable format-string warning.

MFC after: 1 week


79662 13-Jul-2001 sobomax

Unbroke kernel if I686_CPU is not defined.


79627 12-Jul-2001 peter

Do not depend on pcb_savefpu backwards compat #define.


79623 12-Jul-2001 peter

Forgot this fix from another tree. make enable_sse() a real prototype.


79611 12-Jul-2001 peter

Move init_sse() out of the "GenuineIntel" section, my AthlonMP system
has it, for example, and it works fine.


79609 12-Jul-2001 peter

Activate SSE/SIMD. This is the extra context switching support that
we are required to do if we let user processes use the extra 128 bit
registers etc.

This is the base part of the diff I got from:
http://www.issei.org/issei/FreeBSD/sse.html
I believe this is by: Mr. SUZUKI Issei <issei@issei.org>
SMP support apparently by: Takekazu KATO <kato@chino.it.okayama-u.ac.jp>
Test code by: NAKAMURA Kazushi <kaz@kobe1995.net>, see
http://kobe1995.net/~kaz/FreeBSD/SSE.en.html

I have fixed a couple of style(9) deviations. I have some followup
commits to fix a couple of non-style things.


79573 11-Jul-2001 bsd

Add 'hwatch' and 'dhwatch' ddb commands analogous to 'watch' and
'dwatch'. The new commands install hardware watchpoints if supported
by the architecture and if there are enough registers to cover the
desired memory area.

No objection by: audit@, hackers@

MFC after: 2 weeks


79418 08-Jul-2001 julian

A set of changes to reduce the number of include files the kernel
takes from /usr/include. I cannot check them on alpha.. (will try beast)

Briefly looked at by: Warner Losh <imp@harmony.village.org>


79265 05-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependant sections to a
machine-independant source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


79153 03-Jul-2001 tmm

Make the code to read the kernel message buffer via sysctl machine-
independent and rename the corresponding sysctls from machdep.msgbuf and
machdep.msgbuf_clear (i386 only) to kern.msgbuf and kern.msgbuf_clear.


79137 03-Jul-2001 iwasaki

Add Transmeta Crusoe LongRun support.

Submitted by: Tamotsu HATTORI <athlete@kta.att.ne.jp>
Reviewed by: arch@ folks
MFC after: 1 week


79124 03-Jul-2001 jhb

Quiet warning by removing ast() prototype.

Forgotten by: jhb (me)


79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78981 29-Jun-2001 imp

Remove cruft from old bus.


78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


78946 29-Jun-2001 jhb

Grab Giant around trap_pfault() for now.


78908 28-Jun-2001 jhb

Get kernel profiling on SMP systems closer to working by replacing the
mcount spin mutex with a very simple non-recursive spinlock implemented
using atomic operations.


78903 28-Jun-2001 bsd

Provide access to the IA32 hardware debug registers from the ddb
kernel debugger. Proper use of these registers allows setting
hardware watchpoints for use in kernel debugging.

MFC after: 2 weeks


78798 26-Jun-2001 kato

Recognize FC-PGA2 Pentium III (Tualatin).


78760 25-Jun-2001 dfr

Add code to detect Transmeta Crusoe cpus.


78636 22-Jun-2001 jhb

- Grab the proc lock around CURSIG and postsig(). Don't release the proc
lock until after grabbing the sched_lock to avoid CURSIG racing with
psignal.
- Don't grab Giant for addupc_task() as it isn't needed.

Reported by: tegge (signal race), bde (addupc_task a while back)


78631 22-Jun-2001 peter

Make the hw.physmem and hw.usermem variables unsigned so that they dont
come up as negative on machines with >2GB ram.


78427 18-Jun-2001 jhb

Initialize mutexes needed early on all in the same place so that the
startup routine more closely matches that of alpha and ia64. At some
point the common mutexes shared across all platforms probably should move
into sys/kern_mutex.c.


78426 18-Jun-2001 jhb

- Add support for decoding syscall names. (Brought over from the new alpha
trace code that was brought over from NetBSD.)
- Check for "syscall_with_err_pushed" as the label prior to a syscall trap
frame rather than "Xlcall_syscall" and "Xint0x80_syscall". We don't
have a valid trapframe during the short range of code that those two
symbols now cover.
- Simplify db_next_frame() to avoid duplicating the code for the different
trap frame types.
- Don't try to trace a swapped-out process. (Brought over from NetBSD via
the new alpha trace code.)


78425 18-Jun-2001 jhb

Include sys/pcpu.h to get the prototype for globaldata_register() to quiet
a warning.


78135 12-Jun-2001 peter

Hints overhaul:
- Replace some very poorly thought out API hacks that should have been
fixed a long while ago.
- Provide some much more flexible search functions (resource_find_*())
- Use strings for storage instead of an outgrowth of the rather
inconvenient temporary ioconf table from config(). We already had a
fallback to using strings before malloc/vm was running anyway.


77796 06-Jun-2001 jhb

Don't hold sched_lock across addupc_task().

Reported by: David Taylor <davidt@yadt.co.uk>
Submitted by: bde


77582 01-Jun-2001 tmm

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by: bde
Reviewed by: bde (some time ago)


77502 30-May-2001 jhb

Quiet warnings by adding a prototype for set_user_ldt_rv() and making it
conditional on #ifdef SMP.


77486 30-May-2001 jhb

We can't grab the sched_lock in set_user_ldt() because when it is called
from cpu_switch(), curproc has been changed, but the sched_lock owner will
not be updated until we return to mi_switch(), thus we deadlock against
ourselves. As a workaround, push the acquire and release of sched_lock out
to the callers of set_user_ldt(). Note that we can't use a mtx_assert() in
set_user_ldt for the same reason.

Sleuting by: tmm
Tested by: tmm, dougb


77097 23-May-2001 jhb

Don't acquire Giant just to call trap_fatal(), we are about to panic
anyway so we'd rather see the printf's then block if the system is
hosed.


77082 23-May-2001 alfred

pmap_mapdev needs the vm_mtx, aquire it if not already locked


77031 23-May-2001 ru

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


77015 22-May-2001 bde

Convert npx interrupts into traps instead of vice versa. This is much
simpler for npx exceptions that start as traps (no assembly required...)
and works better for npx exceptions that start as interrupts (there is
no longer a problem for nested interrupts).

Submitted by: original (pre-SMPng) version by luoqi


76947 22-May-2001 jhb

Remove a few more spl's I missed earlier.

Reported by: Michael Harnois <mdharnois@home.com>
Pointy hat: me


76941 21-May-2001 jhb

Sort includes.


76939 21-May-2001 jhb

Axe unneeded spl()'s.


76904 20-May-2001 bde

Use a critical region to protect pushing of curproc's npx state to
curpcb in vm86_bioscall(). I don't know if the state is ever in the
npx at that point.


76903 20-May-2001 bde

Use a critical region to protect saving of the npx state in savectx().
Not doing this was fairly harmless because savectx() is only called
for panic dumps and the bug could at worse reset the state.

savectx() is still missing saving of (volatile) debug registers, and
still isn't called for core dumps.


76827 19-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


76770 17-May-2001 jhb

- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT.
- Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be
toggled after boot.
- Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just
afer the display of the copyright message instead of doing it by hand in
three MD places.


76769 17-May-2001 jhb

Use NHWI instead of APIC_IMEN_BITS.


76659 16-May-2001 jhb

Lock the procfs functions for doing a single step and reading/writing
registers better. Hold sched_lock not only for checking the flag but
also while performing the actual operation to ensure the process doesn't
get swapped out by another CPU while we the operation is being performed.


76650 15-May-2001 jhb

Remove unneeded includes of sys/ipl.h and machine/ipl.h.


76546 13-May-2001 bde

Use a critical region to protect pushing of the parent's npx state to the
pcb for fork(). It was possible for the state to be saved twice when an
interrupt handler saved it concurrently. This corrupted (reset) the state
because fnsave has the (in)convenient side effect of doing an implicit
fninit. Mundane null pointer bugs were not possible, because we save to
an "arbitrary" process's pcb and not to the "right" place (npxproc).

Push the parent's %gs to the pcb for fork(). Changes to %gs before
fork() were not preserved in the child unless an accidental context
switch did the pushing. Updated the list of pcb contents which is
supposed to inhibit bugs like this. pcb_dr*, pcb_gs and pcb_ext were
missing. Copying is correct for pcb_dr*, and pcb_ext is already
handled specially (although XXX'ly).

Reducing the savectx() call to an npxsave() call in rev.1.80 was a
mistake. The above bugs are duplicated in many places, including in
savectx() itself.

The arbitraryness of the parent process pointer for the fork()
subroutines, the pcb pointer for savectx(), and the save87 pointer
for npxsave(), is illusory. These functions don't work "right" unless
the pointers are precisely curproc, curpcb, and the address of npxproc's
save87 area, respectively, although the special context in which they
are called allows savectx(&dumppcb) to sort of work and npxsave(&dummy)
to work. cpu_fork() just doesn't work unless the parent process
pointer is curproc, or the caller has pushed %gs to the pcb, or %gs
happens to already be in the pcb.


76525 12-May-2001 deischen

Revert part of last commit. Instead of using %fs for KSD/TSD, we'll
follow Linux' convention and use %gs. This adds back the setting of
%fs to a sane value in sendsig(). The value of %gs remains preserved
to whatever it was in user context.


76494 11-May-2001 jhb

Simplify the vm fault trap handling code a bit by using if-else instead of
duplicating code in the then case and then using a goto to jump around
the else case.


76440 10-May-2001 jhb

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake


76434 10-May-2001 jhb

- Use sched_lock and critical regions to ensure that LDT updates are thread
safe from preemption and concurrent access to the LDT.
- Move the prototype for i386_extend_pcb() to <machine/pcb_ext.h>.

Reviewed by: silence on -hackers


76298 06-May-2001 deischen

When setting up the frame to invoke a signal handler, preserve the
%fs and %gs registers instead of setting them to known sane values.
%fs is going to be used for thread/KSE specific data by the new
threads library; we'll want it to be valid inside of signal handlers.

According to bde, Linux preserves the state of %fs and %gs when setting
up signal handlers, so there is precedent for doing this.

The same changes should be made in the Linux emulator, but when made,
they seem to break (at least one version of) the IBM JDK for Linux
(reported by drew).

Approved by: bde


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


76089 28-Apr-2001 jhb

Add in a missing call to forward_hardclock() in the SMP case.

Submitted by: bde


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


76031 26-Apr-2001 jake

Remove a leading underscore that prevented I386_CPU kernels from
compiling.

Submitted by: Alexander N. Kabaev <ak03@gte.com>
PR: kern/26858


75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


75724 20-Apr-2001 jhb

Make the ap_boot_mtx mutex static.


75723 20-Apr-2001 jhb

Split up the db_printf's for 'show pcpu' so that we only output at most one
line for each db_printf(). Also, just use spaces to line the columns up
rather than trying to be fancy with tabs.


75570 17-Apr-2001 jhb

Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.


75421 11-Apr-2001 jhb

Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by: bde


75396 10-Apr-2001 jhb

Remove the old APIC I/O higher level IPI API in favor of the newer MI
API for IPI's that isn't tied to the Intel APIC. MD code can still use
the apic_ipi() function or dink with the apic directly if needed to send
MD IPI's.


75393 10-Apr-2001 jhb

Remove the BETTER_CLOCK #ifdef's. The code is on by default and is here
to stay for the foreseeable future.

OK'd by: peter (the idea)


75392 10-Apr-2001 jhb

Add an MI API for sending IPI's. I used the same API present on the alpha
because:
- it used a better namespace (smp_ipi_* rather than *_ipi),
- it used better constant names for the IPI's (IPI_* rather than
X*_OFFSET), and
- this API also somewhat exists for both alpha and ia64 already.


75357 09-Apr-2001 jhb

- One can now specify the decimal pid of a process to trace as a parameter.
Since pid's are not in the kernel address space, this doesn't conflict
with the funcionality of specifying an arbitrary frame pointer to the
trace command.
- If the first function of a backtrace maps to fork_trampoline, then this
is a newly fork'd process that has not been executed yet, so just print
out the first frame and then return for that case.
- Lower the default count from 65535 to 1024. ddb doesn't trace into
userland, and if the stack gets hosed and starts looping it's less
annoying.


75274 06-Apr-2001 jhb

Add a new ddb command 'show pcpu' which lists some of the per-cpu data.
Specifically, the cpuid, curproc, curpcb, npxproc, and idleproc members.
Also, if witness is compiled into the kernel, then a list of all the spin
locks held by this CPU is displayed. By default the information for the
current CPU is displayed, but a decimal cpu id may be specified as a
parameter to obtain information on a specific CPU.


75256 06-Apr-2001 jhb

Axe the per-cpu variable witness_spin_check as it was replaced by the
per-cpu spinlocks list.


74927 28-Mar-2001 jhb

Convert the allproc and proctree locks from lockmgr locks to sx locks.


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.


74903 28-Mar-2001 jhb

Switch from save/disable/restore_intr() to critical_enter/exit().


74902 28-Mar-2001 jhb

Catch up to the mtx_saveintr -> mtx_savecrit change.


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74738 24-Mar-2001 obrien

Fix a problem where we were switching npxproc from underneath processes
running in process context in order to run interrupt handlers. This
caused a big smashing of the stack on AMD K6, K5 and Intel Pentium (ie, P5)
processors because we are using npxproc as a flag to indicate whether
the state has been pushed onto the stack.

Submitted by: bde


74670 23-Mar-2001 tmm

Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and
hw.intrcnt).

Approved by: rwatson


74283 15-Mar-2001 peter

Kill the 4MB kernel limit dead. [I hope :-)].
For UP, we were using $tmp_stk as a stack from the data section. If the
kernel text section grew beyond ~3MB, the data section would be pushed
beyond the temporary 4MB P==V mapping. This would cause the trampoline
up to high memory to fault. The hack workaround I did was to use all of
the page table pages that we already have while preparing the initial
P==V mapping, instead of just the first one.
For SMP, the AP bootstrap process suffered the same sort of problem and
got the same treatment.

MFC candidate - this breaks on 4.x just the same..

Thanks to: Richard Todd <rmtodd@ichotolot.servalan.com>


73936 07-Mar-2001 jhb

Unrevert the pmap_map() changes. They weren't broken on x86.

Sense beaten into me by: peter


73931 07-Mar-2001 jhb

- Release Giant a bit earlier on syscall exit.
- Don't try to grab Giant before postsig() in userret() as it is no longer
needed.
- Don't grab Giant before psignal() in ast() but get the proc lock instead.


73929 07-Mar-2001 jhb

Grab the process lock while calling psignal and before calling psignal.


73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


73903 07-Mar-2001 jhb

Back out the pmap_map() change for now, it isn't completely stable on the
i386.


73862 06-Mar-2001 jhb

- Rework pmap_map() to take advantage of direct-mapped segments on
supported architectures such as the alpha. This allows us to save
on kernel virtual address space, TLB entries, and (on the ia64) VHPT
entries. pmap_map() now modifies the passed in virtual address on
architectures that do not support direct-mapped segments to point to
the next available virtual address. It also returns the actual
address that the request was mapped to.
- On the IA64 don't use a special zone of PV entries needed for early
calls to pmap_kenter() during pmap_init(). This gets us in trouble
because we end up trying to use the zone allocator before it is
initialized. Instead, with the pmap_map() change, the number of needed
PV entries is small enough that we can get by with a static pool that is
used until pmap_init() is complete.

Submitted by: dfr
Debugging help: peter
Tested by: me


73586 05-Mar-2001 jhb

Don't enable interrupts before calling sched_ithd for threaded interrupts.

Tested by: obrien


73017 25-Feb-2001 peter

Make the kernel actually compile and link under a.out, using
gcc -aout -mno-underscores. The bioscall.s tweak is not an a.out
requirement really, but to work around the bugs in the antique version of
gas that used for a.out. Makefile hacks are all that is needed to
get an a.out kernel. There is no telling if it will work though.
This is little more than an academic curiosity anyway since all it is
good for is situations where the boot code is hard wired, eg: rom
bootstraps (such as the gnat box).

GENERIC:
...
size -aout kernel ; chmod 755 kernel
text data bss dec hex
3051520 368640 198688 3618848 373820


73011 25-Feb-2001 jake

Remove the leading underscore from all symbols defined in x86 asm
and used in C or vice versa. The elf compiler uses the same names
for both. Remove asnames.h with great prejudice; it has served its
purpose.

Note that this does not affect the ability to generate an aout kernel
due to gcc's -mno-underscores option.

moral support from: peter, jhb


73001 25-Feb-2001 jake

- Rename the lcall system call handler from Xsyscall to Xlcall_syscall
to be more like Xint0x80_syscall and less like c function syscall().
- Reduce code duplication between the int0x80 and lcall handlers by
shuffling the elfags into the right place, saving the sizeof the
instruction in tf_err and jumping into the common int0x80 code.

Reviewed by: peter


72930 23-Feb-2001 peter

Activate USER_LDT by default. The new thread libraries are going to
depend on this. The linux ABI emulator tries to use it for some linux
binaries too. VM86 had a bigger cost than this and it was made default
a while ago.

Reviewed by: jhb, imp


72917 22-Feb-2001 jhb

The p_md.md_regs member of proc is used in signal handling to reference
the the original trapframe of the syscall, trap, or interrupt that entered
the kernel. Before SMPng, ast's were handled via a psuedo trap at the
end of doerti. With the SMPng commit, ast's were broken out into a
separate ast() function that was called from doreti to match the behavior
of other architectures. Unfortunately, when this was done, the
p_md.md_regs member of curproc was not updateda in ast(), thus when
signals are handled by userret() after an interrupt that returns to
userland, we end up using a stale trapframe that will result in the
registers from the old trapframe overwriting the real trapframe and
smashing all the registers right before we return to usermode. The saved
%cs:%eip from where we were in usermode are saved in the trapframe for
example.


72911 22-Feb-2001 jhb

- Change ast() to take a pointer to a trapframe like other architectures.
- Don't use an atomic operation to update cnt.v_soft in ast(). This is
the only place the variable is written to, and sched_lock is always
held when it is written, so it is already protected and the mutex release
of sched_lock asserts a memory barrier that ensures the value will be
updated in a timely fashion.


72900 22-Feb-2001 jhb

- Use TRAPF_PC() on the alpha to acess the PC in the trap frame.
- Don't hold sched_lock around addupc_task() as this apparently breaks
profiling badly due to sched_lock being held across copyin().

Reported by: bde (2)


72746 20-Feb-2001 jhb

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.


72700 19-Feb-2001 bde

Removed all traces of T_ASTFLT (except for gaps where it was). It became
unused except in dead code when ast() was split off from trap().


72683 19-Feb-2001 bde

Changed the aston() family to operate on a specified process instead of
always on curproc. This is needed to implement signal delivery properly
(see a future log message for kern_sig.c).

Debogotified the definition of aston(). aston() was defined in terms
of signotify() (perhaps because only the latter already operated on
a specified process), but aston() is the primitive.

Similar changes are needed in the ia64 versions of cpu.h and trap.c.
I didn't make them because the ia64 is missing the prerequisite changes
to make astpending and need_resched per-process and those changes are
too large to make without testing.


72678 19-Feb-2001 bde

Fixed style bugs in clock.c rev.1.164 and cpu.h rev.1.52-1.53 -- declare
tsc_present in the right places (together with other variables of the
same linkage), and don't use messy ifdefs just to avoid exporting it in
some cases.


72640 18-Feb-2001 asmodai

Preceed/preceeding are not english words. Use precede or preceding.


72376 12-Feb-2001 jake

Implement a unified run queue and adjust priority levels accordingly.

- All processes go into the same array of queues, with different
scheduling classes using different portions of the array. This
allows user processes to have their priorities propogated up into
interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
32. We used to have 4 separate arrays of 32 queues each, so this
may not be optimal. The new run queue code was written with this
in mind; changing the number of run queues only requires changing
constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter. This
is intended to be used to create per-cpu run queues. Implement
wrappers for compatibility with the old interface which pass in
the global run queue structure.
- Group the priority level, user priority, native priority (before
propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
This was used to detect when a process' priority had lowered and
it should yield. We now effectively yield on every interrupt.
- Activate propogate_priority(). It should now have the desired
effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
idle loop. It interfered with propogate_priority() because
the idle process needed to do a non-blocking acquire of Giant
and then other processes would try to propogate their priority
onto it. The idle process should not do anything except idle.
vm_page_zero_idle() will return in the form of an idle priority
kernel thread which is woken up at apprioriate times by the vm
system.
- Update struct kinfo_proc to the new priority interface. Deliberately
change its size by adjusting the spare fields. It remained the same
size, but the layout has changed, so userland processes that use it
would parse the data incorrectly. The size constraint should really
be changed to an arbitrary version number. Also add a debug.sizeof
sysctl node for struct kinfo_proc.


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72334 10-Feb-2001 jake

Clear the reschedule flag after finding it set in userret(). This
used to be in cpu_switch(), but I don't see any difference between
doing it here.


72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


72240 09-Feb-2001 jhb

Catch up to changes to inthand_add().


72239 09-Feb-2001 jhb

Use the MI ithread helper functions in the x86 interrupt code.


72238 09-Feb-2001 jhb

- Catch up to the new swi API changes:
- Use swi_* function names.
- Use void * to hold cookies to handlers instead of struct intrhand *.
- In sio.c, use 'driver_name' instead of "sio" as the name of the driver
lock to minimize diffs with cy(4).


72226 09-Feb-2001 jhb

Move the initailization of the proc lock for proc0 very early into the MD
startup code.


72225 09-Feb-2001 jhb

Woops, remove an obsolete reference to gd_cpu_lockid.


72220 09-Feb-2001 jhb

Remove unused forward_irq counters.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72148 08-Feb-2001 jhb

Don't enable interrupts for a kernel breakpoint or trace trap. Otherwise,
this negates the explicit disabling of interrupts when entering the
debugger in Debugger().


72145 07-Feb-2001 jhb

When SMPng was first committed, we removed 'cpl' from the interrupt
frame. Teach ddb about this as there is one less word for it to skip
over when finding a trapframe on the interrupt frame stack.


72091 06-Feb-2001 asmodai

Fix typo: seperate -> separate.

Seperate does not exist in the english language.


72011 04-Feb-2001 peter

Clean up some leftovers from the root mount cleanup that was done some
time ago. FFS_ROOT and CD9660_ROOT are obsolete.


71990 04-Feb-2001 phk

Remove the LABPC driver.

Doesn't work, no maintainer, more promising code exists elsewhere.


71983 04-Feb-2001 dillon

This commit represents work mainly submitted by Tor and slightly modified
by myself. It solves a serious vm_map corruption problem that can occur
with the buffer cache when block sizes > 64K are used. This code has been
heavily tested in -stable but only tested somewhat on -current. An MFC
will occur in a few days. My additions include the vm_map_simplify_entry()
and minor buffer cache boundry case fix.

Make the buffer cache use a system map for buffer cache KVM rather then a
normal map.

Ensure that VM objects are not allocated for system maps. There were cases
where a buffer map could wind up with a backing VM object -- normally
harmless, but this could also result in the buffer cache blocking in places
where it assumes no blocking will occur, possibly resulting in corrupted
maps.

Fix a minor boundry case in the buffer cache size limit is reached that
could result in non-optimal code.

Add vm_map_simplify_entry() calls to prevent 'creeping proliferation'
of vm_map_entry's in the buffer cache's vm_map. Previously only a simple
linear optimization was made. (The buffer vm_map typically has only a
handful of vm_map_entry's. This stabilizes it at that level permanently).

PR: 20609
Submitted by: (Tor Egge) tegge


71880 31-Jan-2001 peter

Remove count for NSIO. The only places it was used it were incorrect.
(alpha-gdbstub.c got sync'ed up a bit with the i386 version)


71818 30-Jan-2001 peter

Remove some leftovers from the CMAP* stuff in globaldata and the
BSP and AP startup.


71817 30-Jan-2001 peter

Remove unused GD_CPU_LOCKID, GD_OTHER_CPUS, PS_IDLESTACK and
PS_IDLESTACK_TOP


71816 30-Jan-2001 jhb

Remove unnecessary locking to protect the p_upages_obj and p_addr
pointers.


71797 29-Jan-2001 peter

Convert mca (microchannel bus support) from something that we count
(bogus) to something that we test for the presence of.


71785 29-Jan-2001 peter

Send "#if NISA > 0" to the bit-bucket and replace it with an option.
These were compile-time "is the isa code present?" tests and not
'how many isa busses' tests.


71780 29-Jan-2001 peter

Gag. These compiled because I had a stray "eisa.h" in my config dir.


71779 29-Jan-2001 peter

Remove stray #include "isa.h"


71776 29-Jan-2001 peter

change 'count eisa' to 'optional eisa' and update the only consumer
of 'NEISA' - userconfig.c.
While there, send some defunct code to the file history.


71728 28-Jan-2001 bmilekic

Move the setting of curproc to idleproc up earlier in ap_init(). The
problem is that a mutex lock, prior to this change, is acquired before
the curproc is set to idleproc, so we mess ourselves up by calling
the mutex lock routine with curproc == NULL.

Moving it up after the aps_ready spin-wait has us hopefully setting it
after idleproc is setup.

Solved by: jake (the allmighty) :-)


71727 28-Jan-2001 tegge

Defer assignment of low level interrupt handlers for PCI interrupts
described in the MP table until something asks for the interrupt number
later on.


71665 26-Jan-2001 jake

Push Giant down into the trap handlers that need it, instead of
acquiring it unconditionally.

Reviewed by: jhb


71647 25-Jan-2001 jhb

Whitespace fix: convert code indented 6 spaces to use tabs instead.


71604 24-Jan-2001 jhb

- Change fork_exit() to take a pointer to a trapframe as its 3rd argument
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71532 24-Jan-2001 jhb

- Remove all the #if 0'd code that used to implement IRQ forwarding.
- Remove #if 0'd lazy interrupt mask.


71528 24-Jan-2001 jhb

Setup the return values for a child process in the trapframe when we setup
the rest of the trapframe instead of doing it in fork_return().


71527 24-Jan-2001 jhb

- Kill the have_giant parameter to userret() along with all instances of
that name as a variable. Use mtx_owned(&Giant) where appropriate
instead.
- Proc locking.
- P_FOO -> PS_FOO.
- Update comments about enable interrupts during trap and why this may be
bad if we trap while holding a spin mutex.
- Don't bother resetting p to curproc in syscall() in case we are the child
returning from fork. The child hasn't returned from fork through syscall
in a while.
- Remove fork_return() as it has been superseded by the MI version.


71526 24-Jan-2001 jhb

- Proc locking.
- P_INMEM -> PS_INMEM.


71525 24-Jan-2001 jhb

- Relocate portions of this file to get it into an order closer to that of
the alpha mp_machdep.c.
- Proc locking.
- Catch up to the P_FOO -> PS_FOO proc flags changes.
- Stick ap_init()'s prototype with the other prototypes.
- Remove the Xforwardirq IPI.
- Remove unused simplelocks.
- Don't try to psignal() from forward_statclock(), but set the appropriate
signal pending flag in p_sflag instead.
- Add in KTR_SMP tracepoints for various SMP functions. (Brought over
from the alpha port)


71524 24-Jan-2001 jhb

- Proc locking.
- Setup proc0.p_heldmtx, proc0.contested, and curproc earlier so that we
can use mutexes.
- Initialize sched_lock and Giant earlier and enter Giant during init386.
- Use suser(9) instead of checking cr_uid directly.


71522 24-Jan-2001 jhb

Call fork_exit() now instead of futzing around in assembly during a fork
return.


71350 21-Jan-2001 des

First step towards an MP-safe zone allocator:
- have zalloc() and zfree() always lock the vm_zone.
- remove zalloci() and zfreei(), which are now redundant.

Reviewed by: bmilekic, jasone


71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


71321 21-Jan-2001 peter

Remove APIC_INTR_DIAGNOSTIC - this has been disabled for some time now.
Remove some leftovers of removed SMP options.


71320 21-Jan-2001 jasone

Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex
initialization until after malloc() is safe to call, then iterate through
all mutexes and complete their initialization.

This change is necessary in order to avoid some circular bootstrapping
dependencies.


71318 21-Jan-2001 jake

Remove the per-cpu pages used for copy and zero-ing pages of memory
for SMP; just use the same ones as UP. These weren't used without
holding Giant anyway, and the routines that use them would have to
be protected from pre-emption to avoid migrating cpus.


71294 20-Jan-2001 jake

Rename the ASSYM MTX_RECURSE to MTX_RECURSECNT in order to not conflict
with the flag of the same name.


71292 20-Jan-2001 jake

Simplify the i386 asm MTX_{ENTER,EXIT} macros to just call the
appropriate function, rather than doing a horse-and-buggy
acquire. They now take the mutex type as an arg and can be
used with sleep as well as spin mutexes.


71287 20-Jan-2001 jake

- Make npx_intr INTR_MPSAFE and move acquiring Giant into the
function itself.
- Remove a hack to allow acquiring Giant from the npx asm trap
vector.


71262 19-Jan-2001 peter

Convert apm from a bogus 'count' into a plain option. Clean out some
other cruft from the files.alpha and files.ia64 that were related to this.


71261 19-Jan-2001 peter

Zap unused #include "apm.h"


71257 19-Jan-2001 peter

Use #ifdef DEV_NPX from opt_npx.h instead of #if NNPX > 0 from npx.h


71256 19-Jan-2001 peter

EEK! vm86bios.s has got #if NNPX > 0 code without a corresponding
#include "npx.h" - the code has been dead for a while and vm86 calls
have not been saving FPU context it seems.


71245 19-Jan-2001 peter

Remove reference to splz_unpend - it is long gone.


71244 19-Jan-2001 peter

Catch a few alternative names for the syscall entry frame, eg: post-ELF
and int $0x80 entry methods.


71243 19-Jan-2001 peter

apic_itrace_splz[] is unused


71228 19-Jan-2001 bmilekic

Implement MTX_RECURSE flag for mtx_init().
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.

The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.

The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.

Reviewed by: jhb


71211 18-Jan-2001 jhb

Protect p_stat and p_oncpu with sched_lock in forward_signal().


71098 16-Jan-2001 peter

Stop doing runtime checking on i386 cpus for cpu class. The cpu is
slow enough as it is, without having to constantly check that it really
is an i386 still. It was possible to compile out the conditionals for
faster cpus by leaving out 'I386_CPU', but it was not possible to
unconditionally compile for the i386. You got the runtime checking whether
you wanted it or not. This makes I386_CPU mutually exclusive with the
other cpu types, and tidies things up a little in the process.

Reviewed by: alfred, markm, phk, benno, jlemon, jhb, jake, grog, msmith,
jasone, dcs, des (and a bunch more people who encouraged it)


70954 12-Jan-2001 jake

Change return ??? to return -1 in some #if 0'ed code.


70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


70950 12-Jan-2001 bmilekic

Remove useless include of sys/mbuf.h (no longer useful since the
mbuf subsystem init was moved to a better place).


70928 11-Jan-2001 jake

- Remove compatibility macros for accessing per-cpu variables.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.


70861 10-Jan-2001 jake

Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other then curproc.


70798 08-Jan-2001 jake

Fix a warning. The type of globaldata.gd_prvspace has changed.


70793 08-Jan-2001 nyan

Correct typo.

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


70714 06-Jan-2001 jake

Use %fs to access per-cpu variables in uni-processor kernels the same
as multi-processor kernels. The old way made it difficult for kernel
modules to be portable between uni-processor and multi-processor
kernels. It is no longer necessary to jump through hoops.

- always load %fs with the private segment on entry to the kernel
- change the type of the self referntial pointer from struct privatespace
to struct globaldata
- make the globaldata symbol have value 0 in all cases, so the symbols
in globals.s are always offsets, not aliases for fields in globaldata
- define the globaldata space used for uniprocessor kernels in C, rather
than assembler
- change the assmebly language accessors to use %fs, add a macro
PCPU_ADDR(member, reg), which loads the register reg with the address
of the per-cpu variable member


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


70161 18-Dec-2000 phk

Remove the "wd" driver.


70034 14-Dec-2000 jhb

Remove the "machine dependent" KTR trace buffer ddb commands. The code was
exactly the same on all platforms.


70006 14-Dec-2000 jake

Use _lapic+offset to access the local apic from assembly language
files, rather than the symbols in globals.s. The offsets are
generated by genassym.


69987 13-Dec-2000 jhb

If we fail to emulate a vm86 trap in kernel mode, then we use
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().


69972 13-Dec-2000 tanimura

- If swap metadata does not fit into the KVM, reduce the number of
struct swblock entries by dividing the number of the entries by 2
until the swap metadata fits.

- Reject swapon(2) upon failure of swap_zone allocation.

This is just a temporary fix. Better solutions include:
(suggested by: dillon)

o reserving swap in SWAP_META_PAGES chunks, and
o swapping the swblock structures themselves.

Reviewed by: alfred, dillon


69971 13-Dec-2000 jake

Introduce a new potientially cleaner interface for accessing per-cpu
variables from i386 assembly language. The syntax is PCPU(member)
where member is the capitalized name of the per-cpu variable, without
the gd_ prefix. Example: movl %eax,PCPU(CURPROC). The capitalization
is due to using the offsets generated by genassym rather than the symbols
provided by linking with globals.o. asmacros.h is the wrong place for
this but it seemed as good a place as any for now. The old implementation
in asnames.h has not been removed because it is still used to de-mangle
the symbols used by the C variables for the UP case.


69947 13-Dec-2000 jake

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


69919 12-Dec-2000 jhb

Add in symbols needed in the WITNESS_ENTER and WITNESS_EXIT macros in
i386/include/mutex.h.


69881 12-Dec-2000 jake

- Add code to detect if a system call returns with locks other than Giant
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69779 08-Dec-2000 jake

Revert the previous change I made to cpu_switch. It doesn't help as
much as I thought it would and according to bde was a pessimization.


69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


69658 06-Dec-2000 peter

This is kind of a nasty hack, but it appears to solve the Compaq DL360
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and adds a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.


69651 06-Dec-2000 peter

Move io_apic_{read,write} from apic_ipl.s (where they do not belong) into
mpapic.c. This gives us the benefit of C type checking. These functions
are not called in any critical paths and are not used by the interrupt
routines.


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69578 04-Dec-2000 peter

Cleanup some leftover lint from the old interrupt system.
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.

Reviewed by: jhb, jakeb


69536 03-Dec-2000 jake

Change cpu_switch to explicitly popl the callers program counter and
pushl that of the new process, rather than doing a movl (%esp) and
assuming that the stack has been setup right. This make the initial
stack setup slightly more sane, and will make it easier to stick
an interrupted process onto the run queue without its knowing.


69521 02-Dec-2000 markm

Namespace cleanup. Remove some #includes in favour of an explicit
declaration.

Asked for by: bde


69440 01-Dec-2000 jhb

Fix this slightly better by using NON_GPROF_RET instead of duplicating
hard-coded asm.

Suggested by: bde


69431 01-Dec-2000 jake

Change doreti to take a trapframe instead of an intrframe.
Remove associated pushes of dummy units to convert frame.

Reviewed by: jhb


69406 30-Nov-2000 jhb

Revert the previous change to this file. We have to hardcode in the opcode
for return because we do Evil Things(tm) with a 'ret' macro in
asmacros.h.

Noticed by: markm


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


69334 28-Nov-2000 jhb

Don't wait forever for CPUs to stop or restart. Instead, give up after a
timeout. If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart. This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.


69333 28-Nov-2000 jhb

Use atomic ops to close a race condition on the in_Debugger variable used
to only allow 1 CPU at a time to (non-recursively) enter the debugger.


69236 26-Nov-2000 dwmalone

Don't scroll of the end of the help pages when using the visual
option in for "boot -c".

PR: 17539
Submitted by: Edwin.Groothuis@cgmd76206.chello.nl


69147 25-Nov-2000 jlemon

Revert the last commit to the callout interface, and add a flag to
callout_init() indicating whether the callout is safe or not. Update
the callers of callout_init() to reflect the new interface.

Okayed by: Jake


69133 25-Nov-2000 alfred

These files are mpsafe.


69022 22-Nov-2000 jake

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


69006 21-Nov-2000 markm

Assembler fixes.

Fix opcodes that were typed as ".byte 0xNN, 0xMM" when an older
assembler could not recognise the newer Pentium instructions.
Reviewed by: jhb


69001 21-Nov-2000 jhb

Stop handcoding a couple of instructions since gas 2.10 can properly
assemble 16-bit code.

Noticed by: markm


68889 19-Nov-2000 jake

- Protect the callout wheel with a separate spin mutex, callout_lock.
- Use the mutex in hardclock to ensure no races between it and
softclock.
- Make softclock be INTR_MPSAFE and provide a flag,
CALLOUT_MPSAFE, which specifies that a callout handler does not
need giant. There is still no way to set this flag when
regstering a callout.

Reviewed by: -smp@, jlemon


68862 17-Nov-2000 jake

- Split the run queue and sleep queue linkage, so that a process
may block on a mutex while on the sleep queue without corrupting
it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by: John Hay <jhay@icomtek.csir.co.za>
jhb


68860 17-Nov-2000 jhb

- Change extra sanity checks in cpu_switch() to be conditional on INVARIANTS
instead of DIAGNOSTIC.
- Remove the p_wchan check as it no longer applies since a process may be
switched out during CURSIG() within msleep() or mawait().
- Remove an extra sanity check only needed during the early SMPng work.


68808 16-Nov-2000 jhb

Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch(). The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by: witness


68737 14-Nov-2000 jhb

Always enable interrupts during fork_trampoline() after releasing the
sched_lock. This is needed for kernel threads that are created before
interrupts are enabled. kthreads created by kld's that are created at
SI_SUB_KLD such as the random kthread.

Tested by: phk


68676 13-Nov-2000 nyan

Initialize bus_space_handle_t with zero (for PC-98).


68517 09-Nov-2000 takawata

Farewell our code. We will switch acpica code from Intel.
This code has help us comprehence ACPI spec .

Contributors of this code is as follows(except for FreeBSD commiter):
Yasuo Yokoyama,
Munehiro Matsuda,
and ALL acpi-jp@jp.freebsd.org people.

Thanks.

R.I.P.


68490 08-Nov-2000 asmodai

Fix some further english grammar and typo's.


68489 08-Nov-2000 asmodai

Fix typo's: UPGRADE_CPU_HW_CACHE -> CPU_UPGRADE_HW_CACHE


68441 07-Nov-2000 alfred

Protect against an infinite loop when prefaulting pages. This can
happen when the vm system maps past the end of an object or tries
to map a zero length object, the pmap layer misses the fact that
offsets wrap into negative numbers and we get stuck.

Found by: Joost Pol aka Nohican <nohican@marcella.niets.org>
Submitted by: tegge


68404 06-Nov-2000 msmith

Quieten some warnings about correct usage of assignments as truth values.


68063 31-Oct-2000 phk

Deprecate devsw->d_bmaj entirely.

This removes support for booting current kernels with very old bootblocks.

Device driver writers: Please remove initializations for the d_bmaj
field in your cdevsw{}.


67899 29-Oct-2000 phk

Remove unneeded <stddef.h> #includes.


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


67759 28-Oct-2000 phk

Revert two experimental changes which escaped from my devel machine.


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


67694 27-Oct-2000 bde

Declare or #define per-cpu globals in <machine/globals.h> in all cases.
The i386 UP case was messily different.


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67477 23-Oct-2000 jhb

Don't dink with interrupts in vm_page_zero_idle(). This code assumed it
was being called with interrupts disabled, when it was actually being called
with them enabled.

Pointed out by: tegge


67468 23-Oct-2000 non

Add PC-Card/ISA SCSI host adpater drivers from NetBSD/pc98
(a NetBSD port for NEC PC-98x1 machines). They are ncv for NCR 53C500,
nsp for Workbit Ninja SCSI-3, and stg for TMC 18C30 and 18C50.

I thank NetBSD/pc98 and bsd-nomads people.

Obtained from: NetBSD/pc98


67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


67360 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use cpu_throw() instead of cpu_switch() during cpu_exit() since we don't
need to save our previous state.


67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


67357 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.


67356 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- machine/ipl.h -> sys/ipl.h
- Use MUTEX_DECLARE() for clock_lock


67355 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Use MUTEX_DECLARE() and MTX_COLD for vm86pcb_lock


67352 20-Oct-2000 jhb

- Make the mutex code almost completely machine independent. This greatly
reducues the maintenance load for the mutex code. The only MD portions
of the mutex code are in machine/mutex.h now, which include the assembly
macros for handling mutexes as well as optionally overriding the mutex
micro-operations. For example, we use optimized micro-ops on the x86
platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option. In the new code,
mtx_assert() only depends on INVARIANTS, allowing other kernel developers
to have working mutex assertiions without having to include all of the
mutex debugging code. The SMP_DEBUG kernel option has been renamed to
MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack. Instead, we dynamically allocate
seperate mtx_debug structures on the fly in mtx_init, except for mutexes
that are initiated very early in the boot process. These mutexes
are declared using a special MUTEX_DECLARE() macro, and use a new
flag MTX_COLD when calling mtx_init. This is still somewhat hackish,
but it is less evil than the mtx_f filler struct, and the mtx struct is
now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
operations on the mutex mtx_lock field to make it easier for other archs
to override/optimize mutex ops if needed. These new tiny ops also clean
up the code in some places by replacing long atomic operation function
calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
to obtain a sleep mutex. Calling mi_switch() would bogusly release
Giant before switching to the next process. Instead, inline most of the
code from mi_switch() in the mtx_enter_hard() function. Note that when
we finally kill Giant we can back this out and go back to calling
mi_switch().


67346 20-Oct-2000 kato

Convert the type of bus_space_handle_t of pc98 from structure into
pointer to structure.

Reviewed by: nyan


67308 19-Oct-2000 jhb

Axe the idle_event eventhandler, and add a MD cpu_idle function used
for things such as halting CPU's, idling CPU's, etc.

Discussed with: msmith


67268 18-Oct-2000 mdodd

Use appropriate resource management accessors instead of directly
referencing structure members.

Use rman_get_size() instead of end - start + 1.


67265 17-Oct-2000 jhb

- Catch up to moving headers, machine/ipl.h -> sys/ipl.h
- Fix some whitespace bogons.

Submitted by: bde (2)


67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


67095 13-Oct-2000 peter

savectx() is now used exclusively by the crash dump system. Move the
i386 specific gunk (copy %cr3 to the pcb) from the MI dumpsys() to the
MD savectx().


67011 12-Oct-2000 bde

Moved the definitions of AST_PENDING and AST_RESCHED to the correct place.


66855 09-Oct-2000 bde

Unremoved used include of <machine/ipl.h>. Removing it in rev.1.95
significantly pessimized syscalls by arranging to do null rescheduling
on return from every syscall. (AST_RESCHED was not defined, and the
mask ~AST_RESCHED gets replaced by the useless mask ~0. This bug has
been fixed before, in rev.1.92.)


66716 06-Oct-2000 jhb

- Change fast interrupts on x86 to push a full interrupt frame and to
return through doreti to handle ast's. This is necessary for the
clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
This is needed because both hardclock() and statclock() need to run in
the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
during the clock interrupt. Instead, use two p_flag's in the proc struct
to mark the current process as having a pending SIGVTALRM or a SIGPROF
and let them be delivered during ast() when hardclock() has finished
running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyways. It was
broken on the x86 if it was turned on since cpl is gone. It's only use
was to bogusly run softclock() directly during hardclock() rather than
scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
mutex. Since the spin mutex already handles disabling/restoring
interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
signals in hardclock() that are to be delivered in ast().

Submitted by: jakeb (making statclock safe in a fast interrupt)
Submitted by: cp (concept of delaying signals until ast())


66713 06-Oct-2000 jhb

Various whitespace cleanups after the SMPng commit, which jumbled things
around a bit in the trap handling code.


66712 06-Oct-2000 jhb

Don't treat a kernel stack fault the same as a general protect fault or
a segment not present fault in the non-vm86 case.


66711 06-Oct-2000 jhb

Remove an unnecessary sti and spl0() in fork_trampoline. Interrupts
should be enabled by MTX_EXIT() now when it releases the sched_lock.


66698 05-Oct-2000 jhb

- Heavyweight interrupt threads on the alpha for device I/O interrupts.
- Make softinterrupts (SWI's) almost completely MI, and divorce them
completely from the x86 hardware interrupt code.
- The ihandlers array is now gone. Instead, there is a MI shandlers array
that just contains SWI handlers.
- Most of the former machine/ipl.h files have moved to a new sys/ipl.h.
- Stub out all the spl*() functions on all architectures.

Submitted by: dfr


66696 05-Oct-2000 jhb

Replace loadandclear() with atomic_readandclear_int().


66691 05-Oct-2000 jhb

- Remove somewhat bogus handling of the Giant mutex in the vm86 code.
- Add a vm86pcb_lock mutex that is used to lock the vm86pcb used when
making a vm86 call.


66613 04-Oct-2000 jasone

Fix spelling error ("exits" should be "exists").


66559 02-Oct-2000 peter

Fix a cosmetic sign problem on machines with 4G of ram.
0x00312000 - 0xe5fe7fff, 3855441920 bytes (4294859990 pages)
.. becomes
0x00314000 - 0xe5fe7fff, 3855433728 bytes (941268 pages)


66503 01-Oct-2000 peter

Fix the no-pci case of attaching isa, eisa and mca devices.
device_add_child() is meant to be called by the bus add_child method, not
to replace the bus add_child method. We could have called nexus_add_device
directly too, that would have also worked.

PR: 21657
Tested by: markm


66502 01-Oct-2000 iwasaki

Remove ACPI_NO_OSDFUNC_INLINE option from kernel configuration. Now
that it's enabled in acpireg.h only if DIAGNOSTIC option is specified.
ACPICA OSD functions will be compiled in machine/acpi_machdep.c again
tentatively (if DIAGNOSTIC option is specified).
# Should we have acpica_osd.c ?


66489 30-Sep-2000 msmith

More updates to the ACPI code:

- Move all register I/O into acpi_io.c
- Move event handling into acpi_event.c
- Reorganise headers into acpivar/acpireg/acpiio
- Move find-RSDT and find-ACPI-owned-memory into acpi_machdep
- Allocate all resources (except those detailed only by AML)
as real resources. Add infrastructure that will make adding
resource support to AML code easy.
- Remove all ACPI #ifdefs in non-ACPI code
- Removed unnecessary includes
- Minor style and commenting fixes

Reviewed by: iwasaki


66475 30-Sep-2000 bmilekic

Big mbuf subsystem diff #1: incorporate mutexes and fix things up somewhat
to accomodate the changes.

Here's a list of things that have changed (I may have left out a few); for a
relatively complete list, see http://people.freebsd.org/~bmilekic/mtx_journal

* Remove old (once useful) mcluster code for MCLBYTES > PAGE_SIZE which
nobody uses anymore. It was great while it lasted, but now we're moving
onto bigger and better things (Approved by: wollman).

* Practically re-wrote the allocation macros in sys/sys/mbuf.h to accomodate
new allocations which grab the necessary lock.

* Make sure that necessary mbstat variables are manipulated with
corresponding atomic() routines.

* Changed the "wait" routines, cleaned it up, made one routine that does
the job.

* Generalized MWAKEUP() macro. Got rid of m_retry and m_retryhdr, as they
are now included in the generalized "wait" routines.

* Sleep routines now use msleep().

* Free lists have locks.

* etc... probably other stuff I'm missing...

Things to look out for and work on later:

* find a better way to (dynamically) adjust EXT_COUNTERS

* move necessity to recurse on a lock from drain routines by providing
lock-free lower-level version of MFREE() (and possibly m_free()?).

* checkout include of mutex.h in sys/sys/mbuf.h - probably violating
general philosophy here.

The code has been reviewed quite a bit, but problems may arise... please,
don't panic! Send me Emails: bmilekic@freebsd.org

Reviewed by: jlemon, cp, alfred, others?


66442 29-Sep-2000 peter

Fill in some more missing bits from cpu_features according to the Intel
Pentium4 cpuid docs.


66441 29-Sep-2000 peter

First shot at identifying the Pentum 4 acording to our reading of the
the cpu_id extensions in the Intel docs. There is more info available.
See the following URL for more details.
http://developer.intel.com/design/processor/future/manuals/CPUID_Supplement.htm

Requested by: Intel


66416 28-Sep-2000 peter

Get out the roto-rooter and clean up the abuse of nexus ivars by the
i386/isa/pcibus.c. This gets -current running again on multiple host->pci
machines after the most recent nexus commits. I had discussed this with
Mike Smith, but ended up doing it slightly differently to what we
discussed as it turned out cleaner this way. Mike was suggesting creating
a new resource (SYS_RES_PCIBUS) or something and using *_[gs]et_resource(),
but IMHO that wasn't ideal as SYS_RES_* is meant to be a global platform
property, not a quirk of a given implementation. This does use the ivar
methods but does so properly. It also now prints the physical pci bus that
a host->pci bridge (pcib) corresponds to.


66407 27-Sep-2000 asmodai

Fix spelling of Katmai [Katami].


66400 27-Sep-2000 msmith

Since the nexus is responsible for creating the I/O resources (ports, memory)
it ought to be able to deal with devices directly attached to it having
allocations of such resources. Make it so.


66383 26-Sep-2000 kato

Recognize new Pentium III Xeon (stepping A0).

PR: 21233
Submitted by: ade


66277 22-Sep-2000 ps

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


66206 22-Sep-2000 msmith

Implement halt-on-idle in the !SMP case, which should significantly
reduce power consumption on most systems.


66069 19-Sep-2000 eivind

Better error message when booting an SMP kernel on an UP system.


66053 19-Sep-2000 jlemon

Allow the user to make direct BIOS intcalls (via vm86 system) if they
successfully authenticate as root via the suser() call.


65856 14-Sep-2000 jhb

Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by: bde


65822 13-Sep-2000 jhb

- Remove the inthand2_t type and use the equivalent driver_intr_t type from
newbus for referencing device interrupt handlers.
- Move the 'struct intrec' type which describes interrupt sources into
sys/interrupt.h instead of making it just be a x86 structure.
- Don't create 'ithd' and 'intrec' typedefs, instead, just use 'struct ithd'
and 'struct intrec'
- Move the code to translate new-bus interrupt flags into an interrupt thread
priority out of the x86 nexus code and into a MI ithread_priority()
function in sys/kern/kern_intr.c.
- Remove now-uneeded x86-specific headers from sys/dev/ata/ata-all.c and
sys/pci/pci_compat.c.


65815 13-Sep-2000 bde

Be more careful about cleaning up the stack after function calls early
in the boot. The cleanup must be done in one of the few ways that
db_numargs() understands, so that early backtraces in ddb don't underrun
the stack. The underruns caused reboots a few years ago when there
was an unmapped page above the stack (trapping to abort the command
doesn't work early).

Cleaned up some nearby code.


65811 13-Sep-2000 bde

Fixed hang on booting with -d. mtx_enter() was called on an uninitialized
lock. The quick fix in trap.c was not quite the version tested and had no
effect; back it out.


65792 13-Sep-2000 jhb

Take out some unneeded debugging code and re-enable panic()'ing if we spin
on a spin lock for more then 5 seconds.


65782 12-Sep-2000 jhb

Clean up process accounting some more. Unfortunately, it is still not
quite right on i386 as the CPU who runs statclock() doesn't have a valid
clockframe to calculate statistics with.


65781 12-Sep-2000 bde

Quick fix for hang on booting with -d. mtx_enter() was called before
curproc was initialized. curproc == NULL was interpreted as matching
the process holding Giant... Just skip mtx_enter() and mtx_exit() in
trap() if (curproc == NULL && cold) (&& cold for safety).


65713 11-Sep-2000 jhb

When doing statistics for statclock on other CPU's, use the other CPUs'
idleproc pointers instead of our own for comparisons.

Submitted by: tegge


65624 08-Sep-2000 jasone

Rename mtx_enter(), mtx_try_enter(), and mtx_exit() and wrap them with cpp
macros that expand to pass filename and line number information. This is
necessary since we're using inline functions instead of macros now.

Add const to the filename pointers passed througout the mtx and witness
code.


65620 08-Sep-2000 jhb

Remove an unneeded extern declaration of cp_time.


65597 08-Sep-2000 jake

Really fix USER_LDT. (Don't use currentldt as an L-value.)


65587 07-Sep-2000 jake

Don't use currentldt as an L-value.
This should fix options USER_LDT.

Reported-by: John Hay <jhay@zibbi.mikom.csir.co.za>
Nickolay Dudorov <nnd@mail.nsk.ru>


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


65556 07-Sep-2000 jasone

Add KTR, a facility that logs kernel events in order to to facilitate
debugging.

Acquired from: BSDi (BSD/OS)
Submitted by: dfr, grog, jake, jhb


65514 06-Sep-2000 phk

Introduce atomic_cmpset_int() and atomic_cmpset_long() from SMPng a
few hours earlier than the rest.

The next DEVFS commit needs these functions.

Alpha versions by: dfr
i386 versions by: jakeb

Approved by: SMPng


65500 05-Sep-2000 msmith

Teach the NFS && NFS_ROOT case how to pick up information left by the
PXE loader, and use this to build the nfs_diskless structure.


65389 03-Sep-2000 peter

Complain if we cannot find loader(8) metadata.


65372 02-Sep-2000 iwasaki

Add ACPI_BUS_SPACE stuff definitions in acpi_machdep.h.
Change to include this file rather than acpica_osd.h to use
only ACPI_BUS_SPACE stuff.


65292 31-Aug-2000 takawata

Merge rest piece of ACPI driver.To activate acpi driver ,add

device acpi

line. Merge finished. But still experimental phase.Need more hack!

Obtained from:ACPI for FreeBSD project


65273 31-Aug-2000 kato

Improved Cyrix 486DX supports for NEC PC-98.
- Enable WB cache via CCR2 and CR0.
- Set the need_pre_dma_flush when the CPU_I486_ON_386 option is
defined.

Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>


65047 24-Aug-2000 takawata

Add orthogonal part of ACPI support code.
This does not come effect until non-orthogonal part is commited.

Approved by: jkh
Obtained from: ACPI for FreeBSD CVS repository.


64837 19-Aug-2000 dwmalone

Replace the mbuf external reference counting code with something
that should be better.

The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.

NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.

The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.

Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.

The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.

The size of the pool of reference counters is available in the
stats provided by "netstat -m".

PR: 19866
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: alfred (glanced at by others on -net)


64781 17-Aug-2000 bsd

Don't let an illegal value for dr7 get set, which can lead to an
unexpected TRCTRAP.

Reported by: John W. De Boskey <jwd@FreeBSD.org>


64728 16-Aug-2000 tegge

Prepare for a cleanup of pmap module API pollution introduced by the
suggested fix in PR 12378.

Keep track of all existing pmaps independent of existing processes.

This allows for a process to temporarily connect to a different address
space without the risk of missing an update of the original address space if
the kernel grows.

pmap_pinit2() is no longer needed on the i386 platform but is left as a
stub until the alpha pmap code is updated.

PR: 12378


64592 13-Aug-2000 jhb

Include machine/cputypes.h so we get the cpu_class variable. This is needed
if I386_CPU is defined in the kernel config file.


64529 11-Aug-2000 peter

Clean up some low level bootstrap code:

- stop using the evil 'struct trapframe' argument for mi_startup()
(formerly main()). There are much better ways of doing it.
- do not use prepare_usermode() - setregs() in execve() will do it
all for us as long as the p_md.md_regs pointer is set. (which is
now done in machdep.c rather than init_main.c. The Alpha port did it
this way all along and is much cleaner).
- collect all the magic %cr0 etc register settings into one place and
have the AP's call that instead of using magic numbers (!!) that keep
changing over and over again.
- Make it safe to call kthread_create() earlier, including during the
device probe sequence. It doesn't need the callback mechanism that
NetBSD's version uses.
- kthreads created this way are root-less as they exist before the root
filesystem is mounted. init(1) is set up so that it aquires the root
pointers prior to running. If other kthreads want filesystem acccess
we can make this code more generic.
- set all threads start times once we have decided what time it is.
- init uses a trampoline rather than the evil prepare_usermode() hack.
- kern_descrip.c has a couple of tweaks to deal with forking when there
is no rootdir or cwd etc.
- adjust the early SYSINIT() sequence so that a few prereqisites are in
place. eg: make sure the run queue is initialized before doing forks.

With this, the USB code can easily create a kthread to do the device
tree discovery. (I have tested it, it works nicely).

There are still some open issues before this is truely useful.
- tsleep() does not like working before the clock is running. It
sort-of tries to spin wait, but it can do more useful things now.
- stopping a kthread in kld code at unload time is "interesting" but
we have a solution for that.

The Alpha code needs no changes for this. It already uses pretty much the
same strategies, but a little cleaner.


64494 10-Aug-2000 tegge

Don't skip IOAPIC id conflict detection when only one pci bus is present.
PR: 20312
Reviewed by: Steve Roome <steve@sse0691.bri.hp.com>


64325 07-Aug-2000 tegge

Add workaround for livelock problem when starting APs.

With more than 1 AP present, an AP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.

With early startup of APs, the BSP could fail to properly release
the mp lock before waiting for smp_started to become nonzero.


64294 06-Aug-2000 ps

Change the behavior of isa_nmi to log an error message instead of
panicing and return a status so that we can decide whether to drop
into DDB or panic. If the status from isa_nmi is true, panic the
kernel based on machdep.panic_on_nmi, otherwise if DDB is
enabled, drop to DDB based on machdep.ddb_on_nmi.

Reviewed by: peter, phk


64290 06-Aug-2000 tegge

Be more verbose when changing APIC ID on an IO APIC.

Don't allow cpu entries in the MP table to contain APIC IDs out of range.

Don't write outside array boundaries if an IO APIC entry in the MP table
contains an APIC ID out of range.

Assign APIC IDs for all IO APICs according to section 3.6.6 in the
Intel MP spec:

- If the current APIC ID on an IO APIC doesn't conflict with other
IO APICs or CPUs, that APIC ID should be used. The copy of the MP
table must be updated if the corresponding APIC ID in the MP table
is different.

- If the current APIC ID was in conflict with other units, the
corresponding APIC ID specified in the MP table is checked for conflict.

- If a conflict is still found then fall back to using a new unique ID.
The copy of the MP table must be updated.

- IDs out of range is considered to be in conflict.

During these operations, the IO_TO_ID array cannot be used, since any
conflict would have caused information loss. The array is then corrected,
since all APIC ID conflicts should have been resolved.

PR: 20312, 18919


64063 31-Jul-2000 luoqi

Handle write page faults (both write only or read-modify-write) as MI vm
write-only faults. This would allow write-only mmapped regions to function
correctly.


64031 30-Jul-2000 phk

Allow use of TSC even if APM is compiled in but disabled.


63981 28-Jul-2000 peter

Fix warning - isa/isavar.h is a prerequisite for isa/pnpvar.h


63140 14-Jul-2000 ps

Change the way NMI's are handled. Before, if DDB was enabled and
a NMI occured, you could type continue in DDB and the kernel would
not attempt to detect what type of NMI was recieved. Now we check
for the type of NMI first and then go to DDB if it is enabled.

This will solve the problem with having DDB enabled and getting an
NMI due to some possibly bad error and being able to continue the
operation of the kernel when you really want to panic and know
what happened.

Submitted by: jhb


62947 11-Jul-2000 tanimura

Finally merge newmidi.
(I had been busy for my own research activity until the last weekend)

Supported devices:

SB Midi Port (sbc + midi)
SB OPL3 (sbc + midi)
16550 UART (midi, needs a trick in your hint)
CS461x Midi Port (csa + midi)

OSS-compatible sequencer (seq)

Supported playing software:

playmidi (We definitely need more)

Notes:

/dev/midistat now reports installed midi drivers. /dev/sndstat reports
only pcm drivers. We need the new name(pcmstat?).

EMU8000(SB AWE) does not sound yet but does get probed so that the OPL3
synth on an AWE card works.

TODO:

MSS/PCI bridge drivers
Midi-tty interface to support general serial devices
Modules


62870 10-Jul-2000 kris

Don't call printf with no format string.

Reviewed by: msmith


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62298 01-Jul-2000 bsd

Fix my own style bugs (use of spaces instead of tabs for indentation).
This is a style-only change.


62088 25-Jun-2000 markm

Duh. Fix a fatfingered patch.


62085 25-Jun-2000 markm

Fix an uninitialised variable and a function return value.

Reported by: dillon


62057 25-Jun-2000 markm

Strip out the machine-independant parts of the memory device.
/dev/(u)random, /dev/null, /dev/zero are all moving to machine-independant
drivers.
Reviewed by: dfr


61995 23-Jun-2000 msmith

Add a stub driver to consume the PnP "system resource" items, and hide
them in the !bootverbose case.


61994 23-Jun-2000 msmith

Add PnP probe methods to some common AT hardware drivers. In each case,
the PnP probe is merely a stub as we make assumptions about some of this
hardware before we have probed it.

Since these devices (with the exception of the speaker) are 'standard',
suppress output in the !bootverbose case to clean up the probe messages
somewhat.


61992 23-Jun-2000 msmith

Stop trying to do anything funny with the interrupt resource range. The
AT PIC will consume IRQ 2 correctly in the !APIC_IO case.


61974 22-Jun-2000 green

Rename macros to all-uppercase. Get rid of a comment that was ironic
(I goofed on the bitshifts myself long ago ;) and a bit redundant:
code should be clear enough that it seldom needs comments at all.


61802 18-Jun-2000 phk

/152x/s/sound/SCSI/


61717 15-Jun-2000 phk

Add disk_enumerate() for finding names of disks. Vinum and libh will
need this RSN.

Remove a pointless warning in the root device locating code.

Remove the "wd" compatibility name from the "ad" driver.

WARNING: If you have not updated to use /dev/wd* in your /etc/fstab
and modern bootblocks, it would be a very good idea to do so BEFORE
you upgrade your kernel.


61623 13-Jun-2000 kato

Recognize Coppermine Celeron processors whose CPU ID = 0x68?. They
were recognized as "Pentium III/Pentium III Xeon."


61616 13-Jun-2000 kato

Added new options CPU_PPRO2CELERON and CPU_L2_LATENCY to support
Socket 8 to 370 converters. When (1) CPU_PPRO2CELERON option is
defined, (2) Intel CPU is found and (3) CPU ID is 0x66?, L2 cache is
enabled through MSR 0x11e. The L2 cache latency value can be
specified by CPU_L2_LATENCY option. Default value of L2 cache latency
is 5.

These options are useful if you use Socket 8 to Socket 370 converter
(e.g. Power Leap's PL-Pro/II.) Most PentiumPro BIOSs don't enable L2
cache of Mendocino Celeron CPUs because they don't know Celeron CPUs.
These options are needles if you use a Coppermine (FCPGA) Celeron or
PentiumIII, becuase the L2 cache enable bit is hard wired and L2 cache
is always enabled.


61533 10-Jun-2000 msmith

Don't include opt_smp.h - we don't use anything defined in it.


61531 10-Jun-2000 msmith

Correct the tests for ISA PIC/APIC so that they actually work.


61475 10-Jun-2000 peter

Unused include: #include "scbus.h"


61474 10-Jun-2000 peter

Unused include: #include "ether.h"


61469 10-Jun-2000 peter

Add option BROKEN_KEYBOARD_RESET to an opt_*.h file


61422 08-Jun-2000 bde

Always include the full symbol table (as specified by its start and
end values in bootinfo) in kernel space if it is loaded (i.e., if its
specified end address is nonzero), not just if it is loaded and DDB
is configured. This may be used to fix kldsym(2) for booting without
/dev/loader; currently, in this case, it just fixes unused pointers
and wastes space consistently. For booting in the normal way with
/boot/loader, the table is included and pointed to in a different way
and kldsym(2) works.


61362 07-Jun-2000 iwasaki

Fix gdt pointer for the current cpu on SMP.
This will support power-off only. Fix for suspend/resume will come later.
Also, MFC on this is shceduled on next week.

Submitted by: sumitani@bd2.hnes.nec.co.jp
Reviewed by: jlemon


61339 06-Jun-2000 dillon

INTR_TYPE_FAST / FAST_INTR interrupts (currently just serial interrupts)
have their own lock and do not need the MP lock. The SMP cleanup was
a little too conservative in MP locking fast interrupts but at least
it's trivial to fix. MFC soon.

Submitted by: bde


61220 03-Jun-2000 bde

Fixed some style bugs in the signal handling funcations. This doesn't
change the object file.


61136 31-May-2000 msmith

Further fixes for multiple-IO-APIC systems from Tor Egge:

Further experimentation showed that some Dell 2450 machines with the
prevention kludge installed still got T_RESERVED traps. CPU interrupt
vector 0x7A was observed to be triggered. This might have been the
bitwise OR of two different vectors sent from each of the IOAPICs at
the same time.

IOAPIC #0: 0x68 --> irq 8: RTC timer interrupt
IOAPIC #1: 0x32 --> irq 18: scsi host adapter or network interface
----
0x7a --> T_RESERVED

Both IOAPICs had ID 0.

Appendix B.3 in the MP spec indicates that the operating system is
responsible for assigning unique IDs to the IOAPICs.

The enclosed patch programs the IOAPIC IDs according to the IOAPIC
entries in the MP table.

Submitted by: tegge


61130 31-May-2000 bde

Pack the SWI bits to save some time and space.


61126 31-May-2000 bde

Add SWI_TQ_MASK to all interrupt masks except SWI_CLOCK_MASK. Use a
new macro SWI_LOW_MASK to give the mask for low priority SWIs instead
of hard-coding this mask as SWI_CLOCK_MASK.

Reviewed by: dfr


61081 29-May-2000 dillon

This is a cleanup patch to Peter's new OBJT_PHYS VM object type
and sysv shared memory support for it. It implements a new
PG_UNMANAGED flag that has slightly different characteristics
from PG_FICTICIOUS.

A new sysctl, kern.ipc.shm_use_phys has been added to enable the
use of physically-backed sysv shared memory rather then swap-backed.
Physically backed shm segments are not tracked with PV entries,
allowing programs which use a large shm segment as a rendezvous
point to operate without eating an insane amount of KVM in the
PV entry management. Read: Oracle.

Peter's OBJT_PHYS object will also allow us to eventually implement
page-table sharing and/or 4MB physical page support for such segments.
We're half way there.


61075 29-May-2000 dfr

Add SWI_TQ_MASK to imask definition.


61074 29-May-2000 dfr

Brucify the pmap_enter_temporary() changes.


61036 28-May-2000 dfr

Add a new pmap entry point, pmap_enter_temporary() to be used during
dumps to create temporary page mappings. This replaces the use of CADDR1
which is fairly x86 specific.

Reviewed by: dillon


60973 27-May-2000 jhb

- Remove unnecessary 'data32' and 'addr32' prefixes and #define's.
- Go ahead and use 'lgdt' again instead of hand-assembling the instruction.
During testing this code worked fine. If for some reason a 32-bit offset
is needed, 'lgdtl' should be used instead of reverting to manual machine
code.

Tested by: peter


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60933 25-May-2000 tegge

Reintroduce a workaround for a gas bug (misassembled lgdt instruction)
Use .code16 for the real mode part of the AP bootstrap trampoline code.


60881 24-May-2000 peter

pmap_enter() masked off the page offset bits, pmap_kenter() did not.
This (I believe) is the cause of the XFree86 startup and/or mptable(8)
panics when programs were reading from /dev/mem at non-page-aligned
offsets. The offsets were being converted into random page flags in the
page tables. :-( (including PG_PS = 4MB page size)


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60804 22-May-2000 obrien

Sort the sys includes.


60803 22-May-2000 obrien

AT&T asm syntax requires a leading '*' in front of the operand for
indirect calls and jumps.


60755 21-May-2000 peter

Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
to various pmap_*() functions instead of looking up the physical address
and passing that. In many cases, the first thing the pmap code was doing
was going to a lot of trouble to get back the original vm_page_t, or
it's shadow pv_table entry.

Inspired by: John Dyson's 1998 patches.

Also:
Eliminate pv_table as a seperate thing and build it into a machine
dependent part of vm_page_t. This eliminates having a seperate set of
structions that shadow each other in a 1:1 fashion that we often went to
a lot of trouble to translate from one to the other. (see above)
This happens to save 4 bytes of physical memory for each page in the
system. (8 bytes on the Alpha).

Eliminate the use of the phys_avail[] array to determine if a page is
managed (ie: it has pv_entries etc). Store this information in a flag.
Things like device_pager set it because they create vm_page_t's on the
fly that do not have pv_entries. This makes it easier to "unmanage" a
page of physical memory (this will be taken advantage of in subsequent
commits).

Add a function to add a new page to the freelist. This could be used
for reclaiming the previously wasted pages left over from preloaded
loader(8) files.

Reviewed by: dillon


60676 18-May-2000 grog

Correct previous commit: solve the "stopped clock" syndrome in remote
kernel debugger.


60668 17-May-2000 msmith

If we are running in APIC_IO mode, pretend that we didn't see the BIOS
reporting an AT PIC. We do this because otherwise the PIC will claim
IRQ 2 in an unshareable mode, preventing other devices from legitimately
using it.

For symmetry, in !APIC_IO mode, ignore the APIC if it's reported.

This is a hack; a better solution would have the PIC's driver release
the IRQ if it was not going to be active.


60346 11-May-2000 peter

Move <machine/ipl.h> outside #ifdef SMP because it supplies AST_RESCHED.
Without this, it shows up as an undefined symbol in /kernel. (!)
(This looks very freaky when doing a nm /kernel!)


60345 11-May-2000 msmith

Attempt to work around problems caused by spurious interrupts and
uninitialised interrupts in the APIC. This seems to fix the problems
being seen on systems using the RCC chipsets, eg. Dell PowerEdge 24x0.

The actual nature of the problem probably needs further investigation,
but this patch allows us to actually function on these systems.

Submitted by: Drew Eckhardt <drew@Poohsticks.Org>


60303 10-May-2000 obrien

1. `movl' is for use with 32-bit operands. Do NOT use it with 16-bit
operands. `movw' could be used, but instead let the assembler decide
the right instruction to use.
2. AT&T asm syntax requires a leading '*' in front of the operand for
indirect calls and jumps.


60298 10-May-2000 obrien

Do not specify the size to move. Allow the assembler to figure it out.


60162 07-May-2000 ps

Fix checksum calculations. This should fix the network problems
in current where all packets were returning with bad checksums.
(observed with netstat -s).

Reviewed by: alfred


60104 06-May-2000 jlemon

Make in_cksum() a macro call to in_cksum_skip(), since it provides the
same functionality. Sharing code should help cache issues.

Remove in_cksum_partial, since its not being used, and we now have
a way to compute partial checksums on mbuf chains.


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


59926 03-May-2000 dwhite

I mentioned yesterday that I could use some work, and Kelly says, "Commit my
PRs!" So here I go.

Add definitions for some of the AMD CPU feature bits. Also add a comment on
where to find the rest of them. This is a purely cosmetic change.

PR: i386/14438
Submitted by: Kelly Yancey <kbyanc@egroups.net>


59914 03-May-2000 phk

Remove 42 unneeded #include <sys/ioccom.h>.

ioccom.h defines only implementation detail, and should therefore
only be included from the #include which defines the ioctl tags,
in other words: never include it from *.c


59839 01-May-2000 peter

Move the MSG* and SEM* options to opt_sysvipc.h
Remove evil allocation macros from machdep.c (why was that there???) and
use malloc() instead.
Move paramters out of param.h and into the code itself.
Move a bunch of internal definitions from public sys/*.h headers (without
#ifdef _KERNEL even) into the code itself.

I had hoped to make some of this more dynamic, but the cost of doing
wakeups on all sleeping processes on old arrays was too frightening.
The other possibility is to initialize on the first use, and allow
dynamic sysctl changes to parameters right until that point. That would
allow /etc/rc.sysctl to change SEM* and MSG* defaults as we presently
do with SHM*, but without the nightmare of changing a running system.


59664 26-Apr-2000 dillon

Remove synchronizing instruction in MP unlock code. It turns out
not to be necessary.


59615 25-Apr-2000 grog

Fix a long-standing bug which caused massive character loss in remote
serial gdb: interrupts were causing either overruns or stealing
characters. Put splhigh() around the routines which transfer packets
across the line. Since this happens when the system is halted in
debug, this doesn't cause any particular problem. Now it is possible
to run the link at 115,200 bps.

PR: (not assigned yet, must be in limbo somewhere)

Add partial support for detecting non-existent gdb devices.

Add $FreeBSD$ tag.


59604 24-Apr-2000 obrien

* Use sys/sys/random.h rather than a i386 specific one.
* There was nothing that should be machine dependant about
i386/isa/random_machdep.c, so it is now sys/kern/kern_random.c.


59539 23-Apr-2000 nyan

Disable PCI BIOS on PC-98.


59495 22-Apr-2000 nyan

- PC-98 uses IRQ2 too.
- Fixed the range of DMA channels on PC-98.

Submitted by: "T.Yamaoka" <taka@windows.squares.net>


59440 20-Apr-2000 luoqi

IO apics are not necessarily page aligned, they are only required to be aligned
on 1K boundary. Correct a typo that would cause problem to a second IO apic.

Pointed out by: Steve Passe <smp.csn.net>


59368 18-Apr-2000 phk

Remove unneeded <sys/buf.h> includes.

Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
by 924 bytes.


59294 16-Apr-2000 msmith

Some more i386-only BIOS-friendliness:

- Add support for using the PCI BIOS functions for configuration space
accesses, and make this the default.

- Make PNPBIOS the default (obsoletes the PNPBIOS config option).

- Add two new boot-time tunables to disable each of the above.


59249 15-Apr-2000 phk

Complete the bio/buf divorce for all code below devfs::strategy

Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.

CCD not converted yet, casts to struct buf (still safe)

atapi-cd casts to struct buf to examine B_PHYS


58962 03-Apr-2000 msmith

Remove the !(I386 & SMP) tests; we don't run SMP on an i386 system, and
they break the LINT build.


58941 02-Apr-2000 dillon

Make the sigprocmask() and geteuid() system calls MP SAFE. Expand
commentary for copyin/copyout to indicate that they are MP SAFE as
well.

Reviewed by: msmith


58937 02-Apr-2000 shin

Print out more hints at illegal cksum len value.


58934 02-Apr-2000 phk

Move B_ERROR flag to b_ioflags and call it BIO_ERROR.

(Much of this done by script)

Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.

Add bio_queue field for struct bio aware disksort.

Address a lot of stylistic issues brought up by bde.


58820 30-Mar-2000 peter

Make sysv-style shared memory tuneable params fully runtime adjustable
via sysctl. It's done pretty simply but it should be quite adequate.
Also move SHMMAXPGS from $machine/include/vmparam.h as the comments that
went with it were wrong... we don't allocate KVM space for the pages so
that comment is bogus.. The only practical limit is how much physical
ram you want to lock up as this stuff isn't paged out or swap backed.


58786 29-Mar-2000 kato

PC-98 BIOS copies the DX register into its work area. The value of it
shows `CPUID' and it is useful to identify CPU. So, it is copied from
BIOS work area to the cpu_id variable (PC-98 only).

Submitted by: chi@bd.mbn.or.jp (Chiharu Shibata)


58764 29-Mar-2000 dillon

The SMP cleanup commit broke need_resched, this fixes that and also
removed unncessary MPLOCKED and 'lock' prefixes from the interrupt
nesting level, since (A) the MP lock is held at the time, and (B) since
the neting level is restored prior to return any interrupted code
will see a consistent value.


58762 29-Mar-2000 kato

Added indirect pio into the bus space stuff for the NEC PC-98. bus.h
includes one of bus_at386.h and bus_pc98.h. Becuase only bus_pc98.h
supports indirect pio and bus_at386.h is identical to old bus.h, there
is no functional change in PC-AT's kernels. That is, it cannot cause
performance loss.

Submitted by: nyan
Reviewed by: imp
bde and luoqi provided useful comments for earlier version.


58755 28-Mar-2000 dillon

The SMP cleanup commit broke UP compiles. Make UP compiles work again.


58717 28-Mar-2000 dillon

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


58706 27-Mar-2000 dillon

Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes a
fragmentation problem due to geteblk() reserving too much space for the
buffer and imposes a larger granularity (16K) on KVA reservations for
the buffer cache to avoid fragmentation issues. The buffer cache size
calculations have been redone to simplify them (fewer defines, better
comments, less chance of running out of KVA).

The geteblk() fix solves a performance problem that DG was able reproduce.

This patch does not completely fix the KVA fragmentation problems, but
it goes a long way

Mostly Reviewed by: bde and others
Approved by: jkh


58698 27-Mar-2000 jlemon

Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.


58538 24-Mar-2000 jhb

Update sysinstall to use struct uc_device instead of struct isa_device
for generating /boot/kernel.conf. Since this structure is shared, move
its definition out to a header file, just as struct isa_device was defined
in a header file. This fixes the sysinstall breakage in -current.


58377 20-Mar-2000 phk

Isolate the Timecounter internals in their own two files.

Make the public interface more systematically named.

Remove the alternate method, it doesn't do any good, only ruins performance.

Add counters to profile the usage of the 8 access functions.

Apply the beer-ware to my code.

The weird +/- counts are caused by two repocopies behind the scenes:
kern/kern_clock.c -> kern/kern_tc.c
sys/time.h -> sys/timetc.h
(thanks peter!)


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


58304 19-Mar-2000 green

Set the bits in the mask from the left to the right, not backwards.

Submitted by: Coleman Kane <cokane@one.net>


58286 19-Mar-2000 peter

Completely decouple userconfig from 'struct isa_device' and friends.
Userconfig was only using this for convenience since newbus, and even
then it was rather inconvenient.


58132 16-Mar-2000 phk

Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.


58113 15-Mar-2000 peter

Remove stray pointers to wdc; it was removed a while back.


58105 15-Mar-2000 msmith

Trim a couple of over-long strings

Submitted by: des


57996 13-Mar-2000 bde

Disabled the optimization of not doing an invltlb_1pg() when changing
pte's from zero. The TLB is supposed to be invalidated when pte's are
changed _to_ zero, but this doesn't occur in all cases for global pages
(PG_G stops invltlb() from working, and invltlb_1pg() is not used
enough).

PR: 14141, 16568
Submitted by: dillon


57836 09-Mar-2000 msmith

Remove the PCI device class and all the PCI-only devices. With the newbus
conversion this no longer works, and userconfig is not expected to ever
again be used for these devices.

Approved by: jkh


57704 02-Mar-2000 dufault

I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority. This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by: bde


57701 02-Mar-2000 dufault

Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by: jkh, bde


57571 28-Feb-2000 bsd

Reset the hardware debug registers when exec'ing a new image.

Reviewed by: bde,jlemon
Approved by: jkh


57548 28-Feb-2000 green

Fix a repetition typo about the settings the settings.

Submitted by: Kris Dow <kris@vilnya.demon.co.uk>


57362 20-Feb-2000 bsd

Don't forget to reset the hardware debug registers when a process that
was using them exits.

Don't allow a user process to cause the kernel to take a TRCTRAP on a
user space address.

Reviewed by: jlemon, sef
Approved by: jkh


57241 15-Feb-2000 jkh

Add userconfig entries for the new atapi devices.
Submitted by: Dan Papasian <bugg@bugg.strangled.net>


57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


56865 29-Jan-2000 peter

Zap isa_device -> id_conflicts. The sole user of it (userconfig) never
actually used it since the only device that specified it (vga0) was marked
as "FLG_INVISIBLE" in userconfig and therefore never shown.

Suggested by: bde


56814 29-Jan-2000 peter

Remove a workaround for a gas bug. It couldn't assemble a certain
lgdt instruction, but the binutils based one is fine and has been
for ages.


56797 29-Jan-2000 kato

Simplify messages of Pentium II, Pentium II Xeon, Celeron, Pentium III
and Pentium III Xeon CPUs. If a CPU is one of Pentium II, Pentium II
Xeon and Celeron, the message is always "Pentium II/Pentium II
Xeon/Celeron". If a CPU is one of Pentium III and Pentium III Xeon,
the message is always "Pentium III/Pentium III Xeon".


56623 26-Jan-2000 msmith

Correctly initialise the available IRQ numbers in the APIC_IO case.
IRQ 2 was being unilaterally disallowed, which is only appropriate if
the interrupt hardware is the traditional chained PIC arrangement.

Reviewed by: tegge (in principle)


56439 23-Jan-2000 peter

Clean up some more loose ends..
isa_device->id_ri_flags and RI_FAST were not implemented and did nothing.
The two drivers that were mistakenly thinking this was working were
cy.c and loran.c - these should be converted to newbus.
GC (garbage collect) isa_device->id_alive
GC userconfig.c references to isa_device->id_scsiid (!).


56437 23-Jan-2000 peter

GC isa_device->id_reconfig - it's not referenced anywhere anymore.
GC reconfig_isadev() - it's not used anymore.


56243 18-Jan-2000 billf

Cast rman_get_virtual() to a vm_offset_t.

Submitted by: msmith


56237 18-Jan-2000 alfred

unbreak (rv -> r), afaik what Mike intended, boots fine on my machine


56213 18-Jan-2000 msmith

Don't try to map memory resources into the kernel until they're actually
activated. Some of the things that get listed as "resources" aren't
necessarily suited for this.

(This shouldn't be a problem for any driver that correctly passes
RF_ACTIVE)


56048 15-Jan-2000 hosokawa

Added "sn" and "pcic" entries to UserConfig.


56024 15-Jan-2000 tanimura

A processor with the CPUID of 0x?8? is Pentium III.
(aka Coppermine)

Noticed by: Satoshi Sawada <k-sawata@gnoc2.comminet.or.jp>
Reviewd by: Takuma Yamada <fuzzy2@st.rim.or.jp>


55953 14-Jan-2000 peter

Pre 4.0 tidy up.

Collect together the components of several drivers and export eisa from
the i386-only area (It's not, it's on some alphas too). The code hasn't
been updated to work on the Alpha yet, but that can come later.

Repository copies were done a while ago.
Moving these now keeps them in consistant place across the 4.x series
as the newbusification progresses.

Submitted by: mdodd


55944 14-Jan-2000 wpaul

Add device driver support for USB ethernet adapters based on the CATC
USB-EL1202A chipset. Between this and the other two drivers, we should
have support for pretty much every USB ethernet adapter on the market.
The only other USB chip that I know of is the SMC USB97C196, and right
now I don't know of any adapters that use it (including the ones made
by SMC :/ ).

Note that the CATC chip supports a nifty feature: read and write combining.
This allows multiple ethernet packets to be transfered in a single USB
bulk in/out transaction. However I'm again having trouble with large
bulk in transfers like I did with the ADMtek chip, which leads me to
believe that our USB stack needs some work before we can really make
use of this feature. When/if things improve, I intend to revisit the
aue and cue drivers. For now, I've lost enough sanity points.


55891 13-Jan-2000 mdodd

Allow SMP systems with an MCA bus to work properly.

Reviewed by: peter


55823 11-Jan-2000 yokota

Add a new mechanism, cndbctl(), to tell the console driver that
ddb is entered. Don't refer to `in_Debugger' to see if we
are in the debugger. (The variable used to be static in Debugger()
and wasn't updated if ddb is entered via traps and panic anyway.)

- Don't refer to `in_Debugger'.
- Add `db_active' to i386/i386/db_interface.d (as in
alpha/alpha/db_interface.c).
- Remove cnpollc() stub from ddb/db_input.c.
- Add the dbctl function to syscons, pcvt, and sio. (The function for
pcvt and sio is noop at the moment.)

Jointly developed by: bde and me

(The final version was tweaked by me and not reviewed by bde. Thus,
if there is any error in this commit, that is entirely of mine, not
his.)

Some changes were obtained from: NetBSD


55604 08-Jan-2000 bde

Compile genassym.c with ordinary ${CFLAGS}. The (small) needs for
${GEN_CFLAGS} and -U_KERNEL became negative when all all the
genassym.c's were converted to be cross-built.

Makefile.*:
- Cleanups associated with the old genassym.
- Fixed deprecated spelling of ${.IMPSRC} as "$<".


55545 07-Jan-2000 marcel

Use genassym(1). The definitions of NKPDE and NKPT have been removed
because they are already defined in pmap.h, resulting in duplicate
definitions.

Reviewed by: bde


55540 07-Jan-2000 luoqi

Allow SMP && NCPU == 1 to work. From now on, there's no restriction on the
value of NCPU relative to the number of cpus physically present, the actual
number of cpus utilized will be the smaller of the two.


55429 05-Jan-2000 wpaul

Add device driver support for USB ethernet adapters based on the
Kawasaki LSI KL5KUSB101B chip, including the LinkSys USB10T, the
Entrega NET-USB-E45, the Peracom USB Ethernet Adapter, the 3Com
3c19250 and the ADS Technologies USB-10BT. This device is 10mbs
half-duplex only, so there's miibus or ifmedia support. This device
also requires firmware to be loaded into it, however KLSI allows
redistribution of the firmware images (I specifically asked about
this; they said it was ok).

Special thanks to Annelise Anderson for getting me in touch with
KLSI (eventually) and thanks to KLSI for providing the necessary
programming info.

Highlights:
- Add driver files to /sys/dev/usb
- update usbdevs and regenerate attendate files
- update usb_quirks.c
- Update HARDWARE.TXT and RELNOTES.TXT for i386 and alpha
- Update LINT, GENERIC and others for i386, alpha and pc98
- Add man page
- Add module
- Update sysinstall and userconfig.c


55420 04-Jan-2000 tegge

ISA device drivers use the ISA source interrupt number in locations where
the low level interrupt handler number should be used. Change
setup_apic_irq_mapping() to allocate low level interrupt handler X (Xintr${X})
for any ISA interrupt X mentioned in the MP table.

Remove an assumption in the driver for the system clock (clock.c) that
interrupts mentioned in the MP table as delivered to IOAPIC #0 intpin Y
is handled by low level interrupt handler Y (Xintr${Y}) but don't assume
that low level interrupt handler 0 (Xintr0) is used.

Don't allocate two low level interrupt handlers for the system clock.
Reviewed by: NOKUBI Hirotaka <hnokubi@yyy.or.jp>


55312 02-Jan-2000 phk

Move the "sti" instruction to right before the "hlt" to close a tiny
race condition.

Obtained from: bde and/or obrien


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


55162 28-Dec-1999 wpaul

This commit adds device driver support for the ADMtek AN986 Pegasus
USB ethernet chip. Adapters that use this chip include the LinkSys
USB100TX. There are a few others, but I'm not certain of their
availability in the U.S. I used an ADMtek eval board for development.
Note that while the ADMtek chip is a 100Mbps device, you can't really
get 100Mbps speeds over USB. Regardless, this driver uses miibus to
allow speed and duplex mode selection as well as autonegotiation.
Building and kldloading the driver as a module is also supported.

Note that in order to make this driver work, I had to make what some
may consider an ugly hack to sys/dev/usb/usbdi.c. The usbd_transfer()
function will use tsleep() for synchronous transfers that don't complete
right away. This is a problem since there are times when we need to
do sync transfers from an interrupt context (i.e. when reading registers
from the MAC via the control endpoint), where tsleep() us a no-no.
My hack allows the driver to have the code poll for transfer completion
subject to the xfer->timeout timeout rather that calling tsleep().
This hack is controlled by a quirk entry and is only enabled for the
ADMtek device.

Now, I'm sure there are a few of you out there ready to jump on me
and suggest some other approach that doesn't involve a busy wait. The
only solution that might work is to handle the interrupts in a kernel
thread, where you may have something resembling a process context that
makes it okay to tsleep(). This is lovely, except we don't have any
mechanism like that now, and I'm not about to implement such a thing
myself since it's beyond the scope of driver development. (Translation:
I'll be damned if I know how to do it.) If FreeBSD ever aquires such
a mechanism, I'll be glad to revisit the driver to take advantage of
it. In the meantime, I settled for what I perceived to be the solution
that involved the least amount of code changes. In general, the hit
is pretty light.

Also note that my only USB test box has a UHCI controller: I haven't
I don't have a machine with an OHCI controller available.

Highlights:

- Updated usb_quirks.* to add UQ_NO_TSLEEP quirk for ADMtek part.
- Updated usbdevs and regenerated generated files
- Updated HARDWARE.TXT and RELNOTES.TXT files
- Updated sysinstall/device.c and userconfig.c
- Updated kernel configs -- device aue0 is commented out by default
- Updated /sys/conf/files
- Added new kld module directory


55117 26-Dec-1999 bde

Don't include <isa/isavar.h> or compile code depending on it when isa
is not configured. Including <isa/isavar.h> when it is not used is
harmful as well as bogus, since it includes "isa_if.h" which is not
generated when isa is not configured.


55110 26-Dec-1999 bde

Fixed breakage of read-only opening of /dev/*mem at securelevel > 0 in
previous pair of commits.

Spell the "securelevel > 0" check consistently.

Use the proc arg instead of curproc in mmopen() and mmclose().


55098 25-Dec-1999 bde

Fixed races accessing the RTC. The races apparently caused
apm_default_resume() to sometimes set a very wrong time.
(1) Accesses to the RTC index and data registers were not atomic enough.
Interrupts were not masked. This was only good enough until an
interrupt handler (rtcintr()) started accessing the RTC in FreeBSD-2.0.
(2) Access to the block of time registers in inittodr() was not atomic
enough. inittodr() has 244us to read the time registers. Interrupts
were not masked. This was only good enough until something (apm)
started calling inittodr() after boot time in FreeBSD-2.0.
The fix for (2) also makes the timecounter update more atomic, although
this is currently unimportant due to the low resolution of the RTC.

Problem reported by: mckay


54952 21-Dec-1999 eivind

Change incorrect NULLs to 0s


54890 20-Dec-1999 peter

Remove references to register_intr() etc in comments.


54882 20-Dec-1999 sheldonh

Complement the sum as required in in_cksum_finalize().

PR: 15472
Submitted by: wollman


54651 15-Dec-1999 tegge

Typo fix.


54618 15-Dec-1999 tegge

apic_irq() returns -1 when there is no match for (IOAPIC, intpin) pair.

Adjust some comments to better match the code.


54393 10-Dec-1999 phk

Remove reference to ze and zp drivers.


54210 06-Dec-1999 peter

Update for loss of pnp.h


54192 06-Dec-1999 luoqi

Need header <machine/smp.h> for prototype declaration of smp_rendezvous()
in my previous commit.


54188 06-Dec-1999 luoqi

User ldt sharing.


54134 04-Dec-1999 wpaul

Add the if_dc driver and remove all of the al, ax, dm, pn and mx drivers
which it replaces. The new driver supports all of the chips supported
by the ones it replaces, as well as many DEC/Intel 21143 10/100 cards.

This also completes my quest to convert things to miibus and add
Alpha support.


54128 04-Dec-1999 kato

The address 0x472 is used for the SCSI HDD geometry information on
PC-98. Therefore, the PC-98 kernel should not modify it.


54121 04-Dec-1999 marcel

oszsigcode -> szosigcode

Pointed out by: bde


54120 04-Dec-1999 marcel

Fix type of sf_addr.

Pointed out by: bde


54073 03-Dec-1999 mdodd

Remove the 'ivars' arguement to device_add_child() and
device_add_child_ordered(). 'ivars' may now be set using the
device_set_ivars() function.

This makes it easier for us to change how arbitrary data structures are
associated with a device_t. Eventually we won't be modifying device_t
to add additional pointers for ivars, softc data etc.

Despite my best efforts I've probably forgotten something so let me know
if this breaks anything. I've been running with this change for months
and its been quite involved actually isolating all the changes from
the rest of the local changes in my tree.

Reviewed by: peter, dfr


53888 29-Nov-1999 dillon

Make BOOTP work again.

Submitted by: Doug Ambrisko <ambrisko@whistle.com>


53879 29-Nov-1999 phk

How hard can it be to implement a 24bit counter in hardware ?

Make sure we read a likely value from the PIIX timecounter.

This should fix a large fraction of the "calcru: negative time"
warnings produced by SMP machines.

Another hole in one by: bde
Didn't belive Bruce: phk


53745 27-Nov-1999 bde

Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend. The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by: dufault


53706 26-Nov-1999 julian

Fix out-of-date comment


53648 24-Nov-1999 archie

Change the prototype of the strto* routines to make the second
parameter a char ** instead of a const char **. This make these
kernel routines consistent with the corresponding libc userland
routines.

Which is actually 'correct' is debatable, but consistency and
following the spec was deemed more important in this case.

Reviewed by (in concept): phk, bde


53641 23-Nov-1999 dillon

Even better then using %fs:0 in our locked synchronizing instruction,
we instead use 0(%esp), which is per-cpu, already pretty much
guarenteed to be locked into the cache, and does not stress the cache's
set associativity. invlpg might also be a good choice (suggested by
Ingo).

Obtained from: Linus Torvalds <torvalds@transmeta.com>


53639 23-Nov-1999 dillon

Add in required instruction serialization prior to releasing the
MP lock for the last time. The use of a locked instruction to
cpu-private memory is 3x faster then CPUID and 3x faster then the
use of a locked instruction to shared memory (the lock itself).

Instruction serialization is required to ensure that any pending
memory ops are properly flushed prior to the release of the lock,
due to out-of-order instruction execution by the cpu.


53624 23-Nov-1999 green

Fix a confusion between osigcontext and ucontext_t in the previous commit.
Since an osigcontext is smaller, if you check for a valid (much larger sized)
ucontext_t and it fails, we bogusly would reject the osigcontext as per
rev 1.378. Instead, check for osigcontext range validity first, and
ucontext_t later. This unbreaks Netscape.

Pointed to the right commit by: peter


53504 21-Nov-1999 pho

Moved useracc() to top of sigreturn as to avoid panic
caused by invalid arguments to rutine.

Reviewed by: marcel, phk


53503 21-Nov-1999 phk

s/p_cred->pc_ucred/p_ucred/g


53433 19-Nov-1999 phk

Use LIST_FOREACH to traverse the allproc list.

Submitted by: Jake Burkholder jake@checker.org


53425 19-Nov-1999 dillon

Optimize two cases in the MP locking code. First, it is not necessary
to use a locked cmpexg when unlocking a lock that we already hold, since
nobody else can touch the lock while we hold it. Second, it is not
necessary to use a locked cmpexg when locking a lock that we already
hold, for the same reason. These changes will allow MP locks to be used
recursively without impacting performance.

Modify two procedures that are called only by assembly and are already
NOPROF entries to pass a critical argument in %edx instead of on the
stack, removing a significant amount of code from the critical path
as a consequence.

Reviewed by: Alfred Perlstein <bright@wintelcom.net>, Peter Wemm <peter@netplex.com.au>


53106 12-Nov-1999 marcel

Change the type of sf_addr in struct {o}sigframe from char* to
register_t.

Fix some style bugs and bitrotted comments.

Submitted by: bde


53045 09-Nov-1999 alc

Passing "0" or "FALSE" as the fourth argument to vm_fault is wrong. It
should be "VM_FAULT_NORMAL".


52968 07-Nov-1999 phk

Patch got this one wrong, we want to check securelevel in open()


52967 07-Nov-1999 phk

Remove the iskmemdev() function. Make it the responsibility of the mem.c
drivers to enforce the securelevel checks.


52805 02-Nov-1999 jhb

Remove the prototypes for two functions that were removed when the
CD9660_ROOT option was axed.


52791 02-Nov-1999 phk

Remove two private copies of strtoul()

Spotted by: bde


52778 01-Nov-1999 msmith

This is a complete rewrite of vfs_conf.c, which changes the way the root
filesystem is discovered. Preference is given to using the kernel
environment variable vfs.root.mountfrom, which is set by the loader
according to the contents of /etc/fstab. Changes in the MD code
provide fallback mechanisms for systems not using the loader.

A more robust fallback path is also provided, with the last recourse
being to prompt on the console for a root device.

These changes drastically simplify the machine-dependant parts of
the root configuration process. In addition, support for CDROM root
devices has been removed; it was a nasty hack and didn't work.


52720 31-Oct-1999 alc

The useracc() calls in osigreturn() and sigreturn() should specify
VM_PROT_READ rather than VM_PROT_WRITE. (This mistake predates
the B_READ/B_WRITE -> VM_PROT_READ/VM_PROT_WRITE change.)

Submitted by: bde


52669 30-Oct-1999 iwasaki

i8254_restore is called from apm_default_resume() to reload
the countdown register.
this should not be necessary but there are broken laptops that
do not restore the countdown register on resume.
when it happnes, it messes up the hardclock interval and system clock,
which leads to the infamous "calcru: negative time" problem.

Submitted by: kjc, iwasaki
Reviewed by: Steve O'Hara-Smith <steveo@eircom.net> and committers.
Obtained from: PAO3


52647 30-Oct-1999 alc

The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by: dillon


52644 30-Oct-1999 phk

Change useracc() and kernacc() to use VM_PROT_{READ|WRITE|EXECUTE} for the
"rw" argument, rather than hijacking B_{READ|WRITE}.

Fix two bugs (physio & cam) resulting by the confusion caused by this.

Submitted by: Tor.Egge@fast.no
Reviewed by: alc, ken (partly)


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


52625 29-Oct-1999 phk

Remove #ifdef notyet code for doing I/O in a way we never will do it.


52469 24-Oct-1999 alc

Add text for the Athlon's MMX and 3DNow! (DSP) instruction extensions
to print_AMD_features.


52452 24-Oct-1999 dillon

Adjust the buffer cache to better handle small-memory machines. A
slightly older version of this code was tested by BDE and I.

Also fixes a lockup situation when kva gets too fragmented.

Remove the maxvmiobufspace variable and sysctl, they are no longer
used. Also cleanup (remove) #if 0 sections from prior commits.

This code is more of a hack, but presumably the whole buffer cache
implementation is going to be rewritten in the next year so it's no
big deal.


52397 19-Oct-1999 peter

Remove pccard attachment stub, this caused pccard unit 0 to be allocated
and unusable by the pccard system since pccard doesn't attach to the
nexus any more. This was stopping my 3c589D from working as pccard unit
0 is used directly for resource allocation and this fails when unit 0
isn't actually attached to anything.


52271 15-Oct-1999 tegge

Eliminate remaining part of incorrect PCI bus numbering sanity check on systems with more than one PCI bus.


52241 14-Oct-1999 dfr

* Add some verbose logging to the PnP parser and fix a couple of bugs.
* Move pnp_eisaformat() to pnp.c, declared in <isa/pnpvar.h>.
* Turn the pnpbios code into an enumerator for the isa bus. This allows
all devices known to the bios to be probed automatically.

Currently the pnpbios code is dependant on the PNPBIOS option. As the code
is tested more and when more drivers are converted this will be made the
default. I have PnP changes in the wings for fdc, atkbd, psm, pcaudio, and
joy. Sio already works with pnpbios.


52237 14-Oct-1999 kato

Recognize Pentium II w/ CPUID = 0x6XX and Pentium III Xeon w/ CPUID =
0x7XX.

Pointed out by: Brian Somers <brian@Awfulhak.org>


52199 13-Oct-1999 marcel

Fix a security bug. eflags was copied verbatim from userland.

Submitted by: bde


52177 12-Oct-1999 green

Enable MTRR support for K7 (Athlon) processors, which happens to have the
same interface as Intel's P6 family has. Incidentally, I had disabled
it in the first place since I knew the K7s were coming out soon but
did not want to assume they'd have the same MTRR interface as Intel's
chips.

Submitted by: Ville-Pertti Keinonen <will@iki.fi>


52150 12-Oct-1999 marcel

Now that userland, including modules don't use the osig* syscalls
and the kernel itself doesn't use any SYS_osig* constants, change
the syscalls to be of type COMPAT.


52140 11-Oct-1999 luoqi

Add a per-signal flag to mark handlers registered with osigaction, so we
can provide the correct context to each signal handler.

Fix broken sigsuspend(): don't use p_oldsigmask as a flag, use SAS_OLDMASK
as we did before the linuxthreads support merge (submitted by bde).

Move ps_sigstk from to p_sigacts to the main proc structure since signal
stack should not be shared among threads.

Move SAS_OLDMASK and SAS_ALTSTACK flags from sigacts::ps_flags to proc::p_flag.
Move PS_NOCLDSTOP and PS_NOCLDWAIT flags from proc::p_flag to procsig::ps_flag.

Reviewed by: marcel, jdp, bde


52121 11-Oct-1999 peter

Zap unneeded #includes

Submitted by: phk


51984 07-Oct-1999 marcel

Simplification of the signal trampoline and other cleanups.

o Remove unused defines from genassym.c that were needed
by the trampoline.
o Add load_gs_param function to support.s that catches
a fault when %gs is loaded with an invalid descriptor.
The function returns EFAULT in that case.
o Remove struct trapframe from mcontext_t and replace it
with the list of registers.
o Modify sendsig and sigreturn accordingly.

This commit contains a patch by bde.

Reviewed by: luoqi, jdp


51942 04-Oct-1999 marcel

Re-introduction of sigcontext.

struct sigcontext and ucontext_t/mcontext_t are defined in such
a way that both (ie struct sigcontext and ucontext_t) can be
passed on to sigreturn. The signal handler is still given a
ucontext_t for maximum flexibility.

For backward compatibility sigreturn restores the state for the
alternate signal stack from sigcontext.sc_onstack and not from
ucontext_t.uc_stack. A good way to determine which value the
application has set and thus which value to use, is still open
for discussion.

NOTE: This change should only affect those binaries that use
sigcontext and/or ucontext_t. In the source tree itself
this is only doscmd. Recompilation is required for those
applications.

This commit also fixes a lot of style bugs without hopefully
adding new ones.

NOTE: struct sigaltstack.ss_size now has type size_t again. For
some reason I changed that into unsigned int.

Parts submitted by: bde
sigaltstack bug found by: bde


51908 03-Oct-1999 marcel

Reinstate the 4th argument to old signal handlers. Don't set it
when the handler uses siginfo_t.


51834 01-Oct-1999 marcel

Implement the use of si_addr in siginfo_t.

Suggested by: jdp


51833 01-Oct-1999 marcel

Don't check %cs *after* it has being set in sigreturn. If the check
fails, applications could end up running in kernel mode (oops).

Submitted by: bde


51792 29-Sep-1999 marcel

sigset_t change (part 3 of 5)
-----------------------------

By introducing a new sigframe so that the signal handler operates
on the new siginfo_t and on ucontext_t instead of sigcontext, we
now need two version of sendsig and sigreturn.

A flag in struct proc determines whether the process expects an
old sigframe or a new sigframe. The signal trampoline handles
which sigreturn to call. It does this by testing for a magic
cookie in the frame.

The alpha uses osigreturn to implement longjmp. This means that
osigreturn is not only used for compatibility with existing
binaries. To handle the new sigset_t, setjmp saves it in
sc_reserved (see NOTE).

the struct sigframe has been moved from frame.h to sigframe.h
to handle the complex header dependencies that was caused by
the new sigframe.

NOTE: For the i386, the size of jmp_buf has been increased to hold
the new sigset_t. On the alpha this has been prevented by
using sc_reserved in sigcontext.


51680 26-Sep-1999 eivind

Move the declaration of panic() from sys/systm.h to sys/param.h.
Rationale: Wider access, so we can add assertions to header files.
panicstr is still in sys/systm.h

Suggested by: phk
Discussed with: peter


51659 25-Sep-1999 mjacob

Fix from Tor so that if we enter the debugger in the tristate going to
SMP (other CPUs stopped but SMP mode not really started).

Obtained from: Tor.Egge@fast.no


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51596 23-Sep-1999 jhay

Make the frequency tuneable via a sysctl.

Reviewed by: phk


51561 22-Sep-1999 luoqi

Display CPU (BSP) clock speed on SMP systems.


51530 22-Sep-1999 wpaul

Spruce up the ADMtek driver: conver to newbus, miibus and add support
for the AN985 "Centaur" chip, which is apparently the next genetation
of the "Comet." The AN985 is also a tulip clone and is similar to the
AL981 except that it uses a 99C66 EEPROM and a serial MII interface
(instead of direct access to the PHY registers).

Also updated various documentation to mention the AN985 and created
a loadable module.

I don't think there are any cards that use this chip on the market yet:
the datasheet I got from ADMtek has boxes with big X's in them where the
diagrams should be, and the sample boards I got have chips without any
artwork on them.


51474 20-Sep-1999 dillon

Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundry in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.

The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().


51451 20-Sep-1999 phk

On PIIX4 based SMP systems use the PMTMR register for timecounting.

It is about 2.5 microseconds or roughly 3 times faster to use this
"PIIX" timecounter than the "i8254" timecounter. Resolution is
also 3 times better.

The code cheats and don't register the PCI device, because other pieces
of code want to use it too.

Originally spotted by: msmith


51211 12-Sep-1999 green

Correction: mem.c devices are "D_MEM" (and D_MEM is added.)

Taken issue with by: phk


51207 12-Sep-1999 green

Mainly stylistic fixes:
1. return( -> return (
2. inappropriate ENODEV -> ENOTTY
3. some unreachable cases removed


51206 12-Sep-1999 green

Make the d_flags of mem devices D_DISK to signify that they are disk-like
random-seekable devices. This lets dd(1) know it can seek on them. It
also affects spec_vnopen() (IIRC), but only makes the path of execution smaller,
and does not change its behavior. This is when securelevel >= 2.


51193 12-Sep-1999 msmith

Some PnP BIOSsen return garbage in the high byte of the number-of-devices
field (or don't set the high byte at all). Clear it to avoid reporting
a silly number of devices.

Reported by: phk


51183 11-Sep-1999 peter

Make pmap_mapdev() deal with non-page-aligned requests.
Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.


51130 10-Sep-1999 phk

System clock don't update, because C6's TSC stop count up when run
HALT instruction.

PR: 13683
Submitted by: IMAI Takeshi <take-i@ceres.dti.ne.jp>
Reviewed by: phk


51126 10-Sep-1999 peter

Add text for the PN (Processor serial number) and XMM (extended SIMD/MMX2/
support), as well as a bunch of comments for what the various bits mean
(those that I remember anyway).


51121 10-Sep-1999 msmith

Look for the right ACPI signature.

Submitted by: dfr


51114 10-Sep-1999 msmith

Invoke smp_rendezvous_action() using the a.out compatible asnames.h
technique (bleagh).


51065 07-Sep-1999 luoqi

Save %gs in sigcontext when delivering a signal and restore them upon
return (in signal trampoline code). I plan to do the same on -stable,
so that we have a consistent interface to userland applications.

Reviewed by: bde


50992 06-Sep-1999 imp

Add pccard child to nexus. A better version would take care of this
with an identify method, but that has not been implemented.

Forgotten by: imp


50986 06-Sep-1999 wpaul

This commit adds driver support for PCI fast ethernet NICs based on
the Davicom DM9100 and DM9102 chipsets, including the Jaton Corporation
XPressNet. Datasheet is available from www.davicom8.com.

The DM910x chips are still more tulip clones. The API is reproduced
pretty faithfully, unfortunately the performance is pretty bad. The
transmitter seems to have a lot of problems DMAing multi-fragment
packets. The only way to make it work reliably is to coalesce transmitted
packets into a single contiguous buffer. The Linux driver (written by
Davicom) actually does something similar to this. I can't recomment this
NIC as anything more than a "connectivity solution."

This driver uses newbus and miibus and is supported on both i386
and alpha platforms.


50974 05-Sep-1999 wpaul

This commit adds driver support for the Silicon Integrated Systems
SiS 900 and SiS 7016 PCI fast ethernet chipsets. Full manuals for the
SiS chips can be found at www.sis.com.tw.

This is a fairly simple chipset. The receiver uses a 128-bit multicast
hash table and single perfect entry for the station address. Transmit and
receive DMA and FIFO thresholds are easily tuneable. Documentation is
pretty decent and performance is not bad, even on my crufty 486. This
driver uses newbus and miibus and is supported on both the i386 and
alpha architectures.


50972 05-Sep-1999 peter

Set up FPU state on the AP.

Tested by: phk


50962 05-Sep-1999 green

M_WAITOK->M_NOWAIT


50823 03-Sep-1999 mdodd

This adds the i386 specific support for systems with a MicroChannel
Architecture bus.

Reviewed by: msmith


50816 02-Sep-1999 luoqi

Some reorganization of sysarch() interface:
1. Move definitions of struct i386_*_args to the header file sysarch.h,
since they are part of the sysarch API. struct i386_get_ldt_args and
i386_set_ldt_args were identical, therefore make them into one
struct i386_ldt_args. Libc should use these definitions as well.
2. Return a more sensible EOPNOTSUPP for unknown operations.

Reviewed by: marcel


50788 02-Sep-1999 peter

Update for new pnp includes


50769 01-Sep-1999 dfr

This represents essentially a complete rewrite of the ISA PnP code. The
new system is integrated with the ISA bus code more cleanly and allows
the future addition of more enumerators such as PnPBIOS and ACPI.

This commit also enables the new pcm driver since it is somewhat tied to
the new PnP code.


50756 01-Sep-1999 nsayer

quoted string change: the si driver also covers the Specialix "SX"
product.


50677 31-Aug-1999 msmith

Make the error return from mem_range_attr_get actually do something useful
(return an error to the caller)


50674 30-Aug-1999 msmith

Check that there is memory range support before attempting to perform such
an operation, as a kernel client may not have previously checked the CPU
type (it may not be able to).

Also correct the function declaration style for the mem_range functions to
match the rest of this file (oops).

Submitted by: gibbs


50511 28-Aug-1999 phk

We don't need to pass the diskname argument all over the diskslice/label
code, we can find the name from any convenient dev_t


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50463 27-Aug-1999 jlemon

Reference the correct gdt[] entry on SMP. Remove the `generation' flag,
and always reload the selectors for every bios call.


50337 25-Aug-1999 msmith

Rename 'bios_jmp' to 'bios16_jmp' to make it clear what it's related to.


50336 25-Aug-1999 peter

Use the far jump for the base of the page arithmatic rather than the
calling function, otherwise Bad Things Happen(tm) when bios16_call is
not in the same page as bios_jmp.

Reviewed by: msmith


50313 24-Aug-1999 msmith

Work around a bad design in some PnP BIOS code whereby the BIOS can reach
off the top of our constructed stack segment while it's trying to copy a
maximally-sized PnP argument frame around.


50303 24-Aug-1999 alc

Cosmetic: Correct the Id string.

Submitted by: Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>


50271 24-Aug-1999 bde

Fixed a misplaced cast to uintptr_t. Cosmetic.

Use device_get_nameunit() instead of rolling our own.


50268 23-Aug-1999 bde

`bootdev' is an ordinary u_long, so don't cast it to a pointer to print it.
gcc warns about the cast on i386's with 64-bit longs.

Print `bootdev' in all cases when we bail out because it is unreasonable.


50262 23-Aug-1999 alc

Implement a version of s_lock_try that doesn't cause the next s_lock
call to panic when SL_DEBUG is set. (SL_DEBUG is currently set
by default.)


50257 23-Aug-1999 phk

Now that we can bind cdevsw to the individual dev_t, divorce the PERFMON
stuff from mem.c. If PERFMON is there, it will "steal" a minor from
mem.c, but mem.c doesn't need to know about this.

Fixed type of cmd argument in perfmon_ioctl().


50254 23-Aug-1999 phk

Convert DEVFS hooks in (most) drivers to make_dev().

Diskslice/label code not yet handled.

Vinum, i4b, alpha, pc98 not dealt with (left to respective Maintainers)

Add the correct hook for devfs to kern_conf.c

The net result of this excercise is that a lot less files depends on DEVFS,
and devtoname() gets more sensible output in many cases.

A few drivers had minor additional cleanups performed relating to cdevsw
registration.

A few drivers don't register a cdevsw{} anymore, but only use make_dev().


50252 23-Aug-1999 peter

The nexus_attach() code works a lot better if it's actually connected to
the device methods... Also, don't fail to add eisa/isa because a previous
device failed to attach.


50251 23-Aug-1999 alc

Modify the macros IMASK_UNLOCK, CPL_UNLOCK, and REL_FAST_INTR_LOCK
to perform the s_unlock inline.


50197 22-Aug-1999 peter

The previous fix didn't do anything if you didn't have pnp. The ICU
macros are only called in the !APIC_IO case, include icu.h there.


50196 22-Aug-1999 green

Finish unbreaking autoconf.c includes (for non-SMP.)


50185 22-Aug-1999 peter

Oops, that wasn't so clever after all. struct isa_device is still a
prerequisite for this old pnp.h.


50184 22-Aug-1999 peter

Zap a heap of unused cruft now. We don't need the ISA/EISA/PCI hooks
here any more as they are self identifying. Only PNP remains but that
will be replaced any day now.
Also reword a comment that had been XXX'ed to death to make it clear[er]
why we don't enable interrupts before probing.
PCIBIOS interrupt routing controls may make this possible to fix one day.


50183 22-Aug-1999 peter

Take advantage of the apm/npx code and let them identify themselves rather
than having explicit hooks here.
Treat the eisa/isa attach a little differently so that we defer the
decision about to attach eisa/isa to the motherboard directly only if
the PCI probe (if it exists) fails to turn up a PCI->EISA/ISA bridge.
This restores the original device geometry where ISA and/or EISA attach
to their bridge rather than bypassing and going to the root.


50128 21-Aug-1999 wpaul

This commit adds device driver support for the Sundance Technologies ST201
PCI fast ethernet controller. Currently, the only card I know that uses
this chip is the D-Link DFE-550TX. (Don't ask me where to buy these: the
only cards I have are samples sent to me by D-Link.)

This driver is the first to make use of the miibus code once I'm sure
it all works together nicely, I'll start converting the other drivers.

The Sundance chip is a clone of the 3Com 3c90x Etherlink XL design
only with its own register layout. Support is provided for ifmedia,
hardware multicast filtering, bridging and promiscuous mode.


50094 20-Aug-1999 msmith

Loosen up the constructed argument segment generation slightly; rather than
trying to size it intelligently just make it 64k and leave it up to the caller
to ensure that the arguments all fit within that range.

This should resolve the issue that some people were seeing with the PnP BIOS
scan crashing on a large PnP node.


50081 20-Aug-1999 kato

There may exist two kinds of IBM BlueLightning CPU. One is that 5/2
test does not change undefined flag like Cyrix CPUs. Another is that
5/2 test changes undefined flag like Intel CPUs. Latter one could not
be detected and was recognized 486DX CPU. To solve this,
finishidentcpu() calls identblue() when cpu_vendor is null string
(that is, CPUID instruction is not supported) and cpu == CPU_486.
Tests have been done on IBM BlueLightning CPUs, i486SX and i486DX.


50037 19-Aug-1999 peter

Update for MI switch code, and trim a heap of unused (I believe) entries.


50036 19-Aug-1999 peter

Use the MI process selection. We use a quick routine to decide whether
to get the mplock and enter the kernel to run a process in the SMP case.


49999 18-Aug-1999 alc

Create callable (non-inline) versions of the atomic_OP_TYPE functions
that are linked into the kernel. The KLD compilation options are
changed to call these functions, rather than in-lining the
atomic operations.

This approach makes atomic operations from KLDs significantly
faster on UP systems (though somewhat slower on SMP systems).

PR: i386/13111
Submitted by: peter.jeremy@alcatel.com.au


49996 18-Aug-1999 msmith

Remove the SMBIOS detection and definitions; this should be handled in a
loadable module (under development).


49953 17-Aug-1999 msmith

Search for and interrogate the PnP BIOS if found. This code just prints
the PnP device IDs in verbose mode; it does not (yet) save any resource
data or contribute to the PnP process nor resource management.


49952 17-Aug-1999 msmith

Mindbogglingly, many BIOS vendors expect to be able to load %ds with
0x40 and then access data stored in real-mode segment 0x40, even when
called in protected mode. Microsoft unfortunately coddle these individuals,
and so must we if we want to run their code.

This change works around GPFs in some APM and PnP BIOS implementations.

Obtained from: Linux


49859 16-Aug-1999 gibbs

Fix a bug in busdma_mem_free() where we were improperly checking
the map associated with the region to free.


49679 13-Aug-1999 phk

The bdevsw() and cdevsw() are now identical, so kill the former.


49635 11-Aug-1999 alc

_pmap_allocpte:
If the pte page isn't PQ_NONE, panic rather than silently
covering up the problem.


49591 10-Aug-1999 alc

pmap_remove_pages:
Add KASSERT to detect out of range access to the pv_table and
report the errant pte before it's overwritten.


49569 09-Aug-1999 mpp

Remove a reference to config(8) when IRQ 2 is remapped to IRQ 9.
Config(8) contains no documentation about this.

Fix the help for the PnP irq and drq commands. This one caused
me a bit of head scratching the other night while trying to get
a problematic PnP device configured properly.


49558 09-Aug-1999 phk

Merge the cons.c and cons.h to the best of my ability. alpha may or
may not compile, I can't test it.


49474 06-Aug-1999 phk

Forgot the "bsd" slice, now setrootbyname() understands "wd0s1a".


49421 04-Aug-1999 msmith

Fix typo which would have caused MTRR support on non-SMP systems to
behave in an utterly random fashion.

Submitted by: gibbs


49337 31-Jul-1999 alc

pmap_object_init_pt:
Verify that object != NULL.


49326 31-Jul-1999 alc

Change the type of vpgqueues::lcnt from "int *" to "int". The indirection
served no purpose.


49304 31-Jul-1999 alc

Add parentheses for clarity.

Submitted by: dillon


49223 29-Jul-1999 msmith

Formatting-only cleanup accidentally omitted from the patch merge in the
previous major update. Bring new code into style alignment with the
existing code. No functional changes.


49207 29-Jul-1999 peter

GBIOSSTACK_SEL is undefined, but OTOH, BSSSEL apparently isn't used either.


49204 29-Jul-1999 msmith

Remove some duplicate definitions, as suggested by Alan Cox.


49203 29-Jul-1999 msmith

Fix for vmspace sharing as per Alan Cox. Thanks!


49197 29-Jul-1999 msmith

Major update to the kernel's BIOS-calling ability.

- Add support for calling 32-bit code in other segments
- Add support for calling 16-bit protected mode code

Update APM to use this facility.

Submitted by: jlemon


49196 29-Jul-1999 green

Remove XXX from the headers (broke the build, I'm betting.)


49195 29-Jul-1999 mdodd

Alter the behavior of sys/kern/subr_bus.c:device_print_child()

- device_print_child() either lets the BUS_PRINT_CHILD
method produce the entire device announcement message or
it prints "foo0: not found\n"

Alter sys/kern/subr_bus.c:bus_generic_print_child() to take on
the previous behavior of device_print_child() (printing the
"foo0: <FooDevice 1.1>" bit of the announce message.)

Provide bus_print_child_header() and bus_print_child_footer()
to actually print the output for bus_generic_print_child().
These functions should be used whenever possible (unless you can
just use bus_generic_print_child())

The BUS_PRINT_CHILD method now returns int instead of void.

Modify everything else that defines or uses a BUS_PRINT_CHILD
method to comply with the above changes.

- Devices are 'on' a bus, not 'at' it.
- If a custom BUS_PRINT_CHILD method does the same thing
as bus_generic_print_child(), use bus_generic_print_child()
- Use device_get_nameunit() instead of both
device_get_name() and device_get_unit()
- All BUS_PRINT_CHILD methods return the number of
characters output.

Reviewed by: dfr, peter


49186 28-Jul-1999 msmith

We're called too early to have any idea whether APM is going to be
active or not. The only sane thing we can do here is assume that if
APM is supported it might be active at some point, and bail.

In reality, even this isn't good enough; regardless of whether we support
APM or not, the system may well futz with the CPU's clock speed and throw
the TSC off. We need to stop using it for timekeeping except under
controlled circumstances. Curse the lack of a dependable high-resolution
timer.


49178 28-Jul-1999 msmith

Remove some droppings left over from the removal of the APM hooks.


49081 25-Jul-1999 cracauer

On FPU exceptions, pass a useful error code (one of the FPE_...
macros) to the signal handler, for old-style BSD signal handlers as
the second (int) argument, for SA_SIGINFO signal handlers as
siginfo_t->si_code. This is source-compatible with Solaris, except
that we have no <siginfo.h> (which isn't even mentioned in POSIX
1003.1b).

An rather complete example program is at
http://www3.cons.org/cracauer/freebsd-signal.c
This will be added to the regression tests in src/.

This commit also adds code to disable the (hardware) FPU from
userconfig, so that you can use a software FP emulator on a machine
that has hardware floating point. See LINT.


49076 25-Jul-1999 wpaul

This commit adds device driver support for Adaptec Duralink PCI fast
ethernet controllers based on the AIC-6915 "Starfire" controller chip.
There are single port, dual port and quad port cards, plus one 100baseFX
card. All are 64-bit PCI devices, except one single port model.

The Starfire would be a very nice chip were it not for the fact that
receive buffers have to be longword aligned. This requires buffer
copying in order to achieve proper payload alignment on the alpha.
Payload alignment is enforced on both the alpha and x86 platforms.
The Starfire has several different DMA descriptor formats and transfer
mechanisms. This driver uses frame descriptors for transmission which
can address up to 14 packet fragments, and a single fragment descriptor
for receive. It also uses the producer/consumer model and completion
queues for both transmit and receive. The transmit ring has 128
descriptors and the receive ring has 256.

This driver supports both FreeBSD/i386 and FreeBSD/alpha, and uses newbus
so that it can be compiled as a loadable kernel module. Support for BPF
and hardware multicast filtering is included.


49049 24-Jul-1999 yokota

- Correctly initialize cn_dev_t and cn_udev_t.
- Add D_TTY for alpha.

Reviewed by: bde, dfr


48974 22-Jul-1999 alc

Reduce the number of "magic constants" used for page coloring
by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same
thing at different places in the kernel. Drop PQ_PRIME3.


48963 21-Jul-1999 alc

Fix the following problem:

When creating new processes (or performing exec), the new page
directory is initialized too early. The kernel might grow before
p_vmspace is initialized for the new process. Since pmap_growkernel
doesn't yet know about the new page directory, it isn't updated, and
subsequent use causes a failure.

The fix is (1) to clear p_vmspace early, to stop pmap_growkernel
from stomping on memory, and (2) to defer part of the initialization
of new page directories until p_vmspace is initialized.

PR: kern/12378
Submitted by: tegge
Reviewed by: dfr


48937 20-Jul-1999 green

I missed a not. Also, remove invltlb(), since it's "unncessary [sic] because
wbinvd already flushes the the TLB."


48925 20-Jul-1999 msmith

Update of the i686 MTRR/memory range support.

- Support for setting memory range attributes on SMP systems using the
new SMP rendezvous function
- Don't print the confusing default memory type message.
- Allow legal overlapping range types.
- Turn interrupts back on after setting MTRRs in UP mode (whoops)
- Don't waste time calling invltlb() after wbinvd(); it's not
SMP-compatible (interrupts are off) and unncessary because
wbinvd already flushes the TLB.

This code is now essentially feature-complete.


48924 20-Jul-1999 msmith

Implement an all-CPU shootdown-style rendezvous facility. This allows
the caller to specify a function to be guarded between an entry and exit
barrier, as well as pre- and post-barrier functions.

The primary use for this function is synchronised update of per-cpu private
data. The implementation is almost (but not quite) MI; with a better
mechanism for masking per-CPU interrupts it could probably be hoisted.

Reviewed by: peter (partially)


48918 19-Jul-1999 peter

Fix a page size vs. KB mixup. The extra buffers allocated at a reduced
rate is meant to kick in at 64MB, not 256MB.

Reviewed by: Matthew Dillon <dillon@backplane.com>


48889 18-Jul-1999 bde

Updated acquire_timer2()'s state machine to work when the i8254 is
being used for timecounting. Fixed a race or two in it. Undisabled
it.

PR: 10455


48888 18-Jul-1999 bde

Don't let the machdep.tsc_freq sysctl proceed if the TSC is present
but broken, since tsc_timecounter is not initialised in that case,
and updating an uninitialised timecounter is fatal.

Fixed style bugs in the machdep.i8254_freq and machdep.tsc_freq
sysctls.

Reviewed by: phk


48868 17-Jul-1999 phk

Centralize dumpdev handling.


48832 16-Jul-1999 msmith

Add support for multiple PCI busses directly connected to the nexus.
This is only partially complete, but allows 450NX-based systems with
more than one PCI bus to be used again.

Submitted by: dfr


48729 10-Jul-1999 bde

Go back to the old (icu.s rev.1.7 1993) way of keeping the AST-pending
bit separate from ipending, since this is simpler and/or necessary for
SMP and may even be better for UP.

Reviewed by: alc, luoqi, tegge


48693 09-Jul-1999 wpaul

This commit adds driver support for the SysKonnect SK-984x series
gigabit ethernet adapters. This includes two single port cards
(single mode and multimode fiber) and two dual port cards (also single
mode and multimode fiber). SysKonnect is currently the only
vendor with a dual port gigabit ethernet NIC.

The ports on dual port adapters are treated as separate network
interfaces. Thus, if you have an SK-9844 dual port SX card, you
should have both sk0 and sk1 interfaces attached. Dual port cards
are implemented using two XMAC II chips connected to a single
SysKonnect GEnesis controller. Hence, dual port cards are really
one PCI device, as opposed to two separate PCI devices connected
through a PCI to PCI bridge. Note that SysKonnect's drivers use
the two ports for failover purposes rather that as two separate
interfaces, plus they don't support jumbo frames. This applies to
their Linux driver too. :)

Support is provided for hardware multicast filtering, BPF and
jumbo frames. The SysKonnect cards support TCP checksum offload
however this feature is not currently enabled (hopefully it will
be once we get checksum offload support).

There are still a few things that need to be implemeted, like
the ability to communicate with the on-board LM80 voltage/temperature
monitor, but I wanted to get the driver under CVS control and into
-current so people could bang on it.

A big thanks for SysKonnect for making all their programming info
for these cards (and for their FDDI and token ring cards) available
without NDA (see www.syskonnect.com).


48691 09-Jul-1999 jlemon

Implement support for hardware debug registers on the i386.

Submitted by: Brian Dean <brdean@unx.sas.com>


48677 08-Jul-1999 mckusick

These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations. I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.

* buffer cache hash table now dynamically allocated. This will
have no effect on memory consumption for smaller systems and
will help scale the buffer cache for larger systems.

* minor enhancement to pmap_clearbit(). I noticed that
all the calls to it used constant arguments. Making
it an inline allows the constants to propogate to
deeper inlines and should produce better code.

* removal of inherent vfs_ioopt support through the emplacement
of appropriate #ifdef's, with John's permission. If we do not
find a use for it by the end of the year we will remove it entirely.

* removal of getnewbufloops* counters & sysctl's - no longer
necessary for debugging, getnewbuf() is now optimal.

* buffer hash table functions removed from sys/buf.h and localized
to vfs_bio.c

* VFS_BIO_NEED_DIRTYFLUSH flag and support code added
( bwillwrite() ), allowing processes to block when too many dirty
buffers are present in the system.

* removal of a softdep test in bdwrite() that is no longer necessary
now that bdwrite() no longer attempts to flush dirty buffers.

* slight optimization added to bqrelse() - there is no reason
to test for available buffer space on B_DELWRI buffers.

* addition of reverse-scanning code to vfs_bio_awrite().
vfs_bio_awrite() will attempt to locate clusterable areas
in both the forward and reverse direction relative to the
offset of the buffer passed to it. This will probably not
make much of a difference now, but I believe we will start
to rely on it heavily in the future if we decide to shift
some of the burden of the clustering closer to the actual
I/O initiation.

* Removal of the newbufcnt and lastnewbuf counters that Kirk
added. They do not fix any race conditions that haven't already
been fixed by the gbincore() test done after the only call
to getnewbuf(). getnewbuf() is a static, so there is no chance
of it being misused by other modules. ( Unless Kirk can think
of a specific thing that this code fixes. I went through it
very carefully and didn't see anything ).

* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not
think this check is necessary, the buffer should flush properly
whether the vnode is locked or not. ( yes? ).

* removal of extra arguments passed to getnewbuf() that are not
necessary.

* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
vfs_cluster.c

* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
which should greatly aid flushing operations in heavy load
situations - both the pageout and update daemons will be able
to operate more efficiently.

* removal of b_usecount. We may add it back in later but for now
it is useless. Prior implementations of the buffer cache never
had enough buffers for it to be useful, and current implementations
which make more buffers available might not benefit relative to
the amount of sophistication required to implement a b_usecount.
Straight LRU should work just as well, especially when most things
are VMIO backed. I expect that (even though John will not like
this assumption) directories will become VMIO backed some point soon.

Submitted by: Matthew Dillon <dillon@backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48636 06-Jul-1999 peter

Quieten gcc paranoia.


48632 06-Jul-1999 peter

Typo: s/0ff0/0xff0/


48621 06-Jul-1999 cracauer

Implement SA_SIGINFO for i386. Thanks to Bruce Evans for much more
than a review, this was a nice puzzle.

This is supposed to be binary and source compatible with older
applications that access the old FreeBSD-style three arguments to a
signal handler.

Except those applications that access hidden signal handler arguments
bejond the documented third one. If you have applications that do,
please let me know so that we take the opportunity to provide the
functionality they need in a documented manner.

Also except application that use 'struct sigframe' directly. You need
to recompile gdb and doscmd. `make world` is recommended.

Example program that demonstrates how SA_SIGINFO and old-style FreeBSD
handlers (with their three args) may be used in the same process is at
http://www3.cons.org/tmp/fbsd-siginfo.c

Programs that use the old FreeBSD-style three arguments are easy to
change to SA_SIGINFO (although they don't need to, since the old style
will still work):

Old args to signal handler:
void handler_sn(int sig, int code, struct sigcontext *scp)

New args:
void handler_si(int sig, siginfo_t *si, void *third)
where:
old:code == new:second->si_code
old:scp == &(new:si->si_scp) /* Passed by value! */

The latter is also pointed to by new:third, but accessing via
si->si_scp is preferred because it is type-save.

FreeBSD implementation notes:
- This is just the framework to make the interface POSIX compatible.
For now, no additional functionality is provided. This is supposed
to happen now, starting with floating point values.
- We don't use 'sigcontext_t.si_value' for now (POSIX meant it for
realtime-related values).
- Documentation will be updated when new functionality is added and
the exact arguments passed are determined. The comments in
sys/signal.h are meant to be useful.

Reviewed by: BDE


48618 06-Jul-1999 green

Add Centaur/IDT WinChip support.

Why in the world do people put breaks at the end of a switch's default case?


48615 06-Jul-1999 green

I made some cleanups, rearranged things a bit, and made AMD Features default
printing on CPUs that have it.
If there are no objections, I'll MFC all recent changes (harmless, really)
to 3.2 and PAO.


48579 05-Jul-1999 msmith

Move the initialisation/tuning of nmbclusters from param.c/machdep.c
into uipc_mbuf.c. This reduces three sets of identical tunable code to
one set, and puts the initialisation with the mbuf code proper.

Make NMBUFs tunable as well.

Move the nmbclusters sysctl here as well.

Move the initialisation of maxsockets from param.c to uipc_socket2.c,
next to its corresponding sysctl.

Use the new tunable macros for the kern.vm.kmem.size tunable (this should have
been in a separate commit, whoops).


48572 05-Jul-1999 green

Add an extra space to " AMD Features=" to make it line up well.


48571 05-Jul-1999 green

K6-III CPUs are now case:d in the appropriate switch; also, in
print_AMD_info(), L2 internal cache is shown, as are AMD's special CPUID
infos:

CPU: AMD-K6(tm) 3D processor (350.81-MHz 586-class CPU)
Origin = "AuthenticAMD" Id = 0x58c Stepping=12
Features=0x8021bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,PGE,MMX>
AMD Features=0x808029bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SYSCALL,PGE,MMX,3DNow!>

PR: kern/12512
Submitted by: Louis A. Mamakos <louie@TransSys.COM>


48546 04-Jul-1999 jlemon

Some cleanup and rearrangement. hw.physmem is now an absolute quantity;
we will never use more memory than this value (if specified), but will always
check memory for validity up to this amount.

Get rid of the speculative_mprobe option; the memory amount can now be
specified by hw.physmem.


48544 04-Jul-1999 mckusick

The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean
and dirty buffers have been separated. Empty buffers with KVM
assignments have been separated from truely empty buffers. getnewbuf()
has been rewritten and now operates in a 100% optimal fashion. That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized. Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur. This resulted in processes blocking on conditions
unrelated to what they were doing. This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits. We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat. The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed. This
algorithm has been changed. reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list. The new algorithm is deterministic but
not perfect. The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit. This bit allows processes working with filesystem
buffers to use available emergency reserves. Normal processes do not set
this bit and are not allowed to dig into emergency reserves. The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


48531 03-Jul-1999 peter

printf int/dev_t (pointer) warning


48521 03-Jul-1999 peter

Fix warnings in last commit (dev_t is not an int, and not even int
compatable in arg lists on the Alpha)


48512 03-Jul-1999 phk

Be more informative and try to ask the user in some instances if we can't
figure out the root device.


48505 03-Jul-1999 alc

An SMP-specific change: Add the lock prefix to RMW operations
on ipending.


48476 02-Jul-1999 msmith

Lightly overhaul the memory sizing code again.

- The kernel environment variable 'hw.physmem' can be used to set the
amount of physical memory space, based at 0, that FreeBSD will use.
Any memory detected over this limit is ignored. Documentation for
this is available under 'help set tunables' in the loader.

- In the case where system memory size can't be accurately determined,
hw.physmem is used as a best-guess memory size, but speculative
probing will be used to determine actual memory size if any of the
guesses or hints are 16M or more.

- If RB_VERBOSE, we list the memory regions as we test them.

- The compile-time option MAXMEM supplies a default value for
'hw.physmem'.


48449 02-Jul-1999 mjacob

Correct some ugly formatting. Remember to initialize the alignment tag.
Honor and pass a callers request to contigalloc if they had a non-zero
alignment constraint.


48445 02-Jul-1999 peter

Zap totally the npx0 memory size override. It only worked if statically
specified in the kernel config file - but setting options MAXMEM works
exactly the same. Userconfig overrides of this have not worked for
ages.

Also, change the getenv for the loader override to hw.physmem based on a
prior suggestion from Mike Smith. I think he still wants to change this
some, but this shouldn't get in his way. This is a forced setting of
the memory size, not a "cap". We probably should have a plain 'maxmem'
variable as well which does do a cap, without loosing the bios memory
configuration data.


48405 01-Jul-1999 peter

Look up the kernel environment for MAXMEM as a final override for the
memory size. If somebody wants to change the name, fine - I used this
since it's consistant with the config variable it replaces.
This is intended to replace the npx0 msize hack (which no longer works).


48404 01-Jul-1999 peter

Move kern_envp and preload initialization a little earlier so that we
can do a getenv_int() inside the memory sizing routines to override the
memory limit.


48391 01-Jul-1999 peter

Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.


48343 29-Jun-1999 yokota

idev->id_irq is an interrupt MASK, not an interrupt number.
Thus, we need to convert the mask to the number (by ffs()) when
writing back this value to the resource in save_resource().


48327 28-Jun-1999 luoqi

Save common_tssd before it's loaded and the busy bit set.

Submitted by: bde


48316 28-Jun-1999 peter

Fix page fault in visual userconfig's save code. (I only use normal
userconfig, my original tweaks to visual mode were not well tested)

Submitted by: Peter Holm <peter@holm.cc>


48308 28-Jun-1999 peter

Use the same -UKERNEL strategy as the alpha to avoid the inlines etc.


48288 27-Jun-1999 alc

An SMP-specific change: Remove an unnecessary lock acquire and release
from every system call. (Storing a 32-bit constant is inherently
atomic.)

Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>


48266 27-Jun-1999 peter

Shut up gcc.


48242 26-Jun-1999 peter

Quieten some warnings as a result of changes in ls_items[] constness over
time.


48203 24-Jun-1999 jlemon

Fix warning message; that was 4GB, not 2GB. I apparently can't do
arithmetic today.


48202 24-Jun-1999 jlemon

Explicitly ignore any memory > 2GB, we don't support it yet.


48200 24-Jun-1999 jlemon

Only include AMD wt_alloc routines if I586_CPU is defined. Fixes
CPU_WT_ALLOC for cyrix chips.

Submitted by: "Brian Smith" <dbsoft@technologist.com>


48160 24-Jun-1999 green

This commit gives support for the Rise mP6 CPU. It has two changes:
1. Rise is recognized in identdcpu.c.
2. The TSC is not written to. A workaround for the CPU bug is being
applied to clock.c (the bug being that the mP6 has TSC enabled
in its CPUID-capabilities, but it only supports reading it. If we
try to write to it (MSR 16), a GPF occurs.) The new behavior is that
FreeBSD will _not_ zero the TSC. Instead, we do a bit of 64-bit
arithmetic.

Reviewed by: msmith
Obtained from: unfurl & msmith


48145 23-Jun-1999 msmith

Changes in the way that the APs are started appears to have removed the
problem with having more CPUs than NCPU.

PR: kern/4255
Submitted by: peter


48144 23-Jun-1999 luoqi

Do not setup 4M pdir until all APs are up.


48119 22-Jun-1999 msmith

Remove an unnecessary panic when sparse PCI bus numbering is encountered.
This is found eg. on some Compaq Proliant systems.

Submitted by: peter


48104 22-Jun-1999 yokota

The second phase of syscons reorganization.

- Split syscons source code into manageable chunks and reorganize
some of complicated functions.

- Many static variables are moved to the softc structure.

- Added a new key function, PREV. When this key is pressed, the vty
immediately before the current vty will become foreground. Analogue
to PREV, which is usually assigned to the PrntScrn key.
PR: kern/10113
Submitted by: Christian Weisgerber <naddy@mips.rhein-neckar.de>

- Modified the kernel console input function sccngetc() so that it
handles function keys properly.

- Reorganized the screen update routine.

- VT switching code is reorganized. It now should be slightly more
robust than before.

- Added the DEVICE_RESUME function so that syscons no longer hooks the
APM resume event directly.

- New kernel configuration options: SC_NO_CUTPASTE, SC_NO_FONT_LOADING,
SC_NO_HISTORY and SC_NO_SYSMOUSE.
Various parts of syscons can be omitted so that the kernel size is
reduced.

SC_PIXEL_MODE
Made the VESA 800x600 mode an option, rather than a standard part of
syscons.

SC_DISABLE_DDBKEY
Disables the `debug' key combination.

SC_ALT_MOUSE_IMAGE
Inverse the character cell at the mouse cursor position in the text
console, rather than drawing an arrow on the screen.
Submitted by: Nick Hibma (n_hibma@FreeBSD.ORG)

SC_DFLT_FONT
makeoptions "SC_DFLT_FONT=_font_name_"
Include the named font as the default font of syscons. 16-line,
14-line and 8-line font data will be compiled in. This option replaces
the existing STD8X16FONT option, which loads 16-line font data only.

- The VGA driver is split into /sys/dev/fb/vga.c and /sys/isa/vga_isa.c.

- The video driver provides a set of ioctl commands to manipulate the
frame buffer.

- New kernel configuration option: VGA_WIDTH90
Enables 90 column modes: 90x25, 90x30, 90x43, 90x50, 90x60. These
modes are mot always supported by the video card.
PR: i386/7510
Submitted by: kbyanc@freedomnet.com and alexv@sui.gda.itesm.mx.

- The header file machine/console.h is reorganized; its contents is now
split into sys/fbio.h, sys/kbio.h (a new file) and sys/consio.h
(another new file). machine/console.h is still maintained for
compatibility reasons.

- Kernel console selection/installation routines are fixed and
slightly rebumped so that it should now be possible to switch between
the interanl kernel console (sc or vt) and a remote kernel console
(sio) again, as it was in 2.x, 3.0 and 3.1.

- Screen savers and splash screen decoders
Because of the header file reorganization described above, screen
savers and splash screen decoders are slightly modified. After this
update, /sys/modules/syscons/saver.h is no longer necessary and is
removed.


48009 18-Jun-1999 green

K6-family MTRR support

This is tested, but I really can't say whether it works entirely. I
don't know exactly what to look for when testing it. So let's say this
is open for testing. Send any results to green@FreeBSD.org

Reviewed by: msmith (long ago)


48008 18-Jun-1999 green

Harmless change to prevent possible problems in the future. I made
sure that i686_mem was only used when
1. CPUID had MTRR set (this was there before)
2. the CPU was GenuineIntel (not there)
3. the CPU is a 686 (also not there)

This should prevent any problems with CPUs that set MTRR but aren't
compatibile with Intel's interface (none that I know of yet.)


48005 18-Jun-1999 bde

Changed the global `idt' from an array to a pointer so that npx.c
automatically hacks on the active copy of the IDT if f00f_hack()
has changed it. This also allows simplifications in setidt().
This fixes breakage of FP exception handling by rev.1.55 of
sys/kernel.h. FP exceptions were sent to npx.c's probe handlers
because npx.c "restored" the old handlers to the wrong copy of the
IDT. The SYSINIT for f00f_hack() was purposely run quite late to
avoid problems like this, but it is bogusly associated with the
SYSINIT for proc0 so it was moved with the latter.

Problem reported and fix tested by: Martin Cracauer <cracauer@cons.org>


47942 16-Jun-1999 tegge

Clean up bitrot in interrupt tracing code.


47926 15-Jun-1999 des

Kill option FAILSAFE.

PR: i386/12187
Approved by: bde


47892 13-Jun-1999 alc

Use pmap_kenter instead of pmap_enter to map the message buffer.


47862 10-Jun-1999 jlemon

Change variable used for calculating ending address of physical memory
from 'int' to 'vm_offset_t'.

Spotted by: Richard Cownie <tich@ma.ikos.com>


47842 08-Jun-1999 dt

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() to allocate
kernel virtual address space for UPAGES.


47763 05-Jun-1999 luoqi

Fix an accounting problem when prefaulting 4M pages.

PR: kern/11948


47688 01-Jun-1999 jlemon

Unbreak memory sizing for SMP.


47680 01-Jun-1999 phk

Introduce the makebdev() function, it does the same as the makedev()
function for now, but that will change.


47679 01-Jun-1999 jlemon

Null commit; note that there is a new memory sizing routine that uses
the BIOS calls to determine the memory configuration. This should fix
problems with >64M for good.

Reviewed by: Mike Smith


47678 01-Jun-1999 jlemon

Unifdef VM86.

Reviewed by: silence on on -current


47642 31-May-1999 dfr

Remove fd driver from its old home and change files which include rtc.h
to account for its new location.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47625 30-May-1999 phk

This commit should be a extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47611 30-May-1999 dfr

Activate/deactivate resources by calling the method, not through the
resource manager automatic handling of RF_ACTIVE.


47592 29-May-1999 phk

Stop the TSC from being used as timecounter on K5/step0 machines.


47588 28-May-1999 bde

Fixed glitches (jumps) of about 1/HZ seconds for the i8254 timecounter.
The old version only worked right when the time was read strictly
more often than every 1/HZ seconds, but we only guarantee reading
it every (1/HZ + epsilon) seconds. Part of rev.1.126-1.127 attempted
to fix this but didn't succeed. Detect counter rollover using the
heuristic from the old version of microtime() with additional
complications for supporting calls from fast interrupt handlers.
This works provided i8254 interrupts are not delayed by more than
1/(2*HZ) seconds.

This needs more comments, and cleanups for the SMP case, and more
testing of the SMP case before it is merged into RELENG_3.

Tested by: jhay


47575 28-May-1999 alc

pmap_object_init_pt:
The size of vm_object::memq is vm_object::resident_page_count,
not vm_object::size.


47444 24-May-1999 jb

- Make setroot() conditional on FFS etc, to avoid a compiler warning
on systems with no FFS.
- Remove all references to mfs from cpu_rootconf(). mfs_init is
called prior to cpu_rootconf(), so it can set mountrootfsname to mfs
and (more imporantly) set rootdev using the (bogus in Bruce's opinion)
special major number of 255.


47350 21-May-1999 wpaul

This commit adds driver support for PCI fast ethernet cards based on the
ADMtek AL981 "Comet" chipset. The AL981 is yet another DEC tulip clone,
except with simpler receive filter options. The AL981 has a built-in
transceiver, power management support, wake on LAN and flow control.
This chip performs extremely well; it's on par with the ASIX chipset
in terms of speed, which is pretty good (it can do 11.5MB/sec with TCP
easily).

I would have committed this driver sooner, except I ran into one problem
with the AL981 that required a workaround. When the chip is transmitting
at full speed, it will sometimes wedge if you queue a series of packets
that wrap from the end of the transmit descriptor list back to the
beginning. I can't explain why this happens, and none of the other tulip
clones behave this way. The workaround this is to just watch for the end
of the transmit ring and make sure that al_start() breaks out of its
packet queuing loop and waiting until the current batch of transmissions
completes before wrapping back to the start of the ring. Fortunately, this
does not significantly impact transmit performance.

This is one of those things that takes weeks of analysis just to come
up with two or three lines of code changes.


47307 18-May-1999 peter

Move pcibus (host -> pci bus) probe/attach routines from nexus
to pcibus.c. pci_cfgopen() becomes static and there are no more
bus #ifdef's in nexus.c.


47292 18-May-1999 alc

pmap_qremove:
Eliminate unnecessary TLB shootdowns.


47231 15-May-1999 obrien

Add `xe', the Xircom PC Card Ethernet driver.


47162 14-May-1999 jkoshy

Correct comment to refer to kget(8).


47101 13-May-1999 bde

Renamed the private copies of strlen and strcpy to gdb_strlen and
gdb_strcpy, respectively. This saves fixing the wrong return type
of the private strlen and makes the addresses of strlen and strcpy
unambiguous.


47081 12-May-1999 luoqi

Unbreak VESA on SMP.


47080 12-May-1999 luoqi

VM86_FRAMESIZE is now the size of vm86 frame, not the number of 4-byte words.

Requested by: Bruce


47048 12-May-1999 phk

Fix dumpon. It passes a udev_t from userland to kernel, that needs a
udev2dev() before we use it.

It really should pass a name like swapon does.


47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


47022 11-May-1999 luoqi

Do not hardcode size of struct vm86frame.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


47021 11-May-1999 luoqi

Trap frame size has increased by 4.

Submitted by: Kazutaka YOKOTA <yokota@freebsd.org>


46945 11-May-1999 alc

The Intel Pentium Pro's performance counters are 40 bits wide. The Intel
manuals specifically say that reading the counters using the rdmsr
instruction returns a 64 bit value of which the higher 24 bits are
undefined. The code that reads the counters should then clear the
high 24 bits.

PR: i386/10632


46937 10-May-1999 bde

Fixed checking for maddr/msize conflicts. It was complete nonsense,
but was fairly harmless because not many devices have statically
configured msizes (none should have, but old-bus is missing post-probe
checks for maddr/msize conflicts, so sizes had to be statically
configured for maddr/msize conflict checking to actually work).

PR: 11146 (side issue)


46917 10-May-1999 dfr

Add missing suspend/resume methods.


46915 10-May-1999 peter

Move the mfs_getimage() prototype to mfs_extern.h duplicating it
everywhere.


46881 10-May-1999 bde

[Forgot to commit this in the batch a few days ago.]

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46847 09-May-1999 peter

For what it's worth, idelayed is declared as a volatile in the headers,
and even though it's not used in this file make it a volatile here too.


46823 09-May-1999 peter

s/main/mi_startup/ for the kernel entry point so that egcs doesn't get
upset about it (and generate things like __main() calls that are reserved
for main()). Renaming was phk's suggestion, but I'd already thought about
it too. (phk liked my suggested name tada() but I decided against it :-)

Reviewed by: phk


46808 09-May-1999 phk

Oops. If ROOTDEVNAME isn't defined, have -r call -a.


46807 09-May-1999 phk

no longer used.


46806 09-May-1999 phk

Major lobotomy of config(8). The

config kernel mumble mumble

line has been obsoleted and removed and with it went all knowledge of
devices on the part of config.

You can still configure a root device (which is used if you give
the "-r" flag) but now with an option:

options ROOTDEVNAME=\"da0s2e\"

The string is parsed by the same code as at the "boot -a" prompt.

At the same time, make the "boot -a" prompt both more able and more
informative.

ALPHA/PC98 people: You will have to adapt a few simple changes
(defining rootdev and dumpdev somewhere else) before config works
for you again, sorry, but it's all in the name of progress.


46783 09-May-1999 phk

add some amount of sanity to the way the gdb stuff finds its device.

I'm not too happy about the result either, but at least it has less
chance of backfiring.

This particular feature could be called "a mess" without offending
anybody.


46780 09-May-1999 phk

Yet a major/dev_t confusion

Spotted by Bruce.


46773 09-May-1999 phk

Duh, bdevsw() takes dev_t arg.


46743 08-May-1999 dfr

Move the declaration of the interrupt type from the driver structure
to the BUS_SETUP_INTR call.


46737 08-May-1999 peter

Add some notes about the globalness of certain things like interrupts
and ISA DMA channels (ie: on most PCI systems, they are not.. they are
on the ISA side of the PCI-ISA bridge and could be duplicated if there
were multiple PCI-ISA bridges, say in a laptop docking station), while
the APIC resources would be global on SMP systems.
Also, revert a previous change, change some printfs back to panics.


46728 08-May-1999 peter

Don't print 'interrupting at irq nn' on the x86 family, it's not all
that big a deal just yet and isn't worth a whole line on the boot screen.
This could change later in the face of multi-ISA-bus (eg: laptop docking
stations with two independent ISA busses) and SMP/APIC systems. The Alpha
already has multiple interrupt destinations to deal with.


46715 08-May-1999 peter

Update for new resource_set_*() interface. Also deal with the conflicts
flag.


46703 08-May-1999 peter

Make sure the mem_range_AP_init() prototype is seen where it's needed, and
#ifdef SMP around it for fun.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46647 07-May-1999 peter

Yet another kludge to maintain the isa_device illusion, this time malloc
an isa_driver and name pointer so the uc_devlist sysctl can get to it.


46635 07-May-1999 phk

Continue where Julian left off in July 1998:

Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline)
function.

Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
to the order of the cmaj/bmaj arguments!)

Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
(ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)


46624 07-May-1999 mckusick

Generalize to allow any serial port to be used as the GDB port.
Mark the GDB port in the config file with flags 0x80. Currently
only the sio driver checks these flags and sets up a GDB port,
but adding similar code to other serial drivers would be easy.
For backward compatibility, if an sio port is marked as the console
and no port is marked as the gdb port, the GDB port will be mapped
to the console port. This hack should go away at some point.


46600 06-May-1999 peter

Ensure prototype for pnp_configure() is visible.


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


46548 06-May-1999 bde

Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others. A few type declarations are still missing, mainly for SMP.

PR: 9413
Submitted by: Assar Westerlund <assar@sics.se> (initial parts)


46539 06-May-1999 luoqi

Initialize dblfault_tss.tss_fs to the per-cpu private data segment selector.


46537 06-May-1999 luoqi

Do not set curproc until proc0 is fully initialized (in proc0_init()).


46497 05-May-1999 jkh

Add information strings for a number of devices which have suddenly appeared
in the configuration name space.


46381 03-May-1999 billf

Add sysctl descriptions to many SYSCTL_XXXs

PR: kern/11197
Submitted by: Adrian Chadd <adrian@FreeBSD.org>
Reviewed by: billf(spelling/style/minor nits)
Looked at by: bde(style)


46357 03-May-1999 peter

Don't deref a NULL mem_range_softc.mr_op pointer on non-MTRR systems when
starting the AP.


46245 02-May-1999 msmith

Whoops, not all SMP systems have memory range attribute support. Don't
try to set it up on an AP unless we do.

Submitted by: dave adkins <adkin003@tc.umn.edu>


46215 30-Apr-1999 msmith

Add a hook that can be called to initialise a slave processor's memory
range attributes after they have been extracted from the master.

Hook up the i686 MP code to do this for each AP.

Be more careful about printing the default memory type for the i686.

Suggestions from: luoqi


46153 28-Apr-1999 dt

s/static foo_devsw_installed = 0;/static int foo_devsw_installed;/.
(Edited automatically)


46129 28-Apr-1999 luoqi

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


46116 27-Apr-1999 phk

Change suser_xxx() to suser() where it applies.


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


46089 26-Apr-1999 peter

Register the netisr's via SYSINIT rather than linker sets.


46072 25-Apr-1999 alc

pmap_dispose_proc and pmap_copy_page:
Conditionally compile 386-specific code.

pmap_enter:
Eliminate unnecessary TLB shootdowns.

pmap_zero_page and pmap_zero_page_area:
Use invltlb_1pg instead of duplicating the code.


46054 25-Apr-1999 phk

Make the machdep.i8254_freq and machdep.tsc_freq sysctls modify the
timecounter as well

Asked for by: bde, jhay


46025 24-Apr-1999 peter

#if 0 out the pci device list, not that you could do a lot with it. It
was only looking at old style drivers in the pcidevice_set, which doesn't
exist any more.. Ultimately, the pci and eisa bus drivers will check for
hints for wiring, flags and enable/disable etc as well.


45980 24-Apr-1999 kato

1MB is not 1024 * 1024 * 1024 but 1024 * 1024.


45960 23-Apr-1999 dt

Make pmap_collect() an official pmap interface.


45900 21-Apr-1999 peter

oops, SMP was missing includes for a typedef.


45897 21-Apr-1999 peter

Stage 1 of a cleanup of the i386 interrupt registration mechanism.
Interrupts under the new scheme are managed by the i386 nexus with the
awareness of the resource manager. There is further room for optimizing
the interfaces still. All the users of register_intr()/intr_create()
should be gone, with the exception of pcic and i386/isa/clock.c.


45838 19-Apr-1999 peter

Make userconfig saving actually work..


45833 19-Apr-1999 alc

_pmap_unwire_pte_hold and pmap_remove_page:
Use pmap_TLB_invalidate instead of invltlb_1pg to eliminate
unnecessary IPIs.

pmap_remove, pmap_protect and pmap_remove_pages:
Use pmap_TLB_invalidate_all instead of invltlb to eliminate
unnecessary IPIs.

pmap_copy:
Use cpu_invltlb instead of invltlb when updating APTDpde.

pmap_changebit:
Rather than deleting the unused "set bit" option (which may be
useful later), make pmap_changebit an inline that is used
by the new pmap_clearbit procedure.

Collectively, the first three changes reduce the number of TLB shootdown
IPIs by 1/3 for a kernel compile.


45821 19-Apr-1999 peter

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


45808 19-Apr-1999 peter

Always create attach points for the various child busses that can be
attached to the nexus. With one exception, this (for example) allows
you to do wierd things like kldload the eisa bus on the fly and then
drivers, and have it auto probe the eisa bus when the drivers come online.

The one exception being pci, it only adds the pcib after the presence of
the pci bus is detected and that's #if'ed code.

A side effect of this is that isa and eisa will be attached to the nexus
directly rather than the PCI->ISA or PCI->EISA bridges. I'm not sure if
this is good or bad at this point, but it seems to be closer to the way
things are for the i386 family... This is likely to be followed up.

This also fixes compilation without a PCI bus configured and will allow
eisa to work without PCI too.


45797 18-Apr-1999 peter

Compile without a PCI bus in the kernel.


45791 18-Apr-1999 peter

Implement an EISA new-bus framework. The old driver probe mechanism
had a quirk that made a shim rather hard to implement properly and it was
just easier to convert the drivers in one go. The changes to the
buslogic driver go beyond just this - the whole driver was new-bus'ed
including pci and isa. I have only tested the EISA part of this so far.

Submitted by: Doug Rabson <dfr@nlsystems.com>


45779 18-Apr-1999 kato

Added PC98 code.

Submitted by: Takahashi Yoshihiro <nyan@wyvern.cc.kogakuin.ac.jp>


45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


45704 15-Apr-1999 bde

Removed dead code and cleaned up. setconf() now just asks for the
root device name. The parser for the name is still too simple (it
forces slice = none, partition = 'a').


45703 15-Apr-1999 bde

Made booting with -a work for all configurations. Previously it
only worked for configurations with "swap on generic".

usr.sbin/config/config.y:
- ignore all "swap [on] device ...' specifications except for
warning about them. They haven't done anything related to swap
for almost 4 years, and were previously silently ignored,
except for "swap on generic" which stopped swap${KERNEL}.c
from being generated. Code to support swapping is now deader
than before.

usr.sbin/config/mkswapconf.c:
- don't generate a dummy setconf() function in swap${KERNEL}.c.

sys/i386/conf/files.i386:
- swapgeneric.c is now standard. It should be merged into autoconf.c
so that it doesn't conflict with swap${KERNEL}.c for kernels named
"generic".

sys/i386/i386/autoconf.c:
- don't call setroot() for mfs roots. Since setroot() doesn't do anything
harmful, this was just a waste of time, except possibly for booting with
-a it may have helped prevent an undesireable call to setconf() by
finding a bogus rootdev.
- honor -a for ffs roots. -a now overrides all other ways of specifying
the root device. Previously, -r had precedence over -a, and the -a
handling was usually a no-op.
- don't honor -a for non-ffs roots, since it would currently just get in
the way of a clean panic.

sys/i386/i386/swapgeneric.c:
- don't declare things that are now always declared in swap${KERNEL}.c.
Don't decide things that are now decided in autoconf.c. Code to
support the "generic" case is now dead instead of useless.


45677 14-Apr-1999 bde

Search bdevsw[] instead of the half-baked builtin table of devices
in the RB_ASKNAME case. I had thought that I made this change in
rev.1.18, but rev.1.18 only affects obscure subcases of the
RB_DFLTROOT case (subcases where the compiled in default root is
not found in bdevsw[] -- then the root device is set to the first
device in the half-baked table that is also in bdevsw[]).

Removed yet more vestiges of config-time swap configuration.


45676 14-Apr-1999 bde

Generate intrnames[] dynamically. This should be new-bus friendly.

Old version reviewed by: se


45643 13-Apr-1999 tegge

Backout early start of APs since it caused some machines to hang.


45577 11-Apr-1999 eivind

Staticize.


45566 11-Apr-1999 tegge

Add prototype for wait_ap().


45562 10-Apr-1999 tegge

Let BSP wait until all APs are initialized.


45555 10-Apr-1999 tegge

Test CF after a btrl operation instead of testing ZF (which is undefined).


45524 10-Apr-1999 alc

pmap_remove_pte:
Use "loadandclear" to update the pte.

pmap_changebit and pmap_ts_referenced:
Switch to pmap_TLB_invalidate from invltlb.


45436 07-Apr-1999 peter

Disable the mtrr copy calls, it doesn't work with the i686_mem.c stuff.
This should make it compile/link again.


45405 07-Apr-1999 msmith

mem.c
Split out ioctl handler a little more cleanly, add memory
range attribute handling for both kernel and user-space
consumers.

pmap.c
Remove obsolete P6 MTRR-related code.

i686_mem.c
Map generic memory-range attribute interface to the P6 MTRR
model.


45386 06-Apr-1999 wpaul

Add driver support for gigabit ethernet adapters based on the Alteon
Networks Tigon 1 and Tigon 2 chipsets. There are a _lot_ of OEM'ed
gigabit ethernet adapters out there which use the Alteon chipset so
this driver covers a fair amount of hardware. I know that it works with
the Alteon AceNIC, 3Com 3c985 and Netgear GA620, however it should also
work with the DEC/Compaq EtherWORKS 1000, Silicon Graphics Gigabit
ethernet board, NEC Gigabit Ethernet board and maybe even the IBM and
and Sun boards. The Netgear board is the cheapest (~$350US) but still
yields fairly good performance.

Support is provided for jumbo frames with all adapters (just set the
MTU to something larger than 1500 bytes), as well as hardware multicast
filtering and vlan tagging (in conjunction with the vlan support in
-current, which I should merge into -stable soon). There are some hooks
for checksum offload support, but they're turned off for now since
FreeBSD doesn't have an officially sanctioned way to support checksum
offloading (yet).

I have not added the 'device ti0' entry to GENERIC since the driver
with all the firmware compiled in is quite large, and it doesn't really
fit into the category of generic hardware.


45370 06-Apr-1999 alc

Two changes to pmap_remove_all:

1. Switch to pmap_TLB_invalidate from invltlb, eliminating a full TLB
flush where a single-page flush suffices. (Also, this eliminates some
unnecessary IPIs.)

2. Use "loadandclear" to update the pte, eliminating a race condition
on SMPs.

Change #2 should be committed to -STABLE.


45347 05-Apr-1999 julian

Catch a case spotted by Tor where files mmapped could leave garbage in the
unallocated parts of the last page when the file ended on a frag
but not a page boundary.
Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF,
in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h
vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c
ufs/ufs/ufs_readwrite.c kern/vfs_bio.c

Submitted by: Matt Dillon <dillon@freebsd.org>
Reviewed by: Alan Cox <alc@freebsd.org>


45270 03-Apr-1999 jdp

Restore support for executing BSD/OS binaries on the i386 by passing
the address of the ps_strings structure to the process via %ebx.
For other kinds of binaries, %ebx is still zeroed as before.

Submitted by: Thomas Stephens <tas@stephens.org>
Reviewed by: jdp


45252 02-Apr-1999 alc

Put in place the infrastructure for improved UP and SMP TLB management.

In particular, replace the unused field pmap::pm_flag by pmap::pm_active,
which is a bit mask representing which processors have the pmap activated.
(Thus, it is a simple Boolean on UPs.)

Also, eliminate an unnecessary memory reference from cpu_switch()
in swtch.s.

Assisted by: John S. Dyson <dyson@iquest.net>
Tested by: Luoqi Chen <luoqi@watermarkgroup.com>,
Poul-Henning Kamp <phk@critter.freebsd.dk>


45140 30-Mar-1999 phk

Purging lint from the Bruce filter.


44924 21-Mar-1999 phk

Link the bb structures together as we find them.


44917 20-Mar-1999 alc

Eliminate a pointless TLB flush from the SMP idle loop.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
Reviewed by: "John S. Dyson" <toor@dyson.iquest.net>


44863 18-Mar-1999 jlemon

Remove debugging printf that I overlooked.


44845 18-Mar-1999 jlemon

Change the vm86_datacall interface so that callers are now responsible
for passing in their own data space and associated page table information.

Update the support files so that any pages in the vm86 page table are
mapped, rather than just one page.

Restore the E820 memory probe, and have it use the new interface.


44844 18-Mar-1999 jlemon

Change btrl/btsl to cmpl/movl, since each cpu now has their own copy
of private_tss, and there's no need to use a bit array. Also fixes
the problem of using `je' after btrl, since cmpl sets ZF.

Noticed by: Luoqi, on -current


44809 16-Mar-1999 msmith

Don't call int15:820, it's not allowed to be called from vm86 mode.

PR: i386/10485
Submitted by: takawata@shidahara1.planet.sci.kobe-u.ac.jp


44807 16-Mar-1999 msmith

Look for the right ACPI table signature.

PR: i386/10587
Submitted by: Takanori Watanabe <takawata@shidahara1.planet.sci.kobe-u.ac.jp>


44704 13-Mar-1999 alc

pmap_qenter/pmap_qremove:
Use the pmap_kenter/pmap_kremove inline functions
instead of duplicating them.

pmap_remove_all:
Eliminate an unused (but initialized) variable.

pmap_ts_reference:
Change the implementation. The new implementation is much smaller
and simpler, but functionally identical. (Reviewed by
"John S. Dyson" <dyson@iquest.net>.)


44645 10-Mar-1999 roberto

Fix two tests against hex. values for CPUID.

PR: i386/10050
Submitted by: Kevin Day <toasty@dragondata.com>


44611 09-Mar-1999 phk

Make TIMER_FREQ a normal, undocumented option. Raise confusion to
a higher level with example in LINT.

Clarify comment about PPS_SYNC. Ignore for now that it doesn't
work in FLL mode, it will in a few days.


44510 06-Mar-1999 wollman

Expose a slightly-lower-level interface to timeouts which allows callers
to manage their own memory. Tested on my machine (make buildworld).
I've made analogous changes on the alpha, but don't have a machine
to test.

Not-objected-to by: dg, gibbs


44487 05-Mar-1999 bde

The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresnt
them. In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.


44470 05-Mar-1999 alc

Fix an SMP-only TLB invalidation bug. Specifically, disable
a TLB invalidation optimization that won't work given the
limitations of our current SMP support.

This patch should be applied to -stable ASAP.

Thanks to John Capo <jc@irbs.com>,
Steve Kargl <sgk@troutmask.apl.washington.edu>, and
Chuck Robey <chuckr@mat.net>
for testing.


44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


44289 26-Feb-1999 tegge

Don't call assign_apic_irq with a value for irq that is out of range.


44256 25-Feb-1999 bde

Don't forget to update `switchticks' in corner cases (except for
the alpha fork_trampoline(), forget it because it I believe it is
only necessary for the unsupported SMP case).


44215 22-Feb-1999 bde

Added a per-cpu variable `switchticks' for use in scheduling.


44189 21-Feb-1999 n_hibma

Add uhci and ohci driver names


44168 20-Feb-1999 roberto

Bit 24 of the Feature Flag is FXSR (for Fast FP Save and Restore).

Reminded by: Francis Dupont <Francis.Dupont@inria.fr>


44157 19-Feb-1999 luoqi

Introduce machine-dependent macro pgtok() to convert page count to number
of kilobytes. Its definition for each architecture could be optimized to
avoid potential numerical overflows.


44146 19-Feb-1999 luoqi

Hide access to vmspace:vm_pmap with inline function vmspace_pmap(). This
is the preparation step for moving pmap storage out of vmspace proper.

Reviewed by: Alan Cox <alc@cs.rice.edu>
Matthew Dillion <dillon@apollo.backplane.com>


44078 16-Feb-1999 dfr

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


43970 13-Feb-1999 bde

Don't pass PSL_NT to vm86 signal handlers. Some vm86/real mode
programs, including msdos, set PSL_NT in probes for old cpu types,
although PSL_NT doesn't do anything useful in vm86 or real mode.
PSL_NT is even less useful in the signal handlers. It just causes
T_TSSFLT faults on return from syscalls made by the handlers.
These faults are fixed up lazily so that Xsyscall() doesn't have
to be slowed down to prevent them. The fault handler recently
started complaining about these faults occurring "with interrupts
disabled". It should not have, but the complaints pointed to this
bug.

PR: 9211


43887 11-Feb-1999 msmith

Zero p->retval[1] when starting a process. This value ends up in %edx
when the process starts, and having it nonzero causes statically-linked
Linux binaries to fail.

PR: i386/10015
Submitted by: Marcel Moolenaar <marcel@scc.nl>


43826 10-Feb-1999 des

Remove lpt from the device list.
Add the rdp driver (forgotten by Joerg?)


43820 10-Feb-1999 jkh

Save pnp changes into a sysctl variable for kget, just as is done
with the isa changes.


43758 08-Feb-1999 dillon

Adjust idle zero-page fill hysteresis based on tests. Use 2/3 and 4/5
zero-fill levels.

Adjust comment for ozfod in vmmeter.h - this counter represents
non-optimal ( on the fly ) zero fills, not prefills.


43752 08-Feb-1999 dillon

Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE. There is little operational difference other then the kernel
being a few kilobytes smaller and the code being more readable.

* vm_page_select_free() has been *greatly* simplified.
* The PQ_ZERO page queue and supporting structures have been removed
* vm_page_zero_idle() revamped (see below)

PG_ZERO setting and clearing has been migrated from vm_page_alloc()
to vm_page_free[_zero]() and will eventually be guarenteed to remain
tracked throughout a page's life ( if it isn't already ).

When a page is freed, PG_ZERO pages are appended to the appropriate
tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
When locating a new free page, PG_ZERO selection operates from within
vm_page_list_find() ( get page from end of queue instead of beginning
of queue ) and then only occurs in the nominal critical path case. If
the nominal case misses, both normal and zero-page allocation devolves
into the same _vm_page_list_find() select code without any specific
zero-page optimizations.

Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been
added and zero-page tracking adjusted to conform with the other changes.
Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
pages. We may wish to increase both parameters as time permits. The
hysteresis is designed to avoid silly zeroing in borderline allocation/free
situations.


43612 04-Feb-1999 kato

Recognize Pentium II Xeon, Celeron and Pentium III cpus. Because CPU
names are printed on their packages and shown by BIOS, kernel does not
need to show details.

PR: 8751, 9320 and 9463


43592 04-Feb-1999 yokota

- Added atkbdc and atkbd to the device info array so that they don't
appear as "unknown device" in the visual UserConfig.
- Mark psm as FLG_FIXED.


43564 03-Feb-1999 dg

Fixed the type of target_page to vm_offset_t (unsigned). This fixes a
panic during boot on machines with >=2GB of RAM. Also changed some
incorrect printf conversion specifiers from %d to %u (signed to unsigned).
This fixes bugs when printing the amount of memory on machines with >=2GB
of RAM.


43530 02-Feb-1999 bde

Check for signals while reading /dev/urandom. Reading 10MB from
/dev/urandom takes about 38 seconds on a P5/133. It is useful
to be able to kill such reads almost immediately. Processes
doing such reads are now scheduled so their denial of service
is no worse than that of processes looping in user mode.


43474 31-Jan-1999 yokota

- Don't print unnecessary CLI command prompt (and faked "quit" command)
if RB_CONFIG is not set.
- Print a short, introductory banner for CLI before the first command
prompt, not after.


43447 31-Jan-1999 kato

Use offset to _pc98_system_parameter instead of immediate value which
assumes KERNBASE=0x100000.


43434 30-Jan-1999 kato

Moved pc98_system_parameter from .text to .data to make ELF kernel
work.


43403 29-Jan-1999 dillon

More const fixes for -Wall, -Wcast-qual


43387 29-Jan-1999 dillon

More -Wall / -Wcast-qual cleanup. Also, EXEC_SET can't use
C_DECLARE_MODULE due to the linker_file_sysinit() function
making modifications to the data.


43351 28-Jan-1999 dillon

Fix warnings related to -Wall -Wcast-qual


43340 28-Jan-1999 newton

Sun Bug ID 1251858 (on http://sunsolve1.sun.com) discusses the way that
Sun implemented iBCS2 compatibility on Solaris >= 2.6: The emulator
runs in user-mode, patching the LDT so that client programs making
syscalls through the old iBCS2 call gate get handled by the emulator
process. Unemulated syscalls therefore need their own call-gate that
bypasses the emulator. Sun chose LDT entry 4 to implement this, which
is what we've been using as LUDATA_SEL, so we need to change LUDATA_SEL
if we want to run Solaris executables.

Discussed with: Mike Smith


43314 28-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


43309 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile.

This commit includes significant work to proper handle const arguments
for the DDB symbol routines.


43138 24-Jan-1999 dillon

Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALL
to use the vm_page_dirty() inline.

The inline can thus do sanity checks ( or not ) over all cases.


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42880 20-Jan-1999 jkh

Make more messages conditional on bootverbose


42817 19-Jan-1999 peter

Break configure() into a couple of stages to allow insertion of
hooks (eg: by drivers or (pre)loadable modules into a convenient spot.


42772 17-Jan-1999 peter

Replaced by /usr/bin/gensetdefs a few months ago.


42765 17-Jan-1999 peter

Remove the LKM glue since the support (src/lkm) has been gone a while.
This was impossible to use as an LKM anyway, but does work as a preloaded
kld module though.


42754 17-Jan-1999 wpaul

Fix cut & paste mind-o: the entry for the xl driver should say ethernet,
not FDDI. *smak*


42732 16-Jan-1999 kato

There are two models of AMD K6-2 Model 8 (c.f. AMD's document), so the
CPU stepping must be checked. Also, fixed print_AMD_info.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


42705 15-Jan-1999 msmith

Fetch an overide for NMBCLUSTERS from the kernel environment. Never allow
the value to be reduced below that defined when the kernel was built.


42678 14-Jan-1999 bde

Backed out previous commit. MALLOC_DEFINE() needs <sys/kernel.h>.


42654 14-Jan-1999 jdp

Replace includes of <sys/kernel.h> with includes of
<sys/linker_set.h> in those files that use only the linker set
definitions.


42543 12-Jan-1999 eivind

Silence warnings.


42542 12-Jan-1999 eivind

Silence warnings by removing unused convenience function and
globalizing debugging functions.


42448 09-Jan-1999 dt

Oops --<, replace 1.216 with a version that actually check pv_entries (and
was tested for month or two in production).

Noticed by: Stephen McKay

Stephen also suggested to remove the complication at all. I don't do it as
it would be backout of a large part of 1.190 (from 1998/03/16)...


42444 09-Jan-1999 wpaul

Add driver support (and man page) for PCI fast ethernet cards based
on the ASIX AX88140A chip. Update /sys/conf/files, RELNOTES.TXT,
/sys/i388/i386/userconfig.c, sysinstall/devices.c, GENERIC and LINT
accordingly.

For now, the only board that I know of that uses this chip is the
Alfa Inc. GFC2204. (Its predecessor, the GFC2202, was a DEC tulip card.)
Thanks again to Ulf for obtaining the board for me. If anyone runs
across another, please feel free to update the man page and/or the
release notes. (The same applies for the other drivers.)

FreeBSD should now have support for all of the DEC tulip workalike
chipsets currently on the market (Macronix, Lite-On, Winbond, ASIX).
And unless I'm mistaken, it should also have support for all PCI fast
ethernet chipsets in general (except maybe the SMC FEAST chip, which
nobody seems to ever use, including SMC). Now if only we could convince
3Com, Intel or whoever to cough up some documentation for gigabit
ethernet hardware.

Also updated RELNOTEX.TXT to mention that the SVEC PN102TX is supported
by the Macronix driver (assuming you actually have an SVEC PN102TX with
a Macronix chip on it; I tried to order a PN102TX once and got a box
labeled 'Hawking Technology PN102TX' that had a VIA Rhine board inside
it).


42440 09-Jan-1999 bde

Removed a stray label that broke compiling in the (elf && profiling) case.

PR: 9369
Submitted by: Assar Westerlund <assar@sics.se>


42437 09-Jan-1999 bde

Fixed switching between consoles (sc0, vt0 or sioN) in userconfig.

Broken in: rev.1.315


42432 09-Jan-1999 bde

Fixed pedantic syntax errors caused by a trailing semicolon in a macro
definition.


42428 09-Jan-1999 bde

Don't put operands in clobber lists, since this is dubious for old
versions of gcc and broken for current versions of egcs.

Cleaned up the asm statement for do_cpuid() a little.

Submitted by: "John S. Dyson" <dyson@iquest.net> but rewritten by me


42412 08-Jan-1999 abial

Fix faulty logic in handling userconfig_script, INTRO_USERCONFIG and
RB_CONFIG.

Now, the code should do the right thing in the following cases, when
kernel is compiled with INTRO_USERCONFIG:

* when booted without userconfig_script and without RB_CONFIG, present
intro screen, and wait for user input.

* when booted with userconfig_script and without RB_CONFIG, DON'T present
intro screen unless explicitly asked in userconfig_script, basing on
assumption that if a user loads userconfig_script, (s)he already
decided what parameters to configure. Proceed with booting.

* when booted without userconfig_script, and with RB_CONFIG, enter
configuration utility and wait for user input.

* when booted with userconfig_script, and with RB_CONFIG, execute all
commands from userconfig_script, and DON'T leave the config utility,
but wait for user input.

And finally, regardless of the combination of the above parameters,
when intro screen is invoked either first or next times, and user
chooses to go back to CLI interface, unblock the quit command.


42406 08-Jan-1999 bde

Moved declarations related to copying and zeroing to the right place.


42396 08-Jan-1999 luoqi

Allocate kernel page table object (kptobj) before any kmem_alloc calls.
On a system with a large amount of ram (e.g. 2G), allocation of per-page
data structures (512K physical pages) could easily bust the initial kernel
page table (36M), and growth of kernel page table requires kptobj.


42381 07-Jan-1999 dt

Make pmap_ts_referenced check more than 1 pv_entry. (One should be carefull
when move elements to the tail of a list in a loop...)


42373 07-Jan-1999 yokota

Remove a hard-coded table of kernel console I/O functions exported
from sc, vt and sio drivers. Use instead a linker_set to collect them.

Staticize ??cngetc(), ??cnputc(), etc functions in sc and vt drivers.
We must still have siocngetc() and siocnputc() as globals because they
are directly referred to by i386-gdbstub.c :-(

Oked by: bde


42370 07-Jan-1999 abial

When compiled with INTRO_USERCONFIG, skip the intro screen anyway if we
already loaded and interpreted userconfig_script. Otherwise, when using
such kernel system would always block waiting for user input in UserConfig,
while the intention was to avoid this by having userconfig_script.

Reviewed by: msmith


42360 06-Jan-1999 julian

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


42342 06-Jan-1999 abial

Add new sysctl machdep.uc_devlist. It allows to cleanly retrieve the
list of devices which has been changed in UserConfig, without resorting
to reading /dev/kmem.

The data returned consist of series of struct isa_device and
char dev_name[8].

Ok'd by: jkh


42135 28-Dec-1998 msmith

Improved DDB_UNATTENDED behaviour. From the submitter:

There's something that's been bugging me for a while, so I decided to fix it.
FreeBSD now will DTRT WRT DDB and DDB_UNATTENDED (!debugger_on_panic), at least
in my opinion. The behavior change is such that:

1. Nothing changes when debugger_on_panic != 0.
2. When DDB_UNATTENDED (!debugger_on_panic), if a panic occurs, the
machine will reboot. Also, if a trap occurs, the machine will
panic and reboot, unlike how it broke to DDB before. HOWEVER,
a trap inside DDB will not cause a panic, allowing full use
of DDB without having to worry about the machine being stuck
at a DDB prompt if something goes wrong during the day.
Patches for this behavior follow my signature, and it would
be a boon to anyone (like me) who uses DDB_UNATTENDED, but
actually wants the machine to panic on a trap (otherwise,
what's the use, if the machine causes a fatal trap rather than
a true panic, of debugger_on_panic?). The changes cause no
adverse behavior, but do involve two symbols becoming global

Submitted by: Brian Feldman <green@unixhelp.org>


42112 27-Dec-1998 msmith

From the submitter:

CPU_WT_ALLOC does not work correctly for K6-2s of model 8+ and
probably K6-3s (when they appear on the market soon). In addition,
print_AMD_info() incorrectly printfs write allocation's size. I've
fixed them, so they now Do The Right Thing, and added a
"NO_MEMORY_HOLE" option to easily allow 15-16mb range handling for us
K6 and K6-2 users.

Submitted by: Brian Feldman <green@unixhelp.org>


41871 16-Dec-1998 bde

Removed the cast to a pointer in the definition of PS_STRINGS and
adjusted related casts to match (only in the kernel in this commit).
The pointer was only wanted in one place in kern_exec.c. Applications
should use the kern.ps_strings sysctl instead of PS_STRINGS, so they
shouldn't notice this change.


41868 16-Dec-1998 bde

Removed bogus casts of USRSTACK and/or the other operand in binary
expressions involving USRSTACK.


41787 14-Dec-1998 mckay

Fix tabs that should have been spaces. Some were in kernel error messages.


41770 14-Dec-1998 dillon

Get rid of uninitialized variable warnings. No bugs found, just
preinitializing some locals to 0 to get rid of the compiler warnings.


41769 14-Dec-1998 dillon

Get rid of uninitialized warning for local variable 'c'. There was no
bug, but set it to 0 anyway to get rid of warning.

Get rid of uninitialized warning for local variable 'line' as well as
make a minor change to its scope. Again, no bug.


41764 14-Dec-1998 dillon

author was assuming that nextpaddr declared *inside* the do loop would
survive within the loop. This is not guarenteed by C. I have moved
the nextpaddr declaration to outside the do loop.


41763 14-Dec-1998 dillon

Change local ddb_mode variable to volatile to handle GCC warning about
the variable possibly being clobbered by setjmp/longjmp.


41629 10-Dec-1998 steve

Cleanup up the wording for the F00F bug workaround message.

PR: 8041
Submitted by: Dan Nelson <dnelson@emsphone.com>


41612 09-Dec-1998 eivind

Get rid of CTLTYPE_OPAQUE in a SYSCTL_OPAQUE - it is added my the
SYSCTL_OPAQUE macro.


41599 08-Dec-1998 kato

Use CNAME macro for pc98_system_parameter, which is referenced from C
source.

Submitted by: Masanori Kanaoka <kana@saijo.mke.mei.co.jp>


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41547 06-Dec-1998 archie

Avoid compiler warning (printf arg type mismatch) when compiling #ifdef DEBUG


41541 05-Dec-1998 kato

Print out information for write-allocate of AMD CPUs.

Submitted by: Akio Morita <amorita@meadow.scphys.kyoto-u.ac.jp>


41502 04-Dec-1998 wpaul

An early Christmas present: add driver support for a whole bunch of
PCI fast ethernet adapters, plus man pages.

if_pn.c: Netgear FA310TX model D1, LinkSys LNE100TX, Matrox FastNIC 10/100,
various other PNIC devices

if_mx.c: NDC Communications SOHOware SFA100 (Macronix 98713A), various
other boards based on the Macronix 98713, 98713A, 98715, 98715A
and 98725 chips

if_vr.c: D-Link DFE530-TX, other boards based on the VIA Rhine and
Rhine II chips (note: the D-Link and certain other cards
that actually use a Rhine II chip still return the PCI
device ID of the Rhine I. I don't know why, and it doesn't
really matter since the driver treats both chips the same
anyway.)

if_wb.c: Trendware TE100-PCIE and various other cards based on the
Winbond W89C840F chip (the Trendware card is identical to
the sample boards Winbond sent me, so who knows how many
clones there are running around)

All drivers include support for ifmedia, BPF and hardware multicast
filtering.

Also updated GENERIC, LINT, RELNOTES.TXT, userconfig and
sysinstall device list.

I also have a driver for the ASIX AX88140A in the works.


41454 02-Dec-1998 kato

- For some old Cyrix CPUs, %cr2 is clobbered by interrupts. This
problem is worked around by using an interrupt gate for the page
fault handler. This code was originally made for NetBSD/pc98 by
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp> and has already
been in PC98 tree. Because of this bug, trap_fatal cannot show
correct page fault address if %cr2 is obtained in this function.
Therefore, trap_fatal uses the value from trap() function.
- The trap handler always enables interruption when buggy application
or kernel code has disabled interrupts and then trapped. This code
was prepared by Bruce Evans <bde@FreeBSD.org>.

Submitted by: Bruce Evans <bde@FreeBSD.org>
Naofumi Honda <honda@kururu.math.sci.hokudai.ac.jp>


41370 27-Nov-1998 tegge

Don't forget to update the pmap associated with aio daemons when adding
new page directory entries for a growing kernel virtual address space.


41367 26-Nov-1998 tegge

Attempt to handle interrupts delivered to all IO APICs by using the first
IO APIC with a sufficient number of pins.


41362 26-Nov-1998 eivind

Staticize.


41318 24-Nov-1998 eivind

Move the declaration of PPro_vmtrr from the header file to pmap.c,
replacing the one in the header file with a definition. This makes it
easier to work with tools that grok ANSI C only.


41173 15-Nov-1998 bde

Finished updating module event handlers to be compatible with
modeventhand_t.


41004 08-Nov-1998 dfr

* Fix a couple of places in the device pager where an address was
truncated to 32 bits.
* Change the calling convention of the device mmap entry point to
pass a vm_offset_t instead of an int for the offset allowing
devices with a larger memory map than (1<<32) to be supported
on the alpha (/dev/mem is one such).

These changes are required to allow the X server to mmap the various
I/O regions used for device port and memory access on the alpha.


40998 08-Nov-1998 msmith

Enable 686 class optimisations for all 686-class processors, not just the
Pentium Pro. This resolves the "Dog slow SMP" issue for Pentium II
systems.


40904 04-Nov-1998 peter

Remove stray(?) debugging printf's and cngetc()'s that freeze boot several
times waiting for keypresses.


40866 03-Nov-1998 msmith

Remove the USERCONFIG_BOOT option. Userconfig script data is searched
for in a loaded module of type "userconfig_script". The RB_CONFIG
flag will always result in the user being left inside userconfig at
the end of the script's execution, regardless of 'quit' commands in
the script. If the RB_CONFIG flag is not specified, the user will
never be left inside userconfig, even if the script does not have an
explicit exit command.

Add the INTRO_USERCONFIG option. This option forces the userconfig 'intro'
screen (after a script has optionally been executed). There is no longer
a need to queue an 'intro' command.


40794 31-Oct-1998 peter

Add John Dyson's SYSCTL descriptions, and an export of more stats to
a sysctl hierarchy (vm.stats.*). SYSCTL descriptions are only present
in source, they do not get compiled into the binaries taking up memory.


40751 30-Oct-1998 msmith

Add the ability to specify where on the at_shutdown queue a handler is
installed.

Remove cpu_power_down, and replace it with an entry at the end of the
SHUTDOWN_FINAL queue in the only place it's used (APM).

Submitted by: Some ideas from Bruce Walter <walter@fortean.com>


40700 28-Oct-1998 dg

Added a second argument, "activate" to the vm_page_unwire() call so that
the caller can select either inactive or active queue to put the page on.


40658 26-Oct-1998 bde

Check the major number of the boot device more carefully. There was only
a problem if the boot blocks passed bad data.

Check the major number of the dump device consistently.


40651 25-Oct-1998 bde

Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted
when bdevsw[] became sparse. We still depend on magic to avoid having to
check that (v_rdev) device numbers in vnodes are not NODEV.


40610 23-Oct-1998 phk

Update timecounters to new interface.


40551 21-Oct-1998 rnordier

Get things limping along again for the 80386 and friends. The
ELF assembler emits a redundant operand-size prefix for the
fnstsw %ax instruction, and this stops the show for 3.0-RELEASE.


40545 21-Oct-1998 dg

Decrement the now unused page table page's wire_count prior to freeing it.
It will soon be required that pages have a zero wire_count when being
freed.


40516 18-Oct-1998 wpaul

Add driver support for PCI fast ethernet adapters based on the
RealTek 8129/8139 chipset like I've been threatening. Update kernel
configs, userconfig.c, relnotes and sysinstall. No man page yet;
comming soon.

I consider this driver stable enough that I want to give it some
exposure in -current.


40503 18-Oct-1998 peter

Print a message if bootverbose that the emulator is present in the kernel.

Move the initialization before isa_configure() and npx, in case npx does
something to initialize the state of the emulator somehow.

I do not have any machines without a FPU so that I can test this with -
except an old 386sx motherboard in a box somewhere that might work...


40435 16-Oct-1998 peter

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.


40286 13-Oct-1998 dg

Fixed two potentially serious classes of bugs:

1) The vnode pager wasn't properly tracking the file size due to
"size" being page rounded in some cases and not in others.
This sometimes resulted in corrupted files. First noticed by
Terry Lambert.
Fixed by changing the "size" pager_alloc parameter to be a 64bit
byte value (as opposed to a 32bit page index) and changing the
pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
caused some 64bit offsets and sizes to be scrambled. Removing
the cast required adding casts at a few dozen callers.
There may be problems with other bogus casts in close-by
macros. A quick check seemed to indicate that those were okay,
however.


40272 12-Oct-1998 jkh

Add adw device.
Noticed by: phk


40179 10-Oct-1998 kato

mp_machdep.c: Set a vector to boot code (PC-98).
locore.s: Tell the bios to warmboot next time (PC-98).


40173 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40169 10-Oct-1998 kato

PC-98 doesn't have CMOS ram.


40164 10-Oct-1998 jkh

Allow more flexible use of MFS root.
Submitted by: peter


40152 09-Oct-1998 peter

Relocate the preload module info from machdep specifically rather than
trying to do it in locore. We also walk through the module table
and relocate any MODINFO_ADDR pointers so that they become KVM relative
rather than physical addresses. This means that hacks for adding
0xf0000000 in places like MFS go away.


40130 09-Oct-1998 peter

Null commit.. CVS aborted on freefall last time (reaonly file).

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40129 09-Oct-1998 peter

An elf_reloc() function for the i386. Based on alpha/alpha/elf_machdep.c
and rtld-elf/i386/reloc.c.


40089 09-Oct-1998 msmith

Initialise kernel environment and module metadata pointers.


40081 08-Oct-1998 msmith

Fix up the kernel environment and module data pointers in the bootinfo if
they are present.
If we are told where the end of the loaded kernel image is, believe it.


40067 08-Oct-1998 kato

BIOS ROM base address is 0xe8000 on PC-98.


40029 07-Oct-1998 gibbs

Fix a parent tag reference count bug during tag teardown.

Enable optimization for nobounce_dmamap clients by setting the map
held by the client to NULL. This allows the macros in bus.h to check
against a constant to avoid function calls.

Don't attempt to 'free()' contigmalloced pages in bus_dmamem_free().
We currently leak these pages, which is not ideal, but is better than
a panic. The leak will be fixed when contigmalloc is merged into the
bus dma framework after 3.0R.


40003 06-Oct-1998 kato

- Implement enabling write allocate on AMD K5/K6/K6-2 cpus.
The code was originaly contributed by Kelly Yancey
<kbyanc@freedomnet.com> in PR i386/6269 and revised by Akio Morita
<amorita@meadow.scphys.kyoto-u.ac.jp> and me. Test was performed by
Akio Morita and Toshiomi Moriki <moriki@db.is.kyushu-u.ac.jp>.
- Fix stylistic bug in identcpu.c.
- Update copyright in initcpu.c
- Fix typo in LINT.

PR: 6269 and 6270


39991 06-Oct-1998 msmith

teach userconfig about isa_devtab_cam


39982 05-Oct-1998 obrien

Undo most of the previous commit.


39976 05-Oct-1998 obrien

Now require *FS_ROOT to enable the ability to mount a *FS /.
Previously one could config(8) a kernel that would not link.


39872 01-Oct-1998 jlemon

Don't try to save FP state if npxproc is null.
Submitted by: Tor Egge


39786 29-Sep-1998 ache

vm86_datacall: always use workaround since temp. malloced buffer or stack
area can be passed (and mapped to page1!) as vesa.c does. Use contigmalloc
now to get proper alignment. Bump max buffer size to PAGE_SIZE


39779 29-Sep-1998 ache

Move workaround about page aligned data buffer directly to vm86_datacall,
it is impossible to use this func otherwise, i.e. all vesa calls are
potentially broken. Max arg size limited to 1024 for now, bump it, if needed.


39775 29-Sep-1998 jlemon

Don't erase curproc when making a vm86() call. The previous behavior
was a pessimization that broke schedcpu(). (It caused the process to
be remrq()'d).
Reviewed by: Tor Egge


39760 29-Sep-1998 abial

Add sysctl 'machdep.msgbuf_clear'. Setting it to anything causes the
kernel message buffer to be cleared. It comes handy in situations when
the only logging facility you have is the msgbuf.

Reviewed by: jkh


39755 29-Sep-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just
to ensure 32-bit variables. Doing so broke ix86's with 64-bit longs.


39715 28-Sep-1998 peter

Fix (?) EISA interrupt configuration based on observation of what we've
seen in practice. The MPspec is ambiguous and/or contradicts itself.
We now look at the ELCR to determine the trigger mode (edge/level) of an
interrupt tagged as "conforming" in the mptable. EISA interrupts appear
to be presented to the APIC as active high in all cases (they are level
inverted) that we've seen, so use this for the 'conforming' level case.

Of note, the system I'm using has 2 PCI cards in it, and the PCI cards
interrupts (5 and 9) appear in the ELCR register as level sensitive and the
mptable lists 5 and 9 as coming from the EISA bus. The PCI interrupts
are active-high by the time they reach the APIC even though they are
electrically active low at the slot.

We should still work should somebody implement this on motherboards
differently in the future as long as the mptable is clear about the
trigger/polarity.

Current should work on Holm Tiffe's machine now.

Based on code from: Tor.Egge@fast.no


39714 28-Sep-1998 peter

Back out rev 1.6 (temporarily at least). <machine/asmacros.h> is used
here for getting the #defines for the removal of the leading '_' in symbols
in the assembler code. We could probably #include <machine/asnames.h>
instead, but everything else seems to use asmacros.h directly as well.
Once we convert symbols, this becomes irrelevant.


39703 28-Sep-1998 tegge

Initialize pcb_mpnest to 1 in the child process in cpu_fork(). This should
fix the 50% idle problem that the ELF /sbin/init triggered. The problem
appeared when the last context switch before a fork() call was due to
the kernel faulting in user pages via normal page faults (e.g. copyin).
Reviewed by: Peter Wemm <peter@netplex.com.au>


39702 28-Sep-1998 tegge

Use correct virtual address when configuring the per CPU idle page directory
for a vm86 call under SMP.


39648 25-Sep-1998 peter

Goodbye BOUNCE_BUFFERS, for a hack it has served us well.

The last consumer of this code (the old SCSI system) has left us and
the CAM code does it's own bouncing. The isa dma system has been
doing it's own bouncing for a while too.

Reviewed by: core


39638 25-Sep-1998 jkh

Add entry for ThunderLAN NIC.


39613 24-Sep-1998 bde

Don't redefine kernel. Makefile.i386 now defines it.
Removed some unused includes.


39526 20-Sep-1998 bde

Attempt to work around a bug in the previous commit related to
non-reentrancy of SMP clock locking. Depend on the giant lock
protecting clkintr().


39509 20-Sep-1998 bde

Remove vestiges of SLICE code.

Forgotten by: sos


39503 20-Sep-1998 bde

Ensure that the i8254 timecounter doesn't go backards. It sometimes
went backwards when interrupts were masked for more than one i8254
interrupt period. It sometimes went backwards when the i8254 counter
was reprogrammed. Neither of these should happen in normal operation.

Update the i8254 timecounter support variables atomically. Calling
timecounter functions from fast interrupt handlers may actually work
in all cases now.


39376 16-Sep-1998 msmith

Make consoles immutable. If you want to turn them off, use wizard mode.

Fix assorted formatting violations in the device table.


39243 15-Sep-1998 gibbs

autoconf.c:
Convert autoconf hooks from old SCSI system to CAM.

busdma_machdep.c:
bus_dmamap_free() should expect the nobounce map, not a NULL one.

mountroot.c:
swapgeneric.c:
da and od changes.

symbols.raw:
Nuke the old disk stat symbols.

userconfig.c:
Disable the SCSI listing code until it can be converted to CAM.


39197 14-Sep-1998 jdp

Add new functions fill_fpregs() and set_fpregs(), like fill_regs()
and set_regs() but for the floating point register state. The code
is stolen from procfs_machdep.c, and moved out of there into
machdep.c.

These functions are needed for generating ELF core dumps.


39187 14-Sep-1998 sos

Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.


39176 14-Sep-1998 abial

This implements retrieving the contents of message buffer via sysctl(3)
as "machdep.msgbuf". It's needed in case of using stripped kernels, where
normal dmesg (which has to use kvm) doesn't work.

The buffer is unwound, meaning that the data will be linear, possibly
with some leading NULLs.

Reviewed by: Jordan K. Hubbard <jkh@freebsd.org>


39046 10-Sep-1998 yokota

Pass the correct argument to npxsave(), otherwise vm86pcb will be
overwritten.


38908 07-Sep-1998 jkh

Add entries for xl0, tlc0 and adv0. Some of these aren't even
in LINT!


38893 06-Sep-1998 tegge

Don't go below the low water mark of free pages due to optional prefaulting
of pages.
PR: 2431


38888 06-Sep-1998 tegge

Maintain a mapping from irq number to (ioapic number, int pin) tuple,
and use this when masking/unmasking interrupts.

Maintain a mapping from (iopaic number, int pin) tuple to irq number,
and use this when configuring devices and programming the ioapics.

Previous code assumed that irq number was equal to int pin number, and
that the ioapic number was 0.

Don't let an AP enter _cpu_switch before all local apics are initialized.


38824 04-Sep-1998 luoqi

Make irq forwarding truely functional.


38807 04-Sep-1998 ache

PAGE_WAKEUP -> vm_page_wakeup


38779 03-Sep-1998 nsouch

Reviewed by: Doug Rabson
Submitted by: nsouch
root_bus_configure() call to initialize new bus arch in i386 env.


38717 01-Sep-1998 kato

- Fix style bug.
- hw.ispc98 -> machdep.ispc98.

Submitted by: Garrett Wollman (hw -> machdep)


38700 31-Aug-1998 luoqi

Use 16bit register in inline asm code to set segment registers.


38673 31-Aug-1998 kato

- hw.machine_arch returns cpu architecture type.
- moved definition of MACHINE_ARCH from cpu.h to parm.h as alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. The hw.ispc98 is 1 in PC98 kernel and is 0 in
IBM-PC kernel.

Discussed with: John Birrell <jb@FreeBSD.ORG>


38505 24-Aug-1998 bde

Fixed printf format errors. Only one left in LINT on i386's.


38490 23-Aug-1998 des

Don't check minor number of dump device at all.

Discussed-with: Jörg Wunsch


38488 23-Aug-1998 bde

Fixed printf format errors.


38485 23-Aug-1998 bde

Added D_TTY to the cdevswitch flags for all tty drivers. This is required
for the Lite2 fix for always returning EIO in dead_read().

Cleaned up the cdevswitch initializers for all tty drivers.

Removed explicit calls to ttsetwater() from all (tty) drivers. ttsetwater()
is now called centrally for opens, not just for parameter changes.


38422 18-Aug-1998 msmith

Presently there is only one `currentldt' variable for all cpus
in a SMP system. Unexpected things could happen if each cpu
has a different ldt setting and one cpu tries to use value
of currentldt set by another cpu.

The fix is to move currentldt to the per-cpu area. It includes
patches I filed in PR i386/6219 which are also user ldt related.

PR: i386/7591, i386/6219
Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


38349 16-Aug-1998 bde

pmap.c:
Cast pointers to (vm_offset_t) instead of to (u_long) (as before) or to
(uintptr_t)(void *) (as would be more correct). Don't cast vm_offset_t's
to (u_long) just to do arithmetic on them.

mp_machdep.c:
Cast pointers to (uintptr_t) instead of to (u_long). Don't forget
to cast pointers to (void *) first or to recover from integral
possible integral promotions, although this is too much work for
machine-dependent code.

vm code generally avoids warnings for pointer vs long size mismatches
by using vm_offset_t to represent pointers; pmap.c often uses plain
`unsigned int' instead of vm_offset_t and didn't use u_long elsewhere,
but this style was messed up by code apparently imported from mp_machdep.c.


38347 16-Aug-1998 bde

Use an array of uintptr_t's instead of an array of u_longs to hold
address constants. This fixes some warnings for conversions from
64-bit integers to 32-bit pointers on i386's with 64-bit longs.
vm86 still uses too many u_longs.


38246 11-Aug-1998 bde

Register tty software interrupt handlers at run time using register_swi()
instead of at compile time using ifdefs.

Use _swi_null instead of dummycamisr. CAM and dpt should call
register_swi() instead of hacking on ihandlers[] directly.


38245 11-Aug-1998 bde

Fixed printf format errors.


38244 11-Aug-1998 bde

Implemented dynamic registration of software interrupt handlers. Not
used yet.

Use dummy SWI handlers to avoid some checks for null pointers.


38233 10-Aug-1998 bde

Fixed restoring of cpl after trap handling. The wrong cpl (SWI_AST_MASK
instead of 0) was "restored" after handling a trap that occurred while
returning to user mode. This bug was most noticeable for VM86 and is
still detected and fixed up (on return from the next exception) in doreti
if VM86 is configured.


38063 03-Aug-1998 msmith

Copy in the nfs_diskless structure if NFS_ROOT is defined. A previous
change to include nfs_root.h precluded NFS from being defined.
Submitted by: Parag Patel <parag@cgt.com>


38010 02-Aug-1998 gpalmer

Add the ISP Qlogic SCSI card to the list of known devices.


37920 28-Jul-1998 bde

Set p->p_switchtime to switchtime instead of to the current time in
fork_trampoline() if switchtime is valid. This fixes not accounting
for the time between the previous context switch and and the current
time (when the forked child starts up here) in most cases - the time
is now counted in the child's runtime. I think it actually fixes
all cases, and switchtime is always valid here, since there must have
been a context switch just before the forked child starts up. Some
code should be removed if this is correct. The check that switchtime
is valid sometimes gives a false negative because the check isn't
correct until the after the first context switch after the system
has been up for >= 1 second.


37919 28-Jul-1998 bde

Micro-optimized and cleaned up the clearing of switchtime in idle().

Cleaned up the conditionals in the disgusting SMP ifdef in idle().


37902 28-Jul-1998 jlemon

Fix an off-by-one error when setting the iomap bits.
Change struct i386_*_iomap to use ints instead of shorts/chars.
(pointed out by bde long ago, prodded into action by msmith)


37889 27-Jul-1998 jlemon

Re-arrange the page layout used by vm86_bioscall so that we can
potentially re-use the stack page.

Cosmetic cleanup of the code to de-obfuscate it and make it easier
to follow. There should be no functional changes in this commit.


37785 20-Jul-1998 msmith

Add the 'cs' driver for Crystal Semiconductor CS89x0 devices. This
supports PnP and if_media. I've been running a slightly older version
here for several weeks now.
Submitted by: Maxim Bolotin <max@rsu.ru>


37757 19-Jul-1998 jkh

A slap on the wrist to Dag-Erling, who plainly did not test this before
committing it. There was a large syntax error at line 404 which could
not possibly have allowed compilation. :)


37743 18-Jul-1998 des

Allow dump devices with dkpart != SWAP_PART on devfs/slice
systems. This test should probably be removed altogether.

See CVS log entries for revisions 1.97 and 1.98.


37679 15-Jul-1998 bde

%n in a comment was a poor abbreviation for Immediate-byte-signed,
especially now that %n format has almost gone away.


37675 15-Jul-1998 bde

Don't cast pointers to longs in asms. Changed all remaining longs
to int32_t's and all unsigned longs to u_int32_t's. Fixed the one
printf format broken by this. The old math emulator now compiles
cleanly on i386's with 64-bit longs. It may even work, provided
suword() doesn't actually write a long.


37652 15-Jul-1998 bde

Cast the value returned by strtoul() to a uintptr_t before casting
it to a pointer. There's nothing better than strtoul() for reading
pointers from strings, but the range checking should be better.


37651 15-Jul-1998 bde

Cast virtual addresses that happen to be represented as u_longs to
uintptr_t before casting them to pointers. Explicit u_longs should
never be used to represent virtual addresses... (vm_offset_t is
normally right).


37564 11-Jul-1998 bde

Fixed printf format errors.

Use offsetof() instead of null pointer hacks.


37561 11-Jul-1998 bde

Fixed printf format errors.


37557 11-Jul-1998 phk

Don't disable pmap_setdevram() which isn't called, but which could be,
but instead disable pmap_setvidram() which is called, but probably
shouldn't be.

PR: 7227, 7240


37555 11-Jul-1998 bde

Fixed printf format errors.


37553 11-Jul-1998 bde

Don't pretend to support ix86's with 16-bit ints by using longs just to
ensure 32-bit variables. Doing so mainly bogotified some printf formats.

Fixed disorder in md_var.h.


37506 08-Jul-1998 bde

Use not-so-new printf formats %r and/or %z instead of %n and/or %+x.


37504 08-Jul-1998 bde

Fixed bogus type of valuep in struct db_variable. It was `int *' and
became `long *' for alpha, but should always have been `db_expr_t *'.
Fixed variable types to match.


37389 04-Jul-1998 julian

There is no such thing any more as "struct bdevsw".

There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw an
cdevsw entries. The bdevsw[] table stiff exists and is a second pointer
to the cdevsw entry of the device. it's major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
Changes suggested by eivind.


37315 30-Jun-1998 phk

Add 3 sysctl variables for future use by ps)1_


37311 30-Jun-1998 phk

Add PSE36 to the bits we know by name.


37272 30-Jun-1998 jmg

convert some nfs tunables to options, these are:
NFS_MINATTRTIMO VREG attrib cache timeout in sec
NFS_MAXATTRTIMO
NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec
NFS_MAXDIRATTRTIMO
NFS_GATHERDELAY Default write gather delay (msec)
NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ and with this
NFS_MUIDHASHSIZ Tune the size of nfsmount with this
NFS_NOSERVER (already documented in LINT)
NFS_DEBUG turn on NFS debugging

also, because NFS_ROOT is used by very different files, it has been
renamed to opt_nfsroot.h instead of the old opt_nfs.h....


37101 21-Jun-1998 bde

Removed unused includes.


37099 21-Jun-1998 bde

Removed unused includes.
Ifdefed conditionally used includes.


37094 21-Jun-1998 bde

Removed unused includes.


37093 21-Jun-1998 bde

Ifdefed a conditionally used include.


37086 21-Jun-1998 bde

Converted add_interrupt_randomness() to take a `void *' arg. Rewrote
mmioctl() to fix hundreds of style bugs and a few error handling bugs
(don't check for superuser privilege for inappropriate ioctls, don't
check the input arg for the output-only MEM_RETURNIRQ ioctl, and don't
return EPERM for null changes).


37034 17-Jun-1998 bde

Don't declare isa device structs or isa interrupt handlers in <sys/conf>,
and don't depend on them being declared there. This will cause lots of
warnings for a few minutes until config is updated. Interrupt handlers
should never have been configured by config, and the machine generated
declarations get in the way of changing the arg type from int to void *.


36810 09-Jun-1998 phk

Add a tc_ prefix to struct timecounter members.

Urged by: bde


36809 09-Jun-1998 bde

Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available. This is useful after
booting with -s. If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.


36766 08-Jun-1998 dfr

Fix more of my DDB breakage.


36760 08-Jun-1998 dfr

Make DDB work again after I broke it :-(.


36741 07-Jun-1998 phk

Add a member function more to the timecounters, this one is for use
with latch based PPS implementations. The client that uses it will
be committed after more testing.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36719 07-Jun-1998 phk

Add a "this" style argument and a "void *private" so timecounters can
figure out which instance to wount with.


36605 03-Jun-1998 bde

Ifdefed the netisr support.

PR: 6760
Reviewed by: joerg


36596 03-Jun-1998 msmith

If vm86 services are available, use these to perform the APM BIOS
probe and intialisation. This will ultimately remove the grubby (but
functional) hack that copies a real-mode function into low memory
early in locore.s.


36441 28-May-1998 phk

Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.

Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING: Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by: bde


36303 22-May-1998 des

Use switch instead of if/else chain for 686 model identification.
Add precise model identification for 586-family CPUs.


36286 21-May-1998 des

Correctly identify the precise CPU model within the 686 family: instead
of just printing "Pentium Pro", check the model (cpu_id & 0xf0) and print
the appropriate information.


36275 21-May-1998 dyson

Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)


36214 19-May-1998 dufault

Remove option for SCHED_FIFO. With this optional, SCHED_FIFO
is the same as RTPRIO_IDLE when it falls through to the default.


36200 19-May-1998 peter

Missing parens caused cpu features not to be printed for cyrix >= M2/MX.
Althought the comments say the datasheet doesn't list the device ID
registers on the M2/MX, they seem to be there and quite alive.
(It's interesting to note that the M2/MX calls itself a 686 class cpu but
is missing a heck of a lot of features, including VME, PGE, PSE, etc)


36198 19-May-1998 phk

Change a data type internal to the timecounters, and remove the "delta"
function.

Reviewed, but not entirely approved by: bde


36179 19-May-1998 phk

Make the size of the msgbuf (dmesg) a "normal" option.


36169 19-May-1998 tegge

Back out part of revision 1.198 commit (clearing kernel stack pages).
By request from David Greenman <dg@root.com>


36168 19-May-1998 tegge

Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by: David Greenman <dg@root.com>


36138 17-May-1998 tegge

Change simple lock handling to not depend upon having a local apic
available. The per-cpu variable ss_tpr has been replaced by ss_eflags.
This reduced the number of interrupts sent to the wrong CPU, due to
the cpu having the global lock being inside a critical region.

Remove some unneeded manipulation of tpr register in mplock.s.

Adjust code in mplock.s to be aware of variables on the stack being
destroyed by MPgetlock if GRAB_LOPRIO is defined.


36135 17-May-1998 tegge

Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot(). If the system was idle when
an interrupt caused a panic, this won't work. Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.


36125 17-May-1998 tegge

For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generates IPIs.
This should reduce the number of IPIs during heavy paging.


36121 17-May-1998 tegge

Clear kernel stack pages before usage.
Correct panic message in pmap_zero_page (s/CMAP /CMAP2 /).


36120 17-May-1998 tegge

Be more restrictive about when to disregard MP config table values
(previous commit broke support for Dec Personal Workstation).


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


36095 16-May-1998 kato

Some of newer PC-98 may cause "Windows Protection Fault" when booting
Windows 95 after rebooting FreeBSD without power off. In PC-98
system, reboot mode is set via I/O port 0x37 in cpu_reset(), and
accessing of this port is the reason of the problem. To avnoid the
fault, current status of reboot mode should be checked before
accessing the I/O port.


36094 16-May-1998 kato

Disable local APIC in UP kernel. Intel specification update describes
that local APIC should be disabled in UP system. However, some of old
BIOS does not disable local APIC, and virtual wire mode through local
APIC may cause int 15.


36051 15-May-1998 dyson

Disable the auto-Write Combining setup for the pmap code. This
worked on a couple of machines of mine, but appears to cause problems
on others.


35977 12-May-1998 dyson

Some temporary fixes to SMP to make it more scheduling and signal friendly.
This is a result of discussions on the mailing lists. Kudos to those who
have found the issue and created work-arounds. I have chosen Tor's fix
for now, before we can all work the issue more completely.
Submitted by: Tor Egge


35974 12-May-1998 bde

Backed out previous commit. It is invalid to call d_ioctl() on
possibly non-open devices, and we don't want to restrict dumping
to swap devices anwyay. It is especially invalid to call d_ioctl()
in non-process context for panics. d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale. The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.


35940 11-May-1998 dyson

Change some tests from CPU_CLASS686 to CPU_686 as appropriate, and
also correct a serious ommision that would cause process faulures
due to forgetting an invltlb type operatino. This was just a
transcription problem.


35933 11-May-1998 dyson

Support better performance with P6 architectures and in SMP
mode. Unnecessary TLB flushes removed. More efficient
page zeroing on P6 (modify page only if non-zero.)


35932 11-May-1998 dyson

Attempt to set write combining mode for graphics devices.


35812 06-May-1998 julian

Add dump support to the DEVFS/slice code.
now we can actually catch our crashes :-)

Submitted by: Luoqi Chen <luoqi@chen.ml.org> (the man who's everywhere)


35767 06-May-1998 gibbs

Implement bus_dmamem_* functions and correct a few nits reported by Peter Wemm.


35496 28-Apr-1998 eivind

Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under
Linux emulation. This make Allegro Common Lisp 4.3 work under
FreeBSD!

Submitted by: Fred Gilham <gilham@csl.sri.com>
Commented on by: bde, dg, msmith, tg
Hoping he got everything right: eivind


35456 26-Apr-1998 dyson

Add the PAT cpuid feature.


35409 23-Apr-1998 julian

Oops forgot to include opt_bootp.h
Bootp works a lot better with devfs if this is present.


35393 22-Apr-1998 tegge

Mask the interrupt before setting the corresponding bit in ipending if
the interrupt is already active.
Don't use lock prefix for operations on ipending.
Always use lock prefix for operations on iactive.


35357 20-Apr-1998 julian

Remove more LFS left-overs


35356 20-Apr-1998 julian

Remove an LFS clause, now that it is in the history,
anyone who wants to see what was needed cto revive LFS can see it.


35323 20-Apr-1998 julian

Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)


35319 19-Apr-1998 julian

Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.


35318 19-Apr-1998 tegge

Disregard the values for polarity and trigger mode in the MP config table
for ISA interrupts where the IOAPIC interrupt pin number is the same as
the ISA interrupt number (i.e. partial backout of previous commit)


35296 19-Apr-1998 bde

Support compiling with gcc -pedantic (don't use a bogus, null cast).


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35209 15-Apr-1998 bde

Support compiling with `gcc -ansi'. Fix missing `volatile's in __asm()
statements while I'm here.


35203 15-Apr-1998 bde

Fixed breakage of fork accounting in previous commit. A fork benchmark
reported about 15 times as much sys time as real time. getmicroruntime()
is confusing name.


35087 06-Apr-1998 peter

Fix VM86 compiles. a #include "opt_vm86.h" was missing, and the my_tr
variable was needed in the non-SMP case.

Submitted by: Jonathan Lemon <jlemon@americantv.com>


35077 06-Apr-1998 peter

Use real types for the SMP pages being allocated rather than arrays of
ints. Remove some no longer needed casts. Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.


35076 06-Apr-1998 peter

clean up #ifdefs, define the variables that have to be per-cpu on SMP
in globals.s only and use externs always.


35075 06-Apr-1998 peter

_curpcb is always defined in globals.s instead of here in #ifdefs


35074 06-Apr-1998 peter

Bogus casts


35073 06-Apr-1998 peter

Defunct, now part of globals.s


35072 06-Apr-1998 peter

Rather than filling this file up with SMP .sets, use those from
globals.s instead.
Initialize curproc in the same place for both UP and SMP.


35071 06-Apr-1998 peter

Generate #defines that the asm code can access for the per-cpu data
structures.


35070 06-Apr-1998 peter

generate .sets for variables used in asm and C that are stored in per-cpu
space under SMP.


35058 06-Apr-1998 phk

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


35035 05-Apr-1998 tegge

Remove some unneeded statements that enabled interrupts.


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


34990 01-Apr-1998 tegge

Add two workarounds for broken MP tables:

- Attempt to handle PCI devices where the interrupt is
an ISA/EISA interrupt according to the mp table.

- Attempt to handle multiple IO APIC pins connected to
the same PCI or ISA/EISA interrupt source. Print a
warning if this happens, since performance is suboptimal.
This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


34925 28-Mar-1998 dufault

Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work. Changes:

Change all "posix4" to "p1003_1b". Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


34880 24-Mar-1998 jlemon

Fix a stupid bug where I was returning the wrong value. It's a wonder
this code even worked in the first place.


34879 24-Mar-1998 jlemon

Only read the cr4 register if the cpu_feature flag indicates the machine
has VME support.

Noticed by: kato


34864 24-Mar-1998 kato

PC-98 does not have a BIOS call to get memory size.


34840 23-Mar-1998 jlemon

Add the ability to make real-mode BIOS calls from the kernel. Currently,
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.

Thanks to Tor Egge, these changes should work on SMP machines. However,
it may not be throughly SMP-safe.

Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.


34643 17-Mar-1998 kato

Make EPSON_BOUNCEDMA a new-style option.


34636 17-Mar-1998 msmith

Add missing entry to list of major device names. This list should not
exist.


34624 16-Mar-1998 msmith

Spell 'compatibility' like everyone else.


34622 16-Mar-1998 msmith

Use dkmakeminor() rather than magic knowledge of the size and location of
the slice field. Handle incomprehensible slice numbers slightly better.
Suggested by: bde


34617 16-Mar-1998 phk

Be less draconian about the TSC if APM is configured, use it for
timecounting if APM-BIOS isn't found.
Be just as draconian about SMP as always, but explain it better.


34611 16-Mar-1998 dyson

Some VM improvements, including elimination of alot of Sig-11
problems. Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1) Create an object for kernel page table allocations. This
fixes a bogus allocation method previously used for such, by
grabbing pages from the kernel object, using bogus pindexes.
(This was a code cleanup, and perhaps a minor system stability
issue.)

pmap.c:
2) Pre-set the modify and accessed bits when prudent. This will
decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3) Rather than calculating the beginning virtual byte offset
multiple times, stick the offset into the buffer header, so
that the calculated offset can be reused. (Long long multiplies
are often expensive, and this is a probably unmeasurable performance
improvement, and code cleanup.)

vfs_bio.c:
4) Handle write recursion more intelligently (but not perfectly) so
that it is less likely to cause a system panic, and is also
much more robust.

vfs_bio.c:
5) getblk incorrectly wrote out blocks that are incorrectly sized.
The problem is fixed, and writes blocks out ONLY when B_DELWRI
is true.

vfs_bio.c:
6) Check that already constituted buffers have fully valid pages. If
not, then make sure that the B_CACHE bit is not set. (This was
a major source of Sig-11 type problems.)

vfs_bio.c:
7) Fix a potential system deadlock due to an incorrectly specified
sleep priority while waiting for a buffer write operation. The
change that I made opens the system up to serious problems, and
we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8) Make clustered reads work more correctly (and more completely)
when buffers are already constituted, but not fully valid.
(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9) Create a vtruncbuf function, which is used by filesystems that
can truncate files. The vinvalbuf forced a file sync type operation,
while vtruncbuf only invalidates the buffers past the new end of file,
and also invalidates the appropriate pages. (This was a system reliabiliy
and performance issue.)

10) Modify FFS to use vtruncbuf.

vm_object.c:
11) Make the object rundown mechanism for OBJT_VNODE type objects work
more correctly. Included in that fix, create pager entries for
the OBJT_DEAD pager type, so that paging requests that might slip
in during race conditions are properly handled. (This was a system
reliability issue.)

vm_page.c:
12) Make some of the page validation routines be a little less picky
about arguments passed to them. Also, support page invalidation
change the object generation count so that we handle generation
counts a little more robustly.

vm_pageout.c:
13) Further reduce pageout daemon activity when the system doesn't
need help from it. There should be no additional performance
decrease even when the pageout daemon is running. (This was
a significant performance issue.)

vnode_pager.c:
14) Teach the vnode pager to handle race conditions during vnode
deallocations.


34591 15-Mar-1998 msmith

Use dsname() to generate the disk region name for the "changing root
device to" message. Suppress this message if only the slice number
has changed.


34571 14-Mar-1998 tegge

On SMP systems, initially follow the MP spec with regard to which pin
on the IOAPIC being connected to the 8254 timer interrupt.
Verify that timer interrupts are delivered. If they aren't, attempt
a fallback to mixed mode (i.e. routing the timer interrupt via the 8259 PIC).


34569 14-Mar-1998 tegge

Don't use the standard macros for disabling/enabling interrupt.
On SMP systems, this left the mpintr_lock simplelock locked, causing
further calls to disable_intr to deadlock or panic.


34507 12-Mar-1998 bde

Fixed breakage of the !SMP case in vm_page_zero_idle() in the
previous commit. Opportunities to clean pages were often missed,
and leaving of the idle state was sometimes delayed until the next
interrupt (after any that occurred while cleaning).

Fixed an unstaticization, a syntax error and a style bug in the
previous commit.


34506 12-Mar-1998 bde

Don't depend on "implicit int" or bloat the data section in the
declaration of mem_devsw_installed.

Reduced include nesting.


34455 10-Mar-1998 jkh

Add bktr and pcm entries by popular request. Also use more canonical
reference to [ENTER] in the docs rather than [RETURN].


34440 09-Mar-1998 eivind

Turn "PMAP_SHPGPERPROC" into a new-style option, add it to LINT, and
document it there.


34392 09-Mar-1998 msmith

"Correct behaviour" involves being consistent with the canonical names of
other partitions. In this case, they appear in the first slice in the
WHOLE_DISK_SLICE case.


34390 09-Mar-1998 msmith

Merge from 2.2; behave correctly in the presence of a slice number that
doesn't directly correspond to the slice field in the device minor number.


34309 08-Mar-1998 msmith

Construct the minor number for the root device taking into account the
slice number passed in by the bootblocks. This means the kernel will
not use the compatability slice to obtain the root filesystem when
booting from a sliced disk.

Use the extraction macros from reboot.h rather than stating them in full
again.


34206 07-Mar-1998 dyson

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


34197 07-Mar-1998 tegge

The APs now reload the interrupt descriptor table pointer after
f00f_hack has run.

Use the global r_idt descriptor in f00f_hack when in SMP mode,
so the APs find the relocated interrupt descriptor table.

Submitted by: Partially from David A Adkins <adkin003@tc.umn.edu>


34058 05-Mar-1998 tegge

Remove special handling for resuming clock interrupt when using APIC_IO.
The `generic' vector stubs do the right thing.


34057 05-Mar-1998 tegge

Use t_idt instead of idt inside setidt() if f00f_hack() has relocated the IDT.
Submitted by: Bruce Evans <bde@zeta.org.au>


34028 04-Mar-1998 dufault

Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.


34021 03-Mar-1998 tegge

When entering the apic version of slow interrupt handler, level
interrupts are masked, and EOI is sent iff the corresponding ISR bit
is set in the local apic. If the CPU cannot obtain the interrupt
service lock (currently the global kernel lock) the interrupt is
forwarded to the CPU holding that lock.

Clock interrupts now have higher priority than other slow interrupts.


34020 03-Mar-1998 tegge

Forward the signal if the process runs on a different CPU. This reduces
the signal handling latency for cpu-bound processes that performs very
few system calls.

The IPI for forcing an additional software trap is no longer dependent upon
BETTER_CLOCK being defined.


34019 03-Mar-1998 tegge

Reduce timeout before assuming that forwarding of hardclock or softclock
failed. Don't complain on forwarding failure, unless
BETTER_CLOCK_DIAGNOSTIC is defined.


34018 03-Mar-1998 tegge

When sending an IPI to a specific target, disable interrupts inside the
critical region in order to avoid sending the IPI to the wrong target.


33983 02-Mar-1998 peter

Update the ELF image activator to use some of the exec resources rather
than rolling it's own. This means that it now uses the "safe"
exec_map_first_page() to get the ld.so headers rather than risking a panic
on a page fault failure (eg: NFS server goes down).
Since all the ELF tools go to a lot of trouble to make sure everything
lives in the first page for executables, this is a win. I have not seen
any ELF executable on any system where all the headers didn't fit in the
first page with lots of room to spare.
I have been running variations of this code for some time on my pure ELF
systems.


33936 01-Mar-1998 dyson

1) Use a more consistent page wait methodology.
2) Do not unnecessarily force page blocking when paging
pages out.
3) Further improve swap pager performance and correctness,
including fixing the paging in progress deadlock (except
in severe I/O error conditions.)
4) Enable vfs_ioopt=1 as a default.
5) Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."


33929 28-Feb-1998 phk

Prevent the TSC from being used on APM machines, we have no idea if
it runs at a constant frequency. This was less of an issue before,
because the TSC only interpolated in the HZ intervals, but now where
the timecounter is used all the way, this becomes much more visible.

Nit: Fix a printf which triggered the bde-filter.


33817 25-Feb-1998 dyson

Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs. The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by: Partially from Tor Egge.


33753 23-Feb-1998 bde

Quick fix for the i8254 timecounter often gaining 10 msec.


33727 21-Feb-1998 jkh

Add missing CLOCK_UNLOCK() before write_eflags().
Submitted by: dave adkins <adkin003@tc.umn.edu>


33690 20-Feb-1998 phk

Replace TOD clock code with more systematic approach.

Highlights:
* Simple model for underlying hardware.
* Hardware basis for timekeeping can be changed on the fly.
* Only one hardware clock responsible for TOD keeping.
* Provides a real nanotime() function.
* Time granularity: .232E-18 seconds.
* Frequency granularity: .238E-12 s/s
* Frequency adjustment is continuous in time.
* Less overhead for frequency adjustment.
* Improves xntpd performance.

Reviewed by: bde, bde, bde


33678 20-Feb-1998 bde

Don't depend on "implicit int".


33676 20-Feb-1998 bde

Removed unused #includes.


33320 13-Feb-1998 kato

Use RDMSR instruction instead of WRMSR.


33309 13-Feb-1998 bde

Update timer0_prescaler_count before calling hardclock() while timer0
is "acquired". This fixes a TSC biasing error of about 10 msec when
pcaudio is active.

Update `time' before calling hardclock() when timer0 is being released.
This is not known to be important.

Added some delays in writertc(). Efficiency is not critical here, unlike
in rtcin(), and we already use conservative delays there.

Don't touch the hardware when machdep.i8254_freq is being changed but
the maximum count wouldn't change. This fixes jitter of up to 10 msec
for most small adjustments to machdep.i8254_freq. When the maximum
count needs to change, the hardware should be adjusted more carefully.


33307 13-Feb-1998 bde

Ifdefed some npx code. npx should be optional again.


33306 13-Feb-1998 bde

Fixed missing privilege checking and off-by-1 bounds checking in
i386_set_ioperm(). Don't use a magic number for the bound.

Fixed missing bounds checking in i386_get_ioperm(). Don't use a
magic number for the bound elsewhere in this function.

Removed some bogus initializers.


33282 12-Feb-1998 bde

Fixed initialization of the 4MB page. Kernels larger than about 2.75MB
(from _btext to _end) crashed in pmap_bootstrap(). Smaller kernels
worked accidentally.


33221 10-Feb-1998 eivind

Fix warning after previous staticization.


33197 10-Feb-1998 msmith

If the kernel was built with USERCONFIG_BOOT, and the '-c' (RB_CONFIG) flag
was supplied to the kernel when booting, ignore any 'quit' commands in the
userconfig script.

This provides an escape mechanism whereby a broken userconfig script may
be worked around.


33181 09-Feb-1998 eivind

Staticize.


33179 09-Feb-1998 eivind

Remove warnings from f00f_hack.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33109 05-Feb-1998 dyson

1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2) When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3) When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
processor errata, and to minimize redundant processor updating of page
tables.
4) Modify pmap_protect so that it can only remove permissions (as it
originally supported.) The additional capability is not needed.
5) Streamline read-only to read-write page mappings.
6) For pmap_copy_page, don't enable write mapping for source page.
7) Correct and clean-up pmap_incore.
8) Cluster initial kern_exec pagin.
9) Removal of some minor lint from kern_malloc.
10) Correct some ioopt code.
11) Remove some dead code from the MI swapout routine.
12) Correct vm_object_deallocate (to remove backing_object ref.)
13) Fix dead object handling, that had problems under heavy memory load.
14) Add minor vm_page_lookup improvements.
15) Some pages are not in objects, and make sure that the vm_page.c can
properly support such pages.
16) Add some more page deficit handling.
17) Some minor code readability improvements.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33068 04-Feb-1998 eivind

Make FAILSAFE a new-style option.


33056 03-Feb-1998 bde

Converted DISABLE_PSE to a new-style option.

Fixed some formatting in options.i386.


33051 03-Feb-1998 bde

Ifdefed some SMP and VM86 code. Note that although VM86 is not a global
option, the ifdef on it in a header works because only the name of the
VM86 extension is hidden.


32992 01-Feb-1998 bde

Declare printf() instead of including <stdio.h>, so that this doesn't
depend on anything outside of "sys".

Removed an unused include.

Don't use `extern' in a function declaration.


32937 31-Jan-1998 dyson

Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY. It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed. The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use. There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages. I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise. For now, we are stuck
with the current morass.


32925 31-Jan-1998 eivind

Make POWERFAIL_NMI, PPS_SYNC and NATM new style options.

This also fixes a couple of defunct options; submitted by bde.


32917 31-Jan-1998 eivind

Include "opt_nfs.h"

Pointed out by: Eric L. Hernes <erich@lodgenet.com>


32889 30-Jan-1998 phk

Retire LFS.

If you want to play with it, you can find the final version of the
code in the repository the tag LFS_RETIREMENT.

If somebody makes LFS work again, adding it back is certainly
desireable, but as it is now nobody seems to care much about it,
and it has suffered considerable bitrot since its somewhat haphazard
integration.

R.I.P


32884 30-Jan-1998 dyson

Make the bounce buffer code a little more robust when space isn't
available. If there isn't bounce space available, the bounce code
is disabled. This will allow most large systems to run properly
when the bounce space is mistakenly allocated above 16MB.


32850 28-Jan-1998 phk

APM calls inittodr(0) which is stupid, but at least stop setting the
clock back to when Dennis had a good idea.


32820 27-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU.


32801 26-Jan-1998 julian

Add Simon Shapiro's DPT driver
this shouldn't break anything existing.
Userland utilities to follow.


32781 25-Jan-1998 kato

Undo previous commit. The cpuid symbol has been already used by SMP
stuff.

Pointed-out by: Manfred Antar <root@mantar.slip.netcom.com>


32771 25-Jan-1998 kato

Execute cpuid if BIOS disables cpuid instruction of Cyrix 6x86MX CPU,
and store its result into cpu_id and cpu_feature variables.

Tested by: Simon Coggins <chaos@ultra.net.au>


32765 25-Jan-1998 kato

Even though BIOS writer's guide recommends cpuid instruction of Cyrix
6x86MX CPU is enabled (BIOS should not disable it), some BIOS disables
it via CCR4. In this case, cpu variable becomes CPU_486 and
identblue() is called. Because Cyrix 6x86MX has MSR and doesn't have
MSR1002, wrmsr instruction generates general protection fault.

Tested by: Simon Coggins <chaos@ultra.net.au>


32726 24-Jan-1998 eivind

Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.

This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.


32724 24-Jan-1998 dyson

Add better support for larger I/O clusters, including larger physical
I/O. The support is not mature yet, and some of the underlying implementation
needs help. However, support does exist for IDE devices now.


32702 22-Jan-1998 dyson

VM level code cleanups.

1) Start using TSM.
Struct procs continue to point to upages structure, after being freed.
Struct vmspace continues to point to pte object and kva space for kstack.
u_map is now superfluous.
2) vm_map's don't need to be reference counted. They always exist either
in the kernel or in a vmspace. The vmspaces are managed by reference
counts.
3) Remove the "wired" vm_map nonsense.
4) No need to keep a cache of kernel stack kva's.
5) Get rid of strange looking ++var, and change to var++.
6) Change more data structures to use our "zone" allocator. Added
struct proc, struct vmspace and struct vnode. This saves a significant
amount of kva space and physical memory. Additionally, this enables
TSM for the zone managed memory.
7) Keep ioopt disabled for now.
8) Remove the now bogus "single use" map concept.
9) Use generation counts or id's for data structures residing in TSM, where
it allows us to avoid unneeded restart overhead during traversals, where
blocking might occur.
10) Account better for memory deficits, so the pageout daemon will be able
to make enough memory available (experimental.)
11) Fix some vnode locking problems. (From Tor, I think.)
12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
(experimental.)
13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c
code. Use generation counts, get rid of unneded collpase operations,
and clean up the cluster code.
14) Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step. Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)


32684 21-Jan-1998 jkh

Add entry for SMC 9432TX cards (tx driver).


32625 19-Jan-1998 tegge

Nondestructive attempts to get simple locks when SL_DEBUG is defined.


32617 19-Jan-1998 tegge

The removal of a page from the free queue in vm_page_zero_idle was
imcomplete. Also set m->queue, in order to prevent vm_page_select_free
from selecting the page being zeroed.


32585 17-Jan-1998 dyson

Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap. Fix a problem with faulting in pages. Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.


32518 15-Jan-1998 gibbs

Addition of splsoftvm and a VM SWI to handle bus dma related callbacks.
This SWI may be useful for other, defered, VM tasks.


32516 15-Jan-1998 gibbs

Implementation of Bus DMA for FreeBSD-x86. This is sufficient to do
page level bounce buffering, but there are still some issues left to
address.


32464 12-Jan-1998 dyson

Adjust upwards the size of exec map in order to take into account the
additional PAGE_SIZE needed for exec operatino.


32358 09-Jan-1998 eivind

Make the BOOTP family new-style options (in opt_bootp.h)


32203 03-Jan-1998 obrien

AMD calls the PR166 and PR200, models 2 and 3 respectively.


32200 03-Jan-1998 obrien

Update AMD URL for CPU recognition docs.


32199 03-Jan-1998 kato

Fix typo. Option `CPU_SUSP_HLT' didn't work on Cyrix 486DX box.

Submitted by: nyan@wyvern.cc.kogakuin.ac.jp (Takahashi Yoshihiro)


32164 01-Jan-1998 msmith

Don't try to call into BIOS32 handlers outside the normal ROM
address range. They may have been trashed earlier in the boot
process, or the directory header may simply be bogus.

PR: 5140
Submitted by: Joel Faedi <Joel.Faedi@esial.u-nancy.fr>
Brought-to-attention-by: Derek Inksetter <derek@saidev.com>, bde


32126 30-Dec-1997 phk

Fix a race condition I introduced.

Reviewed by: bde (with only minor complaints :-)


32106 29-Dec-1997 phk

Fix a include gottcha in the SMP case.

Submitted by: Simon Shapiro <Shimon@Simon-Shapiro.ORG>


32054 28-Dec-1997 phk

More cleanup relating to our use of the TSC.
Look in the cpu_feature (CPUID output) to see if we have it.


32052 28-Dec-1997 phk

wash, sort and put in order various nits from the i586_ctr -> tsc
commit.

Pointed out by: bde


32042 28-Dec-1997 bde

Removed stale comment about printf's deficiencies and rewrote the code
that worked around them for the CLI devtab listing.

Print the `conflicts' flag in the CLI devtab listing.


32012 27-Dec-1997 peter

Back out previous commit, the so-called "unused code" was most definately
used, and caused a reference to an uninitialised variable (state).
I think I've fixed it now, but since nothing in the tree seems to use it,
I'm not sure.


32010 27-Dec-1997 peter

#include "opt_user_ldt.h" so that the #ifdef USER_LDT checks can work, as
commented about at length in the PR audit trail.

PR: 2412


32005 26-Dec-1997 phk

Rename "i586_ctr" to "tsc" (both upper and lower case instances).
Fix a couple of printfs too.

Warning: This changes the names of a couple of kernel options!


32003 26-Dec-1997 phk

Reorder to a more conventional if/then/else/endif structure.


31934 22-Dec-1997 dyson

Correct my previous fix for the UPAGES problem.


31930 22-Dec-1997 dyson

Hopefully fix the problem with the TLB not being updated correctly.
Problem tracked down by bde@freebsd.org, but this is an attempted
efficient fix.


31723 15-Dec-1997 tegge

Add support for low resolution SMP kernel profiling.

- A nonprofiling version of s_lock (called s_lock_np) is used
by mcount.

- When profiling is active, more registers are clobbered in
seemingly simple assembly routines. This means that some
callers needed to save/restore extra registers.

- The stack pointer must have space for a 'fake' return address
in idle, to avoid stack underflow.


31720 15-Dec-1997 tegge

Don't forward hardclock or statclock to stopped cpus. Disable forwarding
when a panic has occured.


31709 14-Dec-1997 dyson

After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations. The number
of UP TLB flushes is decreased much more than significantly with these
changes. Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.


31689 12-Dec-1997 tegge

Add needed #include.

Problem found by: Bruce Evans <bde@zeta.org.au>


31639 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smpyests.h)

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31638 08-Dec-1997 fsmp

The improvements to clock statistics by Tor Egge
Wrappered and enabled by the define BETTER_CLOCK (on by default in smpyests.h)

apic_vector.s also contains a small change I (smp) made to eliminate
the double level INT problem. It seems stable, but I haven't the tools
in place to prove it fixes the problem.

Reviewed by: smp@csn.net
Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>


31629 08-Dec-1997 fsmp

Removed the annoying "apic_ipi might be stuck" message.
Added commentary about the real problem and what needs to be done.


31574 06-Dec-1997 ache

Add AWE32 description to visualconfig


31564 06-Dec-1997 sef

Changes to allow event-based process monitoring and control.


31544 04-Dec-1997 jmg

document and make the NO_F00F_HACK a proper option...

also, sort some option includes while I'm here..

Forgotten by: sef


31535 04-Dec-1997 jkh

After consultation with David, change
#ifndef NO_F00F_HACK
to
#if defined(I586_CPU) && !defined(NO_F00F_HACK)


31515 03-Dec-1997 sef

Make has_f00f_bug extern, and get rid of some unused code in the f00f
code.

Submitted by: Mikael Karpberg & Cy Schubert


31507 03-Dec-1997 sef

Work around for the Intel Pentium F00F bug; this is Intel's recommended
workaround. Note that this currently eats up two pages extra in the system;
this could be alleviated by aligning idt correctly, and then only dealing with
that (as opposed to the current method of allocated two pages and copying the
IDT table to that, and then setting that to be the IDT table).


31424 26-Nov-1997 joerg

Removed an unused line of code, that caused an ``maybe used uninitialized''
warning.

Found by: Simon Shapiro


31415 25-Nov-1997 markm

From the author:

Here are the remanding changes required to support the Ensoniq
Soundscape using FreeBSD 3.0-current.

Notes:

1) ad1848_init already has code to detect if DMA_DUPLEX should
be set so it is not necessary (and is in fact a mistake) to
hard code setting it. Not all soundcards (i.e. the current
sscape driver) are capable of using DMA_DUPLEX.

2) The other changes are hopefully self explanatory. Feel free
to let me know if you need additional information.

Submitted by: john@feith.com (John Wehle)


31397 24-Nov-1997 bde

Fixed multiple definitions of boothowto.


31389 24-Nov-1997 bde

Fixed some #include messes.

Hid the check of the user %cs in syscall() under `#ifdef DIAGNOSTIC'.


31338 21-Nov-1997 jlemon

Correct CPU_CYRIX_NO_LOCK fix.
PR: 5121
Pointed out by: Matthew Hunt


31337 21-Nov-1997 bde

Fixed setting of `safepri'. It should be SWI_AST_MASK most of the
time, but was left at 0. This caused the "can't happen" case in
splz_swi to happen for panics when tsleep() calls splx(safepri)
and there is a SWI_AST pending. This was harmless because the
the error handling happens to be right. Debugging this was tricky
because debugger traps force SWI_AST_MASK on in `cpl'.


31336 21-Nov-1997 bde

Moved splhigh()/spl0() calls from isa_configure() to configure() so that
there is a natural place to initialize `safepri' in a future commit.
Spinoffs:
- spl0() gets called in the unlikely event that isa is not configured.
- configure() has better control over enabling interrupts.
- it is now less unclear that interrupts aren't actually enabled early.
Rev.1.48 of autoconf.c seems to have done the opposite of what was
intended - moving the isa_configure() call delayed the spl0() side
effect.
Added some comments about the bogons. Removed the splhigh() call since
it is a no-op.


31328 21-Nov-1997 peter

Previous commit refers to SWAP_PART, which is only defined if the include
file that it's in is #included...


31322 20-Nov-1997 bde

Removed a duplicate (sloppy common-style) definition.

Fixed some style bugs.


31321 20-Nov-1997 bde

Moved some extern declarations to header files (unused ones to /dev/null).


31319 20-Nov-1997 bde

Avoid passing some more `retval's.


31318 20-Nov-1997 bde

Fixed wrong limits for the kernel text in db_numargs(). The
interval [VM_MIN_KERNEL_ADDRESS, etext] was used instead of
[btext, etext). Added a comment about this being completely
wrong for LKMs. This only affects interpreting the instructions
after the return to attempt decide the number of args. The
attempt usually fails anyway.


31317 20-Nov-1997 bde

Fixed write enabling of the kernel text section. The overlap
checking was mostly wrong at the boundaries. For the lower limit,
VM_MIN_KERNEL_ADDRESS was used instead of btext and there was an
off-by-(`size' - 1) error. For the upper limit, &etext was used
instead of etext and there was an off-by-1 error. The bugs were
harmless because `size' is not too large and some memory is mapped
just beyond the ends. We still depend on the former to avoid
having to handle the case where the memory range covers the whole
text section, and on the latter to prevent problems when we map
just beyond an end to allow writing an address range that overlaps
the end.

Fixed placement of a nearby comment.


31316 20-Nov-1997 bde

Don't allow setting the dump device to any partition except the
one traditionally reserved for swap devices. The restrictions
should now be the same as the ones for dumpsys(). The restriction
on the partition should be removed someday, and dumpsys() shouldn't
repeat all the checks.


31253 18-Nov-1997 bde

Removed #unused includes.

Added a used #include (don't depend on yet to be fixed namespace pollution).


31249 18-Nov-1997 bde

Don't #include <machine/smp.h> even in the SMP case. Fixed the one
place that depended on it. The "bazillion warnings" mentioned in the
log for rev.1.45 apparently aren't a problem any more. It is hard
to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks)
things in the SMP case.


31030 07-Nov-1997 tegge

Use UPAGES when setting up private pages for SMP (which includes idle stack).


31017 07-Nov-1997 phk

Rename some local variables to avoid shadowing other local variables.

Found by: -Wshadow


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30994 06-Nov-1997 phk

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


30976 06-Nov-1997 kato

Identify MediaGX CPU correctly. Old MeidaGX CPU and GXm CPU are
distinguished. CPU-classes of MeidaGX CPU and GXm CPU are 486-class
and 586-class, respectively.

PR: 4936


30964 05-Nov-1997 kato

Fix rare 6x86 CPU whose DIR0 = 0x20 - 0x28 case.


30918 04-Nov-1997 kato

Use same address for USERCONFIG_BOOT on PC-98 as IBM-PC.

Submitted by: H. Nokubi <h-nokubi@nmit.tmg.nec.co.jp>
Forgotten by: kato


30898 02-Nov-1997 jkh

Add ide_pci device.


30813 28-Oct-1997 bde

Removed unused #includes.


30805 28-Oct-1997 bde

Don't include <machine/cputypes.h> or declare cputype/class interfaces
in <machine/cpu.h>. Moved the declarations to <machine/cputypes.h>.
Fixed style bugs in the moved code. Fixed everything that depended on
the nested include. Don't include <machine/cpu.h> (in the changed files)
unless something in it is used directly.


30789 27-Oct-1997 bde

Moved declaration of etext from <machine/md_var.h> to <machine/cpu.h>
and fixed everything that dependended on it being declared in the old
place. It is used in "machine-independent" code in subr_prof.c.

Moved declaration of btext from subr_prof.c to <machine/cpu.h>. It
is machine-dependent.


30788 27-Oct-1997 bde

Oops, <machine/psl.h> is used unconditionally in -current.


30786 27-Oct-1997 bde

Cleaned up #includes.

Ifdefed conditionally used includes.

Finished changing indentation of per-statement comments to 40.


30754 27-Oct-1997 dyson

Check to see if the pv_limits are initialized before checking.


30732 26-Oct-1997 dyson

Change the initial amount of memory allocated for pv_entries to be proportional
to the amount of system memory. Also, clean-up some of the new pv_entry
mgmt code.


30720 26-Oct-1997 nate

- Do a bunch of gratuitous changes intended to make the code easier to
follow.
* Rename/reorder all of the pccard structures, change many of the member
names to be descriptive, and follow more closely other 'bus' drivers
naming schemes.
* Rename a bunch of parameter and local variable names to be more
consistant in the code.
* Renamed the PCCARD 'crd' device to be the 'card' device
* KNF and make the code consistant where it was obvious.
* ifdef'd out some unused code


30702 25-Oct-1997 dyson

Somehow an error crept in during the previous commit.


30701 25-Oct-1997 dyson

Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no signficant
overhead is added. This helps large systems with lots of processes.


30700 24-Oct-1997 dyson

Decrease the initial allocation for the zone allocations.


30689 24-Oct-1997 phk

Remove the stuff we do not use from global scope. Export them again as
needed.


30623 21-Oct-1997 msmith

Reference the DMI table inside the SMBIOS table correctly, not using a variable
that won't be initialised until a later test.
Submitted by: bde via -Wunused


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30343 12-Oct-1997 peter

Try and fix some style problems


30309 11-Oct-1997 phk

Distribute and statizice a lot of the malloc M_* types.

Substantial input from: bde


30275 10-Oct-1997 peter

Compensate for pcb.h tweaks.

(Bruce pointed out the nesting)


30273 10-Oct-1997 peter

GPROC0_SEL isn't used in any *.s files it seems..


30265 10-Oct-1997 peter

Convert the VM86 option from a global option to an option only depended
on by the files that use it. Changing the VM86 option now only causes
a recompile of a dozen files or so rather than the entire kernel.


30162 06-Oct-1997 kato

Added two Cyrix 6x86/6x86MX options.

- CPU_CYRIX_NO_LOCK enables weak locking. If this option is not set and
FAILESAFE is defined, NO_LOCK bit of CCR1 is cleared.
- CPU_WT_ALLOC enables write-through allocation.


30136 06-Oct-1997 dyson

It is possible that MB's with really broken bios's not set up more of
the mtrr registers. This just fills in more of the registers.


30112 05-Oct-1997 dyson

Make sure that the memory type registers are the same for each CPU
in a P6 SMP system. Some MB bios'es don't set the registers up correctly
for the AP's. Additionally, set the memory between 0xa0000 and 0xbffff
as write combining.


30082 03-Oct-1997 kato

Call identifycyrix() when 6x86MX CPU is found. The identifycyrix()
function sets cyrix_did. Old code could not display correct variable.

Reviewed by: Hideyuki Suzuki <hideyuki@sat.t.u-tokyo.ac.jp>


29945 28-Sep-1997 gibbs

Fix a serious bug I introduced while adding in support for CAM interrupts.
It seems I didn't count my 0's properly when adding the new masks into
icu_vector.s pushing SWI_AST_MASK off the end of the array and screwing
up the indexing for SWI_CLOCK_MASK.

Fix the bug icu_vector.s and also reformat the code in both icu_vector.s and
apic_vector.s so that it will be much harder to make the same mistake in
the future.

Submitted by: Bruce Evans <bde@zeta.org.au>


29851 25-Sep-1997 dg

Fix a bug where the speculative memory probe wouldn't occur on systems that
report slightly more than 64MB of total memory. This can happen due to the
total being the sum of both base and extended memory.
Submitted by: Alan Cox <alc@cs.rice.edu>


29789 24-Sep-1997 phk

Look for another couple of magic bios things..


29742 23-Sep-1997 bde

Moved setconf() call after root configuration again. This fixes a
null pointer panic in the "generic" version of setconf().

Removed the resulting near-duplicate printf.


29702 22-Sep-1997 peter

Turn on CR4_VME on the AP's the same as the BSP. Note that we do not
[yet] probe the AP's for their cpuid/capabilities etc, so this is a fudge
at best.

Problem noted by: Jonathan Lemon <jlemon@americantv.com>


29677 21-Sep-1997 gibbs

aha1542.c aic6360.c cy.c fd.c ft.c
if_ie.c if_wl.c if_zp.c isa.c isa_device.h
labpc.c mcd.c ncr5380.c scd.c seagate.c si.c
sio.c tw.c ultra14f.c wcd.c wd.c:

Update for changes in the callout interface.

apic_vector.s icu_vector.s ipl.s ipl_funcs.c:

Add CAM software/hardware interrupt support.


29675 21-Sep-1997 gibbs

autoconf.c:
Add cpu_rootconf and cpu_dumpconf so that configuring these
two devices can be better controlled by the MI configuration
code.

machdep.c:
MD initialization code for the new callout interface.

trap.c:
Add support for printing out whether cam interrupts are masked
during a panic.


29663 21-Sep-1997 peter

Implement the parts needed for VM86 under SMP.


29655 21-Sep-1997 dyson

Add support for more than 1 page of idle process stack on SMP systems.


29639 20-Sep-1997 phk

For AMD chips, pick up the long description from the chip if
possible. (This is not really a typographical improvement in the
case of the K6 it seems, but AMD appearantly want it too look
that way). Also if bootverbose, dump some more info about the
chip.


29566 18-Sep-1997 jmg

reduce the number of warnings this file emits during compiling


29368 14-Sep-1997 peter

Update select -> poll in drivers.


29330 13-Sep-1997 joerg

Revert the logic behind my last change, and use a function called
`is_physical_memory()' now for the decision whether to dump some
region of memory or not.

Suggested by: davidg


29280 10-Sep-1997 joerg

Do not ever try to coredump adapter memory regions.

PR: 4486
Submitted by: tegge@idi.ntnu.no (Tor Egge)

Implement a function is_adapter_memory() in order to determine what
should nto be dumped at all. Currently, only populated with the ``ISA
memory hole''. Adapter regions of other busses should be added.


29243 09-Sep-1997 jmg

add neccessary calls to autoconf for pnp,

also teach userconfig about the new pnp commands, for usage see pnp(4)


29213 07-Sep-1997 fsmp

General cleanup of the lock pushdown code. They are grouped and enabled
from machine/smptests.h:

#define PUSHDOWN_LEVEL_1
#define PUSHDOWN_LEVEL_2
#define PUSHDOWN_LEVEL_3
#define PUSHDOWN_LEVEL_4_NOT


29202 07-Sep-1997 bde

Removed more vestiges of config-time swap configuration.


29174 07-Sep-1997 dyson

Fix an intermittent problem during SMP code operation. Not all of the
idle page table directories for all of the processors was being updated
during kernel grow operations. The problem appears to be gone now.


29151 05-Sep-1997 peter

Argh, what was I thinking?? Don't (yet) halt the CPU in the idle loop
while waiting for an interrupt (rather than spinning on the runqueue status
bits), since the other cpu can put stuff in there and the sleeping cpu may
not get an interrupt for a while. When we have a reschedule IPI, this can
come back.

Pointed out by: fsmp


29128 05-Sep-1997 peter

Cosmetic adjustment for the trap/double fault/panic cpu id listing.
It now prints the apic id in hex rather than decimal.


29110 04-Sep-1997 dg

Cosmetic change to last commit: speculative_mtest -> speculative_mprobe.


29109 04-Sep-1997 dg

Changed the memory sizing code so that if the following conditions
are met:

1) The BIOS indicates that there is exactly 64MB of RAM, and
2) The memory size isn't specified with the MAXMEM option or
the npx0 msize hack,

...then do a speculative memory probe beyond the 64MB's until the
first bad page is encountered. This is an admitted hack, but should
nonetheless deal with detecting the correct amount of memory in nearly
all of the modern systems with >64MB of RAM.
Also made a change that will cause the list of detected memory chunks
to be printed if bootverbose is set.


29096 04-Sep-1997 jkh

Correct ancient spelling bogon.


29041 02-Sep-1997 bde

Removed unused #includes.


29000 01-Sep-1997 fsmp

General cleanup of the sub-system locking macros.
Eliminated the RECURSIVE_MPINTRLOCK.
clock.c and microtime use clock_lock.
sio.c and cy.c use com_lock.

Suggestions by: Bruce Evans <bde@zeta.org.au>


28999 01-Sep-1997 fsmp

Cleanup.


28984 01-Sep-1997 bde

Move closer to supporting VM86 under SMP.

LINT now compiles but doesn't link. Other link-time breakage for LINT
is now visible (SMP is incompatible with SIMPLELOCK_DEBUG).
Submitted by: jlemon


28981 01-Sep-1997 bde

Removed unused #includes.


28976 31-Aug-1997 bde

Fixed options SHOW_BUSYBUFS and PANIC_REBOOT_WAIT_TIME which were broken
by incomplete cutting and pasting from machdep.c to kern_shutdown.c.

PR: 3953


28951 31-Aug-1997 fsmp

Debug version of simple_lock. This will store the CPU id of the
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.

Controlled by smptests.h: SL_DEBUG, ON by default.

Some minor cleanup.


28943 30-Aug-1997 fsmp

Added clock_lock protection to microtime.


28921 30-Aug-1997 fsmp

Another round of lock pushdown.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.

Help from: Bruce Evans <bde@zeta.org.au>


28910 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm, enabled.

Preliminary support for the INTR_SIMPLELOCK algorithm, disabled.
Note that this code is NOT ready.


28909 29-Aug-1997 fsmp

Support for the new FAST_HI algorithm.
Improved interrupt handling, fewer silo overflows.

With help from: dave adkins <adkin003@gold.tc.umn.edu>


28900 29-Aug-1997 kato

Use correct member of scsi_cint for scbus. Add a space between lun
and flags.

Reviewed by: kato
Submitted by: Chiharu Shibata <chi@rd.njk.co.jp>


28872 28-Aug-1997 jlemon

Remove the vm86 support as an LKM, and link it directly into the kernel
if 'options "VM86"' is in the config file. The LKM was really for
development, and has probably outlived its usefulness.


28809 26-Aug-1997 peter

Correct some things I forgot about until it was too late with smp_active.
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.


28808 26-Aug-1997 peter

Clean up the SMP AP bootstrap and eliminate the wretched idle procs.

- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)

More cleanup to follow, this is more of a checkpoint than a
'finished' thing.


28747 25-Aug-1997 bde

Finished (?) support for DISABLE_PSE option. 2-3MB of kernel vm was sometimes
wasted.

Fixed type mismatches for functions with vm_prot_t's as args. vm_prot_t
is u_char, so the prototypes should have used promoteof(u_char) to match
the old-style function definitions. They use just vm_prot_t. This depends
on gcc features to work. I fixed the definitions since this is easiest.
The correct fix may be to change vm_prot_t to u_int, to optimize for time
instead of space.

Removed a stale comment.


28745 25-Aug-1997 bde

Removed a tautological comment.


28743 25-Aug-1997 bde

Removed a bogus comment.


28736 25-Aug-1997 fsmp

Eliminate the blocking of INTs while spinning for the safe simplelock.


28717 25-Aug-1997 peter

s/.align/.p2align/ so that we get the same results when building elf
objects (the tools are a bit better)


28669 24-Aug-1997 fsmp

A clean fix for the spl "deadlock before smp_active" problem.

Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.


28641 24-Aug-1997 fsmp

The last of the encapsolation of cpl/spl/ipending things into a critical
region protected by the simplelock 'cpl_lock'.

Notes:

- this code is currently controlled on a section by section basis with
defines in machine/param.h. All sections are currently enabled.

- this code is not as clean as I would like, but that can wait till later.

- the "giant lock" still surrounds most instances of this "cpl region".
I still have to do the code that arbitrates setting cpl between the
top and bottom halves of the kernel.

- the possibility of deadlock exists, I am committing the code at this
point so as to exercise it and detect any such cases B4 the "giant lock"
is removed.


28551 21-Aug-1997 bde

#include <machine/limits.h> explicitly in the few places that it is required.


28496 21-Aug-1997 charnier

Revert my previous commit about using CS_SECURE macro.
Requested by: Bruce.


28487 21-Aug-1997 fsmp

Made PEND_INTS default.
Made NEW_STRATEGY default.
Removed misc. old cruft.

Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h

More cleanup in the direction of making splxx()/cpl MP-safe.


28442 20-Aug-1997 fsmp

Preperation for moving cpl into critical region access.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer checks ipending on way out of ISR.
sio made MP-safe (I hope).


28359 18-Aug-1997 charnier

Use CS_SECURE macro.
Reviewed by: John Dyson


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


28262 16-Aug-1997 msmith

Add the 'ppc' ISA-bus parallel-port chipset driver.


28124 12-Aug-1997 dyson

Back out a part of the disk scheduling "improvements" :-(. Let me know
how the system works now!!!


28074 11-Aug-1997 jkh

Make the CLI mode message a little less intimidating.
Suggested by: Richard Underwood <ru@atticus.com>


28044 10-Aug-1997 fsmp

Oops, fix breakage to UP kernel.


28043 10-Aug-1997 fsmp

Added trap specific lock calls: get_fpu_lock, etc.
All resolve to the GIANT_LOCK at this time, it is purely a logical partitioning.


28041 10-Aug-1997 fsmp

Cheap fix for kern/4255.
If the problem is seen this fix suggests a compile-time work-around then panics.


28027 09-Aug-1997 fsmp

Some fixes towards making "default configs" work again.
Still not fixed, no idea why.

Debug help from: "Thomas D. Dean" <tomdean@ix.netcom.com>


28026 09-Aug-1997 fsmp

Minor conditionalization of XXX_MPLOCK on PEND_INTS.


28023 09-Aug-1997 fsmp

Added 'lock' instruction before 3 places that update ipending.
This may or may not fix the "high IO freezes SMP kernel" problem.


28013 09-Aug-1997 dyson

Modify the scheduling policy to take into account disk I/O waits
as chargeable CPU usage. This should mitigate the problem of processes
doing disk I/O hogging the CPU. Various users have reported the
problem, and test code shows that the problem should now be gone.


27998 09-Aug-1997 dyson

Add the code that represents most of the interface between the VM86
pseudo-machine and the rest of the FreeBSD kernel.
Submitted by: Jonathan Lemon <jlemon@americantv.com>


27993 09-Aug-1997 dyson

VM86 kernel support.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>


27983 08-Aug-1997 julian

Make a function static to quieten gcc


27982 08-Aug-1997 julian

Clean up the console muting functionality.
this has been in production now for a long time with no known effects.


27950 07-Aug-1997 dyson

Fix the DDB breakpoint code when using the 4MB page support.


27947 07-Aug-1997 dyson

More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.


27940 06-Aug-1997 peter

printf does not understand %hd in the kernel


27923 05-Aug-1997 dyson

Another attempt at cleaning up the new memory allocator.


27922 05-Aug-1997 dyson

Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.


27906 05-Aug-1997 msmith

memcmp -> bmcp
Submitted by: smp, bde


27904 05-Aug-1997 dyson

Modify pmap to use our new memory allocator.


27903 05-Aug-1997 dyson

Slightly reorder some operations so that the main processor gets global
mappings early on.


27902 05-Aug-1997 dyson

Remove the PMAP_PVLIST conditionals in pmap.*, and another unneeded define.


27899 05-Aug-1997 dyson

Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator. This new allocator will also be
used for machine dependent pmap PV entries.


27896 04-Aug-1997 fsmp

pushed down "volatility" of simplelock to actual int inside the struct.

Submitted by: bde@zeta.org.a


27893 04-Aug-1997 fsmp

Eliminate frequent silo overflows by restoring the TEST_LOPRIO code.
This code was eliminated when the PEND_INTS algorithm was added. But it was
discovered that PEND_INTS only worsen latency for FAST_INTR() routines,
which can't be marked pending.

Noticed & debugged by: dave adkins <adkin003@gold.tc.umn.edu>


27872 04-Aug-1997 msmith

Correctly checksum the DMI signature structure. Format the BSD revision
number therein.

Report from: dave adkins <adkin003@gold.tc.umn.edu>


27823 01-Aug-1997 msmith

Support functions for working with x86 PC-architecture BIOS.
Initially functionality is confined to 32-bit BIOS functions, however
it is envisioned that BIOS support may be enlisted for other
activities in the future.


27781 31-Jul-1997 fsmp

Moved the free case to top of MPgetlock and MPtrylock
Added some lock hit profiling.


27780 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
Created mplock functions that save/restore NO registers.
Minor cleanup.


27779 31-Jul-1997 fsmp

Converted the TEST_LOPRIO code to default.
removed PEND_INTS 1st try
direct call to MPtrylock


27728 28-Jul-1997 fsmp

Modified the PEND_INTS algorithm to fix the ISA INT loss problem.

Noticed by: dave adkins <adkin003@gold.tc.umn.edu> and others.


27697 26-Jul-1997 fsmp

mpapic.c & mp_machdep:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.


27696 26-Jul-1997 fsmp

clock.c:
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.

apic_vector.s:
- new algorithm where a CPU uses try_mplock instead of get_mplock:
if successful continue as before.
if fail set ipending bit, mask INT (to avoid recursion), cleanup & iret.

This allows the CPU to return to successful work, while the ISR will be run
by the CPU holding the lock as part of the doreti dance.


27666 24-Jul-1997 fsmp

simplelock functions removed from apic_ipl.s.

ASM optimization by: Bruce Evans <bde@zeta.org.au>


27654 24-Jul-1997 kato

Treat 6x86MX CPU as 686-class CPU instead of 586-class CPU.


27634 23-Jul-1997 fsmp

New simple_lock code in asm:
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()

Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock

Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.

It seems to work!!!


27619 23-Jul-1997 fsmp

Coded simple_lock and friends in asm.


27616 22-Jul-1997 fsmp

Last commit didn't take, operator error???


27615 22-Jul-1997 fsmp

Hid the existance of imen via a dump routine.


27614 22-Jul-1997 fsmp

Cleaned up an ugly printout.


27566 21-Jul-1997 dyson

Fix a crash that has manifest itself while running X after the 4MB
page upgrades.


27563 20-Jul-1997 fsmp

Developed a new strategy for handling the 8254/8259/APIC issue.


27561 20-Jul-1997 fsmp

Minor cleanup.
Pass string arg to apic_dump.
Moved bootverbose printing of SMP enabled INTs from clock.c to autoconf.c


27560 20-Jul-1997 fsmp

Minor cleanup.


27555 20-Jul-1997 bde

Removed unused #includes.


27541 20-Jul-1997 bde

Oops, I removed one too many #include. <machine/frame.h> was previously
included twice as a side effect of including unrelated headers, but I
removed the #include of one of the headers and will soon fix the nested
#include in the other.


27536 20-Jul-1997 bde

Fixed bitrot in fpu LKMs.


27535 20-Jul-1997 bde

Removed unused #includes.


27523 19-Jul-1997 fsmp

Added code to support #define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It depricates the usage of "options SMP_TIMER_NC" in the config file.


27522 19-Jul-1997 fsmp

Added #code to support define APIC_PIN0_TIMER.
This code ALWAYS runs the 8254 timer thru the 8259 ICU.
It depricates the usage of "options SMP_TIMER_NC" in the config file.


27520 19-Jul-1997 fsmp

SMP or APIC_IO:
- Increased NIDT to 256.
- Moved IPI vectors up above the linux compat vector.
- Removed runtime setup of RTC vector.


27517 18-Jul-1997 fsmp

Split TEST_CPUSTOP code into CPUSTOP_ON_DDBBREAK and mainline code.


27490 18-Jul-1997 fsmp

Made the printing of the APIC INTs depend on bootverbose.


27489 18-Jul-1997 fsmp

printf cleanup.


27484 17-Jul-1997 dyson

Hopefully fix a few problems that could cause hangs in SMP mode.
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.

We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...


27464 17-Jul-1997 dyson

Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel. Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.


27424 15-Jul-1997 kato

Oops, added popfl after trynexgen label.

PR: 4091
Submitted by: Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>


27411 15-Jul-1997 fsmp

Removed several "magic numbers".


27408 15-Jul-1997 fsmp

Cleanup.


27407 15-Jul-1997 fsmp

Tighten up asm code for TEST_PRIO and other misc. things.
Use some new defines in place of "magic numbers".


27406 15-Jul-1997 fsmp

Tighten up asm code for EOI access.


27353 13-Jul-1997 fsmp

new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default

new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default


27352 13-Jul-1997 fsmp

Cleanup old stop_cpus/restart_cpus() cruft.
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
new code to control other CPUs: stop_cpus()/restart_cpus()/_Xstopcpu


27290 09-Jul-1997 fsmp

Screwed up again, gotta remember to turn off those debugs!


27289 08-Jul-1997 fsmp

General cleanup of APIC code.
stop_cpus()/restart_cpus() STILL not working!


27288 08-Jul-1997 fsmp

Minor cleanup of APIC code.


27286 08-Jul-1997 fsmp

added #define IPI_LEVEL


27264 07-Jul-1997 fsmp

ifdef a TEST_CPUSTOP debug properly.

Submitted by: Kenneth Merry <ken@plutotech.com>


27256 07-Jul-1997 fsmp

Opps, forgot to turn off the debugs...


27255 07-Jul-1997 fsmp

stop_cpus(), currently BROKEN! (turned off in smptests.h by default).
restart_cpus(), currently BROKEN! (turned off in smptests.h by default).


27254 06-Jul-1997 fsmp

Preliminary support for Xspuriousint.
Preliminary support for stopcpus()/restartcpus().


27253 06-Jul-1997 fsmp

Added some (temporary) macros for debugging.
New strategy for handling the TPR (Task Priority Register).
Test code to sync CPUs.


27251 06-Jul-1997 fsmp

First cut at code for handling "spurious INTerrupts".
First cut at code for handling CPU stop/restart.

Notes:
not working properly yet.


27250 06-Jul-1997 fsmp

#ifdef out debug for now...


27133 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.

Don't frob intr_nesting_level in idle() or cpu_switch(). Interrupts
are mostly disabled then, so the frobbing had little effect.


27132 01-Jul-1997 bde

Added ifdefs so that this compiles when neither I586_CPU nor I586_CPU
is defined, or SMP is defined. It is silly to configure PERFMON when
it can't work (it will be disabled at runtime), but I like to leave
the PERFMON configuration alone when I temporarily disable support for
modern CPUs to run regression tests.

Removed an unused #include.


27131 01-Jul-1997 bde

Un-inline a call to spl0(). It is not time critical, and was only inline
because there was no non-inline spl0() to call.


27130 01-Jul-1997 bde

Some staticized variables were still declared to be extern.


27129 01-Jul-1997 bde

Removed extra definition of constty. It is defined in subr_prf.c.


27007 27-Jun-1997 fsmp

apic_vector.s:
- added Xcpustop IPI code to support stop_cpus()/restart_cpus().
it is off by default, enable via smptests.h:TEST_CPUSTOP

intr_machdep.h:
- moved +ICULEN to lower level.
- added entry for Xcpustop.


27005 27-Jun-1997 fsmp

Added POST code output to various points of the startup code.

General cleanup.

New functions to stop/start CPUs via IPIs:

- int stop_cpus( u_int map );
- int restart_cpus( u_int map );

Turned off by default, enabled via smptests.h:TEST_CPUSTOP.
Current version has a BUG, perhaps a deadlock?


27004 27-Jun-1997 fsmp

Experimental calls to stop_cpus()/restart_cpus() within breakpoint calls.
Turned off by default in smptests.h.


27003 27-Jun-1997 fsmp

Added other_cpus to CPU private page.

This variable is a bitmap showing all CPUs present EXCEPT the CPU
owning the variable. In other words, it is equal to the global bitmap
'all_cpus' minus its own bit.


27001 27-Jun-1997 fsmp

Program lint1 to handle NMIs.

Till now NMIs would be ignored. Now an NMI is caught by the BSP.
APs still ignore NMI, am working on code to allow a CPU to stop other CPUs
via an IPI.


26994 27-Jun-1997 fsmp

Removed '#include <machine/smptests.h>' line, no longer needed.


26985 27-Jun-1997 kato

Added CPU_DIRECT_MAPPED_CACHE option which sets L1 cache in direct
mapped mode on Cyrix 486DLC box.


26954 26-Jun-1997 tegge

Back out a bad commit.


26950 25-Jun-1997 fsmp

Merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()
- get_pci_apic_irq() -> pci_apic_pin()


26949 25-Jun-1997 fsmp

Modified to use merged/renamed functions:

- get_isa_apic_mask() -> isa_apic_mask()
- get_isa_apic_irq() && get_eisa_apic_irq() -> isa_apic_pin()


26944 25-Jun-1997 tegge

Allow kernel configuration file to override PMAP_SHPGPERPROC. The default
value (200) is too low in some environments, causing a fatal
"panic: get_pv_entry: cannot get a pv_entry_t". The same panic might
still occur due to temporary shortage of free physical memory
(cf. PR i386/2431).


26943 25-Jun-1997 tegge

Block some interrupts during the call to pmap_zero_page in
vm_page_zero_idle. This fixes some occurences of the problem
reported in PR kern/3216: "panic: pmap_zero_page: CMAP busy"


26896 24-Jun-1997 tegge

Ensure that the boot CPU honours write protection in kernel mode.
This fixes one of the problems noted in PR kern/3688.


26888 24-Jun-1997 kato

Recognize AMD K5 PR166 and PR200 CPUs.


26886 24-Jun-1997 fsmp

Fix calculation of initial mplock value.
We now use LOGICAL, not PHYSICAL, IDs to calculate the mplock.


26882 24-Jun-1997 fsmp

Fixed breakage for "default" configurations in mptable_pass1().


26812 22-Jun-1997 peter

Preliminary support for per-cpu data pages.

This eliminates a lot of #ifdef SMP type code. Things like _curproc reside
in a data page that is unique on each cpu, eliminating the expensive macros
like: #define curproc (SMPcurproc[cpunumber()])

There are some unresolved bootstrap and address space sharing issues at
present, but Steve is waiting on this for other work. There is still some
strictly temporary code present that isn't exactly pretty.

This is part of a larger change that has run into some bumps, this part is
standalone so it should be safe. The temporary code goes away when the
full idle cpu support is finished.

Reviewed by: fsmp, dyson


26811 22-Jun-1997 peter

Kill some stale leftovers from the earlier attempts at SMP per-cpu pages


26809 22-Jun-1997 msmith

From the submitted patch :

The kernel with USERCONFIG_BOOT and VISUAL_USERCONFIG option presents
the user the kernel configuration menu upon boot.

The user can navigate the menu with cursor keys. I think it would be
nice if the user can navigate and select a menu item with regular keys
as well, so that the user who is using a serial console which is not
so capable of esc sequences still can choose a menu item.

With the following patch we can select an item by typing an item
number, 1, 2, or 3, or mnemonic `s' to skip UserConfig, 'v' to enter
the visual mode, and `c' to start the CLI mode. `p', `u', `n', and `d'
will move cursor up and down.

Submitted by: yokota


26659 15-Jun-1997 wollman

Fix another power down braino.


26657 15-Jun-1997 wollman

When APM is configured, turn off the power when halting for good.


26494 07-Jun-1997 bde

Preserve %fs and %gs across context switches. This has a relatively low
cost since it is only done in cpu_switch(), not for every exception.
The extra state is kept in the pcb, and handled much like the npx state,
with similar deficiencies (the state is not preserved across signal
handlers, and error handling loses state).


26447 04-Jun-1997 pst

Document a non-standard gdbremote protocol extension (kludge, really)
that I snuck in to our GDB last year. This allows you to debug headless
machines by sharing the console port between the debugger and the system
console. It's not 100% reliabile, but it works well. It's optional
and disabled by default.
Submitted by: Juniper Networks


26388 02-Jun-1997 peter

Fill in some gaps in the cpuid features list..
bit 10 is the old bit for MTRR (presumably this changed, an older P5 I
have has got it, the newer cpus have the new MTRR bit set)
bit 11 is SEP (fast syscalls), bit 23 is MMX
Fill in the other reserved ones with a stub so that we can see them if
they turn up.

Obtained from: Intel AP-485 rev.06


26379 02-Jun-1997 dfr

Change isa_device.h to intr_machdep.h


26373 02-Jun-1997 dfr

Move interrupt handling code from isa.c to a new file. This should make
isa.c (slightly) more portable and will make my life developing the really
portable version much easier.

Reviewed by: peter, fsmp


26309 31-May-1997 peter

Include file updates.. <machine/spl.h> -> <machine/ipl.h>, add
<machine/ipl.h> to those files that were depending on getting SWI_*
implicitly via <machine/cpufunc.h>


26302 31-May-1997 peter

The SWI_NET_MASK and SWI_TTY_MASK handlers are now back adjacent to the
top of the hardware interrupt handlers. Apparently this is slightly
faster with the bit scanning instruction that looks these up - this set of
changes reverts the original change.

Reviewed by: bde


26298 31-May-1997 kato

- Use `6x86MX' instead of `M2'. Cyrix officially use `6x86MX' for the
CPU code-named `M2'.

- Use the result of cpuid instruction instead of DIR to identify
6x86MX cpu. DIR0 and DIR1 are not documented in the data sheet, and
cpuid instruction is enabled at reset time.

- Add a function, init_6x86MX() to initialize 6x86MX cpu. It supports
CPU_SUSP_HLT and CPU_IORT options. It always sets NC1 (640K - 1M is
not cached.), and enables L1 cache in write-back mode.

- Fix typo in the comment in identblue().


26295 31-May-1997 fsmp

Modified code in direction of supporting MULTIPLE_IOAPICS.

- moved read_io_apic_maskc24() from isa/mpapic.h, disabled it, currently unused.
- cleaned up various panic() calls.


26270 29-May-1997 fsmp

Code such as apic_base[APIC_ID] converted to lapic__id

Changes to pmap.c for lapic_t lapic && ioapic_t ioapic pointers,
currently equal to apic_base && io_apic_base, will stand alone with the
private page mapping.


26267 29-May-1997 peter

remove no longer needed opt_smp.h includes


26266 29-May-1997 peter

minor style police (recent divergence from KNF code)


26265 29-May-1997 peter

remove opt_smp.h and fix the reason it was needed.


26264 29-May-1997 peter

No longer need opt_smp.h here


26203 27-May-1997 fsmp

Nuke the printing of the unredirect message unless bootverbose.


26171 26-May-1997 fsmp

Fix breakage from my last commit where mp_start() was missing from UP builds.


26169 26-May-1997 fsmp

Changed inclusion of isa/icu.s to isa/ipl.s.
This is part of the breakup of UP/SMP specific INTerrupt code.


26168 26-May-1997 fsmp

Split vector.s into UP and SMP specific files:
- vector.s <- stub called by i386/exception.s
- icu_vector.s <- UP
- apic_vector.s <- SMP

Split icu.s into UP and SMP specific files:
- ipl.s <- stub called by i386/exception.s (formerly icu.s)
- icu_ipl.s <- UP
- apic_ipl.s <- SMP

This was done in preparation for massive changes to the SMP INTerrupt
mechanisms. More fine tuning, such as merging ipl.s into exception.s,
may be appropriate.


26155 26-May-1997 fsmp

Added a test called 'LATE_START'.

This is now the default, it delays most of the MP startup to the function
machdep.c:cpu_startup(). It should be possible to move the 2 functions
found there (mp_start() & mp_announce()) even further down the path once
we know exactly where that should be...

Help from: Peter Wemm <peter@spinner.dialix.com.au>


26129 25-May-1997 fsmp

Made the array vec[] a global.
This allows the APIC code to reorder the vectors at runtime.


26108 25-May-1997 fsmp

Broke up parse_mp_table() into 2 passes:
- The 1st (preparse_mp_table()) counts the number of cpus, busses, etc. and
records the LOCAL and IO APIC addresses.
- The 2nd pass (parse_mp_table()) does the actual parsing of info and recording
into the incore MP table.

This will allow us to defer the 2nd pass untill malloc() & private pages
are available (but thats for another day!).


26102 24-May-1997 fsmp

Delay mp_start() till after the msgbuf is mapped. We really want to delay
it till even later but tss setup prevents that right now...


26101 24-May-1997 fsmp

Now that panic() is properly printing messages for early SMP panics all
the 'printf("..."); panic("\n")' sections are returned to 'panic("...")'.


26037 23-May-1997 charnier

typo (Cyirx -> Cyrix).


26019 22-May-1997 fsmp

Convert all:
panic( "xxxxx\n" );

to:
printf( "xxxxx\n" );
panic( "\n" );

For some as yet undetermined reason the argument to panic() is often NOT
printed, and the system sometimes hangs before reaching the panic printout.
So we hopefully at least print some useful info before the hang, as oppossed to
leaving the user clueless as to what has happened.


25985 21-May-1997 jdp

This commit affects ELF kernels only.

Remove "setdefs.h" and arrange to generate it automatically at
ELF kernel build time.

"gensetdefs.c" is a utility which scans a set of ELF object files
and outputs a line ``DEFINE_SET(name, length);'' for each linker
set that it finds. When generating an ELF kernel, this is run just
before the final link to generate "setdefs.h".

Remove the init_sets() function from "setdef0.c", and its call from
"machdep.c". Since "gensetdefs.c" calculates the length of each
set, it is no longer necessary in an ELF kernel to count the set
elements at kernel initialization time. Also remove "set_of_sets"
which was used for this purpose.

Link "setdef0" and "setdef1" into the kernel only if building for
ELF. Since init_sets() is no longer used, there is no need to link
them into an a.out kernel.


25925 19-May-1997 kato

Recognize AMD 486 CPUs.


25837 15-May-1997 tegge

Ignore the supplied nfs_diskless structure from the bootstrap loader
if we want to use NFS v3 to mount root and swap.


25723 11-May-1997 tegge

Bring in some kernel bootp support. This removes the need for netboot
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.


25711 11-May-1997 bde

Fixed initialization of ldt[]. Unused entries were garbage. A comment
was stale.

Fixed initialization of gdt[] for the BDE_DEBUGGER case. APM entries
clobbered debugger entries if the debugger was loaded (APM is incompatible
with BDE_DEBUGGER) and unused entries were garbage if the debugger wasn't
loaded.


25648 10-May-1997 bde

Cleaned up #includes. Lite2 cleaned up <sys/mount.h> so no kludges
are required for NFS now.

Ifdefed SMP #defines.


25607 09-May-1997 peter

Fatal trap 13: Brain fault, commit botched.
Current task: transcribing patch
Time: for a coffee
Faulting address: bde


25597 09-May-1997 peter

Bandaid old register offset code after md_regs changed type..

Submitted by: Eivind Eklund <perhaps@yes.no>


25559 07-May-1997 fsmp

fix bug in get_isa_apic_mask() where EISA bus was ignored.

Submitted by: Peter Wemm <peter@spinner.DIALix.COM>


25558 07-May-1997 peter

Don't allow access to illegal addresses in /dev/kmem to panic kernel
(eg: above 0xffc00000). Programs using /dev/kmem are implicitly racing
the kernel, and can get right up high in memory. I've been running
these for some time now, but with printfs. It's saved two panics at
least that I can remember.


25557 07-May-1997 peter

clean up forked child creation. This is simplified also by having
md_regs being struct trapframe *. Do a npxsave() if needed and copy the
pcb rather than use the increasingly defunct savectx(). Copy %edi and
%ebp explicitly.

Submitted by: bde

XXX npxproc could be declared in npx.h so the externs with smp fruit
are not needed.


25556 07-May-1997 peter

md_regs is struct trapframe * now, rather than int []
Remove TF_REGP() macro and use. The original reason (address space
problems due to having UPAGES in mapped into user space) is gone. It
looks cleaner without it.


25555 07-May-1997 peter

md_regs is now a struct trapframe *


25554 07-May-1997 peter

forgotten comment


25552 07-May-1997 peter

simplify IOPL gain/remove privs code. It's easier with md_regs
being a trapframe.


25499 05-May-1997 fsmp

Code to handle SMP/APIC_IO mapping of ISA INTs to APIC pins above IRQ15.

- doesn't break my system.
- NOT yet verified on the affected motherboard.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25495 05-May-1997 kato

Use `MediaGX' instead of `Gx86'.


25494 05-May-1997 kato

Use `M2' instead of `6x86 with MMX'. Cyrix seems to use `M2' officially.


25485 05-May-1997 peter

correct the order of the variables
use #ifdef where possible instead of #if defined

Submitted by: the KNF police, ie: bde :-)


25472 05-May-1997 dyson

Make sure that *fork() always returns with %edx == 1 in the
child. This was sometimes not happening correctly during my
threads code work.


25460 04-May-1997 joerg

This mega-commit brings the following:

. It makes cd9660 root f/s working again.
. It makes CD9660 a new-style option.
. It adds support to mount an ISO9660 multi-session CD-ROM as the root
filesystem (the last session actually, but that's what is expected
behaviour).

Sigh. The CDIOREADTOCENTRYS did a copyout() of its own, and thus has
been unusable for me for this work. Too bad it didn't simply stuff
the max 100 entries into the struct ioc_read_toc_entry, but relied on
a user supplied data buffer instead. :-( I now had to reinvent the
wheel, and created a CDIOREADTOCENTRY ioctl command that can be used
in a kernel context.

While doing this, i noticed the following bogosities in existing CD-ROM
drivers:

wcd: This driver is likely to be totally bogus when someone tries
two succeeding CDIOREADTOCENTRYS (or now CDIOREADTOCENTRY)
commands with requesting MSF format, since it apparently
operates on an internal table.

scd: This driver apparently returns just a single TOC entry only for
the CDIOREADTOCENTRYS command.

I have only been able to test the CDIOREADTOCENTRY command with the
cd(4) driver. I hereby request the respective maintainers of the
other CD-ROM drivers to verify my code for their driver. When it
comes to merging this CD-ROM multisession stuff into RELENG_2_2 i will
only consider drivers where i've got a confirmation that it actually
works.


25457 04-May-1997 peter

Don't remove i586_ctr_freq from scope, leave it defined as zero. This
simplifies some assumptions and stops some code compile problems.

This should fix the compile hiccup in PR#3491, but smp kernel profiling
isn't likely to be fixed by this.


25423 03-May-1997 fsmp

disabled checks for smp_active == 0.
this was wasting precious cycles for no apparent (to me) reason.
it is currently bracketed by BOTHER_TO_CHECK, define to restore old behaviour.


25420 03-May-1997 fsmp

improved io_apic_setup().
deals with motherboards that map ISA IRQs to APIC IRQS above 15.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25419 03-May-1997 fsmp

new function to turn an APIC pin# into an INT mask.
added missing APIC_IO define.

Submitted by: "John S. Dyson" <toor@dyson.iquest.net>


25362 01-May-1997 fsmp

cleaned up FAST_IPI code.
- one-liners all become inline.
- multi-liners become functions.
- FAST_IPI defines go away.

re-worked APICIPI_BANDAID code.
- now refered to as DETECT_DEADLOCK.
- on by default.


25361 01-May-1997 fsmp

fixed spelling error.

Submitted by: Bruce Albrecht <bruce@zuhause.mn.org>


25292 29-Apr-1997 fsmp

Enabled 'FIX_MP_TABLE_WORKS' code.
This code re-numbers PCI busses in the MP table to match PCI semantics
when the MP BIOS fails to do it properly.

Reviewed by: Peter Wemm <peter@spinner.DIALix.COM>


25243 28-Apr-1997 fsmp

cleaned out an old FIXME.


25216 28-Apr-1997 fsmp

removed all the TEST_UPPERPRIO crud.


25215 28-Apr-1997 fsmp

remove all the SMP_INVLTLB defines, making the code default for APIC_IO.

Reviewed by: informal discussion with Peter Wemm <peter@spinner.DIALix.COM>


25204 27-Apr-1997 fsmp

informal discussion between Bruce Evans <bde@zeta.org.au>,
Peter Wemm <peter@spinner.DIALix.COM>, Steve Passe <smp@csn.net>

removed all the IPI_INTS code.
made the XFAST_IPI32 code default, renaming Xfastipi32 to Xinvltlb.


25194 27-Apr-1997 peter

Whoops.. We forgot to turn off the 4MB Virtual==Physical mapping at address
zero from bootstrap in the non-SMP case.

Noticed by: bde


25177 26-Apr-1997 peter

fix & instead of && in #if statement
reorder #includes to alphabetical order

Noted by: bde


25175 26-Apr-1997 peter

Remove the curproc printing on trap/interrupt/etc. It's outlived it's
usefulness, and there were problems with it anyway.

Found by: bde


25173 26-Apr-1997 peter

Back out bogus code that slipped past my read of the pre-merge diff
(Problems noted by Bruce)


25172 26-Apr-1997 peter

Fix some SMP merge bugs (from Bruce) -
#include out of order
pccard_configure() called twice
munged tab (existing problem made worse)


25164 26-Apr-1997 peter

Man the liferafts! Here comes the long awaited SMP -> -current merge!

There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people. A special thanks to Steve Passe for implementing
the APIC code!


25159 26-Apr-1997 kato

Add new cpu type, CPU_CY486DX, which shows Cyrix 486S/DX series CPUs,
and initialization routine for those CPUs.

Tested by: Bob Bishop <rb@gid.co.uk>


25083 22-Apr-1997 jdp

Make the necessary changes so that an ELF kernel can be built. I
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.

The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.

The ELF bootloader works, but it is not ready to commit quite yet.


25038 20-Apr-1997 phk

Fix up the "hlt vector" change I made.
Reviewed by: bde, bde, bde


25015 19-Apr-1997 kato

Don't disable CPU cache in init_486dlc. If BIOS supports Cyrix 486,
BIOS enables CPU cache and other registers. If BIOS does not supports
it, CPU cache is disabled at reset time.

This commit closes PR/3292.

PR: 3292


24982 16-Apr-1997 ache

Comment out rawcb, it is not used / not present anymore


24980 16-Apr-1997 kato

Use reset port before clearing page table in cpu_reset if PC98 is
defined. Clearing page table could hang some new PC-98.


24933 14-Apr-1997 phk

Forget all about APM. Instead of "hlt" call through a vector which
APM can then fiddle with. Default for the vector is to "htl; ret"


24929 14-Apr-1997 bde

Use the same IOPL check as in syscons.
Reviewed by: pst, joerg


24925 14-Apr-1997 bde

Fixed printing of registers in dbflalt_handler(). The registers
were always in a tss; that tss just changed from the one in the
pcb to common_tss (who knows where it was when there was no curpcb?).
Not using the pcb also fixed the problem that there is no pcb in
idle(), so we now always get useful register values.


24900 13-Apr-1997 bde

Don't forget to set `runtime' in fork_trampoline(). The time slice before
switching to a child for the first time was being counted twice. I think
this only affected unimportant statistics.

Simplified arg handling in fork_trampoline(). splz() doesn't actually
smash the registers of interest.


24852 13-Apr-1997 dyson

Decrease the amount of memory allocated for bouncing. This will
allow large systems to boot successfully with bounce buffers compiled
in. We are now limiting bounce space to 512K. The 8MB allocated for
a 512MB system is very bogus -- and that is now fixed.


24851 13-Apr-1997 dyson

The pmap code was too generous in the allocation of kva space for
the pv entries. This problem has become obvious due to the increase
in the size of the pv entries. We need to create a more intelligent
policy for pv entry management eventually.
Submitted by: David Greenman <dg@freebsd.org>


24848 13-Apr-1997 dyson

Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
thread from the rest of the group.

Fix the case where a thread does an exec. It is almost nonsense for a thread
to modify the other threads address space by an exec, so we
now automatically divorce the address space before modifying it.


24702 07-Apr-1997 peter

Lower the spl() of the new process from splhigh() right away, since
nothing else will lower it until either much later, or never(?) for
kernel processes.

This basically re-fixes what Bruce fixed in rev 1.29 of kern_fork.c,
which was broken again now the child does not execute back up the fork()
calling tree.


24693 07-Apr-1997 peter

Clean up some dead wood. Kill the page table page for mapping the
proc0/idlePTD/bootstrap stack into place in user space. We save 4K.
Remove p0upa, it is now unneeded.


24691 07-Apr-1997 peter

The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space. Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss. This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.


24690 07-Apr-1997 peter

No longer use an i386tss as the basis of our pcb - it wasn't particularly
convenient and makes life difficult for my next commit. We still need
an i386tss to point to for the tss slot in the gdt, so we use a common
tss shared between all processes.

Note that this is going to break debugging until this series of commits
is finished. core dumps will change again too. :-( we really need
a more modern core dump format that doesn't depend on the pcb/upages.

This change makes VM86 mode harder, but the following commits will remove
a lot of constraints for the VM86 system, including the possibility of
extending the pcb for an IO port map etc.

Obtained from: bde


24676 06-Apr-1997 mckay

Prevent wedging of the stat clock because of missed interrupts.
This should cure the "alternate system clock has died!" problem.

Discussed with: bde, joerg


24666 06-Apr-1997 dyson

Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)


24494 01-Apr-1997 bde

Removed a wrong comment of mine.

Removed unused #includes.


24493 01-Apr-1997 bde

Fixed gratuitous ANSIisms.

Removed unused declarations.


24437 31-Mar-1997 dg

Changed the way that the exec image header is read to be filesystem-
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.


24418 30-Mar-1997 joerg

Implement the `detach' command for remote GDB. It gets you back at DDB.


24361 29-Mar-1997 bde

Don't keep cpu interrupts enabled during the lookup in vm_page_zero_idle().
Lookup isn't done every time the system goes idle now, but it can still
take > 1800 instructions in the worst case, so if cpu interrupts are kept
disabled then it might lose 20 characters of sio input at 115200 bps.

Fixed style in vm_page_zero_idle().


24345 28-Mar-1997 bde

Added a setjmp() and a longjmp() so that an unexpected trap inside
ddb isn't necessarily fatal. You can now do silly things like
`call vprint' and `show map' without losing control.


24342 28-Mar-1997 joerg

Something long overdue: compile inb() and outb() into the kernel as
functions if DDB is available. The remaining occurences are usually
only inlined and thus not available in DDB.

I'm sure Bruce will have 23 additions to these 30 lines of code, but
at least it's a starting point. ;-)


24283 25-Mar-1997 mpp

Change sigreturn() to return EFAULT if it is passed an
address outside of the process's address space.
Now it matches its man page :-). Closes PR# 2682.

Discussed with: bde
Submitted by: Jonathan Lemon <jlemon@americantv.com>


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24200 24-Mar-1997 kato

Fix typo.
Submitted by: Bruce Evans <bde@zeta.org.au>


24113 22-Mar-1997 kato

Oops, I forgot to `cvs add'. This file is a part of new CPU
identification and initialization routines.


24112 22-Mar-1997 kato

Improved CPU identification and initialization routines. This
supports All Cyrix CPUs, IBM Blue Lightning CPU and NexGen (now AMD)
Nx586 CPU, and initialize special registers of Cyrix CPU and msr of
IBM Blue Lightning CPU.

If revision of Cyrix 6x86 CPU < 2.7, CPU cache is enabled in
write-through mode. This can be disabled by kernel configuration
options.

Reviewed by: Bruce Evans <bde@freebsd.org> and
Jordan K. Hubbard <jkh@freebsd.org>


24099 22-Mar-1997 dyson

Decrease the latency/overhead in the prezero code when there is
an adequate number of prezeroed pages.


23855 13-Mar-1997 joerg

Various stylistic improvements regarding num_eisa_slots & co.:

. properly declare the variable in in a .h file, as opposed to
using a private extern declaration in userconfig.c;
. move the definition of EISA_SLOTS and therefore the inclusion of
opt_eisa.h into eisaconf.c.


23642 11-Mar-1997 msmith

Avoid double-s when #conflicts drops from 10 to 9.


23409 05-Mar-1997 bde

Made FPU stuff conditional on npx as well as I586_CPU.


23393 05-Mar-1997 bde

Only print clock calibration messages if the system was booted with -v.

Submitted by: partly by gpalmer


23386 05-Mar-1997 gpalmer

Back out the patch to break up the clock probe lines. Instead, follow
Bruce's suggestion of deleting "relative to mc146818A clock ",
thus shortening the line ...


23375 04-Mar-1997 gpalmer

Split the rather long and line-wrapping clock probe messages on boot.
(2.2?)

Submitted by: Mathew Dood <winter@jurai.net>


23230 01-Mar-1997 ache

Add missing #include <machine/segments.h> for ISPL and SEL_UPL macros


23206 28-Feb-1997 bde

Print function args in the current radix instead of always in hex.

Print the stack pointer together with the frame pointer in the trap,
syscall and interrupt messages. The frame pointer is not very useful
for locating syscall args since syscall functions don't have a frame
pointer.

Print all the numbers in the trap, syscall and interrupt messages in
the default radix. The syscall number was confusing because it was
printed in decimal.

Use %#n format more and 0x%x less. 0x%x of course doesn't work with
a variable radix. ddb is now fairly consistent about using %+#n to
print all numbers. It omits the '+' for signed numbers the '#' in a
few cases (e.g., for function args) to save space.


23070 24-Feb-1997 alex

Typo police.


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22603 12-Feb-1997 mpp

This no longer depends on NFS being defined so that it
can check for an NFS root. With Lite2, the file system
type can be checked by checking if the rootfs name == "nfs".


22564 11-Feb-1997 bde

Restored changes from rev.1.58-1.60 which were blown away by the
previous commit.


22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


22130 30-Jan-1997 dg

Removed PG_N from here, too. Some machines don't like it and it's unnecessary.


22129 30-Jan-1997 dg

Removed unnecessary PG_N flag from device memory mappings. This is handled
by the CPU/chipset already and was apparantly triggering a hardware bug that
causes strange parity errors.


22106 29-Jan-1997 bde

Estimate an initial overhead of 0 usec instead of 20 usec in DELAY().
I have code to calibrate the overhead fairly accurately, but there
is little point in using it since it is most accurate on machines
where an estimate of 0 works well. On slow machines, the accuracy
of DELAY() has a large variance since it is limited by the resolution
of getit() even if the initial delay is calibrated perfectly.

Use fixed point and long longs to speed up scaling in DELAY().
The old method slowed down a lot when the frequency became variable.
Assume the default frequency for short delays so that the fixed
point calculation can be exact.

Fast scaling is only important for small delays. Scaling is done
after looking at the counter and outside the loop, so it doesn't
decrease accuracy or resolution provided it completes before the
delay is up. The comment in the code is still confused about this.


21979 24-Jan-1997 bde

Fixed some formatting bugs (mostly regressions in rev.1.48). Replaced
some magic numbers by pmap constants. Cosmetic.


21975 24-Jan-1997 bde

Initialize CR0_MP in setregs() in case npx0 is disabled or not configured.
Disabling npx0 works right now.

Don't reference `npxdriver' if npx0 is not configured. Not configuring
npx0 doesn't quite work yet.

Don't clear potential non-npx pcb flags in setregs().


21974 24-Jan-1997 obrien

KNF style police.

Reported by: Bruce
Thanks to: Bruce for also providing a diff.


21953 23-Jan-1997 dyson

Remove some dead code from trapwrite.
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21944 22-Jan-1997 dyson

Fix I386 copyout support. The new page-table management code will
not lazy-fault page table pages. Update the copyout support to take
that into account. This should fix some segfault problems on such
machines.

After a short test period, we'll move this into 2.2.

Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


21857 19-Jan-1997 obrien

Add bits to identify AMD K5 and K6 cpu's.
Tested only on my AMD K5 PR-133. Bit values for K6 taken from AMD document
on how to test such things.

2.2 Candidate.


21809 17-Jan-1997 jkh

Add Intel EtherExpress Pro/10 Ethernet adapter to list of supported
devices.


21783 16-Jan-1997 bde

Guard against the i8254 timer being uninitialzed if DELAY() is
called early for console i/o. The timer is usually in BIOS mode
if it isn't explicitly initialized. Then it counts twice as fast
and has a max count of 65535 instead of 11932. The larger count
tended to cause infinite loops for delays of > 20 us. Such delays
are rare. For syscons and kbdio, DELAY() is only called early
enough to matter for ddb input after booting with -d, and the delay
is too small to matter (and too small to be correct) except in the
PC98 case. For pcvt, DELAY() is not used for small delays (pcvt
uses its own broken routine instead of the standard broken one),
but some versions call DELAY() with a large arg when they unnecessarily
initialize the keyboard for doing console output. The problem is
more serious for pcvt because there is always some early console
output.

Guard against the i8254 timer being partially or incorrectly
initialized. This would have prevented the endless loop.

Should be in 2.2.


21737 15-Jan-1997 dg

Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by: wollman


21734 15-Jan-1997 bde

Fixed longstanding annoying warning about a type mismatch. pmap doesn't
really uses pt_entry_t internally, so don't use it here.

Fixed range checking for writing. The partial page (if any) following
etext wasn't writable.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21642 13-Jan-1997 msmith

- Add [?] key and corresponding helptext/pager.
(helptext from Philippe Regnauld)
- Make introfunc() work with serial terminals.
(submitted by Jean-Marc Zucconi)
- Eliminate excessive statusline redraw during screen updates.
(requested by Peter Wemm)
- Some trivial output formatting dinks.


21568 11-Jan-1997 dyson

When we changed pmap_protect to support adding the writeable
attribute to a page range, we forgot to set the PG_WRITEABLE
flag in the vm_page_t. This fixes that problem.


21540 11-Jan-1997 nate

Moved pccard_configure() to the end of the configure() list. This
avoids problems with the PCIC controller grabbing an interrupt that
another card needs.

Closes PR: kernel/2405

Reviewed by: bde


21529 11-Jan-1997 dyson

Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.) Upper level recoded to use
pmap_ts_referenced.


21384 07-Jan-1997 bde

Made minimal changes to make this work again (it depended on devconf).

swapgeneric.c hasn't had anything to do with swapping for some time.
It just makes the -a boot option actually work, and breaks the static
configuration of rootdev and dumpdev. It should be reorganized to
only support -a.


21278 04-Jan-1997 bde

Fixed context switching of FPU state after a fault in
i586_optimized_copyin/out.


21277 04-Jan-1997 bde

Fixed botched tables:
- the operands for bt, bts, arpl and `enter' were reversed.
- btr was reported as bts (with the correct operand order).
- cmpxchg was misplaced. It was misplaced differently in the
comments. It is misplaced differently again in the i486 manual.
I put it where the i586 manual and gas say it is.
- fucompp was misplaced.
- the rr table for(s) some versions of fstp, fcom and fcomp was non-null.
This caused some invalid opcodes to be reported as "" instead of as
"<bad instruction>".
- the word and long versions of the fi* instructions were reversed.
- aaa and daa were reversed.

Fixed bugs involving unusual operand sizes:
- 32-bit registers weren't always forced for bswap or for moves to and
from special registers.
- the operand sizes weren't reported for [l]call or [l]jmp.
- displacements weren't truncated mod 2^16 when the operand size was
16-bit.
- too-large displacements and offsets were fetched, and too-large
offsets were reported, when the operand size was 16-bit.
- sign extended immediate bytes were extended too far when the operand
size was 16-bit.

Fixed bugs involving usual operand sizes:
- 8-bit source registers weren't forced for mov[sz]b[wl].
- 16-bit source registers weren't forced for mov[sz]w[wl].
- immediate bytes were sometimes reported as sign extended even for
byte operations. Same for immediate words in word operations.
- the immediate byte was not reported as sign extended for `push'.

Finished Pentium support:
- cpuid, cmpxchg8b and rsm were missing.

Finished i287 support:
- fneni, fndisi and fsetpm were missing. These are harmless nops on
later FPUs.

Improvements:
- report invalid opcodes 0xd6 and 0xf1 using .byte. They are special
in not causing invalid operand exceptions when executed.
- report the immediate byte for unusual aam and aad instuctions.
Immediate bytes other than 0x0a always worked and are documented to
work on Pentiums.


21181 02-Jan-1997 se

Add code to copy the LDT, if required.

This code was sent to me by Bruce Evans, and seems to fix some
possible kernel panic in case of an execution error. It did not
cause any problems on my system, but I did never observe the
problem this patch is supposed to fix, anyway.

This patch is a NOP, unless the kernel is built with "options
USER_LDT", and doesn't affect the GENERIC kernel for this reason.

I want to have it in 2.2: it fixes a bug ...

Submitted by: bde


20998 29-Dec-1996 dyson

Superficial clean-up of useracc calls. (The useracc usage of
B_READ/B_WRITE is bogus anyway.) Might as well make the call prettier
anyway.


20997 29-Dec-1996 dyson

Allow pmap_protect to increase permissions. This mod can eliminate
the need for unnecessary vm_faults.
Submitted by: Alan Cox <alc@cs.rice.edu>


20858 23-Dec-1996 joerg

Cosmetic (wrt. the screen display) change: when re-enabling a device,
make sure it won't go into the PCI section. Disabling and re-enabling
ed0 made it to the wrong section.

Submitted by: msmith


20748 21-Dec-1996 phk

Give MFS_ROOT priority over NFS as root filesystem.

2.2 candidate.


20651 18-Dec-1996 bde

Only handle copyin/out/etc faults when not in an interrupt handler.
This makes unexpected faults (in an interrupt handler) more likely
to crash properly. It could be done even better (more robustly and
more efficiently) using lazy fault handling.


20650 18-Dec-1996 bde

Fixed formatting of KERN_DUMPDEV.

Should be in 2.2.


20641 18-Dec-1996 bde

Moved the printing of the BIOS geometries from cpu_startup() into
configure() where it always belonged. It was originally slightly
misplaced after configure(). Rev.138 left it completely misplaced
before the DEVFS, DRIVERS and CONFIGURE sysinits by not moving it
together with configure().

Restored the printing of bootinfo.bi_n_bios_used now that it can
be nonzero.


20619 18-Dec-1996 se

Add comment for "amd" driver: Tekram DC-390(T) and generic AMD 53c974 PCI SCSI.


20581 17-Dec-1996 msmith

Increase the allowable port address from 0x2000 to 0xffff to aid in
making PCI look like ISA.

Closes pr kern/860.

Prompted-by: peter's buglist mailout (excellent idea btw 8)


20578 17-Dec-1996 dg

Fix nbuf calculation /4 -> /8. 2.2 already has it this way.

Reviewed by: dyson


20513 15-Dec-1996 se

Update comment for "ncr" SCSI controllers (add Symbios and new chips)


20502 15-Dec-1996 kato

Fix typo.


20471 14-Dec-1996 jkh

Make the USERCONFIG_BOOT semantics closer to what was original
intended.


20468 14-Dec-1996 joerg

Fix the breakage i've introduced with the EISA slot feature, by
depending the addition on NEISA > 0.


20450 14-Dec-1996 joerg

Add a small hack to UserConfig that allows to override the number of
EISA slots to probe. This is mainly intended to allow installing the
system on an HP Netserver with an on-board AIC7xxx EISA SCSI
controller, that is sitting on EISA slot # 11.

Documentation updates explaining this hack will follow shortly.

Note that this can go away again as soon as the EISA device probing
is more intelligent about the address space clash with the PCI address
space.

2.2 candidate.

Not objected by: freebsd-core :)


20368 12-Dec-1996 swallace

Soften range-check for LDTs.

Reviewed by: bde


20348 12-Dec-1996 dg

Fix allocation for exech_map to be 16*PAGE_SIZE rather than 32*PAGE_SIZE
so that it is scaled the same as exec_map (16 concurrent exec'ers).


20313 11-Dec-1996 dyson

One minor mod to set the limit of nbufs to 2048 from 1536. More important
fix to exech_map, it used 32*ARG_MAX, and it should use 32*PAGE_SIZE.


20230 09-Dec-1996 msmith

Improve the explicitness of the "you cannot do anything with PCI
devices" message.

Closes conf/2130.


20146 05-Dec-1996 dyson

Clean-up of the new buffer kva allocation code. Also, there was an
error in the !BOUNCE_BUFFERS case.


20070 01-Dec-1996 bde

Removed all references to b_cylinder (aka b_cylin). It was evil and
hasn't been used for a year or two since disksort() started sorting
on b_pblkno.


20068 01-Dec-1996 dyson

Fix a problem with the new buffer_map management code. Additionally,
decrease the size of buffer_map to approx 2/3 of what it used to be
(buffer_map can be smaller now.) The original commit of these changes
increased the size of buffer_map to the point where the system would
not boot on large systems -- now large systems with large caches will
have even less problems than before.


20018 29-Nov-1996 bde

Fixed EFAULT handling in i586_copyin() and i586_copyout(). Use a
consistent stack frame in fastmove() so that only one new fault handler
is necessary.

Should be in 2.2. Harmless until the i586 versions are reenabled.


20017 29-Nov-1996 bde

Don't print bootinfo.bi_n_bios_used in cpu_startup() since it is always
zero because no drivers have had a chance to change it.


20016 29-Nov-1996 bde

Don't clobber the SIGCONT bit in the signal mask in sigreturn(). Use
the `sigcantmask' macro to get the correct set of unmaskable signals.

Found by: NIST-PCTS.


19993 27-Nov-1996 phk

Make intro command a NO-OP if no VISUAL_USERCONFIG


19957 25-Nov-1996 phk

Make a kernel with DDB but without sio possible again. This is only
a stopgap measure, a more complete solution is on somebodys whiteboard
(and we all know that THAT means :-).

Reviewed by: pst


19828 17-Nov-1996 dyson

Improve the caching of small files like directories, while not
substantially increasing buffer space. Specifically, we double
the number of buffers, but allocate only half the amount of memory
per buffer. Note that VDIR files aren't cached unless instantiated
in a buffer. This will significantly improve caching.


19772 15-Nov-1996 jkh

Change this back to movl for -current since it seems to work there.
Bruce says that movl is broken in -stable, which would certainly explain
why this didn't work there.


19750 14-Nov-1996 jkh

movl instruction should have been lea (this is why userconfig didn't
work in 2.1).

Spotted-by-the-keen-eyes-of: Don Lewis <Don.Lewis@tsc.tdk.com>


19744 14-Nov-1996 jkh

TRUE/FALSE are used even outside of VISUAL_USERCONFIG - move them accordingly.
Submitted-By: Don Lewis <Don.Lewis@tsc.tdk.com>


19690 12-Nov-1996 bde

Fixed buffer overflow for large values in editval(). The buffers were
one too small for (hex) 12345678 and 4 too small for -1234567890. Large
values can be created by config and userconfig although not (previously)
by visual userconfig.

Fixed a sign extension bug for backspacing on "negative" hex values in
editval().

Increased field width and range for `flags' so that all possible values
can be displayed and edited.


19678 12-Nov-1996 bde

Removed another #include of opt_temporary.h.

YA2.2C.


19674 12-Nov-1996 bde

Removed #include of "opt_temporary.h". All the temporary options went
away, so this header is no longer generated.

This change should be in 2.2. The old version shouldn;t have been in
2.2 (blush).


19668 12-Nov-1996 bde

Fixed spelling error in previous commit. This did not compile.


19653 11-Nov-1996 bde

Replaced I586_OPTIMIZED_BCOPY and I586_OPTIMIZED_BZERO with boot-time
negative-logic flags (flags 0x01 and 0x02 for npx0, defaulting to unset = on).
This changes the default from off to on. The options have been in current
for several months with no problems reported.

Added a boot-time negative-logic flag for the old I5886_FAST_BCOPY option
which went away too soon (flag 0x04 for npx0, defaulting to unset = on).

Added a boot-time way to set the memory size (iosiz in config, iosize in
userconfig for npx0).

LINT:
Removed old options. Documented npx0's flags and iosiz.

options.i386:
Removed old options.

identcpu.c:
Don't set the function pointers here. Setting them has to be delayed
until after userconfig has had a chance to disable them and until after
a good npx0 has been detected.

machdep.c:
Use npx0's iosize instead of MAXMEM if it is nonzero.

support.s:
Added vectors and glue code for copyin() and copyout().
Fixed ifdefs for i586_bzero().
Added ifdefs for i586_bcopy().

npx.c:
Set the function pointers here.
Clear hw_float when an npx exists but is too broken to use.
Restored style from a year or three ago in npxattach().


19634 11-Nov-1996 msmith

Update the database of known devices (people, please consider this when you
are adding new drivers...) to match, as best I can tell, majors.i386.

Improve behaviour when attempting to save changes for devices that should
not be changeable. Now correclty avoids non-device items, PCI devices and
devices with no isa_device structure.

Submitted by: (observations from) joerg, bde


19621 11-Nov-1996 dyson

Support the PG_G flag on Pentium-Pro processors. This pretty
much eliminates the unnecessary unmapping of the kernel during
context switches and during invtlb...


19523 08-Nov-1996 asami

Remove option I586_FAST_BCOPY. The code will be included by default
if I586_CPU is defined. Note there is a runtime check so the code
won't be run for non-Pentium CPUs anyway.

2.2 candidate, this code has been tested for almost half year in -current.


19503 07-Nov-1996 joerg

Fix the message buffer mapping. This actually allows to increase
the message buffer size in <sys/msgbuf.h>.

Reviewed by: davidg,joerg
Submitted by: bde


19480 07-Nov-1996 msmith

Add the 'piix' device, to get the right flags for it. This appears to
fix Joerg's freeze.

Definitely a 2.2 candidate.

Reviewed by: joerg


19417 05-Nov-1996 msmith

Protect against PCI devices which may have their 'changed' flag set,
or list items which may look like devices but which don't have an
isa_device structure attached to them.

This _shouldn't_ be possible, but it appears to have been
observed-by: Joerg


19346 03-Nov-1996 dyson

Fix a problem with running down processes that have left wired
mappings with mlock. This problem only occurred because of the
quick unmap code not respecting the wired-ness of pages in the
process. In the future, we need to eliminate the dependency
intrinsic to the design of the code that wired pages actually
be mapped. It is kind-of bogus not to have wired pages mapped,
but it is also a weakness for the code to fall flat because
of a missing page.

This show fix a problem that Tor Egge has been having, and also
should be included into 2.2-RELEASE.


19274 31-Oct-1996 julian

Further improved version of hadling a HALT when there is no console.


19269 30-Oct-1996 asami

More merge and update.

(1) deleted #if 0

pc98/pc98/mse.c

(2) hold per-unit I/O ports in ed_softc

pc98/pc98/if_ed.c
pc98/pc98/if_ed98.h

(3) merge more files by segregating changes into headers.

new file (moved from pc98/pc98):

i386/isa/aic_98.h

deleted:

well, it's already in the commit message so I won't repeat the
long list here ;)

Submitted by: The FreeBSD(98) Development Team


19268 30-Oct-1996 julian

if there is no console, cngetc should act like getc and return -1

make callers aware of this in those cases where it can occur.


19232 28-Oct-1996 imp

comaptibles->compatibles


19186 26-Oct-1996 bde

Removed initialization of a variable that went away. Oops.


19173 25-Oct-1996 bde

Print the clock calibration messages all on one (long) line again so
that they are easy to grep for.

Removed now-unused i586 counter variables.

Fixed some style bugs.


19171 25-Oct-1996 bde

Removed #include of <machine/clock.h>. It is no longer used, and would
break when I remove LOCORE support from clock.h.
I586_CTR_MULTIPLIER_SHIFT = 32 from clock.h is actually still used, but
32 is so magic that it doesn't get used explicitly.


19119 23-Oct-1996 dyson

Account for the UPAGES in the same way as before moving the MD code
from vm_glue into pmap.c. Now RSS should appear to be the same as before.


19064 20-Oct-1996 phk

Removing old isdn stuff.


18963 16-Oct-1996 bde

Fixed miscounting for non-statistical (GUPROF) profiling:
- use CROSSJUMP() and CROSSJUMP_LABEL() for conditional jumps from idle()
into cpu_switch() and vice versa.
- moved badsw code to after cpu_switch().

Cosmetic changes:
- moved sw0 string to be immediately after its caller (badsw).
- removed unused #include.


18951 16-Oct-1996 julian

Add support for embedded operation withou console
The boot.c patch is applied only to teh JULIAN_HACK branch
the muted console is controlable by a sysctl variable kern.consmute


18937 15-Oct-1996 dyson

Move much of the machine dependent code from vm_glue.c into
pmap.c. Along with the improved organization, small proc fork
performance is now about 5%-10% faster.


18930 14-Oct-1996 jkh

Change the boot-time menu.


18904 13-Oct-1996 dyson

Minor optimization for final rundown of a pmap.


18897 12-Oct-1996 dyson

Performance optimizations. One of which was meant to go in before the
previous snap. Specifically, kern_exit and kern_exec now makes a
call into the pmap module to do a very fast removal of pages from the
address space. Additionally, the pmap module now updates the PG_MAPPED
and PG_WRITABLE flags. This is an optional optimization, but helpful
on the X86.


18896 12-Oct-1996 bde

Cleaned up:
- fixed a sloppy common-style declaration.
- removed an unused macro.
- moved once-used macros to the one file where they are used.
- removed unused forward struct declarations.
- removed __pure.
- declared inline functions as inline in their prototype as well
as in theire definition (gcc unfortunately allows the prototype
to be inconsistent).
- staticized.


18892 12-Oct-1996 bde

Removed nested include if <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


18854 10-Oct-1996 bde

Added missing include of "opt_cpu.h". I missed it because PERFMON
features are used without testing for i586 features that they depend
on. Configuring option PERFMON without configuring a suitable cpu
still doesn't fail right.


18850 10-Oct-1996 bde

Fixed spelling errors in function names in previous commit.


18847 09-Oct-1996 jkh

Correct the intro text so user is not encouraged to step off cliffs.
Also document the fact that you can hit ESC to skip configuration.


18842 09-Oct-1996 bde

Put I*86_CPU defines in opt_cpu.h.


18838 09-Oct-1996 jkh

Accept 'Q' or 'ESC' in intro screen as a "Jane, stop this crazy thing!"
request.


18837 09-Oct-1996 bde

Enable the i586-optimized bcopy if the cpu is a "586" and option
I586_OPTIMIZED_BCOPY is configured.

Similarly for bzero/I586_OPTIMIZED_BZERO.

Fake 586's had better have a hardware FPU with non-broken exception
handling (we mask exceptions, but broken exception handling may trap
on the instructions that do the masking). I guess this means that
the routines won't work on most 386's or FPUless 486's even when they
have a h/w FPU.


18835 09-Oct-1996 bde

Added i586-optimized bcopy() and bzero().

These are based on using the FPU to do 64-bit stores. They also
use i586-optimized instruction ordering, i586-optimized cache
management and a couple of other tricks. They should work on any
i*86 with a h/w FPU, but are slower on at least i386's and i486's.
They come close to saturating the memory bus on i586's. bzero()
can maintain a 3-3-3-3 burst cycle to 66 MHz non-EDO main memory
on a P133 (but is too slow to keep up with a 2-2-2-2 burst cycle
for EDO - someone with EDO should fix this). bcopy() is several
cycles short of keeping up with a 3-3-3-3 cycle for writing. For
a P133 writing to 66 MHz main memory, it just manages an N-3-3-3,
3-3-3-3 pair of burst cycles, where N is typically 6.

The new routines are not used by default. They are always configured
and can be enabled at runtime using a debugger or an lkm to change
their function pointer, or at compile time using new options (see
another log message).

Removed old, dead i586_bzero() and i686_bzero(). Read-before-write is
usually bad for i586's. It doubles the memory traffic unless the data
is already cached, and data is (or should be) very rarely cached for
large bzero()s (the system should prefer uncached pages for cleaning),
and the amount of data handled by small bzero()s is relatively small
in the kernel.

Improved comments about overlapping copies.

Removed unused #include.


18760 06-Oct-1996 bde

Fixed build of LINT yet again. getchar() clashed with getchar() in pcvt.
Staticized it in userconfig. The one in pcvt is unused.

Removed bogus unused arg to getchar(). This should not have compiled
in the USERCONFIG_BOOT case, but the getchar() was also non-prototyped
and defined in K&R style.

Staticized the badly named global variable `next'. Even static variables
should have a unique module-specific prefix so that they can be referenced
easily in debuggers, etc.


18755 06-Oct-1996 dg

Moved a #if for VISUAL_USERCONFIG case...the last commit didn't completely
fix the problem.


18743 06-Oct-1996 jkh

Conditionalize introfunc on USERCONFIG_BOOT && VISUAL_USERCONFIG


18702 05-Oct-1996 jkh

Multiple changes stacked as one commit since they all depend on one another.

First, change sysinstall and the Makefile rules to not build the kernel
nlist directly into sysinstall now. Instead, spit it out as an ascii
file in /stand and parse it from sysinstall later. This solves the chicken-n-
egg problem of building sysinstall into the fsimage before BOOTMFS is built
and can have its symbols extracted. Now we generate the symbol file in
release.8.

Second, add Poul-Henning's USERCONFIG_BOOT changes. These have two
effects:

1. Userconfig is always entered, rather than only after a -c
(don't scream yet, it's not as bad as it sounds).

2. Userconfig reads a message string which can optionally be
written just past the boot blocks. This string "preloads"
the userconfig input buffer and is parsed as user input.
If the first command is not "USERCONFIG", userconfig will
treat this as an implied "quit" (which is why you don't need
to scream - you never even know you went through userconfig
and back out again if you don't specifically ask for it),
otherwise it will read and execute the following commands
until a "quit" is seen or the end is reached, in which case
the normal userconfig command prompt will then be presented.

How to create your own startup sequences, using any boot.flp image
from the next snap forward (not yet, but soon):

% dd of=/dev/rfd0 seek=1 bs=512 count=1 conv=sync <<WAKKA_WAKKA_DOO
USERCONFIG
irq ed0 10
iomem ed0 0xcc000
disable ed1
quit
WAKKA_WAKKA_DOO


Third, add an intro screen to UserConfig so that users aren't just thrown
into this strange screen if userconfig is auto-launched. The default
boot.flp startup sequence is now, in fact, this:

USERCONFIG
intro
visual

(Since visual never returns, we don't need a following "quit").

Submitted-By: phk & jkh


18653 03-Oct-1996 jkh

Make return or newline synonymous with down-arrow in the value editor.
It's a lot easier to whap through multiple changes if you can use the return
key.


18649 03-Oct-1996 jkh

Fix stupid typo.


18648 03-Oct-1996 jkh

Add fxp and vx to list of known device types.


18642 02-Oct-1996 jkh

New support for displaying PCI devices without making you insane.
Submitted-by: msmith [bless his heart]


18574 30-Sep-1996 bde

Whaddya know, visual userconfig sort of supports pci, but the support
was always disabled because "pci.h" wasn't included. Now the configured
pci devices are listed and you can edit bogus flags for them.

Fixed bitrot in the disabled code. A used #include was removed and const
poisoning wasn't fixed.

Removed unused #include.


18548 28-Sep-1996 dyson

Essentially rename pmap_update to be invltlb. It is a very machine
dependent operation, and not really a correct name. invltlb and invlpg
are more descriptive, and in the case of invlpg, a real opcode.

Additionally, fix the tlb management code for 386 machines.


18538 28-Sep-1996 bde

Restored my change in rev.1.119 which was clobbered by the previous commit.


18528 28-Sep-1996 dyson

Move pmap_update_1pg to cpufunc.h. Additionally,
use the invlpg opcode instead of the nasty looking .byte directives.
There are some other minor micro-level code improvements to pmap.c


18514 27-Sep-1996 peter

part 2 of the bsdi compat tweak attempt. I believe that BSDI use both
lcall 7,0 (ie: ldt slot 0) and lcall 0x87,0 (ldt slot 16, it's shifted
three bits to the left). I was fiddling with this so long ago, I don't
recall the specifics.


18511 27-Sep-1996 peter

I've been meaning to commit this for months. Implement select()
for /dev/random and /dev/urandom. Both are always writable, urandom is
always readable, and /dev/random is readable when >= 8 bits are in the
pool.


18428 20-Sep-1996 bde

Changed an arg name in the pseudo-prototype for bzero() to match
the prototype.

Put the jump table for i486_bzero() in the data section. This
speeds up i486_bzero() a little on Pentiums without significantly
affecting its speed on 486's.

Don't waste time falling through 14 nop's to return from do1 in
i486_bzero().

Use fastmove() for counts >= 1024 (was > 1024). Cosmetic.

Fixed profiling of fastmove().

Restored meaningful labels from the pre-1.1 version in fastmove().
Local labels are evil.

Fixed (high resolution non-) profiling of __bb_init_func().


18381 19-Sep-1996 phk

Various de-bogotifications of userconfig.


18380 19-Sep-1996 phk

Add APM_IDLE_CPU option, that is off by default.
I maintain that it saves more power to simply "hlt" the CPU than to
spend tons of time trying to tell the APM bios to do the same.
In particular if you do it 100 times a second...


18309 15-Sep-1996 phk

Fix something that has annoyed me since day one of userconfig: when the
<More> prompt receives a 'q', just abandon the list, the user wants out.


18297 14-Sep-1996 bde

Attached simple external ddb commands `show rtc', `show pgrpdump'
and `show cbstat'. The pgrpdump code was previously controlled by
`#ifdef DEBUG'.


18288 14-Sep-1996 bde

Changed cncheckc() interface so that it is 8-bit clean - return -1
instead of 0 if there is no input.

syscons.c:
Added missing spl locking in sccncheckc(). Return the same value as
sccngetc() would. It is wrong for sccngetc() to return non-ASCII, but
stripping the non-ASCII bits doesn't help.


18287 14-Sep-1996 bde

Changed cncheckc() interface so that it is 8-bit clean - return -1
instead of 0 if there is no input.


18275 13-Sep-1996 bde

Made debugging code (pmap_pvdump()) compile again so that I can test LINT.
I don't know if it actually works.


18274 13-Sep-1996 bde

Removed another devconf leftover.
Fixed a new #include style bug.
Removed an unused #include.


18260 12-Sep-1996 dyson

Primarily a fix so that pages are properly tracked for being
modified. Pages that are removed by the pageout daemon were
the worst affected. Additionally, numerous minor cleanups,
including better handling of busy page table pages. This
commit fixes the worst of the pmap problems recently introduced.


18252 11-Sep-1996 phk

Make userconfig two (default: on) options:
USERCONFIG to enable
VISUAL_USERCONFIG to get the gui stuff too.
Requested by: pst


18239 11-Sep-1996 dyson

A minor fix to the new pmap code. This might not fix the global problems
with the last major pmap commits.


18232 10-Sep-1996 bde

Removed bogus LARGMEM code and option. The code paniced when
biosextmem > 65536, but biosextmem is a 16-bit quantity so it is
guaranteed to be < 65536. Related cruft for biosbasemem was
mostly cleaned up in rev.1.26.


18207 10-Sep-1996 bde

Updated #includes to 4.4Lite style.


18169 08-Sep-1996 dyson

Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache. (Parameters can be generated
to support arbitrary cache sizes also.)


18163 08-Sep-1996 dyson

Improve the scalability of certain pmap operations.


18084 06-Sep-1996 phk

Remove devconf, it never grew up to be of any use.


18023 03-Sep-1996 nate

Cleaned up version of my 'extended BIOS' patch. This one is commented
better and much simpler to understand, and works just as well (better)
as a bonus.

Submitted by: bde


17986 01-Sep-1996 dg

Change an splclock that needs to be an splhigh into an splhigh.

Reviewed by: bde


17983 01-Sep-1996 nate

If the basemem value supplied by the bootblocks, differs from the value
returned by the RTC, use the bootblock supplied value. Also, map the
'stolen by BIOS' memory in the same manner as the ISA-hole memory, since
it is really an extenstion of the BIOS. This is necessary for 32-bit
BIOS functions such as APM support on laptops, and the loss of memory
for non-necessary functions seems to be at most 4k.

Reviewed by: phk
Obtained from: email conversation with jtk@atria.com


17950 30-Aug-1996 pst

Improvements from Bruce Evans


17865 28-Aug-1996 pst

Clean up formatting and fix an & -> && bug pointed out by bde


17847 27-Aug-1996 pst

Support for GDB remote debug protocol.

Sponsored by: Juniper Networks, Inc. <pst@jnx.com>


17677 19-Aug-1996 julian

Collect all the functioons concerned with rebooting into one place
also add the at_shutdown callout list, and change the one user of
the present (broken) method (the vn driver) to use the new scheme.


17559 12-Aug-1996 wollman

Back out mistaken local change that sneaked in on the last commit.


17558 12-Aug-1996 wollman

Don't declare the user_ldt functions unless USER_LDT is defined.
Eliminates an obnoxious warning.


17521 11-Aug-1996 dg

Add support for i686 machine check trap.


17503 10-Aug-1996 joerg

Teach UserConfig about ANSI (DEC?) ``application mode'' arrow key
sequences (ESC O A, as opposed to ESC [ A).


17490 10-Aug-1996 peter

Add recognition for the AMD 5x86 CPU models.

Submitted by: A JOSEPH KOSHY <koshy@india.hp.com>


17488 10-Aug-1996 peter

Trivial cosmetic tweak to make the i[56]86 CPU MHz reprting round to the
nearest .01 Mhz rather than simply truncating it downwards.

This hack makes this 89.999928 Mhz clock correctly round to the closer
90.00-MHz rather than 89.99-MHz:
> i586 clock: 89999928 Hz, i8254 clock: 1193152 Hz
> CPU: Pentium (90.00-MHz 586-class CPU)


17395 02-Aug-1996 bde

Eliminated i586_ctr_rate. Use i586_ctr_freq instead.


17393 02-Aug-1996 bde

Reduced division by i586_ctr_rate to multiplication by i586_ctr_multiplier.


17371 31-Jul-1996 bde

Eliminated pcb_inl. It was always 0 because context switches don't occur
in interrupt handlers.


17366 31-Jul-1996 dg

Converted timer/run queues to 4.4BSD queue style. Removed old and unused
sleep(). Implemented wakeup_one() which may be used in the future to combat
the "thundering herd" problem for some special cases.

Reviewed by: dyson


17355 30-Jul-1996 bde

Fixed longstanding bug of not checking `dumpdev' or setting `dumplo'
early enough when the dump device is specified in the config file.

Removed stale comment about configuration root and swap devices.

Don't bother clearing dumplo when dumpdev is set to NODEV. Everything
is controlled by dumpdev.

Fixed the kern.dumpdev sysctl. Writes were handle bogusly.


17353 30-Jul-1996 bde

Fixed the machdep.i8254_freq and machdep.i586_freq sysctls. Writes were
handled bogusly.

Centralized the setting of all the frequency variables. Set these
variables atomically. Some new ones aren't used yet.


17334 30-Jul-1996 dyson

Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find. This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.


17329 29-Jul-1996 dyson

Fix a problem with a DEBUG section of code.


17325 29-Jul-1996 dyson

Fix an error in statement order in pmap_remove_pages, remove the pmap
pte hint (for now), and general code cleanup.


17321 28-Jul-1996 dyson

Fix a problem that pmap update was not being done for kernel_pmap. Also
remove some (currently) gratuitious tests for PG_V... This bug could
have caused various anomolous (temporary) behavior.


17294 27-Jul-1996 dyson

This commit is meant to solve a couple of VM system problems or
performance issues.

1) The pmap module has had too many inlines, and so the
object file is simply bigger than it needs to be.
Some common code is also merged into subroutines.
2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
Unfortunately, a few have needed to be added also.
The removal caused the need for more vm_page_lookups.
I added lookup hints to minimize the need for the
page table lookup operations.
3) Removal of some bogus performance improvements, that
mostly made the code more complex (tracking individual
page table page updates unnecessarily). Those improvements
actually hurt 386 processors perf (not that people who
worry about perf use 386 processors anymore :-)).
4) Changed pv queue manipulations/structures to be TAILQ's.
5) The pv queue code has had some performance problems since
day one. Some significant scalability issues are resolved
by threading the pv entries from the pmap AND the physical
address instead of just the physical address. This makes
certain pmap operations run much faster. This does
not affect most micro-benchmarks, but should help loaded system
performance *significantly*. DG helped and came up with most
of the solution for this one.
6) Most if not all pmap bit operations follow the pattern:
pmap_test_bit();
pmap_clear_bit();
That made for twice the necessary pv list traversal. The
pmap interface now supports only pmap_tc_bit type operations:
pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
Additionally, the modified routine now takes a vm_page_t arg
instead of a phys address. This eliminates a PHYS_TO_VM_PAGE
operation.
7) Several rewrites of routines that contain redundant code to
use common routines, so that there is a greater likelihood of
keeping the cache footprint smaller.


17236 21-Jul-1996 joerg

Post-commit review by Bruce. Mostly stylistic changes.

Submitted by: bde


17231 20-Jul-1996 joerg

Major cleanup of the timerX_{acquire,release} stuff. In particular,
make it more intelligible, improve the partially bogus locking, and
allow for a ``quick re-acquiration'' from a pending release of timer 0
that happened ``recently'', so it was not processed yet by clkintr().
This latter modification now finally allows to play XBoing over
pcaudio without losing sounds or getting complaints. ;-) (XBoing
opens/writes/closes the sound device all over the day.)

Correct locking for sysbeep().

Extensively (:-) reviewed by: bde


17194 17-Jul-1996 bde

Fixed adjustment of `time' when timer0 is released. 27465 was 27645 in
a comment and in code that was only used when pcaudio was closed. The
maximum error was 66 usec.


17178 15-Jul-1996 nate

Moved declaration of zbuf outside of #ifdef DEVFS code.


17174 15-Jul-1996 bde

Quick fix for previous commit: don't free zbuf on close since it may be
in use in another process that blocked in uiomove().


17166 14-Jul-1996 dyson

Almost gratuitious improvement of the performance of reading
/dev/zero.


17121 12-Jul-1996 bde

Removed "optimization" using gcc's builtin memcpy instead of bcopy.
There is little difference now since the amount copied is large,
and bcopy will become much faster on some machines.


17120 12-Jul-1996 bde

Renamed upa to p0upa to match p0upt.

Cleaned up some comments.


17118 12-Jul-1996 bde

Export `dumpmag' to utilities but not to the kernel.

Restored a truncated comment.


17117 12-Jul-1996 bde

Fixed cloned comments about npx traps to match context.


17109 12-Jul-1996 bde

Fixed operand order for shld and shrd.

Finished the constant poisoning that was begun in rev.1.14. Consts
aren't very poisonous (or useful) unless -Wcast-qual is in CFLAGS,
and it isn't in the default CFLAGS.


17108 12-Jul-1996 bde

Don't use NULL in non-pointer contexts.


17014 08-Jul-1996 wollman

Fix something that's been bugging me for a long time: move the CPU
type identification code out of machdep.c and into a new file of its
own. Hopefully other grot can be moved out of machdep.c as well
(by other people) into more descriptively-named files.


16874 01-Jul-1996 bde

Use the standard timer (interrupt) frequency while calibrating the clocks.
Testing with the high frequency of 20000 Hz (to find problems) only found
the problem that this frequency is too high for slow i386's.

Disable interrupts while setting the timer frequency. This was unnecessary
before rev.1.57 and forgotten in rev.1.57. The critical (i8254) interrupts
are disabled in another way at boot time but not in the sysctl to change
the frequency.


16747 26-Jun-1996 dyson

When page table pages were removed from process address space, the
resident page stats were not being decremented. This mode corrects
that problem.


16725 25-Jun-1996 bde

trap.c:
Fixed profiling of system times. It was pre-4.4Lite and didn't support
statclocks. System times were too small by a factor of 8.

Handle deferred profiling ticks the 4.4Lite way: use addupc_task() instead
of addupc(). Call addupc_task() directly instead of using the ADDUPC()
macro.

Removed vestigial support for PROFTIMER.

switch.s:
Removed addupc().

resourcevar.h:
Removed ADDUPC() and declarations of addupc().

cpu.h:
Updated a comment. i386's never were tahoe's, and the deferred profiling
tick became (possibly) multiple ticks in 4.4Lite.

Obtained from: mostly from NetBSD


16723 25-Jun-1996 bde

Save John Polstra's initial fix for profiling for reference. The
multiplication in addupc() overflowed for addresses >= 256K, assuming
the usual profil(2) scale parameter of 0x8000. addupc() will go away
soon.

Submitted by: John Polstra <jdp@polstra.com>


16680 25-Jun-1996 dyson

Limit the scan for preloading pte's to the end of an object.


16532 20-Jun-1996 dg

Properly account for non-page aligned buffers.


16530 20-Jun-1996 dg

Minor KNF formatting change to vmapbuf() and vunmapbuf().


16499 19-Jun-1996 dyson

Clean up vmapbuf and vunmapbuf significantly. The previous code was
very rough.


16471 18-Jun-1996 bde

Removed unused #includes of <i386/isa/icu.h> and <i386/isa/icu.h>. icu.h
is only used by the icu support modules and by a few drivers that know
too much about the icu (most only use it to convert `n' to `IRQn'). isa.h
is only used by ioconf.c and by a few drivers that know too much about
isa addresses (a few have to, because config is deficient).


16428 17-Jun-1996 bde

In getit(), use read_eflags()/write_eflags() to preserve the interrupt
enable flag instead of enable_intr() to restore it to its usual state.
getit() is only called from DELAY() so there is no point in optimising
its speed (this wasn't so clear when it was extern), and using
enable_intr() made it inconvenient to call DELAY() from probes that need
to run with interrupts disabled.


16427 17-Jun-1996 bde

In microtime(), use pushfl/popfl to preserve the interrupt enable flag
instead of sti to it restore to its usual state. pushfl/popfl is
actually faster in protected mode on Pentiums (4+3 cycles instead of 9),
and using sti made it extremely inconvenient to call microtime() from
fast interrupt handlers. pushfl/popfl is a couple of cycles slower than
sti on 486's and a couple more cycles slower on 386's, but the relative
cost of using it is not large since microtime() has to use slow i/o
instructions on the old cpus.


16415 17-Jun-1996 dyson

Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
the need to scan the pv lists twice in many cases. Perhaps there
is alot more to do here to work on minimizing pv list manipulation
3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
4) Major changes and code improvement to pmap. This code has had
several serious bugs in page table page manipulation. In order
to simplify the problem, and hopefully solve it for once and all,
page table pages are no longer "managed" with the pv list stuff.
Page table pages are only (mapped and held/wired) or
(free and unused) now. Page table pages are never inactive,
active or cached. These changes have probably fixed the
hold count problems, but if they haven't, then the code is
simpler anyway for future bugfixing.
5) The pmap code has been sorely in need of re-organization, and I
have taken a first (of probably many) steps. Please tell me
if you have any ideas.


16344 13-Jun-1996 asami

A fast memory copy for Pentiums using floating point registers.
It is called from copyin and copyout.

The new routine is conditioned on I586_CPU and I586_FAST_BCOPY, so you
need

options "I586_FAST_BCOPY"

(quotes essenstial) in your kernel config file.

Also, if you have other kernel types configured in your kernel, an
additional check to make sure it is running on a Pentium is inserted.
(It is not clear why it doesn't help on P6s, it may be just that the
Orion chipset doesn't prefetch as efficiently as Tritons and friends.)

Bruce can now hack this away. :)


16328 12-Jun-1996 joerg

Externalize the declaration of dc_list. This is required in order to
get a ``generic'' kernel (``config kernel swap generic'') to compile.


16324 12-Jun-1996 dyson

Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c. Bruce Evans made me aware of this problem.


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


16300 11-Jun-1996 pst

Move warning messages under bootverbose


16299 11-Jun-1996 pst

Put clock calibration #defines in opt_clock.h to ease reconfiguration


16215 08-Jun-1996 bde

Stop using the alias `pcb_ptd' for `pcb_tcc.tss_cr3'. Use the (existing)
alias `pcb_cr3' instead. That is still one alias too many, but is convenient
for me since I've replaced the tss in the pcb by a few scalar variables in
the pcb.


16213 08-Jun-1996 bde

Removed bogus `altfmt' code. No alternative formats are supported, but
altfmt was abused to sometimes screw up the disassembly of the bytes
following unconditional jump instructions. Gas doesn't pad to a longword
boundary like the comment said - that is the programmer's responsibility.


16211 08-Jun-1996 bde

Removed recently introduced unnecessary #includes of <machine/cpu.h>
(bootverbose isn't there in -current) and nearby unnecessary #includes.


16197 08-Jun-1996 dyson

Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks! If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.


16166 07-Jun-1996 dyson

Fix a bug in the pmap_object_init_pt routine that pages aren't taken
from the cache queue before being mapped into the process.


16136 05-Jun-1996 dyson

I missed a case of the page table page dirty-bit fix.


16122 05-Jun-1996 dyson

Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.


16079 02-Jun-1996 dyson

Don't carry the modified or referenced bits through to the child
process during pmap_copy. This minimizes unnecessary swapping or creation of
swap space. If there is a hold_count flaw for page-table
pages, clear the page before freeing it to lessen the chance of a system
crash -- this is a robustness thing only, NOT a fix.


16075 02-Jun-1996 joerg

Be slightly more verbose during configure() in the bootverbose case.
This breaks the long silence after the ``npx0'' message and allows to
track some of the problems regarding the root f/s decisions.


16057 01-Jun-1996 dyson

Fix the problem with pmap_copy that breaks X in small memory machines. Also
close some windows that are opened up by page table allocations. The
prefaulting code no longer uses hold counts, but now uses the busy
flag for synchronization.


16029 31-May-1996 peter

Jump some hoops to have the *.s code being able to be run through both an
ansi and traditional cpp.

The nesting rules of macros are different, which required some changes.
Use __CONCAT(x,y) instead of /**/.
Redo some comments to use /* */ rather than "# comment" because the ansi
cpp cares about those, and also cares about quote matching.


16026 31-May-1996 dyson

This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also. There is still
a hang problem using X in small memory machines.


15977 29-May-1996 dyson

The wrong address (pindex) was being used for the page table directory. No
negative side effects right now, but just a clean-up.


15926 27-May-1996 phk

Cleanup the last of the assembly time "-KERNBASE" relocations.


15868 22-May-1996 peter

Fix harmless warning.. pmap_nw_modified was not having it's arg
cast to pt_entry_t like the others inside the DIAGNOSTIC code.


15858 22-May-1996 dyson

A serious error in pmap.c(pmap_remove) is corrected by this. When
comparing the PTD pointers, they needed to be masked by PG_FRAME, and
they weren't. Also, the "improved" non-386 code wasn't really an
improvement, so I simplified and fixed the code. This might have
caused some of the panics caused by the VM megacommit.


15832 21-May-1996 dyson

To quote Stephen McKay: pmap_copy is a complex NOP at this moment :-).

With this fix from Stephen, we are getting the target fork performance
that I have been trying to attain: P5-166, before the mega-commit: 700-800usecs,
after: 600usecs, with Stephen's fix: 500usecs!!! Also, this could be the
solution of some strange panic problems...
Reviewed by: dyson@freebsd.org
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15819 19-May-1996 dyson

Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.


15809 18-May-1996 dyson

This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

More usage of the TAILQ macros. Additional minor fix to queue.h.
Performance enhancements to the pageout daemon.
Addition of a wait in the case that the pageout daemon
has to run immediately.
Slightly modify the pageout algorithm.
Significant revamp of the pmap/fork code:
1) PTE's and UPAGES's are NO LONGER in the process's map.
2) PTE's and UPAGES's reside in their own objects.
3) TOTAL elimination of recursive page table pagefaults.
4) The page directory now resides in the PTE object.
5) Implemented pmap_copy, thereby speeding up fork time.
6) Changed the pv entries so that the head is a pointer
and not an entire entry.
7) Significant cleanup of pmap_protect, and pmap_remove.
8) Removed significant amounts of machine dependent
fork code from vm_glue. Pushed much of that code into
the machine dependent pmap module.
9) Support more completely the reuse of already zeroed
pages (Page table pages and page directories) as being
already zeroed.
Performance and code cleanups in vm_map:
1) Improved and simplified allocation of map entries.
2) Improved vm_map_copy code.
3) Corrected some minor problems in the simplify code.
Implemented splvm (combo of splbio and splimp.) The VM code now
seldom uses splhigh.
Improved the speed of and simplified kmem_malloc.
Minor mod to vm_fault to avoid using pre-zeroed pages in the case
of objects with backing objects along with the already
existant condition of having a vnode. (If there is a backing
object, there will likely be a COW... With a COW, it isn't
necessary to start with a pre-zeroed page.)
Minor reorg of source to perhaps improve locality of ref.


15722 10-May-1996 wollman

Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.


15694 09-May-1996 phk

Fix brino on my part. _etext doesn't include the padding to a page
boundary, which means that it doesn't mark the start of the data
section (which is then inaccessible to the programmer ??).
Hopefully fixes recent locore reboot problems.


15583 03-May-1996 phk

Another sweep over the pmap/vm macros, this time with more focus on
the usage. I'm not satisfied with the naming, but now at least there is
less bogus stuff around.


15565 02-May-1996 phk

Move atdevbase out of locore.s and into machdep.c
Macroize locore.s' page table setup even more, now it's almost readable.
Rename PG_U to PG_A (so that I can...)
Rename PG_u to PG_U. "PG_u" was just too ugly...
Remove some unused vars in pmap.c
Remove PG_KR and PG_KW
Remove SSIZE
Remove SINCR
Remove BTOPKERNBASE

This concludes my spring cleaning, modulus any bug fixes for messes I
have made on the way.

(Funny to be back here in pmap.c, that's where my first significant
contribution to 386BSD was... :-)


15543 02-May-1996 phk

removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
NPDEPG

Major macro cleanup.


15538 02-May-1996 phk

First pass at cleaning up macros relating to pages, clusters and all that.


15534 02-May-1996 phk

KGDB is dead. It may come back one day if somebody does it.


15508 01-May-1996 bde

Added calibration the i8254 and the i586 clocks agains the RTC at boot
time. The results are currently ignored unless certain temporary options
are used.

Added sysctls to support reading and writing the clock frequency variables
(not the frequencies themselves). Writing is supposed to atomically
adjust all related variables.

machdep.c:
Fixed spelling of a function name in a comment so that I can log this
message which should have been with the previous commit.

Initialize `cpu_class' earlier so that it can be used in startrtclock()
instead of in calibrate_cyclecounter() (which no longer exists).

Removed range checking of `cpu'. It is always initialized to CPU_XXX
so it is less likely to be out of bounds than most variables.

clock.h:
Removed I586_CYCLECTR(). Use rdtsc() instead.

clock.c:
TIMER_FREQ is now a variable timer_freq that defaults to the old value of
TIMER_FREQ. #define'ing TIMER_FREQ should still work and may be the best
way of setting the frequency.

Calibration involves counting cycles while watching the RTC for one second.
This gives values correct to within (a few ppm) + (the innaccuracy of the
RTC) on my systems.


15507 01-May-1996 bde

i386/machdep.c
include/clock.h
isa/clock.c


15501 01-May-1996 bde

Don't return unused values in cpu_switch() or savectx().

Don't preserve unused registers in the NPX case in savectx().


15500 01-May-1996 bde

Removed unused #include.


15471 30-Apr-1996 phk

Remove a spurious mapping that was introduced earlier.


15428 28-Apr-1996 phk

Fix some bugs I introduced and some old ones as well.
Add BDE_DEBUGGER back.
Improve quality of comments.
Thanks Bruce!

Reviewed by: phk
Submitted by: bde


15403 26-Apr-1996 bde

Fixed a bug introoduced in the previous change. ISA device memory was
mapped to semi-random place(s) depending on the content(s) of physical
address 0xA0000. This was fatal at least on my system with a some
memory-mapped devices. Console syscons somehow wasn't affected. It
bogusly hardcodes the address. Sigh.


15392 26-Apr-1996 phk

A significant debogofication of locore.s. I havn't found any actualy
bugs, but it is a lot easier to navigate this twisted code now.


15379 25-Apr-1996 phk

Fix cpu_fork for real.

Suggested by: bde


15345 22-Apr-1996 nate

- add apm to the GENERIC kernel (disabled by default), and add some comments
regarding apm to LINT
- Disabled the statistics clock on machines which have an APM BIOS and
have the options "APM_BROKEN_STATCLOCK" enabled (which is default
in GENERIC now)
- move around some of the code in clock.c dealing with the rtc to make
it more obvios the effects of disabling the statistics clock

Reviewed by: bde


15340 22-Apr-1996 dyson

This fixes a troubling oversight in some of the pmap code enhancements.
One of the manifiestations of the problem includes the -4 RSS problem
in ps.

Reviewed by: dyson
Submitted by: Stephen McKay <syssgm@devetir.qld.gov.au>


15304 19-Apr-1996 phk

savectx returns through cpu_switch in case of the child, so it must
return void just like cpu_switch. Fix prototype and usage from machdep.c


15301 18-Apr-1996 phk

Fix a bogon. cpu_fork & savectx ecpected cpu_switch to restore %eax,
they shouldn't.


15256 13-Apr-1996 bde

Fixed handling of device flags. The real flags were never used.

Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>


15232 13-Apr-1996 bde

Use PCB_SAVEFPU_SIZE instead of a too-small size in savectx(). This
bug only affected FPU emulators. It might have caused bogus FPU states
in core dumps and in the child pcb after a fork. Emulated FPU states
in core dumps don't work for other reasons, and the child FPU state
is reinitialized by exec, so the problem might not have caused any
noticeable affects.

Cleaned up #includes.


15231 13-Apr-1996 bde

Generate #define of PCB_SAVEFPU_SIZE for use in savectx().


15215 12-Apr-1996 phk

Make alltraps a .globl so that DDB doesn't make people belive they have
an ALIGNFLT on their hands all the time.


15123 07-Apr-1996 bde

Use breakpoint() function instead of inline assembler.


15119 07-Apr-1996 bde

Changed #includes of <i386/include/foo.h> to #includes of <machine/foo.h>.


15117 07-Apr-1996 bde

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


15109 07-Apr-1996 bde

Fixed the ownership and permissions of /dev/io. Rev.1.32 broke rev.1.29.


15088 07-Apr-1996 dyson

Major cleanups for the pmap code.


15065 06-Apr-1996 dg

Switch 586/686 back to generic_bzero and #if 0'd the "optimized" code. It
turns out that it actually reduces performance in real-world cases.

Noticed by: bde


15054 05-Apr-1996 ache

Fix adjkerntz expression priority


15045 05-Apr-1996 ache

Add wall_cmos_clock sysctl variable, needed to manage adjkerntz even for
UTC cmos clocks (needed for Local Timezone FSes)


14988 01-Apr-1996 scrappy

Convert from using devfs_add_devsw() to devfs_add_devswf()

Fixed Permissions/Ownership in DEVFS to reflect /dev


14967 31-Mar-1996 dg

Change if/goto into a while loop.


14943 31-Mar-1996 bde

Moved rtcin() to clock.c.

Always delay using one inb(0x84) after each i/o in rtcin() - don't
do this conditional on the bogus option DUMMY_NOPS not being defined.
If you want an optionally slightly faster rtcin() again, then inline
it and use a better named option or sysctl variable. It only needs
to be fast in rtcintr().


14922 29-Mar-1996 wollman

There is no need to zero out the TSC when configuring a counter,
says Mike Haertel.


14911 29-Mar-1996 bde

Removed never-used files.


14893 28-Mar-1996 wollman

Sync up the Pentium implementation with the documentation.
Previously, the sense of the E flag was reversed on
Pentiums.


14888 28-Mar-1996 wollman

Nit: according to the Harvard code, it is necessary to clear the timestamp
counter before loading the performance-monitor control register. I'm
not sure I believe this, but we'll follow their lead for the moment.
As a result of this commit, the performance-monitoring test program that
I wrote now works (the program will find its way to share/examples).


14887 28-Mar-1996 wollman

Teach the disassembler about the 0f,3x family of instructions
(RDMSR, RDTSC, WRMSR, and RDPMC).


14880 28-Mar-1996 bde

Undid the last 2 commits. Rev.1.43 reversed the changes in rev.1.42 and
rev.1.44 was a subset of them.


14873 28-Mar-1996 scrappy

Switched from using devfs_add_sw() to using devfs_add_swf()

Reviewed by: julian@freebsd.org


14868 28-Mar-1996 dyson

Remove a now unnecessary prototype from pmap.c. Also remove now
unnecessary vm_fault's of page table pages in trap.c.


14867 28-Mar-1996 dyson

Significant code cleanup, and some performance improvement. Also,
mlock will now work properly without killing the system.


14860 27-Mar-1996 wollman

A slightly-closer-to-working version that includes code appropriate
to regular Pentiums. Unfortunately, it doesn't work on mine,
but I'm not sure if this is the fault of the driver.


14846 27-Mar-1996 bde

Fixed permissions of /devfs/*random.

Fixed group and permissions of /devfs/perfmon.


14845 27-Mar-1996 bde

Fixed mode of /devfs/console.


14837 27-Mar-1996 bde

Print stack pointer and frame pointer in trap messages.

Fixed "trace/trap" message.

Reviewed by: davidg


14836 27-Mar-1996 bde

Eliminated dependency on opt_sysvipc.h.


14835 27-Mar-1996 bde

Removed vestiges of dummy frame at top of tmpstk.

Use alignment macros where appropriate.

Cleaned up #includes.


14834 27-Mar-1996 bde

Fixed traceback for the following cases:
- legitimate null frames from idle() (traceback was aborted after a null
pointer trap)
- second instruction of normal function prologue, and last instruction of
a function (caller wasn't reported).

Reviewed by: davidg


14825 26-Mar-1996 wollman

Add support for Pentium and Pentium Pro performance counters.
(This code is as yet untested; to come after man page is written.)
This also adds inlines to cpufunc.h for the RDTSC, RDMSR, WRMSR, and RDPMC
instructions. The user-mode interface is via a subdevice of mem.c;
there is also a kernel-size interface which might be used to aid
profiling.


14773 23-Mar-1996 nate

Whoops, back out the last commit, which was accidentally committed at
the same time as the if_zp cleanup patch.

The commit that occurred was an incomplete patch for APM on my laptop
and needs more work.


14772 23-Mar-1996 nate

Now that ac->ac_ipaddr and arpwhohas() no longer exist, remove the
ifdef'd out code that used it.


14691 19-Mar-1996 nate

Always enable interrupts before calling the APM idle/busy routines.

Suggested by: phk@FreeBSD.org


14607 13-Mar-1996 dyson

Make sure that we pmap_update AFTER modifying the page table entries.
The P6 can do a serious job of reordering code, and our stuff could
execute incorrectly.


14595 12-Mar-1996 dg

Killed some historical #define cruft that we've never used in FreeBSD:

UDOT_SZ
SYSPTSIZE
USRPTSIZE
MSGBUFPTECNT
DMMIN
DMMAX
DMTEXT
USRIOSIZE
VM_PHYS_SIZE


14577 12-Mar-1996 nate

Removed undocumented an unused APM_SLOWSTART code.


14527 11-Mar-1996 hsu

For Lite2: proc LIST changes.
Reviewed by: david & bde


14503 11-Mar-1996 hsu

Change type of code argument to sendsig from unsigned to u_long to make it
consistent w/ signalvar.h and kern_sig.c.
Reviewed by: davidg & bde


14470 10-Mar-1996 dyson

Improved efficiency in pmap_remove, and also remove some of the pmap_update
optimizations that were probably incorrect.


14433 09-Mar-1996 dyson

Correct some new and older lurking bugs. Hold count wasn't being
handled correctly. Fix some incorrect code that was included
to improve performance. Significantly simplify the pmap_use_pt and
pmap_unuse_pt subroutines. Add some more diagnostic code.


14348 03-Mar-1996 jkh

USER_LDT changes for the Willows TwinXPDK toolkit. Only tested with WINE
since that's the only other USER_LDT using code that I know of.
Submitted by: Gary Jennejohn <Gary.Jennejohn@munich.netsurf.de>
Obtained from: {Origin of diffs may be someone else - I only rec'd them from
Gary}


14331 02-Mar-1996 peter

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


14329 02-Mar-1996 peter

This file is "obsolete" and no longer used or referenced.


14328 02-Mar-1996 peter

Add more options into the conf/options and i386/conf/options.i386 files
and the #include hooks so that 'make depend' is more useful. This
covers most of the options I regularly use (but not all) and some other
easy ones.


14245 25-Feb-1996 dyson

Re-insert a missing pmap_remove operation.


14243 25-Feb-1996 dyson

Fix a problem with tracking the modified bit. Eliminate the
ugly inline-asm code, and speed up the page-table-page tracking.


14081 13-Feb-1996 phk

Correct & Update the printing of CPU features. We have printed rubbish
since version 1.117 when Garrett made the switch to %b. Updated to
reflect Intel AP-485 (241618-004).


13915 05-Feb-1996 dg

Unspam my changes in rev 1.54 that John spammed in rev 1.55.


13909 04-Feb-1996 dyson

Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.


13908 04-Feb-1996 dg

Rewrote cpu_fork so that it doesn't use pmap_activate, and removed
pmap_activate since it's not used anymore. Changed cpu_fork so that
it uses one line of inline assembly rather than calling mvesp() to
get the current stack pointer. Removed mvesp() since it is no longer
being used.


13902 04-Feb-1996 pst

Tell userconfig about qcam


13852 02-Feb-1996 dg

Killed last change - it was bogus. cpu_switch() already assumes that
return address is on the stack.


13758 30-Jan-1996 wollman

No longer use the cyclecounter to attempt to correct for late or missed
clock interrupts.

Keep a 1-in-16 smoothed average of the length of each tick. If the
CPU speed is correctly diagnosed, this should give experienced users
enough information to figure out a more suitable value for `tick'.


13740 30-Jan-1996 dg

savectx() strikes again: the saved stack pointer wasn't properly adjusted
to remove the return address. It's only the frame pointer and luck that
allowed the code to work at all.


13729 30-Jan-1996 dg

Increase tmpstk size to 8K and make certain it is longword aligned.


13646 27-Jan-1996 bde

Allocate DMA bounce buffers only when requested by drivers. Only the
fd and wt drivers need bounce buffers, so this normally saves 32K-1K
of kernel memory.

Keep track of which DMA channels are busy. isa_dmadone() must now be
called when DMA has finished or been aborted.

Panic for unallocated and too-small (required) bounce buffers.

fd.c:
There will be new warnings about isa_dmadone() not being called after
DMA has been aborted.

sound/dmabuf.c:
isa_dmadone() needs more parameters than are available, so temporarily
use a new interface isa_dmadone_nobounce() to avoid having to worry
about panics for fake parameters. Untested.


13580 23-Jan-1996 dg

Simplified savectx() a little and fixed a bug that caused it to return
garbage in the child process rather than "1" like it is supposed to.

Reviewed by: bde


13543 21-Jan-1996 joerg

Initialize the cpu_class variable. This prevents i386 machines from
panicing with a privileged instruction fault early at boot time.
Submitted by: rock@wurzelausix.CS.Uni-SB.DE (D. Rock)


13495 19-Jan-1996 peter

Some trivial fixes to get it to compile again, plus some new lint:
- cpuclass should be cpu_class
- CPUCLASS_I386 should be CPUCLASS_386
(^^ those only show up if you compile for i386)
- two missing prototypes on new functions
- one missing static


13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


13453 16-Jan-1996 ache

Since new bcd* macros not argument range overflow resistant,
fix argument overflow for years >= 2000


13446 15-Jan-1996 phk

Get rid of two and a half printf in the kernel.
Add more features to the one remaining to handle the job:
+ signed quantity.
# alternate format
- left padding
* read width as next arg.
n numeric in (argument specified) default radix.

Fix the DDB debugger to use these.
Use vprintf in debug routine in pcvt.

The warnings from gcc may become more wrong and intolerable because
of this.

Warning: I have not checked the entire source for unsupported or
changed constructs, but generally belive that there are only a few.

Suggested by: bde


13445 15-Jan-1996 phk

My wife is busy making me a new conical hat, so you don't need to
send any to me this time. Commited an old copy of this files where
the tables were swapped. Duh!.


13444 15-Jan-1996 phk

Soren called an said that I screwed up badly, so I backup until
I find out how... Sorry.


13438 15-Jan-1996 phk

Make bin2bcd and bcd2bin global macroes instead of having local
implementations all over the place.


13402 12-Jan-1996 bde

Fixed handling of Feb 29 in resettodr().


13350 08-Jan-1996 ache

Replace ugly year/month calculations in resettodr to more clean
variants, idea taken from NetBSD clock.c.
At least year calculation was wrong, pointed by Bruce.
Use different strategy to store year for BIOS without RTC_CENTURY


13265 05-Jan-1996 wollman

Convert BOUNCE_BUFFERS and BOUNCEPAGES to new option scheme.


13228 04-Jan-1996 wollman

Convert DDB to new-style option.


13226 04-Jan-1996 wollman

Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.


13203 03-Jan-1996 wollman

Converted two options over to the new scheme: USER_LDT and KTRACE.


13130 31-Dec-1995 joerg

Restrict /devfs/io perms to 0600.

Nobody in our regular source tree, or in the non-distfile part of the
ports tree does use /dev/io anyway, so this might be replaced by
another scenario some day.


13125 30-Dec-1995 dg

In memory test, cast pointer as "volatile int *", not "int *" to make sure
that gcc doesn't cache the value used in the test. Pointed out by Erich
Boleyn <erich@uruk.org>.


13085 28-Dec-1995 dg

Made bzero a function vector and added a 586/686 optimized version of
bzero.
Deprecated blkclr (removed it).
Removed some old cruft from cpufunc.h.

The optimized bzero was submitted by Torbjorn Granlund <tege@matematik.su.se>
The kernel adaption and other changes by me.


13081 28-Dec-1995 dg

Fix one more label that I overlooked with the P6 support. Sigh.


13065 27-Dec-1995 dg

Update bcopyb & bcopy to reflect changes I made in the libc version of
bcopy:

Be smarter about handling overlapped copies and only go backwards if it
is really necessary. Going backwards on a P6 is much slower than forwards
and it's a little slower on a P5. Also moved the count mask and 'std'
down a few lines - it's a couple percent faster this way on a P5.


13056 27-Dec-1995 markm

Modify the ioctl to handle revectored interrupts for the entropy gatherers.


13031 26-Dec-1995 bde

Removed almost all traces of libkern.a. The objects that were in
libkern.a are now specified by listing their source files in
files.${MACHINE}. The list is machine-dependent to save space.
All the necessary object for each machine must be linked into the
kernel in case an lkm wants one.


13014 25-Dec-1995 dg

Fix a lable goofup I made in the previous P6 support changes.


13004 25-Dec-1995 dg

Fix typo in CPUCLASS.


13000 24-Dec-1995 dg

Add Pentium Pro CPU detection and special handling. For now, all the
optimizations we have for 586s also apply to 686s...this will be fine-
tuned in the future as appropriate.


12990 23-Dec-1995 dg

Use FASTER_NOP rather than NOP in rtcin() - only one inb delay was ever
needed.
Reviewed by: bde


12978 22-Dec-1995 bde

Staticized code that was hidden by `#ifdef DEBUG'.


12977 22-Dec-1995 bde

Increased the double fault stack size from 512 to PAGE_SIZE. This is
wasteful, but better than clobbering the variables below the stack.
About 300 bytes of variables were clobbered when I examined double
faults using ddb. Perhaps a page that is known not to be accessed by
the double fault handler could be used. Such pages are not easy to
find, since the double fault handler calls panic() which calls sync()
and possibly dumpsys().


12960 22-Dec-1995 phk

Remove crufty "pg" function.


12958 22-Dec-1995 dg

Fix a small logic bug that caused the arguments of the previous frame to
be used instead of the ones for the current frame if a breakpoint had been
set at the entry to a function.


12954 21-Dec-1995 julian

i386/i386/conf.c is no longer needed.. remove it from files.i386
redistribute a few last routines to beter places and shoot the file

I haven't act actually 'deleted' the file yet togive people time
to
have done a config.. I.e. they are likely to have done one in a week or so
so I'll remove it then..
it's now empty.
makes the question of a USL copyright rather moot.


12953 21-Dec-1995 julian

Reviewed by: peter (verbally :)
Move functions specific to mem.c to mem.c from conf.c


12952 21-Dec-1995 dg

Rewrote most of the ddb stack traceback code. These changes are smarter
about decoding trap/syscall/interrupt frames and generally works better
than the previous stuff.
Removed some special (incorrect) frobbing of the frame pointer that
was messing some things up with the new traceback code.


12941 20-Dec-1995 wollman

Increase Pentium cyclecounter calibration time to 131072 us. This
experimentally seems to give better results on my machine.


12930 19-Dec-1995 dg

Corrected a typo in a comment.


12929 19-Dec-1995 dg

Implemented a (sorely needed for years) double fault handler to catch stack
overflows.
It sure would be nice if there was an unmapped page between the PCB and
the stack (and that the size of the stack was configurable!). With the
way things are now, the PCB will get clobbered before the double fault
handler gets control, making somewhat of a mess of things. Despite this,
it is still fairly easy to poke around in the overflowed stack to figure
out the cause.


12904 17-Dec-1995 bde

Fixed 1TB filesize changes. Some pindexes had bogus names and types
but worked because vm_pindex_t is indistinuishable from vm_offset_t.


12889 16-Dec-1995 peter

Catch a couple more null devsw dereferences...


12850 14-Dec-1995 bde

Added a prototype. Merged prototype lists.


12849 14-Dec-1995 bde

Added a prototype.


12848 14-Dec-1995 bde

Moved some more prototypes outside of ifdefs and grouped them together.


12844 14-Dec-1995 bde

Fixed staticization of DDB functions.


12840 14-Dec-1995 bde

Removed my devsw access functions [un]register_cdev() and
getmajorbyname() which were a better (sigh) temporary interface to
the going-away devswitches.

Note that SYSINIT()s to initialize the devswitches would be fatal
in syscons.c and pcvt_drv.c (and are bogus elsewhere) because they
get called independently of whether the device is attached; thus
devices that share a major clobber each other's devswitch entries
until the last one wins.

conf.c:
Removed stale #includes and comments.


12839 14-Dec-1995 bde

Fixed the type of some sysinit functions.


12827 14-Dec-1995 peter

GENERIC/LINT: Remove redundant quoting on some option lines.
LINT: add a couple of new/missing/undocumented options
files.i386: add linux code so that you can compile a kernel with static
linux emulation ("options LINUX")
i386/*: use #if defined(COMPAT_LINUX) || defined(LINUX) to enable static
support of linux emulation (just like "IBCS2" makes ibcs2 static)

The main thing this is going to make obvious, is that the LINUX code
(when compiled from LINT) has a lot of warnings, some of which dont look
too pleasant..


12820 14-Dec-1995 phk

Another mega commit to staticize things.


12819 14-Dec-1995 phk

A Major staticize sweep. Generates a couple of warnings that I'll deal
with later.
A number of unused vars removed.
A number of unused procs removed or #ifdefed.


12817 14-Dec-1995 phk

Make math_emulators LKMable.


12813 13-Dec-1995 julian

devsw tables are now arrays of POINTERS to struct [cb]devsw
seems to work hre just fine though I can't check every file
that changed due to limmited h/w, however I've checked enught to be petty
happy withe hte code..

WARNING... struct lkm[mumble] has changed
so it might be an idea to recompile any lkm related programs


12791 12-Dec-1995 gibbs

Have Eisa and PCI probes occur before ISA probes. Buslogic EISA and PCI cards
can be found in ISA compatibility mode by the ISA driver, but since the
EISA and PCI probes are non-invasive, we prefer them to find the card first.
Since both EISA and PCI probes can rely on interrupts, enable them before
probing of any type is performed. All ISA probes are still "protected" by
splhigh().


12767 11-Dec-1995 dyson

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


12731 10-Dec-1995 bde

Removed new alias d_size_t for d_psize_t.

Removed old aliases d_rdwr_t and d_ttycv_t for d_read_t/d_write_t and
d_devtotty_t.

Sorted declarations of switch functions into switch order.

Removed duplicated comments and declarations of nonexistent switch
functions.


12724 10-Dec-1995 phk

Staticize and cleanup.


12722 10-Dec-1995 phk

Staticize and cleanup.
remove a TON of #includes from machdep.


12713 10-Dec-1995 bde

Restored used function fusword() (used by GPL math emulator).


12702 09-Dec-1995 phk

Remove various unused symbols and procedures.


12701 09-Dec-1995 phk

Move sysctl machdep.consdev to cons.c


12678 08-Dec-1995 phk

Julian forgot to make the *devsw structures static.


12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them in you
can't see what's going on..
for a laugh compare conf.c conf.h defore and after... :)
I have not doen DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


12673 07-Dec-1995 peter

The static prototype for setroot() was apparently accidently moved
into a block of code that was #ifdef CD9660, meaning you got a compile
failure if you didn't have the CD9660 filesystem configured.


12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


12655 06-Dec-1995 bde

Added explicit #include of <vm/vm.h> so that conf.c doesn't break when
vnode_if.h doesn't include vm stuff.


12649 06-Dec-1995 peter

Moving the kern.dumpdev sysctl handler from kern_sysctl.c to swapgeneric.c
is not real helpful since swapgeneric.c doesn't seem to be used, except
perhaps on a GENERIC kernel. (Sorry Paul.. :-)

I've moved it from swapgeneric.c to autoconf.c, since autoconf.c also deals
with dumpdev things. There may be a better place.....


12640 05-Dec-1995 bde

Fixed ity's d_stop entry. itystop() wasn't used. itystop() is inadequate
but probably harmless. It's hard to tell because apparently no one runs
ity.

Fixed ity's d_reset entry. `nx' entries should never be used for existing
devices.

conf.c:
Moved a prototype to a better place.

Removed a stale #define.


12637 05-Dec-1995 bde

Removed dummy routines sscstrategy(), sscread(), sscwrite() and
sscselect(). Use the standard dummies nostrategy(), noread(),
nowrite() and noselect() instead.

sscread() and sscwrite() returned bogus errnos. It isn't possible
to return an error from a select routine so noselect() is just as
bogus as sscselect() (it's equivalent to nullselect()).


12623 04-Dec-1995 phk

A major sweep over the sysctl stuff.

Move a lot of variables home to their own code (In good time before xmas :-)

Introduce the string descrition of format.

Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.

Next is adding vars on the fly, and sysctl looking at them too.

Removed a tine bit of defunct and #ifdefed notused code in swapgeneric.


12609 03-Dec-1995 bde

Added prototypes.

Staticized.

Moved a NULL-initialized variable to the bss.


12607 03-Dec-1995 bde

Completed function declarations and/or added prototypes.


12604 03-Dec-1995 bde

Staticized.

Completed function declarations and added prototypes.

Cleaned up prototypes.

Cleaned up #includes.

Removed unused variable `dkn'.


12533 29-Nov-1995 wollman

Fix Pentium CPU rate diagnosis:
- Don't print out meaningless iCOMP numbers, those are for droids.
- Use a shorter wait to determine clock rate to avoid deficiencies
in DELAY().
- Use a fixed-point representation with 8 bits of fraction to store
the rate and rationalize the variable name. It would be
possible to use even more fraction if it turns out to be
worthwhile (I rather doubt it).

The question of source code arrangement remains unaddressed.


12521 29-Nov-1995 julian

If you're going to mechanically replicate something in 50 files
it's best to not have a (compiles cleanly) typo in it! (sigh)


12520 29-Nov-1995 julian

#ifdef out nearly the entire file of conf.c when JREMOD is defined
add a few safety checks in specfs because
now it's possible to get entries in [cd]devsw[] which are ALL NULL
so it's better to discover this BEFORE jumping into the d_open() entry..

more check to come later.. this getsthe code to the stage where I
can start testing it, even if I haven't caught every little error case...
I guess I'll find them quick enough..


12517 29-Nov-1995 julian

OK, that's it..
That's EVERY SINGLE driver that has an entry in conf.c..
my next trick will be to define cdevsw[] and bdevsw[]
as empty arrays and remove all those DAMNED defines as well..

Each of these drivers has a SYSINIT linker set entry
that comes in very early.. and asks teh driver to add it's own
entry to the two devsw[] tables.

some slight reworking of the commits from yesterday (added the SYSINIT
stuff and some usually wrong but token DEVFS entries to all these
devices.

BTW does anyone know where the 'ata' entries in conf.c actually reside?
seems we don't actually have a 'ataopen() etc...

If you want to add a new device in conf.c
please make sure I know
so I can keep it up to date too..

as before, this is all dependent on #if defined(JREMOD)
(and #ifdef DEVFS in parts)


12499 28-Nov-1995 peter

After having put on my Asbestos suit, complete the MFS_ROOT part of Terry's
mountroot changes. This means that the mfs_initminiroot functionality
into the root mfs_mount....


12471 24-Nov-1995 bde

Staticized. Moved some ero-initialized values to the bss.

Added prototypes.


12429 20-Nov-1995 phk

Mega commit for sysctl.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I were there.


12417 20-Nov-1995 phk

Remove unused vars.


12357 18-Nov-1995 bde

Fixed the type of vm_fault_quick() - don't convert types back and forth
through bogus immediate types.

Added prototypes.


12356 18-Nov-1995 bde

Fixed handling of trace traps when cons_unavail is set. Added comments
about handing of other cases.


12290 14-Nov-1995 phk

Fix a couple of printfs.


12243 12-Nov-1995 phk

The entire sysctl callback to read/write version. I havn't tested this as
much as I'd like to, but the malloc stunt I tried for an interim for
sure does worse.
Now we can read and write from any kind of address-space, not only
user and kernel, using callbacks.
This may be over-generalization for now, but it's actually simpler.


12223 12-Nov-1995 bde

Oops, forgot the following log message in the previous commit:

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


12220 12-Nov-1995 bde

Reviewed by:
Submitted by:
Obtained from:


12204 11-Nov-1995 bde

Cleaned up after moving the prototypes:
- collapsed #if-#elses that became null.
- removed dead comments.

Moved #defines that always have the same value to the tables.
Collapsed more #if-#elses that became null. None are left.

Removed repetitive comments.


12186 10-Nov-1995 phk

convert more sysctl variables.


12118 06-Nov-1995 bde

Replaced bogus macros for dummy devswitch entries by functions.
These functions went away:

enosys (hasn't been used for some time)
enxio
enodev
enoioctl (was used only once, actually for a vop)

if_tun.c:
Continued cleaning up...

conf.h:
Probably fixed the type of d_reset_t. It is hard to tell the correct
type because there are no non-dummy device reset functions.

Removed last vestige of ambiguous sleep message strings.


12091 05-Nov-1995 gibbs

Modifications for the new eisaconf.


12078 04-Nov-1995 markm

Remove the #ifdev DEVRANDOM's, as promised.

/dev/random is now a part of the kernel! you will need to make
the device in /dev: sh MAKEDEV random
and take a look at some test code in src/tools/test/random.


12072 04-Nov-1995 bde

Finished(?) moving prototypes for devswitch functions to <machine/conf.h>.
One was hidden in an ifdef.

Continued cleaning up not so new init stuff.

Removed some more /*ARGSUSED*/ for devswitch functions.


12071 04-Nov-1995 bde

Moved prototypes for devswitch functions from conf.c and driver sources
to <machine/conf.h>. conf.h was mechanically generated by
`grep ^d_ conf.c >conf.h'. This accounts for part of its ugliness. The
prototypes should be moved back to the driver sources when the functions
are staticalized.


12008 02-Nov-1995 peter

When the sync-on-shutdown fails to clear all buffers, this bit of code
can print them out.
I have seen that MFS can leave BUSY buffers, preventing a clean reboot...


11963 31-Oct-1995 peter

Add a simplistic netisr register routine - I need this now for ppp-2.2.


11956 31-Oct-1995 joerg

Minor correction for the "od" driver.

Submitted by: akiyama@kme.mei.co.jp (Shunsuke Akiyama)


11946 30-Oct-1995 markm

Security fix - do not allow anyone but root to choose the interrupts used
in the the randomising process.
(This is a change to the /dev/random ioctl()))


11940 30-Oct-1995 bde

Removed bogus statics in declarations that don't allocate storage.

Added prototypes.


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11919 29-Oct-1995 bde

Fix mmioctl() for !DEVRANDOM case. mmioctl() is a function, not a
pointer to a function.


11875 28-Oct-1995 markm

Theodore Ts'po's random number gernerator for Linux, ported by me.
This code will only be included in your kernel if you have
'options DEVRANDOM', but that will fall away in a couple of days.
Obtained from: Theodore Ts'o, Linux


11872 28-Oct-1995 phk

Remove unused functions and variables, make things static, and other cleanups.


11703 23-Oct-1995 dg

Remove PG_W bit setting in some cases where it should not be set.

Submitted by: John Dyson <dyson>


11693 23-Oct-1995 dg

More improvements to the logic for modify-bit checking. Removed
pmap_prefault() code as we don't plan to use it at this point in time.

Submitted by: John Dyson <dyson>


11642 22-Oct-1995 dg

Simplified some expressions.


11602 21-Oct-1995 phk

A mixed bag of changes, relating to getting the state in "lsdev" right,
and pccard support to work sensibly. Better by far, but still not good.


11523 15-Oct-1995 phk

Pull all of libkern.a in (though not mcount) so the LKM's don't come
out shorthanded. Makes the idea of libkern pretty void now...


11474 14-Oct-1995 dg

Latest fixes from Serge:

I tried to solve the problem of IDE probing compatibility in this version.
When compiled without an ATAPI option, the wd driver is
fully backward compatible with 2.0.5. With ATAPI option,
the wdprobe becomes strictly weaker. That is, if wdprobe works
without ATAPI option, it will always work with it too.

Another problem was with the CD-ROM drive attached as a slave
in the IDE bus, where there is no master. All IDE CD-ROM
drives are shipped in slave configuration, and most users
just plug them in, never thinking about jumpers.
It works fine with ms-dos and ms-windows, and this
version of the driver supports it as well.

The eject op can now load disks. Just repeat it twice,
and the disk will be ejected and then loaded back.

The disc cannot be ejected if it is mounted.

Submitted by: Serge Vakulenko, <vak@cronyx.ru>


11471 14-Oct-1995 jkh

Coerce the exit message into making more sense.


11464 14-Oct-1995 bde

Don't allow mmapping of physical page 6 (ENXIO).

nxmmap() returned a bogus value as well as having a bogus type. Some
drivers use nxmmap() for configured devices (`nx' functions should
only be used for unconfigured devices). These drivers allowed mmapping
physical page 6, which may have interesting contents. vm has kludges
to avoid the same bug with nullop() returning page 0 and enodev()
returning page 19 (ENODEV), but didn't handle enxio() returning page 6.
vm is the wrong place to handle these bugs.


11463 14-Oct-1995 bde

Restore initialization of %ecx for the !I586_CPU case.

Don't declare _i586_ctr_bias. The usual style, which was
followed in microtime.s, is to omit extern declarations.


11461 13-Oct-1995 wollman

Only compile Pentium microtime in Pentium kernels.

Submitted by: Michael Butler <imb@scgt.oz.au>


11453 12-Oct-1995 bde

Fix select().

Remove some unused code and never-working backwards compatibility code.

Add prototypes.
Reviewed by: babkin@hq.icb.chel.su (Serge Babkin)


11452 12-Oct-1995 wollman

Reduce jitter of Pentium microtime() implementation by letting the counter
free-run and doing a subtract in microtime() rather than resetting the
counter to zero at every clock tick. In combination with the changes to
kern_clock.c, this should eliminate all the immediately obvious sources
of systematic jitter in timekeeping on Pentium machines.


11401 10-Oct-1995 swallace

Remove the IBCS2 option for the socksys driver. A pointer to /dev/null
will work fine now.


11390 10-Oct-1995 bde

Include <sys/sysproto.h> so that machdep.c compiles cleanly again
(the prototype for sync() moved).

KNFize and otherwise clean up printing of BIOS geometries.

Add prototypes.

Continue cleaning up new init stuff.


11343 09-Oct-1995 bde

Fix tracing of syscalls. The previous fix required the undocumented
option DDB_NO_LCALLS to stop ddb getting control and broke all ddb
tracing. Now there is no option and no way for ddb to trace at
address _Xsyscall or to _Xsyscall, but tracing everywhere else
works. The previous fix did unnecessary things for Linux syscalls.

Don't bother checking that syscall frames are for user mode.

Make debugger traps inside the kernel (except at addresses _Xsyscall
and _Xsyscall+1) fatal if ddb is not configured. They "can't happen".

Add prototypes.

Remove stupid comments, e.g., /*ARGSUSED*/ for args that are used.


11255 06-Oct-1995 jkh

matcd -> matcdc


11222 05-Oct-1995 phk

remove GCC divsi3 routines which are never used.


11163 04-Oct-1995 julian

Submitted by: Juergen Lock <nox@jelal.hb.north.de>
Obtained from: other people on the net ?

1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.

2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...


11147 03-Oct-1995 wpaul

Make swapgeneric.c compile and (sort of) work again.

This is really just a token gesture at this point. We can't really turn
on swapping here anymore because (I think) we need to do a namei() lookup
in order to get a vnode which we can pass to swaponvp(). But to do a
namei(), we need a properly initialized proc structure, and we don't
have one. (I was hopping to toss it proc0, but that isn't initialized
until later.) We could conceiveably fabricate a dummy proc structure
to make namei() happy, but that's too much work for too little payoff.

For now, swapgeneric's only remaining saving grace is that it lets
you set the root device with -a.


11116 01-Oct-1995 dg

Insert zeroed pages at the head of the zero queue rather than at the tail.
A measurable performance improvement results from the potential for the
page to be partially cached when it is eventually used.


10924 20-Sep-1995 dg

Fix rounding bug in last commit that would have caused the problem to not
be completely fixed.


10921 20-Sep-1995 jkh

Add `visual' command to help menu.
Fix a small benign display bug.
Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>


10913 20-Sep-1995 jkh

Now supports `expand all' and `collapse all'.
Now comes up in the old line-oriented interface by default for serial
and pcvt folk with a `visual' command for going to the visual interface.
Best of both worlds, no?
Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>


10826 16-Sep-1995 pst

Our existing Cyrix cache-disable code was short-cutting the steps for
setting the control register. Make the read and write operations two
completely separate steps.

While we're at it, pull in the whole set of Cyrix cache control options
from NetBSD-current, since a few motherboards do the right thing with
the Cyrix chip.

There is no option to disable the internal cache completely (yet).

Reviewed by: pst
Obtained from: NetBSD


10812 15-Sep-1995 dg

Check for page being resident when doing I/O with /dev/kmem and return
EFAULT if it is not resident. This prevents the system from manufacturing
a zero-fill page for unused but allocated areas of the kernel's VM. This
should fix the "CMAP busy" panic that some people saw during system
startup.


10782 15-Sep-1995 dg

1) Killed 'BSDVM_COMPAT'.
2) Killed i386pagesperpage as it is not used by anything.
3) Fixed benign miscalculations in pmap_bootstrap().
4) Moved allocation of ISA DMA memory to machdep.c.
5) Removed bogus vm_map_find()'s in pmap_init() - the entire range was
already allocated kmem_init().
6) Added some comments.

virual_avail is still miscalculated NKPT*NBPG too large, but in order to
fix this properly requires moving the variable initialization into locore.s.
Some other day.


10762 15-Sep-1995 dg

1) Don't double map the kernel page tables. The double mapping was never
used and went a long way toward confusing the code.
2) Fix proc0's initial stack to not be 48 bytes smaller than it needs to
be.
3) Correct comment about 'first' arg to init386().


10666 10-Sep-1995 bde

Make pcvt and syscons live in the same kernel. If both are enabled, then
the first one in the config has priority. They can be switched using
userconfig().

i386/i386/conf.c:
Initialize the shared syscons/pcvt cdevsw entry to `nx'.

Add cdevsw registration functions.

Use devsw functions of the correct type if they exist.

i386/i386/cons.c:
Add renamed syscons entry points to constab.

i386/i386/cons.h:
Declare the renamed syscons entry points.

i386/i386/machdep.c:
Repeat console initialization after userconfig() in case the current
console has become wrong. This depends on cn functions not wiring down
anything important.

sys/conf.h:
Declare new functions.

i386/isa/isa.[ch]:
Add a function to decide which display driver has priority. Should be
done better.

i386/isa/syscons.c:
Rename pccn* -> sccn*.

Initialize CRTC start address in case the previous driver has moved it.

i386/isa/syscons.c, i386/isa/pcvt/*
Initialize the bogusly shared variable Crtat dynamically in case the
stored value was changed by the previous driver.

Initialize cdevsw table from a template.

Don't grab the console if another display driver has priority.

i386/isa/syscons.h, i386/isa/pcvt/pcvt_hdr.h:
Don't externally declare now-static cdevsw functions.

i386/isa/pcvt/pcvt_hdr.h:
Set the sensitive hardware flag so that pcvt doesn't always have lower
priority than syscons. This also fixes the "stupid" detection of the
display after filling the display with text.

i386/isa/pcvt/pcvt_out.c:
Don't be confused the off-screen cursor offset 0xffff set by syscons.

kern/subr_xxx.c:
Add enough nxio/nodev/null devsw functions of the correct type for syscons
and pcvt.


10665 10-Sep-1995 bde

cons.c:
Split off cdevsw initialization in cninit() into a new function
cninit_finish() that isn't called until all hardware device drivers
have been attached. The bdevsw entry of the driver for the physical
console needs to be hooked after the physical driver has been
attached in case the attachment modified the entry.

Rearrange cninit() to avoid changing cn_tab until the driver for the
physical console has been initialized, so that the previous driver
(if any) can be used for debugging.

Start removing half-baked lint support. bdevsw functions usually have
unused args but /*ARGSUSED*/ was used for only about 5% of them.

cons.h:
Declare cn_init_finish().

autoconf.c:
Call cn_init_finish().

Start adding prototypes. Functions with bogus linkage (extern where
static is probably should be static) are explicitly declared as extern
so that the can be found easily (extern in a non-header is usually
wrong).

All:
Continue cleaning up init stuff: init functions shall be static;
INITs should be at the start of files...


10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


10624 08-Sep-1995 bde

Fix benign type mismatches in devsw functions. 82 out of 299 devsw
functions were wrong.


10619 08-Sep-1995 julian

DAng! forgot this as well
cdevsw entry (71) for the asc scanner driver


10616 08-Sep-1995 dg

1) Really print 'real' memory - use Maxmem, not physmem.
2) Output K bytes instead of pages as this means something to more people.
3) Moved printf of avail memory to after vm_bounce_init() call so that
bounce buffers are included in the figure.
4) Killed initcpu(); it's an unused vestige from the VAX.


10612 07-Sep-1995 jkh

Render devices inactive properly.
Protect against name overflow.
Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>


10609 07-Sep-1995 dg

Minor cleanup and (very) small micro optimization to Xsyscall (and the
linux one)..


10594 06-Sep-1995 wpaul

Put back the "real memory =" printf() that vanished when the code to
handle holes in memory was added.


10573 06-Sep-1995 jkh

New version of userconfig from Michael Smith. Give this one a try,
folks! Let him know what you think.
Submitted by: Michael Smith <msmith@atrad.adelaide.edu.au>


10550 03-Sep-1995 jkh

Now apply local changes to make this compile in 2.2-current (sorry about
delay between commit of original author's sources and my changes - I was
called away).


10546 03-Sep-1995 dyson

Machine dependent routines to support pre-zeroed free pages. This
significantly improves demand zero performance.


10540 03-Sep-1995 jkh

Bring the Digiboard driver (ALPHA version) into -current. Includes
latest patches for PC/Xe boards.
Submitted by: "Serge A. Babkin" <babkin@hq.icb.chel.su>


10537 03-Sep-1995 julian

devfs changes..
changes to allow devices that don't probe (e.g. /dev/mem)
to create devfs entries
this required giving 'configure' its own SYSINIT entry
so we could duck in just before it with a DEVFS init
and some device inits..
my devfs now looks like:
./misc
./misc/speaker
./misc/mem
./misc/kmem
./misc/null
./misc/zero
./misc/io
./misc/console
./misc/pcaudio
./misc/pcaudioctl
./disks
./disks/rfloppy
./disks/rfloppy/fd0.1440
./disks/rfloppy/fd1.1200
./disks/floppy
./disks/floppy/fd0.1440
./disks/floppy/fd1.1200
also some sligt cleanups.. DEVFS needs a lot of work
but I'm getting back to it..


10431 30-Aug-1995 bde

Declare vfs_mountroot() in the right place.


10425 29-Aug-1995 bde

Remove relocation of Crtat. Drivers already relocate it (somewhat
bogusly). We used to undo the driver relocation here before doing
a somewhat less bogus relocation. The result was a null relocation
here.


10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


10268 25-Aug-1995 bde

Remove extra args from the calls to getit(). The bug was benign with the
default function call convention.


10157 21-Aug-1995 dg

A couple of micro optimizations to improve NULL syscall performance by
about 2%.


10126 20-Aug-1995 dg

Fixed a few bugs and annoyances with boot():

1) deal with cold flag better
2) check for key input more often
3) get rid of unused variables
4) minor formatting improvements


10109 19-Aug-1995 joerg

First part of importing the Japanese `od' driver.

Claim the major numbers (before sombedoy else jumps in again and
claims the slots for his foocd driver :-), install all the hooks that
are required.

While i've been at this, i've cleaned up some of the routines at the
end of i386/conf.c; all the importers of the latest CDROM drivers
forgot to fill in the appropriate information. The `ata' driver
(vapourware?) does only occupy a slot in the bdevsw[] array, btw.

The actual import of the code does require a minor change in the SCSI
subsystem, and i want to have this reviewed by Peter first, so it will
be deferred for some days. The driver is already working for me
though.

Submitted by: akiyama@kme.mei.co.jp (Shunsuke Akiyama)


10097 18-Aug-1995 jkh

Bring in Serge Vakulenko's IDE CDROM (ATAPI) driver. A number of
people have now indicated to me that it's working more than well
enough to bring into -current.
Submitted by: Serge Vakulenko <vak@cronyx.ru>


10092 17-Aug-1995 dg

Killed some unused stuff inherited from Bill Jolitz. Note that since
this changes the size of the pcb struct, gdb will need to be rebuilt
or debugging won't work correctly.

Reviewed by: Bruce Evans


10077 16-Aug-1995 jkh

Remove the fundamentally useless probe and attach commands I added.
They're landmines.


10063 15-Aug-1995 bde

Fake a call frame for traps so that `gdb -k' can report where fatal
traps occurred. This also helps ddb backtrace through trap frames.
Backtracing through syscall and interrupt frames still doesn't work
but it is relatively unimportant and more expensive to fix.


9940 05-Aug-1995 peter

Grab next major (68) for the Specialix SI/XIO driver which is due to
come in RSN. As Jordan said "First in, first served.."


9831 31-Jul-1995 jkh

Reserve space for Jim Lowe's impending Matrox Meteor card driver.
Submitted by: james


9799 30-Jul-1995 dg

Fix a bug in my disabled version of trap_pfault()...curpcb may be NULL even
when curproc isn't. This condition occurs at system startup and perhaps
at other times.


9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


9758 29-Jul-1995 bde

Removed the poorly named unused bogus common variable `total'. It was
obsoleted by the VM_METER sysctl a long time ago.


9744 28-Jul-1995 dg

Fixed bug I introduced with the memory-size code rewrite that broke
floppy DMA buffers...use avail_start not "first". Removed duplicate
(and wrong) declaration of phys_avail[].

Submitted by: Bruce Evans, but fixed differently by me.


9638 22-Jul-1995 bde

Remove vat_audio driver support here too.

Clean up formatting for reserved and Talisman driver entries.

Fix d_select entry for unused driver.


9578 19-Jul-1995 dg

Rewrote memory sizing code to generally deal with holes in extended memory.
This code change should allow certain Compaq machines with a 128K hole
at 16MB to work.


9551 16-Jul-1995 dg

Added crdselect definition for !NCRD case.


9550 16-Jul-1995 peter

This fixes a compiler warning, and a cosmetic problem with the linux
emul code when compiling with "options KTRACE".
ktrsyscall() was expecting an array of integers, this was passing the
address of a structure containing an array of integers..
The cosmetic problem was that it was calling the "enter syscall"
trace hook twice - this looks like a cut/paste error/typo.


9547 16-Jul-1995 phk

Reviewed by: phk
Submitted by: Andrew McRae <andrew@mega.com.au>

Some initial commits from the pcmcia stuff, to make life easier for the
testers.

We will use the name "pccard" since that is really the buzzword at present.


9546 16-Jul-1995 phk

Make the bootinfo structure visible from sysctl.
This can be used in libdisk to guess a better bios-geometry.


9545 16-Jul-1995 joerg

Include ``options POWERFAIL_NMI'' for owners of older (non-apm)
notebooks where a powerfail condition (external power drop; battery
state low) is signalled by an NMI. Makes it beep instead of panicing.

Reviewed by: davidg


9533 16-Jul-1995 dg

Truncate the fault address to a page boundry when calling vm_fault(). The
last change to fix the fault-twice bug with page tables wasn't quite
complete.


9524 14-Jul-1995 dg

Fixed bug that caused page tables to be faulted twice instead of once.

Submitted by: John Dyson


9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


9482 11-Jul-1995 bde

Enable pcvt in LINT and don't generate a compile time error if syscons
and pcvt are both configured when LINT is defined. There will be a
link time error instead. This is to test building of pcvt more often.


9345 28-Jun-1995 dg

Killed redundant vnode_pager_umount() call. This is already done at
FS unmount time.


9344 28-Jun-1995 dg

Make path to kernel absolute if it is passed in relative. This fixes
a related bug in some of the new 'foo'boot bootstrap code that has been
added over the past months. This change makes it no longer necessary
for the bootstrap to fix up the path (i.e. it can be removed).


9326 26-Jun-1995 bde

Partially fix `sysctl machdep.console_device'. The fix will be complete
when syscons stops mapping the console to minor MAXCONS. There is
usually no corresponding device in /dev, and the correct device has
minor 0.

cons.c:
Initialize cn_tty properly, so that CPU_CONSDEV can work.
Comment about too many variants of the console tty pointer.

machdep.c:
Return device NODEV and not error EFAULT when there is no console device.


9310 25-Jun-1995 joerg

Add a `reset' command to UserConfig. Our documentation does
explicitly advise the users to reset the machine in case they have
done bogus things (to prevent `dset' from merging the changes into
/kernel), and it's also useful for machines with serial consoles that
are physically in another place.


9218 14-Jun-1995 bde

Replace \n\r by \n in error messages.


9217 14-Jun-1995 bde

Output \n as \r\n, not as \n\r.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8833 29-May-1995 dg

Fix setdumpdev():
- the major number wasn't checked, so accesses beyond the end of bdevsw[]
were possible. Bogus major numbers are easy to get because `sysctl -w'
doesn't handle dev_t's reasonably - it doesn't convert names to dev_t's
and it converts the number 1025 to the dev_t 0x35323031.
- Driver d_psize() functions return -1 to indicate error ENXIO or ENODEV
(the interface is too braindamaged to say which). -1 was interpreted
as a size and resulted in the bogus error ENOSPC.
- it was possible to set the dumpdev for devices without a d_psize()
function. This is equivalent to setting the dumpdev to NODEV except
it confuses sysctl.
- change a 512 to DEV_BSIZE. There is an official macro dtoc() for
converting "pages" to disk blocks but it is never used in /usr/src/sys.
There is much confusion between PAGE_SIZE sized pages and NBPG sized
pages. Maxmem consists of both.

Not fixed:
- there is nothing to invalidate the dumpdev if the media goes away.
This reduces the benefits of the early calculation of dumplo. Bounds
checking in the dump routines is relied on to reduce the risk of
damage and little would be lost by relying on the dump routines to
calculate dumplo.
- no attempt is made to stay away from the start of the device to
avoid clobbering labels.

Fix wrong && anachronistic comment about the type of bootdev.

Reviewed by: davidg
Submitted by: Bruce Evans


8748 25-May-1995 dg

Made "NMBCLUSTERS" calculation dynamic and fixed bogus use of "NMBCLUSTERS"
in machdep.c (it should use the global nmbclusters). Moved the calculation
of nmbclusters into conf/param.c (same place where nmbclusters has always
been assigned), and made the calculation include an extra amount based
on "maxusers". NMBCLUSTERS can still be overrided in the kernel config
file as always, but this change will make that generally unnecessary. This
fixes the "bug" reports from people who have misconfigured kernels seeing
the network hang when the mbuf cluster pool runs out.

Reviewed by: John Dyson


8590 18-May-1995 dg

Added "BROKEN_KEYBOARD_RESET" option to disable using the keyboard reset
in cpu_reset(). Some MBs don't deal with this properly.

Submitted by: Rod Grimes


8504 14-May-1995 dg

Changed swap partition handling/allocation so that it doesn't
require specific partitions be mentioned in the kernel config
file ("swap on foo" is now obsolete).

From Poul-Henning:

The visible effect is this:

As default, unless
options "NSWAPDEV=23"
is in your config, you will have four swap-devices.
You can swapon(2) any block device you feel like, it doesn't have
to be in the kernel config.

There is a performance/resource win available by getting the NSWAPDEV right
(but only if you have just one swap-device ??), but using that as default
would be too restrictive.

The invisible effect is that:

Swap-handling disappears from the $arch part of the kernel.
It gets a lot simpler (-145 lines) and cleaner.

Reviewed by: John Dyson, David Greenman
Submitted by: Poul-Henning Kamp, with minor changes by me.


8481 12-May-1995 wollman

The death of `options NODUMP'. Now the dump area can be dynamically
configured (and unconfigured) on the fly. A sysctl(3) MIB variable is
provided to inspect and modify the dump device setting.


8473 12-May-1995 wpaul

- Add an entry to allow swapping on a vn device (if one is configured).
- Do the right thing when booting in NFS diskless mode, which is nothing.
Make the default unconfigured entries for swdevt[0] and dumplo something
that swapconf() will ignore and not choke on (the swap setup is done
in nfs_vfsops.c when booting diskless).


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


8448 11-May-1995 bde

Add variable `idelayed' and macros setdelayed() and schedsofttty()
to access it. setdelayed() actually ORs the bits in `idelayed' into
`ipending' and clears `idelayed'.

Call setdelayed() every (normal) clock tick to convert delayed
interrupts into pending ones.

Drivers can set bits in `idelayed' at any time to schedule an interrupt
at the next clock tick. This is more efficient than calling timeout().
Currently only software interrupts can be scheduled.


8433 11-May-1995 wpaul

If you config a kernel with 'config kernel swap generic' and try to
boot diskless with it, you get a panic because setconf() is only
called for mountroot == ffs_mountroot. It really needs to be called
no matter what manner of rootfs we have. I can't really say if
swapgeneric will work with a CD-ROM though. (I get the feeling I'm
the only one who uses swapgeneric these days anyway.)


8427 11-May-1995 wollman

Delete two debugging printfs that mistakenly crept in.


8426 11-May-1995 wollman

Make networking domains drop-ins, through the magic of GNU ld. (Some day,
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.


8349 08-May-1995 jkh

The value -1 is special. Allow it.
Submitted by: bde


8330 07-May-1995 jkh

Add additional check for IRQ > 15. This code still needs a lot of work!
Remove silly "Naffy, the Wonder Porpoise" attribution and add more
justifiable (and overdue) attribution to Bruce Evans. Look at it
as a delete and add operation batched together, not a substitution. :-)


8329 07-May-1995 jkh

Duh! Get the section number for config(8) right! :)


8328 07-May-1995 jkh

If user specifies IRQ 2, remap it to IRQ 9 with a warning.
Suggested by: rgrimes


8214 02-May-1995 dg

Added a memcpy() routine.


8211 01-May-1995 dyson

Fixed a problem that can cause left-over pv_entries and as
as side-effect, removed some legacy code that was necessary
when we called vm_fault inside of vm_fault_quick instead of using
the kernel/user space byte move routines.


8114 28-Apr-1995 dufault

Add National Instruments "LabPC" driver


8074 26-Apr-1995 rgrimes

Add outb to keyboard controller to do a cpu_reset, this fixes 2 known
cases of motherboards that failed to reboot.


8055 25-Apr-1995 phk

Add support for MFS root filesystem.


8047 24-Apr-1995 bde

Undo the move of `#include "sc.h"' etc. to cons.h. It broke anything
that includes <machine/cons.h>.


8023 23-Apr-1995 bde

Declare the console switch functions completely.

Move declarations of console functions to cons.h (they should be
config(8)ed).


8015 23-Apr-1995 julian

hmm spotted a difference resulting from a merge I didn't examine close enough


8014 23-Apr-1995 julian

include hooks for EISA configuration (possibly wrong :)


8007 23-Apr-1995 phk

Forgot this commit the other day. The receiving end of the "boot -C" option.


7994 22-Apr-1995 wpaul

Tiny printf formatting change: if we have no cpu_vendor or cpu_id info,
don't generate a newline. (Yeah, I'm picking nits, but that empty line
I get on my 386 just looks dumb, okay? :)


7930 18-Apr-1995 rgrimes

Reapply my fix for this:
Output the CPU features line during the probe on a seperate line, for
folks with lots of features the output use to wrap and look ugle.


7908 17-Apr-1995 phk

Print the BIOS geometries in a human-readable format.


7874 16-Apr-1995 dg

Remove gratuitous waste of 2K of memory for BIOS variables. We never load
the kernel at 0-640k; we haven't had the ability to do that since before
2.0R. Furthermore, I fail to see how putting an instruction at 0 and then
doing a .org 0x500 is going to prevent the stuff from getting clobbered
in the first place; a.out is just too stupid to know about sparse address
spaces.


7818 14-Apr-1995 dufault

Add scsi target. Add "after config" call to autoconf so that scsi
targets will be configured after all scsi busses have been configured.


7814 14-Apr-1995 wpaul

Hopefully I won't get flamed for this: insert a few more #if defined(I486_CPU)
and #if defined (I586_CPU) thingies into identifycpu() so that we only
compile in what's actually needed for a given CPU. So far as I can tell,
none of my 386 machines generate a cpu_vendor code, so I made the extra vendor
and feature line conditional on I486_CPU and I586_CPU. (Otherwise we
print out a blank line which looks silly.)


7792 13-Apr-1995 wpaul

This a subtle reminder to people that not everybody compiles their
kernels with 'options I586_CPU.'

The declaration for pentium_mhz is hidden inside an #ifdef I586_CPU,
but machdep.c refers to it whether I586_CPU is defined or not. This
temporary hack puts the offending code inside an #ifdef I586_CPU as
well so that a kernel without it will successfully compile.

I must emphasize the word 'temporary:' somebody needs to seriously
beat on the identifycpu() function with an #ifdef stick so that
I386_CPU, I486_CPU and I586_CPU will do the right things.


7780 12-Apr-1995 wollman

Add a class field to devconf and mst drivers.
For those where it was easy, drivers were also fixed to call
dev_attach() during probe rather than attach (in keeping with the
new design articulated in a mail message five months ago). For
a few that were really easy, correct state tracking was added as well.
The `fd' driver was fixed to correctly fill in the description.
The CPU identify code was fixed to attach a `cpu' device. The code
was also massively reordered to fill in cpu_model with somethingremotely
resembling what identifycpu() prints out. A few bytes saved by using
%b to format the features list rather than lots of ifs.


7748 10-Apr-1995 wollman

Define tuncdev for the benefit of tunnel LKM so that it knows which
device slot to take.


7731 10-Apr-1995 phk

Changes to make FreeBSD use a CDROM as rootdev, for installation purposes.
If "BOOTCDROM" is defined, you get this pretty special case stuff.


7717 09-Apr-1995 jkh

This is the new submission of the matcd driver. In addition to the
new driver code, there are diffs to several other existing files
on the system and a man page.

This version of matcd implements the rest of the key ioctls related to
playing audio CDs and reading table of contents information from any
type of disc.

This update also corrects several problems detected since the original
version 1(10) was released. These include:
1. Jordons report on the kernel -c string problem.
2. A problem with the driver being confused by other types of
devices located at addresses it probes.
3. An old CD TOC wouldn't always be cleared after a disc change.
4. Cleaned up code so -Wall yields no warnings on 2.0 and later.
5. A problem with drive getting out of sync with the driver when
changing between CD-Data and CD-DA.

There have only been two reports from the field relating to problems
so either the first release isn't really being used or doesn't have
many problems.

If there are any problems with this submission, please let me know.

Submitted by: Frank Durda IV <uhclem%nemesis@fw.ast.com>


7693 09-Apr-1995 dg

Cosmetic changes.


7680 08-Apr-1995 joerg

Implement a simple hook (or hack?) to allow graphics device console
drivers to protect DDB from being invoked while the console is in
process-controlled (i.e., graphics) mode.

Implement the logic to use this hook from within pcvt. (I'm sure
Søren will do the syscons part RSN).

I've still got one occasion where the system stalled, but my attempts
to trigger the situation artificially resulted int the expected
behaviour. It's hard to track bugs without the console and DDB
available. :-/


7644 06-Apr-1995 rgrimes

Output the CPU features line during the probe on a seperate line, for
folks with lots of features the output use to wrap and look ugle.

Reviewed by: phk


7603 03-Apr-1995 wpaul

Fix implicit declaration the right way. (This has not been a good
weekend for me.)


7602 02-Apr-1995 wpaul

Fix implicit declaration of cngetc().


7588 02-Apr-1995 joerg

Attempt to fix the `you can log into console only once' problem (PR
#179). The fix implements a ttyhalfclose() (sort of), resetting the
session and pgrp pointers when the physical device is about to be
closed.

Suggested by: bde


7490 30-Mar-1995 dg

Made pmap_testbit a static function.


7447 28-Mar-1995 sos

Added ata device....

Not that it is ready yet, but I'm getting there eventually...


7430 28-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) that I didn't notice when I fixed
"all" such warnings before.


7419 27-Mar-1995 ache

Add hooks for upcoming riscom/8 driver


7402 26-Mar-1995 dg

Changed pmap_changebit() into a static function as it always should
have been.

Submitted by: John Dyson


7214 21-Mar-1995 dg

Added a new version of trap_pfault() that disallows kernel page faults
to the user address space unless pcb_onfault is set. The code is currently
commented out because iBCS2 and process debugging parts of the kernel
need to be changed/fixed first.


7213 21-Mar-1995 dg

Changed some #ifdef DIAGNOSTIC code that I added to be #ifdef DEBUG.


7170 19-Mar-1995 dg

Removed redundant newlines that were in some panic strings.


7105 17-Mar-1995 phk

Remove a spurious printf.


7103 17-Mar-1995 dg

Call dev_shutdownall() just after unmounting filesystems.


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


7032 12-Mar-1995 ugen

Save all changed devices so thet dset would be able to retrive
data and device driver could change it as it wishes to..


6995 11-Mar-1995 phk

Moved bb stuff to support.s per Bruces suggestion.


6981 10-Mar-1995 phk

Add a dummy ___bb_init_func for BB profiling of the kernel.
To use this: recompile src/gnu/usr.bin/cc, compile your kernel. The
files you want to profile should be compiled with '-a -g'. "strip -x"
the kernel and run. You don't need to profile all files in the kernel.
My next commit is the program to extract the data from the running kernel.


6978 10-Mar-1995 dg

kmem_alloc() returns zero-filled memory; it isn't necessary to bzero()
it.


6977 10-Mar-1995 dg

Removed unnecessary routines vm_get_pmap() and vm_put_pmap().
kmem_alloc() returns zero filled memory, so no need to explicitly
bzero() it.


6949 07-Mar-1995 dg

Increased number of buffers to 1/12 of (page_count - 1024). This makes the
cache minimum closer to 10% in the usual case.


6913 05-Mar-1995 joerg

Make pcvt actually work.


6907 05-Mar-1995 jkh

Reserve cdev 63 for Brian Litzinger & his Talisman Lite MPEG decoder
Submitted by: Brian Litzinger <brian@MediaCity.com>


6891 04-Mar-1995 dufault

Add processor type and worm type drivers


6874 04-Mar-1995 dg

Removed obsolete vtrace() and reorganized a little.


6846 03-Mar-1995 dg

Use copyout to install the sigframe rather than directly writing to the
user's stack.


6845 02-Mar-1995 wpaul

Make 'config [kernel] swap generic' work again. It would seem that
'matcd' has taken over the position and major number of 'pcd,' so
swapgeneric.c needs to be updated accordingly. (It was still looking
for pcd.h.)


6838 02-Mar-1995 dufault

Fix it so that systems with NSCBUS==0 don't have undefineds.


6830 02-Mar-1995 jkh

Move the matcd includes to the right place.


6828 02-Mar-1995 jkh

Fix up the matcd import.


6820 02-Mar-1995 jkh

Changes to incorporate the Matsushita CDROM driver (otherwise known as
the "Sound blaster CDROM").
Submitted by: Frank Durda IV <bsdmail@nemesis.lonestar.org>


6817 01-Mar-1995 dg

Use su/fubyte instead of directly touching the user's address space.


6811 01-Mar-1995 dufault

1. Added a "scsi" command to userconfig as a start
2. "uk" is no longer an option.


6807 01-Mar-1995 dg

Various changes from John and myself that do the following:

New functions create - vm_object_pip_wakeup and pagedaemon_wakeup that
are used to reduce the actual number of wakeups.
New function vm_page_protect which is used in conjuction with some new
page flags to reduce the number of calls to pmap_page_protect.
Minor changes to reduce unnecessary spl nesting.
Rewrote vm_page_alloc() to improve readability.
Various other mostly cosmetic changes.


6806 01-Mar-1995 dg

Slight change to include file order to accommodate upcoming changes.


6782 28-Feb-1995 pst

Incorporate bde's code-review comments.

(a) bring back ttselect, now that we have xxxdevtotty() it isn't dangerous.
(b) remove all of the wrappers that have been replaced by ttselect
(c) fix formatting in syscons.c and definition in syscons.h
(d) add cxdevtotty

NOT DONE:
(e) make pcvt work... it was already broken...when someone fixes pcvt to
link properly, just rename get_pccons to xxxdevtotty and we're done


6773 27-Feb-1995 ugen

same..


6765 27-Feb-1995 ugen

Remove strncmp() fro userconfig,it's
already in libkern..if there is any other place
strncmp() used,remove it there guys..


6734 26-Feb-1995 bde

Replace all remaining instances of `i386/include' by `machine' and fix
nearby #include inconsistencies.


6731 26-Feb-1995 bde

Eliminate my private type `bool_t'.


6712 25-Feb-1995 pst

(a) remove the pointer to each driver's tty structure array from cdevsw
(b) add a function callback vector to tty drivers that will return a pointer
to a valid tty structure based upon a dev_t
(c) make syscons structures the same size whether or not APM is enabled so
utilities don't crash if NAPM changes (and make the damn kernel compile!)
(d) rewrite /dev/snp ioctl interface so that it is device driver and i386
independant


6592 21-Feb-1995 jkh

Totally get rid of the snic driver.
Submitted by: davidg


6579 20-Feb-1995 dg

Use of vm_allocate() and vm_deallocate() has been deprecated.


6547 18-Feb-1995 wpaul

Do away with 'options SWAP_GENERIC' once and for all: I get ill
just thinking about it.

Two changes need to be made to allow 'config kernel swap generic' to
work properly without requiring any compile-time flags:

/usr/src/usr.sbin/config/mkswapconf.c: we need to define a dummy stub
for the setconf() function to replace the one in swapgeneric.c that
isn't available in non-generic configurations.

/usr/src/sys/i386/i386/autoconf.c: the -a boot flag causes setroot()
to be skipped and lets setconf() prompt the user for a root device.
If you skip setroot() in a non-generic kernel, you could get severely
hosed. To avoid this, we silently ignore the -a flag if rootdev != NODEV.
(rootdev is always initialized to NODEV in swapgeneric.c, so if
we find that rootdev is something other than NODEV, we know we're
not using a generic configuration.)


6512 17-Feb-1995 phk

This is the latest version of the APM stuff from HOSOKAWA, I have looked
briefly over it, and see some serious architectural issues in this stuff.

On the other hand, I doubt that we will have any solution to these issues
before 2.1, so we might as well leave this in.

Most of the stuff is bracketed by #ifdef's so it shouldn't matter too much
in the normal case.

Reviewed by: phk
Submitted by: HOSOKAWA, Tatsumi <hosokawa@mt.cs.keio.ac.jp>


6439 15-Feb-1995 dg

Use proc0's proc struct rather than curproc's when calling sync.


6438 15-Feb-1995 jkh

Whap a few things not used in the Cyclades driver, at least not for now.


6434 15-Feb-1995 jkh

Quiet the last of the warnings. Must have had my eyes closed when
I committed this! ;-(


6426 15-Feb-1995 jkh

Boy, no sooner do I clean it up then dirty it again! Clean up the enxio
stuff I should have caught earlier.


6423 15-Feb-1995 dg

Killed the pmap_use_pt and pmap_unuse_pt prototypes as they are now in
machine/pmap.h.


6397 14-Feb-1995 jkh

Bring in the ISDN driver entries.


6389 14-Feb-1995 ugen

cdevsw[] entry for snoop device.
I took 53 and it was too late when Jordan sayd
it better been 60..besides i have no clue how to make it 60..
Jordan- pleeease don't kill me..(This is also useful(??) device..)


6380 14-Feb-1995 sos

First attempt to run linux binaries. This is only the changes needed to
the generic kernel. The actual emulator is a separate LKM. (not finished
yet, sorry).
Submitted by: sos@freebsd.org & sef@kithrup.com


6377 14-Feb-1995 phk

Removed a YF comment.


6354 14-Feb-1995 phk

Yves has sent us a ~600 Kb patch, which shuts up gcc entirely for the
entire kernel.
Unfortunately we didn't send him a copy of the style guide before he did it.
I'm trying to find all the benign and downright sound bits and will commit
them without any other explanation than "YF fix" if they are merely cosmetic.

Reviewed by: phk
Submitted by: yves@dutncp8.tn.tudelft.nl (Yves Fonk)


6327 12-Feb-1995 dg

Carefully choose the low limit for number of buffers to acheive the best
performance on small memory machines.


6325 12-Feb-1995 dg

Fixed a bogus comment and made a stylistic change (testl instead of orl
to test for zero).


6312 11-Feb-1995 jkh

Remove the evil I perpetrated on this file in the name of the Cyclades driver,
now we're back to the old way. By way of amends, I cleaned up all the
casting evils and generally neated this file up as much as possible. It
still, however, needs to die.


6308 11-Feb-1995 phk

Intels App Note AP-485 applied.
We will now tell a good deal more about the CPU if Intel made it.

What is a i486DX2 Write-Back Enhanced CPU ?


6301 10-Feb-1995 dg

Changed extended memory test so that it's non-destructive and not a
complete test (it never was "complete", which is why it was bogus). Now
only a single longword is checked in each page.


6299 10-Feb-1995 dg

Removed obsolete and unused vmtime() function.


6297 10-Feb-1995 dg

Removed unnecessary check for pr_scale in the AST/OWEUPC case.


6296 10-Feb-1995 dg

Check P_PROFIL flag for profiling rather than pr_scale as it makes more
sense.


6272 09-Feb-1995 jkh

The whole NEW_CONF_C_SYNTAX was bogus; David's right, it can't be
optional at all. Make it non-optional.
Submitted by: davidg


6266 09-Feb-1995 jkh

Some scary macros from Heikki Suonsivu <hsu@cs.hut.fi>. Actually, they
make cdev entries look almost readable. If his stuff works for most of
the entries in here, it might be worth it to refit them all.


6219 06-Feb-1995 jkh

The very minimum driver required to support a Video Spigot. See the
copyright notices in the code for information on where to go to pick
up additional useful bits.
Submitted by: Jim Lowe <james@blatz.cs.uwm.edu>


6213 06-Feb-1995 jkh

Commit my userconfig stuff:

1. add iosize command, and show it in `ls'
2. add a probe command
3. add an attach command

[the latter 2 do the obvious thing - call the device's routine and print the
status returned].


6198 05-Feb-1995 jkh

This was wrong - PCVT and syscons don't share the same entrypoint
names.
Submitted by: mh


6126 02-Feb-1995 dg

Mostly cosmetic changes. Use KERNBASE instead of UPT_MAX_ADDRESS in
some comparisons as it is more correct (we want the kernel page tables
included).
Reorganized some of the expressions for efficiency.
Fixed the new pmap_prefault() routine - it would sometimes pick up the
wrong page if the page in the shadow was present but the page in object
was paged out. The routine remains unused and commented out, however.
Explicitly free zero reference count page tables (rather than waiting
for the pagedaemon to do it).

Submitted by: John Dyson


6105 01-Feb-1995 se

Reviewed by: se
Submitted by: wolf (Wolfgang Stanglmeier)
PCI specific code moved to /sys/pci.


6064 31-Jan-1995 amurai

Add Tunnel devcie for ppp (iijppp)


6008 29-Jan-1995 bde

Fix disassembly of `bt[crs] $Ib,E'.


5999 29-Jan-1995 ats

Correct a name of one structure member in the sigaltstack structure.
Now it matches the man page and also the only other commercial implementation
i have found so far ( Solaris 2.x).
Changed the name from ss_base to ss_sp.


5943 26-Jan-1995 dg

Fix from Doug Rabson for a panic related to not initializing the kernel's
PTD.

Submitted by: John Dyson


5916 26-Jan-1995 dg

Comment out pmap_prefault for the time being (perhaps until after 2.1).
The object_init_pt routine is still enabled and used, however, and this
is where most of the 'pre-faulting' performance improvement comes from.


5914 26-Jan-1995 dg

Make sure that the pages being 'pre-faulted' are currently on a queue.


5910 25-Jan-1995 dg

Be a bit less fast and loose about setting non-cacheablity of pages.


5908 25-Jan-1995 bde

Load the kernel symbol table in the boot loader and not at compile time.
(Boot with the -D flag if you want symbols.)

Make it easier to extend `struct bootinfo' without losing either forwards
or backwards compatibility.

ddb_aout.c:
Get the symbol table from wherever the loader put it.
Nuke db_symtab[SYMTAB_SPACE].

boot.c:
Enable loading of symbols. Align them on a page boundary. Add printfs
about the symbol table sizes.
Pass the memory sizes to the kernel.
Fix initialization of `unit' (it got moved out of the loop).
Fix adding the bss size (it got moved inside an ifdef).
Initialize serial port when RB_SERIAL is toggled on.
Fix comments.
Clean up formatting of recently added code.

io.c:
Clean up formatting of recently added code.

netboot/main.c, machdep.c, wd.c:
Change names of bootinfo fields.

LINT:
Nuke SYMTAB_SPACE.
Fix comment about DODUMP.

Makefile.i386:
Nuke use of dbsym.
Exclude gcc symbols from kernel unless compiling with -g.
Remove unused macro.
Fix comments and formatting.

genassym.c:
Generate defines for some new bootinfo fields. Change names of old ones.

locore.s:
Copy only the valid part of the `struct bootinfo' passed by the loader.
Reserve space for symbol table, if any.

machdep.c:
Check the memory sizes passed by the loader, if any. Don't use them yet.

bootinfo.h:
Add a size field so that we can resolve some mismatches between the loader
bootinfo and the kernel boot info. The version number is not so good for
this because of historical botches and because it's harder to maintain.
Add memory size and symbol table fields. Change the names of everything.

Hacks to save a few bytes:

asm.S, boot.c, boot2.S:
Replace `ouraddr' by `(BOOTSEG << 4)'.

boot.c:
Don't statically initialize `loadflags' to 0. Disable the "REDUNDANT"
code that skips the BIOS variables. Eliminate `total'. Combine some
more printfs.

boot.h, disk.c, io.c, table.c:
Move all statically initialzed data to table.c.

io.c:
Don't put the A20 gate bits in a variable.


5905 25-Jan-1995 jmz

Add entry for joystick


5837 24-Jan-1995 dg

Changed buffer allocation policy (machdep.c)
Moved various pmap 'bit' test/set functions back into real functions; gcc
generates better code at the expense of more of it. (pmap.c)
Fixed a deadlock problem with pv entry allocations (pmap.c)
Added a new, optional function 'pmap_prefault' that does clustered page
table preloading (pmap.c)
Changed the way that page tables are held onto (trap.c).

Submitted by: John Dyson


5805 23-Jan-1995 dg

Kill redundant declarations of d_open_t and d_close_t.


5796 23-Jan-1995 phk

Moved the typedefs of d_<foo>_t into sys/sys/conf.h and used them in
the definition of struct [cb]devsw. Guess Bruce never got around to
complete this (?)

Poul-Henning


5794 23-Jan-1995 phk

Nobody seems to need "mem_no" anymore, so I zapped it.


5771 21-Jan-1995 bde

Don't use mi_switch() to terminate cpu_exit(). Calling it just happened to
work (mi_switch() counted the last timeslice again but this didn't affect
the exiting process' rusage because the rusage has already been finalized).

Remove stale comment.


5770 21-Jan-1995 bde

Remove unused definitions of vm statistics counters. Most of the
counting is now done in C. There are still about 100 unused
definitions for other things.


5769 21-Jan-1995 bde

Don't count context switches here, they are already counted in mi_switch().


5764 21-Jan-1995 bde

Keep track of open devices better to avoid closing the console device when
the physical device is closed. Previously only the reverse case was handled.
Abuse the cdevsw interface instead of the vfs interface to do this.

Remove unnecessary #includes.


5747 20-Jan-1995 sos

Second round in syscons update:

Support for pseudo graphic mouse cursor (not complete yet)
Some cheap speed fixes.
More cleanups.
Call ourselves scxxxx finally.


5722 19-Jan-1995 ats

Submitted by: Bruce Evans
Put in the much shorter and cleaner version for the calibrate_cycle_counter
for the Pentium that Bruce suggested. Tested here on my Pentium and
it works okay.


5675 17-Jan-1995 bde

The %eflags checking introduced in the previous commit was too zealous.
sigreturn() sometimes failed for ordinary returns from signal handlers.
Failures of ordinary returns "can't happen" and are badly handled.
"Temporary" fix: allow users to corrupt PSL_RF. This is fairly
harmless. A correct fix would involve saving the old %eflags (and
perhaps the old segment registers) where the user can't get at them.


5642 15-Jan-1995 dg

Fixed some page table reference count problems; these changes may not be
complete, but should be closer to correct than before.


5603 14-Jan-1995 bde

Fix security holes in sigreturn(), ptrace() and procfs. sigreturn()
attempted to check for insecure and fatal eflags and segment
selectors, but missed many cases and got the IOPL check back to
front. The other syscalls didn't check at all.

sys_process.c, machdep.c:
Only allow PT_WRITE_U to write to the registers (ordinary and FP).

psl.h, locore.s, machdep.c:
Eliminate PSL_MBZ, PSL_MBO and PSL_USERCLR. We are not supposed
to assume anything about the reserved bits. Use PSL_USERCHANGE
and PSL_KERNEL instead. Rename PSL_USERSET to PSL_USER.

exception.s:
Define a private label for use by doreti when returning to user
mode fails.

machdep.c:
In syscalls, allow changing only the eflags that can be changed on
486's in user mode (no longer attempt to allow benign IOPL changes;
allow changing the nasty PSL_NT; don't allow changing the i586
bits).

Don't attempt to check all the cases involving invalid selectors
and %eip's. Just check for privilege violations and let the invalid
things cause a trap.

procfs_machdep.c:
Call the ptrace register functions to do all the work for reading
and writing ordinary registers and for single stepping.

trap.c:
Ignore traps caused by PSL_NT being set. Previously, users could
cause a fatal trap in user mode by setting PSL_NT and executing an
iret, and a fatal trap in kernel mode by setting PSL_NT and making
a syscall. PSL_NT was cleared too late and not in enough modes to
fix the problem.

Make all traps in user mode (except T_NMI) nonfatal.

Recover from traps caused by attempting to load invalid user
registers in doreti by restarting the traps so that they appear to
occur in user mode.
---

Fix bogons that I noticed while fixing the above:

psl.h:
Fix some comments.

Uniformize idempotency ifdef.

exception.s, machdep.c:
Remove rsvd[0-14]. rsvd0 hasn't been reserved since the 486 came
out. Replace rsvd0 by `align'. rsvd[0-11] used wrong (magic
non-unique) trap numbers. Replace rsvd[1-14] by rsvd.

locore.s:
Enable alignment check flag on 486's and 586's.

machdep.c:
Use a better type for kstack[].

Use TFREGP() to find the registers.

Reformat ptrace functions from SEF to something closer to KNF.

procfs_machdep.c:
The wrong pointer to the registers got fixed as a side effect.

Implement reading and writing of FP registers.

/proc/*/*regs now work (only) for processes that are in memory.

Clean up comments.

trap.c, trap.h:
Remove unused trap types.


5588 14-Jan-1995 bde

Eliminate T_KDBTRAP, which will soon go away. It was only used for an
unreachable case label in kdb_trap().

Use the correct case labels in kdb_trap() so that normal ddb entry doesn't
print a message.

Change all printf's to db_printf's. Now you can put a breakpoint at printf,
and ddb entry messages don't spam the syslog output.

Cosmetic:

Use ISPL() instead of magic numbers.

Don't compile the unused function kdb_kbd_trap().

Improve some asms.

Print the arg to Debugger().


5580 14-Jan-1995 dg

Add missing object_lock/unlock.


5516 11-Jan-1995 jkh

Change GENERIC to SWAP_GENERIC for now. I need the GENERIC kernel to
build by default again! When the furor subsides, maybe something better
can be done, but..


5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


5449 09-Jan-1995 phk

cdev #50 -> pcmcia


5440 08-Jan-1995 dufault

added ssc device


5434 07-Jan-1995 jkh

Fix indenting of gsc entry, reserve 48 for cyclades.


5431 07-Jan-1995 ats

Work around a compiler bug in gcc2.6.3 in handling (long long) variables and
shifting. Also correct the original code as Garrett noticed it in mail.
Leave the mishandled code in to use it later if future versions of gcc
are correct. The code was part of the calibrate_cyclecounter routine to
get the speed of the pentium chip.


5428 07-Jan-1995 jkh

Gunther Schadow <gusw@fub46.zedat.fu-berlin.de>'s
driver for the Genius GS-4500 hand scanner.
Submitted by: gusw@fub46.zedat.fu-berlin.de


5413 05-Jan-1995 se

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


5321 31-Dec-1994 jkh

From Bill Paul:

- /sys/i386/i386/swapgeneric.c is just plain broke. But fear not, for I
have unbroken it. One thing that swapgeneric.c does is walk through the
list of configured devices searching for a boot device. The only easy
way to accomplish this in 2.0 is to use Garret Wollman's kern_devconf
stuff. *BUT*, the head of the kern_devconf linked list (dc_list) is declared
static in /sys/kern/kern_devconf.c. This means that swapgeneric.c can't
see it at link time. I had to remove the 'static' keyword to get around
this little problem. I hope this doesn't break anything anywhere.

*Furthermore,* there's a small matter of making the call to setconf()
in swapgeneric.c disappear when 'config kernel swap generic' isn't used.
You could change /sbin/config to create a dummy setconf() function in
swapkernel.c, but that seems messy somehow. (It's also someting of an
'it isn't broken, why are you fixing it' situation.) My solution was to
do what the NetBSD people did and put an #ifdef GENERIC around the call
to setconf(). If your kernel is called GENERIC or you define 'options
GENERIC,' then you can use 'config kernel swap generic' and it'll work.

That aside, the upshot is that: a) swapgeneric.c actually works, and
and b) the -a boot flag now works as well. If you boot with -a, as in
"Boot: wd(0,a)/kernel -a" you will be presented with a 'root device?'
prompt after the autoconfig phase, at which point you can specify what
device you want mounted as root. Regrettably, you can't specify an NFS
filesystem. Yet. Three files are affected: /sys/i386/i386/swapgeneric.c,
/sys/i386/i386/autoconf.c and /sys/kern/kern_devconf.c.

Submitted by: wpaul


5291 30-Dec-1994 bde

icu.s:
Move definition of `stat_imask' to clock.c.

clock.c:
Rename `rtcmask' to `stat_imask' and export it. Rename `clkmask' to
`clk_imask' for consistency.

Only calculate TIMER_DIV(hz) once.

Merge debugging and "garbage" code to produce debugging code and format the
output better.

Make writertc() static inline and use it everywhere. Now all accesses to
the clock registers go through rtcin() and writertc().

Move rtc initialization to cpu_initclocks().

Merge enablertclock() with cpu_initclocks() and remove enablertclock().
The extra entry point was just a leftover from 1.1.5.


5229 25-Dec-1994 ats

Add entries for the sony and panasonic drives and drivers.


5220 24-Dec-1994 bde

Obtained from: 1.1.5

Fix single-stepping of emulated FPU instructions.

Don't panic if an FPU instruction is attempted but there is no FPU
and no FPU emulator is configured.


5161 18-Dec-1994 joerg

Ooops, i forgot one NVT > 0 in a previous commit. Now pcvt will also
work as the system's console.


5160 18-Dec-1994 joerg

Move the code providing the equivalent of ICRNL for console input from
the device driver(s) to cons.c.


5153 18-Dec-1994 dg

Move page_unhold's in pmap_object_init_pt down one line to gard against
a potential race condition.


5144 18-Dec-1994 dg

Check for PG_FAKE too in pmap_object_init_pt.


5134 17-Dec-1994 jkh

Add Fred Cawthorne's GPIB driver.
Submitted by: fcawth@delphi.umd.edu


5050 11-Dec-1994 bde

Move declaration of d_strategy_t to <sys/conf.h>.


5037 11-Dec-1994 dg

Removed inappropriate comment.


5036 11-Dec-1994 dg

Add additional comment.


5035 11-Dec-1994 dg

Fix bogus comment.


4962 04-Dec-1994 phk

Entries for the vn driver completed.


4948 04-Dec-1994 phk

Allocated chardev#43 and blkdev#15 to the vn-driver. The rest of the stuff
will come as a latter date, I just wanted to lay claim on some numbers.


4929 03-Dec-1994 bde

i386/exception.s,
Keep track of interrupt nesting level. It is normally 0
for syscalls and traps, but is fudged to 1 for their exit
processing in case they metamorphose into an interrupt
handler.

i386/genassym.c;
Remove support for the obsolete pcb_iml and pcb_cmap2.

Add support for pcb_inl.

i386/swtch.s:
Fudge the interrupt nesting level across context switches and in
the idle loop so that the work for preemptive context switches
gets counted as interrupt time, the work for voluntary context
switches gets counted mostly as system time (the part when
curproc == 0 gets counted as interrupt time), and only truly idle
time gets counted as idle time.

Remove obsolete support (commented out and otherwise) for pcb_iml.

Load curpcb just before curproc instead of just after so that
curpcb is always valid if curproc is. A few more changes like
this may fix tracing through context switches.

Remove obsolete function swtch_to_inactive().

include/cpu.h:
Use the new interrupt nesting level variable to implement a
non-fake CLF_INTR() so that accounting for the interrupt state
works.

You can use top, iostat or (best) an up to date systat to see
interrupt overheads. I see the expected huge interrupt overheads
for ISA devices (on a 486DX/33, about 55% for an IDE drive
transferring 1250K/sec and the same for a WD8013EBT network card
transferring 1100K/sec). The huge interrupt overheads for serial
devices are unfortunately normally invisible.

include/pcb.h:
Remove the obsolete pcb_iml and pcb_cmap2. Replace them by
padding to preserve binary compatibility.

Use part of the new padding for pcb_inl.

isa/icu.s:
isa/vector.s:
Keep track of interrupt nesting level.


4918 03-Dec-1994 wollman

Add Cronyx/Sigma cdevsw[] entry.


4835 27-Nov-1994 joerg

Temporary kludge: treat \r same as \n in input, so working on a
comconsole will behave as expected. The true problem should be fixed
instead, Bruce' comment for this:

>Anyway, i found the reason for my problems: somehow, ICRNL isn't in
>effect at `userconfig' time (but only for comconsole?), hence only

ICRNL doesn't apply to cngetc(). cnputc() unconditionally does the
equivalent of ONLCR; perhaps cngetc() should unconditionally do the
equivalent of ICRNL. Ddb must be checking for CR. Userconfig only
checks for NL. Userconfig works with syscons because pccngetc()
does the conversion. This is probably the wrong place to do it.


4829 27-Nov-1994 phk

I made a syntax error yesterday.
Submitted by: John Capo


4819 26-Nov-1994 phk

Set the bootverbose if so desired.
if (bootverbose)
Print the geometries the bios passes to us (through the bootblocks).


4600 18-Nov-1994 phk

Grap the bootinfo structure the bootblock passes us.


4517 16-Nov-1994 dg

Allow MAXMEM to be larger than the detected physical memory. This change
was supposed to have already been made, but got botched somewhere.
Don't clobber the last page of memory (where the message buffer is). Some
BIOS don't gratuitously wipe it out on reboot.


4501 15-Nov-1994 bde

Make gdt_segs[] public again for APM.

Make ldt[] public again and restore currentldt and _default_ldt for
USER_LDT.


4476 14-Nov-1994 bde

Oops, the previous commit got the diff for the log message instead of
the following.

Move declarations to and from <machine/segments.h>. Make segment stuff
static if possible.

Remove unused (although initialized) global variables _default_ldt,
currentldt, _gsel_tss (rename the latter to the auto variable
gtel_tss).

Use "correct" and consistent types for interrupt handlers.

Remove a mailing address from the code.

Fix type mismatches found by adding prototypes.


4475 14-Nov-1994 bde

(Bogus several hundred line diff for a log message deleted. See rev 1.91
for the intended log message. -DG)


4463 14-Nov-1994 bde

Undo a previous change. <sys/disklabel.h> was broken, not these files.


4448 14-Nov-1994 bde

Replace strtol() by strtoul() so that "negative" addresses and masks don't
get truncated to LONG_MAX. Don't lobotomize the merged library source.

Make all private data static.

Use int_parms for the i/o "address" since the "address" is really a number
and is represented as an int.

Add command `flags' to allow changing device flags.

Fix scrolling of device listing. Only scrolling of the current devtab
was controlled. Reprint the header after scrolling.

Rename commands and change strings to match their config(8) keywords:

io -> port
IOaddr -> port
mem -> iomem (abbreviation is io :-()
MemAddr -> iomem
case changes

Don't use NULL for ASCII NUL.

Call strtoul() with base 0 for both numbers and addresses so that input
is consistent and hex and octal can be used for numbers.

Fix entry of irq number. Check the range at no extra cost. It wasn't
possible to enter irq -1.

Format device listing better. Large numbers (such as 0xffffffff for the
GENERIC lpt0 port) messed up the formatting.

Show the unit number in the device listing. Comment about the fields
that aren't shown.


4416 13-Nov-1994 jkh

I *almost* had it right. Skip over the rest of the command if it's
abbreviated.


4411 13-Nov-1994 jkh

Make the command set actually work the way I'd intended - you can abbrevate
commands now, as long as whatever you abbreviate them to remains unique.


4410 13-Nov-1994 jkh

Whoops, make the comment match the new reality while we're at it.


4409 13-Nov-1994 jkh

mem's 2nd arg was incorrectly defined as an integer, not address parameter.
I found this out the hard way when I went to do:

mem ed0 0xe8000

And found out that I had to enter the second parameter in decimal.. :-)


4396 12-Nov-1994 ache

Revision 1.6 fix was lost: don't write 0 to RTC_DIAG


4370 12-Nov-1994 phk

Make a kernel sans FFS possible.


4352 11-Nov-1994 dg

Convert irq to a bitmask before putting in id_irq.
Fixed bugs related to the 'more' code.


4346 10-Nov-1994 ats

Shut up a compiler warning about a missing cast.


4341 10-Nov-1994 ache

Use adjkerntz into inittodr too (for APM stuff)


4250 08-Nov-1994 jkh

Cosmetic - the help screen didn't have its header properly formatted. Needed
an extra tab.


4221 07-Nov-1994 phk

Added a kernel variable, "dodump" defaulting to zero, which disables dumps.
Somebody should make a mib variable for it.
Just now it is pointless to dump the kernel, since we have nothing which
can read the dump.
Furthermore is should never be the default to dump.
options DODUMP
will enable dumps.


4217 06-Nov-1994 phk

Initialize %fs and %gs from %ds.
This seems to stabilize the APM-bios on my Gateway Handbook, and it makes
sense in general too.


4201 06-Nov-1994 dg

Do a better job at preparing registers for the new process in setregs()
by setting them all to a known state.


4193 06-Nov-1994 bde

Nuke the losing version of microtime. The assembler version now works
for all reasonable HZ's. HZ > 1000 doesn't work because of sloppy
conversions in hzto() (division by (tick / 1000) == 0). This was
fixed in 1.1.5.

Eliminate some extern declarations by including the appropriate header
files that now contain appropriate declarations.


4180 05-Nov-1994 bde

Maintain a new variable `timer0_overflow_threshold' so that microtime()
doesn't have to calculate it every call.

Rename `timer0_prescale' to `timer0_prescaler_count' and maintain it
correctly. Previously we lost a few 8253 cycles for every "prescaled"
clock interrupt, and the lossage grows rapidly at 16 KHz. Now we
only lose a few cycles for every standard clock interrupt.

Rename `*_divisor' to `*_max_count'.

Do the calculation of TIMER_DIV(rate) only once instead of 3 times each
time the rate is changed.

Don't allow preposterously large interrupt rates. Bug fixes elsewhere
should allow the system to survive rates that saturate the system, however.

Clean up declarations.

Include <machine/clock.h> to check our own declarations.


4179 05-Nov-1994 bde

Fix a bug introduced between 1.1 and 1.1.5. Loading the time was moved
outside the critical region.

Make it work with 2.0. It wasn't designed to be called at splclock().

Make it work with prescaling. The overflow threshold was bogus.

Make it work for any HZ. Side effect of fixing prescaling.

Speed it up. Allocate registers better. Reduce multiplication and
division to multiplication and a shift. Speed is now 5-6 usec on a
486DX/33, was about 3 usec more.

Optimize for the non-pentium case. The pentium code got moved around
a bit and hasn't been tested.

Change #include's to 2.0 style.


4119 03-Nov-1994 pst

Assign character device 20 to be the user reserved device.


4116 03-Nov-1994 jkh

Unconditionalize USERCONFIG. Uh, thanks, David.


4111 03-Nov-1994 jkh

Whoops - make sure TRUE and FALSE are defined now.


4108 03-Nov-1994 jkh

Make the enable & disable commands finally work.


4089 02-Nov-1994 jkh

Whoops. When you `ls' a kernel with lots and lots of devices, guess what?
It scrolls off your screen! :-) Add crude "more" type processing.


4038 01-Nov-1994 ache

Implement CPU_ADJKERNTZ in different way: call resettodr()
on writting this variable. adjkerntz pgm changes will follow.


4037 01-Nov-1994 pst

Add kernel hooks for /dev/vatio -- a minimalistic BSD audio driver emulator
created by Amancio Hasty (specificly, this, in conjunction with his sound
driver mods for dual-mode DMA will allow VAT compiled for BSD/386 1.1 to
run under FreeBSD 2.x.)


4031 31-Oct-1994 joerg

Added hooks for an easy drop-in of the pcvt concole driver.
Don't panic:-), this is simple stuff just doing exactly the same as for syscons.
(files.i386 did already contain the necessary stuff.)


4015 30-Oct-1994 bde

Don't include isa.c! How did it link?

Otherwise clean up the includes. Don't include anything included by
param.h. Do include systm.h and cons.h to avoid satisfy -Wimplicit.
Don't include console.h or use NOKEY because these are for syscons
and we use generic consoles.

Don't follow null pointer for command "ls -lrt" - don't allow extra
args but do allow trailing blanks.

Check for invalid device numbers. strtol() failures are now checked
for in all cases, but not carefully enough. We should check for
trailing junk, allow any base in all cases (just like config) and
handle signs better . (Use strtoul not strtol and cast by assignment
to the correct type - always an integral type, PARM_ADDR is bogus.
Hex numbers > 0x7fffffff can't be entered now. 0xffffffff has to
be entered as -1.)


4014 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for sdtossd().
Cosmetic.

Return from trap() if trap_fatal() returns. trap_fatal() isn't
fatal if you have ddb. Returning from trap() is usually the right
thing to do and much better than falling through.


4013 30-Oct-1994 bde

Fix selector arg to match the (missing) prototype for ssdtosd().
Cosmetic.


4012 30-Oct-1994 bde

locore.s:
Build a dummy frame at the top of tmpstk to help debuggers trace the stack
when the system is idle.

swtch.s: idle():
Initialize the frame pointer so that debuggers don't try to trace a bogus
stack.

Load the frame pointer, load the stack pointer and switch out the old
stack in the unique order that never leaves one of the pointers pointers
invalid so that debuggers can trace idle(). Disabling interrupts
provides sufficient validity for normal operation, but debuggers use
(trace) traps.


3975 28-Oct-1994 dg

Fixed Jordan's editing mistake that caused this to puke.


3962 28-Oct-1994 jkh

From: fredriks@mcs.com (Lars Fredriksen)
...
It turns out that these files do not include <sys/dkbad.h> before
<sys/disklabel.h>.
Submitted by: fredriks


3948 28-Oct-1994 jkh

Note that enable and disable don't do anything, at present.
Make ^U/^X actually work.


3940 27-Oct-1994 jkh

Julian Elischer's disklabel fixes.


3919 26-Oct-1994 bde

Fix compiler warnings.


3918 26-Oct-1994 bde

Move definition and initialization of video_mode_pointer to syscons.c.


3910 26-Oct-1994 jkh

Add my user configuration utility - userconfig().
David wrote the I/O routines for this thing and deserves most of the
credit for thinking the whole idea up.


3907 26-Oct-1994 jkh

Invoke userconfig() if kernel compiled with options USERCONFIG and
-c flag used.


3889 26-Oct-1994 jkh

Fix two very minor nits, one of which caused a warning (no return type for
main).


3867 25-Oct-1994 se

BEWARE: Interface change of register_intr() !

Changed the fifth parameter to register_intr() from u_int mask into
u_int *maskptr in preparation for new features (shared interrupts and
removable devices, eg. for PCMCIA).


3861 25-Oct-1994 bde

Use the correct macro for deciding whether syscons' variables should
be accessed.

Remove some unused declarations and document a bogus one.


3846 25-Oct-1994 dg

Allow MAXMEM kernel option to indicate more memory than is detected; it
previously could only be used to limit the amount of memory.


3844 25-Oct-1994 dg

Restricted maximum bufpages to 1500; this is required for machines >64MB
of memory to work without running out of kernel VM (and increasing it to
even more than it is now (96MB) is out of the question. Changed bufpages
calculation to allocation a little less bufer cache (16% of mem-2MB instead
of 20%); this is simply a better figure for most systems.


3842 25-Oct-1994 dg

Moved initialization of tmpstk so that it immediately follows the kernel
text. Fixed rounding bug that caused the last page of kernel text to be
read/write instead of read-only. This is important now that tmpstk can
crash into it. Removed +4 bias of tmpstk because it screws up ddb's
ability to traceback correctly.


3795 22-Oct-1994 phk

Autoconf is the one to realize that we are booted disk-less and start the
ball rolling. locore is just moving some data from the boot-program.


3744 21-Oct-1994 wollman

Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).


3728 20-Oct-1994 phk

Peter Dufaults comconsole changes.

Submitted by: Peter Dufault


3703 19-Oct-1994 wollman

Implement disk_externalize().


3682 18-Oct-1994 ache

Remove CPU_COLORDISP, GIO_COLOR now exists


3661 17-Oct-1994 ache

Ifdef color_display by NSC, pointed by Rod


3627 15-Oct-1994 ache

ADd CPU_COLORDISP sysctl to handle console display type


3612 15-Oct-1994 dg

1) Some of the counters in the vmmeter struct don't fit well into the Mach VM
scheme of things, so I've changed them to be more appropriate. page in/ous
are now associated with the pager that did them. Nuked v_fault as the
only fault of interest that wouldn't be already counted in v_trap is a VM
fault, and this is counted seperately.
2) Implemented most of the remaining counters and corrected the counting of
some that were done wrong. They are all almost correct now...just a few
minor ones left to fix.


3563 13-Oct-1994 sos

Added socksys device (for iBCS2 emulation)
Reviewed by:
Submitted by:
Obtained from:


3513 11-Oct-1994 sos

Ouch, fixed bug in errno translation (ibcs2 support).


3502 10-Oct-1994 phk

minaddr #ifdef lost in previous commit. Sorry.


3495 10-Oct-1994 sos

Hmm, only translate errno when doing an actual return.

Reviewed by: sef@freefall.cdrom.com


3489 10-Oct-1994 phk

locore.s: Made the APM stuff depend on NAPM > 0 rather than a separate
"APM" macro.
machdep.c: Made the APM-descriptors unconditional.
Bruce: if these still conflict with your debugger, please put in a reservation
for your debugger. These three desc. can be anywhere, as long as they are
contiguous, so just move them as needed.


3476 09-Oct-1994 sos

Updated to convert errno return in syscall if conversion tabel present.


3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


3436 08-Oct-1994 phk

db_disasm.c: Unused var zapped.
pmap.c: tons of unused vars zapped, various other warnings silenced.
trap.c: unused vars zapped.
vm_machdep.c: A wrong argument, which by chance did the right thing, was
corrected.


3426 08-Oct-1994 rgrimes

Correct #ifdef for nfs_disless support is #ifdef NFS, there will be no
option DISKLESS for the 2.0 nfs diskless support. A 2.0 diskless kernel
simple needs NFS linked in statically.


3406 07-Oct-1994 dg

#ifdef DISKLESS the copying of the nfs_diskless structure. Not the best
solution, but the only one I have time for at the moment.


3384 06-Oct-1994 rgrimes

1. Eliminate unused esym global from locore, our boot code never supported
that and when it does it will be done differently.

2. The kernel now does a frame setup on entry so it ``looks'' like a
real function call. This will be needed by future boot code and
debuggers.

3. Clean up stack offsets to all be in decimal and use %ebp when copying
parameters in from the boot code.

4. Implement version 1 of the uniform boot code passing mechanism with
support for kernelname passing and nfs_diskless structure passing.

5. Document the 3 different ways the kernel is called depending on what code
is calling it.


3367 04-Oct-1994 ache

Add code to handle CPU_DISRTCSET


3366 04-Oct-1994 ache

Add disable_rtc_set variable to block resettodr() call, needed for
adjkerntz -i, per Bruce suggestion


3355 04-Oct-1994 ache

RTC_CENTURY usage ifdefed out by USE_RTC_CENTURY compile option,
pointed by Bruce


3314 02-Oct-1994 dg

Patch from HOSOKAWA Tatsumi to fix bug in the size of apm_current_gdt_pdesc

Submitted by: HOSOKAWA Tatsumi


3306 02-Oct-1994 phk

Unused variables, except one with a omnious comment.


3294 02-Oct-1994 rgrimes

If you are building a kernel without NFS statically linked in you
must #define NFS before including <sys/mount.h> to pick up some of
the definitions needed for struct diskless. Be sure to undef it after this
so you do not effect other code.

This is kinda sick, but it does the job. Problem found by davidg.


3291 02-Oct-1994 dg

"idle priority" support. Based on code from Henrik Vestergaard Draboel,
but substantially rewritten by me.


3284 02-Oct-1994 rgrimes

1. Remove all references to cyloffset, it has been unused for some time.

2. New detection code so we know what boot code called us.

3. Remove old DISKLESS support code and halt if we are called by that boot
code as it will NOT work with the new nfs_diskless structure.

This is really in preperation for new boot code and new diskless support.

Reviewed by: davidg


3283 02-Oct-1994 rgrimes

Add code to generate NFSDISKLESS_SIZE for use in locore for copying the
nfs_diskless structure in from the boot code.
Reviewed by: davidg


3272 01-Oct-1994 dg

Added Cortex-I Frame Grabber by Paul S. LaFollette, Jr.

Submitted by: Paul S. LaFollette, Jr.


3258 01-Oct-1994 dg

Laptop Advanced Power Management support by HOSOKAWA Tatsumi.

Submitted by: HOSOKAWA Tatsumi


3188 29-Sep-1994 sos

Added pcaselect call.


3185 29-Sep-1994 sos

Updated pcaudio.c to latest from 1.1.5.1
Enabled timer reprogramming in clock.c (this could use more work).

Obtained from: FreeBSD-1.1.5.1


3178 28-Sep-1994 wollman

LKM support is no longer optional.


3156 28-Sep-1994 bde

Ensure normal selection and alignment of the text and data sections before
including files. vector.s sometimes left the data section misaligned
(depending on the configuration) so all the time-critical globals in icu.s
were sometimes misaligned.


3117 26-Sep-1994 pst

Make Cyrix CPU flush internal cache any time it goes into hold state.
(Meant to commit this a long time ago... oh well).


3102 25-Sep-1994 dg

Inlined ins/outs functions.

Obtained from: NetBSD


3047 24-Sep-1994 dg

Nuked splnet before sync. Not only is this unnecessary, but it appears
to cause problems by making it impossible to sync NFS related buffers
when rebooting.


2945 21-Sep-1994 jkh

Add entry for transputer (cdev 8).
Reviewed by: jkh
Submitted by: luigi


2932 20-Sep-1994 bde

Don't lose the RTC interrupt in resettodr().


2913 20-Sep-1994 ache

resettodr() implemented, inittodr() fixed
Submitted by: me & chris@gnome.co.uk


2873 18-Sep-1994 bde

Remove some unnecessary #includes.

Restore the simple leap year calculation as a macro and document it so
that it doesn't become complicated again. The simple version works
for all leap years covered by 32-bit time_t's. The complicated version
doesn't work for all leap years covered by 64-bit time_t's since among
other reasons, the solar system is not stable for long enough.

Fix declarations.

Nuke spinwait().


2858 18-Sep-1994 wollman

Redo Kernel NTP PLL support, kernel side.

This code is mostly taken from the 1.1 port (which was in turn taken from
Dave Mills's kern.tar.Z example). A few significant differences:

1) ntp_gettime() is now a MIB variable rather than a system call. A few
fiddles are done in libc to make it behave the same.

2) mono_time does not participate in the PLL adjustments.

3) A new interface has been defined (in <machine/clock.h>) for doing
possibly machine-dependent things around the time of the clock update.
This is used in Pentium kernels to disable interrupts, set `time', and
reset the CPU cycle counter as quickly as possible to avoid jitter in
microtime(). Measurements show an apparent resolution of a bit more than
8.14usec, which is reasonable given system-call overhead.


2826 16-Sep-1994 dg

Removed inclusion of pio.h and cpufunc.h (cpufunc.h is included from
systm.h). Merged functionality of pio.h into cpufunc.h. Cleaned up some
related code.


2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


2818 16-Sep-1994 ache

CPU_ADJKERNTZ added to cpu_sysctl


2792 15-Sep-1994 dg

Brought over from 1.1.5:

From Bruce Evans:
Protect against reentering Debugger().


2790 15-Sep-1994 dg

Change brought over from 1.1.5:

date: 1994/06/05 19:31:45; author: ats; state: Exp; lines: +19 -9
Changed some wrong comments from my last commit and document some more
opcodes.


2789 15-Sep-1994 dg

Brought over from 1.1.5:

Fix from Bruce Evans. There were missing sets of parantheses:

1. The checks for the standard data selectors were botched, so %ss == 0
and probably %cs == 0 were allowed. A fix is enclosed. The checks
for the standard selectors could be omitted without losing anything
since the standard selectors pass the valid_ldt_sel() tests.


2783 15-Sep-1994 sos

Added support for many more videomodes, including graphic modes up til
320x200 256col VGA. This is nessesary for the iBCS stuff to work right.
(And we get the benefit of more video modes). Uses the videocard BIOS
to optain mode tables.
Added a "green" saver, switches off the syncs for "green" monitors.

Reviewed by:
Submitted by:
Obtained from:


2772 14-Sep-1994 wollman

Beginnings of support for loadable protocol domains. In particular,
don't hard-code netisr values in icu.s, but rather, use an array of
function pointers and set them all up in machdep.c for statically-linked
protocol families. (This will eventually be done differently.)


2770 14-Sep-1994 ache

1. adjkerntz variable added for preparation to resettodr() implementation
2. Leap year calculations fixed


2689 12-Sep-1994 dg

Eliminated a whole pile of ancient (we're taking 4.3BSD) VM system
related #define constants. Corrected incorrect VM_MAX_KERNEL_ADDRESS.

Reviewed by: John Dyson


2660 11-Sep-1994 dg

Be more careful about dereferencing curproc, p_vmspace, and curpcb,
otherwise the machine will overflow the stack in a recursive fault loop
(causing the machine to spontaneously reboot because of the stack fault
that ultimately happens).

Submitted by: Inspired by Bruce Evans, but this change is different
than what he suggested.


2582 08-Sep-1994 jkh

Recycle cdev 20, sound blaster driver is long since deceased.
Assign cdev 8 to Luigi's transputer driver.
Submitted by: jkh


2578 08-Sep-1994 bde

Remove <machine/eflags.h> and all dependencies on it. eflags.h is just
the Mach/i386 version of the BSD/vax(?) <machine/psl.h>. The Mach
version has slightly better names for many macros but is now out of
date and little used. It was originally used even less (for spelling
PSL_T as EFL_TF in <machine/db_machdep.h>).


2512 05-Sep-1994 bde

Fix comments.


2500 05-Sep-1994 dg

DOn't allow I/O register access in process 1 (oops).


2495 04-Sep-1994 pst

Detect if we're running on a Cyrix 486DLC and enable automatic cache
negation whenever we access memory between 640k and 1M.

Original code from NetBSD 1.0-BETA. The exact origins are unclear but
Theo de Raadt, Charles, and Michael V. may have contributed to it.

Submitted by: pst


2493 04-Sep-1994 dg

Rewrote last vestige of code that used gs (copyinstr). The use of gs in
this routine caused problems for machines that don't set it up properly
before boot (such was the case on an EVEREX machine sitting next to me).


2492 04-Sep-1994 dg

Added pmap_mapdev() function to map device memory.


2486 04-Sep-1994 dg

Initialize eflags register - brought over from 1.1.5.


2457 02-Sep-1994 dg

Converted P_LINK -> P_FORW, P_RLINK -> P_BACK, minor optimization.


2455 02-Sep-1994 dg

Removed all vestiges of tlbflush(). Replaced them with calls to pmap_update().
Made pmap_update an inline assembly function.


2452 02-Sep-1994 dg

It's not necessary to make page tables write-through, so get rid of this
(this was an experimental change which probably shouldn't have been
committed). I/O pages are still marked non-cacheable, however.


2441 01-Sep-1994 dg

Realtime priority scheduling support.

Submitted by: Henrik Vestergaard Draboel


2430 31-Aug-1994 se

Reviewed by: Stefan Esser <se>
Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Added PCI support (call of pci_config(), if NPCI > 0).


2426 31-Aug-1994 dg

Fixed bug that surfaced with last commit for NOBOUNCE -> BOUNCE_BUFFERS by
adding appropriate #ifdefs and changing some variables to externs (as they
should have always been).


2423 31-Aug-1994 dg

Conditionalized support for syscons as the console so that it can be
made optional in the kernel config file.

Submitted by: John Hay


2422 31-Aug-1994 dg

Rather than exclude bounce buffers support with NOBOUNCE, include it
with BOUNCE_BUFFERS. This is more intuitive, and is better for future
multiplatform support. Added BOUNCE_BUFFERS option to the GENERIC and
LINT kernel config files.


2416 30-Aug-1994 dg

Whoops...clean up after me cleaning up after Bruce.


2415 30-Aug-1994 dg

Cleaned up after Bruce: there were still some things that included
com.h/lpa.h. Removed all vestiges of com/lpa out of conf.c and also
fixed up the end of cdevsw/bdevsw to have "no" routines instead of
a NULL pointer (suggested by someone a few weeks back).


2410 30-Aug-1994 bde

Don't define LOCORE (as nothing) in sources. It is now defined
consistently (as 1) in Makefile.i386 for all assembler sources.


2400 29-Aug-1994 ache

Fake floppy partition RAW_PART=2 now


2357 28-Aug-1994 bde

Don't test if a u_int is < 0. The remaining test is sufficient and the
extra one caused a warning.


2320 27-Aug-1994 dg

1) Changed ddb into a option rather than a pseudo-device (use options DDB
in your kernel config now).
2) Added ps ddb function from 1.1.5. Cleaned it up a bit and moved into its
own file.
3) Added \r handing in db_printf.
4) Added missing memory usage stats to statclock().
5) Added dummy function to pseudo_set so it will be emitted if there
are no other pseudo declarations.


2257 24-Aug-1994 sos

Changes preparing for iBCS support
Reviewed by:
Submitted by:


2254 24-Aug-1994 sos

Changes preparing for iBCS2 support

Reviewed by:
Submitted by:


2216 22-Aug-1994 bde

Pad `_cpu_vendor' to finish on a 32-bit boundary so that most of the
locore globals aren't misaligned.


2152 20-Aug-1994 dg

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


2138 19-Aug-1994 dg

Removed bogus save of CMAP2.


2124 19-Aug-1994 dg

Terry Lambert's loadable kernel module support w/improvements from the
NetBSD group.


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


2103 18-Aug-1994 dg

Bruce Evans' dynamic interrupt support.

/usr/src/sys/i386/isa/clock.c:
o Garrett's statclock changes.
o Wire xxxintr, not Vclk.
o Wire using register_intr(), not setidt().

/usr/src/sys/i386/isa/icu.s:
o Garrett's statclock changes.
o Removed unused variable high_imask.
o Fake int 8 for rtc as well as int 0 for clk. Required for kernel
profiling with statclock, harmless otherwise.

/usr/src/sys/i386/isa/isa.c:
o Allow isdp->id_irq and other things in *isdp to be changed by
probes. Changing interrupts later requires direct calls to
register_intr() and unregister_intr() and more care.
ALLOW_CONFLICT_* is brought over from 1.1.5, except
ALLOW_CONFLICT_IRQ is not supported. IRQ conflict checking is
delayed until after probing so that drivers can change the IRQ
to a free one; real conflicts require more cooperation between
drivers to handle.
o Too many details to list.
o This file requires splitting and a lot more work.

/usr/src/sys/i386/isa/isa_device.h:
o Declare more things more completely.

/usr/src/sys/i386/isa/sio.c:
o Prepare to register interrupt handlers as fast.

/usr/src/sys/i386/isa/vector.s:
o Generate entry code for 16 fast interrupt handlers and 16 normal
interrupt handlers. Changed some constants to variables:
# $unit is now intr_unit[intr]. Type is int. Someday it should
be a cookie suitable for the handler (e.g., a struct com_s for
sio).
# $handler is now intr_handler[intr].
# intrcnt_actv[id_num] is now *intr_countp[intr]. The indirection
is required to get a contiguous range of counters for vmstat
and so that the drivers depend more in the driver than on the
interrupt number (drivers could take turns using an interrupt
and the counts would remain correct). There is a separate
counter for each device and for each stray interrupt. In
1.1.5, stray interrupt 7 clobbers the count for device 7 or
something worse if there is no device 7 :-(.
# mask is now intr_mask[intr] (was already indirect).
o Entry points are now _XintrI and _XfastintrI (I = intr = 0-15),
not _VdevU (U = unit).
o Removed BUILD_VECTORS stuff. There's a trace of it left for
the string table for vmstat but config now generates the
string in one piece because nothing more is required.
o Removed old handling of stray interrupts and older comments
about it.

Submitted by: Bruce Evans


2074 15-Aug-1994 wollman

Enable use of the RTC chip for the statistical clock. While this does
not provide the full accuracy of a randomized statistical clock, it does
provide greater accuracy than the previous method, while not significantly
increasing overhead. It also provides profiling support at 1024 Hz.

You must re-compile config before making a new kernel, or you will end
up with unresolved symbols.

Reviewed uy: Bruce evans said it worked for him.


2060 13-Aug-1994 wollman

Fix conditional-compilation mixup, pointed out by Paul Richards.


2059 13-Aug-1994 dg

Made the kernel compile cleanly with gcc 2.6.0. Thanks go to Bruce
Evans for suggesting a method to detect various versions of gcc.


2056 13-Aug-1994 wollman

Change all #includes to follow the current Berkeley style. Some of these
``changes'' are actually not changes at all, but CVS sometimes has trouble
telling the difference.

This also includes support for second-directory compiles. This is not
quite complete yet, as `config' doesn't yet do the right thing. You can
still make it work trivially, however, by doing the following:

rm /sys/compile
mkdir /usr/obj/sys/compile
ln -s M-. /sys/compile
cd /sys/i386/conf
config MYKERNEL
cd ../../compile/MYKERNEL
ln -s /sys @
rm machine
ln -s @/i386/include machine
make depend
make


2017 11-Aug-1994 wollman

For Pentium machines, use a faster version of microtime with 8 usec
resolution (can probably be improved somewhat). Other machines take
a three-instruction hit if I586_CPU is defined, none otherwise.


2014 10-Aug-1994 wollman

Tell Pentium users their CPU speed. (More changes to make use of this
to come later.)


2001 10-Aug-1994 wollman

Handle NMI's in accordance with data in van Gilluwe book.


1999 10-Aug-1994 wollman

Some programs (like GNU configure programs) depend on the output of
`uname -s' to be something reasonable (traditionally, `i386') rather
than `PC-Class'. Make it so.


1998 10-Aug-1994 wollman

Add back in CPU detection copde from 1.1.5. As an added bonus, the
hw.model MIB variable is now declared correctly.


1975 09-Aug-1994 dg

Removed ntohl and ntohs functions. These were already inlined assembly in
endian.h.


1896 07-Aug-1994 dg

Made pmap_kenter "TLB safe". ...and then removed all the pmap_updates that
are no longer needed because of this.


1895 07-Aug-1994 dg

Provide support for upcoming merged VM/buffer cache, and fixed a few bugs
that haven't appeared to manifest themselves (yet).

Submitted by: John Dyson


1894 07-Aug-1994 dg

Don't kremove process VM pages (oops!). This was the cause of the instability
that was introduced last night.

Submitted by: John Dyson


1890 06-Aug-1994 dg

Fixed various prototype problems with the pmap functions and the subsequent
problems that fixing them caused.


1889 06-Aug-1994 dg

Incorporated 1.1.5 improvements to the bounce buffer code (i.e. make it
actually work), and additionally improved it's performance via new pmap
routines and "pbuf" allocation policy.

Submitted by: John Dyson


1888 06-Aug-1994 dg

Made the tmpstk start at tmpstk. Not doing so causes problems for the
debugger.

Submitted by: John Dyson


1887 06-Aug-1994 dg

Incorporated post 1.1.5 work from John Dyson. This includes performance
improvements via the new routines pmap_qenter/pmap_qremove and pmap_kenter/
pmap_kremove. These routine allow fast mapping of pages for those
architectures that have "normal" MMUs. Also included is a fix to the
pageout daemon to properly check a queue end condition.

Submitted by: John Dyson


1838 04-Aug-1994 dg

Added assembly versions of ffs() and bcmp().


1829 04-Aug-1994 dg

Nuked #if 0'd _insque and _remque routines - they are now inlined in
cpufunc.h.


1825 03-Aug-1994 dg

Merged in post-1.1.5 work done by myself and John Dyson. This includes:

me:
1) TLB flush optimization that effectively eliminates half of all of the
TLB flushes. This works by only flushing the TLB when a page is "present"
in memory (i.e. the valid bit is set in the page table entry). See section
5.3.5 of the Intel 386 Programmer's Reference Manual.
2) The handling of "CMAP" has been improved to catch attempts at multiple
simultaneous use.

John:
1) Added pmap_qenter/pmap_qremove functions for fast mapping of pages into
the kernel. This is for future optimizations and support for the upcoming
merged VM/buffer cache.

Reviewed by: John Dyson


1810 01-Aug-1994 dg

Removed all code related to the pagescan daemon, and changed 'act_count'
adjustments to compensate for a world without the pagescan daemon.


1704 11-Jun-1994 dg

Fixed minor spelling error.


1703 11-Jun-1994 dg

Bruce found a bug in my changes to stop using the gs selector.

From Bruce Evans:

fu[i]byte() checked the wrong register. This caused interesting behaviour
in the GPL math emulator. The emulator does not check the values returned
by fu*() or su*() (:-() and it interpreted the address of -12(%ebp) as
-1(%ebp). The same probably occurs for all signed 8-bit offsets from
registers.

I cleaned up the new bzero() a bit.


1691 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1690 06-Jun-1994 dg

trap.c:
Vastly improved trap.c from me. This rewritten version has a variety of
features, amoung them: higher performance and much higher code quality.

support.s, cpufunc.h:
No longer use gs override to enforce range limits - compare directly
against VM_MAXUSER_ADDRESS instead. The old way caused problems in
preserving the gs selector...and this method is just as fast or faster.


1689 06-Jun-1994 dg

Back out previous change for the moment - I need to commit some other
changes first.


1688 06-Jun-1994 dg

Added some missing cld's (OOPS!) and changed the position of some of
the others to make them easier to spot.


1678 04-Jun-1994 dg

Removed extra (bogus) declaration of Xrsvd14 that was confusing me.


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1544 25-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1543,
which included commits to RCS files with non-trunk default branches.


1454 04-May-1994 rgrimes

USL copyright


1445 03-May-1994 ats

Add two routines insl and outsl, that should do 32bit string ins and outs.
Both are completely untested in the moment. They are used from the
if_ep.c driver for the EISA card.


1443 02-May-1994 sos

Update to use the timer0_divisor & timer0_prescale to determine the current
time. Speed-up to time calculation too. Now faster than before..


1442 02-May-1994 sos

Update the reprogram timer stuff, now the frequency of timer 0
can only be changed at the "right" times. Accuracy should be
assured.


1440 02-May-1994 dg

Removed some tlbflush optimizations as some of them were bogus and lead
to some strange behavior.


1431 29-Apr-1994 gclarkii

Added ifdef for GPL_MATH_EMULATE to keep the sytem from panicing when
using it.


1419 26-Apr-1994 ats

Document parts of the case statements, which instruction is emulated.
Marked all dubious parts with ATS :-) for better searching. First round
of better documentation of the math emulator and of bugfixing it.


1415 25-Apr-1994 dg

From John Dyson:

Fixed physio in the 386 case - write faults weren't properly implemented.


1407 23-Apr-1994 wollman

Define new option, INACCURATE_MICROTIME_IS_OK. When this is defined,
the NTP kernel PLL is disabled, and acquire_timer0() is enabled, thus
opening the door for microtime() (and hence gettimeofday()) to return
bogus timestamps. This option is necessary for the `pca' driver to
work, but is implemented to underscore the fact that accurate timekeeping
and the `pca' driver are incompatible at present. If someone writes a version
of microtime() that works when the `pca' driver is being used, this can get
junked.


1390 21-Apr-1994 sos

New support for sharing the timers
acquire_timer / release_timer

Pulled in timer related functions from isa.c


1386 21-Apr-1994 sos

cdev 24 used for pcaudio (PCM speaker driver)


1379 20-Apr-1994 dg

Bug fixes and performance improvements from John Dyson and myself:

1) check va before clearing the page clean flag. Not doing so was
causing the vnode pager error 5 messages when paging from
NFS. (pmap.c)
2) put back interrupt protection in idle_loop. Bruce didn't think
it was necessary, John insists that it is (and I agree). (swtch.s)
3) various improvements to the clustering code (vm_machdep.c). It's
now enabled/used by default.
4) bad disk blocks are now handled properly when doing clustered IOs.
(wd.c, vm_machdep.c)
5) bogus bad block handling fixed in wd.c.
6) algorithm improvements to the pageout/pagescan daemons. It's amazing
how well 4MB machines work now.


1378 19-Apr-1994 wollman

Reserve block device 8 and character device 32 for users' drivers.


1362 14-Apr-1994 dg

Changes from John Dyson and myself:

1) Removed all instances of disable_intr()/enable_intr() and changed
them back to splimp/splx. The previous method was done to improve
the performance, but Bruces recent changes to inline spl* have
made this unnecessary.
2) Cleaned up vm_machdep.c considerably. Probably fixed a few bugs, too.
3) Added a new mechanism for collecting page statistics - now done by
a new system process "pagescan". Previously this was done by the
pageout daemon, but this proved to be impractical.
4) Improved the page usage statistics gathering mechanism - performance is
much improved in small memory machines.
5) Modified mbuf.h to enable the support for an external free routine when
using mbuf clusters. Added appropriate glue in various places to
allow this to work.
6) Adapted a suggested change to the NFS code from Yuval Yurom to take
advantage of #5.
7) Added fault/swap statistics support.


1342 07-Apr-1994 dg

Make Bruce happy: silently enter ddb on a BPT or trace trap if ddb is
configured in the kernel.


1335 05-Apr-1994 dg

from John Dyson:

1) fixed some bugs related to the bounce buffer code
2) vnode pager now supports clustered pageouts
3) experimental code for clustering all I/O via a new "cldisksort"
4) added >16MB check to Bustek driver
5) made some experimental algorithmic changes to the pageout daemon
6) fixed bugs in truncating mapped files (esp when mapped via NFS)
7) reorganized vnode pager I/O code


1321 02-Apr-1994 dg

New interrupt code from Bruce Evans. In additional to Bruce's attached
list of changes, I've made the following additional changes:

1) i386/include/ipl.h renamed to spl.h as the name conflicts with the
file of the same name in i386/isa/ipl.h.
2) changed all use of *mask (i.e. netmask, biomask, ttymask, etc) to
*_imask (net_imask, etc).
3) changed vestige of splnet use in if_is to splimp.
4) got rid of "impmask" completely (Bruce had gotten rid of netmask),
and are now using net_imask instead.
5) dozens of minor cruft to glue in Bruce's changes.

These require changes I made to config(8) as well, and thus it must
be rebuilt.

-DG

from Bruce Evans:

sio:
o No diff is supplied. Remove the define of setsofttty(). I hope
that is enough.

*.s:
o i386/isa/debug.h no longer exists. The event counters became too
much trouble to maintain. All function call entry and exception
entry counters can be recovered by using profiling kernel (the new
profiling supports all entry points; however, it is too slow to
leave enabled all the time; it also). Only BDBTRAP() from debug.h
is now used. That is moved to exception.s. It might be worth
preserving SHOW_BITS() and calling it from _mcount() (if enabled).
o T_ASTFLT is now only set just before calling trap().
o All exception handlers set SWI_AST_MASK in cpl as soon as possible
after entry and arrange for _doreti to restore it atomically with
exiting. It is not possible to set it atomically with entering
the kernel, so it must be checked against the user mode bits in
the trap frame before committing to using it. There is no place
to store the old value of cpl for syscalls or traps, so there are
some complications restoring it.

Profiling stuff (mostly in *.s):
o Changes to kern/subr_mcount.c, gcc and gprof are not supplied yet.
o All interesting labels `foo' are renamed `_foo' and all
uninteresting labels `_bar' are renamed `bar'. A small change
to gprof allows ignoring labels not starting with underscores.
o MCOUNT_LABEL() is to provide names for counters for times spent
in exception handlers.
o FAKE_MCOUNT() is a version of MCOUNT() suitable for exception
handlers. Its arg is the pc where the exception occurred. The
new mcount() pretends that this was a call from that pc to a
suitable MCOUNT_LABEL().
o MEXITCOUNT is to turn off any timer started by MCOUNT().

/usr/src/sys/i386/i386/exception.s:
o The non-BDB BPTTRAP() macros were doing a sti even when interrupts
were disabled when the trap occurred. The sti (fixed) sti is
actually a no-op unless you have my changes to machdep.c that make
the debugger trap gates interrupt gates, but fixing that would
make the ifdefs messier. ddb seems to be unharmed by both
interrupts always disabled and always enabled (I had the branch in
the fix back to front for some time :-().
o There is no known pushal bug.
o tf_err can be left as garbage for syscalls.

/usr/src/sys/i386/i386/locore.s:
o Fix and update BDE_DEBUGGER support.
o ENTRY(btext) before initialization was dangerous.
o Warm boot shot was longer than intended.

/usr/src/sys/i386/i386/machdep.c:
o DON'T APPLY ALL OF THIS DIFF. It's what I'm using, but may require
other changes.
Use the following:
o Remove aston() and setsoftclock().
Maybe use the following:
o No netisr.h.
o Spelling fix.
o Delay to read the Rebooting message.
o Fix for vm system unmapping a reduced area of memory
after bounds_check_with_label() reduces the size of
a physical i/o for a partition boundary. A similar
fix is required in kern_physio.c.
o Correct use of __CONCAT. It never worked here for non-
ANSI cpp's. Is it time to drop support for non-ANSI?
o gdt_segs init. 0xffffffffUL is bogus because ssd_limit
is not 32 bits. The replacement may have the same
value :-), but is more natural.
o physmem was one page too low. Confusing variable names.
Don't use the following:
o Better numbers of buffers. Each 8K page requires up to
16 buffer headers. On my system, this results in 5576
buffers containing [up to] 2854912 bytes of memory.
The usual allocation of about 384 buffers only holds
192K of disk if you use it on an fs with a block size
of 512.
o gdt changes for bdb.
o *TGT -> *IDT changes for bdb.
o #ifdefed changes for bdb.

/usr/src/sys/i386/i386/microtime.s:
o Use the correct asm macros. I think asm.h was copied from Mach
just for microtime and isn't used now. It certainly doesn't
belong in <sys>. Various macros are also duplicated in
sys/i386/boot.h and libc/i386/*.h.
o Don't switch to and from the IRR; it is guaranteed to be selected
(default after ICU init and explicitly selected in isa.c too, and
never changed until the old microtime clobbered it).

/usr/src/sys/i386/i386/support.s:
o Non-essential changes (none related to spls or profiling).
o Removed slow loads of %gs again. The LDT support may require
not relying on %gs, but loading it is not the way to fix it!
Some places (copyin ...) forgot to load it. Loading it clobbers
the user %gs. trap() still loads it after certain types of
faults so that fuword() etc can rely on it without loading it
explicitly. Exception handlers don't restore it. If we want
to preserve the user %gs, then the fastest method is to not
touch it except for context switches. Comparing with
VM_MAXUSER_ADDRESS and branching takes only 2 or 4 cycles on
a 486, while loading %gs takes 9 cycles and using it takes
another.
o Fixed a signed branch to unsigned.

/usr/src/sys/i386/i386/swtch.s:
o Move spl0() outside of idle loop.
o Remove cli/sti from idle loop. sw1 does a cli, and in the
unlikely event of an interrupt occurring and whichqs becoming
zero, sw1 will just jump back to _idle.
o There's no spl0() function in asm any more, so use splz().
o swtch() doesn't need to be superaligned, at least with the
new mcounting.
o Fixed a signed branch to unsigned.
o Removed astoff().

/usr/src/sys/i386/i386/trap.c:
o The decentralized extern decls were inconsistent, of course.
o Fixed typo MATH_EMULTATE in comments. */
o Removed unused variables.
o Old netmask is now impmask; print it instead. Perhaps we
should print some of the new masks.
o BTW, trap() should not print anything for normal debugger
traps.

/usr/src/sys/i386/include/asmacros.h:
o DON'T APPLY ALL OF THIS DIFF. Just use some of the null macros
as necessary.

/usr/src/sys/i386/include/cpu.h:
o CLKF_BASEPRI() changes since cpl == SWI_AST_MASK is now normal
while the kernel is running.
o Don't use var++ to set boolean variables. It fails after a mere
4G times :-) and is slower than storing a constant on [3-4]86s.

/usr/src/sys/i386/include/cpufunc.h:
o DON'T APPLY ALL OF THIS DIFF. You need mainly the include of
<machine/ipl.h>. Unfortunately, <machine/ipl.h> is needed by
almost everything for the inlines.

/usr/src/sys/i386/include/ipl.h:
o New file. Defines spl inlines and SWI macros and declares most
variables related to hard and soft interrupt masks.

/usr/src/sys/i386/isa/icu.h:
o Moved definitions to <machine/ipl.h>

/usr/src/sys/i386/isa/icu.s:
o Software interrupts (SWIs) and delayed hardware interrupts (HWIs)
are now handled uniformally, and dispatching them from splx() is
more like dispatching them from _doreti. The dispatcher is
essentially *(handler[ffs(ipending & ~cpl)]().
o More care (not quite enough) is taken to avoid unbounded nesting
of interrupts.
o The interface to softclock() is changed so that a trap frame is
not required.
o Fast interrupt handlers are now handled more uniformally.
Configuration is still too early (new handlers would require
bits in <machine/ipl.h> and functions to vector.s).
o splnnn() and splx() are no longer here; they are inline functions
(could be macros for other compilers). splz() is the nontrivial
part of the old splx().

/usr/src/sys/i386/isa/ipl.h
o New file. Supposed to have only bus-dependent stuff. Perhaps
the h/w masks should be declared here.

/usr/src/sys/i386/isa/isa.c:
o DON'T APPLY ALL OF THIS DIFF. You need only things involving
*mask and *MASK and comments about them. netmask is now a pure
software mask. It works like the softclock mask.

/usr/src/sys/i386/isa/vector.s:
o Reorganize AUTO_EOI* macros.
o Option FAST_INTR_HANDLER_USERS_ES for people who don't trust
fastintr handlers.
o fastintr handlers need to metamorphose into ordinary interrupt
handlers if their SWI bit has become set. Previously, sio had
unintended latency for handling output completions and input
of SLIP framing characters because this was not done.

/usr/src/sys/net/netisr.h:
o The machine-dependent stuff is now imported from <machine/ipl.h>.

/usr/src/sys/sys/systm.h
o DON'T APPLY ALL OF THIS DIFF. You need mainly the different
splx() prototype. The spl*() prototypes are duplicated as
inlines in <machine/ipl.h> but they need to be duplicated here
in case there are no inlines. I sent systm.h and cpufunc.h
to Garrett. We agree that spl0 should be replaced by splnone
and not the other way around like I've done.

/usr/src/sys/kern/kern_clock.c
o splsoftclock() now lowers cpl so the direct call to softclock()
works as intended.
o softclock() interface changed to avoid passing the whole frame
(some machines may need another change for profile_tick()).
o profiling renamed _profiling to avoid ANSI namespace pollution.
(I had to improve the mcount() interface and may as well fix it.)
The GUPROF variant doesn't actually reference profiling here,
but the 'U' in GUPROF should mean to select the microtimer
mcount() and not change the interface.


1314 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1313 30-Mar-1994 dg

Eliminated the "physstrat" wart and merged it into kern_physio.c. This
patch also fixes a bug which causes a kernel VM leak.


1312 30-Mar-1994 dg

New routine "pmap_kenter", designed to take advantage of the special
case of the kernel pmap.


1307 24-Mar-1994 dg

From John Dyson: performance improvements to the new bounce buffer
code.


1298 23-Mar-1994 dg

Bounce buffers. From John Dyson with help from me.


1291 21-Mar-1994 ache

Now printf("changing root... indicates raw partition for floppy
f.e. fd1d


1290 21-Mar-1994 ache

Fix printf for root system mounted on second floppy


1289 21-Mar-1994 ache

Fix for root system mounted on second floppy


1288 21-Mar-1994 dg

Changed dynamic stack grow code to grow by "SGROWSIZ" amount. Initially
allocate SGROWSIZ amount of stack. Also set vm_ssize to the initial
stack VM size. Increased DFLSSIZ stack rlimit default to 8MB.


1281 19-Mar-1994 wollman

Added cpu_model and machine variables.


1262 14-Mar-1994 dg

Performance improvements from John Dyson.

1) A new mechanism has been added to prevent pages from being paged
out called "vm_page_hold". Similar to vm_page_wire, but
much lower overhead.
2) Scheduling algorithm has been changed to improve interactive
performance.
3) Paging algorithm improved.
4) Some vnode and swap pager bugs fixed.


1247 07-Mar-1994 dg

1) enhanced in_cksum from Bruce Evans.
2) minor comment change in machdep.c
3) enhanced bzero from John Dyson (twice as fast on a 486DX/33)


1246 07-Mar-1994 dg

1) "Pre-faulting" in of pages into process address space
Eliminates vm_fault overhead on process startup and
mmap referenced data for in-memory pages.

(process startup time using in-memory segments *much* faster)

2) Even more efficient pmap code. Code partially cleaned up.
More comments yet to follow.

(generally more efficient pte management)

3) Pageout clustering ( in addition to the FreeBSD V1.1 pagein
clustering.)

(much faster paging performance on non-write behind disk
subsystems, slightly faster performance on other systems.)

4) Slightly changed vm_pageout code for more efficiency and
better statistics. Also, resist swapout a little more.

(less likely to pageout a recently used page)

5) Slight improvement to the page table page trap efficiency.

(generally faster system VM fault performance)

6) Defer creation of unnamed anonymous regions pager until needed.

(speeds up shared memory bss creation)

7) Remove possible deadlock from swap_pager initialization.

8) Enhanced procfs to provide "vminfo" about vm objects and user
pmaps.

9) Increased MCLSHIFT/MCLBYTES from 2K to 4K to improve net &
socket performance and to prepare for things to come.

John Dyson
dyson@implode.root.com
David Greenman
davidg@root.com


1235 02-Mar-1994 guido

Ttys structures are now allocated dynamically via ttymalloc/ttyfree.
This inetrface should be used from now on.
pseudo device pty xx still keeps its meaning: a maximum of
xx ptys is allowed.
A ringbuffer is now 2040 bytes long, per Garrett Wollman's request.
The changes are inspired by the way NetBSD did it (thanks for that!),
though I made it slihghtly different, including the interface so
at least 75% of the allocated space is deallocated when the tty is
closed.
Note further that it is easy to modify the ringbuffer length runtime.
This will have to wait untill some later date...


-Guido


1222 27-Feb-1994 phk

dcfclk driver obsoleted by sio/TIOCTIMESTAMP.


1208 24-Feb-1994 hsu

validate sigcontext before restoring it


1151 13-Feb-1994 dg

Fixed bug in handling of COW - the original code was bogus and it was
only accidental that it worked. Also, don't cache non-managed pages.


1139 10-Feb-1994 dg

Patch from John Dyson:

a pv chain was being traversed while interrupts were
fully enabled in pmap_remove_all ... this is bogus, and
has been fixed in pmap.c. (sorry for adding the splimp)


1129 08-Feb-1994 dg

From: Dave Matthews <dave@prlng.co.uk>

Description:
The integer overflow instruction (into) and the interrupt instruction with
value 4 (int #4) both give rise to SIGBUS signals rather than SIGFPE. The
problem is that overflow is a trap not a fault (unlike the BOUND instruction).


1127 08-Feb-1994 dg

Fixed bugs in stack grow code, and moved it back into a seperate function
like it was originally. Also added back call to "grow" in sendsig now
that this routine actually works.


1124 08-Feb-1994 dg

Fixes from John Dyson to fix out-of-memory hangs and other problems (such
as increased swap space usage) related to (incorrectly) paging out the
page tables.


1116 07-Feb-1994 dg

Fixed calculation of physmem when the special MAXMEM kernel config overide
is used. This bug caused the buffer cache to be WAY too big when memory
was being restricted - resulting in hangs and other out of memory problems.


1104 06-Feb-1994 dg

At the suggestion of Bruce Evans, don't zero RTC diag register. Doing so
was causing problems for some machines.


1072 01-Feb-1994 dg

Minor cleanup. Decode state information better in the case of a fatal
trap.


1066 01-Feb-1994 dg

Bug fix from previous WINE commit. From Jeffrey Hsu.


1058 01-Feb-1994 dg

Removed all uses of "USE_486_WRITE_PROTECT" and made this automatic.
Reordered and removed some NOP's.


1055 31-Jan-1994 dg

Added four pattern memory test routine that is done at startup.


1051 31-Jan-1994 dg

WINE/user LDT support from John Brezak, ported to FreeBSD by Jeffrey Hsu
<hsu@soda.berkeley.edu>.


1046 31-Jan-1994 dg

Make I/O memory explicitly non-cacheable. This is purely an asthetic
change.


1045 31-Jan-1994 dg

VM system performance improvements from John Dyson and myself. The
following is a summary:

1) increased object cache back up to a more reasonable value.
2) removed old & bogus cruft from machdep.c (clearseg, copyseg,
physcopyseg, etc).
3) inlined many functions in pmap.c
4) changed "load_cr3(rcr3())" into tlbflush() and made tlbflush inline
assembly.
5) changed the way that modified pages are tracked - now vm_page struct
is kept updated directly - no more scanning page tables.
6) removed lots of unnecessary spl's
7) removed old unused functions from pmap.c
8) removed all use of page_size, page_shift, page_mask variables - replaced
with PAGE_ constants.
9) moved trunc/round_page, atop, ptoa, out of vm_param.h and into i386/
include/param.h, and optimized them.
10) numerous changes to sys/vm/ swap_pager, vnode_pager, pageout, fault
code to improve performance. LRU algorithm modified to be more
effective, read ahead/behind values tuned for better performance,
etc, etc...


1039 29-Jan-1994 nate

From: Wolfgang Solfrank <ws@sun-lamp.cs.berkeley.edu>
To: source-changes@sun-lamp.cs.berkeley.edu

Fix bogus fcom emulations
How did any program with floating point emulation ever work?


1028 27-Jan-1994 dg

Made pmap_is_managed a static inline function.


1021 26-Jan-1994 dg

Remove confusing and incorrect comment inherited from patchkit days.


1007 23-Jan-1994 dg

Much better fix for the hanging console problem. This one actually
works.


1005 23-Jan-1994 dg

Backed out previous commit as it requires additional kludges to work
completely, and it looks like syscons might solve the problem a
different way (although what about serial consoles??).


1004 22-Jan-1994 dg

Brute-force fix for the "hanging console" problem. Simply _don't_ call
the device close routine. This works because the device close calls
the line discipline close (which only flushes the output buffers) and
the ttyclose() routine, which does little of nothing except screw with
the session and process group fields (which is what was causing all
the problems).


991 21-Jan-1994 dg

Remove some old, unused, major UGLY code.


990 21-Jan-1994 dg

System V IPC code from Danny Boulet, chewed on a bit by the NetBSD group
and then some more by Jeffrey Hsu (who provided this port for FreeBSD).


989 20-Jan-1994 dg

Pointed out by Wolfgang Solfrank:
Correct parameters of sync


988 20-Jan-1994 dg

Removed some more old unused code/comments. Added hack to "fix" the
problem with some chipsets (UMC) remapping the 'hole' memory even when
you've got 16MB. People were led to believe that since there was only
16MB of memory in the machine, that they were okay wrt the ISA DMA
limit. This hack simply causes the extra memory to be ignored if it
appears around the 16MB limit.


987 20-Jan-1994 dg

Improved algorithm that calculates the pages in the base memory - If the
BIOS says that the amount is *between* 0-640K, believe it. Cleaned up
the comments a bit, removed some old cruff, etc.


981 17-Jan-1994 dg

Improvements mostly from John Dyson, with a little bit from me.

* Removed pmap_is_wired
* added extra cli/sti protection in idle (swtch.s)
* slight code improvement in trap.c
* added lots of comments
* improved paging and other algorithms in VM system


975 16-Jan-1994 martin

NFS Diskless booting support added.


974 14-Jan-1994 dg

"New" VM system from John Dyson & myself. For a run-down of the
major changes, see the log of any effected file in the sys/vm
directory (swap_pager.c for instance).


935 04-Jan-1994 nate

Removed wx driver hooks


924 03-Jan-1994 dg

Convert syscall to trapframe. Based on work done by John Brezak.


911 22-Dec-1993 dg

Raised minimum buffer cache from 128k to 256k.


879 19-Dec-1993 wollman

Make everything compile with -Wtraditional. Make it easier to distribute
a binary link-kit. Make all non-optional options (pagers, procfs) standard,
and update LINT to reflect new symtab requirements.

NB: -Wtraditional will henceforth be forgotten. This editing pass was
primarily intended to detect any constructions where the old code might
have been relying on traditional C semantics or syntax. These were all
fixed, and the result of fixing some of them means that -Wall is now a
realistic possibility within a few weeks.


856 13-Dec-1993 dg

added some panics to catch the condition where pmap_pte returns null
- indicating that the page table page is non-resident.


849 12-Dec-1993 dg

1) Added proc file system from Paul Kranenburg with changes from
John Dyson to make it reliably work under FreeBSD.
2) Added and enabled PROCFS in the GENERICxx and LINT kernels.
3) New execve() from me. Still work to be done here, but this version
works well and is needed before other changes can be made. For
a description of the design behind this, see freebsd-arch or
ask me.
4) Rewrote stack fault code; made user stack VM grow as needed rather
than all up front; improves performance a little and reduces
process memory requirements.
5) Incorporated fix from Gene Stark to fault/wire a user page table
page to fix a problem in copyout. This is a temporary fix and
is not appropriate for pageable page tables. For a description
of the problem, see Gene's post to the freebsd-hackers mailing
list.
6) Tighten up vm_page struct to reduce memory requirements for it. ifdef
pager page lock code as it's not being used currently.
7) Introduced new element to vmspace struct - vm_minsaddr; initial
(minimum) stack address. Compliment to vm_maxsaddr.
8) Added a panic if the allocation for process u-pages fails.
9) Improve performance and accuracy of kernel profiling by putting in
a little inline assembly instead of spl().
10) Made serial console with sio driver work. Still has problems with
serial input, but is almost useable.
11) Added -Bstatic to SYSTEM_LD in Makefile.i386 so that kernels will
build properly with the new ld.


827 03-Dec-1993 alm

From: Jeffrey Hsu <hsu@soda.berkeley.edu>

The following patch adds the addr argument to signal handlers.

The kernel with the patch is no more and no less in compliance or in
violation of POSIX and ANSI C than the kernel before the patch.

The added functionality this addr argument provides is quite useful. It
enables an entire class of algorithms which use mprotect to trace memory
references. Beside garbage collectors, I have heard of this technique being
applied to debuggers and profilers. The only benchmarking I've performed is
using akcl to compile maxima: without the kernel patch, it takes 7 hours to
compile maxima, while with stratified garbage collection, it only takes 50
minutes.

Basically, I can't think of a reason not to add the addr argument and there
is a compelling need for it.

If you find the patch acceptable, please let me know so I can send my
FreeBSD akcl config files to wfs for inclusion in the core akcl release.
The old 386BSD config files there won't work on either NetBSD or FreeBSD.


806 28-Nov-1993 dg

Patch from Gene Stark:

Subject: Page fault in PTE area fails in copyout
Index: sys/i386/i386/trap.c FreeBSD-1.0.2

Description:
Reading files of several megabytes into Emacs, or many small
files all at once, would fail with "IO error - bad address".

Repeat-By:
The bug can be exercised by a test program that malloc()'s
a 5MB chunk of memory, and then, without accessing the memory
first, filling it with data from a file using read().
(I read 64k chunks from /dev/wd0d into successive 64k regions
of the 5MB chunk.) The read() will fail with EFAULT at the first
virtual address boundary that is a multiple of 0x400000.

Fix:
The problem was code in sys/i386/i386/trap.c that tries to
figure out what kind of trap occurred and to handle it appropriately.
It was interpreting any page fault with virtual address
>= vm->vm_maxsaddr as being a user stack segment fault.
In fact, addresses >= USRSTACK are in the user structure/PTE area,
and if they are handled as stack faults, the proper PTE will
not be paged in when it is supposed to be. This situation comes
up in copyout() and copyoutstr(), if PTE's are accessed for the
first time ever. The page fault on accessing the nonexistent PTE
is mishandled as a stack fault, and then the fault that occurs on
the subsequent access to the page itself causes copyout to fail
with EFAULT.


804 27-Nov-1993 wollman

Declare cnopen, cnclose, and other console routines.


803 27-Nov-1993 rich

Declared cn{open,close,read,write,ioctl,select} extern.


801 26-Nov-1993 wollman

Fix d_write_t -> d_rdwr_t (typing error).


798 25-Nov-1993 wollman

Make the LINT kernel compile with -W -Wreturn-type -Wcomment -Werror, and
add same (sans -Werror) to Makefile for future compilations.


790 22-Nov-1993 dg

patches from Julian Elischer -
Added support for mmapping /dev/mem


785 18-Nov-1993 rgrimes

New version of scsi code from Julian


778 17-Nov-1993 wollman

Fixed comments that start within a comment, so code compiles cleanly with
-Wcomment.


774 16-Nov-1993 dg

new process tracing code from Sean Eric Fagen (sef@kithrup.com).
...also, fixed up the syscall args to make GCC happy.


760 14-Nov-1993 rgrimes

Add _bde_exists: label so that the global is really defined. Fix spelling
error (mount -> amount)


757 13-Nov-1993 dg

First steps in rewriting locore.s, and making info useful
when the machine panics.

i386/i386/locore.s:
1) got rid of most .set directives that were being used like
#define's, and replaced them with appropriate #define's in
the appropriate header files (accessed via genassym).
2) added comments to header inclusions and global definitions,
and global variables
3) replaced some hardcoded constants with cpp defines (such as
PDESIZE and others)
4) aligned all comments to the same column to make them easier to
read
5) moved macro definitions for ENTRY, ALIGN, NOP, etc. to
/sys/i386/include/asmacros.h
6) added #ifdef BDE_DEBUGGER around all of Bruce's debugger code
7) added new global '_KERNend' to store last location+1 of kernel
8) cleaned up zeroing of bss so that only bss is zeroed
9) fix zeroing of page tables so that it really does zero them all
- not just if they follow the bss.
10) rewrote page table initialization code so that 1) works correctly
and 2) write protects the kernel text by default
11) properly initialize the kernel page directory, upages, p0stack PT,
and page tables. The previous scheme was more than a bit
screwy.
12) change allocation of virtual area of IO hole so that it is
fixed at KERNBASE + 0xa0000. The previous scheme put it
right after the kernel page tables and then later expected
it to be at KERNBASE +0xa0000
13) change multiple bogus settings of user read/write of various
areas of kernel VM - including the IO hole; we should never
be accessing the IO hole in user mode through the kernel
page tables
14) split kernel support routines such as bcopy, bzero, copyin,
copyout, etc. into a seperate file 'support.s'
15) split swtch and related routines into a seperate 'swtch.s'
16) split routines related to traps, syscalls, and interrupts
into a seperate file 'exception.s'
17) remove some unused global variables from locore that got
inserted by Garrett when he pulled them out of some .h
files.

i386/isa/icu.s:
1) clean up global variable declarations
2) move in declaration of astpending and netisr

i386/i386/pmap.c:
1) fix calculation of virtual_avail. It previously was calculated
to be right in the middle of the kernel page tables - not
a good place to start allocating kernel VM.
2) properly allocate kernel page dir/tables etc out of kernel map
- previously only took out 2 pages.

i386/i386/machdep.c:
1) modify boot() to print a warning that the system will reboot in
PANIC_REBOOT_WAIT_TIME amount of seconds, and let the user
abort with a key on the console. The machine will wait for
ever if a key is typed before the reboot. The default is
15 seconds, but can be set to 0 to mean don't wait at all,
-1 to mean wait forever, or any positive value to wait for
that many seconds.
2) print "Rebooting..." just before doing it.

kern/subr_prf.c:
1) remove PANICWAIT as it is deprecated by the change to machdep.c

i386/i386/trap.c:
1) add table of trap type strings and use it to print a real trap/
panic message rather than just a number. Lot's of work to
be done here, but this is the first step. Symbolic traceback
is in the TODO.

i386/i386/Makefile.i386:
1) add support in to build support.s, exception.s and swtch.s

...and various changes to various header files to make all of the
above happen.


724 07-Nov-1993 wollman

Get rid of WFJ's use of sleep() for more user-friendly tsleep().


718 07-Nov-1993 wollman

Made all header files idempotent and moved incorrect common data from
headers into a related source file. (This is the only change to locore.s).
Also fixed pg() to be properly declared and use stdargs.


701 04-Nov-1993 dg

splnone()'s in the trap code can be deadly. Save/restore previous priority
instead.


700 04-Nov-1993 ache

DST offset calculation removed, it is wrong in any case.


693 03-Nov-1993 nate

Fixed the reason for wx.c not working. I forgot to keep all the table entries
for the driver in the kernel when NWX was defined.


692 03-Nov-1993 nate

Added wx driver support


689 01-Nov-1993 chmr

Modified the "rude stack hack" that it only applies to addresses within
the stack area and not memory above VM_MAXUSER_ADDRESS.
That way, copyout and friends now work for pages whose page table entries
have not yet been allocated/been paged out.


683 29-Oct-1993 dg

Whoops, the algorithm I last used was messed up - I left off parans, and
should have used PGSHIFT instead of PAGE_SHIFT.


682 29-Oct-1993 dg

Change filesystem buffer cache size calculation to be less for 4MB
machines (now 20% of all memory after the first 3MB). This is necessary
in order for 4MB machine to be able to rebuild the entire source tree
and not run out of physical memory because of fixed memory requirements
of processes and kernel VM.


674 26-Oct-1993 nate

Added character 21 to conf.c table for future ps/2 mouse driver. This
finishes the install of the device short of adding the driver itself to
i386/isa/psm.c


658 23-Oct-1993 jkh

Moved sound driver from major 21 to major 30


651 23-Oct-1993 jkh

New soundcard driver at major device 21


629 18-Oct-1993 dg

Yank out Christoph Robitschko's hack for the hanging console problem as
it didn't actually fix it, and because starting the getty on /dev/console
instead of /dev/vga is a good work-around.


620 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


619 16-Oct-1993 rgrimes

Removed all patch kit headers, sccsid and rcsid strings, put $Id$ in, some
minor cleanup. Added $Id$ to files that did not have any version info, etc


608 15-Oct-1993 rgrimes

genassym.c:
Remove NKMEMCLUSTERS, it is no longer define or used.

locores.s:
Fix comment on PTDpde and APTDpde to be pde instead of pte
Add new equation for calculating location of Sysmap
Remove Bill's old #ifdef garbage for counting up memory,
that stuff will never be made to work and was just cluttering
up the file.

Add code that places the PTD, page table pages, and kernel
stack below the 640k ISA hole if there is room for it, otherwise
put this stuff all at 1MB. This fixes the 28K bogusity in
the boot blocks, that can now go away!

Fix the caclulation of where first is to be dependent on
NKPDE so that we can skip over the above mentioned areas.
The 28K thing is now 44K in size due to the increase in
kernel virtual memory space, but since we no longer have
to worry about that this is no big deal.

Use if NNPX > 0 instead of ifdef NPX for floating point code.

machdep.c
Change the calculation of for the buffer cache to be
20% of all memory above 2MB and add back the upper limit
of 2/5's of the VM_KMEM_SIZE so that we do not eat ALL
of the kernel memory space on large memory machines, note
that this will not even come into effect unless you have
more than 32MB. The current buffer cache limit is 6.7MB
due to this caclulation.

It seems that we where erroniously allocating bufpages pages
for buffer_map. buffer_map is UNUSED in this implementation
of the buffer cache, but since the map is referenced in
several if statements a quick fix was to simply allocate
1 vm page (but no real memory) to it.

pmap.h
Remove rcsid, don't want them in the kernel files!

Removed some cruft inside an #ifdef DEBUGx that caused
compiler errors if you where compiling this for debug.

Use the #defines for PD_SHIFT and PG_SHIFT in place of
constants.

trap.c:
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Remove a now completly invalid check for a maximum virtual
address, the virtual address now ends at 0xFFFFFFFF so
there is no more MAX!! (Thanks David, I completly missed
that one!)

vm_machdep.c
Remove patch kit header and rcsid, fix $Id$.
Now include "npx.h" and use NNPX for controlling the
floating point code.

Replace several 0xFE00000 constants with KERNBASE


604 14-Oct-1993 rgrimes

>From David Greenman

Bruce Evans had limited the kernel virtual address space to not include the
last 4MB since it was not being used. Other changes are being made that will
reloate the Alternate Page Directory Table (APDT) into this area so the limit
is being fixed to be the last virtual address. (Infact with this patch you
can now do that relocation)


592 13-Oct-1993 rgrimes

Removed hack that did the R_SHIFT of unsigned numbers, no longer need
to do this as I have changed to using PDTI's as the bases for the vm
system layout.

Eliminate constants SYSPDROFF and SYSPDREND, now use NKPTE to control the size
of the kernel virtual space.

Eliminate constant PDRPDROFF, now use PDTDTPI to control location of PTD,
PTDmap and PTDpde

Eliminate constant APDRPDROFF, now use APTDPTDI to control location of APTD,
APTDmap and APTDpde.

Still need to fix Sysmap location (it is still a constant).

.globl statements are now consistent with respect to <comma><space>, the
<space> being removed from all .globl statements.

Document the fillkpt macro as to what registers control what.

Fix some comments that went past column 80, and clean/line some others up.

Remove constand for _Crtat, now use KERNBASE+constant, this still needs work.

Replace constants for offsets of sigcode parameters with symbolic names
from assym.s

Mark the sigreturn() call with XXX since we use the hardcoded constant
for the system call number, this is bogus and should be a #define or
something some place!

The kernel before and after this change was verified with cmp, not one
byte changed. These are all cosmetic clean up changes that makes the
code more correct and easier to move the kernels virtual address space
and size.


590 12-Oct-1993 rgrimes

Add Page Table Directory Indexes (NKPDE, KPTDI, PTDPTDI, APTDPTDI) to
be used to replace more constants in locore.


589 12-Oct-1993 rgrimes

KPTDI_LAST renamed to KPTDI


587 12-Oct-1993 rgrimes

Eliminate definition of I386_PAGE_SIZE and use NBPG instead
Replace 0xFE000000 constants with KERNBASE
Use new definition NKPDE in place of a first-last+1 calculation.


577 11-Oct-1993 rgrimes

Add support for Holger Viets mitsumi cd rom driver as prepared by Gary
Clark II.


570 10-Oct-1993 rgrimes

SYSPDROFF and SYSPDREND are now calculated using KERNBASE, KERNSIZE and
PDRSHIFT.

The SYSTEM constant that was defined in this file has been replaced
with KERNBASE from param.h.

Changed almost all # style comments to /* */ C style comments. Several
comments cleaned up so that they make a little more since.

In the comments that describe C calling conventions to assembler routines
used a comma space sequence to seperate arguments and removed the space
between the function name and the argument list.

Removed useless comments like /* clr eax */.

Changed all comma space sequences on assemble instructions to just be comma.

Removed spaces after $ operators to make the file consistent, this may need
to change again (ie: $KERNBASE should probably be $(KERNBASE), but for now
it all seems to work just fine.) This may become a problem with the C
pre-processor.

Changed several double blank lines to single blank lines that where used
to seperate the I/O routines, these routines are blocked enough that we
don't need double blank lines between them.

Changed sequence of I/O routines to be all input functions, all output
functions instead of just the opposite.

Moved the SHOW_A_LOT debug stuff to near the end of the file.

Changed two occurances of the constant 0xfff to NBPG-1.


569 10-Oct-1993 rgrimes

Added a compile time #error so that if the user does not specify on of
the proper I_X86CPU in the config file the following error will occur
while building the kernel: (had to line wrap the error for this message)

../../i386/i386/machdep.c:343: #error This kernel is not configured for one \
of the supported CPUs


567 10-Oct-1993 rgrimes

Added PDRSHIFT and KERNSIZE so that the PDR offsets can be calculated in
locore.s instead of being constants (3F8, 3FA).


564 09-Oct-1993 rgrimes

Remove patch kit header, add $Id$, fix tabing on sound blaster cdev entry


562 09-Oct-1993 jkh

Add devsw entries for upcoming sound blaster driver.


556 08-Oct-1993 rgrimes

All:
removed patch kit headers and sccsids, add $Id$. This is a general
clean up and reallignment with NetBSD-current where possible.

genassym.c:
removed extranious include of reg.h
removed old FP_* defines that have been ifdefed out since the patch kit
removed PCB_SIGC that is not referenced anywhere
add trapframe and sigframe defines
add KERNBASE define for use in locore.s

locore.s:
include npx.h and use NNPX for turning on and off FPU
include machine/cputypes.h for the types of cpu (used in cpu_identify)
change SYSPDREND to be one higher, this is really the base of the
next area, and will be changing again next time I revise the file
Reverse the NOP defines, you now get slow NOP's by default, this
may be what is casuing us trouble with some systems. If you want
the NOPS to be null you now need to have options DUMMY_NOPS.
Now get esym from the boot blocks which don't pass it yet, and
it is not used, but this will be changing.
Move the bit_colors stuff to be in with the rest of Bruces SHOW_A_LOT
things for debugging.
Added NetBSD's CPU type probe code, we now know what type of CPU
we are running on.
Adjust kernel pde calcuation to correct for change in SYSPDREND, no
longer need the +1.

machdep.c
include npx.h and use NNPX for turning on and off FPU
include isa.h, map.h(new file), exec.h in preperation for
changes that are still in process.
Add some of the code for MACHINE_NONCONTIG that will alow us
to better map around the BIOS memory area.
Now print the version, cpu id, real memory and availiable memory
during boot.
Correct the calculation of bufpages, the code was mixing pages
and bytes, it now does the right things. Removed Bill's hack
for limiting the erronous calculation.
add the identifycpu print out code from NetBSD.
remove the definition of the sigframe struct, it belongs in
frame.h
put in printf's about syncing disks on a halt/reboot.
Change the halted message to be a little easier reading.
Clean up of the dump messages, makes the source and the output
much more readable.
Change 0,0 in several places to have spaces after the commas.


549 08-Oct-1993 rgrimes

Removed patch kit headers, and rcsid, add $Id$, relocate Terry Lamberts
copyright to match the location that it is in NetBSD.

Remove the __main() {} dummy function, it belongs in kern/init_main.c


528 30-Sep-1993 rgrimes

This is a fix for the 32K DMA buffer region that was not accounted for,
it relocates it to be after the BIOS memory hole instead of right below
the 640K limit.
THANK YOU CHRIS!!!

From: <cgd@postgres.Berkeley.EDU>
Date: Wed, 29 Sep 93 18:49:58 -0700
basically, reserve a new 32k space right after firstaddr,
and put the buffer space there...

the diffs are below, and are in ~cgd/sys/i386/i386 (in machdep.c)
on freefall. i obviously can't test them, so if some of you would
look the diffs over and try them out...


519 29-Sep-1993 rgrimes

Add symbolic name for system page directory end, and change constant to
a calculation for the system page directory tables.


504 24-Sep-1993 rgrimes

>From: rich@id.slip.bcm.tmc.edu.cdrom.com (Rich Murphey)
Date: Sun, 12 Sep 1993 18:19:05 -0500
This will allow you to compile and run a freebsd kernel with shared
memory support. I haven't tested the shm*() calls yet.

You run out of page table descriptors if you specify 4Mb of sharable
memory (SHMMAXPGS=1024). I don't know what the limit is, but
SHMMAXPGS=64 works. Rich


472 15-Sep-1993 rgrimes

Enabled floppy drive ioctl's so that disklabel on a floppy is now
quite and works correctly. This is derived from notes in Bruce Evans
lattest patches to fd.c:

>From: bde@kralizec.zeta.org.au (Bruce Evans)
>Subject: fixes for fd driver

6. I picked up some code posted the other day to implement label ioctls.
Now `disklabel fd0' works. See a comment for how to modify conf.c.


434 10-Sep-1993 nate

Removed volatile functions which were causing grief in the system, since
volatile functions are undefined, and there is no reason to have them
in our kernel.


433 10-Sep-1993 rgrimes

This is just to shut the compiler up
===================================================================
RCS file: /a/cvs/386BSD/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.3
diff -c -r1.3 vm_machdep.c
*** 1.3 1993/07/27 10:52:21
--- vm_machdep.c 1993/09/10 20:12:53
***************
*** 179,184 ****
--- 179,186 ----
#endif
splclock();
swtch();
+ /*NOTREACHED*/
+ for(;;);
}

cpu_wait(p) struct proc *p; {


424 09-Sep-1993 rgrimes

Changed the pg("ptdi> %x") to a printf and then a panic, since we are
going to panic shortly after this anyway. Destroys less state, and
keeps the machine from waiting for someone to smash the return key
a few times before it panics!


394 06-Sep-1993 rgrimes

Added ns_cksum.c from NetBSD


391 05-Sep-1993 rgrimes

From: barclay_a_c <barclay_a_c@bt-web.bt.co.uk>
Subject: More information on "netstat -r
Date: Mon, 7 Jun 1993 11:57:33 +0100

Gentlemen,
I have been trying for some time now to get "netstat -r"
working from the 0.2.2 patchkit. I have the following solution to the
bug.

The symbol _radix_node_head needs to be added to the symbols.raw file
to enable the new routing table access. If this symbol is not present
then the old system us used which results in no routes being given. I
don't think that this justifies a patch in its own right but perhaps
could be added to a related patch sometime..


351 28-Aug-1993 rgrimes

Changed trap.c so that a panic will occur if we do not have hardware
FP and we try to call the emulator when it is not compiled in.
Removed the #if defined(i486) || defined(i387) that use to call the
panic if we did not have a math emulator.
Removed an extranious include of i386/i386/math_emu.h from math_emulate.c.


342 28-Aug-1993 rgrimes

Added support for scsi/sg.c as cdev major 18.


306 20-Aug-1993 rgrimes

Enabled call to sddump so that if you have options SCSIDUMP in your kernel
you'll get to the dump code. If you don't trust this on your disk also
add option NOT_TRUSTED, that disables the dump code, but prints out what
it WOULD do it it was going to scrible on your disk.


301 20-Aug-1993 alm

patch 1of2 to prevent kill -1 syslogd from hanging the console
blindly applied patch provided by Christoph Robitschko:
*** cons.c.orig Sat Jun 12 07:57:53 1993
--- cons.c Thu Aug 19 22:34:53 1993
***************
*** 56,61 ****
--- 56,62 ----
#include "sys/tty.h"
#include "sys/file.h"
#include "sys/conf.h"
+ #include "sys/vnode.h"

#include "cons.h"

***************
*** 105,118 ****
--- 106,130 ----
(*cp->cn_init)(cp);
}

+ static struct vnode *cnopenvp = NULLVP;
+
+
cnopen(dev, flag, mode, p)
dev_t dev;
int flag, mode;
struct proc *p;
{
+ int error;
+
+
if (cn_tab == NULL)
return (0);
dev = cn_tab->cn_dev;
+ if (cnopenvp == NULLVP)
+ if ((error = getdevvp(dev, &cnopenvp, VCHR))) {
+ printf("cnopen: getdevvp returned %d !\n", error);
+ return(error);
+ }
return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
}

***************
*** 121,130 ****
int flag, mode;
struct proc *p;
{
if (cn_tab == NULL)
return (0);
dev = cn_tab->cn_dev;
! return ((*cdevsw[major(dev)].d_close)(dev, flag, mode, p));
}

cnread(dev, uio, flag)
--- 133,153 ----
int flag, mode;
struct proc *p;
{
+ int error;
+
+
if (cn_tab == NULL)
return (0);
dev = cn_tab->cn_dev;
! if (vcount(cnopenvp) <= 1)
! error = (*cdevsw[major(dev)].d_close)(dev, flag, mode, p);
! else
! error = 0;
! if (error == 0) {
! vrele(cnopenvp);
! cnopenvp = NULLVP;
! return(error);
! }
}

cnread(dev, uio, flag)


269 10-Aug-1993 rgrimes

Removed one more reverence to the old Adaptec 1542b as.c driver, one less
dependent for autoconf.c.


267 09-Aug-1993 rgrimes

Finish removal of reminents of as.c Adaptec scsi driver.


259 09-Aug-1993 rgrimes

From guido@gvr.win.tue.nl Sat Aug 7 06:58:04 1993

I posted some patches on the 386bsd_patchkit list to prohibit io access.
Because of a noninitialised filed in the tss, this was possible.
It is included below as the patch to machdep.c
However, when you do this *necessary* fix (security), it will be
impossible form within user space to do io.

therefor, I included another fix: when you open /dev/io, you
get the access. Of course you can rewrite it to use another minor
and thus giving access to the iospace when /dev/mem is opened, e.g.

NOTE: The /dev/io entry has not been added to /dev/MAKEDEV yet.
The patch is in NetBSD.


256 08-Aug-1993 rgrimes

Removed the asking for a root file system when booting from floppy as that
is now handled by the new boot blocks immediatly after the kernel is loaded.


209 30-Jul-1993 jkh

Free'd up major number assigned to codrv.


200 27-Jul-1993 dg

* Applied fixes from Bruce Evans to fix COW bugs, >1MB kernel loading,
profiling, and various protection checks that cause security holes
and system crashes.
* Changed min/max/bcmp/ffs/strlen to be static inline functions
- included from cpufunc.h in via systm.h. This change
improves performance in many parts of the kernel - up to 5% in the
networking layer alone. Note that this requires systm.h to be included
in any file that uses these functions otherwise it won't be able to
find them during the load.
* Fixed incorrect call to splx() in if_is.c
* Fixed bogus variable assignment to splx() in if_ed.c


140 18-Jul-1993 paul

Added volatile void to cpu_exit() in the hope that it would
stop warning about returning from gcc.

It hasn't but the declaration is still correct.


134 16-Jul-1993 dg

New locore from Christoph Rubitschko.


132 16-Jul-1993 dg

Updated kernel files to move occurances of "struct args" syscall
argument definitions outside of the function parameter list. This is
to reduce the copious warning messages that (non-Jolitz) gcc produces.
Also fixed some bogus variable declarations and casts to make the
compiler happy.


118 12-Jul-1993 rgrimes

Fixed two occarances of ldos which should have been lods.
(From Christoph Robitschko)


8 18-Jun-1993 paul

Upgrade to GCC 2.X


5 12-Jun-1993 rgrimes

This commit was generated by cvs2svn to compensate for changes in r4,
which included commits to RCS files with non-trunk default branches.