History log of /freebsd-10.1-release/sys/powerpc/powerpc/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


271171 05-Sep-2014 jhibbits

MFC r261095,r263464,r263752,r264189

r263464,r263752,r275189:

Mask out SRR1 bits that aren't exported to the MSR.

This appears to fix a strange condition with X on 32-bit PowerBooks I
observed, caused by one of these bits getting set in the mcontext, but
not set in the thread, which is a symptom of another problem, more
difficult to diagnose. Since these bits aren't exported anyway, this
change makes it more explicit that the bits aren't MSR-related in SRR1.

r261095:

Fix 32-bit signal handling on ppc64. This was broken when the
PSL_USERSTATIC macro was changed. Since copying 64-bit srr1 into
32-bit srr1 drops the upper 32 bits, any bits set in the context were
dropped, meaning the context check fails. Since 32-bit set_context()
can't change those bits anyway, copy the ones from the current context
(td->td_frame) before calling set_context().

Approved by: re
Relnotes: yes (Affects 10-stable, but not 10.0-release)


271113 04-Sep-2014 nwhitehorn

MFC r268880:
Allow mappings of memory not previously direct-mapped by the kernel when
calling mmap on /dev/mem and add a handler for the possible userland
machine checks that may result. Remove some pointless and wrong copy/paste
that has been in here for a decade as well.

This results in a /dev/mem with identical semantics to the x86 version.


270920 01-Sep-2014 kib

Fix a leak of the wired pages when unwiring of the PROT_NONE-mapped
wired region. Rework the handling of unwire to do the it in batch,
both at pmap and object level.

All commits below are by alc.

MFC r268327:
Introduce pmap_unwire().

MFC r268591:
Implement pmap_unwire() for powerpc.

MFC r268776:
Implement pmap_unwire() for arm.

MFC r268806:
pmap_unwire(9) man page.

MFC r269134:
When unwiring a region of an address space, do not assume that the
underlying physical pages are mapped by the pmap. This fixes a leak
of the wired pages on the unwiring of the region mapped with no access
allowed.

MFC r269339:
In the implementation of the new function pmap_unwire(), the call to
MOEA64_PVO_TO_PTE() must be performed before any changes are made to the
PVO. Otherwise, MOEA64_PVO_TO_PTE() will panic.

MFC r269365:
Correct a long-standing problem in moea{,64}_pvo_enter() that was revealed
by the combination of r268591 and r269134: When we attempt to add the
wired attribute to an existing mapping, moea{,64}_pvo_enter() do nothing.
(They only set the wired attribute on newly created mappings.)

MFC r269433:
Handle wiring failures in vm_map_wire() with the new functions
pmap_unwire() and vm_object_unwire().
Retire vm_fault_{un,}wire(), since they are no longer used.

MFC r269438:
Rewrite a loop in vm_map_wire() so that gcc doesn't think that the variable
"rv" is uninitialized.

MFC r269485:
Retire pmap_change_wiring().

Reviewed by: alc


270439 24-Aug-2014 kib

Merge the changes to pmap_enter(9) for sleep-less operation (requested
by flag). The ia64 pmap.c changes are direct commit, since ia64 is
removed on head.

MFC r269368 (by alc):
Retire PVO_EXECUTABLE.

MFC r269728:
Change pmap_enter(9) interface to take flags parameter and superpage
mapping size (currently unused).

MFC r269759 (by alc):
Update the text of a KASSERT() to reflect the changes in r269728.

MFC r269822 (by alc):
Change {_,}pmap_allocpte() so that they look for the flag
PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether
to sleep on page table page allocation.

MFC r270151 (by alc):
Replace KASSERT that no PV list locks are held with a conditional
unlock.

Reviewed by: alc
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation


267040 04-Jun-2014 nwhitehorn

MFC r266778:
Repair nested signal handling on PowerPC. The signal trampoline code
was not allocating space for the parameter save area in the stack frame.
If the compiler chose to save the argument to the signal handler on the
stack, it would overwrite the first 32 bits of the sigaction struct with
it, corrupting it for a subsequent invocation.

PR: powerpc/183040


266676 26-May-2014 nwhitehorn

MFC r265900:

Repair some races in IPI handling:
1. Make sure IPI mask is set before sending the IPI
2. Operate atomically on PS3 PIC outstanding interrupt list
3. Make sure IPIs are EOI'ed before, not after, processing. Without this,
a second IPI could be sent partway through processing the first one,
get erroneously acknowledge by the EOI to the first, and be lost. In
particular in the case of smp_rendezvous(), this can be fatal.

In combination, this makes the PS3 boot SMP again. It probably also fixes
some latent bugs elsewhere.


266312 17-May-2014 ian

MFC 263036, 263059: delete advertising clause in licenses, renumber.


266160 15-May-2014 ian

MFC r261423, r261424, r261516, r261513, r261562, r261563, r261564, r261565,
r261596, r261606

Add the imx sdhci controller.

Move Open Firmware device root on PowerPC, ARM, and MIPS systems to
a sub-node of nexus (ofwbus) rather than direct attach under nexus. This
fixes FDT on x86 and will make coexistence with ACPI on ARM systems easier.
SPARC is unchanged.

Add the missing ')' at end of sentence. Reword it to use a more common idiom.

Pass the kernel physical address to initarm through the boot param struct.

Make functions only used in vfp.c static, and remove vfp_enable.

Fix __syscall on armeb EABI. As it returns a 64-bit value it needs to
place 32-bit data in r1, not r0. 64-bit data is already packed correctly.

Use abp_physaddr for the physical address over KERNPHYSADDR. This helps us
remove the need to load the kernel at a fixed address.

Remove references to PHYSADDR where it's used only in debugging output.

Dynamically generate the page table. This will allow us to detect the
physical address we are loaded at to change the mapping.


266128 15-May-2014 ian

MFC r261351, r261352, r261355, r261396, r261397, r261398, r261403, r261404,
r261405

Open Firmware interrupt specifiers can consist of arbitrary-length byte
strings and include arbitrary information (IRQ line/domain/sense). When the
ofw_bus_map_intr() API was introduced, it assumed that, as on most systems,
these were either 1 cell, containing an interrupt line, or 2, containing
a line number plus a sense code. It turns out a non-negligible number of
ARM systems use 3 (or even 4!) cells for interrupts, so make this more
general.

Provide a simpler and more standards-compliant simplebus implementation to
get the Routerboard 800 up and running with the vendor device tree. This
does not implement some BERI-specific features (which hopefully won't be
necessary soon), so move the old code to mips/beri, with a higher attach
priority when built, until MIPS interrupt domain support is rearranged.

Allow nesting of simplebuses.

Add a set of helpers (ofw_bus_get_status() and ofw_bus_status_okay()) to
process "status" properties of OF nodes.

Fix one remnant endian flaw in nexus.


266020 14-May-2014 ian

MFC r258800, r258802, r258805, r258806, r258807, r258851, r258857,
r259199, r259484, r259513, r259514, r259516

The kernel stack guard pages are only below the stack pointer, not above.

Remove unnecessary double-setting of the thread's onfault state in
copyinstr().

Open Firmware mandates that certain cross-references, in particular those
in /chosen, be ihandles. The ePAPR spec makes those cross-reference phandles,
since FDT has no concept of ihandles. Have the OF FDT CI module interpret
queries about ihandles as cross-reference phandles.

Real OF systems have an ihandle under /chosen/stdout, not a phandle. Use
the right type.

Rearchitect platform memory map parsing to make it less
Open Firmware-centric.

Remove fdtbus_bs_tag definition, which is now obsolete. The remainder of
this file is also slated for future demolition.

Return the correct IEEE 1275 code for "nextprop".

Use the common Open Firmware PCI interrupt routing code instead of the
duplicate version in dev/fdt.

Configure interrupt sense based on device tree information.

Simplify the ofw_bus_lookup_imap() API slightly: make it allocate maskbuf
internally instead of requiring the caller to allocate it.


266005 14-May-2014 ian

MFC r258259, r258798, r259010

Unify handling of illegal instruction faults between AIM and Book-E.

Make uart_cpu_powerpc work on both FDT and OFW systems.

Fix debug printfs in FPU_EMU to compile on powerpc64 and enable it for
powerpc64.


266004 14-May-2014 ian

MFC r258247, r258250, r258257

Remove a pointless #ifdef AIM. This is just PPC64 specific, including
64-bit Book-E.

Make single precision floating point arithmetic actually work

Split the function of the PCB_FPU flags into two: PCB_FPU now indicates that
the actual FPU is enabled, while PCB_FPREGS indicates that the FPU state
structure in the PCB is valid.


266003 14-May-2014 ian

MFC r257995, r258244, r258246,

Rename the "bare" platform "mpc85xx"
Also turn "bare" into a truly bare platform

Move CCSR discovery into the platform module

There is no reason Book-E needs to save XER and CTR on context switches.


266001 14-May-2014 ian

MFC r258002, r258024, r258027, r258051, r258052, r258243, r258244, r258002,
r258024, r258027, r258051, r258052, r258243,

Follow up r223485, which made AIM use the ABI thread pointer instead of
PCPU fields for curthread, by doing the same to Book-E.

Use the same implementation of copyinout.c for both AIM and Book-E.

Actually add IOMMU domain to the list of known mappings.

Following the approach with ACPI DMAR on x86, split IOMMU handling into
a variant PCI bus instead of trying to shoehorn it into the PCI host bridge
adapter.

Make sure that TLB1 mappings are aligned correctly.


265996 14-May-2014 ian

MFC r257161, r257169, r257178, r257190, r257191

Add pmap_mapdev_attr() and pmap_kenter_attr() interfaces.

Fix concurrency issues with TLB1 updates and make pmap_kextract() search
TLB1 mappings as well

Interrelated improvements to early boot mappings:
- Remove explicit requirement that the SOC registers be found except as an
optimization (although the MPC85XX LAW drivers still require they be found
externally, which should change).
- Remove magic CCSRBAR_VA value.
- Allow bus_machdep.c's early-boot code to handle non 1:1 mappings and
systems not in real-mode or global 1:1 maps in early boot.
- Allow pmap_mapdev() on Book-E to reissue previous addresses if the
area is already mapped. Additionally have it check all mappings, not
just the CCSR area.

Add some extra sanity checking and checks to printf format specifiers.

Bump initial TLB size. The kernel is not necessarily less than 16 MB

Handle (in a slightly ugly way) ePAPR-type loaders that just place a
device tree into r3.


265972 13-May-2014 ian

MFC r257115, r257116, r257117

Remove dead and duplicated code.


265969 13-May-2014 ian

MFC r256994, r257016, r257055, r257059, r257060, r257075

Add two new interfaces to ofw_bus:
- ofw_bus_map_intr()
Maps an (iparent, IRQ) tuple to a system-global interrupt number in some
platform dependent way. This is meant to be implemented as a replacement
for [FDT_]MAP_IRQ() that is an MI interface that knows about the bus
hierarchy.
- ofw_bus_config_intr()
Configures an interrupt (previously mapped) based on firmware sense flags.
This replaces manual interpretation of the sense field in bus drivers and
will, in a follow-up, allow that interpretation to be redirected to the PIC
drivers where it belongs. This will eventually replace the tables in
/sys/dev/fdt/fdt_ARCH.c

The PowerPC/AIM code has been converted to use these globally, with an
implementation in terms of MAP_IRQ() and powerpc_config_intr(), assuming
OpenPIC, at the bus root in nexus(4). The ofw_bus_config_intr() will shortly
be integrated into pic_if.m and bounced through nexus into the PIC tree.

Factor out MI portions of the PowerPC nexus device into /sys/dev/ofw. The
sparc64 driver will be modified to use this shortly.

Allow PIC drivers to translate firmware sense codes for themselves. This
is designed to replace the tables in dev/fdt/fdt_ARCH.c, but will not
happen quite yet.

Do not map IRQs twice. This fixes PowerPC/FDT systems with multiple PICs,
which would try to treat the previously-mapped interrupts from
fdt_decode_intr() as interrupt line numbers on the same parent PIC.

Remove some of the code required for supporting ssm(4) on SPARC in favor
of a more PowerPC/FDT-focused design. Whenever SPARC64 is integrated
into this rework, this should be (trivially) revisited.


265967 13-May-2014 ian

MFC r256932, r256938, r256966, r256953, r256967, r256969, r257015:

Add a new function (OF_getencprop()) that undoes the transformation applied
by encode-int. Specifically, it takes a set of 32-bit cell values and
changes them to host byte order. Most non-string instances of OF_getprop()
should be using this function, which is a no-op on big-endian platforms.

Use the new function all over the place.


265960 13-May-2014 ian

MFC r256901, r256914 (by nwhitehorn):

Catch up on 6 years of improvements in Open Firmware nexus devices by
importing the sparc64 one. At least 90% of this code is MI and will be
moved into /sys/dev/ofw at some point in the future.

Ignore registers on devices where the reg property is malformed. Issue a
warning if this happens under bootverbose. This prevents some
strange-looking entries in dmesg for SMU devices on Apple G5 systems.


265959 13-May-2014 ian

MFC r256870, r256898, r256899, r256900 (by nwhitehorn):

Standards-conformance and code deduplication:
- Use bus reference phandles in place of FDT offsets as IRQ domain keys
- Unify the identical macio/fdt/mambo OpenPIC drivers into one
- Be more forgiving (following ePAPR) about what we need from the device
tree to identify an OpenPIC
- Correctly map all IRQs into an interrupt domain
- Set IRQ_*_CONFORM for interrupts on an unknown PIC type instead of
failing attachment for that device.

Allow lots of interrupts (useful on multi-domain platforms) and do not
set device_quiet() on all devices attached under nexus(4).


265954 13-May-2014 ian

MFC r256814, r256816, r256818, r256846, r256855, r256864 (by nwhitehorn):

- Handle 2GB of ram
- Allow the OFW interrupt mapping code to work with PCI devices not
enumerated by Open Firmware, as in the case of FDT.
- Provide an interface for PCI bus drivers that need some of ofw_pci's
metadata during attach.
- Use standard ofw_bus helpers instead of reinventing the wheel.
- Make hard-wired TLB allocations be at minimum one page.


265952 13-May-2014 ian

MFC r256792, r256793, r256799 (by nwhitehorn): Unify AIM and booke code.


265606 07-May-2014 scottl

Merge r264984

Retire smp_active. It was racey and caused demonstrated problems with
the cpufreq code. Replace its use with smp_started. There's at least
one userland tool that still looks at the kern.smp.active sysctl, so
preserve it but point it to smp_started as well.

Obtained from: Netflix, Inc.


262675 02-Mar-2014 jhibbits

MFC r261309

Unbreak non-SMP builds. This was broken by r259284. Also, reorganize the
code introduced in that revision a bit.


260674 15-Jan-2014 jhibbits

MFC r259284,r259287

Add PMU-based CPU frequency scalling. This is used on most Titanium
PowerBooks.


259510 17-Dec-2013 kib

MFC r257228:
Add bus_dmamap_load_ma() function to load map with the array of
vm_pages.


259256 12-Dec-2013 andreast

MFC: r258722, r258757

r258722:
Give some output about the CPU clock on IBMPOWER machines, currently read
from OF. Linux does it similar, means they also read the OF values and
display them.
r258757:
Use the Open Firmware-based CPU frequency determination as a generic
fallback if we can't measure CPU frequency. This is also useful on a
variety of embedded systems using FDT.


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255887 26-Sep-2013 alc

Eliminate the declaration for a method that is no longer used. (This
change should have been a part of r255724.)

Reminded by: nathan
Approved by: re (gjb)


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255640 17-Sep-2013 nwhitehorn

Add POWER7+ and POWER8 to the CPU ID table.

Approved by: re (kib)


255639 17-Sep-2013 nwhitehorn

Make sure to copy segments back to the segs array if non-NULL. This is
relied upon by bus_dmamap_load_mbuf_sg() (i.e. all network drivers).

Approved by: re (kib)
MFC after: 2 weeks


255614 16-Sep-2013 nwhitehorn

Fix bug in busdma: if segs is a preexisting buffer, we memcpy it
into the DMA map. The length of the buffer had not yet been
initialized, however, so this would copy gibberish unless it
happened to be right by chance. This bug mostly only affected
systems with IOMMUs.

Approved by: re (gjb)
MFC after: 3 days


255419 09-Sep-2013 nwhitehorn

Raise artificial limits on number of CPUs and number of interrupts.

Approved by: re (kib)


255418 09-Sep-2013 nwhitehorn

Add POWER CPUs to the kernel's knowledge. This does not imply we currently
actually run on any machines with POWER CPUs but avoids closing that door
unnecessarily.

Approved by: re (kib)


255417 09-Sep-2013 nwhitehorn

Add hook called when every new processor is brought online -- including the
BSP -- so that platform modules have a chance to add the new CPU to any
internal bookkeeping.

Approved by: re (kib)


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253367 15-Jul-2013 ae

Include sys/systm.h after sys/param.h.

Suggested by: pluknet


251900 18-Jun-2013 rpaulo

Fix a KTR_BUSDMA format string.


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


247454 28-Feb-2013 davide

MFcalloutng:
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.

The commmit is not targeted for MFC.


246713 12-Feb-2013 kib

Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)


239008 03-Aug-2012 jhb

Improve the handling of static DMA buffers that use non-default memory
attributes (currently just BUS_DMA_NOCACHE):
- Don't call pmap_change_attr() on the returned address, instead use
kmem_alloc_contig() to ask the VM system for memory with the requested
attribute.
- As a result, always use kmem_alloc_contig() for non-default memory
attributes, even for sub-page allocations. This requires adjusting
bus_dmamem_free()'s logic for determining which free routine to use.
- For x86, add a new dummy bus_dmamap that is used for static DMA
buffers allocated via kmem_alloc_contig(). bus_dmamem_free() can then
use the map pointer to determine which free routine to use.
- For powerpc, add a new flag to the allocated map (bus_dmamem_alloc()
always creates a real map on powerpc) to indicate which free routine
should be used.

Note that the BUS_DMA_NOCACHE handling in powerpc is currently #ifdef'd out.
I have left it disabled but updated it to match x86.

Reviewed by: scottl
MFC after: 1 month


238357 10-Jul-2012 alc

Avoid recursion on the pvh global lock in the aim oea pmap.

Correct the return type of the pmap_ts_referenced() implementations.

Reported by: jhibbits [1]
Tested by: andreast


236141 27-May-2012 raj

Let us manage differences of Book-E PowerPC variations i.e. vendor /
implementation specific vs. the common architecture definition.

Bring PPC4XX defines (PSL, SPR, TLB). Note the new definitions under
BOOKE_PPC4XX are not used in the code yet.

This change set is not supposed to affect existing E500 support, it's just
another reorg step before bringing support for E500mc, E5500 and PPC465.

Obtained from: AppliedMicro, Freescale, Semihalf


236119 26-May-2012 raj

Move OpenPIC FDT bus glue to a shared location, so that other PowerPC
platforms can use it, not only MPC85XX.

This is just reorg, no functional changes.


236097 26-May-2012 raj

Rename e500 prefix to match other Book-E CPU variations. CPU id tidbits for
the new cores.

Obtained from: Freescale, Semihalf.


236000 25-May-2012 raj

Missing vm_paddr_t bits which should have been part of r235936.


235936 24-May-2012 raj

Fix physical address type to vm_paddr_t.


235689 20-May-2012 nwhitehorn

Replace the list of PVOs owned by each PMAP with an RB tree. This simplifies
range operations like pmap_remove() and pmap_protect() as well as allowing
simple operations like pmap_extract() not to involve any global state.
This substantially reduces lock coverages for the global table lock and
improves concurrency.


234580 22-Apr-2012 nwhitehorn

Remove dead code. The routines in atomic.S did not work properly anyway, and
were everywhere unused. If we turn out to need them, they should be
reimplemented.

MFC after: 2 weeks


234579 22-Apr-2012 nwhitehorn

Replace eieio; sync for creating bus-space memory barriers with sync.
sync performs a strict superset of the functions of eieio, so using both
is redundant. While here, expand bus barriers to all bus_space operations,
since many drivers do not correctly use bus_space_barrier().

In principle, we can also replace sync just with eieio, for a significant
performance increase, but it remains to be seen whether any poorly-written
drivers currently depend on the side effects of sync to properly function.

MFC after: 1 week


234115 11-Apr-2012 nwhitehorn

Do not restore the register holding the TLS pointer when doing various
usermode context switches (long jumps and ucontext operations). If these
are used across threads, multiple threads can end up with the same TLS base.
Madness will then result.

This makes behavior on PPC match that on x86 systems and on Linux.

MFC after: 10 days


232356 01-Mar-2012 jhb

- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by: scottl


230400 20-Jan-2012 andreast

This commit adds profiling support for powerpc64. Now we can do application
profiling and kernel profiling. To enable kernel profiling one has to build
kgmon(8). I will enable the build once I managed to build and test powerpc
(32-bit) kernels with profiling support.

- add a powerpc64 PROF_PROLOGUE for _mcount.
- add macros to avoid adding the PROF_PROLOGUE in certain assembly entries.
- apply these macros where needed.
- add size information to the MCOUNT function.

MFC after: 3 weeks, together with r230291


230123 15-Jan-2012 nwhitehorn

Rework SLB trap handling so that double-faults into an SLB trap handler are
possible, and double faults within an SLB trap handler are not. The result
is that it possible to take an SLB fault at any time, on any address, for
any reason, at any point in the kernel.

This lets us do two important things. First, it removes the (soft) 16 GB RAM
ceiling on PPC64 as well as any architectural limitations on KVA space.
Second, it lets the kernel tolerate poorly designed hypervisors that
have a tendency to fail to restore the SLB properly after a hypervisor
context switch.

MFC after: 6 weeks


229967 11-Jan-2012 nwhitehorn

Add a memory barrier to bus_dmamap_sync(), as should have always been
present. We need a sync instead of eieio, as eieio does not enforce storage
ordering between main and device memory.


227537 15-Nov-2011 marius

As it turns out, r186347 actually is insufficient to avoid the use of the
curthread-accessing part of mtx_{,un}lock(9) when using a r210623-style
curthread implementation on sparc64, crashing the kernel in its early
cycles as PCPU isn't set up, yet (and can't be set up as OFW is one of the
things we need for that, which leads to a chicken-and-egg problem). What
happens is that due to the fact that the idea of r210623 actually is to
allow the compiler to cache invocations of curthread, it factors out
obtaining curthread needed for both mtx_lock(9) and mtx_unlock(9) to
before the branch based on kobj_mutex_inited when compiling the kernel
without the debugging options. So change kobj_class_compile_static(9)
to just never acquire kobj_mtx, effectively restricting it to its
documented use, and add a kobj_init_static(9) for initializing objects
using a class compiled with the former and that also avoids using mutex(9)
(and malloc(9)). Also assert in both of these functions that they are
used in their intended way only.
While at it, inline kobj_register_method() and kobj_unregister_method()
as there wasn't much point for factoring them out in the first place
and so that a reader of the code has to figure out the locking for
fewer functions missing a KOBJ_ASSERT.
Tested on powerpc{,64} by andreast.

Reviewed by: nwhitehorn (earlier version), jhb
MFC after: 3 days


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


226410 15-Oct-2011 nwhitehorn

Enforce a memory barrier in stream operations, as is done on other
bus_space calls. This makes ath(4) work correctly on PowerPC.

Submitted by: adrian
Tested by: andreast
MFC after: 3 days


225953 03-Oct-2011 mav

Revert r225875, r225877:
It is reported that on some chips (e.g. the 970MP) behavior of POW bit set
simultaneously with modifying other bits is undefined and may cause hangs.
The race should be handled in some other way, but for now just get back.

Reported by: nwitehorn


225877 29-Sep-2011 mav

Add header missed in r225875.

MFC after: 3 days


225875 29-Sep-2011 mav

Handle the race in cpu_idle() when due to the critical section CPU could get
into sleep after receiving interrupt, delaying interrupt thread execution
indefinitely until the next interrupt arrive.

Reviewed by: nwhitehorn
MFC after: 3 days


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


223485 23-Jun-2011 nwhitehorn

Use the ABI-mandated thread pointer register (r2 for ppc32, r13 for ppc64)
instead of a PCPU field for curthread. This averts a race on SMP systems
with a high interrupt rate where the thread looking up the value of
curthread could be preempted and migrated between obtaining the PCPU
pointer and reading the value of pc_curthread, resulting in curthread being
observed to be the current thread on the thread's original CPU. This played
merry havoc with the system, in particular with mutexes. Many thanks to
jhb for helping me work this one out.

Note that Book-E is in principle susceptible to the same problem, but has
not been modified yet due to lack of Book-E hardware.

MFC after: 2 weeks


223470 23-Jun-2011 andreast

Add leading zeros when printing the stackframe on __powerpc64__.


222813 07-Jun-2011 attilio

etire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle with cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are target to be removed soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu datas is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernland members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additively note that no MAXCPU is bumped in this patch, but
private testing has been done until to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revision of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222531 31-May-2011 nwhitehorn

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


221988 16-May-2011 nwhitehorn

Fix a </<= mixup. This could result in suboptimal performance on the last
page of physical memory.


221738 10-May-2011 nwhitehorn

Only try to set up IPIs at boot on systems that actually have more than one
CPU. This fixes a panic observed on Heathrow-based systems without
SMP-capable PICs when the kernel had both options SMP and INVARIANTS.

MFC after: 5 days


221173 28-Apr-2011 attilio

Add the watchdogs patting during the (shutdown time) disk syncing and
disk dumping.
With the option SW_WATCHDOG on, these operations are doomed to let
watchdog fire, fi they take too long.

I implemented the stubs this way because I really want wdog_kern_*
KPI to not be dependant by SW_WATCHDOG being on (and really, the option
only enables watchdog activation in hardclock) and also avoid to
call them when not necessary (avoiding not-volountary watchdog
activations).

Sponsored by: Sandvine Incorporated
Discussed with: emaste, des
MFC after: 2 weeks


220638 14-Apr-2011 andreast

Add stoppcbs[] arrays on powerpc(64) and have each CPU save its
current context in the IPI_STOP handler. Similar as done on other
architectures.

Approved by: nwhitehorn (mentor)


220597 13-Apr-2011 nwhitehorn

Make sure that extra threads in 32-bit processes stay in 32-bit mode. This
fixes operation of threaded 32-bit binaries on 64-bit kernels.


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


218184 02-Feb-2011 marcel

Rename INTR_VEC to MAP_IRQ. From the OFW or FDT we obtain a
PIC handle with interrupt pin. This we map to the resource
called SYS_RES_IRQ.


218081 29-Jan-2011 nwhitehorn

Fix boot on SMP systems after r218075 by delaying CPU binding until a
SYSINIT.

Reviewed by: marcel


218075 29-Jan-2011 marcel

Fix the interrupt code, broken 7 months ago. The interrupt framework
already supported nested PICs, but was limited to having a nested
AT-PIC only. With G5 support the need for nested OpenPIC controllers
needed to be added. This was done the wrong way and broke the MPC8555
eval system in the process.

OFW, as well as FDT, describe the interrupt routing in terms of a
controller and an interrupt pin on it. This needs to be mapped to a
flat and global resource: the IRQ. The IRQ is the same as the PCI
intline and as such needs to be representable in 8 bits. Secondly,
ISA support pretty much dictates that IRQ 0-15 should be reserved
for ISA interrupts, because of the internal workins of south bridges.
Both were broken.

This change reverts revision 209298 for a big part and re-implements
it simpler. In particular:
o The id() method of the PIC I/F is removed again. It's not needed.
o The openpic_attach() function has been changed to take the OFW
or FDT phandle of the controller as a second argument. All bus
attachments that previously used openpic_attach() as the attach
method of the device I/F now implement as bus-specific method
and pass the phandle_t to the renamed openpic_attach().
o Change powerpc_register_pic() to take a few more arguments. In
particular:
- Pass the number of IPIs specificly. The number of IRQs carved
out for a PIC is the sum of the number of int. pins and IPIs.
- Pass a flag indicating whether the PIC is an AT-PIC or not.
This tells the interrupt framework whether to assign IRQ 0-15
or some other range.
o Until we implement proper multi-pass bus enumeration, we have to
handle the case where we need to map from PIC+pin to IRQ *before*
the PIC gets registered. This is done in a similar way as before,
but rather than carving out 256 IRQs per PIC, we carve out 128
IRQs (124 pins + 4 IPIs). This is supposed to handle the G5 case,
but should really be fixed properly using multiple passes.
o Have the interrupt framework set root_pic in most cases and not
put that burden in PIC drivers (for the most part).
o Remove powerpc_ign_lookup() and replace it with powerpc_get_irq().
Remove IGN_SHIFT, INTR_INTLINE and INTR_IGN.

Related to the above, fix the Freescale PCI controller driver, broken
by the FDT code. Besides not attaching properly, bus numbers were
assigned improperly and enumeration was broken in general. This
prevented the AT PIC from being discovered and interrupt routing to
work properly. Consequently, the ata(4) controller stopped functioning.

Fix the driver, and FDT PCI support, enough to get the MPC8555CDS
going again. The FDT PCI code needs a whole lot more work.

No breakages are expected, but lackiong G5 hardware, it's possible
that there are unpleasant side-effects. At least MPC85xx support is
back to where it was 7 months ago -- it's amazing how badly support
can be broken in just 7 months...

Sponsored by: Juniper Networks


217896 26-Jan-2011 dchagin

Add macro to test the sv_flags of any process. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures.

MFC after: 1 month


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217400 14-Jan-2011 kib

Enable shared page for the signal trampolines on PowerPC.

Reviewed and tested by: nwhitehorn


217072 06-Jan-2011 jhb

Remove bogus usage of INTR_FAST. "Fast" interrupts are now indicated by
registering a filter handler rather than a threaded handler. Also remove
a bogus use of INTR_MPSAFE for a filter.


216154 03-Dec-2010 nwhitehorn

Provide a simple IOMMU framework on PowerPC, which is required to support
PPC hypervisors.


215182 12-Nov-2010 nwhitehorn

Add CPU support code for the IBM Cell Broadband Engine.


215160 12-Nov-2010 nwhitehorn

Remove or conditionalize some hypervisor-unfriendly instruction sequences.


215159 12-Nov-2010 nwhitehorn

Add some platform KOBJ extensions and continue integrating PowerPC
hypervisor infrastructure support:
- Fix coexistence of multiple platform modules in the same kernel
- Allow platform modules to provide an SMP topology
- PowerPC hypervisors limit the amount of memory accessible in real mode.
Allow the platform modules to specify the maximum real-mode address,
and modify the bits of the kernel that need to allocate
real-mode-accessible buffers to respect this limits.


215157 12-Nov-2010 nwhitehorn

Centralize CPU idle routines into powerpc/cpu.c and use the same
cpu_idle_hook mechanism that x86 uses for overriding the idle routine.
This is required for supporting ilding the CPU under PowerPC hypervisors.


215101 10-Nov-2010 nwhitehorn

Entering deep nap mode on the 970MP requires that both MSR[NAP] and
MSR[DEEPNAP] be set, not just MSR[DEEPNAP]. Fixing this reduces the idle
temperature of my CPUs from 57 to 38 degrees and makes one-shot timer
mode work properly.

Hint from: mav
MFC after: 4 days


214574 30-Oct-2010 nwhitehorn

Restructure the way the copyin/copyout segment is stored to prevent a
concurrency bug. Since all SLB/SR entries were invalidated during an
exception, a decrementer exception could cause the user segment to be
invalidated during a copyin()/copyout() without a thread switch that
would cause it to be restored from the PCB, potentially causing the
operation to continue on invalid memory. This is now handled by explicit
restoration of segment 12 from the PCB on 32-bit systems and a check in
the Data Segment Exception handler on 64-bit.

While here, cause copyin()/copyout() to check whether the requested
user segment is already installed, saving some pipeline flushes, and
fix the synchronization primitives around the mtsr and slbmte
instructions to prevent accessing stale segments.

MFC after: 2 weeks


213383 03-Oct-2010 nwhitehorn

Add a memory-range interface to /dev/mem on PowerPC using PAT attributes.
Unlike actual MTRR, this only controls the mapping attributes for
subsequent mmap() of /dev/mem. Nonetheless, the support is sufficiently
MTRR-like that Xorg can use it, which translates into an enormous increase
in graphics performance on PowerPC.

MFC after: 2 weeks


213360 02-Oct-2010 nwhitehorn

Fix some KTR arguments that were breaking the LINT build.

Pointy hat to: me


213307 30-Sep-2010 nwhitehorn

Add support for memory attributes (pmap_mapdev_attr() and friends) on
PowerPC/AIM. This is currently stubbed out on Book-E, since I have no
idea how to implement it there.


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb


212586 13-Sep-2010 nwhitehorn

Fix a missing set of parantheses that could cause recent versions of libthr
to crash deferencing a NULL pointer to the user context on powerpc64
systems with COMPAT_FREEBSD32 defined.


212559 13-Sep-2010 nwhitehorn

Fix a subtle bug uncovered by the recent one-shot timer import in which
any spin locks acquired between the enabling of interrupts in
machdep_ap_bootstrap() and the invocation of the scheduler would fail to
have interrupts disabled due to the fake spinlock already held by the
idle thread. sched_throw(NULL) will enable interrupts by itself when
exiting this spinlock, so just let it do that and don't enable interrupts
here.


212556 13-Sep-2010 mav

Change call order to enable interrupts only after timer being programmed.

Submitted by: nwhitehorn


212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


212453 11-Sep-2010 mav

Update PowerPC event timer code to use new event timers infrastructure.

Reviewed by: nwitehorn
Tested by: andreast
H/W donated by: Gheorghe Ardelean


212170 03-Sep-2010 grehan

- Bump MAXCPU to 4. Tested on a quad G5 with both 32 and 64-bit kernels.
A make buildkernel -j4 uses ~360% CPU.
- Bracket the AP spinup printf with a mutex to avoid garbled output.
- Enable SMP by default on powerpc64.

Reviewed by: nwhitehorn


212054 31-Aug-2010 nwhitehorn

Restructure how reset and poweroff are handled on PowerPC systems, since
the existing code was very platform specific, and broken for SMP systems
trying to reboot from KDB.

- Add a new PLATFORM_RESET() method to the platform KOBJ interface, and
migrate existing reset functions into platform modules.
- Modify the OF_reboot() routine to submit the request by hand to avoid
the IPIs involved in the regular openfirmware() routine. This fixes
reboot from KDB on SMP machines.
- Move non-KDB reset and poweroff functions on the Powermac platform
into the relevant power control drivers (cuda, pmu, smu), instead of
using them through the Open Firmware backdoor.
- Rename platform_chrp to platform_powermac since it has become
increasingly Powermac specific. When we gain support for IBM systems,
we will grow a new platform_chrp.


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month


209975 13-Jul-2010 nwhitehorn

MFppc64:

Kernel sources for 64-bit PowerPC, along with build-system changes to keep
32-bit kernels compiling (build system changes for 64-bit kernels are
coming later). Existing 32-bit PowerPC kernel configurations must be
updated after this change to specify their architecture.


209950 12-Jul-2010 nwhitehorn

Unify ABI-related bits of the Book-E and AIM machdep routines
(exec_setregs, etc.) in order to simplify the addition of 64-bit support,
and possible future extension of the Book-E code to handle hard floating
point and Altivec.

MFC after: 1 month


209908 11-Jul-2010 raj

Convert Freescale PowerPC platforms to FDT convention.

The following systems are affected:

- MPC8555CDS
- MPC8572DS

This overhaul covers the following major changes:

- All integrated peripherals drivers for Freescale MPC85XX SoC, which are
currently in the FreeBSD source tree are reworked and adjusted so they
derive config data out of the device tree blob (instead of hard coded /
tabelarized values).

- This includes: LBC, PCI / PCI-Express, I2C, DS1553, OpenPIC, TSEC, SEC,
QUICC, UART, CFI.

- Thanks to the common FDT infrastrucutre (fdtbus, simplebus) we retire
ocpbus(4) driver, which was based on hard-coded config data.

Note that world for these platforms has to be built WITH_FDT.

Reviewed by: imp
Sponsored by: The FreeBSD Foundation


209850 09-Jul-2010 nwhitehorn

MFppc64:

Use longs instead of ints as the native word type in bcopy(). This will
expand nicely on 64-bit systems.


209849 09-Jul-2010 nwhitehorn

MFppc64:

Check if devices are direct-mapped individually instead of just checking
the value of hw_direct_map.


209812 08-Jul-2010 nwhitehorn

Replace the existing PowerPC busdma implementation with the one from
amd64 (with slight modifications). This provides support for bounce
buffers, which are required on systems with RAM above 4 GB.


209726 06-Jul-2010 nwhitehorn

It does not actually make sense to provide an IPI facility on non-root
PICs, so replace cpuid logic with an assert.


209725 06-Jul-2010 nwhitehorn

Fix interrupt distribution to multiple CPUs on systems with cascaded PICs.
Because slave PICs send all interrupts to their CPU 0 output line (which
is routed to a pin on the master PIC), changes to per-CPU register banks
like EOI on the slave PIC must be accessed for CPU 0, instead of the
CPU actually processing the interrupt.

Submitted by: Andreas Tobler


209724 06-Jul-2010 nwhitehorn

Move the EOI logic when starting ithreads into intr_machdep instead of
relying on it as a side effect of PIC_MASK() in the PIC drivers, and add
an inmplementation of assign_cpu() for the kernel interrupt layer.


209670 03-Jul-2010 nwhitehorn

Add a missing conditional. We should not bind the PIC interrupt unless
the interrupt's PIC (a) exists and (b) is the root PIC.

Reported by: Andreas Tobler


209639 02-Jul-2010 marcel

Remove the unneeded header <machine/intr.h>.


209486 23-Jun-2010 nwhitehorn

Configure interrupts on SMP systems to be distributed among all online
CPUs by default, and provide a functional version of BUS_BIND_INTR().
While here, fix some potential concurrency problems in the interrupt
handling code.


209485 23-Jun-2010 marcel

In the attach method, refactor to take into account that
BUS_GET_RESOURCE_LIST() can return a NULL pointer -- and
will for MPC85xx kernels.


209299 18-Jun-2010 nwhitehorn

Change the default interrupt polarity on PowerPC systems from high to low.
On Apple systems at least, all the level interrupts are wired active low.
Before this change, our PIC programming only worked because Apple hardware
ignores the interrupt polarity bit on all interrupts except IRQ 0.


209298 18-Jun-2010 nwhitehorn

Provide for multiple, cascaded PICs on PowerPC systems, and extend the
OFW interrupt map interface to also return the device's interrupt parent.

MFC after: 8.1-RELEASE


208835 05-Jun-2010 nwhitehorn

Make sure that interrupt sense settings set after interrupts are enabled
are respected. This fixes loading the Apple onboard audio driver
(snd_ai2s) as a module after boot, which would previously cause a panic.

PR: powerpc/146888
MFC after: 5 days


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208149 16-May-2010 nwhitehorn

Add support for the U4 PCI-Express bridge chipset used in late-generation
Powermac G5 systems. MSI and several other things are not presently
supported.

The U3/U4 internal device support portions of this change were contributed
by Andreas Tobler.

MFC after: 1 week


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


205527 23-Mar-2010 marcel

Enable power management for E500 cores. Use "doze" for now to make
sure the caches remain coherent. For single-core configurations and
with busdma changes we could eventually switch to "nap" and force
a D-cache invalidation as part of the DMA completion. To this end,
clear PSL_WE until after we handled the decrementer or external
interrupt as it tells us whether we just woke up or not.


204312 25-Feb-2010 nwhitehorn

Fix another bug involving /dev/mem and the OEA64 scratchpage. When
the scratchpage is updated, the PVO's physical address is updated as well.
This makes pmap_extract() begin returning non-zero values again, causing
the panic partially fixed in r204297. Fix this by excluding addresses
beyond virtual_end from consideration as KVA addresses, instead of allowing
addresses up to VM_MAX_KERNEL_ADDRESS.


204127 20-Feb-2010 nwhitehorn

Turn on experimental support for DEEPNAP on the 970MP.


201223 29-Dec-2009 rnoland

Update d_mmap() to accept vm_ooffset_t and vm_memattr_t.

This replaces d_mmap() with the d_mmap2() implementation and also
changes the type of offset to vm_ooffset_t.

Purge d_mmap2().

All driver modules will need to be rebuilt since D_VERSION is also
bumped.

Reviewed by: jhb@
MFC after: Not in this lifetime...


199886 28-Nov-2009 nwhitehorn

Add a CPU features framework on PowerPC and simplify CPU setup a little
more. This provides three new sysctls to user space:
hw.cpu_features - A bitmask of available CPU features
hw.floatingpoint - Whether or not there is hardware FP support
hw.altivec - Whether or not Altivec is available

PR: powerpc/139154
MFC after: 10 days


199533 19-Nov-2009 raj

Fix cpuid output on E500 core.


198968 06-Nov-2009 marcel

Unbreak E500 builds. The inline assembly for the 970 CPUs
is invalid when compiling for BookE.


198678 30-Oct-2009 nwhitehorn

Make procstat -k work on PowerPC by avoiding mistakenly using signed
compares with a low address (0x1000) and a high address
(the KVA kernel stack).


198445 24-Oct-2009 nwhitehorn

Turn on NAP mode on G5 systems, and refactor the HID0 setup code a little.
This makes my G5 Xserve sound slightly less like it is filled with
howling banshees.


198378 23-Oct-2009 nwhitehorn

Add SMP support on U3-based G5 systems. This does not yet work perfectly:
at least on my Xserve, getting the decrementer and timebase on APs to tick
requires setting up a clock chip over I2C, which is not yet done.

While here, correct the 64-bit tlbie function to set the CPU to 64-bit
mode correctly.

Hardware donated by: grehan


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


196196 13-Aug-2009 attilio

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


194933 25-Jun-2009 jeff

- Add the right includes to use kmem_alloc(). This was broken by my
DPCPU commit.
Reported by: bz


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


194374 17-Jun-2009 nwhitehorn

Teach cpu_est_clockrate() about the G5's slightly different PMC. This
allows the boot messages to include the CPU speed and makes possible
the forthcoming cpufreq support for the PPC 970.


193909 10-Jun-2009 grehan

Get the gdb/psim emulator functioning again.

aim/machdep.c:
- the RI status register bit needs to be set when doing the mtmsrd 64-bit
instruction test
- psim doesn't implement the dcbz instruction so the run-time cacheline
test fails. Set the cachline size to 32 to avoid infinite loops in
future calls to __syncicache()

aim/platform_chrp.c:
- if after iterating through / and a name property of "cpus" still isn't
found, just search directly for '/cpus'.
- psim doesn't put a "reg" property on it's cpu nodes, so assume 0
since it is uniprocessor-only at this point

powerpc/openpic.c
- the number of CPUs reported is 1 too many on psim's openpic

Reviewed by: nwhitehorn
MFC after: 1 week (openpic part)


193578 06-Jun-2009 raj

Provide 64-bit big endian bus space operations for PowerPC. They are required
for the upcoming sec(4) driver.

Submitted by: Piotr Ziecik
Obtained from: Semihalf


193156 31-May-2009 nwhitehorn

Introduce support for cpufreq on PowerPC with the dynamic frequency
switching capabilities of the MPC7447A and MPC7448.


192533 21-May-2009 raj

Improve style(9), clean up.


192532 21-May-2009 raj

Initial support for SMP on PowerPC MPC85xx.

Tested with Freescale dual-core MPC8572DS development system.

Obtained from: Freescale, Semihalf


192109 14-May-2009 raj

PowerPC common SMP startup and time base rework.

- make mftb() shared, rewrite in C, provide complementary mttb()
- adjust SMP startup per the above, additional comments, minor naming
changes
- eliminate redundant TB defines, other minor cosmetics

Reviewed by: marcel, nwhitehorn
Obtained from: Freescale, Semihalf


192067 14-May-2009 nwhitehorn

Factor out platform dependent things unrelated to device drivers into a
new platform module. These are probed in early boot, and have the
responsibility of determining the layout of physical memory, determining
the CPU timebase frequency, and handling the zoo of SMP mechanisms
found on PowerPC.

Reviewed by: marcel, raj
Book-E parts by: raj


191450 24-Apr-2009 marcel

Add suppport for ISA and ISA interrupts to make the ATA
controller in the VIA southbridge functional in the CDS
(Configurable Development System) for MPC85XX.
The embedded USB controllers look operational but the
interrupt steering is still wrong.


191447 24-Apr-2009 marcel

Reimplement bs_be_rs_{1|2|4} and bs_le_rs_{1|2|4} by not
calling the inline functions in <machine/pio.h> and do
not add synchronization. Implement bs_gen_barrier() as
eieio and sync.


191380 22-Apr-2009 raj

Eliminate redundant setting of HID0_EMCP.


191307 20-Apr-2009 raj

Provide locking for PowerPC interrupt sources config.

Reviewed by: attilio


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


190684 04-Apr-2009 marcel

PowerPC, meet kernel core dumps. The support is based
on a generic dumper that creates an ELF core file and
uses PMAP functions to scan and iterate over memory
chunks, as well as handle memory mappings used during
dumping.
the PMAP layer can choose to return physical memory
chunks or virtual memory chunks. For minidumps, the
chunks should be virtual.

The default MMU I/F implementation for the scan_md()
method returns NULL. Thus, when a PMAP implementation
does not implement the required methods, an empty
core file is created. Here, empty means having an ELF
header only.

Obtained from: Juniper Networks


190681 04-Apr-2009 nwhitehorn

Add support for 64-bit PowerPC CPUs operating in the 64-bit bridge mode
provided, for example, on the PowerPC 970 (G5), as well as on related CPUs
like the POWER3 and POWER4.

This also adds support for various built-in hardware found on Apple G5
hardware (e.g. the IBM CPC925 northbridge).

Reviewed by: grehan


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


189100 27-Feb-2009 raj

Make Book-E debug register state part of the PCB context.

Previously, DBCR0 flags were set "globally", but this leads to problems
because Book-E fine grained debug settings work only in conjuction with the
debug master enable bit in MSR: in scenarios when the DBCR0 was set with
intention to debug one process, but another one with MSR[DE] set got
scheduled, the latter would immediately cause debug exceptions to occur upon
execution of its own code instructions (and not the one intended for
debugging).

To avoid such problems and properly handle debugging context, DBCR0 state
should be managed individually per process.

Submitted by: Grzegorz Bernacki gjb ! semihalf dot com
Reviewed by: marcel


188860 20-Feb-2009 nwhitehorn

Add Altivec support for supported CPUs. This is derived from the FPU support
code, and also reducing the size of trapcode to fit inside a 32 byte handler
slot.

Reviewed by: grehan
MFC after: 2 weeks


187691 25-Jan-2009 nwhitehorn

Fix a race condition where interrupts set up after boot could be enabled in
the PIC before the interrupt handler was set. If the interrupt triggered in
that window, then the interrupt vector would be disabled.

Reported by: Marco Trillo


187149 13-Jan-2009 raj

Rework BookE pmap towards multi-core support.

o Eliminate tlb0[] (a s/w copy of TLB0)
- The table contents cannot be maintained reliably in multiple MMU
environments, where asynchronous events (invalidations from other cores)
can change our local TLB0 contents underneath.
- Simplify and optimize TLB flushing: system wide invalidations are
performed using tlbivax instruction (propagates to other cores), for
local MMU invalidations a new optimized routine (assembly) is introduced.

o Improve and simplify TID allocation and management.
- Let each core keep track of its TID allocations.
- Simplify TID recycling, eliminate dead code.
- Drop the now unused powerpc/booke/support.S file.

o Improve page tables management logic.

o Simplify TLB1 manipulation routines.

o Other improvements and polishing.

Obtained from: Freescale, Semihalf


186347 20-Dec-2008 nwhitehorn

Modularize the Open Firmware client interface to allow run-time switching
of OFW access semantics, in order to allow future support for real-mode
OF access and flattened device frees. OF client interface modules are
implemented using KOBJ, in a similar way to the PPC PMAP modules.

Because we need Open Firmware to be available before mutexes can be used on
sparc64, changes are also included to allow KOBJ to be used very early in
the boot process by only using the mutex once we know it has been initialized.

Reviewed by: marius, grehan


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


183437 28-Sep-2008 nwhitehorn

Unbreak support for G4s without an L3 cache. L3 cache support was introduced
with, and limited to, the Motorola/Freescale 745x family.

Reported by: Marco Trillo


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183319 24-Sep-2008 nwhitehorn

Allow the cacheline size on PowerPC to be set at runtime. This is essential for
supporting 64-bit CPUs, which often have 128-byte cache lines instead of the
standard 32.


183262 22-Sep-2008 nwhitehorn

Unbreak G3 support. G3 processors don't have an L3 cache, so we shouldn't try to program it.

Approved by: marcel (mentor)


183083 16-Sep-2008 marcel

o Synchronize the APs timebase and decrementer values with the BSP.
o Don't set/get the PIR register. It's CPU dependent.
o Also initialize pcpup->pc_curpcb, in case it's dereferenced.


183060 16-Sep-2008 marcel

Remove the tracing from the AP startup. The AP is known
to start and the tracing can interfere with AP startup.
Instead, use the available space in the reset vector
for the initial stack.


183030 15-Sep-2008 marcel

Dont worry about PSL_RI (restartable interrupt indicator) in
common PowerPC code when all we want to achieve is to enable
external interrupts. We can set PSL_RI at any time before we
allow interrupts and/or exceptions, so move it to the AIM
specific initialization and do it when we also set PSL_ME
(machine check enable).


183029 15-Sep-2008 marcel

Rename cpu_config_l2cr() to cpu_print_cacheinfo(). We're not
configuring the L2 cache on the BSP. Nor the L3 cache. We
merely print the settings.

Save the L2 and L3 cache configuration in global values so
that we know how to configure the cache on APs.


183028 14-Sep-2008 marcel

Remove debugging code.


182571 31-Aug-2008 marcel

Trace all PMAP calls using KTR_PMAP.


182487 30-Aug-2008 marcel

In db_show_mdpcpu(), print MD fields.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


179049 16-May-2008 attilio

Removed unused assembly offsets for structures digging.


178893 09-May-2008 alc

Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.


178628 27-Apr-2008 marcel

MFp4: SMP support


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177288 17-Mar-2008 marcel

Make remote GDB work for AIM processors. For BookE, the kernel
will have a special section, named .PPC.EMB.apuinfo, which will
tell GDB that a BookE processor is targeted and which will
result in GDB using a different register definition. In order
to support remote GDB for BookE, we need the GDB stub in the
kernel look for that section and use the BookE definitions.


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177068 11-Mar-2008 marcel

In intr_lookup(), when adding an IRQ to powerpc_intrs[], also
set a default name. If the IRQ is added as a consequence of
configurating the IRQ without there ever being a handler
assigned to it, we will not have a name. This breaks the
fragile intrcnt/intrnames logic.


176919 07-Mar-2008 marcel

For AIM, have cpu_idle() set MSR_POW when the powerpc_pow_enabled
variable is set. On my Mac Mini this puts the CPU in NAP mode when
the kernel is idle and, any technical or environmental reasons
aside, avoids that I have to listen to the fan all day :-)


176918 07-Mar-2008 marcel

Add support for the BUS_CONFIG_INTR() method to the platform and to
openpic(4). Make use of it in ocpbus(4). On the MPC85xxCDS, IRQ0:4
are active-low.


176777 03-Mar-2008 raj

Import the omitted gdb_machdep.c for PowerPC kernel.

Approved by: cognet (mentor)
MFp4: e500


176771 03-Mar-2008 raj

Initial support for Freescale PowerQUICC III MPC85xx system-on-chip family.

The PQ3 is a high performance integrated communications processing system
based on the e500 core, which is an embedded RISC processor that implements
the 32-bit Book E definition of the PowerPC architecture. For details refer
to: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MPC8555E

This port was tested and successfully run on the following members of the PQ3
family: MPC8533, MPC8541, MPC8548, MPC8555.

The following major integrated peripherals are supported:

* On-chip peripherals bus
* OpenPIC interrupt controller
* UART
* Ethernet (TSEC)
* Host/PCI bridge
* QUICC engine (SCC functionality)

This commit brings the main functionality and will be followed by individual
drivers that are logically separate from this base.

Approved by: cognet (mentor)
Obtained from: Juniper, Semihalf
MFp4: e500


176742 02-Mar-2008 raj

Unify and generalize PowerPC headers, adjust AIM code accordingly.

Rework of this area is a pre-requirement for importing e500 support (and
other PowerPC core variations in the future). Mainly the following
headers are refactored so that we can cover for low-level differences between
various machines within PowerPC architecture:

<machine/pcpu.h>
<machine/pcb.h>
<machine/kdb.h>
<machine/hid.h>
<machine/frame.h>

Areas which use the above are adjusted and cleaned up.

Credits for this rework go to marcel@

Approved by: cognet (mentor)
MFp4: e500


176734 02-Mar-2008 jeff

- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176617 27-Feb-2008 marcel

Avoid hardcoding the kernel link address in the linker script.
Use KERNBASE instead. While here, move the text sections
forward to the beginning of the text segment.


176534 25-Feb-2008 raj

Teach PowerPC CPU identification routines to recognize e500 cores. Fix style
issues in this area.

Approved by: cognet (mentor)
MFp4: e500


176208 12-Feb-2008 marcel

Add PIC support for IPIs. When registering an interrupt handler,
the PIC also informs the platform at which IRQ level it can start
assigning IPIs, since this can depend on the number of IRQs
supported for external interrupts.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


174782 19-Dec-2007 marcel

Redefine bus_space_tag_t on PowerPC from a 32-bit integral to
a pointer to struct bus_space. The structure contains function
pointers that do the actual bus space access.

The reason for this change is that previously all bus space
accesses were little endian (i.e. had an explicit byte-swap
for multi-byte accesses), because all busses on Macs are little
endian.
The upcoming support for Book E, and in particular the E500
core, requires support for big-endian busses because all
embedded peripherals are in the native byte-order.

With this change, there's no distinction between I/O port
space and memory mapped I/O. PowerPC doesn't have I/O port
space. Busses assign tags based on the byte-order only.
For that purpose, two global structures exist (bs_be_tag and
bs_le_tag), of which the address can be taken to get a valid
tag.

Obtained from: Juniper, Semihalf


174601 14-Dec-2007 marcel

This file was repocopied to src/sys/powerpc/aim, where it will
live on -- an afterlife.


174593 14-Dec-2007 marcel

Remove unused file.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173601 14-Nov-2007 julian

A bunch more files that should probably print out a thread name
instead of a process name.


173600 14-Nov-2007 julian

generally we are interested in what thread did something as
opposed to what process. Since threads by default have teh name of the
process unless over-written with more useful information, just print the
thread name instead.


173584 13-Nov-2007 grehan

Split decr_init() into two, with the section that reads the timebase
frequency from OpenFirmware moved out and into a routine that is called
from cpu_startup().

This allows correct reporting of the CPU clockspeed when printing out
CPU information at boot time.

Reported by: numerous
Reviewed by: marcel
MFC after: 1 day


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


172887 23-Oct-2007 grehan

Cut over to ULE on PowerPC

kern/sched_ule.c - Add __powerpc__ to the list of supported architectures

powerpc/conf/GENERIC - Swap SCHED_4BSD with SCHED_ULE

powerpc/powerpc/genassym.c - Export TD_LOCK field of thread struct

powerpc/powerpc/swtch.S - Handle new 3rd parameter to cpu_switch() by
updating the old thread's lock. Note: uniprocessor-only, will require
modification for MP support.

powerpc/powerpc/vm_machdep.c - Set 3rd param of cpu_switch to mutex of
old thread's lock, making the call a no-op.

Reviewed by: marcel, jeffr (slightly older version)


172189 15-Sep-2007 alc

It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)


171805 11-Aug-2007 marcel

Revamp the interrupt handling in support of INTR_FILTER. This includes:
o Revamp the PIC I/F to only abstract the PIC hardware. The
resource handling has been moved to nexus, where it belongs.
o Include EOI and MASK+EOI methods to the PIC I/F in support of
INTR_FILTER.
o With the allocation of interrupt resources and setup of
interrupt handlers in the common platform code we can delay
talking to the PIC hardware after enumeration of all devices.
Introduce a call to powerpc_intr_enable() in configure_final()
to achieve that and have powerpc_setup_intr() only program the
PIC when !cold.
o As a consequence of the above, remove all early_attach() glue
from the OpenPIC and Heathrow PIC drivers and have them
register themselves when they're found during enumeration.
o Decouple the interrupt vector from the interrupt request line.
Allocate vectors increasingly so that they can be used for
the intrcnt index as well. Extend the Heathrow PIC driver to
translate between IRQ and vector. The OpenPIC driver already
has the support for vectors in hardware.

Approved by: re (blanket)


171787 08-Aug-2007 marcel

Re-enable external interrupts for faults, traps and syscalls.

Approved by: re (blanket)


171785 07-Aug-2007 marcel

Eliminate <machine/interruptvar.h> as it has only a single
prototype. In the future that prototype will not be needed
at all anyway, but for now it's moved to intr_machdep.h.

Approved by: re (blanket)


171783 07-Aug-2007 marcel

Remove redundant prototype.

Approved by: re (blanket)


171670 31-Jul-2007 marcel

Fix backward compatibility of the "old" (i.e. FreeBSD6) lseek
syscall. It was broken when a new lseek syscall was introduced.
The problem is that we need to swap the 32-bit td_retval values
for the __syscall indirect syscall when the actual syscall has
a 32-bit return value. Hence, we need to exclude lseek(2). And
this means the "old" lseek(2) as well -- which we didn't.

Based on a patch from: grehan@
Approved by: re (rwatson)


170979 22-Jun-2007 yongari

Reimplement bus_dmamap_load with bus_dmamap_load_buffer.
Previously it didn't honor parent dma tag's restrictions such that
an invalid dma segment could be passed to device. The driver for the
device may panic in sanity check routine for the dma segment or may
produce unexpected results. I have no idea how it could ever have
worked before.

Reviewed by: grehan
Tested by: gad
Approved by: re (hrs)


170978 22-Jun-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by: scottl (long a ago)
Approved by: re (hrs)


170473 09-Jun-2007 marcel

Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.


170421 08-Jun-2007 marcel

Sync with other platforms: add kluge to use contigmalloc when the
alignment is larger than the size and print a diagnostic when we
didn't satisfy the alignment.


170363 06-Jun-2007 grehan

Fix the compile. Band-aid until it is worked out how to use the context
switch api on ppc.


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170036 27-May-2007 marcel

Don't initialize the decrementer before initclocks() is called.
Use cpu_initclocks() for that as it assures that relevant locks
have been initialized.


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


168885 20-Apr-2007 grehan

Add ofw bus methods to the ppc nexus driver. This will be used in future
EFIKA platform support.

PR: 111522
Submitted by: Andrew Turner, andrew at fubar geek nz


168193 01-Apr-2007 marcel

Remove unused file.


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


167289 07-Mar-2007 piso

Update openpic to support the new bus_setup_intr() syntax.

Reviewed by: marcel


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166812 18-Feb-2007 marcel

The table of known CPU models ends with an entry that has a version
of 0, not with an entry that has an empty CPU name.

Submitted by: Andrew Turner (andrew@fubar.geek.nz)


166659 12-Feb-2007 kevlo

Remove the cast to caddr_t for sfp, they're not needed.

Reviewed by: marcel


166011 14-Jan-2007 marcel

Propagate the CPU model to the hw.model sysctl.


165608 28-Dec-2006 marcel

In cpu_reset(), call OF_reboot() instead of OF_exit(). The latter
doesn't do a reboot and has been observed to reset the NVRAM to its
default values.


165362 20-Dec-2006 grehan

Remove bogus increment of re-hashed PTEG index. This snuck in with r1.12 of
pmap.c, and is potentially the cause of hangs reported on machines with a
small amount of memory. On machines with sufficient RAM, and without a lot
of processes running, this situation would probably never occur.

Testing is still incomplete, but it is obviously wrong so remove the
offending code now.

The issue of what to do when both the primary and secondary hash overflow
is still open.

Reported by: Dan Kresja at windriver dot com, via alc


165151 13-Dec-2006 marcel

Implement OF_decode_addr(). This makes uart(4) work as a serial
console on a Xserve G4.


164987 07-Dec-2006 marcel

Add support for multiple FAST handlers.

Reviewed & tested by: grehan@


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164895 05-Dec-2006 grehan

Fix gdb issue where the i-cache was not being updated when a breakpoint
was written into a user's address space. The fix is to modify uiomove_fromphys
to sync the icache when an executable user-space page is written into.

Alan Cox suggested that there should probably be a higher-level interface
to this in the ptrace code, but agreed that this is an OK short-term solution.

Files changed:

pmap.h - declaration of pmap_page_executable()
pmap_dispatch.c - pass through the page_executable call to the mmu object
mmu_oea.c - implement the page_executable method by examining the PTE_EXEC
field in the vm_page_t
uio_machdep.c - in uiomove_fromphys(), if the op was a UIO_WRITE to user-space,
and if the page is executable, sync the icache since this is at the least
a breakpoint-write from gdb.

Reported by: marcel
Tested by: marcel, grehan on g3+g4
Discussed with: alc
MFC after: 2 weeks


164765 30-Nov-2006 grehan

Don't use vm_page_flag_set() if installing bootstrap page-table entries
since the vm page mutex's aren't yet initialized. Fixes boot-time panic.

Reported by: Dario Freni saturnero at freesbie dot org


164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


164198 11-Nov-2006 alc

Eliminate unused global variables.


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163488 18-Oct-2006 grehan

Fix remaining compile error.


163472 18-Oct-2006 davidxu

Attempt to fix compiling problem.

Noticed by: tinderbox


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163192 10-Oct-2006 bde

The powerpc and sparc64 MD `reboot' commands should never have existed
since they just duplicated the MI `reset' command. Instead of removing
them, make `reboot' an MI alias for `reboot' since this gives a better
way of killing the `r' alias for `reset'. Remove the `registers' command
that was used to kill the alias.

Turn the powerpc and sparc64 MD `halt' command into an MI command.

A copy of sparc64/db_interface.c grew in sun4v just after I found the
extra reboot commands. It has not been changed, and is now not
identical. Duplicated commands come out duplicated in ddb's online
help, but cause large problems when used (e.g., on i386's with 2 halt's
and an hwatch, typing h doesn' give the expected message about an
ambiguous command, but hangs like the halt command or a looping parseri
would).


163021 05-Oct-2006 grehan

Catch up with recent clock modifications:
- include <sys/clock.h> for inittodr prototype
- remove now-conflicting SECDAY definition that is in <sys/clock.h>


162961 02-Oct-2006 phk

remove orphaned sysctl_machdep_adjkerntz()


162958 02-Oct-2006 phk

Second part of a little cleanup in the calendar/timezone/RTC handling.

Split subr_clock.c in two parts (by repo-copy):
subr_clock.c contains generic RTC and calendaric stuff. etc.
subr_rtc.c contains the newbus'ified RTC interface.

Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock}
sysctls and associated variables into subr_clock.c. They are
not machine dependent and we have generic code that relies on being
present so they are not even optional.


162361 16-Sep-2006 rwatson

Add audit hooks for ppc, ia64 system call paths.

Reviewed by: marcel (ia64)
Obtained from: TrustedBSD Project
MFC after: 3 days


161797 01-Sep-2006 marcel

In cpu_set_user_tls(), properly set the thread pointer. It is 0x7000
bytes after the end of the TCB, which is itself 8 bytes.


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.


161588 24-Aug-2006 marcel

Add skeletal support for GDB. In particular gdb_cpu_getreg() needs
implementing to make GDB support usable.


160959 03-Aug-2006 sobomax

Use proper trap code for the EXC_ALI traps. This fixes SIGBUS during
unaligned 64-bits load/stores.

MFC after: 2 weeks


160932 02-Aug-2006 jhb

Don't ignore errors from intr_event_add_handler().

CID: 1516
Found by: Coverity Prevent (tm)


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160764 27-Jul-2006 jhb

Add missing ptrace(2) system-call stops to various syscall()
implementations.

MFC after: 1 week


160714 26-Jul-2006 marcel

o Move the prototype of mem_valid() from ofw_machdep.h to md_var.h.
This avoids that mem.c has to include ofw_machdep.h, including
all OFW related headers.
o Provide a stub for OF_decode_addr(), which is used by low-level
console drivers to obtain a tag and handle given a OFW phandle.
This is different from sparc64, where a fake bus tag needs to be
created explicitly.


160713 26-Jul-2006 marcel

Include needed clock.h.


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


160235 10-Jul-2006 alc

Add synchronization to moea_zero_page() and moea_zero_page_area().

Remove the acquisition and release of Giant from moea_zero_page_idle().

Tested by: grehan@


160068 01-Jul-2006 alc

Eliminate the acquisition and release of Giant from moea_extract_and_hold()
and moea_protect().

Tested by: grehan@ and rink@


159928 25-Jun-2006 alc

Synchronize accesses to the PTEG table.

Add many lock assertions.

Tested by: grehan@


159705 17-Jun-2006 rink

Prevent 'mutex not owned' panic on boot if INVARIANTS is in the kernel. This
makes the GENERIC kernel boot on ppc.

Reviewed by: grehan
Approved by: imp (mentor)
MFC after: 1 week

dCVS: ----------------------------------------------------------------------


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159324 06-Jun-2006 alc

Correct a typo in the previous revision.


159323 06-Jun-2006 alc

Add a stub for pmap_enter_object().


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


158449 11-May-2006 phk

Remove straggling reference to CPU_ macros


157895 20-Apr-2006 imp

Set the rid for any resource obtained from rman_resource_reserve.


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.


154091 07-Jan-2006 grehan

Set the siginfo si_addr field, and also the mysterious 3rd parameter
to old-style signals, to be the DAR register for DSI miss exceptions.
This gives the address of the access rather than the instruction
address. The behaviour is now the same as on i386.

Found by: libsigsegv tests


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153685 23-Dec-2005 grehan

Mark the return address of the call to ast() in the generic trap
handling code so the stack trace unwinders don't start trying to
go into user-space.

Found by trying to create core dumps with a KTR_COMPILE/KTR_GEOM
kernel, which results in a stack_save() call in the ast() coredump
path - this created a panic, and then calling 'trace' in ddb resulted
in the black screen of death after printing out most of the backtrace.


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153488 16-Dec-2005 jhb

GC some unused frame types.

Approved by: grehan


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152304 11-Nov-2005 grehan

Fix compile warning: pmap_bootstrap is now declared extern in pmap.h,
remove redundant declaration.


152233 09-Nov-2005 grehan

No longer needed: replaced by mmu_if.m/pmap_dispatch.c/mmu_oea.c


152228 09-Nov-2005 grehan

Apply r1.103 to correct place.

pmap.c on PowerPC is now a combo of mmu_if.m, pmap_dispatch.c
and mmu_oea.c

(I forgot to delete pmap.c from CVS in last jumbo commit)


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


152180 08-Nov-2005 grehan

Name change from pmap_* to moea_* to fit into the new order of
mmu implementation.

This code handles the 32-bit 'OEA' MMU found on G2/G3/G4 PPC cores.


152179 08-Nov-2005 grehan

Insert a layer of indirection to the pmap code, using a kobj for
the interface. This allows run-time selection of MMU code, based
on CPU-type detection, or tunable-overrides when testing new code.

Pre-requisite for G5 support.

conf/files.powerpc
- remove pmap.c
- add mmu_if.h, mmu_oea.c, pmap_dispatch.c

powerpc/include/mmuvar.h
- definitions for MMU implementations

powerpc/include/pmap.h
- remove pmap_pte_spill declaration
- add pmap_mmu_install declaration
- size the phys_avail array
- pmap_bootstrapped is now global-scope

powerpc/powerpc/machdep.c
- call kobj_machdep_init early in the boot sequence to allow
kobj usage prior to SI_SUB_LOCK
- install the OEA pmap code. This will be moved to CPU-specific
init code in the future.

powerpc/powerpc/mmu_if.m
- Kobj MMU interface definitions

powerpc/powerpc/pmap_dispatch.c
- central dispatch for pmap calls
- contains the global mmu kobj and the routine to locate the
the mmu implementation and init the kobj


152147 07-Nov-2005 grehan

Finally (!?) get to the bottom of the mysterious G3 boot-time panics.
After a number of tests using nop's to change the alignment, it was
confirmed that the mtibat instructions should be cache-aligned.
FreeScale app note AN2540 indicates that the isync before and after
the mtdbat is the right thing to do, but sync/isync isn't required
before the mtibat so it has been removed.

Fix by using a ".balign 32" to pull the code in question to the correct
alignment.

MFC after: 3 days


151891 30-Oct-2005 grehan

Copy SPRG0-3 registers at boot-time and restore when calling into
OpenFirmware. FreeBSD/ppc uses SPRG0 as the per-cpu data area pointer,
and SPRG1-3 as temporary registers during exception handling. There
have been a few instances where OpenFirmware does require these to
be part of it's context, such as cd-booting an eMac.

reported by: many
MFC after: 3 days


151877 30-Oct-2005 grehan

In stack_save, stop when a trap-frame is encountered. This prevents
trying to access user-space stack addresses when a user fault
is encountered, as occurs when GEOM KTR code is handling a page fault
and is using stack_save() to capture a trace for debug purposes.

It may be possible to walk beyond the trap-frame if it is a kernel fault,
as db_backtrace() does, but I don't think that complexity is needed in
this routine.

MFC after: 3 days


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


149958 10-Sep-2005 grehan

Fix boot-time hang/panic on G3 systems when modifying IBAT0 in
pmap_bootstrap by using the sync;isync big hammer to make sure
all prior operations have completed.

Reported by: Nathan Whitehorn <nathan at uchicago edu>
MFC after: 2 days


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


148666 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


148568 30-Jul-2005 grehan

Temporary band-aid to fix hang when a process exec's Altivec instructions.

trap_subr.S: declare a stub for the a-unavailable trap
that does an absolute jump to the vector-assist trap.
This is due to the fact that the vec-unavail trap
doesn't start at a 256-byte boundary, so the trick of
masking the bottom 8 bits of the link register to identify
the interrupt doesn't work, so let the vec-assist
case handle Altivec-disabled for the time being.

Note that this will be fixed in the future with a much
smaller vector code-stub (< 16 bytes) that will allow
use of strange vector offsets that are also present in
4xx processors, and also allow smaller differences in
vector codepaths on the G5.

trap.c: Treat altivec-unavailable/assist process traps as SIGILL.
Not quite correct, since altivec-assist should really be a panic,
but it is fine for the moment due to the above measure.

machdep.c Install the stub code for the altivec-unavailable trap, and
the standard trap code at the altivec-assist.

Reported by: Andreas Tobler <toa at pop agri ch>
MFC after: 3 days


147889 10-Jul-2005 davidxu

Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)


147851 09-Jul-2005 grehan

The nsegs parameter to bus_dmamap_load_mbuf_sg is not required to
be set to 0 on input. This caused a panic in an an MP test version
of the GEM driver from Marius, and from inspection of other PCI
drivers, the same problem would happen there.
Fix by explicitly setting to 0.

Approved by: re


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146792 29-May-2005 marcel

Call cninit_finish() from configure_final().


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.


145343 20-Apr-2005 ps

Don't enter the debugger if KDB_UNATTENDED is set or if
debug.debugger_on_panic=0.

MFC after: 2 weeks


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144807 08-Apr-2005 jhb

Change an instance of md_savecrit to md_saved_msr that I missed.


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more


143784 18-Mar-2005 grehan

Split configure into 3 steps ala sparc64

Obtained from: iedowse, sparc64


143634 15-Mar-2005 grehan

Prepend underscore to bus_dmamap_{unload|sync} in line with
recent busdma changes.


143633 15-Mar-2005 grehan

Include <sys/signalvar.h> for trapsignal prototype.


143234 07-Mar-2005 grehan

Replaced previous hw.physmem extraction with des's mods to
getenv_ulong() - much simpler.

Pointed out by: des


143201 07-Mar-2005 grehan

physmem is a much better indicator for 'real' memory on PPC than Maxmem
since there are often significant holes in the memory map due to the
kernel, loader and OFW data structures not being included: Maxmem is
the highest available, so can be misleading.


143200 07-Mar-2005 grehan

Allow user to undersize memory with hw.physmem loader variable.

Obtained from: i386/machdep.c:getmemsize()


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago


142882 01-Mar-2005 grehan

Catch up with "physical memory" sysctl change.
(MFi386: rev 1.608)


142771 28-Feb-2005 grehan

Catch the case where the idle loop is entered with interrupts disabled,
causing a hard hang.


142764 28-Feb-2005 grehan

- switch pcpu to a struct declaration ala amd64. It may be more efficient to
cache-align this struct, but that's a topic for a far-in-the-future
commit.
- eliminate commented-out reference to a non-existent pcpu field.


142756 28-Feb-2005 grehan

Correctly set kernelname for kern.bootfile sysctl

Noticed by: gad
Code stolen from: sparc64


142416 25-Feb-2005 grehan

Add PVO_FAKE flag to pvo entries for PG_FICTITIOUS mappings, to
avoid trying to reverse-map a device physical address to the
vm_page array and walking into non-existent vm weeds.

found by: Xorg server exiting


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141249 04-Feb-2005 njl

Sort includes a little so that bus.h comes before cpu.h (for device_t).


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


141229 04-Feb-2005 grehan

- recognize 7447A/7448 CPUs (used in miniMacs)
- enable 745x branch caches. Already enabled by OpenFirmware
on Macs, but reduces NetBSD diffs and usable by embedded folk.

Obtained from: NetBSD


141227 04-Feb-2005 grehan

- add wall_cmos_clock and adjkerntz variables, required by msdosfs
- support adjkerntz sysctl to silence NTP, though it's a null
implementation at the moment.


140538 21-Jan-2005 grehan

Fix (accidental?) lock order reversal in pmap_remove. Found when
a process that has mmap'd device mem exits.


140314 15-Jan-2005 scottl

Add bus_dmamap_load_mbuf_sg() to powerpc.


140256 14-Jan-2005 jhb

- Remove some OBE comments regarding cpu_exit(). cpu_exit() is no longer
the last action of kern_exit(). Instead, it is a MD callout to cleanup
per-process state during exit.
- Add notes of concern to Alpha and ia64 about the possible need to drop
fp state in cpu_thread_exit() rather than in cpu_exit() since it is
per-thread state rather than per-process.


139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


139401 29-Dec-2004 grehan

Correctly initialise the 2nd kernel segment, and don't
forget to actually install it in the segment register.
This may fix some of the weird panics seen when kernel VM
is heavily used.


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137906 20-Nov-2004 das

user.h is included only to get pcb.h, so use the latter directly instead.


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64


135172 13-Sep-2004 alc

Lock the kernel pmap in pmap_kenter().

Tested by: gallatin@


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritence that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134571 31-Aug-2004 julian

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134535 30-Aug-2004 alc

- Introduce a lock for synchronizing access to the pvo and pteg tables.
- In pmap_enter(), only the acquisition and release of the page queues
lock needs to check the bootstrap flag.

Tested by: gallatin@


134453 28-Aug-2004 alc

Eliminate unnecessary indirection.


134329 26-Aug-2004 alc

Add pmap locking to many of the functions.

Many thanks to Andrew Gallatin for resolving a powerpc-specific
initialization problem in my original patch.

Tested by: gallatin@


133862 16-Aug-2004 marius

Instead of "OpenFirmware", "openfirmware", etc. use the official spelling
"Open Firmware" from IEEE 1275 and OpenFirmware.org (no pun intended).

Ok'ed by: tmm


133855 16-Aug-2004 ssouhlal

Add /dev/mem and /dev/kmem to powerpc.

Approved by: grehan (mentor)


133521 11-Aug-2004 marius

- Use the rman_get_* functions instead of reaching into struct resource.
- Remove __RMAN_RESORUCE_VISIBLE again. It's no longer required either
because of the above change or because struct rman is no longer hidden.

Reviewed by: grehan
Tested by: cross-compile on i386


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133166 05-Aug-2004 grehan

In pmap_page_protect, clear the vm page's PG_WRITEABLE flag if
downgrading to read-only. Found by triggering the KASSERT in
vm_pageout_flush().


133143 04-Aug-2004 alc

- Push down the acquisition and release of Giant into pmap_enter_quick()
on those architectures without pmap locking.
- Eliminate the acquisition and release of Giant in vm_map_pmap_enter().


132994 02-Aug-2004 grehan

Kernel traps were not being passed to trap_fatal in some
circumstances.

Spotted by: gallatin


132899 30-Jul-2004 alc

- Push down the acquisition and release of Giant into pmap_protect() on
those architectures without pmap locking.
- Eliminate the acquisition and release of Giant from vm_map_protect().

(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)


132836 29-Jul-2004 ssouhlal

Implement MD parts of ptrace.

Approved by: grehan (mentor)


132688 27-Jul-2004 grehan

Make sure icache is sync'd whenever memory is touched. It may
be more optimal to override the BKPT_WRITE macro, but DDB performance
isn't really a goal at this stage...


132683 27-Jul-2004 grehan

Save DAR/DSISR in DDB regsave area when stack overflow detected. It's
hard to work out where the problem was without these.


132681 27-Jul-2004 grehan

Improve boot-time debugging with DDB by extracting the ksym start/end
values from the loader.


132666 26-Jul-2004 alc

Implement the protection check required by the pmap_extract_and_hold()
specification.

Reviewed and tested by: grehan@


132571 23-Jul-2004 grehan

Detect kernel stack excursion into guard pages. Drop into KDB
with a wired stack if this is found.

Mostly obtained from: NetBSD


132570 23-Jul-2004 grehan

Bring KDB stack size into line with thread stack size (4 pages).


132569 23-Jul-2004 grehan

Allow DSI exceptions to invoke DDB.


132562 23-Jul-2004 grehan

The ADDR16 relocations were assuming that non-local symbols had an
addend of 0. This isn't correct, and was quite easy to break by
referring to the address of an element within a structure.

However, fixing this exposed the fact that symbol lookups for
local variables were returning the base of the section they
were contained in. This case is detected by comparing the return
value from elf_lookup() to the relocbase+addend value: if it is
lesser, but greater than relocbase, then relocbase+addend is
taken to be the authoritative value.

bug reported by: gallatin


132520 22-Jul-2004 grehan

Update the callframe structure to leave space for the frame pointer
and saved link register as per the ABI call sequence. Update code
that uses this (fork_trampoline etc) to use the correct genassym'd
offsets.

This fixes the 'invalid LR' message when backtracing kernel
threads in DDB.


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64


132428 20-Jul-2004 grehan

elf_cpu_load_file no longer has an __unused variable. Also, don't
bother syncing the icache for the special case of the kernel (id == 1),
since the loader has already done this.

__unused use reported by: gallatin


132426 20-Jul-2004 grehan

Properly obey PPC context synchronization rules when modifying
the address translation bits of the MSR. This fixes the boot-time
panic reported by Drew Gallatin.


132282 17-Jul-2004 grehan

Resurrect kld support. Support ADDR16_HA/LA relocations, and sync
the icache on module load. Requires "-mlongcall" support, in gcc >= 3.3
but needs a bugfix to support gcc arith builtins.


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(), alpha already has it for years, the function
will be used by ptrace to clear a thread's single step state.


132075 12-Jul-2004 grehan

Rename low-level code ddb -> db. Use KDB instead of DDB.
Fix bug in setup of stack frame where 8 bytes wasn't being
saved for the callee's frame pointer and saved LR.


132074 12-Jul-2004 grehan

Bring into KDB new order.


132073 12-Jul-2004 grehan

- DDB -> KDB, with kdb routines
- ddb -> db for low-level trapcode
- implement makectx. I think it only matters that the stack is setup
correctly.
- bring over ddb_trap_glue and rename to db_trap_glue


132072 12-Jul-2004 grehan

No need for ddb option. Never a need for ipkdb option.


132071 12-Jul-2004 grehan

Catch up with gratuitous ddb -> db renaming


132070 12-Jul-2004 grehan

Bring into line with KDB. Bring in NetBSD updates for backtrace routine,
although it really needs a decent re-work.


132069 12-Jul-2004 grehan

Remove unused NetBSD code. Bring mem r/w routines into here in line
with sparc64, although keep the size deref checks: they are useful
when accessing PCI space where some devices may not implement byte
access.


132011 12-Jul-2004 alc

pmap_remove_pages() must not remove wired mappings. Since
pmap_remove_pages() is an optimization, its implementation is optional.

Discussed with: grehan


131867 09-Jul-2004 grehan

- correctly set the return value for the copyin/out fault buffer to 1
so setfault would return correctly when a page fault was invalid
(e.g. a syscall with a bad parameter).

This caused an endless DSI loop, seen when running sendmail which
does a setlogin() call with a NULL pointer.

- introduce KTR_SYSC tracing. expose the syscallnames[] array to
make the tracing more readable.


131808 08-Jul-2004 grehan

G4 requires isync after 256Mb ibat/dbat update, G3 requires
isync after each bat update. Otherwise, pmap_bootstrap causes
an ISI exception. A fall-out of loader BAT removal.


131698 06-Jul-2004 grehan

- trailing white-space cleanup
- add call to thread_user_enter for P_SA processes before
trap processing ala all other arches


131658 05-Jul-2004 alc

Correct pmap_extract()'s return type. It should be vm_paddr_t, not
vm_offset_t.


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


131401 01-Jul-2004 grehan

Modify loop test when cycling through phys_avail array. It's possible
for an OpenFirmware implementation to have a single memory region
(hello PearPC).


131400 01-Jul-2004 grehan

Catch up with __RMAN_RESOURCE_VISIBLE change


131102 25-Jun-2004 grehan

Catchup to now-required <sys/module.h> for PowerPC


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129416 19-May-2004 grehan

trap_pfault() shouldn't be acquiring Giant. Found to blow up
with MUTEX_PROFILING.

Submitted by: Suleiman Souhlal <refugee@segfaulted.com>


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


128395 18-Apr-2004 alc

MFamd64
Simplify the sf_buf implementation. In short, make it a veneer
over the direct virtual-to-physical mapping.


128103 11-Apr-2004 alc

Remove avail_end. It is not used.


127977 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127618 30-Mar-2004 benno

Replace td2 with td on the assumption that this was a typo. This should at
least unbreak the build.

Pointy hat to: peter
Not tested either by: benno


127582 29-Mar-2004 peter

Finish tidying up a couple of leftovers from the KSTACK_PAGES stuff. Some
files still #included the opt_ file. powerpc hadn't been updated yet.


127333 23-Mar-2004 alc

Add an implementation of uiomove_fromphys() for PowerPC. This
implementation uses the direct virtual-to-physical mapping.

Discussed with: grehan


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126478 02-Mar-2004 grehan

Increase kernel VA from 256Mb to 512Mb by shifting the segment used
for user copyinout down to 12, and keeping segments 13/14 for
kernel VA.

It would be nice to have more available, but segments lower than
this are reserved for either memory or 1:1 mapped device i/o,
and seg 15 is OpenFirmware ROM. Also, the effort to keep OpenFirmware
available for callbacks limits the use of VA-mapped segments.
Fortunately UMA_MD_SMALL_ALLOC takes away a lot of VM pressure.

Obtained from: NetBSD


126474 02-Mar-2004 grehan

Kernel changes for libthr (and probably libpthread).

include/ucontext.h
- remove trapframe and switch over to 'generic' description of machine
state. Include version field to help with future modifications.
Include floating point and altivec state, and hopefully align
correctly

powerpc/copyinout.c
- fill out casuptr() sync primitive, required by kern_umtx.c

powerpc/machdep.c
- shifted proc0/thread0/pcpu setup to before cninit, since
syscons -> make_dev -> devlock requires a valid curthread
- implemented get_mcontext/set_mcontext
- recast sendsig/sigreturn to use get/set_mcontext and new
ucontext struct. floating point now saved
- TODO: save/restore altivec state

powerpc/vm_machdep.c
- implemented cpu_thread_setup/cpu_set_upcall/cpu_set_upcall_kse
- eliminated trailing whitespace

Submitted by: Suleiman Souhlal <refugee@segfaulted.com>, ucontext by grehan


125708 11-Feb-2004 grehan

Interrupt statistics, vmstat -i now works.

Submitted by: Suleiman Souhlal <refugee@segfaulted.com>
Slightly modified by: grehan
Derived from: i386


125705 11-Feb-2004 grehan

Clean up header files, which fixes compile warning.


125702 11-Feb-2004 grehan

- constify devinfo strings to eliminate compile warning
- remove trailing whitespace


125691 11-Feb-2004 grehan

- fix compile warnings
- removed obsolete NetBSD-derived ADB conditionals


125689 11-Feb-2004 grehan

Fix compile warning


125687 11-Feb-2004 grehan

Cleaned up param.h:

- culled long-dead #define's
- segment register defs moved to sr.h
- NPMAPS moved to pmap.h
- KERNBASE moved to vmparam.h
- removed include of <machine/cpu.h> and fixed src files that
relied on this.

Modifying segment register code no longer causes gcc rebuilds :-)


125686 11-Feb-2004 grehan

Invalide pcb's fpu cpu # by setting it to INT_MAX, not NULL.


125682 11-Feb-2004 grehan

Add sysctl hw.uma_mdpages to track how many pages have been allocated
by UMA_MD_SMALL_ALLOC


125617 09-Feb-2004 grehan

Disable branch-target instruction cache on MPC7457 as outlined
in Motorola processor errata.

Submitted by: Suleiman Souhlal <refugee@segfaulted.com>


125615 09-Feb-2004 grehan

Recognize MPC7547 (aka G4+)


125442 04-Feb-2004 grehan

Remove pmap_pvo_allocf zone alloc function. It was a way of
using the direct-mapping of physmem to force PTE data structures
to be physically addressable so the interrupt-time real-mode
DSI trap handler could perform PTE spills. However, the memory
may have been > 256Mb, which would have caused a BAT spill and
double-interrupt.

The new trap code no longer handles PTE spills, so the requirement
that these pages be direct-mapped no longer applies. The irony is
UMA_MD_SMALL_ALLOC will return direct mappings for these structs :-)


125441 04-Feb-2004 grehan

Major overhaul of common trap code
- remove unused 601 and tlb exception code
- remove interrupt-time PTE spill code. The pmap code
will now take care of pinning kernel PTEs, and there
are no longer issues about physical mapping of PTE
data structures
- All segment registers are switched on kernel entry/exit,
allowing the kernel to have more virtual space and for
user virtual space to extend to 4G.
- The temporary register save area has been shifted from
unused exception vector space to the per-cpu data area.
This allows interrupts to be delivered to multiple CPUs
- ISI traps no longer spill to BAT tables. It is assumed
that all of kernel instruction memory is pinned.
- shift from 'ldmw/stmw' instructions to individual register
loads/stores when saving context. All PPC manuals indicate
this should be much faster.
- use '%r' for register names throughout.

TODO: need to test if DSI traps were the result of kernel stack
guard-page hits.

Reworked from: NetBSD


125440 04-Feb-2004 grehan

- remove unused trap definitions
- ISI traps are now handled by the generic trap routine
- direct diagnostic traps to DDB if defined
- remove unused asngen pcpu init


125439 04-Feb-2004 grehan

- Lots more symbols required by the new trap_subr code


125438 04-Feb-2004 grehan

- Add definition for GET_CPUINFO, required by new trap_subr code
- garbage-collect unused defs


125434 04-Feb-2004 grehan

Allow child devices to set the OpenFirmware device node ivar


125185 29-Jan-2004 grehan

When UMA_MD_SMALL_ALLOC is defined, pmap_kextract will be called
for direct-mapped addresses. Assume that any address less than KVA
is one of these and return it. Also assert that an address is KVA
does have a valid mapping - callers of pmap_kextract don't check
the return value, since they assume that they have a valid virtual
address.


125184 29-Jan-2004 grehan

Implement UMA_MD_SMALL_ALLOC, since the BAT registers allow direct
addressing of memory. Makes a substantial improvement for apps that
stress the limited amount of KVM on PPC (e.g. untarring the ports tree).

uma_machdep.c stolen from amd64/ia64.


124772 21-Jan-2004 grehan

- Catch up with panic __LINE__/__FILE__ changes by moving panic calls
out of asm.
- remove some long-dead code from machdep.c


124771 21-Jan-2004 grehan

A syscons implementation using the 8-bit framebuffer set up by
OpenFirmware. Not at all optimized, but provides a PC-style
user-experience.

Tested on revA imac, B&W G3, 2k iBook, and G4 eMac.


124469 13-Jan-2004 grehan

Make the OpenPic driver bus-independent, with attachments for
the MacIO chip and PSIM's IOBus. Bus-specific drivers should
use the identify method to attach themselves to nexus so
interrupt can be allocated before the h/w is probed. The
'early attach' routine in openpic is used for this stage
of boot. When h/w is probed, the openpic can be attached
properly. It will enable interrupts allocated prior to
this.


124468 13-Jan-2004 grehan

Remove hard-coded knowledge of specific OFW devices. Use bus_generic_probe
and add_child entry point to allow devices to use the identify
method to add themselves if need be (e.g. openpic, syscons).
Export interrupt-controller-add routine for extern int cntlr drivers.
Eliminate recursive OFW device-tree walk and only iterate the
top-level ala sparc64. Allow child devices to set the device
type with write_ivars.

Step 1 of many in removing the hard-dependency on OpenFirmware.


124092 03-Jan-2004 davidxu

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123560 16-Dec-2003 grehan

Disable the per-vm_page PTE cache. This was not being invalidated
correctly, resulting in the dreaded "vm_pageout_flush: partially
invalid page" panic. The caching issue will be revisited in the
future, but opt for safety over performance in the meantime.

Tested by: gallatin


123370 10-Dec-2003 grehan

- removed obsolete ppc_exit/ppc_boot functions
- OpenFirmware returns overlapping memory regions. Use a simple
brute force algorithm to merge these into non-overlapping
regions. This fixes bugs in reporting of available memory
and also prevents pages from being added twice in the VM system.


123357 09-Dec-2003 gallatin

Remove redundant declaration of ddb_trap


123354 09-Dec-2003 gallatin

pmap_query_bit() should return false if the bit is not set.

Reviewed by: grehan


123353 09-Dec-2003 gallatin

Use the "shut-down" and "reset-all" Forth procedures to halt and
reboot, as calling OF_exit() just hangs a mac.

FreeBSD on my G4 800Mhz mac behaves identically to OSX for halt
and reboot now.

Reviewed by: grehan (who also supplied the concept and sample code)


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122841 17-Nov-2003 peter

Widen the enable/disable helper function's argument in line with the
ithread_create() changes etc. This should be mostly a NOP.


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().


121237 19-Oct-2003 peter

Add a stub cpu_idle() function for sparc64, alpha, powerpc. This is a
MI declared function so it should be everywhere.


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120460 26-Sep-2003 grehan

DELAY must be a routine, not a macro definition.


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120336 22-Sep-2003 grehan

Soften assert in pmap_remove_all.
Introduct pmap_extract_and_hold.

Stolen from: sparc64


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space. On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118893 14-Aug-2003 grehan

Update powerpc to use the (old thread,new thread) calling convention
for cpu_throw() and cpu_switch().


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118365 02-Aug-2003 alc

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() in pmap_mapdev().
See revision 1.140 of kern/sys_pipe.c for a detailed rationale.

Submitted by: tegge


118244 31-Jul-2003 bmilekic

Make sure that when the PV ENTRY zone is created in pmap, that it's
created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In
the i386 case in particular, the pmap code would hook a special
page allocation routine that allocated from kernel_map and not kmem_map,
and so when/if the pageout daemon drained the zones, it could actually
push out slabs from the PV ENTRY zone but call UMA's default page_free,
which resulted in pages allocated from kernel_map being freed to
kmem_map; bad. kmem_free() ignores the return value of the
vm_map_delete and just returns. I'm not sure what the exact
repercussions could be, but it doesn't look good.

In the PAE case on i386, we also set-up a zone in pmap, so be
conservative for now and make that zone also ZONE_NOFREE and
ZONE_VM. Do this for the pmap zones for the other archs too,
although in some cases it may not be entirely necessarily. We'd
rather be safe than sorry at this point.

Perhaps all UMA_ZONE_VM zones should by default be also
UMA_ZONE_NOFREE?

May fix some of silby's crashes on the PV ENTRY zone.


118239 31-Jul-2003 peter

Deal with 'options KSTACK_PAGES' being a global option.


118100 27-Jul-2003 alc

Make pmap_pvo_allocf() callable without Giant.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


117600 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo.

Suggested by: jhb


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


116958 28-Jun-2003 davidxu

Add a machine depended function thread_siginfo, SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian


116907 27-Jun-2003 scottl

Do the first and mostly mechanical step of adding mutex support to the
bus_dma async callback scheme. Note that sparc64 does not seem to do
async callbacks. Note that ia64 callbacks might not be MPSAFE at the
moment. Note that powerpc doesn't seem to do async callbacks due to
the implementation being incomplete.

Reviewed by: mostly silence on arch@


116804 25-Jun-2003 grehan

Remove unused bootpath[] variable. It conflicted with a declaration
in the sunlabel utility, causing build problems.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115858 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115614 01-Jun-2003 phk

Remove #include <sys/disklabel.h>


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114727 05-May-2003 obrien

Things run thru the C preprocessor must use C-style comments.


114374 01-May-2003 peter

Back out last commits. The elf64/elf32 kernel name thing was more pain
than it was worth.


114342 30-Apr-2003 peter

Fix transcription error. Use == NULL, not != NULL. Fortunately this
was harmless.


114340 30-Apr-2003 peter

Look for an elf32 kernel (powerpc) and elf64 kernel (sparc64) as well
as a plain "elf kernel".


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by: jake
Reviewed by: bde (months ago)


113703 19-Apr-2003 grehan

Fix compile warning - proc should have been thread.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.


113255 08-Apr-2003 des

Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


113038 03-Apr-2003 obrien

Use __FBSDID rather than rcsid[].


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared accross
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.


112429 20-Mar-2003 grehan

Enable the FPU on first use per-thread and save state across context
switches. Not as lazy as it could be. Changing FPU state with sigcontext
still TODO.

fpu.c - convert some asm to inline C, and macroize fpu loads/stores
swtch.S - call out to save/restore fpu routines
trap.c - always call enable_fpu, since this shouldn't be called once
the FPU has been enabled for a thread
genassym.c - define for pcb fpu flag


112402 19-Mar-2003 grehan

Add machine check handler. While generally useful, it's required when
issuing PCI config cycles on MPC106-based PowerMacs, which cause machine
checks when accessing non-existent/empty slots.


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111655 28-Feb-2003 benno

These files are no longer used. They have been replaced with similarly named
.S files.


111551 26-Feb-2003 grehan

Register typo and incorrect 32-bit constant load in previous commit.
Resulted in AST delivery not working.


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


111156 20-Feb-2003 grehan

Adjust IRQ count for psim's OpenPIC model - it seems to be
off by 1.


111155 20-Feb-2003 grehan

Catch up to latest KSE changes


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110786 13-Feb-2003 grehan

Missed odd address test when transcribing the Alpha version.
This fixes the checksum problems seen with telnet.


110389 05-Feb-2003 benno

GC an unused variable.


110388 05-Feb-2003 benno

Export the ns_per_tick variable through md_var.h rather than by declaring
it extern in cpu.c.


110387 05-Feb-2003 benno

- Use cpu_setup() instead of identifycpu().
- Remove identifycpu().


110386 05-Feb-2003 benno

Add cpu.c. This contains one exported function, cpu_setup(), which handles
setup of and printing information about cpus.

Obtained from: NetBSD (parts)


110381 05-Feb-2003 benno

Replace the inline asm in delay() with a while loop. This may not be as
efficient but it appears to actually work. Some investigation may be
required.


110380 05-Feb-2003 benno

- Rename the "powerpc" timecounter to the "decrementer" timecounter.
- Initialise it earlier.


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is create by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110180 01-Feb-2003 benno

- Introduce a flags value into the interrupt handler structure.
- Copy the flags passed to inthand_add into the flags value.
- If the interrupt is INTR_FAST, re-enable the irq after running the handler.


110172 01-Feb-2003 grehan

- add pmap_pagedaemon_waken variable
- remove dead code and fix warnings in pmap_zero_page/zero_page_area
- implement
pmap_clear_reference
pmap_ts_referenced
pmap_page_exists_quick
pmap_remove_all
- align pmap_qenter/qremove closer with i386 code
- fix vm_page locking in pmap_new_thread (from benno)
- add new parameter to pmap_clear_bit to return original
pte value

Approved by: benno


110167 01-Feb-2003 benno

Make nirq mean 'number of irqs' and not 'last irq'.


109935 27-Jan-2003 benno

Back the previous commit out. It didn't actually fix the problem I was
seeing and the memory barrier isn't needed with the bridges we're using.

Fix the function style however.


109920 27-Jan-2003 benno

Back out some changes that snuck in with the last commit.

Pointy hat to: benno


109919 27-Jan-2003 benno

Flesh out bus_dmamap_sync.


109918 27-Jan-2003 benno

Use td->td_sticks, not td->td_kse->ke_sticks.

Forgotten by: davidxu


109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


109677 22-Jan-2003 grehan

Remove BAT invalidation. This is done later in the boot sequence,
so isn't required here, and seems to cause problems when booting
from disk.

Approved by: benno


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


109156 13-Jan-2003 benno

Correct an off-by-one error in the calculation of the number of interrupt
resources we're managing.


108945 08-Jan-2003 grehan

Add page queues locking to vunmapbuf().

Obtained from: sparc64
Approved by: benno


108944 08-Jan-2003 grehan

Sync the i-cache after copying down the interrupt code

Approved by: benno


108943 08-Jan-2003 grehan

Be more conservative about re-enabling interrupts during trap processing
until atomic issues are fully sorted.

Approved by: benno


108942 08-Jan-2003 grehan

Fix incorrect error returns and sign-extension.

Approved by: benno


108941 08-Jan-2003 grehan

Fetch the initial time from the rtc OpenFirmware node. This is a short-term
measure until the rtc h/w driver is written, and it's a lot better
than having "jan 1 1970" on filesys times.

Approved by: benno


108940 08-Jan-2003 grehan

Remove obsolete NFS_ROOT conditional.

Approved by: benno


108939 08-Jan-2003 grehan

Implement bus_dmamap_load_mbuf/bus_dmamap_load_uio.
Tested load_mbuf with GEM ethernet driver.

Submitted by: Hiten Pandya <hiten@unixdaemons.com> (modified by grehan)


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106697 09-Nov-2002 des

Print real / avail memory in megabytes rather than kilobytes.


106605 07-Nov-2002 tmm

Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105611 21-Oct-2002 grehan

Add the USER_SR segment register to pcb state. Initialize correctly,
and save/restore during a context switch.

The USER_SR could be overwritten when the current thread was switched
out with a faulting copyin/copyout.

Approved by: Benno


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


104435 04-Oct-2002 grehan

Clean up ddb warnings/errors and enable in GENERIC

Approved by: benno
Motivated by: gallatin


104434 04-Oct-2002 grehan

- fix zero-sized stack alloc from previous commit. a default is now
selected ala sparc64
- KSEIII routines implemented (taken from i386/sparc64)

Approved by: Benno


104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103609 19-Sep-2002 grehan

- bring vm_mapbuf/unmapbuf in line with other archs
- update for recent KSE changes

Approved by: benno


103608 19-Sep-2002 grehan

- make sure recoverable interrupts are re-enabled in the trap handler
- turn on ast() loop to enable signal delivery

Approved by: benno


103607 19-Sep-2002 grehan

- worked around 32-bit big-endian syscall return value problem
- syscall register spills weren't copied in correctly
- removed VM_PROT_READ from the fault type on write protect faults

Approved by: benno


103606 19-Sep-2002 grehan

Add sync before isync for G4 cpus

Obtained from: NetBSD
Approved by: benno


103605 19-Sep-2002 grehan

- use symbol for user-context offset
- fix szsigcode size declaration

Approved by: benno


103604 19-Sep-2002 grehan

- use BAT registers to map device space and physical memory
- remove test in pmap_activate that prevented vmspace sharing (v/rfork)
- always sync icache in pmap_enter until problems are sorted
- fix incorrect use of regions in pmap_kenter
- bring in pmap_release from NetBSD
- fix overwrite of bootstrap flag in pmap_pvo_enter

Approved by: benno


103603 19-Sep-2002 grehan

- psim device support
- comment out re-enabling of interrupts until problems are sorted

Approved by: benno


103602 19-Sep-2002 grehan

Clear on-demand BAT entries to properly restore OpenFirmware's
address space

Approved by: benno


103601 19-Sep-2002 grehan

psim device support

Approved by: benno


103600 19-Sep-2002 grehan

- implemented sendsig/sigreturn
- sysctl for cacheline size, required by libc/rtld
- init'd more exception vectors
- fixed problem with register overwrite in exec_setregs
- removed redundant NetBSD code

Approved by: benno


103599 19-Sep-2002 grehan

- moved intrcnt/intrnames to locore.s to fix sysctl -a panic

Approved by: benno


103598 19-Sep-2002 grehan

- rationalised includes
- added sigframe offset

Approved by: benno


103597 19-Sep-2002 grehan

- removed unnecessary includes
- converted inline asm to C for int enable
- shifted clearing of 'cold' to end of routine

Approved by: benno


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103049 07-Sep-2002 peter

Zap the implementations of the i386-aout specific cpu_coredump function.
Most of the non-i386 platforms had rather broken implementations anyway.


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102666 31-Aug-2002 peter

Take a shot at fixing up a whole stack of style and other embarresing
unforced errors that Bruce identified. I have not yet addressed all of
his concerns.


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


101941 15-Aug-2002 rwatson

In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101346 05-Aug-2002 alc

o Don't set PG_MAPPED or PG_WRITEABLE when a page is mapped
using pmap_kenter() or pmap_qenter().
o Use VM_ALLOC_WIRED in pmap_new_thread().


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100319 18-Jul-2002 benno

Remove the statically allocated array that holds OpenFirmware memory mappings
during pmap_bootstrap. Instead, temporarily help ourselves to some memory
from phys_avail since we won't need it post-boostrap.


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99731 10-Jul-2002 benno

Add setjmp (needed for DDB).


99730 10-Jul-2002 benno

Add DDB support.


99729 10-Jul-2002 benno

- Make sure we don't trample our metadata pointer in our initial bootstrap.
- Load metadata parameters.


99725 10-Jul-2002 benno

Remove some diagnostic code that snuck in.


99723 10-Jul-2002 benno

Remove some unused includes.


99666 09-Jul-2002 benno

Add an implementation for pmap_zero_page_area.


99665 09-Jul-2002 benno

Add the OF_getetheraddr function required by if_gem.


99664 09-Jul-2002 benno

Tidy up trap vector and external interrupt setup.


99659 09-Jul-2002 benno

Changes for KSE3.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99657 09-Jul-2002 benno

1) Add busdma machdep code.
2) Add bus_pio.h and bus_memio.h (which do nothing).

Submitted by: Peter Grehan <peterg@ptree32.com.au> (1)


99654 09-Jul-2002 benno

Driver for OpenPIC compatible interrupt controllers.
It's fairly PowerMac specific at the moment, but that should be fixable.


99652 09-Jul-2002 benno

- Add the "compatible" property to the list that we keep in ivars.
- Add interrupt alloc/setup/teardown/dealloc support, via whichever PIC
OpenFirmware gives us.


99651 09-Jul-2002 benno

Add interrupt handling support code.

I've tried to make this fairly platform-independant as some PowerPC platforms
may not have openpic-style interrupt controllers. This may not have the best
performance but it works for now.


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99557 07-Jul-2002 peter

Update for post-kse3 pmap kthread allocation changes


99040 29-Jun-2002 benno

in_cksum et al.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99038 29-Jun-2002 benno

Add pmap_mapdev and pmap_unmapdev.


99037 29-Jun-2002 benno

- Initialise battable to cover I/O spaces.
- Statically size the bpvo entries to avoid conflicts between bpvo allocation
and the vm allocator.
- Shift pmap_init2 code into pmap_init.
- Add UMA_ZONE_VM flag to uma_zcreate.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99036 29-Jun-2002 benno

To quote Peter:

The case in cpu_switch() where there isn't a higher priority thread
(choosethread() == curthread) uses r4 as the PCB context pointer. However, the
use of r4 after the label L2 is incorrect, since it was probably trashed by
the call to choosethread, and in any case was set up to curthread at the start
of the routine.

This condition will occur when an interrupt thread schedules a netisr, which
is a lower priority thread.

Another (probably unnecessary) difference is that I was paranoid about
register trashing, so I decided to save r2 and r13 as well.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99035 29-Jun-2002 benno

mempcy/bcopy handles overlapping copies so make ovbcopy call it.


99034 29-Jun-2002 benno

Add BOOTP_NFSROOT support code.


99033 29-Jun-2002 benno

- Use tmpstk exclusively in the init path.
- Remove redundant code.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99032 29-Jun-2002 benno

Many fixes to low-level trap and interrupt handling:

- Tidy up clock code. Don't repeatedly call hardclock().
- Remove intrnames, decrnest and intrcnt from locore.s
- Coalesce all trap handling into a single stub that then calls a dispatch
function.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


99030 29-Jun-2002 benno

Convert this from mostly inline assembler to mostly C.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha were we
skip the actual syscall invocation if error != 0. This fixes a bug
where if we the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


97399 28-May-2002 benno

The stack is not at the top of the user struct.


97398 28-May-2002 benno

Remove an assertion as to whether the current thread already had the FPU or
not. It may be desirable to put something similar back, but it's getting in
the way in it's current form.


97397 28-May-2002 benno

- Move macros that represent where syscall args are kept in a trapframe from
trap.c to frame.h
- Use the macros in vm_machdep.c:cpu_fork() to set up the trap frame of the
new thread.


97395 28-May-2002 benno

Remove the old prototype for kcopy. It's in cpu.h now.


97385 28-May-2002 benno

Implement pmap_copy and pmap_copy_page.


97384 28-May-2002 benno

Move the kcopy() function from trap.c to machdep.c. Add a prototype.


97347 27-May-2002 benno

Print srr1 in printtrap()

Submitted by: Peter Grehan <peterg@ptree32.com.au>


97346 27-May-2002 benno

Get the correct memory regions from OpenFirmware. We were getting the
"available" ranges, not the "physical" ranges. Clean up some of the
bootstrap code in the process.

Submitted by: Peter Grehan <peterg@ptree32.com.au>


97342 27-May-2002 benno

Use correct types in [sf]uword32.


97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


96938 19-May-2002 benno

Make this more FreeBSD-ish.

Requested by: jhb


96906 19-May-2002 benno

- Do a quick style pass.
- Correct the implementation of fix_unaligned to use a thread, not a proc.
- GC some #if 0'd stuff.


96773 17-May-2002 benno

- Rename the _C_LABEL macro to CNAME.
- Rename the _ASM_LABEL macro to ASMNAME.
- Add the HIDENAME macro which is used in libc's syscall stuff.


96771 17-May-2002 benno

Fix commenting around NetBSD version string.


96499 13-May-2002 benno

FPU support.

Obtained from: NetBSD (portions)


96452 12-May-2002 benno

More locking fixes.


96443 12-May-2002 benno

Do the correct locking on processes for DSI and ISI traps.

Copied from: sparc64


96353 10-May-2002 benno

Implement the following functions:
- pmap_addr_hint
- pmap_change_wiring
- pmap_extract
- pmap_is_modified


96352 10-May-2002 benno

Install the system call trap handler.


96334 10-May-2002 benno

Improve our detection of an attempted duplicate entry. We may be trying to
change the page protection bits.


96333 10-May-2002 benno

Remove a debugging printf that escaped.


96255 09-May-2002 benno

Update to newer trap code from NetBSD.

Obtained from: NetBSD


96253 09-May-2002 benno

Add an assertion that we have a current pmap set before we try and return.


96252 09-May-2002 benno

The per-cpu curpmap is now set by pmap_activate. We don't need to do it here
anymore.


96251 09-May-2002 benno

- Add a prototype for the setfault() function.
- Remove some stray printf()s.


96250 09-May-2002 benno

1. Better track the executable status of mappings.
2. Set a pcpu variable to the real address of the active pmap (used when
exiting from traps.

Obtained from: NetBSD (1)


95814 30-Apr-2002 phk

Don't export timecounter structures under debug. with sysctl, they
contain no truly interesting data anymore.


95719 29-Apr-2002 benno

Commit of stuff that's been sitting in my tree for a while.

Highlights include:
- New low-level trap code from NetBSD. The high level code still needs a lot
of work.
- Fixes for some pmap handling in thread switching.
- The kernel will now get to attempting to jump into init in user mode. There
are some pmap/trap issues which prevent it from actually getting there though.

Obtained from: NetBSD (parts)


95714 29-Apr-2002 benno

- Add back calls to setfault that were removed when these functions were moved.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95564 27-Apr-2002 alc

MFi386 1.222: Remove vm_map_growstack() and acquisition and release of Giant
around vm_fault() in trap_pfault().


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95121 20-Apr-2002 benno

Replace inline asm with it's inline function wrapper.


94839 16-Apr-2002 benno

Correct a comment.


94838 16-Apr-2002 benno

Implement the following functions:
- pmap_kextract
- pmap_object_init_pt
- pmap_protect
- pmap_remove_pages

I'm pretty sure pmap_remove_pages is at least somewhat bogus.


94837 16-Apr-2002 benno

Remove some dead code.


94836 16-Apr-2002 benno

Use mtsrin() instead of inline asm.


94835 16-Apr-2002 benno

Change the value of PMAP_BOOTSTRAP so we don't stomp on the PTE index value.


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94755 15-Apr-2002 benno

Add a nexus device.

Copied from: sparc64


94753 15-Apr-2002 benno

Turn some CTR's into CTR0's.


94275 09-Apr-2002 phk

GC various bits and pieces of USERCONFIG from all over the place.


94151 07-Apr-2002 phk

GC the "dumplo" variable, which is no longer used.

A lot of sys/*/*/machdep.c seems not to be.


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93452 30-Mar-2002 alc

Use the MI vm_map_growstack() instead of the MD grow_stack() in trap(). Remove
the MD grow_stack().


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


92916 21-Mar-2002 benno

Collect all functions for copying to and from userspace into the one file.

This allows me to reimplement [sf]u{byte,word} as separate functions and not
as calls to copy{in,out}.


92847 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.


92842 20-Mar-2002 alfred

Remove __P.

Reveiwed by: benno


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removse td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92757 20-Mar-2002 benno

Increment pmap_pvo_count in the right place.


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92521 17-Mar-2002 benno

Changes and fixes in preparation for UMA:

- Bootstrap pvo entries are now allocated by stealing pages.
- Just return if we're pmap_enter'ing a mapping that's already there. Don't
remove it and re-enter it.


92520 17-Mar-2002 benno

Lowercase all of the trap names.


92519 17-Mar-2002 benno

Clean up and fix up copyin and copyout.


92067 11-Mar-2002 benno

Correct a typo. (* that should've been &)


91803 07-Mar-2002 benno

Install the DSI and ISI trap handlers and their appropriate locations.


91802 07-Mar-2002 benno

Copy the "implementation" of pmap_prefault from sparc64.


91794 07-Mar-2002 benno

Move tunable initialisation so it can get access to physmem.


91793 07-Mar-2002 benno

Calculate physmem.


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but
misviewed Bruce's patch.

Requested by: bde


91486 28-Feb-2002 benno

cpu_switch now works, for kthreads at least.


91485 28-Feb-2002 benno

Various cleanups.


91484 28-Feb-2002 benno

- Prevent the decrementer interrupt handler from nesting.
- Catch some more cases of PSL_EE and PSL_RI getting out of sync.


91483 28-Feb-2002 benno

- Modify pmap_activate so it only marks the pmap as active.
- Add a pmap_deactivate function.


91477 28-Feb-2002 benno

GC an unused variable in cpu_fork().


91475 28-Feb-2002 arr

- Fix panic() message and a couple style nits that snuck in from the
recent diagnostics commit (rev. 1.84).


91467 28-Feb-2002 benno

Make fork work, at least for kthreads. Switching still has some issues.


91465 28-Feb-2002 benno

- Rearrange the sequence of events in powerpc_init() somewhat.
- Catch another instance of PSL_EE being cleared without PSL_RI.


91456 28-Feb-2002 benno

Implement the following functions:
- pmap_remove
- pmap_kremove
- pmap_qremove


91455 28-Feb-2002 benno

Remove most of the usage of critical_enter/exit.

I put these in to match the use of spl*() in the NetBSD code I was basing this
on, but it appears to cause problems.

I'm doing this in a separate commit so as to be able to refer back if locking
becomes an issue at a later stage.


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91131 23-Feb-2002 benno

Don't call critical_enter()/critical_exit() around calls to pmap_pvo_enter()
as it does it's own handling of critical sections.


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


90895 19-Feb-2002 julian

Add change to teh PPC to keep it in step with i386 and MI code

Pointy hat this direction please...


90643 14-Feb-2002 benno

Complete rework of the PowerPC pmap and a number of other bits in the early
boot sequence.

The new pmap.c is based on NetBSD's newer pmap.c (for the mpc6xx processors)
which is 70% faster than the older code that the original pmap.c was based
on. It has also been based on the framework established by jake's initial
sparc64 pmap.c.

There is no change to how far the kernel gets (it makes it to the mountroot
prompt in psim) but the new pmap code is a lot cleaner.

Obtained from: NetBSD (pmap code)


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90344 07-Feb-2002 phk

GC the PC_SWITCH* symbols which are not used in assembly anymore.


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up the ia64 stub, since there was no stub before.


89919 28-Jan-2002 gallatin

Simple fixes to get the powerpc kernel compiling again.

Reviewed by: mp


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


86311 13-Nov-2001 mp

Don't enable FP in the kernel. It is not needed when -msoft-float is used.

Reminded by: benno


86067 05-Nov-2001 mp

Clean up the trap handling code and make it consistent with the other platforms.

Submitted by: jhb


86066 05-Nov-2001 mp

Add enable_fpu/save_fpu for handling the floating point registers in the PCB.

Obtained from: NetBSD


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85294 21-Oct-2001 des

[partially forced commit due to pilot error in earlier commit attempt]

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85201 19-Oct-2001 mp

Fix includes based on recent changes to lock.h, mutex.h and ktr.h.


84977 15-Oct-2001 benno

Flesh out cpu_fork() and cpu_set_fork_handler(). This is a work in progress.


84976 15-Oct-2001 benno

- Correct the type of the argument to delay() so as to not conflict with
sys/boot/common/bootstrap.h.
- Add a prototype for fork_trampoline().


84946 15-Oct-2001 mp

Fix typo.


84945 15-Oct-2001 mp

Save WIP. Partial rewrite of cpu_switch() and savectx(). This makes it closer
to working but still needs some work to properly switch the full context
(such as saving the fpu registers, switch stacks, etc.). Also, remove some
dead code that was mixed in.


84921 14-Oct-2001 benno

Implement pmap_mapdev.


84856 12-Oct-2001 mp

Modify a virtual address check to allow use of the openfirmware callback
used by the PowerPC simulator (PSIM).


84855 12-Oct-2001 mp

Add standard calls to device_add_child() and root_bus_configure().


84643 08-Oct-2001 mp

Add a call to init_param() to initialize some necessary variables.


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84381 02-Oct-2001 mjacob

Fix problem where a user buffer outside of the area being tested
will be corrupted.

PR: 29194
Obtained from: Tor.Egge@fast.no
MFC after: 2 weeks


83870 24-Sep-2001 mp

Catch up to recent removal of curpcb from globals.h.


83730 20-Sep-2001 mp

Add missing include file.


83729 20-Sep-2001 mp

Don't include NFS headers.

Reported by: dfr


83691 20-Sep-2001 peter

Replicate a change from alpha/genassym.c to other arches. This should
fix nfs-related build breakage.


83683 20-Sep-2001 mp

Use BATL/BATU macros instead of hardcoded hex constants.


83682 20-Sep-2001 mp

Update PowerPC MD code to compile and do initial bootstrap based on
recent changes (KSE and VM requiring physmem to be setup).

Reviewed by: benno, jhb, julian


83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83223 08-Sep-2001 peter

Missing part of dillon's coredump commit. cpu_coredump() was still
passing IO_NODELOCKED to vn_rdwr(), this would cause operations on the
unlocked core vnode and softupdates nastiness if an a.out binary cored.


82939 04-Sep-2001 peter

Zap #if 0'ed map init code that got moved to the MI area.
Convert the powerpc tree to use the common code.


82938 04-Sep-2001 peter

Nuke #if 0'ed "setredzone()" stub. We never used it, and probably
never will. I've implemented an optional redzone as part of the KSE
upage breakup.


82708 01-Sep-2001 jhb

Axe stale mp_fixme().


82634 31-Aug-2001 peter

Similar to changes on i386/alpha/etc pmap.c; converge on a similar
look/feel on pmap_new_proc() with some cosmetic style changes.


82025 21-Aug-2001 peter

Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this
stuff.


81726 15-Aug-2001 jhb

The 'astpending' variable is already declared in trap.c (and unused in
FreeBSD besides).


81724 15-Aug-2001 jhb

FreeBSD doesn't use a want_resched variable. Instead, the PS_NEEDRESCHED
p_sflag is managed in a MI fashion.


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


81138 04-Aug-2001 jhb

Axe unused and invalid GD_ASTPENDING symbol.


80431 27-Jul-2001 peter

Make PMAP_SHPGPERPROC tunable. One shouldn't need to recompile a kernel
for this, since it is easy to run into with large systems with lots of
shared mmap space.

Obtained from: yahoo


80399 26-Jul-2001 bmilekic

- Do not handle the per-CPU containers in mbuf code as though the cpuids
were indices in a dense array. The cpuids are a sparse set and treat
them as such, setting up containers only for CPUs activated during
mb_init().

- Fix netstat(1) and systat(1) to treat the per-CPU stats area as a sparse
map, in accordance with the above.

This allows us to properly boot with certain CPUs disactivated. However, if
we later decide to re-activate said CPUs, we will barf until we decide to
implement CPU spinon/spinoff callback hooks to allow for said CPUs' per-CPU
containers to get configured on their activation.

Reported by: mjacob
Partially (sys/ diffs) Submitted by: mjacob


79265 05-Jul-2001 dillon

Move vm_page_zero_idle() from machine-dependant sections to a
machine-independant source file, vm/vm_zeroidle.c. It was exactly the
same for all platforms and updating them all was getting annoying.


79263 04-Jul-2001 dillon

Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc).
Also removed some spl's and added some VM mutexes, but they are not actually
used yet, so this commit does not really make any operational changes
to the system.

vm_page.c relates to vm_page_t manipulation, including high level deactivation,
activation, etc... vm_pageq.c relates to finding free pages and aquiring
exclusive access to a page queue (exclusivity part not yet implemented).
And the world still builds... :-)


79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


79123 03-Jul-2001 jhb

Allow Giant to be recursed when a process terminates.


79038 01-Jul-2001 benno

Don't include machine/autoconf.h for now. It's not used and is breaking the
build.


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78925 28-Jun-2001 benno

Put back the two semicolons I accidentally lost while reformatting this to
bring it closer to style(9).

Spotted by: Mark Peek <mark@whistle.com>


78883 27-Jun-2001 benno

Code for dealing with external interrupts.

Obtained from: NetBSD


78880 27-Jun-2001 benno

Fix comment breakage.


78467 19-Jun-2001 benno

More verbose version of identifycpu() which also contains many more CPU
versions/revisions.

Modified from the original patch to mark G3 and G4 processors as such.

Submitted by: Jeff Schottmiller <jeff@neoscale.com>


78342 16-Jun-2001 benno

This commit (along with one pending in sys/dev/ofw and one in sys/conf) give
us our first minimal glimpse of PowerPC support.

With this code we can get to the "mountroot>" prompt on my Apple iMac. We
can't get any further due to lack of clock and interrupt handling, among other
things. This does however mean that pmap and VM are initialising.

We're fairly dependant on OpenFirmware at this point, but I hope to add
support for other classes of firmware at a later stage.

Reviewed by: obrien, dfr


77957 10-Jun-2001 benno

Bring in NetBSD code used in the PowerPC port.

Reviewed by: obrien, dfr
Obtained from: NetBSD


77442 29-May-2001 jhb

GC #if 0'd calls to releasing and acquiring the old style giant kernel
lock.


77031 23-May-2001 ru

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


76932 21-May-2001 gallatin

catch these files up to their i386 neighbors to make alpha boot
prior to the vm_mtx


76659 16-May-2001 jhb

Lock the procfs functions for doing a single step and reading/writing
registers better. Hold sched_lock not only for checking the flag but
also while performing the actual operation to ensure the process doesn't
get swapped out by another CPU while we the operation is being performed.


76442 10-May-2001 jhb

Trim lots of stuff that is now in MI code along with MD alpha code.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


76056 26-Apr-2001 jhb

Initialize p_md.md_kernnest to 1 for newly fork'd processes since they
start off in the kernel.


75925 24-Apr-2001 jhb

Add a new field 'md_kernnest' to the alpha machine dependent process
structure. This field keeps track of how many levels deep we are nested
into the kernel. The nesting level is bumped at the start of a trap,
interrupt, syscall, or exception and is decremented on return. This is
used to detect the case when the kernel is returning back to a kernel
context in exception_return(). If we are returning to the kernel we need
to update the globaldata pointer register saved in the stack frame in case
we have switched CPU's between taking the initial interrupt that saved the
frame and returning. If we don't do this fixup it is possible for a CPU to
use the wrong per-cpu data. On UP systems this is not a problem, so the
code is conditional on SMP.

A count was used instead of simply checking the process status register in
the frame during exception_return() since there are critical sections at
the very start and end of a trap, exception, or interrupt from userland in
which we could trash the t7 register being used in userland. The counter
is incremented after adn before these critical sections respectively so
that we will not overwrite the saved t7 register if we are interrupted
during one of these critical sections.


75873 23-Apr-2001 mjacob

Fix includes so it compiles again.


75570 17-Apr-2001 jhb

Blow away the panic mutex in favor of using a single atomic_cmpset() on a
panic_cpu shared variable. I used a simple atomic operation here instead
of a spin lock as it seemed to be excessive overhead. Also, this can avoid
recursive panics if, for example, witness is broken.


74891 28-Mar-2001 jhb

- Include <machine/prom.h> to get the prototype for prom_halt().
- If there is no gdb device, just return without trying to return any
value since gdb_handle_exception() returns void.
- When calling prom_halt(), pass in a value telling it to actually halt
and not to randomly choose whether or not to halt or reboot depending on
whatever value happened to be in a0 when the call was made.


74271 15-Mar-2001 gallatin

remove bogus check -- for kernel threads we fork off of proc0, not curproc
This was causing panics when modules which create kthreads were loaded
after boot.

pointed out by: jake, jhb


73922 07-Mar-2001 jhb

Use the proc lock to protect p_pptr when waking up our parent in cpu_exit()
and remove the mpfixme() message that is now fixed.


72906 22-Feb-2001 jhb

Rename switch_trampoline() to fork_trampoline() on the alpha and ia64.

Suggested by: dfr


72746 20-Feb-2001 jhb

- Don't call clear_resched() in userret(), instead, clear the resched flag
in mi_switch() just before calling cpu_switch() so that the first switch
after a resched request will satisfy the request.
- While I'm at it, move a few things into mi_switch() and out of
cpu_switch(), specifically set the p_oncpu and p_lastcpu members of
proc in mi_switch(), and handle the sched_lock state change across a
context switch in mi_switch().
- Since cpu_switch() no longer handles the sched_lock state change, we
have to setup an initial state for sched_lock in fork_exit() before we
release it.


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72276 10-Feb-2001 jhb

- Make astpending and need_resched process attributes rather than CPU
attributes. This is needed for AST's to be properly posted in a preemptive
kernel. They are backed by two new flags in p_sflag: PS_ASTPENDING and
PS_NEEDRESCHED. They are still accesssed by their old macros:
aston(), astoff(), etc. For completeness, an astpending() macro has been
added to check for a pending AST, and clear_resched() has been added to
clear need_resched().
- Rename syscall2() on the x86 back to syscall() to be consistent with
other architectures.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


71881 31-Jan-2001 dfr

* Move exception_return to exception.s which is a more logical home for it.
* Optimise the return path for syscalls so that they only restore a minimal
set of registers instead of performing a full exception_return.

A new flag in the trapframe indicates that the frame only holds partial
state. When it is necessary to perform a full state restore (e.g. after an
execve or signal), the flag is cleared to force a full restore.


71694 26-Jan-2001 jhb

Update some comments, s0 in the pcb of a child returning from fork1() is
now passed in as a0 to fork_exit() and and s2 is passed in as a1.


71604 24-Jan-2001 jhb

- Change fork_exit() to take a pointer to a trapframe as its 3rd argument
instead of a trapframe directly. (Requested by bde.)
- Convert the alpha switch_trampoline to call fork_exit() and use the MI
fork_return() instead of child_return().
- Axe child_return().


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71541 24-Jan-2001 jhb

- Proc locking.
- P_INMEM -> PS_INMEM.


71536 24-Jan-2001 jhb

cpuno -> cpuid.


70741 07-Jan-2001 benno

PowerPC atomic operation functions.
Some of these are dependant on an inline function (powerpc_mb()) that is
yet to come.

Reviewed by: obrien


70583 01-Jan-2001 obrien

MP shells for the PowerPC platform.


70317 23-Dec-2000 jake

Protect proc.p_pptr and proc.p_children/p_sibling with the
proctree_lock.

linprocfs not locked pending response from informal maintainer.

Reviewed by: jhb, -smp@


69488 01-Dec-2000 gallatin

acquire/release Giant in vm_page_zero_idle(), like on i386

Discused with: jhb


68900 19-Nov-2000 dfr

Convert various calls to splhigh() to disable_intr() since splhigh() is
now a no-op.


68899 19-Nov-2000 dfr

We don't need <stddef.h> for offsetof() any more.


68762 15-Nov-2000 jhb

Don't perform an mi_switch() when we release Giant during cpu_exit(). We
are about to call cpu_switch() anyways.

Found by: witness


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67365 20-Oct-2000 jhb

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


67358 20-Oct-2000 jhb

- machine/mutex.h -> sys/mutex.h
- Catch up to the MI mutex structure due to saveflags,saveipl,savepsr
becoming saveintr.


66573 03-Oct-2000 dfr

Clear pcb_schednest in cpu_fork() for the child process. This is
is necessary since the child's call stack only includes one recursive
hold of sched_lock.


66277 22-Sep-2000 ps

Remove the NCPU, NAPIC, NBUS, NINTR config options. Make NAPIC,
NBUS, NINTR dynamic and set NCPU to a maximum of 16 under SMP.

Reviewed by: peter


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


61825 19-Jun-2000 gallatin

Support bounce buffers for ISA DMA on the alpha. This is required for the
irongate chipset (used in the UP1000) which does not support scatter/gather
DMA. We'll still use scatter gather if the core logic chipset supports it.

Reviewed by: dfr


61540 11-Jun-2000 alc

cpu_fork(): Check "flags" before dereferencing "p2". Otherwise,
the call "vm_fork(p1, 0, flags);" early in fork1 can cause a kernel
panic.


60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


58345 20-Mar-2000 phk

Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.

B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.

Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.

Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.

This change is a step in the direction towards a stackable BIO capability.

A lot of this patch were machine generated (Thanks to style(9) compliance!)

Vinum users: Greg has not had time to test this yet, be careful.


56096 16-Jan-2000 gallatin

The kernel side of per-process unaligned access control (printing, fixing &
delivering SIGBUS). This will allow a non-superuser to control unaligned
access behaviour on a per-process basis once a userland control program
(uac) is written.

Reviewed by: obrien
Tested by: obrien


55612 08-Jan-2000 marcel

Sync with i386

\begin{quote}
Compile genassym.c with ordinary ${CFLAGS}. The (small) needs for
${GEN_CFLAGS} and -U_KERNEL became negative when all all the
genassym.c's were converted to be cross-built.

Makefile.*:
- Cleanups associated with the old genassym.
- Fixed deprecated spelling of ${.IMPSRC} as "$<".
\end{quote}

Submitted by: bde


55571 07-Jan-2000 marcel

Use genassym(1).


55534 07-Jan-2000 peter

Revert back all the way to 1.11 - the problem was that Makefile.alpha was
out of sync.


55526 07-Jan-2000 msmith

Don't include <sys/systm.h>. It doesn't do anything, and with recent
changes it breaks building genassym.


55329 03-Jan-2000 mjacob

untangle some includes and clean up for compilation cleanliness.


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


54207 06-Dec-1999 peter

Make this compile again. (missing #include for RFPROC)


54188 06-Dec-1999 luoqi

User ldt sharing.


53593 22-Nov-1999 peter

Use %ll instead of %q as gcc moans bitterly about it.


53086 10-Nov-1999 dfr

Re-organise the code which manages the owner of the FP state (fpcurproc).
The old code was spread out through the machdep code and was sloppy about
enabling and disabling the FEN bit (which controls access to the FP
register set). This caused a DIAGNOSTIC warning "DANGER WILL ROBINSON:
FEN SET IN cpu_fork!" sometimes when operating under high loads and could
conceivably lead to processes getting incorrect FP results.

The new code is much more strict about the FEN bit and makes sure that
*only* fpcurproc ever has it enabled. This also allows us to remove a
section of code from the exception_return path which might improve
performance marginally.

Reviewed by: gallatin


52647 30-Oct-1999 alc

The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
eliminate an extra (useless) level of indirection in half of the page
queue accesses and (2) to use a single name for each queue throughout,
instead of, e.g., "vm_page_queue_active" in some places and
"vm_page_queues[PQ_ACTIVE]" in others.

Reviewed by: dillon


52635 29-Oct-1999 phk

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


51474 20-Sep-1999 dillon

Fix bug in pipe code relating to writes of mmap'd but illegal address
spaces which cross a segment boundry in the page table. pmap_kextract()
is not designed for access to the user space portion of the page
table and cannot handle the null-page-directory-entry case.

The fix is to have vm_fault_quick() return a success or failure which
is then used to avoid calling pmap_kextract().


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50458 27-Aug-1999 gallatin

Fix the child's return path from fork so that fork will return 0
in the child. This corrects a problem where linux/alpha binaries see
the child's return value of fork as the parent's pid. This happens because
linux/alpha binaries apparently check the return value directly, rather
than looking for a non-zero value in a4, as *BSD & OSF/1 do.

Reviewed by:dfr@nlsystems.com


50085 20-Aug-1999 gallatin

Fix a nasty kld bug where modules with objects of type GLOB_DAT which had
non-zero addends were being loaded incorrectly


50037 19-Aug-1999 peter

Update for MI switch code, and trim a heap of unused (I believe) entries.


49444 05-Aug-1999 jdp

Sync with alc's revision 1.125 of i386/i386/vm_machdep.c. This
fixes the kernel build breakage.


48974 22-Jul-1999 alc

Reduce the number of "magic constants" used for page coloring
by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same
thing at different places in the kernel. Drop PQ_PRIME3.


48840 16-Jul-1999 dfr

Handle R_ALPHA_NONE relocations in KLD.


48713 09-Jul-1999 mjacob

Add in dbregs stubs that a committer for changes on the i386 ought to have done.
PR: 12579


48391 01-Jul-1999 peter

Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface. kproc_start is still available
as a SYSINIT() hook. This allowed simplification of chunks of the
sysinit code in the process. This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process. It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit. This means that nfsiod
doesn't need to be in /sbin and is always "available". This is a fair bit
easier to do outside of the SYSINIT_KT() framework.


48305 28-Jun-1999 peter

Revert back to not using -DKERNEL


48244 26-Jun-1999 peter

Make genassym compile - the recent buf locking changes meant that more
things from #ifdef KERNEL were needed.


47869 10-Jun-1999 dt

Replace my previous fix of saving the FP state with a much simpler one: when
we swap out fpcurproc, save its FP state.

Suggested by: bde


47840 08-Jun-1999 dt

Keep fpcurproc locked in memory, so that we always can save the FP state
correctly.

This should fix the "pmap_changebit didn't" panic that some people see.

Reviewed by: dfr


45958 23-Apr-1999 dt

Fixed several (not all) warnings.


45889 20-Apr-1999 dt

Added consts to cpu_set_fork_handler prototype. (Follow i386 version.)


45821 19-Apr-1999 peter

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


44327 28-Feb-1999 bde

Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process. Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by: dfr, phk


43758 08-Feb-1999 dillon

Adjust idle zero-page fill hysteresis based on tests. Use 2/3 and 4/5
zero-fill levels.

Adjust comment for ozfod in vmmeter.h - this counter represents
non-optimal ( on the fly ) zero fills, not prefills.


43753 08-Feb-1999 dillon

Add hysteresis to alpha version of vm_page_zero_idle().


43752 08-Feb-1999 dillon

Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with
PQ_FREE. There is little operational difference other then the kernel
being a few kilobytes smaller and the code being more readable.

* vm_page_select_free() has been *greatly* simplified.
* The PQ_ZERO page queue and supporting structures have been removed
* vm_page_zero_idle() revamped (see below)

PG_ZERO setting and clearing has been migrated from vm_page_alloc()
to vm_page_free[_zero]() and will eventually be guarenteed to remain
tracked throughout a page's life ( if it isn't already ).

When a page is freed, PG_ZERO pages are appended to the appropriate
tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended.
When locating a new free page, PG_ZERO selection operates from within
vm_page_list_find() ( get page from end of queue instead of beginning
of queue ) and then only occurs in the nominal critical path case. If
the nominal case misses, both normal and zero-page allocation devolves
into the same _vm_page_list_find() select code without any specific
zero-page optimizations.

Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been
added and zero-page tracking adjusted to conform with the other changes.
Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free
pages. We may wish to increase both parameters as time permits. The
hysteresis is designed to avoid silly zeroing in borderline allocation/free
situations.


43209 26-Jan-1999 julian

Mostly remove the VM_STACK OPTION.
This changes the definitions of a few items so that structures are the
same whether or not the option itself is enabled. This allows
people to enable and disable the option without recompilng the world.

As the author says:

|I ran into a problem pulling out the VM_STACK option. I was aware of this
|when I first did the work, but then forgot about it. The VM_STACK stuff
|has some code changes in the i386 branch. There need to be corresponding
|changes in the alpha branch before it can come out completely.

what is done:
|
|1) Pull the VM_STACK option out of the header files it appears in. This
|really shouldn't affect anything that executes with or without the rest
|of the VM_STACK patches. The vm_map_entry will then always have one
|extra element (avail_ssize). It just won't be used if the VM_STACK
|option is not turned on.
|
|I've also pulled the option out of vm_map.c. This shouldn't harm anything,
|since the routines that are enabled as a result are not called unless
|the VM_STACK option is enabled elsewhere.
|
|2) Add what appears to be appropriate code the the alpha branch, still
|protected behind the VM_STACK switch. I don't have an alpha machine,
|so we would need to get some testers with alpha machines to try it out.
|
|Once there is some testing, we can consider making the change permanent
|for both i386 and alpha.
|
[..]
|
|Once the alpha code is adequately tested, we can pull VM_STACK out
|everywhere.
|

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


42175 30-Dec-1998 dfr

Various changes to support OSF1 emulation:

* Move the user stack from VM_MAXUSER_ADDRESS to a place below the 32bit
boundary (needed to support 32bit OSF programs). This should also save
one pagetable per process.
* Add cvtqlsv to the set of instructions handled by the floating point
software completion code.
* Disable all floating point exceptions by default.
* A minor change to execve to allow the OSF1 image activator to support
dynamic loading.


41868 16-Dec-1998 bde

Removed bogus casts of USRSTACK and/or the other operand in binary
expressions involving USRSTACK.


41499 04-Dec-1998 dfr

Implement 'software completion' for floating point arithmetic. On the
alpha, operations involving non-finite numbers or denormalised numbers
or operations which should generate such numbers will cause an arithmetic
exception. For programs which follow some strict code generation rules,
the kernel trap handler can then 'complete' the operation by emulating
the faulting instruction.

To use software completion, a program must be compiled with the arguments
'-mtrap-precision=i' and '-mfp-trap-mode=su' or '-mfp-trap-mode=sui'.
Programs compiled in this way can use non-finite and denormalised numbers
at the expense of slightly less efficient code generation of floating
point instructions. Programs not compiled with these options will receive
a SIGFPE signal when non-finite or denormalised numbers are used or
generated.

Reviewed by: John Polstra <jdp@polstra.com>


41181 15-Nov-1998 dfr

* Add hooks to allow the X server to access I/O ports and memory.
* Update drivers to the latest version of the bus interface.

The ISA drivers' use of the new resource api is minimal. Garrett has
some much cleaner drivers which should be more easily shared between
i386 and alpha. This has only been tested on cia based machines. It
should work on lca and apecs but I might have broken something.


40517 18-Oct-1998 dfr

R_ALPHA_RELATIVE relocations need to add the value to the existing memory
contents.


40435 16-Oct-1998 peter

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.


40377 15-Oct-1998 dfr

Change a bogus cast to the correct one.


39197 14-Sep-1998 jdp

Add new functions fill_fpregs() and set_fpregs(), like fill_regs()
and set_regs() but for the floating point register state. The code
is stolen from procfs_machdep.c, and moved out of there into
machdep.c.

These functions are needed for generating ELF core dumps.


39072 11-Sep-1998 dfr

Machine dependant relocations for KLD.


37597 12-Jul-1998 dfr

Don't bother calling pmap_emulate_reference() from cpu_fork(). It isn't
needed and it panics a DIAGNOSTIC kernel.


37585 12-Jul-1998 dfr

Add definition of p_switchtime.


36972 14-Jun-1998 dfr

Major changes to the generic device framework for FreeBSD/alpha:

* Eliminate bus_t and make it possible for all devices to have
attached children.

* Support dynamically extendable interfaces for drivers to replace
both the function pointers in driver_t and bus_ops_t (which has been
removed entirely. Two system defined interfaces have been defined,
'device' which is mandatory for all devices and 'bus' which is
recommended for all devices which support attached children.

* In addition, the alpha port defines two simple interfaces 'clock'
for attaching various real time clocks to the system and 'mcclock'
for the many different variations of mc146818 clocks which can be
attached to different alpha platforms. This eliminates two more
function pointer tables in favour of the generic method dispatch
system provided by the device framework.

Future device interfaces may include:

* cdev and bdev interfaces for devfs to use in replacement for specfs
and the fixed interfaces bdevsw and cdevsw.

* scsi interface to replace struct scsi_adapter (not sure how this
works in CAM but I imagine there is something similar there).

* various tailored interfaces for different bus types such as pci,
isa, pccard etc.


36865 10-Jun-1998 dfr

Add missing copyrights. Thanks to Jason Thorpe for politely noting the
mistake...


36849 10-Jun-1998 dfr

Add initial support for the FreeBSD/alpha kernel. This is very much a
work in progress and has never booted a real machine. Initial
development and testing was done using SimOS (see
http://simos.stanford.edu for details). On the SimOS simulator, this
port successfully reaches single-user mode and has been tested with
loads as high as one copy of /bin/ls :-).

Obtained from: partly from NetBSD/alpha