History log of /freebsd-10.0-release/sys/sparc64/sparc64/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255786 22-Sep-2013 glebius

- Create kern.ipc.sendfile namespace, and put the new "readhead" OID
there as "kern.ipc.sendfile.readahead".
- Push all nsfbuf related tunables into MD code. Don't move them
to new namespace in favor of POLA.

Reviewed by: scottl
Approved by: re (gjb)


255724 20-Sep-2013 alc

The pmap function pmap_clear_reference() is no longer used. Remove it.

pmap_clear_reference() has had exactly one caller in the kernel for
several years, more precisely, since FreeBSD 8. Now, that call no
longer exists.

Approved by: re (kib)
Sponsored by: EMC / Isilon Storage Division


255677 18-Sep-2013 pjd

Fix panic in ktrcapfail() when no capability rights are passed.
While here, correct all consumers to pass NULL instead of 0 as we pass
capability rights as pointers now, not uint64_t.

Reported by: Daniel Peyrolon
Tested by: Daniel Peyrolon
Approved by: re (marius)


255426 09-Sep-2013 jhb

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


255028 29-Aug-2013 alc

Significantly reduce the cost, i.e., run time, of calls to madvise(...,
MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new
pmap function, pmap_advise(), that operates on a range of virtual addresses
within the specified pmap, allowing for a more efficient implementation of
MADV_DONTNEED and MADV_FREE. Previously, the implementation of
MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as
pmap_clear_reference(). Intuitively, the problem with this implementation
is that the pmap-level locks are acquired and released and the page table
traversed repeatedly, once for each resident page in the range
that was specified to madvise(2). A more subtle flaw with the previous
implementation is that pmap_clear_reference() would clear the reference bit
on all mappings to the specified page, not just the mapping in the range
specified to madvise(2).

Since our malloc(3) makes heavy use of madvise(2), this change can have a
measureable impact. For example, the system time for completing a parallel
"buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%.

Note: This change only contains pmap_advise() implementations for a subset
of our supported architectures. I will commit implementations for the
remaining architectures after further testing. For now, a stub function is
sufficient because of the advisory nature of pmap_advise().

Discussed with: jeff, jhb, kib
Tested by: pho (i386), marcel (ia64)
Sponsored by: EMC / Isilon Storage Division


254667 22-Aug-2013 kib

Revert r254501. Instead, reuse the type stability of the struct pmap
which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE
zone. Initialize the pmap lock in the vmspace zone init function, and
remove pmap lock initialization and destruction from pmap_pinit() and
pmap_release().

Suggested and reviewed by: alc (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation


254649 22-Aug-2013 kib

Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).
The flag was mandatory since r209792, where vm_page_grab(9) was
changed to only support the alloc retry semantic.

Suggested and reviewed by: alc
Sponsored by: The FreeBSD Foundation


254138 09-Aug-2013 attilio

The soft and hard busy mechanism rely on the vm object lock to work.
Unify the 2 concept into a real, minimal, sxlock where the shared
acquisition represent the soft busy and the exclusive acquisition
represent the hard busy.
The old VPO_WANTED mechanism becames the hard-path for this new lock
and it becomes per-page rather than per-object.
The vm_object lock becames an interlock for this functionality:
it can be held in both read or write mode.
However, if the vm_object lock is held in read mode while acquiring
or releasing the busy state, the thread owner cannot make any
assumption on the busy state unless it is also busying it.

Also:
- Add a new flag to directly shared busy pages while vm_page_alloc
and vm_page_grab are being executed. This will be very helpful
once these functions happen under a read object lock.
- Move the swapping sleep into its own per-object flag

The KPI is heavilly changed this is why the version is bumped.
It is very likely that some VM ports users will need to change
their own code.

Sponsored by: EMC / Isilon storage division
Discussed with: alc
Reviewed by: jeff, kib
Tested by: gavin, bapt (older version)
Tested by: pho, scottl


254065 07-Aug-2013 kib

Split the pagequeues per NUMA domains, and split pageademon process
into threads each processing queue in a single domain. The structure
of the pagedaemons and queues is kept intact, most of the changes come
from the need for code to find an owning page queue for given page,
calculated from the segment containing the page.

The tie between NUMA domain and pagedaemon thread/pagequeue split is
rather arbitrary, the multithreaded daemon could be allowed for the
single-domain machines, or one domain might be split into several page
domains, to further increase concurrency.

Right now, each pagedaemon thread tries to reach the global target,
precalculated at the start of the pass. This is not optimal, since it
could cause excessive page deactivation and freeing. The code should
be changed to re-check the global page deficit state in the loop after
some number of iterations.

The pagedaemons reach the quorum before starting the OOM, since one
thread inability to meet the target is normal for split queues. Only
when all pagedaemons fail to produce enough reusable pages, OOM is
started by single selected thread.

Launder is modified to take into account the segments layout with
regard to the region for which cleaning is performed.

Based on the preliminary patch by jeff, sponsored by EMC / Isilon
Storage Division.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation


254025 07-Aug-2013 jeff

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


253994 06-Aug-2013 marius

Add MD (for now) atomic_store_acq_<type>() and use it in pmap_activate()
to get the semantics when setting the PMAP right. Prior to r251782, the
latter already used implicit acquire semantics, which - currently - means
to not employ additional explicit memory barriers under the hood (see also
r225889).


253940 04-Aug-2013 attilio

Remove unused member.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


253367 15-Jul-2013 ae

Include sys/systm.h after sys/param.h.

Suggested by: pluknet


253351 15-Jul-2013 ae

Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with: glebius


253266 12-Jul-2013 marius

Prefix the alias macros for members of struct __mcontext with an underscore
in order to avoid a clash in the net80211 code.


251782 15-Jun-2013 ed

Stick to using the documented atomic(9) API.

The atomic_store_ptr() function is not part of the atomic(9) API. We
only provide a version with a release barrier.


251703 13-Jun-2013 jeff

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


250884 21-May-2013 attilio

o Relax locking assertions for vm_page_find_least()
o Relax locking assertions for pmap_enter_object() and add them also
to architectures that currently don't have any
o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade
operation on the per-object rwlock
o Use all the mechanisms above to make vm_map_pmap_enter() to work
mostl of the times only with readlocks.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc


250747 17-May-2013 alc

Relax the object locking assertion in pmap_enter_locked().

Reviewed by: attilio
Sponsored by: EMC / Isilon Storage Division


248508 19-Mar-2013 kib

Implement the concept of the unmapped VMIO buffers, i.e. buffers which
do not map the b_pages pages into buffer_map KVA. The use of the
unmapped buffers eliminate the need to perform TLB shootdown for
mapping on the buffer creation and reuse, greatly reducing the amount
of IPIs for shootdown on big-SMP machines and eliminating up to 25-30%
of the system time on i/o intensive workloads.

The unmapped buffer should be explicitely requested by the GB_UNMAPPED
flag by the consumer. For unmapped buffer, no KVA reservation is
performed at all. The consumer might request unmapped buffer which
does have a KVA reserve, to manually map it without recursing into
buffer cache and blocking, with the GB_KVAALLOC flag.

When the mapped buffer is requested and unmapped buffer already
exists, the cache performs an upgrade, possibly reusing the KVA
reservation.

Unmapped buffer is translated into unmapped bio in g_vfs_strategy().
Unmapped bio carry a pointer to the vm_page_t array, offset and length
instead of the data pointer. The provider which processes the bio
should explicitely specify a readiness to accept unmapped bio,
otherwise g_down geom thread performs the transient upgrade of the bio
request by mapping the pages into the new bio_transient_map KVA
submap.

The bio_transient_map submap claims up to 10% of the buffer map, and
the total buffer_map + bio_transient_map KVA usage stays the
same. Still, it could be manually tuned by kern.bio_transient_maxcnt
tunable, in the units of the transient mappings. Eventually, the
bio_transient_map could be removed after all geom classes and drivers
can accept unmapped i/o requests.

Unmapped support can be turned off by the vfs.unmapped_buf_allowed
tunable, disabling which makes the buffer (or cluster) creation
requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped
buffers are only enabled by default on the architectures where
pmap_copy_page() was implemented and tested.

In the rework, filesystem metadata is not the subject to maxbufspace
limit anymore. Since the metadata buffers are always mapped, the
buffers still have to fit into the buffer map, which provides a
reasonable (but practically unreachable) upper bound on it. The
non-metadata buffer allocations, both mapped and unmapped, is
accounted against maxbufspace, as before. Effectively, this means that
the maxbufspace is forced on mapped and unmapped buffers separately.
The pre-patch bufspace limiting code did not worked, because
buffer_map fragmentation does not allow the limit to be reached.

By Jeff Roberson request, the getnewbuf() function was split into
smaller single-purpose functions.

Sponsored by: The FreeBSD Foundation
Discussed with: jeff (previous version)
Tested by: pho, scottl (previous version), jhb, bf
MFC after: 2 weeks


248280 14-Mar-2013 kib

Add pmap function pmap_copy_pages(), which copies the content of the
pages around, taking array of vm_page_t both for source and
destination. Starting offsets and total transfer size are specified.

The function implements optimal algorithm for copying using the
platform-specific optimizations. For instance, on the architectures
were the direct map is available, no transient mappings are created,
for i386 the per-cpu ephemeral page frame is used. The code was
typically borrowed from the pmap_copy_page() for the same
architecture.

Only i386/amd64, powerpc aim and arm/arm-v6 implementations were
tested at the time of commit. High-level code, not committed yet to
the tree, ensures that the use of the function is only allowed after
explicit enablement.

For sparc64, the existing code has known issues and a stab is added
instead, to allow the kernel linking.

Sponsored by: The FreeBSD Foundation
Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6)
MFC after: 2 weeks


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247463 28-Feb-2013 mav

MFcalloutng:
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.


247400 27-Feb-2013 attilio

Merge from vmobj-rwlock:
VM_OBJECT_LOCKED() macro is only used to implement a custom version
of lock assertions right now (which likely spread out thanks to
copy and paste).
Remove it and implement actual assertions.

Sponsored by: EMC / Isilon storage division
Reviewed by: alc
Tested by: pho


247297 26-Feb-2013 attilio

Merge from vmobj-rwlock branch:
Remove unused inclusion of vm/vm_pager.h and vm/vnode_pager.h.

Sponsored by: EMC / Isilon storage division
Tested by: pho
Reviewed by: alc


246713 12-Feb-2013 kib

Reform the busdma API so that new types may be added without modifying
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.

The cam changes unify the bus_dmamap_load* handling in cam drivers.

The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.

Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)


245850 23-Jan-2013 marius

Revert the part of r239864 which removed obtaining the SMP mutex around
reading registers from other CPUs. As it turns out, the hardware doesn't
really like concurrent IPI'ing causing adverse effects. Also the thought
deadlock when using this spin lock here and the targeted CPU(s) are also
holding or in case of nested locks can't actually happen. This is due to
the fact that on sparc64, spinlock_enter() only raises the PIL but doesn't
disable interrupts completely. Thus direct cross calls as used for the
register reading (and all other MD IPI needs) still will be executed by
the targeted CPU(s) in that case.

MFC after: 3 days


245017 03-Jan-2013 marius

Revert bogus part of r241740.
Reported by: Michael Moll

MFC after: 3 days


243132 16-Nov-2012 kib

Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h
to vm/vm_phys.h, where it belongs.

Requested and reviewed by: alc
MFC after: 2 weeks


243040 14-Nov-2012 kib

Flip the semantic of M_NOWAIT to only require the allocation to not
sleep, and perform the page allocations with VM_ALLOC_SYSTEM
class. Previously, the allocation was also allowed to completely drain
the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT
request class for vm_page_alloc() and similar functions.

Allow the caller of malloc* to request the 'deep drain' semantic by
providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT
class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM
allocation class.

Centralize the translation of the M_* malloc(9) flags in the single
inline function malloc2vm_flags().

Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
Reviewed by: alc, mdf (previous version)
Tested by: pho (previous version)
MFC after: 2 weeks


242534 03-Nov-2012 attilio

Rework the known rwlock to benefit about staying on their own
cache line in order to avoid manual frobbing but using
struct rwlock_padalign.

Reviewed by: alc, jimharris


241780 20-Oct-2012 marius

- Give PIL_PREEMPT the lowest priority just above low/stray interrupts.
The reason for this is that the SPARC v9 architecture allows nested
interrupts of higher priority/level than that of the current interrupt
to occur (and we can't just entirely bypass this model, also, at least
for tick interrupts, this also wouldn't be wise). However, when a
preemption interrupt interrupts another interrupt of lower priority,
f.e. PIL_ITHREAD, and that one in turn is nested by a third interrupt,
f.e. PIL_TICK, with SCHED_ULE the execution of interrupts higher than
PIL_PREEMPT may be migrated to another CPU. In particular, tl1_ret(),
which is responsible for restoring the state of the CPU prior to entry
to the interrupt based on the (also migrated) trap frame, then is run
on a CPU which actually didn't receive the interrupt in question,
causing an inappropriate processor interrupt level to be "restored".
In turn, this causes interrupts of the first level, i.e. PIL_ITHREAD
in the above scenario, to be blocked on the target of the migration
until the correct PIL happens to be restored again on that CPU again.
Making PIL_PREEMPT the lowest real priority, this effectively prevents
this scenario from happening, as preemption interrupts no longer can
interrupt any other interrupt besides stray ones (which is no issue).
Thanks to attilio@ and especially mav@ for helping me to understand
this problem at the 201208DevSummit.
- Give PIL_STOP (which is also used for IPI_STOP_HARD, given that there's
no real equivalent to NMIs on SPARC v9) the highest possible priority
just below the hardwired PIL_TICK, so it has a chance to interrupt
more things.

MFC after: 1 week


241740 19-Oct-2012 marius

- Remove an unused header.
- Don't waste a delay slot.

MFC after: 3 days


241734 19-Oct-2012 marius

Let SCHED_ULE give affinity to the CPU the tick interrupt triggered on
when running tick_process(), similarly to what the x86 equivalents of
this function do, however employing the less racy sequence also used in
intr_event_handle().

MFC after: 3 days


241371 09-Oct-2012 attilio

Reverts r234074,234105,234564,234723,234989,235231-235232 and part of
r234247.
Use, instead, the static intializer introduced in r239923 for x86 and
sparc64 intr_cpus, unwinding the code to the initial version.

Reviewed by: marius


241020 28-Sep-2012 alc

Eliminate a stale comment. It describes another use case for the pmap in
Mach that doesn't exist in FreeBSD.


240518 14-Sep-2012 eadler

Correct double "the the"

Approved by: cperciva
MFC after: 3 days


240244 08-Sep-2012 attilio

userret() already checks for td_locks when INVARIANTS is enabled, so
there is no need to check if Giant is acquired after it.

Reviewed by: kib
MFC after: 1 week


239941 31-Aug-2012 marius

Add a global MD macro for the VIS block size instead of duplicating
it and using magic values all over the place.

MFC after: 1 week


239864 29-Aug-2012 marius

- Unlike cache invalidation and TLB demapping IPIs, reading registers from
other CPUs doesn't require locking so get rid of it. As the latter is used
for the timecounter on certain machine models, using a spin lock in this
case can lead to a deadlock with the upcoming callout(9) rework.
- Merge r134227/r167250 from x86:
Avoid cross-IPI SMP deadlock by using the smp_ipi_mtx spin lock not only
for smp_rendezvous_cpus() but also for the MD cache invalidation and TLB
demapping IPIs.
- Mark some unused function arguments as such.

MFC after: 1 week


239079 05-Aug-2012 marius

Merge r236494 from x86:

Isolate the global TTE list lock from data and other locks to prevent false
sharing within the cache.

MFC after: 3 days


237623 27-Jun-2012 alc

Add new pmap layer locks to the predefined lock order. Change the names
of a few existing VM locks to follow a consistent naming scheme.


236214 29-May-2012 alc

Replace all uses of the vm page queues lock by a r/w lock that is private
to this pmap.c. This new r/w lock is used primarily to synchronize access
to the TTE lists. However, it will be used in a somewhat unconventional
way. As finer-grained TTE list locking is added to each of the pmap
functions that acquire this r/w lock, its acquisition will be changed from
write to read, enabling concurrent execution of the pmap functions with
finer-grained locking.

Reviewed by: attilio
Tested by: flo
MFC after: 10 days


235231 10-May-2012 marius

Merge r234989 from x86:

Revert part of r234723 by re-enabling the SMP protection for intr_bind().


234723 26-Apr-2012 attilio

Clean up the intr* MD KPI from the SMP dependency, removing a cause of
discrepancy between modules and kernel, but deal with SMP differences
within the functions themselves.

As an added bonus this also helps in terms of code readability.

Requested by: gibbs
Reviewed by: jhb, marius
MFC after: 1 week


234247 13-Apr-2012 marius

Merge from x86:

r233961:

Fix interrupt load balancing regression, introduced in revision
222813, that left all un-pinned interrupts assigned to CPU 0.
In intr_shuffle_irqs(), remove CPU_SETOF() call that initialized
the "intr_cpus" cpuset to only contain CPU0.

This initialization is too late and nullifies the results of calls
to the intr_add_cpu() that occur much earlier in the boot process.

r234074 (partial):

The BSP is not added to the mask of valid target CPUs for interrupts.
Fix this by adding the BSP as an interrupt target directly in

r234105:

Fix !SMP build after r234074.

MFC after: 3 days


233748 31-Mar-2012 marius

Remove checks that are redundant due to tf_type being unsigned.

MFC after: 3 days


233747 31-Mar-2012 marius

Fix panic on kernel traps having a mapping in trap_sig b0rked in r206086.
Repored by: David E. Cross

MFC after: 3 days


232356 01-Mar-2012 jhb

- Change contigmalloc() to use the vm_paddr_t type instead of an unsigned
long for specifying a boundary constraint.
- Change bus_dma tags to use bus_addr_t instead of bus_size_t for boundary
constraints.

These allow boundary constraints to be fully expressed for cases where
sizeof(bus_addr_t) != sizeof(bus_size_t). Specifically, it allows a
driver to properly specify a 4GB boundary in a PAE kernel.

Note that this cannot be safely MFC'd without a lot of compat shims due
to KBI changes, so I do not intend to merge it.

Reviewed by: scottl


230662 28-Jan-2012 marius

Fully disable interrupts while we fiddle with the FP context in the
VIS-based block copy/zero implementations. While with 4BSD it's
sufficient to just disable the tick interrupts, with ULE+PREEMPTION
it's otherwise also possible that these are preempted via IPIs.


230634 27-Jan-2012 marius

Commit file missed in r230633.


230633 27-Jan-2012 marius

Now that we have a working OF_printf() since r230631 and a OF_panic()
helper since r230632, use these for output and panicing during the
early cycles and move cninit() until after the static per-CPU data
has been set up. This solves a couple of issue regarding the non-
availability of the static per-CPU data:
- panic() not working and only making things worse when called,
- having to supply a special DELAY() implementation to the low-level
console drivers,
- curthread accesses of mutex(9) usage in low-level console drivers
that aren't conditional due to compiler optimizations (basically,
this is the problem described in r227537 but in this case for
keyboards attached via uart(4)). [1]

PR: 164123 [1]


230632 27-Jan-2012 marius

- Now that we have a working OF_printf() since r230631, use it for
implementing a simple OF_panic() that may be used during the early
cycles when panic() isn't available, yet.
- Mark cpu_{exit,shutdown}() as __dead2 as appropriate.


228522 15-Dec-2011 alc

Eliminate vestiges of page coloring.


228201 02-Dec-2011 jchandra

Fix OF_finddevice error return value in case of FDT.

According to the open firmware standard, finddevice call has to return
a phandle with value of -1 in case of error.

This commit is to:
- Fix the FDT implementation of this interface (ofw_fdt_finddevice) to
return (phandle_t)-1 in case of error, instead of 0 as it does now.
- Fix up the callers of OF_finddevice() to compare the return value with
-1 instead of 0 to check for errors.
- Since phandle_t is unsigned, the return value of OF_finddevice should
be checked with '== -1' rather than '<= 0' or '> 0', fix up these cases
as well.

Reported by: nwhitehorn

Reviewed by: raj
Approved by: raj, nwhitehorn


228024 27-Nov-2011 marius

Update comment.


228022 27-Nov-2011 marius

For sparc64 also adjust the geometry of da(4) driven disks to not overflow
the 16-bit cylinders field of the VTOC8 disk label (at around 502GB). The
geometry chosen for disks above that limit allows to use disks up to 2TB,
which is the limit of the extended VTOC8 format. The geometry used for
disks smaller than the 16-bit cylinders limit stays the same as used by
cam_calc_geometry(9) for extended translation.
Thanks to Hans-Joerg Sirtl for providing hardware for testing this change.

MFC after: 3 days


227848 22-Nov-2011 marius

s,KOBJMETHOD_END,DEVMETHOD_END,g in order to fully hide the explicit mention
of kobj(9) from device drivers.


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


226498 18-Oct-2011 des

Trace attempts to call restricted MD syscalls.


226054 06-Oct-2011 marius

- Use atomic operations rather than sched_lock for safely assigning pm_active
and pc_pmap for SMP. This is key to allowing adding support for SCHED_ULE.
Thanks go to Peter Jeremy for additional testing.
- Add support for SCHED_ULE to cpu_switch().

Committed from: 201110DevSummit


225931 02-Oct-2011 marius

Make sparc64 compatible with NEW_PCIB and enable it:
- Implement bus_adjust_resource() methods as far as necessary and in non-PCI
bridge drivers as far as feasible without rototilling them.
- As NEW_PCIB does a layering violation by activating resources at layers
above pci(4) without previously bubbling up their allocation there, move
the assignment of bus tags and handles from the bus_alloc_resource() to
the bus_activate_resource() methods like at least the other NEW_PCIB
enabled architectures do. This is somewhat unfortunate as previously
sparc64 (ab)used resource activation to indicate whether SYS_RES_MEMORY
resources should be mapped into KVA, which is only necessary if their
going to be accessed via the pointer returned from rman_get_virtual() but
not for bus_space(9) as the later always uses physical access on sparc64.
Besides wasting KVA if we always map in SYS_RES_MEMORY resources, a driver
also may deliberately not map them in if the firmware already has done so,
possibly in a special way. So in order to still allow a driver to decide
whether a SYS_RES_MEMORY resource should be mapped into KVA we let it
indicate that by calling bus_space_map(9) with BUS_SPACE_MAP_LINEAR as
actually documented in the bus_space(9) page. This is implemented by
allocating a separate bus tag per SYS_RES_MEMORY resource and passing the
resource via the previously unused bus tag cookie so we later on can call
rman_set_virtual() in sparc64_bus_mem_map(). As a side effect this now
also allows to actually indicate that a SYS_RES_MEMORY resource should be
mapped in as cacheable and/or read-only via BUS_SPACE_MAP_CACHEABLE and
BUS_SPACE_MAP_READONLY respectively.
- Do some minor cleanup like taking advantage of rman_init_from_resource(),
factor out the common part of bus tag allocation into a newly added
sparc64_alloc_bus_tag(), hook up some missing newbus methods and replace
some homegrown versions with the generic counterparts etc.
- While at it, let apb_attach() (which can't use the generic NEW_PCIB code
as APB bridges just don't have the base and limit registers implemented)
regarding the config space registers cached in pcib_softc and the SYSCTL
reporting nodes set up.


225901 01-Oct-2011 marius

Remove obsolete macros.


225900 01-Oct-2011 marius

Nuke SUN4U #ifdef's which with the demise of sun4v no longer serve any
purpose.


225899 01-Oct-2011 marius

Also allocate space for the PIL counters. Given that no machine actually
uses IV_MAX interrupt vectors this wasn't a problem in practice though.


225888 30-Sep-2011 marius

Add a comment about why contrary to what once would think running all of
userland with total store order actually is appropriate.


225841 28-Sep-2011 kib

Remove locking of the vm page queues from several pmaps, which only
protected the dirty mask updates. The dirty mask updates are handled
by atomics after the r225840.

Submitted by: alc
Tested by: flo (sparc64)
MFC after: 2 weeks


225675 19-Sep-2011 attilio

It is safe to initialize locks even on early boot (and it is the same
thing all the other architectures already do) thus just initialize
kernel_pmap in pmap_bootstrap().

Reported by: alc
Reviewed by: alc, marius
Tested by: flo, marius
Approved by: re (kib)
MFC after: 1 week


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225474 11-Sep-2011 kib

Inline the syscallenter() and syscallret(). This reduces the time measured
by the syscall entry speed microbenchmarks by ~10% on amd64.

Submitted by: jhb
Approved by: re (bz)
MFC after: 2 weeks


225418 06-Sep-2011 kib

Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomic
flags field. Updates to the atomic flags are performed using the atomic
ops on the containing word, do not require any vm lock to be held, and
are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9)
functions are provided to modify afalgs.

Document the changes to flags field to only require the page lock.

Introduce vm_page_reference(9) function to provide a stable KPI and
KBI for filesystems like tmpfs and zfs which need to mark a page as
referenced.

Reviewed by: alc, attilio
Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64)
Approved by: re (bz)


224746 09-Aug-2011 kib

- Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag
to VPO_UNMANAGED (and also making the flag protected by the vm object
lock, instead of vm page queue lock).
- Mark the fake pages with both PG_FICTITIOUS (as it is now) and
VPO_UNMANAGED. As a consequence, pmap code now can use use just
VPO_UNMANAGED to decide whether the page is unmanaged.

Reviewed by: alc
Tested by: pho (x86, previous version), marius (sparc64),
marcel (arm, ia64, powerpc), ray (mips)
Sponsored by: The FreeBSD Foundation
Approved by: re (bz)


224682 06-Aug-2011 marius

- Merge from r147740:
When the last, possibly partially filled buffer is flushed, we didn't
reset fragsz to 0 and as such would stop reflecting reality.
- Use __FBSDID.
- Wrap a too long line.

Approved by: re (kib)
MFC after: 1 week


224681 06-Aug-2011 marius

Remove a shortcut which is invalid with MAXCPU > IDR_CHEETAH_MAX_BN_PAIRS.

Approved by: re (kib)


224216 19-Jul-2011 attilio

On 64 bit architectures size_t is 8 bytes, thus it should use an 8 bytes
storage.
Fix the sintrcnt/sintrnames specification.

No MFC is previewed for this patch.

Reported, reviewed and tested by: marcel
Approved by: re (kib)


224187 18-Jul-2011 attilio

- Remove the eintrcnt/eintrnames usage and introduce the concept of
sintrcnt/sintrnames which are symbols containing the size of the 2
tables.
- For amd64/i386 remove the storage of intr* stuff from assembly files.
This area can be widely improved by applying the same to other
architectures and likely finding an unified approach among them and
move the whole code to be MI. More work in this area is expected to
happen fairly soon.

No MFC is previewed for this patch.

Tested by: pluknet
Reviewed by: jhb
Approved by: re (kib)


223962 12-Jul-2011 marius

Remove NULL assignments which are redundant for static timecounters.

Submitted by: jkim


223961 12-Jul-2011 marius

- Remove redundant timecounter masking from counter_get_timecount().
- Zero the timecounter when allocation so we don't need to initialize unused
members and remove a now redundant NULL assignment.

Submitted by: jkim


223806 05-Jul-2011 marius

Remove the IDR_CHEETAH_MAX_BN_PAIRS limit from cheetah_ipi_selected().
This is just a simple approach. For reasons unknown OpenSolaris uses a
more sophisticated one involving IPIing the remaining CPUs in reverse
order after the first batch of 32.


223801 05-Jul-2011 marius

It can be useful to know which page still has mappings.


223800 05-Jul-2011 marius

- pmap_cache_remove() and pmap_protect_tte() are only used within pmap.c
so static'ize them.
- Correct a typo.


223798 05-Jul-2011 marius

In pmap_remove_all() assert that the page is neither fictitious nor
unmanaged as also done on other architectures.

Reviewed by: alc


223795 05-Jul-2011 marius

Call pmap_qremove() before freeing or unwiring the pages, otherwise
there's a window during which a page can be re-used before its previous
mapping is removed.

Reviewed by: alc
MFC after: 1 week


223758 04-Jul-2011 attilio

With retirement of cpumask_t and usage of cpuset_t for representing a
mask of CPUs, pc_other_cpus and pc_cpumask become highly inefficient.

Remove them and replace their usage with custom pc_cpuid magic (as,
atm, pc_cpumask can be easilly represented by (1 << pc_cpuid) and
pc_other_cpus by (all_cpus & ~(1 << pc_cpuid))).

This change is not targeted for MFC because of struct pcpu members
removal and dependency by cpumask_t retirement.

MD review by: marcel, marius, alc
Tested by: pluknet
MD testing by: marcel, marius, gonzo, andreast


223721 02-Jul-2011 marius

UltraSPARC-IV CPUs seem to be affected by a not publicly documented
erratum causing them to trigger stray vector interrupts accompanied by a
state in which they even fault on locked TLB entries. Just retrying the
instruction in that case gets the CPU back on track though. OpenSolaris
also just ignores a certain number of stray vector interrupts.
While at it, implement the stray vector interrupt handling for SPARC64-VI
which use these for indicating uncorrectable errors in interrupt packets.


223720 02-Jul-2011 marius

Don't waste a delay slot.


223719 02-Jul-2011 marius

- For Cheetah- and Zeus-class CPUs don't flush all unlocked entries from
the TLBs in order to get rid of the user mappings but instead traverse
them an flush only the latter like we also do for the Spitfire-class.
Also flushing the unlocked kernel entries can cause instant faults which
when called from within cpu_switch() are handled with the scheduler lock
held which in turn can cause timeouts on the acquisition of the lock by
other CPUs. This was easily seen with a 16-core V890 but occasionally
also happened with 2-way machines.
While at it, move the SPARC64-V support code entirely to zeus.c. This
causes a little bit of duplication but is less confusing than partially
using Cheetah-class bits for these.
- For SPARC64-V ensure that 4-Mbyte page entries are stored in the 1024-
entry, 2-way set associative TLB.
- In {d,i}tlb_get_data_sun4u() turn off the interrupts in order to ensure
that ASI_{D,I}TLB_DATA_ACCESS_REG actually are read twice back-to-back.

Tested by: Peter Jeremy (16-core US-IV), Michael Moll (2-way SPARC64-V)


223718 02-Jul-2011 marius

Using .comm to declare intrnames and eintrnames causes binutils 2.17.50 to
merge the two.


223692 30-Jun-2011 jonathan

Add some checks to ensure that Capsicum is behaving correctly, and add some
more explicit comments about what's going on and what future maintainers
need to do when e.g. adding a new operation to a sys_machdep.c.

Approved by: mentor(rwatson), re(bz)


223377 21-Jun-2011 marius

On machines where we don't need to lock the kernel TSB into the dTLB and
thus may basically use the entire 64-bit kernel address space increase
the kernel virtual memory to not be limited by VM_KMEM_SIZE_MAX.


223347 20-Jun-2011 marius

As astopgap minimize the sched_lock coverage in pmap_activate() in order
to reduce lock contention.


223346 20-Jun-2011 marius

- Remove MD usage of pc_cpumask and pc_other_cpus. [1]
- Remove CTASSERTs which no longer need to hold since r222813.

Submitted by: attilio [1]


223235 18-Jun-2011 marius

- As with stray vector interrupts limit the reporting of stray level
interrupts. Bringup on additional machine models repeatedly reveals
firmware that enables interrupts behind our back, causing the console
to be flooded otherwise.
- As with the regular interrupt counters using uint16_t instead of
u_long for counting the stray vector interrupts should be more than
sufficient.
- Cache the interrupt vector in intr_stray_vector().


222840 07-Jun-2011 marius

- For the case when tl1_align(_trap) is used to call rsf_fatal via
RSF_FATAL we need to switch to alternate globals for KSTACK_CHECK just
like tl1_data_excptn(_trap) does. This is more or less cosmetic because
in case RSF_FATAL is called we're already heading south.
- Correct an END().
- Read the window state from the correct register for a CATR().


222828 07-Jun-2011 marius

Adapt CATR() to r222813. This is somewhat tricky as we can't afford using
more than three temporary register in several places CATR() is used so
this code trades instructions in for registers. Actually, this still isn't
sufficient and CATR() has the side-effect of clobbering %y. Luckily, with
the current uses of CATR() this either doesn't matter or we are able to
(save and) restore it.
Now that there's only one use of AND() and TEST() left inline these.


222827 07-Jun-2011 marius

Fix a problem with r222813; given that we may only operate on interrupt
globals here but clobber %y save and restore the latter.


222813 07-Jun-2011 attilio

etire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle with cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are target to be removed soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu datas is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernland members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additively note that no MAXCPU is bumped in this patch, but
private testing has been done until to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revision of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


222531 31-May-2011 nwhitehorn

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


221958 15-May-2011 marius

Recognize the eeprom device found in Fujitsu PRIMEPOWER650 and 900.


221869 14-May-2011 attilio

Disconnect sun4v architecture from the three.

Some files keep the SUN4V tags as a code reference, for the future,
if any rewamped sun4v support wants to be added again.

Reviewed by: marius
Tested by: sbruno
Approved by: re


220939 22-Apr-2011 marius

Correct spelling in comments.

Submitted by: brucec


220931 21-Apr-2011 marius

- Use the streaming cache unless BUS_DMA_COHERENT is specified. Since
r220375 all drivers enabled in the sparc64 GENERIC should be either
correctly using bus_dmamap_sync(9) calls or supply BUS_DMA_COHERENT
when appropriate or as a workaround for missing bus_dmamap_sync(9)
calls (sound(4) drivers and partially sym(4)). In at least some
configurations taking advantage of the streaming cache results in
a modest performance improvement.
- Remove the memory barrier for BUS_DMASYNC_PREREAD which as the
comment already suggested is bogus.
- Add my copyright for having implemented several things like support
for the Fire and Oberon IOMMUs, taking over PROM IOMMU mappings etc.


219782 19-Mar-2011 marius

On Serengeti-class machines the OFW root isn't the parent of the CPU
nodes.


219608 13-Mar-2011 marius

Remove the advertising clause from the UCB license according to the
July 22, 1999 addendum.


219567 12-Mar-2011 marius

Sync licenses and the corresponding RCS IDs with NetBSD, mainly switching
the licenses of Matthew R. Green and the TNF to 2-clause.

Obtained from: NetBSD


219533 11-Mar-2011 marius

- Add support for TLS relocations.
- Emitt an error when encountering an unsupported and in case of the
kernel also for unaligned relocations.
- Fix R_SPARC_LOX10 relocations. Apparently these are hardly ever used.


219532 11-Mar-2011 marius

- Remove clause 3 and 4 from TNF licenses. [1]
- Add the _RF_X committed in r212998 also to the tables in the sparc64
reloc.c in order reduce differences between the kernel and the userland
source. This results in no functional change though.
- Fix further inconsistencies in the abbreviations of the names of the
relocations.
- Further whitespace fixes.

Obtained from: NetBSD [1]


219531 11-Mar-2011 marius

Revert the binutils workaround committed in r219340, the underlying
problem has been fixed in r219530.


219523 11-Mar-2011 mdf

Mostly revert r219468, as I had misremembered the C standard regarding
the size of an extern array.

Keep one change from strncpy to strlcpy.


219468 10-Mar-2011 mdf

Use MAXPATHLEN rather than the size of an extern array when copying the
kernel name. Also consistenly use strlcpy().

Suggested by: Warner Losh


219405 08-Mar-2011 dchagin

Extend struct sysvec with new method sv_schedtail, which is used for an
explicit process at fork trampoline path instead of eventhadler(schedtail)
invocation for each child process.

Remove eventhandler(schedtail) code and change linux ABI to use newly added
sysvec method.

While here replace explicit comparing of module sysentvec structure with the
newly created process sysentvec to detect the linux ABI.

Discussed with: kib

MFC after: 2 Week


219340 06-Mar-2011 marius

- With the addition of TLS support binutils started to make the addend
values for resolved symbols relative to relocbase instead of sections
so detect this case and handle as appropriate, which allows using
kernel modules linked with affected versions of binutils. Actually I
think this is a bug in binutils but given that apparently nobody
complained for nearly six years and powerpc has basically the same
workaround I decided to put it in for the sparc64 kernel, too.
- Fix R_SPARC_HIX22 relocations. Apparently these are hardly ever used.


219339 06-Mar-2011 marius

- Consistently abbreviate the names of the relocations.
- End sentences with dots.
- Fix whitespace.


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218468 08-Feb-2011 marius

Set td_kstack_pages for thread0.


218457 08-Feb-2011 marius

Take advantage of accessing the kernel TSB via ASI_ATOMIC_QUAD_LDD_PHYS
on SPARC64-V, too. Tested by: Michael Moll


218195 02-Feb-2011 mdf

Put the general logic for being a CPU hog into a new function
should_yield(). Use this in various places. Encapsulate the common
case of check-and-yield into a new function maybe_yield().

Change several checks for a magic number of iterations to use
should_yield() instead.

MFC after: 1 week


217688 21-Jan-2011 pluknet

Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.

Submitted by: perryh pluto.rain.com (previous version)
Reviewed by: jhb
Approved by: kib (mentor)
Tested by: universe


217561 18-Jan-2011 kib

For architectures not using direct map , and requiring real KVA page for
sf buf allocation, use wakeup() instead of wakeup_one() to notify sf
buffer waiters about free buffer.

sf_buf_alloc() calls msleep(PCATCH) when SFB_CATCH flag was given,
and for simultaneous wakeup and signal delivery, msleep() returns
EINTR/ERESTART despite the thread was selected for wakeup_one(). As
result, we loose a wakeup, and some other waiter will not be woken up.

Reported and tested by: az
Reviewed by: alc, jhb
MFC after: 1 week


217519 17-Jan-2011 jkim

Remove empty dev_mem_md_init() stubs.


217514 17-Jan-2011 marius

In order to save instructions the MMU trap handlers assumed that the kernel
TSB is located within the 32-bit address space, which held true as long as
we were using virtual addresses magic-mapped before the location of the
kernel for addressing it. However, with r216803 in place when possible we
address it via its physical address instead, which on machines like Sun Fire
V880 have no physical memory in the 32-bit address space at all requires
to use 64-bit addressing. When using physical addressing it still should
be safe to assume that we can just ignore the lowest 10 bits of the address
as a minor optimization as we did before r216803.


217265 11-Jan-2011 jhb

Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by: bde


217058 06-Jan-2011 marius

Remove an unused variable accidentally added in r216803.


216961 04-Jan-2011 marius

Reserve INTR_MD[1-4] similarly to what BUS_DMA_BUS[1-4] are intended for
and switch sparc64 to use the first one for bus error filter handlers of
bridge drivers instead of (ab)using INTR_FAST for that so we eventually
can get rid of the latter.

Reviewed by: jhb
MFC after: 1 month


216891 02-Jan-2011 marius

Extend the section in which interrupts are disabled in the TLB demap
functions, otherwise if we get preempted after checking whether a certain
pmap is active on the current CPU but before disabling interrupts we might
operate on an outdated state as the pmap might have been deactivated in
the meantime. As the same issue may arises when the TLB demap function is
interrupted by a TLB demap IPI, just entering a critical section before
the check isn't sufficient so we have to fully disable interrupts instead.

MFC after: 3 days


216803 29-Dec-2010 marius

On UltraSPARC-III+ and greater take advantage of ASI_ATOMIC_QUAD_LDD_PHYS,
which takes an physical address instead of an virtual one, for loading TTEs
of the kernel TSB so we no longer need to lock the kernel TSB into the dTLB,
which only has a very limited number of lockable dTLB slots. The net result
is that we now basically can handle a kernel TSB of any size and no longer
need to limit the kernel address space based on the number of dTLB slots
available for locked entries. Consequently, other parts of the trap handlers
now also only access the the kernel TSB via its physical address in order
to avoid nested traps, as does the PMAP bootstrap code as we haven't taken
over the trap table at that point, yet. Apart from that the kernel TSB now
is accessed via a direct mapping when we are otherwise taking advantage of
ASI_ATOMIC_QUAD_LDD_PHYS so no further code changes are needed. Most of this
is implemented by extending the patching of the TSB addresses and mask as
well as the ASIs used to load it into the trap table so the runtime overhead
of this change is rather low. Currently the use of ASI_ATOMIC_QUAD_LDD_PHYS
is not yet enabled on SPARC64 CPUs due to lack of testing and due to the
fact it might require minor adjustments there.
Theoretically it should be possible to use the same approach also for the
user TSB, which already is not locked into the dTLB, avoiding nested traps.
However, for reasons I don't understand yet OpenSolaris only does that with
SPARC64 CPUs. On the other hand I think that also addressing the user TSB
physically and thus avoiding nested traps would get us closer to sharing
this code with sun4v, which only supports trap level 0 and 1, so eventually
we could have a single kernel which runs on both sun4u and sun4v (as does
Linux and OpenBSD).

Developed at and committed from: 27C3


216802 29-Dec-2010 marius

- Move the macros for generating load and store instructions to asmacros.h
so they can be shared by different source files and extend them by a
variant for atomic compare and swap.
- Consistently use EMPTY.


216628 21-Dec-2010 marius

Extend the hack of r182730 to trick GAS/GCC into compiling access to
STICK/STICK_COMPARE independently of the selected instruction set by
TICK_COMPARE so tick.c as of r214358 once again can be compiled with
gcc -mcpu=v9 for reference purposes.


214879 06-Nov-2010 marius

Implement pmap_is_prefaultable().

Reviewed by: alc (with bugfix)


214835 05-Nov-2010 jhb

Adjust the order of operations in spinlock_enter() and spinlock_exit() to
work properly with single-stepping in a kernel debugger. Specifically,
these routines have always disabled interrupts before increasing the nesting
count and restored the prior state of interrupts after decreasing the nesting
count to avoid problems with a nested interrupt not disabling interrupts
when acquiring a spin lock. However, trap interrupts for single-stepping
can still occur even when interrupts are disabled. Now the saved state of
interrupts is not saved in the thread until after interrupts have been
disabled and the nesting count has been increased. Similarly, the saved
state from the thread cannot be read once the nesting count has been
decreased to zero. To fix this, use temporary variables to store interrupt
state and shuffle it between the thread's MD area and the appropriate
registers.

In cooperation with: bde
MFC after: 1 month


214528 29-Oct-2010 marius

- When resetting pm_active and pm_context of a pmap in pmap_pinit() we
need locking as otherwise we may race against the other parts of the
MD code which expects a consistent state of these. While at it move
the resetting of the pmap before entering it in the TSB.
- Spell a 0 as TLB_CTX_KERNEL.


214358 25-Oct-2010 marius

- Given that in one-shot mode tick_et_start() also is called frequently
introduce function pointers once set up to the respective implementation
for reading the (S)TICK and writing the (S)STICK_COMPARE registers as a
compromise between duplicating code and selecting between different
implementations during execution over and over again, similar to what is
done elsewhere in the MD in order to support different CPU models that
won't ever change at runtime.
- In the remaining tick interrupt handler further push down disabling of
interrupts to the periodic case as it isn't necessary here in one-shot
mode at all.


214071 19-Oct-2010 marius

- Wrap exchanging td_intr_frame and calling the event timer callback in
a critical section as apparently required by both. I don't think either
belongs in the event timer front-ends but the callback should handle
this as necessary instead just like for example intr_event_handle()
does but this is how the other architectures currently handle it, either
explicitly or implicitly.
- Further rename and reword references to hardclock as this front-end no
longer has a notion of actually calling it.


213985 17-Oct-2010 marius

- In oneshot-mode it doesn't make sense to try to compensate the clock
drift in order to achieve a more stable clock as the tick intervals may
vary in the first place. In fact I haven't seen this code kick in when
in oneshot-mode so just skip it in that case.
- There's no need to explicitly stop the (S)TICK counter in oneshot-mode
with every tick as it just won't trigger again with the (S)TICK compare
register set to a value in the past (with a wrap-around once every ~195
years of uptime at 1.5 GHz this isn't something we have to worry about
in practice).
- Given that we'll disable interrupts completely anyway there's no
need to enter critical sections.


213873 14-Oct-2010 marius

Explicitly lower the PIL to 0 as part of enabling interrupts, similar to
what is done on other platforms. Unlike as with the sched_throw(NULL)
called on BSPs during their startup apparently there's nothing which will
reliably lower it on APs. I'm unsure why this only came up on V215 though,
breaking these with r207248. My best guess is that these are the only
supported ones so far fast enough to loose some race.

PR: 151404
MFC after: 3 days


213868 14-Oct-2010 marius

- In the spirit of r212559 add a comment describing what will eventually
lower the PIL.
- Just as with the AP ensure that the (S)TICK timer(s) are in a known
state when starting BSPs.


213282 29-Sep-2010 neel

Fix bogus error message from bus_dmamem_alloc() about incorrect alignment.

The check for alignment should be made against the physical address and not
the virtual address that maps it.

Sponsored by: NetApp
Submitted by: Will McGovern (will at netapp dot com)
Reviewed by: mjacob, jhb


213104 24-Sep-2010 marius

minor simplifications and cosmetics


212998 22-Sep-2010 kib

For sparc64 relocations that directly put bits of the symbol value into
the location, apply elf_relocaddr to the symbol value to have right
values for the symbols from dpcpu segment.

PR: kern/147769
Discussed with: avg
Tested by: marius
MFC after: 2 weeks


212730 16-Sep-2010 marius

Remove accidentally committed test code which effectively prevented the
use of the SPARC64 V VIS-based block copy function added in r212709.
Reported by: Michael Moll


212709 15-Sep-2010 marius

Add a VIS-based block copy function for SPARC64 V and later, which
additionally takes advantage of the prefetch cache of these CPUs.
Unlike the uncommitted US-III version, which provide no measurable
speedup or even resulted in a slight slowdown on certain CPUs models
compared to using the US-I version with these, the SPARC64 version
actually results in a slight improvement.


212676 15-Sep-2010 marius

Sync with other platforms:
- make dflt_lock() always panic,
- add kludge to use contigmalloc() when the alignment is larger than the size
and print a diagnostic when we didn't satisfy the alignment.


212663 15-Sep-2010 marius

- Update the comment in swi_vm() regarding busdma bounce buffers; it's
unlikely that support for these ever will be implemented on sparc64 as
the IOMMUs are able to translate to up to the maximum physical address
supported by the respective machine, bypassing the IOMMU is affected
by hardware errata and being able to support DMA engines which cannot
do at least 32-bit DMA does not justify the costs.
- The page zeroing in uma_small_alloc() may use the VIS-based block zero
function so take advantage of it.


212620 14-Sep-2010 marius

Remove a KASSERT which will also trigger for perfectly valid combinations
of small maxsize and "large" (including BUS_SPACE_UNRESTRICTED) nsegments
parameters. Generally using a presz of 0 (which indeed might indicate the
use of bogus parameters for DMA tag creation) is not fatal, it just means
that no additional DVMA space will be preallocated.


212619 14-Sep-2010 marius

Remove redundant raising of the PIL to PIL_TICK as the respective locore
code already did that.


212541 13-Sep-2010 mav

Refactor timer management code with priority to one-shot operation mode.
The main goal of this is to generate timer interrupts only when there is
some work to do. When CPU is busy interrupts are generating at full rate
of hz + stathz to fullfill scheduler and timekeeping requirements. But
when CPU is idle, only minimum set of interrupts (down to 8 interrupts per
second per CPU now), needed to handle scheduled callouts is executed.
This allows significantly increase idle CPU sleep time, increasing effect
of static power-saving technologies. Also it should reduce host CPU load
on virtualized systems, when guest system is idle.

There is set of tunables, also available as writable sysctls, allowing to
control wanted event timer subsystem behavior:
kern.eventtimer.timer - allows to choose event timer hardware to use.
On x86 there is up to 4 different kinds of timers. Depending on whether
chosen timer is per-CPU, behavior of other options slightly differs.
kern.eventtimer.periodic - allows to choose periodic and one-shot
operation mode. In periodic mode, current timer hardware taken as the only
source of time for time events. This mode is quite alike to previous kernel
behavior. One-shot mode instead uses currently selected time counter
hardware to schedule all needed events one by one and program timer to
generate interrupt exactly in specified time. Default value depends of
chosen timer capabilities, but one-shot mode is preferred, until other is
forced by user or hardware.
kern.eventtimer.singlemul - in periodic mode specifies how much times
higher timer frequency should be, to not strictly alias hardclock() and
statclock() events. Default values are 2 and 4, but could be reduced to 1
if extra interrupts are unwanted.
kern.eventtimer.idletick - makes each CPU to receive every timer interrupt
independently of whether they busy or not. By default this options is
disabled. If chosen timer is per-CPU and runs in periodic mode, this option
has no effect - all interrupts are generating.

As soon as this patch modifies cpu_idle() on some platforms, I have also
refactored one on x86. Now it makes use of MONITOR/MWAIT instrunctions
(if supported) under high sleep/wakeup rate, as fast alternative to other
methods. It allows SMP scheduler to wake up sleeping CPUs much faster
without using IPI, significantly increasing performance on some highly
task-switching loads.

Tested by: many (on i386, amd64, sparc64 and powerc)
H/W donated by: Gheorghe Ardelean
Sponsored by: iXsystems, Inc.


212456 11-Sep-2010 mav

Sparc64 uses dummy cpu_idle() method. It's CPUs never sleeping. Tell
scheduler that it doesn't need to use IPI to "wake up" CPU.


212413 10-Sep-2010 avg

bus_add_child: change type of order parameter to u_int

This reflects actual type used to store and compare child device orders.
Change is mostly done via a Coccinelle (soon to be devel/coccinelle)
semantic patch.
Verified by LINT+modules kernel builds.

Followup to: r212213
MFC after: 10 days


211568 21-Aug-2010 marius

Skip a KASSERT which isn't appropriate when not employing page coloring.
Reported by: Michael Moll


211515 19-Aug-2010 jhb

Remove unused KTRACE includes.


211197 11-Aug-2010 jhb

Update various places that store or manipulate CPU masks to use cpumask_t
instead of int or u_int. Since cpumask_t is currently u_int on all
platforms this should just be a cosmetic change.


211073 08-Aug-2010 marius

Wrap some sun4u-only symbols.


211071 08-Aug-2010 marius

- As it is not possible for sched_bind(9) to context switch with
td_critnest > 1 when not already running on the desired CPU read the
TICK counter of the BSP via a direct cross trap request in that case
instead.
- Treat the STICK based timecounter the same way as the TICK based one
regarding its quality and obtaining the counter value from the BSP.
Like the TICK timers the STICK ones also are only synchronized during
their startup (which might not result in good synchronicity in the
first place) but not afterwards and might drift over time, causing
problems when the time is read from different CPUs (see r135972).


211050 08-Aug-2010 marius

- Introduce a cpu_ipi_single() function pointer in order to send IPIs
to single CPUs more efficiently with Cheetah(-class) and Jalapeno CPUs.
Besides being used to implement the ipi_cpu() introduced in r210939,
cpu_ipi_single() will also be used internally by the sparc64 MD code.
- Factor out the Jalapeno support from the Cheetah IPI send functions
in order to be able to more easily and efficiently implement support
for more than 32 target CPUs as well as a workaround for Cheetah+
erratum 25 for the latter.


211049 08-Aug-2010 marius

For CPUs which ignore TD_CV and support hardware unaliasing don't
bother doing page coloring. This results in a small but measurable
performance improvement in buildworld times.


210601 29-Jul-2010 mav

Adapt sparc64 and sun4v timer code for the new event timers infrastructure.

Reviewed by: marius@


210334 21-Jul-2010 attilio

KTR_CTx are long time aliased by existing classes so they can't serve
their purpose anymore. Axe them out.

Sponsored by: Sandvine Incorporated
Discussed with: jhb, emaste
Possible MFC: TBD


210176 16-Jul-2010 mav

Allocate proper ammount of memory for interrupt names on sparc64 and
sun4v, same as done on other architectures. This removes garbage from
`vmstat -ia` output.

Reviewed by: marius@


209613 30-Jun-2010 jhb

Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to
<sys/syscallsubr.h> where all other kern_<syscall> prototypes live.


209138 13-Jun-2010 marius

Update a branch missed in r207537.

MFC after: 3 days


209048 11-Jun-2010 alc

Relax one of the new assertions in pmap_enter() a little. Specifically,
allow pmap_enter() to be performed on an unmanaged page that doesn't have
VPO_BUSY set. Having VPO_BUSY set really only matters for managed pages.
(See, for example, pmap_remove_write().)


208990 10-Jun-2010 alc

Reduce the scope of the page queues lock and the number of
PG_REFERENCED changes in vm_pageout_object_deactivate_pages().
Simplify this function's inner loop using TAILQ_FOREACH(), and shorten
some of its overly long lines. Update a stale comment.

Assert that PG_REFERENCED may be cleared only if the object containing
the page is locked. Add a comment documenting this.

Assert that a caller to vm_page_requeue() holds the page queues lock,
and assert that the page is on a page queue.

Push down the page queues lock into pmap_ts_referenced() and
pmap_page_exists_quick(). (As of now, there are no longer any pmap
functions that expect to be called with the page queues lock held.)

Neither pmap_ts_referenced() nor pmap_page_exists_quick() should ever
be passed an unmanaged page. Assert this rather than returning "0"
and "FALSE" respectively.

ARM:

Simplify pmap_page_exists_quick() by switching to TAILQ_FOREACH().

Push down the page queues lock inside of pmap_clearbit(), simplifying
pmap_clear_modify(), pmap_clear_reference(), and pmap_remove_write().
Additionally, this allows for avoiding the acquisition of the page
queues lock in some cases.

PowerPC/AIM:

moea*_page_exits_quick() and moea*_page_wired_mappings() will never be
called before pmap initialization is complete. Therefore, the check
for moea_initialized can be eliminated.

Push down the page queues lock inside of moea*_clear_bit(),
simplifying moea*_clear_modify() and moea*_clear_reference().

The last parameter to moea*_clear_bit() is never used. Eliminate it.

PowerPC/BookE:

Simplify mmu_booke_page_exists_quick()'s control flow.

Reviewed by: kib@


208846 05-Jun-2010 alc

Don't set PG_WRITEABLE in pmap_enter() unless the page is managed.

Correct a typo in a nearby comment on sparc64.


208574 26-May-2010 alc

Push down page queues lock acquisition in pmap_enter_object() and
pmap_is_referenced(). Eliminate the corresponding page queues lock
acquisitions from vm_map_pmap_enter() and mincore(), respectively. In
mincore(), this allows some additional cases to complete without ever
acquiring the page queues lock.

Assert that the page is managed in pmap_is_referenced().

On powerpc/aim, push down the page queues lock acquisition from
moea*_is_modified() and moea*_is_referenced() into moea*_query_bit().
Again, this will allow some additional cases to complete without ever
acquiring the page queues lock.

Reorder a few statements in vm_page_dontneed() so that a race can't lead
to an old reference persisting. This scenario is described in detail by a
comment.

Correct a spelling error in vm_page_dontneed().

Assert that the object is locked in vm_page_clear_dirty(), and restrict the
page queues lock assertion to just those cases in which the page is
currently writeable.

Add object locking to vnode_pager_generic_putpages(). This was the one
and only place where vm_page_clear_dirty() was being called without the
object being locked.

Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call
to vm_page_clear_dirty().

Change vnode_pager_generic_putpages() to the modern-style of function
definition. Also, change the name of one of the parameters to follow
virtual memory system naming conventions.

Reviewed by: kib


208504 24-May-2010 alc

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208349 20-May-2010 marius

Change ad_firmware_geom_adjust() to operate on a struct disk * only and
hook it up to ada(4) also. While at it, rename *ad_firmware_geom_adjust()
to *ata_disk_firmware_geom_adjust() etc now that these are no longer
limited to ad(4).

Reviewed by: mav
MFC after: 3 days


208175 16-May-2010 alc

On entry to pmap_enter(), assert that the page is busy. While I'm
here, make the style of assertion used by pmap_enter() consistent
across all architectures.

On entry to pmap_remove_write(), assert that the page is neither
unmanaged nor fictitious, since we cannot remove write access to
either kind of page.

With the push down of the page queues lock, pmap_remove_write() cannot
condition its behavior on the state of the PG_WRITEABLE flag if the
page is busy. Assert that the object containing the page is locked.
This allows us to know that the page will neither become busy nor will
PG_WRITEABLE be set on it while pmap_remove_write() is running.

Correct a long-standing bug in vm_page_cowsetup(). We cannot possibly
do copy-on-write-based zero-copy transmit on unmanaged or fictitious
pages, so don't even try. Previously, the call to pmap_remove_write()
would have failed silently.


207796 08-May-2010 alc

Push down the page queues into vm_page_cache(), vm_page_try_to_cache(), and
vm_page_try_to_free(). Consequently, push down the page queues lock into
pmap_enter_quick(), pmap_page_wired_mapped(), pmap_remove_all(), and
pmap_remove_write().

Push down the page queues lock into Xen's pmap_page_is_mapped(). (I
overlooked the Xen pmap in r207702.)

Switch to a per-processor counter for the total number of pages cached.


207702 06-May-2010 alc

Push down the page queues lock inside of vm_page_free_toq() and
pmap_page_is_mapped() in preparation for removing page queues locking
around calls to vm_page_free(). Setting aside the assertion that calls
pmap_page_is_mapped(), vm_page_free_toq() now acquires and holds the page
queues lock just long enough to actually add or remove the page from the
paging queues.

Update vm_page_unhold() to reflect the above change.


207649 05-May-2010 alc

Use an OBJT_PHYS object and thus PG_UNMANAGED pages to implement the TSB.
The TSB is not a pageable structure, so there is no point in using managed
pages.

Reviewed by: kib


207537 02-May-2010 marius

Add support for SPARC64 V (and where it already makes sense for other
HAL/Fujitsu) CPUs. For the most part this consists of fleshing out the
MMU and cache handling, it doesn't add pmap optimizations possible with
these CPU, yet, though.
With these changes FreeBSD runs stable on Fujitsu Siemens PRIMEPOWER 250
and likely also other models based on SPARC64 V like 450, 650 and 850.
Thanks go to Michael Moll for providing access to a PRIMEPOWER 250.


207500 02-May-2010 marius

Add a hack for SPARC64 V CPUs, which set some undocumented bits in the
first data word.


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207373 29-Apr-2010 alc

MFamd64/i386 r207205
Clearing a page table entry's accessed bit and setting the page's
PG_REFERENCED flag in pmap_protect() can't really be justified, so
don't do it. Moreover, on ia64, don't set the page's dirty field
unless pmap_protect() is removing write access.


207248 26-Apr-2010 marius

Don't bother enabling interrupts before we're ready to handle them. This
prevents the firmware of Fujitsu Siemens PRIMEPOWER250, which both causes
stray interrupts and erroneously enables interrupts at least when calling
SUNW,set-trap-table, in the foot.


207243 26-Apr-2010 marius

Add OF_getscsinitid(), a helper similar to OF_getetheraddr() but for
obtaining the initiator ID to be used for SPI controllers from the
Open Firmware device tree.


207240 26-Apr-2010 marius

Skip the pseudo-devices found in Fujitsu Siemens PRIMEPOWER250.


207155 24-Apr-2010 alc

Resurrect pmap_is_referenced() and use it in mincore(). Essentially,
pmap_ts_referenced() is not always appropriate for checking whether or
not pages have been referenced because it clears any reference bits
that it encounters. For example, in mincore(), clearing the reference
bits has two negative consequences. First, it throws off the activity
count calculations performed by the page daemon. Specifically, a page
on which mincore() has called pmap_ts_referenced() looks less active
to the page daemon than it should. Consequently, the page could be
deactivated prematurely by the page daemon. Arguably, this problem
could be fixed by having mincore() duplicate the activity count
calculation on the page. However, there is a second problem for which
that is not a solution. In order to clear a reference on a 4KB page,
it may be necessary to demote a 2/4MB page mapping. Thus, a mincore()
by one process can have the side effect of demoting a superpage
mapping within another process!


206449 10-Apr-2010 marius

Unlike the sun4v variant, the sun4u version of SUNW,set-trap-table
actually only takes one argument.


206448 10-Apr-2010 marius

Do as the comment suggests and determine the bus space based on the last
bus we actually mapped at rather than always based on the last bus we
encountered while moving upward in the tree. Otherwise we might use the
wrong bus space in case the bridge directly underneath the nexus doesn't
require mapping, i.e. was skipped as it's the case for ssm(4) nodes.


206086 02-Apr-2010 marius

- Try do deal gracefully with correctable ECC errors.
- Improve the reporting of unhandled kernel and user traps.


205642 25-Mar-2010 nwhitehorn

Change the arguments of exec_setregs() so that it receives a pointer
to the image_params struct instead of several members of that struct
individually. This makes it easier to expand its arguments in the future
without touching all platforms.

Reviewed by: jhb


205409 21-Mar-2010 marius

- The firmware of Sun Fire V1280 has a misfeature of setting %wstate to
7 which corresponds to WSTATE_KMIX in OpenSolaris whenever calling into
it which totally screws us even when restoring %wstate afterwards as
spill/fill traps can happen while in OFW. The rather hackish OpenBSD
approach of just setting the equivalent of WSTATE_KERNEL to 7 also is
no option as we treat %wstate as a bit field. So in order to deal with
this problem actually implement spill/fill handlers for %wstate 7 which
just act as the WSTATE_KERNEL ones except of theoretically also handling
32-bit, turn off interrupts completely so we don't even take IPIs while
in OFW which should ensure we only take spill/fill traps at most and
restore %wstate after calling into OFW once we have taken over the trap
table. While at it, actually set WSTATE_{,PROM}_KMIX before calling into
OFW just like OpenSolaris does, which should at least help testing this
change on non-V1280.
- Remove comments referring to the %wstate usage in BSD/OS.
- Remove the no longer used RSF_ALIGN_RETRY macro.
- Correct some trap table addresses in comments.
- Ensure %wstate is set to WSTATE_KERNEL when taking over the trap table.
- Ensure PSTATE_AM is off when entering or exiting to OFW as well as that
interrupts are also completely off when exiting to OFW as the firmware
trap table shouldn't be used to handle our interrupts.


205399 20-Mar-2010 marius

Improve the KVA space sizing of 186682; on machines with large dTLBs we
can actually use all of the available lockable entries of the tiny dTLB
for the kernel TSB. With this change the KVA space sizing happens to be
more in line with the MI one so up to at least 24GB machines KVA doesn't
need to be limited manually. This is just another stopgap though, the
real solution is to take advantage of ASI_ATOMIC_QUAD_LDD_PHYS on CPUs
providing it so we don't need to lock the kernel TSB pages into the dTLB
in the first place.


205269 17-Mar-2010 marius

o Add support for UltraSparc-IV+:
- Swap the configuration of the first and second large dTLB as with
US-IV+ these can only hold entries of certain page sizes each, which
we happened to chose the non-working way around.
- Additionally ensure that the large iTLB is set up to hold 8k pages
(currently this happens to be a NOP though).
- Add a workaround for US-IV+ erratum #2.
- Turn off dTLB parity error reporting as otherwise we get seemingly
false positives when copying in the user window by simulating a
fill trap on return to usermode. Given that these parity errors can
be avoided by disabling multi issue mode and the problem could be
reproduced with a second machine this appears to be a silicon bug of
some sort.
- Add a membar #Sync also before the stores to ASI_DCACHE_TAG. While
at it, turn of interrupts across the whole cheetah_cache_flush() for
simplicity instead of around every flush. This should have next to no
impact as for cheetah-class machines we typically only need to flush
the caches a few times during boot when recovering from peeking/poking
non-existent PCI devices, if at all.
- Just use KERNBASE for FLUSH as we also do elsewhere as the US-IV+
documentation doesn't seem to mention that these CPUs also ignore the
address like previous cheetah-class CPUs do. Again the code changing
LSU_IC is executed seldom enough that the negligible optimization of
using %g0 instead should have no real impact.

With these changes FreeBSD runs stable on V890 equipped with US-IV+
and -j128 buildworlds in a loop for days are no problem. Unfortunately,
the performance isn't were it should be as a buildworld on a 4x1.5GHz
US-IV+ V890 takes nearly 3h while on a V440 with (theoretically) less
powerfull 4x1.5GHz US-IIIi it takes just over 1h. It's unclear whether
this is related to the supposed silicon bug mentioned above or due to
another issue. The documentation (which contains a sever bug in the
description of the bits added to the context registers though) at least
doesn't mention any requirements for changes in the CPU handling besides
those implemented and the cache as well as the TLB configurations and
handling look fine.
o Re-arrange cheetah_init() so it's easier to add support for SPARC64
V up to VIIIfx CPUs, which only require parts of this initialization.


205258 17-Mar-2010 marius

- Add TTE and context register bits for the additional page sizes supported
by UltraSparc-IV and -IV+ as well as SPARC64 V, VI, VII and VIIIfx CPUs.
- Replace TLB_PCXR_PGSZ_MASK and TLB_SCXR_PGSZ_MASK with TLB_CXR_PGSZ_MASK
which just is the complement of TLB_CXR_CTX_MASK instead of trying to
assemble it from the page size bits which vary across CPUs.
- Add macros for the remainder of the SFSR bits, which are useful for at
least debugging purposes.


204153 20-Feb-2010 marius

Starting with UltraSPARC IV CPUs the CPU caches are described with different
OFW properties.


204152 20-Feb-2010 marius

Some machines can not only consist of CPUs running at different speeds
but also of different types, f.e. Sun Fire V890 can be equipped with a
mix of UltraSPARC IV and IV+ CPUs, requiring different MMU initialization
and different workarounds for model specific errata. Therefore move the
CPU implementation number from a global variable to the per-CPU data.
Functions which are called before the latter is available are passed the
implementation number as a parameter now.


203845 13-Feb-2010 marius

Add ssm(4), which serves as a glue device allowing devices beneath the
scalable shared memory node, which is used in large UltraSPARC III based
machines to group snooping-coherency domains together, like schizo(4) to
be treated like nexus(4) children.


203844 13-Feb-2010 marius

- Add the 'cmp' and 'core' pseudo-busses which are used to group CPU cores
to the exclusion lists as the CPU nodes aren't handled as regular devices
either. Also add the pseudo-devices found in Sun Fire V1280.
- Allow nexus_attach() and nexus_alloc_resource() to be used by drivers
derived from nexus(4) for subordinate busses.
- Don't add the zero-sized memory resources of glue devices to the resource
lists.


203839 13-Feb-2010 marius

Style fixes


203838 13-Feb-2010 marius

- Search the whole OFW device tree instead of only the children of the
root nexus device for the CPUs as starting with UltraSPARC IV the 'cpu'
nodes hang off of from 'cmp' (chip multi-threading processor) or 'core'
or combinations thereof. Also in large UltraSPARC III based machines
the 'cpu' nodes hang off of 'ssm' (scalable shared memory) nodes which
group snooping-coherency domains together instead of directly from the
nexus.
It would be great if we could use newbus to deal with the different ways
the 'cpu' devices can hang off of pseudo ones but unfortunately both
cpu_mp_setmaxid() and sparc64_init() have to work prior to regular device
probing.
- Add support for UltraSPARC IV and IV+ CPUs. Due to the fact that these
are multi-core each CPU has two Fireplane config registers and thus the
module/target ID has to be determined differently so the one specific
to a certain core is used. Similarly, starting with UltraSPARC IV the
individual cores use a different property in the OFW device tree to
indicate the CPU/core ID as it no longer is in coincidence with the
shared slot/socket ID.
This involves changing the MD KTR code to not directly read the UPA
module ID either. We use the MID stored in the per-CPU data instead of
calling cpu_get_mid() as a replacement in order prevent clobbering any
registers as side-effect in the assembler version. This requires CATR()
invocations from mp_startup() prior to mapping the per-CPU pages to be
removed though.
While at it additionally distinguish between CPUs with Fireplane and
JBus interconnects as these also use slightly different sizes for the
JBus/agent/module/target IDs.
- Make sparc64_shutdown_final() static as it's not used outside of
machdep.c.


203833 13-Feb-2010 marius

- At least the trap table of the Sun Fire V1280 firmware apparently has
no cleanwindows handler so just remove trying to trigger it from _start
and the AP trampoline code as that leads to a crash there. This should
be okay as leaking data from the OFW via the CPU registers on start of
the kernel should be no real concern.
- Make the comments of _start and the AP trampoline code regarding the
initializations they perform match each other and reality.
- Make the comments of the AP trampoline code regarding iTLB accesses
refer to the right macro.


203185 30-Jan-2010 marius

Implement handling of the third argument of cpu_switch(). This unbreaks
sparc64 after r202889.

PR: 143215
MFC after: 1 week


202900 23-Jan-2010 marius

Merge r202882 from amd64/i386:

For PT_TO_SCE stop that stops the ptraced process upon syscall entry,
syscall arguments are collected before ptracestop() is called. As a
consequence, debugger cannot modify syscall or its arguments.

In syscall(), reread syscall number and arguments after ptracestop(),
if debugger modified anything in the process environment. Since procfs
stopevent requires number of syscall arguments in p_xstat, this cannot
be solved by moving stop/trace point before argument fetching.

Move the code to read arguments into separate function
fetch_syscall_args() to avoid code duplication. Note that ktrace point
for modified syscall is intentionally recorded twice, once with original
arguments, and second time with the arguments set by debugger.

PT_TO_SCX stop is executed after cpu_syscall_set_retval() already.

Reviewed by: kib


201396 02-Jan-2010 marius

- Demapping unused kernel TLB slots has proven to work reliably so move
the associated debugging under bootverbose.
- Remove freebsd4_sigreturn(); given that FreeBSD 4 didn't supported
sparc64 this only ever served as a transition aid prior to FreeBSD
5.0 and is unused by default since COMPAT_FREEBSD4 was removed from
GENERIC in r143072 nearly 5 years ago.


201371 01-Jan-2010 marius

Fix botches in r201005:
- Actually use the newly introduced sc_res in the front-end.
- Remove a whitespace glitch in mk48txx_gettime().


201008 25-Dec-2009 marius

Style changes

Obtained from: NetBSD (mc146818reg.h)


201005 25-Dec-2009 marius

- Take advantage of bus_{read,write}_*(9).
- Set dow = -1 in mk48txx_gettime() because some drivers (for example
the NetBSD and OpenBSD mk48txx(4)) don't set it correctly.


200948 24-Dec-2009 marius

Merge from amd64/i386:
Implement support for interrupt descriptions.


200947 24-Dec-2009 marius

Add missing locking in intr_bind().


200938 24-Dec-2009 marius

- Don't check for a valid interrupt controller on every interrupt
in intr_execute_handlers(). If we managed to get here without an
associated interrupt controller we have way bigger problems.
While at it predict stray vector interrupts as false as they are
rather unlikely.
- Don't blindly call the clear function of an interrupt controller
when adding a handler in inthand_add() as interrupt controllers
like the one driven by upa(4) are auto-clearing and thus provide
NULL instead.


200925 23-Dec-2009 marius

- By re-arranging the code in OF_decode_addr() somewhat and accepting
a bit of a detour we can just iterate through the banks array instead
of having to calculate every offset. This change is inspired by the
powerpc version of this function.
- Add support for the JBus to EBus bridges which hang off of nexus(4).


200924 23-Dec-2009 marius

Style changes.


200923 23-Dec-2009 marius

- Add support for the IOMMUs of Fire JBus to PCIe and Oberon Uranus
to PCIe bridges.
- Add support for talking the PROM mappings over to the kernel IOTSB
just like we do with the kernel TSB in order to allow OFW drivers
to continue to work.
- Change some members, parameters and variables to unsigned where
more appropriate.


200915 23-Dec-2009 marius

Don't probe the bq4802 variant found in Ultra 25 and 45 for now as
this chip isn't MC146818 compatible and requires different handlers
(but which I can't test due to lack of such hardware).


200914 23-Dec-2009 marius

Don't use an out register to hold the vector number across the call
of the interrupt handler in intr_fast() as the handler might clobber
it (no in-tree handler currently does but an upcoming one will).
While at it, tidy the register usage in the interrupt counting code.


200891 23-Dec-2009 marcel

Calculate the average CPU clock frequency and export that through
the hw.freq.cpu sysctl variable. This can be used by ports that
need to know "the" CPU frequency.


200874 22-Dec-2009 marius

Enroll these drivers in multipass probing. The motivation behind this
is that the JBus to EBus bridges share the interrupt controller of a
sibling JBus to PCIe bridge (at least as far as the OFW device tree
is concerned, in reality they are part of the same chip) so we have to
probe and attach the latter first. That happens to be also the case
due to the fact that the JBus to PCIe bridges appear first in the OFW
device tree but it doesn't hurt to ensure the right order.


200815 21-Dec-2009 marius

Provide and consume missing module dependency information.


200272 08-Dec-2009 marius

Add additional checks of the kernel stack addresses in order to
ensure we don't overrun the end of the call chain.

MFC after: 1 week


200215 07-Dec-2009 marius

Add <machine/pcb.h> missed in r199135.


199868 27-Nov-2009 alc

Simplify the invocation of vm_fault(). Specifically, eliminate the flag
VM_FAULT_DIRTY. The information provided by this flag can be trivially
inferred by vm_fault().

Discussed with: kib


199442 17-Nov-2009 marius

Unroll copying of the registers in {g,s}et_mcontext() and limit it
to the set actually restored by tl0_ret() instead of using the whole
trapframe. Additionally skip %g7 as that register is used as the
userland TLS pointer.

PR: 140523
MFC after: 1 week


199135 10-Nov-2009 kib

Extract the code that records syscall results in the frame into MD
function cpu_set_syscall_retval().

Suggested by: marcel
Reviewed by: marcel, davidxu
PowerPC, ARM, ia64 changes: marcel
Sparc64 tested and reviewed by: marius, also sunv reviewed
MIPS tested by: gonzo
MFC after: 1 month


198507 27-Oct-2009 kib

In r197963, a race with thread being selected for signal delivery
while in kernel mode, and later changing signal mask to block the
signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race
exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.

Use kern_sigprocmask() instead of direct manipulation of td_sigmask to
reschedule newly blocked signals, closing the race.

Reviewed by: davidxu
Tested by: pho
MFC after: 1 month


198341 21-Oct-2009 marcel

o Introduce vm_sync_icache() for making the I-cache coherent with
the memory or D-cache, depending on the semantics of the platform.
vm_sync_icache() is basically a wrapper around pmap_sync_icache(),
that translates the vm_map_t argumument to pmap_t.
o Introduce pmap_sync_icache() to all PMAP implementation. For powerpc
it replaces the pmap_page_executable() function, added to solve
the I-cache problem in uiomove_fromphys().
o In proc_rwmem() call vm_sync_icache() when writing to a page that
has execute permissions. This assures that when breakpoints are
written, the I-cache will be coherent and the process will actually
hit the breakpoint.
o This also fixes the Book-E PMAP implementation that was missing
necessary locking while trying to deal with the I-cache coherency
in pmap_enter() (read: mmu_booke_enter_locked).

The key property of this change is that the I-cache is made coherent
*after* writes have been done. Doing it in the PMAP layer when adding
or changing a mapping means that the I-cache is made coherent *before*
any writes happen. The difference is key when the I-cache prefetches.


197729 03-Oct-2009 bz

Make sure that the primary native brandinfo always gets added
first and the native ia32 compat as middle (before other things).
o(ld)brandinfo as well as third party like linux, kfreebsd, etc.
stays on SI_ORDER_ANY coming last.

The reason for this is only to make sure that even in case we would
overflow the MAX_BRANDS sized array, the native FreeBSD brandinfo
would still be there and the system would be operational.

Reviewed by: kib
MFC after: 1 month


197164 13-Sep-2009 marius

Factor out the duplicated macro for the device type used in the
OFW device tree for PCI bridges and add a new one for PCI Express.
While at it, take advantage of the former for the rman(9) work-
around in jbusppm(4).


195840 24-Jul-2009 jhb

Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to
a device pager (OBJT_DEVICE) object in that it uses fictitious pages to
provide aliases to other memory addresses. The primary difference is that
it uses an sglist(9) to determine the physical addresses for a given offset
into the object instead of invoking the d_mmap() method in a device driver.

Reviewed by: alc
Approved by: re (kensmith)
MFC after: 2 weeks


195149 28-Jun-2009 marius

- Work around the broken loader behavior of not demapping no longer
used kernel TLB slots when unloading the kernel or modules, which
results in havoc when loading a kernel and modules which take up
less TLB slots afterwards as the unused but locked ones aren't
accounted for in virtual_avail. Eventually this should be fixed
in the loader which isn't straight forward though and the kernel
should be robust against this anyway. [1]
- Ensure that the addresses allocated directly from phys_avail[] by
pmap_bootstrap_alloc() are always colored properly. This implicit
assumption was broken in r194784 as unlike the other consumers the
DPCPU area allocated for the BSP isn't a multiple of PAGE_SIZE *
DCACHE_COLORS. [2]
- Remove the no longer used global msgbuf_phys.
- Remove the redundant ekva parameter of pmap_bootstrap_alloc().
- Correct some outdated function names in ktr(9) invocations.

Requested by: jhb [1]
Reported by: gavin [2]
Approved by: re (kib)
MFC after: 2 weeks


194858 24-Jun-2009 kib

Unbreak sparc64 after the swap accounting changes: mark kernel_map
entries allocated for translations in pmap_init() as MAP_NOFAULT. This
prevents vm_map_insert from trying to account the entries for swap
usage, that is both wrong and too early to work.

While there, change FALSE to VMFS_NO_SPACE.

Reported and tested by: Florian Smeets <flo at kasimir com>
Reviewed by: marius


194784 23-Jun-2009 jeff

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


191981 10-May-2009 marius

Just like in cpu_halt(), use cpu_shutdown() rather than ofw_exit()
directly in cpu_reset() in order to idle the APs before exiting
the kernel and letting the BSP enter the firmware so that processes
like init(8) which still might be running on an AP at that point
don't cause a panic there when it crashes due to the fact it no
longer can be supported by the kernel.

MFC after: 3 days


191980 10-May-2009 marius

- Fix style.
- Use __FBSDID.


190708 05-Apr-2009 dchagin

Fix KBI breakage by r190520 which affects older linux.ko binaries:

1) Move the new field (brand_note) to the end of the Brandinfo structure.
2) Add a new flag BI_BRAND_NOTE that indicates that the brand_note pointer
is valid.
3) Use the brand_note field if the flag BI_BRAND_NOTE is set and as old
modules won't have the flag set, so the new field brand_note would be
ignored.

Suggested by: jhb
Reviewed by: jhb
Approved by: kib (mentor)
MFC after: 6 days


190161 20-Mar-2009 marius

Revert r190105 so that removing options KDB but DDB or GDB being
available will cause the kernel to not respect -d and boot_kdb=1
for consistency with the other platforms as pointed out by marcel@.


190114 19-Mar-2009 marius

Hook up the generic OFW pnpinfo string method.


190107 19-Mar-2009 marius

- There's no need to wrap kdb_active and kdb_trap() in #ifdef KDB as
they're always available.
- Remove unused variable. [1]
- Add a missing const.
- Sort includes.

Submitted by: Christoph Mallon [1]


190106 19-Mar-2009 marius

- Remove the delay in cpu_mp_shutdown() which is no longer necessary since
we have stopped using SUNW,stop-self with r186395.
- There's no need to wrap kdb_active in #ifdef KDB as it's always available.


190105 19-Mar-2009 marius

There's no need to wrap kdb_enter() in #ifdef KDB as it's always available.


190103 19-Mar-2009 marius

Take advantage of KOBJMETHOD_END.


190101 19-Mar-2009 marius

Take advantage of KOBJMETHOD_END.


190099 19-Mar-2009 marius

- Sort device methods.
- Take advantage of KOBJMETHOD_END.


190098 19-Mar-2009 marius

- Failing to register as interrupt controller during attach shouldn't
be fatal so just inform about this instead of panicing.
- Sort device methods.
- Take advantage of KOBJMETHOD_END.
- Remove some redundant variables.


190003 19-Mar-2009 marius

Add missing const.


189771 13-Mar-2009 dchagin

Implement new way of branding ELF binaries by looking to a
".note.ABI-tag" section.

The search order of a brand is changed, now first of all the
".note.ABI-tag" is looked through.

Move code which fetch osreldate for ELF binary to check_note() handler.

PR: 118473
Approved by: kib (mentor)


188456 10-Feb-2009 marius

Improve r185008 so the streaming cache is only flushed when
a mapping actually met the threshold.


186682 01-Jan-2009 marius

- Currently the PMAP code is laid out to let the kernel TSB cover the
whole KVA space using one locked 4MB dTLB entry per GB of physical
memory. On Cheetah-class machines only the dt16 can hold locked
entries though, which would be completely consumed for the kernel
TSB on machines with >= 16GB. Therefore limit the KVA space to use
no more than half of the lockable dTLB slots, given that we need
them also for other things.
- Add sanity checks which ensure that we don't exhaust the (lockable)
TLB slots.


186395 22-Dec-2008 marius

- According to comments in OpenBSD, E{2,4}50 tend to have fragile
firmware versions which wedge when using the OFW test service,
so given that we don't really depend on SUNW,stop-self just nuke
it altogether instead of risking problems.
- At least Fire V880 have a small hardware glitch which causes the
reception of IDR_NACKs for CPUs we actually haven't tried to send
an IPI to, even not as part of the initial try. According to tests
this apparently can be safely ignored though, so just return if
checking for the individual IDR_NACKs indicates no outstanding
dispatch. Serializing the sending of IPIs between MD and MI code
by the combined usage of smp_ipi_mtx makes no difference to this
phenomenon. [1]
- Provide relevant debugging bits already with the initial panic
in case of problems with the IPI dispatch, which would have
allowed to diagnose the above problem without a specially built
kernel.
- In case of cheetah_ipi_selected() base the delay we wait for
other CPUs which also might want to dispatch IPIs on the total
amount of CPUs instead of just the number of CPUs we let this
CPU send IPIs to because in the worst case all CPUs also want
to IPI us at the same time.

Reported and access for extensive tests provided by: Beat Gaetzi [1]


186347 20-Dec-2008 nwhitehorn

Modularize the Open Firmware client interface to allow run-time switching
of OFW access semantics, in order to allow future support for real-mode
OF access and flattened device frees. OF client interface modules are
implemented using KOBJ, in a similar way to the PPC PMAP modules.

Because we need Open Firmware to be available before mutexes can be used on
sparc64, changes are also included to allow KOBJ to be used very early in
the boot process by only using the mutex once we know it has been initialized.

Reviewed by: marius, grehan


186128 15-Dec-2008 nwhitehorn

Adapt parts of the sparc64 Open Firmware bus enumeration code (in particular,
the code for parsing interrupt maps) to PowerPC and reflect their new MI
status by moving them to the shared dev/ofw directory.

This commit also modifies the OFW PCI enumeration procedure on PowerPC to
allow the bus to find non-firmware-enumerated devices that Apple likes to add,
and adds some useful Open Firmware properties (compat and name) to the pnpinfo
string of children on OFW SBus, EBus, PCI, and MacIO links. Because of the
change to PCI enumeration on PowerPC, X has started working again on PPC
machines with Grackle hostbridges.

Reviewed by: marius
Obtained from: sparc64


185169 22-Nov-2008 kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with: dchagin, imp, jhb, peter


185133 20-Nov-2008 marius

- According to OpenSolaris, CDMA flushing/syncing for Tomatillos
and XMITS has to be basically done in the same manner as for
the Sabres, i.e. only for devices behind PCI-PCI-bridges and
after a PIO read on the far side of the farest PCI-PCI-bridge.
Given that the Tomatillo documentation mentions no difference
to the Schizo bridges in this regard and this is also still
part of the procedure described Schizo documentation this
seems about right so adjust accordingly (the unconditional
CDMA flushing/syncing previously done was based on how Linux
behaves).
- Implement CDMA flushing/syncing for Schizo version >= 5,
which requires the workaround described in Schizo Errata I-23.
According to Schizo Errata I-13 it's just unusable with
version < 5 though. [1]
- Don't register the Schizo streaming buffer for now until it's
usage is sorted out according to the erratas.
- Register our interrupt filters with the revived INTR_FAST so
they these interrupts can even interrupt filters of device
drivers as necessary.
- Remove the comment regarding lack of newbus'ified bus_dma(9)
as being able to associate a DMA tag with a device would
allow to implement CDMA flushing/syncing in bus_dmamap_sync(9)
but that would totally kill performance. Given that for devices
not behind a PCI-PCI bridge the host-to-PCI bridges also only
do CDMA flushing/syncing based on interrupts there's no
additional disadvantage for polling(4) callbacks in the case
schizo(4) has to do the CDMA flushing/syncing but rather a
general problem.

Reported by: Michael Moll [1]


185109 19-Nov-2008 marius

Use the interrupt level right below PIL_FAST for executing interrupt
filters instead of PIL_FAST and allow special filters and handlers
for interrupts which need to be able to interrupt even filters, f.e.
bus error interrupts, to be registered with the revived INTR_FAST
at PIL_FAST.


185008 16-Nov-2008 marius

- Allow the front-end to specify that iommu(4) should disable
rerun of the streaming cache for silicon bug workarounds.
- Announce the presence of a streaming cache on attach for
informational purposes.
- For performance reasons don't do unnecessary flushes of the
streaming cache when coherent mappings are synced.
- Fix some minor style issues.


185007 16-Nov-2008 marius

Use the spitfire VIS block copy/zero functions also with cheetah-
class CPUs. In theory one could also use versions additionally
taking advantage of the prefetch cache with cheetah-class CPUs,
in my worldstone runs these either didn't provide extra speedup
(USIII+) in comparison to the existing spitfire versions or were
even slightly slower (USIIIi) though, so they aren't committed
for now.
The basic problem leading to the VIS-based copy/zero functions
being initially disabled for cheetah-class CPUs was solved by
letting cheetah_init() clear DCR_IFPOE.


185006 16-Nov-2008 marius

Micro-optimize spitfire_block_{copy,zero}():
- Predict the loop as taken as it's more likely that there's still
data to copy and memory to zero respectively.
- Don't waste the delay slot.


184376 27-Oct-2008 marius

- In GCC 4.2 __builtin_frame_address() was fixed to include the
V9 stack bias so we no longer need to add it in db_backtrace()
and stack_capture() respectively. This also reverts r182018,
which kludged around the resulting unaligned access.
- Sync the sun4v versions of db_trace.c and stack_machdep.c with
the sparc64 ones and fix some style bugs.

MFC after: 3 days


183527 01-Oct-2008 peter

Collect N identical (or near identical) mkdumpheader() implementations into
one, as threatened in the comment. Textdump magic can be passed in.


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183322 24-Sep-2008 kib

Change the static struct sysentvec and struct Elf_Brandinfo initializers
to the C99 style. At least, it is easier to read sysent definitions
that way, and search for the actual instances of sigcode etc.

Explicitely initialize sysentvec.sv_maxssiz that was missed in most
sysvecs.

No objection from: jhb
MFC after: 1 month


183201 20-Sep-2008 marius

Use the STICK timers only when absolutely necessary, i.e. if a machine
consists of CPUs running at different speeds, for driving hardclock as
these timers in turn are driven at frequencies as low as 5MHz, resulting
in bad granularity compared to the TICK timers. However, don't employ
the workaround for the BlackBird erratum #1 when using the TICK timer
on machines with cheetah-class CPUs for performance reasons.

Reported by: Florian Smeets


183144 18-Sep-2008 marius

- Add a missing prototype.
- Remove a banal comment.


183142 18-Sep-2008 marius

- Newer firmware versions no longer provide SUNW,stop-self so just
disable interrupts and loop forever with these.
- Hide all MP-related bits in <machine/smp.h> underneath #ifdef SMP.
- Inline ipi_all_but_self(9) and ipi_selected(9). We don't expose any
additional bits but save a few cycles by doing so.
- Remove ipi_all(9), which actually only called panic(9). It can't be
implemented natively anyway and having it removed at least causes
MI users to fail already fail when linking.


182918 10-Sep-2008 marius

Add drivers for the power management devices found on Fireplane/
Safari- and JBus-based machines. Currently the main purpose of
these drivers is debugging of the resource allocation on nexus(4)
and the register content of these devices though.


182916 10-Sep-2008 marius

Work around Cheetah+ erratum 34 (USIII+ erratum #10) by relocating
the locked entry in it16 slot 0, which typically is occupied by the
PROM, and manually entering locked entries in slots != 0.

Thanks to Hubert Feyrer for donating the Blade 2000 this change was
developed on.


182878 08-Sep-2008 marius

For cheetah-class CPUs ensure that the dt512_0 is set to hold 8k pages
for all three contexts and configure the dt512_1 to hold 4MB pages for
them (e.g. for direct mappings).
This might allow for additional optimization by using the faulting
page sizes provided by AA_DMMU_TAG_ACCESS_EXT for bypassing the page
size walker for the dt512 in the superpage support code.

Submitted by: nwhitehorn (initial patch)


182877 08-Sep-2008 marius

USIII and beyond CPUs have stricter requirements when it comes
to synchronization needed after stores to internal ASIs in order
to make side-effects visible. This mainly requires the MEMBAR #Sync
after such stores to be replaced with a FLUSH. We use KERNBASE as
the address to FLUSH as it is guaranteed to not trap. Actually,
the USII synchronization rules also already require a FLUSH in
pretty much all of the cases changed.
We're also hitting an additional USIII synchronization rule which
requires stores to AA_IMMU_SFSR to be immediately followed by a DONE,
FLUSH or RETRY. Doing so triggers a RED state exception though so
leave the MEMBAR #Sync. Linux apparently also has gotten away with
doing the same for quite some time now, apart from the fact that
it's not clear to me why we need to clear the valid bit from the
SFSR in the first place.

Reviewed by: nwhitehorn


182774 04-Sep-2008 marius

When determining whether we trapped while in the PROM don't only
check for addresses below the PROM range but also those above.


182773 04-Sep-2008 marius

Use the PROM provided SUNW,set-trap-table to take over the trap
table. This is required in order to set obp-control-relinquished
within the PROM, allowing to safely read the OFW translations node.
Without this, f.e. a `ofwdump -ap` triggers a fatal reset error or
worse things on machines based on USIII and beyond.
In theory this should allow to remove touching %tba in cpu_setregs(),
in practice we seem to currently face a chicken and egg problem when
doing so however.


182769 04-Sep-2008 marius

Ensure the caches have the desired configuration (see especially
cheetah_cache_enable()).


182768 04-Sep-2008 marius

Flesh out MMU and cache handling of cheetah-class CPUs.


182767 04-Sep-2008 marius

The physical address space of cheetah-class CPUs has been extended
to 43 bits so update TD_PA_BITS accordingly. For the most part this
increase is transparent to the existing code except for when reading
the physical address from ASI_{D,I}TLB_DATA_ACCESS_REG, which we
only do in the loader and which was already adjusted in r182478, or
from the OFW translations node.
While at it, ensure we are only taking valid OFW mapping entries
into account.


182743 03-Sep-2008 marius

Additionally clear the STICK bit in the SOFTINT register when
receiving a PIL_TICK interrupt. This change was erroneously
omitted in r182730.


182730 03-Sep-2008 marius

- USIII-based machines can consist of CPUs running at different
frequencies (and having different cache sizes) so use the STICK
(System TICK) timer, which was introduced due to this and is
driven by the same frequency across all CPUs, instead of the
TICK timer, whose frequency varies with the CPU clock, to drive
hardclock. We try to use the STICK counter with all CPUs that are
USIII or beyond, even when not necessary due to identical CPUs,
as we can can also avoid the workaround for the BlackBird erratum
#1 there. Unfortunately, using the STICK counter currently causes
a hang with USIIIi MP machines for reasons unknown, so we still
use the TICK timer there (which is okay as they can only consist
of identical CPUs).
- Given that we only (try to) synchronize the (S)TICK timers of APs
with the BSP during startup, we could end up spinning forever in
DELAY(9) if that function is migrated to another CPU while we're
spinning due to clock drift afterwards, so pin to the CPU in order
to avoid migration. Unfortunately, pinning doesn't work at the
point DELAY(9) is required by the low-level console drivers, yet,
so switch to a function pointer, which is updated accordingly, for
implementing DELAY(9). For USIII and beyond, this would also allow
to easily use the STICK counter instead of the TICK one here,
there's no benefit in doing so however.
While at it, use cpu_spinwait(9) for spinning in the delay-
functions. This currently is a NOP though.
- Don't set the TICK timer of the BSP to 0 during at startup as
there's no need to do so.
- Implement cpu_est_clockrate().
- Unfortunately, USIIIi-based machines don't provide a timecounter
device besides the STICK and TICK counters (well, in theory the
Tomatillo bridges have a performance counter that can be (ab)used
as timecounter by configuring it to count bus cycles, though unlike
the performance counter of Schizo bridges, the Tomatillo one is
broken and counts Sun knows what in this mode). This means that
we've to use a (S)TICK counter for timecounting, which has the old
problem of not being in sync across CPUs, so provide an additional
timecounter function which binds itself to the BSP but has an
adequate low priority.


182689 02-Sep-2008 marius

- USIII-based machines can consist of CPUs having different cache
sizes (and running at different frequencies) so move the cacheinfo
to the PCPU data. While at it, remove some redundant and/or unused
members from struct cacheinfo.
- In sparc64_init don't assume the first CPU node we find in the OFW
device tree is the BSP.


182688 02-Sep-2008 marius

Bypass isa_probe_children(9) and directly call bus_generic_attach(9)
in order to avoid the invasive probes done by identify-routines of
ISA drivers, which may access unassigned addresses or those of
unrelated devices and thus in turn can trigger master/target aborts
as revealed by r182108 and ahc(4). I think that this is also the
cause of the hang previously seen on B100 blades during boot.
Bypassing isa_probe_children(9) also avoids adding ISA hints, which
just can be wrong for sparc64.

Reported by: gavin


182122 24-Aug-2008 marius

There's a race in kmem(4) between checking whether a page is resident
in the kernel and copying it out, causing a panic when faulting on a
nofault entry. Handle this case gracefully by letting the kernel copy
functions return EFAULT instead. As such this change addresses the
same problem as r154721 does for i386.

MFC after: 3 days


182119 24-Aug-2008 marius

MFamd64: r133413

In syscall, always make a copy of parameters from trapframe, this
becauses some syscalls using set_mcontext can sneakily change
parameters and later when those syscalls references parameters,
they will wrongly use register values in mcontext_t.

PR: 72998
MFC after: 3 days


182020 22-Aug-2008 marius

cosmetic changes and style fixes


182018 22-Aug-2008 marius

Avoid misaligned access of struct frame.

MFC after: 3 days


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181701 13-Aug-2008 marius

cosmetic changes and style fixes


181640 12-Aug-2008 marius

- Add sys_tick and the USIII and beyond sys_tick_cmpr to state_regs[].
- Const'ify and static'ize as appropriate.
- Use __FBSDID().


180664 21-Jul-2008 marius

- Remove redundant inclusion of opt_global.h.
- Use __FBSDID in autoconf.c.

MFC after: 3 days


180299 05-Jul-2008 marius

- Merge macros depending on the flags being preserved between calls
into a single "__asm"-statement as GCC doesn't guarantee their
consecutive output even when using consecutive "__asm __volatile"-
statement for them. Remove the otherwise unnecessary "__volatile". [1]
- The inline assembler instructions used here alter the condition
codes so add them to the clobber list accordingly.
- The inline assembler instructions used here uses output operands
before all input operands are consumed so add appropriate modifiers.

Pointed out by: bde [1]
MFC after: 2 weeks


180298 05-Jul-2008 marius

- Fix spelling and style.
- Use __FBSDID.


179229 23-May-2008 alc

The VM system no longer uses setPQL2(). Remove it and its helpers.


179081 18-May-2008 alc

Retire pmap_addr_hint(). It is no longer used.


178893 09-May-2008 alc

Add a stub for pmap_align_superpage() on machines that don't (yet)
implement pmap-level support for superpages.


178859 08-May-2008 marius

Remove #if 0'ed code referencing no longer existent ecache_flush().


178858 08-May-2008 marius

Use <machine/intr_machdep.h> directly instead of depending on header
pollution in the otherwise unused <sys/pcpu.h>.


178840 07-May-2008 marius

- Use the name returned by device_get_nameunit(9) for the name of the
counter-timer timecounter so the associated SYSCTL nodes don't clash on
machines having multiple U2P and U2S bridges as well as establishing a
clear mapping between these bridges and their timecounter device.
- Don't bother setting up a "nice" name for the IOMMU, just use the name
returned by device_get_nameunit(9), too.
- Fix some minor style(9) bugs.
- Use __FBSDID in counter.c

MFC after: 1 week


178471 25-Apr-2008 jeff

- Add an integer argument to idle to indicate how likely we are to wake
from idle over the next tick.
- Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are
suspended in cpu specific states. This function can fail and cause the
scheduler to fall back to another mechanism (ipi).
- Implement support for mwait in cpu_idle() on i386/amd64 machines that
support it. mwait is a higher performance way to synchronize cpus
as compared to hlt & ipis.
- Allow selecting the idle routine by name via sysctl machdep.idle. This
replaces machdep.cpu_idle_hlt. Only idle routines supported by the
current machine are permitted.

Sponsored by: Nokia


178443 23-Apr-2008 marius

o Rename ic_eoi to ic_clear to emphasize the functions it points
don't send and EOI which works like on amd64/i386 and blocks all
interrupts on the relevant interrupt controller.
o Replace the post_filter and post_inthread hooks registered when
creating the interrupt events with just ic_clear as on sparc64 we
don't need to do any disable->EOI->enable dance to unblock all but
the relevant interrupt while running the filter or handler; just
not clearing the interrupt already has the same effect.
o Merge from amd64/i386:
- Split the intr_table_lock into an sx lock used for most things,
and a spin lock to protect intrcnt_index.
- Add support for binding interrupts to CPUs, including for the
bus_bind_intr(9) interface, a assign_cpu hook and initially
shuffling interrupts arround in a round-robin fashion.

Reviewed by: jhb
MFC after: 1 month


178092 11-Apr-2008 jeff

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


178048 09-Apr-2008 marius

- Add support for IPI_PREEMPT. [1]
- Add my copyright to mp_machdep.c for having implemented support for
USIII and up and some fixes.

Obtained from: sun4v (modulo style(9) bugs) [1]


177940 05-Apr-2008 jhb

Add a MI intr_event_handle() routine for the non-INTR_FILTER case. This
allows all the INTR_FILTER #ifdef's to be removed from the MD interrupt
code.
- Rename the intr_event 'eoi', 'disable', and 'enable' hooks to
'post_filter', 'pre_ithread', and 'post_ithread' to be less x86-centric.
Also, add a comment describe what the MI code expects them to do.
- On amd64, i386, and powerpc this is effectively a NOP.
- On arm, don't bother masking the interrupt unless the ithread is
scheduled in the non-INTR_FILTER case to match what INTR_FILTER did.
Also, don't bother unmasking the interrupt in the post_filter case if
we never masked it. The INTR_FILTER case had been doing this by having
arm_unmask_irq for the post_filter (formerly 'eoi') hook.
- On ia64, stray interrupts are now masked for the non-INTR_FILTER case.
They were already masked in the INTR_FILTER case.
- On sparc64, use the a NULL pre_ithread hook and use intr_enable_eoi() for
both the 'post_filter' and 'post_ithread' hooks to match what the
non-INTR_FILTER code did.
- On sun4v, retire the ithread wrapper hack by using an appropriate
'post_ithread' hook instead (it's what 'post_ithread'/'enable' was
designed to do even in 5.x).

Glanced at by: piso
Reviewed by: marius
Requested by: marius [1], [5]
Tested on: amd64, i386, arm, sparc64


177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


177565 24-Mar-2008 marius

- Const'ify the bus_stream_asi and bus_type_asi arrays.
- Replace hard-coded functions names missed in bus_machdep.c rev. 1.44
with __func__.
- Break some long lines.

MFC after: 1 month


177325 17-Mar-2008 jhb

Simplify the interrupt code a bit:
- Always include the ie_disable and ie_eoi methods in 'struct intr_event'
and collapse down to one intr_event_create() routine. The disable and
eoi hooks simply aren't used currently in the !INTR_FILTER case.
- Expand 'disab' to 'disable' in a few places.
- Use function casts for arm and i386:intr_eoi_src() instead of wrapper
routines since to trim one extra indirection.

Compiled on: {arm,amd64,i386,ia64,ppc,sparc64} x {FILTER, !FILTER}
Tested on: {amd64,i386} x {FILTER, !FILTER}


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177181 14-Mar-2008 jhb

Add preliminary support for binding interrupts to CPUs:
- Add a new intr_event method ie_assign_cpu() that is invoked when the MI
code wishes to bind an interrupt source to an individual CPU. The MD
code may reject the binding with an error. If an assign_cpu function
is not provided, then the kernel assumes the platform does not support
binding interrupts to CPUs and fails all requests to do so.
- Bind ithreads to CPUs on their next execution loop once an interrupt
event is bound to a CPU. Only shared ithreads are bound. We currently
leave private ithreads for drivers using filters + ithreads in the
INTR_FILTER case unbound.
- A new intr_event_bind() routine is used to bind an interrupt event to
a CPU.
- Implement binding on amd64 and i386 by way of the existing pic_assign_cpu
PIC method.
- For x86, provide a 'intr_bind(IRQ, cpu)' wrapper routine that looks up
an interrupt source and binds its interrupt event to the specified CPU.
MI code can currently (ab)use this by doing:

intr_bind(rman_get_start(irq_res), cpu);

however, I plan to add a truly MI interface (probably a bus_bind_intr(9))
where the implementation in the x86 nexus(4) driver would end up calling
intr_bind() internally.

Requested by: kmacy, gallatin, jeff
Tested on: {amd64, i386} x {regular, INTR_FILTER}


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


176995 09-Mar-2008 marius

- Fix some style bugs.
- Replace hard-coded functions names missed in rev. 1.44 with __func__.

MFC after: 1 week


176994 09-Mar-2008 marius

- Do as the comment in pmap_bootstrap() suggests and flush all non-locked
TLB entries possibly left over by the firmware and also do so while
bootstrapping APs.
- Use __FBSDID.

MFC after: 1 month


176734 02-Mar-2008 jeff

- Remove the old smp cpu topology specification with a new, more flexible
tree structure that encodes the level of cache sharing and other
properties.
- Provide several convenience functions for creating one and two level
cpu trees as well as a default flat topology. The system now always
has some topology.
- On i386 and amd64 create a seperate level in the hierarchy for HTT
and multi-core cpus. This will allow the scheduler to intelligently
load balance non-uniform cores. Presently we don't detect what level
of the cache hierarchy is shared at each level in the topology.
- Add a mechanism for testing common topologies that have more information
than the MD code is able to provide via the kern.smp.topology tunable.
This should be considered a debugging tool only and not a stable api.

Sponsored by: Nokia


176197 11-Feb-2008 marius

The Sun disk label only uses 16-bit fields for cylinders, heads and
sectors so the geometry of large IDE disks has to be adjusted. This
corresponds to what the OpenSolaris dad(7D) driver does except that
the latter only tweaks sectors and effectively limits the mediasize
to 128GB so the cylinders and heads fields won't ever overflow. Not
limiting the mediasize is a compromise between allowing to use Sun
disk label as far as possible and being able to use the entire disk
with another disk label.
This allows to use the full capacity of large IDE disks if they were
not labeled under (Open)Solaris (in both ways of the meaning).

MFC after: 2 weeks


175768 28-Jan-2008 ru

Add a wrapper function that bound checks writes to the dump device.


175067 03-Jan-2008 alc

Add an access type parameter to pmap_enter(). It will be used to implement
superpage promotion.

Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is
a Boolean.


174933 27-Dec-2007 alc

Update two tracepoints, i.e., CTRx() invocations, to reflect the demise of
page coloring a few months ago.


174898 25-Dec-2007 rwatson

Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument. This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.


174195 02-Dec-2007 rwatson

Break out stack(9) from ddb(4):

- Introduce per-architecture stack_machdep.c to hold stack_save(9).
- Introduce per-architecture machine/stack.h to capture any common
definitions required between db_trace.c and stack_machdep.c.
- Add new kernel option "options STACK"; we will build in stack(9) if it is
defined, or also if "options DDB" is defined to provide compatibility
with existing users of stack(9).

Add new stack_save_td(9) function, which allows the capture of a stacktrace
of another thread rather than the current thread, which the existing
stack_save(9) was limited to. It requires that the thread be neither
swapped out nor running, which is the responsibility of the consumer to
enforce.

Update stack(9) man page.

Build tested: amd64, arm, i386, ia64, powerpc, sparc64, sun4v
Runtime tested: amd64 (rwatson), arm (cognet), i386 (rwatson)


173799 21-Nov-2007 scottl

Extend critical section coverage in the low-level interrupt handlers to
include the ithread scheduling step. Without this, a preemption might
occur in between the interrupt getting masked and the ithread getting
scheduled. Since the interrupt handler runs in the context of curthread,
the scheudler might see it as having a such a low priority on a busy system
that it doesn't get to run for a _long_ time, leaving the interrupt stranded
in a disabled state. The only way that the preemption can happen is by
a fast/filter handler triggering a schduling event earlier in the handler,
so this problem can only happen for cases where an interrupt is being
shared by both a fast/filter handler and an ithread handler. Unfortunately,
it seems to be common for this sharing to happen with network and USB
devices, for example. This fixes many of the mysterious TCP session
timeouts and NIC watchdogs that were being reported. Many thanks to Sam
Lefler for getting to the bottom of this problem.

Reviewed by: jhb, jeff, silby


173708 17-Nov-2007 alc

Prevent the leakage of wired pages in the following circumstances:
First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated.
Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the
pages beyond the EOF are unmapped and freed. However, when the file is
mlock(2)ed, the pages beyond the EOF are unmapped but not freed because
they have a non-zero wire count. This can be a mistake. Specifically,
it is a mistake if the sole reason why the pages are wired is because of
wired, managed mappings. Previously, unmapping the pages destroys these
wired, managed mappings, but does not reduce the pages' wire count.
Consequently, when the file is unmapped, the pages are not unwired
because the wired mapping has been destroyed. Moreover, when the vm
object is finally destroyed, the pages are leaked because they are still
wired. The fix is to reduce the pages' wired count by the number of
wired, managed mappings destroyed. To do this, I introduce a new pmap
function pmap_page_wired_mappings() that returns the number of managed
mappings to the given physical page that are wired, and I use this
function in vm_object_page_remove().

Reviewed by: tegge
MFC after: 6 weeks


173615 14-Nov-2007 marcel

o Rename cpu_thread_setup() to cpu_thread_alloc() to better
communicate that it relates to (is called by) thread_alloc()
o Add cpu_thread_free() which is called from thread_free()
to counter-act cpu_thread_alloc().

i386: Have cpu_thread_free() call cpu_thread_clean() to
preserve behaviour.
ia64: Have cpu_thread_free() call mtx_destroy() for the
mutex initialized in cpu_thread_alloc().

PR: ia64/118024


173361 05-Nov-2007 kib

Fix for the panic("vm_thread_new: kstack allocation failed") and
silent NULL pointer dereference in the i386 and sparc64 pmap_pinit()
when the kmem_alloc_nofault() failed to allocate address space. Both
functions now return error instead of panicing or dereferencing NULL.

As consequence, vmspace_exec() and vmspace_unshare() returns the errno
int. struct vmspace arg was added to vm_forkproc() to avoid dealing
with failed allocation when most of the fork1() job is already done.

The kernel stack for the thread is now set up in the thread_alloc(),
that itself may return NULL. Also, allocation of the first process
thread is performed in the fork1() to properly deal with stack
allocation failure. proc_linkup() is separated into proc_linkup()
called from fork1(), and proc_linkup0(), that is used to set up the
kernel process (was known as swapper).

In collaboration with: Peter Holm
Reviewed by: jhb


172708 16-Oct-2007 marius

- Fix the handling of R_SPARC_OLO10, which is a bit of a special case
in the way we implement handling of relocations.
As for the kernel part this fixes the loading of lots of modules,
which failed to load due to unresolvable symbols when built after
the GCC 4.2.0 import. This wasn't due to a change in GCC itself
though but one of several changes in configuration done along the
import. Specfically, HAVE_AS_REGISTER_PSEUDO_OP, which causes GCC
to denote global registers used for scratch purposes and in turn
GAS uses R_SPARC_OLO10 relocations for, is now defined.
While at it replace some more ELF_R_TYPE which should have been
ELF64_R_TYPE_ID but didn't cause problems so far.
- Sync a sanity check between kernel and rtld(1) and change it to be
maintenance free regarding the type used for the lookup table.
- Sprinkle const on lookup tables.
- Use __FBSDID.

Reported and tested by: yongari
MFC after: 5 days


172466 07-Oct-2007 alc

Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that is
a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs
when uma_small_free() frees a page. The solution has two parts: (1) Mark
pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock
assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested.
This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
flags, i.e., they do not change state between the time that a page is
allocated and freed.

Approved by: re (kensmith)
PR: 116794


172207 17-Sep-2007 jeff

- Move all of the PS_ flags into either p_flag or td_flags.
- p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or
previously the sched_lock. These bugs have existed for some time.
- Allow swapout to try each thread in a process individually and then
swapin the whole process if any of these fail. This allows us to move
most scheduler related swap flags into td_flags.
- Keep ki_sflag for backwards compat but change all in source tools to
use the new and more correct location of P_INMEM.

Reported by: pho
Reviewed by: attilio, kib
Approved by: re (kensmith)


172189 15-Sep-2007 alc

It has been observed on the mailing lists that the different categories
of pages don't sum to anywhere near the total number of pages on amd64.
This is for the most part because uma_small_alloc() pages have never been
counted as wired pages, like their kmem_malloc() brethren. They should
be. This changes fixes that.

It is no longer necessary for the page queues lock to be held to free
pages allocated by uma_small_alloc(). I removed the acquisition and
release of the page queues lock from uma_small_free() on amd64 and ia64
weeks ago. This patch updates the other architectures that have
uma_small_alloc() and uma_small_free().

Approved by: re (kensmith)


172066 06-Sep-2007 marius

o Revamp the sparc64 interrupt code in order to be able to interface
with the INTR_FILTER-enabled MI code. Basically this consists of
registering an interrupt controller (of which there can be multiple
and optionally different ones either per host-to-foo bridge or shared
amongst host-to-foo bridges in any one machine) along with an interrupt
vector as specific argument for all the interrupt vectors used by a
given host-to-foo bridge (roughly similar to registering interrupt
sources on amd64 and i386), providing functions to enable, clear and
disable the interrupts of the children beneath the bridge.
This also includes:
- No longer entering a critical section in tl0_intr() and tl1_intr()
for executing interrupt handlers but rather let the handlers enter
it themselves so in the case of intr_event_handle() we don't enter
a nested critical section.
- Adding infrastructure for binding delivery of interrupt vectors to
specific CPUs which later on can be interfaced with the code from
amd64/i386 for binding interrupts to specific CPUs.
- Getting rid of the wrapper hack introduced along the lines of the
API changes for INTR_FILTER which as a side-effect caused interrupts
associated with ithread handlers only to get the elevated priority
of those associated with filters ("fast handlers") (this removes the
hack also in the non-INTR_FILTER case).
- Disabling (by not clearing) an interrupt in the interrupt controller
until all associated handlers have been executed, which is crucial
for the typical locking strategy of NIC drivers in order to work
correctly in case of shared interrupts. This was a more or less
theoretical problem on sparc64 though, as shared interrupts are
rather uncommon there except for the on-board SCCs and UARTs.
Note that due to the behavior of at least of some of the interrupt
controllers used on sparc64 an enable+EOI instead of a disable+EOI
approach (as implied by the INTR_FILTER MI code and implemented on
other architectures) is used as the latter can cause lost interrupts
or in the worst case interrupt starvation.
o Correct a typo in sbus_alloc_resource() which caused (pass-through)
allocations to only work down to the grandchildren of the bus, which
wasn't a real problem so far as we don't support any devices which are
great-grandchildren or greater of a U2S bridge, yet.
o In fhc(4) use bus_{read,write}_4() instead of bus_space_{read,write}_4()
in order to get rid of sc_bh and sc_bt in the fhc_softc. Also get rid
of some other unneeded members in fhc_softc.

Reviewed by: marcel (earlier version)
Approved by: re (kensmith)


171730 05-Aug-2007 marius

- Divorce the IOTSBs, which so far where handled via a global list
instead of per IOMMU, so we no longer need to program all of them
identically in systems having multiple IOMMUs. This continues the
rototilling of the nexus(4) done about 5 months ago, which amongst
others changed nexus(4) and the drivers for host-to-foo bridges
to provide bus_get_dma_tag methods, allowing to handle DMA tags in
a hierarchical way and to link them with devices.
This still doesn't move the silicon bug workarounds for Sabre (and
in the uncommitted schizo(4) for Tomatillo) bridges into special
bus_dma_tag_create() and bus_dmamap_sync() methods though, as w/o
fully newbus'ified bus_dma_tag_create() and bus_dma_tag_destroy()
this still requires too much hackery, i.e. per-child parent DMA
tags in the parent driver.
- Let the host-to-foo drivers supply the maximum physical address
of the IOMMU accompanying the bridges. Previously iommu(4) hard-
coded an upper limit of 16GB, which actually only applies to the
IOMMUs of the Hummingbird and Sabre bridges. The Psycho variants
as well as the U2S in fact can can translate to up to 2TB, i.e.
translate to 41-bit physical addresses. According to the recently
available Tomatillo documentation these bridges even translate to
43-bit physical addresses and hints at the Schizo bridges doing
43 bits as well.
This fixes the issue the FreeBSD 6.0 todo list item "Max RAM on
sparc64" was refering to and pretty much obsoletes the lack of
support for bounce buffers on sparc64.

Thanks to Nathan Whitehorn for pointing me at the Tomatillo manual.

Approved by: re (kensmith)


171553 23-Jul-2007 dwmalone

If clock_ct_to_ts fails to convert time time from the real time clock,
print a one line error message. Add some comments on not being able to
trust the day of week field (I'll act on these comments in a follow up
commit).

Approved by: re
MFC after: 3 weeks


171488 18-Jul-2007 jeff

- Remove the global definition of sched_lock in mutex.h to break
new code and third party modules which try to depend on it.
- Initialize sched_lock in sched_4bsd.c.
- Declare sched_lock in sparc64 pmap.c and assert that we're compiling
with SCHED_4BSD to prevent accidental crashes from running ULE. This
is the sole remaining file outside of the scheduler that uses the
global sched_lock.

Approved by: re


170846 16-Jun-2007 marius

- Add support for sending IPIs with USIII and greater sun4u CPUs.
These CPUs use an enhanced layout of the interrupt vector dispatch
and dispatch status registers in order to allow sending IPIs to
multiple targets simultaneously. Thus support for these CPUs was
put in a newly added cheetah_ipi_selected(). This is intended to
be pointed to by cpu_ipi_selected, which now is a function pointer,
in order to avoid cpu_impl checks once booted. Alternatively it
can point to spitfire_ipi_selected(), which was renamed from
cpu_ipi_selected(). Consequently cpu_ipi_send() was also renamed
to spitfire_ipi_send() (there's no need for a cheetah equivalent
of this so far). Initialization of the cpu_ipi_selected pointer
and other requirements is done in mp_init(), which was renamed
from mp_tramp_alloc(), as cpu_mp_start() isn't called on UP
systems while cpu_ipi_selected() is. As a side-effect this allows
to make mp_tramp static to sys/sparc64/sparc64/mp_machdep.c.
For the sake of avoiding #ifdef SMP and for keeping the history in
place cheetah_ipi_selected() and spitfire_ipi_{selected,send}()
where not put into/moved to sys/sparc64/sparc64/{cheetah,spitfire}.c
- Add some CTASSERTs and KASSERTs ensuring that MAXCPU doesn't
exceed the data types we use to store the CPU bit fields or the
number of USIII and greater CPUs supported by the current
cheetah_ipi_selected() implementation (which for JBus-CPUs is
only 4; that should be fine though as according to OpenSolaris
there are no sun4u machines with more than 4 JBus-CPUs).
- In cpu_mp_start() don't enumerate and start more than MAXCPU CPUs
as we can't handle more than that.
- In cpu_mp_start() check for upa-portid vs. portid depending on
cpu_impl for consistency with nexus(4).
- In spitfire_ipi_selected() add KASSERTs ensuring that a CPU isn't
told to IPI itself as sun4u CPUs just can't do that.
- In spitfire_ipi_send() do a MEMBAR #Sync after writing the
interrupt vector data as we want to make sure the payload was
actually written before we trigger the dispatch.
- In spitfire_ipi_send() also verify IDR_BUSY when checking whether
the dispatch was successful as it has to be cleared for this to
be the case.
- Remove some redundant variables.


170845 16-Jun-2007 marius

- Flesh out the support for the EBus variant which actually is the
RTC function of a National Semiconductor PC87317/PC97317. This
consists of using the century register the same way Solaris does
for compatibility reasons. Once there is a MD power(4) we'd also
want to interface the APC (Advanced Power Control) functionality
of the same chip function with it.
- Use a macro for the device description and take advantage of
ISA_PNP_PROBE() setting the device description.
- Use the generated typedefs for the prototypes of the device
interface functions.


170843 16-Jun-2007 marius

Remove the code for displaying the OFW hostid during boot for the
reasons outlined in the comment removed along with it, because the
OFW hostid has no real meaning for FreeBSD and mainly so the OFW
hostid is not confused with the FreeBSD hostid.


170305 04-Jun-2007 jeff

- Change comments and asserts to reflect the removal of the global
scheduler lock.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170303 04-Jun-2007 jeff

Commit 10/14 of sched_lock decomposition.
- Use sched_throw() rather than replicating the same cpu_throw() code for
each architecture. This also allows the scheduler to use any locking it
may want to.
- Use the thread_lock() rather than sched_lock when preempting.
- The scheduler lock is not required to synchronize release_aps.

Tested by: kris, current@
Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170249 03-Jun-2007 alc

Prepare for the new physical memory allocator: Change the way that the
physical page's color is obtained.

Approved by: re


170170 31-May-2007 attilio

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


170162 31-May-2007 piso

In some particular cases (like in pccard and pccbb), the real device
handler is wrapped in a couple of functions - a filter wrapper and an
ithread wrapper. In this case (and just in this case), the filter
wrapper could ask the system to schedule the ithread and mask the
interrupt source if the wrapped handler is composed of just an ithread
handler: modify the "old" interrupt code to make it support
this situation, while the "new" interrupt code is already ok.

Discussed with: jhb


170086 29-May-2007 yongari

Honor maxsegsz of less than a page size in a DMA tag. Previously it
used to return PAGE_SIZE without respect to restrictions of a DMA tag.
This affected all of the busdma load functions that use
_bus_dmamap_loader_buffer() as their back-end.

Reviewed by: scottl


169846 22-May-2007 kan

Allow FreeBSD's native ELF image activators to execute shared libraries the
same way it was enabled for Linux binares in linuxulator.

This allows binaries built with -pie. Many ports auto-detect -fPIE support
in GCC 4.2 and build binaries FreeBSD was unable to run.


169805 20-May-2007 jeff

- rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.

Suggested by: julian@
Contributed by: attilio@


169796 20-May-2007 marius

- Staticize cpu_ipi_send() and cpu_mp_unleash() as these aren't
referenced outside of mp_machdep.c
- Replace a magic 14 with the newly added IDC_ITID_SHIFT macro.
- Remove the global mp_boot_mid variable as it's not really necessary
and just replacing it with PCPU_GET(mid) doesn't have any impact on
performance once booted.
- Replace PCPU_GET(cpuid) with the curcpu shortcut.
- Replace hardcoded function names in panic strings etc with __func__
so they don't need to be updated when renaming the function.
- Use register_t instead of u_long for variables used to hold the
return value of intr_disable() so we don't need to apply any
knowledge about the actual width of that value here.
- Improve the wording of some comments.
- Fix several style(9) bugs.


169795 20-May-2007 marius

- Also identify USIIIi+, USIV and USIV+ CPUs.
- Use __FBSDID in identcpu.c.
- Remove #ifndef SUN4V around global cpu_impl variable; it doesn't
hurt on sun4v for now and once setPQL2() is gone sun4v can stop
sharing identcpu.c with sparc64, making the reminder of this file
also sparc64-only again. [1]

Submitted by: kmacy [1]


169793 20-May-2007 marius

Delete the unused/not really used sparc64 (as in sun4u) cache.h,
iommureg.h (which already began to bitrot) and iommuvar.h from the
sun4v source and adjust some of the source which is shared between
sparc64 and sun4v as appropriate.


169667 18-May-2007 jeff

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


169178 01-May-2007 marius

Use the VIS-based Spitfire version of the page copying and zeroing
functions with CPUs they apply to only, otherwise default to the
plain C functions. This is modeled in a way so that f.e. a Cheetah
version of these functions can be inserted easily.


169175 01-May-2007 marius

Make the rman(9) workaround actually work. The main problem was that
the UPA_IMR2 resource is also shared with/a subset of the Schizo PCI
bus B CSR bank. I'm not entirely sure how this previously managed to
escape testing...


167352 09-Mar-2007 mohans

Over NFS, an open() call could result in multiple over-the-wire
GETATTRs being generated - one from lookup()/namei() and the other
from nfs_open() (for cto consistency). This change eliminates the
GETATTR in nfs_open() if an otw GETATTR was done from the namei()
path. Instead of extending the vop interface, we timestamp each attr
load, and use this to detect whether a GETATTR was done from namei()
for this syscall. Introduces a thread-local variable that counts the
syscalls made by the thread and uses <pid, tid, thread syscalls> as
the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on
thread state that could be used as the timestamp with minimal overhead.


167308 07-Mar-2007 marius

Rototill the sparc64 nexus(4) (actually this brings in the code the
sun4v nexus(4) in turn is based on):
o Change nexus(4) to manage the resources of its children so the
respective device drivers don't need to figure them out of OFW
themselves.
o Change nexus(4) to provide the ofw_bus KOBJ interface instead of
using IVARs for supplying the OFW node and the subset of standard
properties of its children. Together with the previous change this
also allows to fully take advantage of newbus in that drivers like
fhc(4), which attach on multiple parent busses, no longer require
different bus front-ends as obtaining the OFW node and properties
as well as resource allocation works the same for all supported
busses. As such this change also is part 4/4 of allowing creator(4)
to work in USIII-based machines as it allows this driver to attach
on both nexus(4) and upa(4). On the other hand removing these IVARs
breaks API compatibility with the powerpc nexus(4) but which isn't
that bad as a) sparc64 currently doesn't share any device driver
hanging off of nexus(4) with powerpc and b) they were no longer
compatible regarding OFW-related extensions at the pci(4) level
since quite some time.
o Provide bus_get_dma_tag methods in nexus(4) and its children in
order to handle DMA tags in a hierarchical way and get rid of the
sparc64_root_dma_tag kludge. Together with the previous two items
this changes also allows to completely get rid of the nexus(4)
IVAR interface. It also includes:
- pushing the constraints previously specified by the nexus_dmatag
down into the DMA tags of psycho(4) and sbus(4) as it's their
IOMMUs which induce these restrictions (and nothing at the
nexus(4) or anything that would warrant specifying them there),
- fixing some obviously wrong constraints of the psycho(4) and
sbus(4) DMA tags, which happened to not actually be used with
the sparc64_root_dma_tag kludge in place and therefore didn't
cause problems so far,
- replacing magic constants for constraints with macros as far
as it is obvious as to where they come from.
This doesn't include taking advantage of the newbus way to get
the parent DMA tags implemented by this change in order to divorce
the IOTSBs of the PCI and SBus IOMMUs or for implementing the
workaround for the DMA sync bug in Sabre (and Tomatillo) bridges,
yet, though.
o Get rid of the notion that nexus(4) (mostly) reflects an UPA bus
by replacing ofw_upa.h and with ofw_nexus.h (which was repo-copied
from ofw_upa.h) and renaming its content, which actually applies to
all of Fireplane/Safari, JBus and UPA (in the host bus case), as
appropriate.
o Just use M_DEVBUF instead of a separate M_NEXUS malloc type for
allocating the device info for the children of nexus(4). This is
done in order to not need to export M_NEXUS when deriving drivers
for subordinate busses from the nexus(4) class.
o Use the DEFINE_CLASS_0() macro to declare the nexus(4) driver so
we can derive subclasses from it.
o Const'ify the nexus_excl_name and nexus_excl_type arrays as well
as add 'associations' and 'rsc', which are pseudo-devices without
resources and therefore of no real interest for nexus(4), to the
former.
o Let the nexus(4) device memory rman manage the entire 64-bit address
space instead of just the UPA_MEMSTART to UPA_MEMEND subregion as
Fireplane/Safari- and JBus-based machines use multiple ranges,
which can't be as easily divided as in the case of UPA (limiting
the address space only served for sanity checking anyway).
o Use M_WAITOK instead of M_NOWAIT when allocating the device info
for children of nexus(4) in order to give one less opportunity
for adding devices to nexus(4) to fail.
o While adapting the drivers affected by the above nexus(4) changes,
change them to take advantage of rman_get_rid() instead of caching
the RIDs assigned to allocated resources, now that the RIDs of
resources are correctly set.
o In iommu(4) and nexus(4) replace hard-coded functions names, which
actually became outdated in several places, in panic strings and
status massages with __func__. [1]
o Use driver_filter_t in prototypes where appropriate.
o Add my copyright to creator(4), fhc(4), nexus(4), psycho(4) and
sbus(4) as I changed considerable amounts of these drivers as well
as added a bunch of new features, workarounds for silicon bugs etc.
o Fix some white space nits.

Due to lack of access to Exx00 hardware, these changes, i.e. central(4)
and fhc(4), couldn't be runtime tested on such a machine. Exx00 are
currently reported to panic before trying to attach nexus(4) anyway
though.

PR: 76052 [1]
Approved by: re (kensmith)


167267 06-Mar-2007 piso

Wrap at 80 bus_setup_intr() in upa_setup_intr().


166968 25-Feb-2007 marius

Use uma_set_align().


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


166154 20-Jan-2007 marius

Quiet GCC4 warnings regarding the width of printf()-arguments not
matching the format. While at it limit the format to unsigned int as
we're only interested in the 11 least significant bits anyway.


166105 19-Jan-2007 marius

Convert the remainder of the low hanging fruits regarding including
headers in .S directly rather than getting to their macros through
genassym.c/assym.s so there are less headers genassym.c has to be
kept in sync with.
While at it fix some stytle(9) bugs (indentation, prototype format,
sort headers, etc) and remove trailing whitespace.


166096 18-Jan-2007 marius

- Rename UPA_BUS_SPACE to NEXUS_BUS_SPACE; besides an UPA bus, nexus(4)
may also reflect a Fireplane/Safari or JBus bus (or a virtual bus which
in turn reflects a JBus bus or something like that...).
- In the both the sparc64 and sun4v bus_machdep.c use __FBSDID.
- Spell SBus the official way in comments.
- Replace hardcoded function names (all of which were actually outdated)
in panic and status strings with __func__.
- Fix whitespace nits.


166060 16-Jan-2007 marius

Resurrect upa(4), now used for the subordinate/slave UPA bridge and
bus hanging off from the Fireplane/Safari bus in some USIII machines.
This is part 3/4 of allowing creator(4) to work in these machines.
The little info needed on how to configure the bridge and to work
around the incorrect values contained in the `interrupts' properties
of its children were obtained form OpenSolaris.


166058 16-Jan-2007 marius

Teach OF_decode_addr() about the bus space used for devices on the
nexus (which might or might not reflect an UPA interconnection bus;
accordingly UPA_BUS_SPACE should be renamed to NEXUS_BUS_SPACE at a
later point) and subordinate/slave UPA busses. This is part 1/4 of
allowing creator(4) to work in USIII machines (which have a UPA bus
hanging off from the Fireplane/Safari bus reflected by the nexus).


165299 17-Dec-2006 kmacy

GC unused fields in pcpu


165064 10-Dec-2006 kmacy

Do explicit bounds checking as a function of the actual size of the
reloc_target_bitmask array as opposed to the (known) index of the last value.
This change fixes CID 691.


164936 06-Dec-2006 julian

Threading cleanup.. part 2 of several.

Make part of John Birrell's KSE patch permanent..
Specifically, remove:
Any reference of the ksegrp structure. This feature was
never fully utilised and made things overly complicated.
All code in the scheduler that tried to make threaded programs
fair to unthreaded programs. Libpthread processes will already
do this to some extent and libthr processes already disable it.

Also:
Since this makes such a big change to the scheduler(s), take the opportunity
to rename some structures and elements that had to be moved anyhow.
This makes the code a lot more readable.

The ULE scheduler compiles again but I have no idea if it works.

The 4bsd scheduler still reqires a little cleaning and some functions that now do
ALMOST nothing will go away, but I thought I'd do that as a separate commit.

Tested by David Xu, and Dan Eischen using libthr and libpthread.


164760 30-Nov-2006 jb

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


164737 29-Nov-2006 kmacy

- Explicitly name the fields in pcb that we use to store trap state for later
retrieval, rather than using pad
- save the fault address in sfar for use by the alignment fixup handler
- mask off the trap number, so the context id doesn't confuse the UT_MAX
comparison

This change fixes alignment fixup handling which is needed for traceroute
to work in spite of its copious unaligned accesses


164591 24-Nov-2006 kmacy

remove unused reference to tsb pa


164485 22-Nov-2006 kmacy

Add mechanism to track TSB misses in tsb miss handler
Remove unused debug code


164372 18-Nov-2006 kmacy

remove 13 (largely) redundant files and switch to the sparc64/sparc64 version

Reviewed by: jb (mentor rwatson)


164229 12-Nov-2006 alc

Make pmap_enter() responsible for setting PG_WRITEABLE instead
of its caller. (As a beneficial side-effect, a high-contention
acquisition of the page queues lock in vm_fault() is eliminated.)


163973 04-Nov-2006 jb

Backout the previous change. It was not intended to be part of the
commit and, while something like that is probably required for sparc64,
it hadn't been tested.


163972 04-Nov-2006 jb

Build in kernel support for loading DTrace modules by default. This
adds the hooks that DTrace modules register with, and adds a few functions
which have the dtrace_ prefix to allow the DTrace FBT (function boundary
trace) provider to avoid tracing because they are called from the DTtrace
probe context.

Unlike other forms of tracing and debug, DTrace support in the kernel
incurs negligible run-time cost.

I think the only reason why anyone wouldn't want to have kernel support
enabled for DTrace would be due to the license (CDDL) under which DTrace
is released.


163965 03-Nov-2006 kmacy

make pcb pad area accessible from asm
Approved by: scottl (standing in for rwatson as mentor)


163858 01-Nov-2006 jb

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


163709 26-Oct-2006 jb

Make KSE a kernel option, turned on by default in all GENERIC
kernel configs except sun4v (which doesn't process signals properly
with KSE).

Reviewed by: davidxu@


163449 17-Oct-2006 davidxu

o Add keyword volatile for user mutex owner field.
o Fix type consistent problem by using type long for old
umtx and wait channel.
o Rename casuptr to casuword.


163192 10-Oct-2006 bde

The powerpc and sparc64 MD `reboot' commands should never have existed
since they just duplicated the MI `reset' command. Instead of removing
them, make `reboot' an MI alias for `reboot' since this gives a better
way of killing the `r' alias for `reset'. Remove the `registers' command
that was used to kill the alias.

Turn the powerpc and sparc64 MD `halt' command into an MI command.

A copy of sparc64/db_interface.c grew in sun4v just after I found the
extra reboot commands. It has not been changed, and is now not
identical. Duplicated commands come out duplicated in ddb's online
help, but cause large problems when used (e.g., on i386's with 2 halt's
and an hwatch, typing h doesn' give the expected message about an
ambiguous command, but hangs like the halt command or a looping parseri
would).


163152 09-Oct-2006 kmacy

unbreak buildkernel for sparc64 - fallout from sun4v

Approved by: rwatson (mentor)
Reviewed by: jmg


163146 09-Oct-2006 kmacy

kernel clean up to make the sun4v kernel build

Reviewed by: jmg
Approved by: rwatson (mentor)


162544 22-Sep-2006 alc

The fix in revision 1.152 converted in the wrong direction.

Fix a typo in a comment.

Submitted by: Michael Plass


162543 22-Sep-2006 alc

The sparc64/sparc64/pmap.c implementations of pmap_remove(),
pmap_protect(), and pmap_copy() have optimizations for regions
larger than PMAP_TSB_THRESH (which works out to 16MB). This
caused a panic in tsb_foreach for kernel mappings, since
pm->pm_tsb is NULL in that case. This fix teaches tsb_foreach
to use the kernel's tsb in that case.

Submitted by: Michael Plass
MFC after: 3 days


161966 03-Sep-2006 marius

Do as the USII CPU manual suggests and leave interrupts enabled
for a bit before retrying to resend an IPI in order to avoid
deadlocks if the other CPU is also trying to send one.
OpenSolaris uses a delay of 1 microsecond here but waiting 2
microseconds with interrupts enabled like Linux does shouldn't
hurt but is a bit safer.

MFC after: 1 day


161675 28-Aug-2006 davidxu

Implement casuword32, compare and set user integer, thank Marcel Moolenarr
who wrote the IA64 version of casuword32.


161028 06-Aug-2006 alc

Eliminate the unnecessary acquisition and release of the page queues lock
from pmap_pinit().


160889 01-Aug-2006 alc

Complete the transition from pmap_page_protect() to pmap_remove_write().
Originally, I had adopted sparc64's name, pmap_clear_write(), for the
function that is now pmap_remove_write(). However, this function is more
like pmap_remove_all() than like pmap_clear_modify() or
pmap_clear_reference(), hence, the name change.

The higher-level rationale behind this change is described in
src/sys/amd64/amd64/pmap.c revision 1.567. The short version is that I'm
trying to clean up and fix our support for execute access.

Reviewed by: marcel@ (ia64)


160801 28-Jul-2006 jhb

Retire SYF_ARGMASK and remove both SYF_MPSAFE and SYF_ARGMASK. sy_narg is
now back to just being an argument count.


160798 28-Jul-2006 jhb

Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to
mark system calls as being MPSAFE:
- Stop conditionally acquiring Giant around system call invocations.
- Remove all of the 'M' prefixes from the master system call files.
- Remove support for the 'M' prefix from the script that generates the
syscall-related files from the master system call files.
- Don't explicitly set SYF_MPSAFE when registering nfssvc.


160773 27-Jul-2006 jhb

Unify the checking for lock misbehavior in the various syscall()
implementations and adjust some of the checks while I'm here:
- Add a new check to make sure we don't return from a syscall in a critical
section.
- Add a new explicit check before userret() to make sure we don't return
with any locks held. The advantage here is that we can include the
syscall number and name in syscall() whereas that info is not available
in userret().
- Drop the mtx_assert()'s of sched_lock and Giant. They are replaced by
the more general checks just added.

MFC after: 2 weeks


160312 12-Jul-2006 jhb

Simplify the pager support in DDB. Allowing different db commands to
install custom pager functions didn't actually happen in practice (they
all just used the simple pager and passed in a local quit pointer). So,
just hardcode the simple pager as the only pager and make it set a global
db_pager_quit flag that db commands can check when the user hits 'q' (or a
suitable variant) at the pager prompt. Also, now that it's easy to do so,
enable paging by default for all ddb commands. Any command that wishes to
honor the quit flag can do so by checking db_pager_quit. Note that the
pager can also be effectively disabled by setting $lines to 0.

Other fixes:
- 'show idt' on i386 and pc98 now actually checks the quit flag and
terminates early.
- 'show intr' now actually checks the quit flag and terminates early.


159627 15-Jun-2006 ups

Remove mpte optimization from pmap_enter_quick().
There is a race with the current locking scheme and removing
it should have no measurable performance impact.
This fixes page faults leading to panics in pmap_enter_quick_locked()
on amd64/i386.

Reviewed by: alc,jhb,peter,ps


159303 05-Jun-2006 alc

Introduce the function pmap_enter_object(). It maps a sequence of resident
pages from the same object. Use it in vm_map_pmap_enter() to reduce the
locking overhead of premapping objects.

Reviewed by: tegge@


159031 29-May-2006 alc

MFalpha/amd64/arm/ia64
Retire pmap_track_modified(). We no longer need it because we do not
create managed mappings within the clean submap. To prevent regressions,
add assertions blocking the creation of managed mappings within the clean
submap.


158651 16-May-2006 phk

Since DELAY() was moved, most <machine/clock.h> #includes have been
unnecessary.


157896 20-Apr-2006 imp

Set the rid for any resource obtained from rman_reserve_resource.

Reviewed by: wollman, jmg (as were the other commits fixing this problem)


157825 17-Apr-2006 marius

- Since critical sections no longer raise the processor interrupt level to
above what's used for fast interrupts, only interrupts with the level of
the interrupt which led to calling intr_fast() (which is used with both
fast and ithread interrupts) are blocked while in that function. Thus
intr_fast() can be preempted by a fast interrupt (which are of a higher
level than ithread interrupts) while servicing an ithread interrupt. This
can lead to a stale pointer to the head of the active interrupt requests
list when back in the ithread interrupt invocation of intr_fast(), in turn
resulting in corruption of the interrupt request lists and consequently
in a panic. Solve this be turning off interrupts in intr_fast() before
reading the pointer to the head of the active list rather than after. [1]
- Add a KASSERT in intr_fast() which asserts that ir_func is non-zero before
calling it. [1]
- Increment interrupt stats after calling the handlers rather than before.
This reduces the delay until direct and fast handlers are serviced, in my
testings by 30% on average for the direct tick interrupt handler, in turn
resulting in less clock drift.

PR: 94778 [1]
Submitted by: Andrew Belashov [1]
MFC after: 2 weeks


157513 04-Apr-2006 marius

For USIII CPUs the type of the trap caused by peeking/poking non-existent
PCI devices apparently was changed from a special deferred trap with TPC
pointing to the membar #Sync following the failing load/store instruction
to a precise trap with TPC pointing to the failing load/store instruction.
Thus remove the check the check whether TPC points to a membar #Sync in
case of a data access trap as it's off-by-one for USIII CPUs and it should
be sufficient to check whether the trap happend while in fasword*() to
properly detect traps caused by peeking/poking. This also corresponds to
what other OSs do. Note that also only the USIIi manual suggests to check
the TPC for such traps while the USII one doesn't (in the public USIII
manual device peeking/poking isn't mentioned at all).


157445 03-Apr-2006 marius

- s,tramoline,trampoline, in a comment.
- Use FBSDID in trap.c
- Make the global trap_sig[] static as it's not used outside of trap.c.
- In sendsig() remove an unused variable.
- In trap() sync with the other archs; for fast data access MMU miss and
data access protection traps set ksi_addr to the SFAR reg which contains
the faulting address and otherwise to the TPC reg. Generally the TCP reg
contains the address of the instruction that caused the exception, except
for fast instruction access traps (and some others; more refinement may
be needed here) it also contains the faulting address.
Previously sendsig() always set si_addr to the SFAR reg which is wrong
for most traps.
- In sendsig() add support for FreeBSD old-style signals.

These changes are inspired by kmacy's sun4v changes and allow libsigsegv
to build on FreeBSD/sparc64, but it doesn't pass all checks and tests it
actually should, yet.

MFC after: 5 days


157443 03-Apr-2006 peter

Remove the unused sva and eva arguments from pmap_remove_pages().


157240 29-Mar-2006 marius

- We only lock the local per-CPU page in the local dTLB, so accessing the
foreign per-CPU pages in cpu_ipi_send() in order to get the module IDs
of the other CPUs can cause a page fault. If this happens when doing a
TLB shootdown while dealing with another page fault this causes a panic
due to the recursive page fault. As I don't spot other code that assumes
or requires that accessing foreign per-CPU pages must not page fault
solve this by adding a statically allocated (and therefore locked in the
kernel pages) array which establishes a FreeBSD CPU ID -> module ID
relation and use that in cpu_ipi_selected() (instead of statically
allocating the per-CPU pages which would just waste memory on say a dual
CPU machine as sun4u theoretically supports up to 128 CPUs or wasting
dTLB slots for the foreign per-CPU pages). [1]
- Fix a potential race in cpu_ipi_send(); as we don't serialize the access
to cpu_ipi_selected() between MI and MD use (only MI-MI and MD-MD) we
might catch the NACK bit caused by sending another IPI. Solve this by
checking the NACK bit in the contents of the interrupt dispatch status
reg read while interrupts were still turned off instead of reading that
reg anew after interrupts were turned on again. This is also what the
CPU docs suggest to do.
- Add a workaround for the SpitFire erratum #54 bug (affecting interrupt
dispatch). While public info regarding what this CPU bug actually causes
is not available testing shows that with the workaround in place it's
less likely to get a "couldn't send ipi" panic, it doesn't solve these
panics entirely though. [2]

Reported by: kris [1]
Some clue from: kmacy [1]
Info from: Linux, OpenSolaris [2]
Additional testing by: kris
MFC after: 3 days


157227 28-Mar-2006 marius

- Add a comment describing why tick_init() is called before cninit().
- Fix a typo in another comment.


157226 28-Mar-2006 marius

- Move the check for too high HZ values from tick_init() to tick_start()
as we have to call tick_init() before cninit() in order to provide the
low-level console drivers with a working DELAY() which in turn means we
cannot use panic() in tick_init().
- s,to high, too high, in the panic string

Inspired by: kmacy's sun4v changes
MFC after: 3 days


156122 28-Feb-2006 brueffer

Fix a c/p error.

Obtained from: The TrustedBSD Project
Approved by: rwatson (mentor)


155839 19-Feb-2006 marius

- Don't bother traversing trap frames in stack_save(). This fixes panics
when option DEBUG_LOCKS is used. Trap frames are determined by checking
whether the caller was one of the tl0_*() or tl1_*() asm functions via
a newly added pair of dummy symbols in exception.S which mark the begin
and end of these functions. The tl_trap_* pair marks those in the special
.trap section and the tl_text_* in the regular .text section. Because
of their performance penalty db_search_symbol()/db_symbol_values() and
linker_ddb_search_symbol()/linker_ddb_symbol_values() aren't used here
for determining the caller, with db_search_symbol()/db_symbol_values()
additionally not being reentrant.
- For consistency, change db_backtrace() to also use the new markers for
determining the tl0_*() and tl1_*() asm functions instead of bcmp()'ing
the symbol name.
- Use FBSDID in db_trace.c.

PR: 93226
Based on a patch by: Antoine Brodin <antoine.brodin@laposte.net>
Ok'ed by: jhb


155814 18-Feb-2006 rwatson

Add system call auditing support for sparc64.

Submitted by: brueffer
Obtained from: TrustedBSD Project


155726 15-Feb-2006 marius

For E250 and E450 enable the watchdog part of the MK48Txx as it just
works there.

MFC after: 3 days


155680 14-Feb-2006 jhb

Fix the hw.realmem sysctl. The global realmem variable is a count of
pages, not a count of bytes. The sysctl handler for hw.realmem already
uses ctob() to convert realmem from pages to bytes. Thus, on archs that
were storing a byte count in the realmem variable, hw.realmem was inflated.

Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha)
MFC after: 3 days


155534 11-Feb-2006 phk

CPU time accounting speedup (step 2)

Keep accounting time (in per-cpu) cputicks and the statistics counts
in the thread and summarize into struct proc when at context switch.

Don't reach across CPUs in calcru().

Add code to calibrate the top speed of cpu_tickrate() for variable
cpu_tick hardware (like TSC on power managed machines).

Don't enforce monotonicity (at least for now) in calcru. While the
calibrated cpu_tickrate ramps up it may not be true.

Use 27MHz counter on i386/Geode.

Use TSC on amd64 & i386 if present.

Use tick counter on sparc64


155455 08-Feb-2006 phk

Simplify system time accounting for profiling.

Rename struct thread's td_sticks to td_pticks, we will need the
other name for more appropriately named use shortly. Reduce it
from uint64_t to u_int.

Clear td_pticks whenever we enter the kernel instead of recording
its value as reference for userret(). Use the absolute value of
td->pticks in userret() and eliminate third argument.


155444 07-Feb-2006 phk

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


154419 16-Jan-2006 kris

Correct typos (s/OFERFLOW/OVERFLOW/).

Reviewed by: jhb


153958 01-Jan-2006 scottl

Use the correct units when handling the hw.physmem tunable.


153940 31-Dec-2005 netchild

MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
this allows to try different coloring algorithms without the need to
touch every file [1]
- make the page queue tuning values readable: sysctl vm.stats.pagequeue
- autotuning of the page coloring values based upon the cache size instead
of options in the kernel config (disabling of the page coloring as a
kernel option is still possible)

MD changes:
- detection of the cache size: only IA32 and AMD64 (untested) contains
cache size detection code, every other arch just comes with a dummy
function (this results in the use of default values like it was the
case without the autotuning of the page coloring)
- print some more info on Intel CPU's (like we do on AMD and Transmeta
CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by: Chad David <davidc@acns.ab.ca> [1]
Reviewed by: alc, arch (in 2004)
Discussed with: alc, Chad David, arch (in 2004)


153741 26-Dec-2005 sobomax

Remove kern.elf32.can_exec_dyn sysctl. Instead extend Brandinfo structure
with flags bitfield and set BI_CAN_EXEC_DYN flag for all brands that usually
allow executing elf dynamic binaries (aka shared libraries). When it is
requested to execute ET_DYN elf image check if this flag is on after we
know the elf brand allowing execution if so.

PR: kern/87615
Submitted by: Marcin Koziej <creep@desk.pl>


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153504 18-Dec-2005 marcel

Make our ELF64 type definitions match standards. In particular this
means:
o Remove Elf64_Quarter,
o Redefine Elf64_Half to be 16-bit,
o Redefine Elf64_Word to be 32-bit,
o Add Elf64_Xword and Elf64_Sxword for 64-bit entities,
o Use Elf_Size in MI code to abstract the difference between
Elf32_Word and Elf64_Word.
o Add Elf_Ssize as the signed counterpart of Elf_Size.

MFC after: 2 weeks


153175 06-Dec-2005 marius

Use <sys/ktr.h> directly in .S files instead of exporting the
KTR_* class macros via genassym.c. Together with sys/sys/ktr.h
rev. 1.34 this has the desired side-effect of providing a default
value for KTR_COMPILE. Thus this fixes warnings from -Wundef
regarding KTR_COMPILE not being defined for .S files.

Requested by: ru
Reviewed by: ru


152961 30-Nov-2005 marius

Remove superfluous bzero()'ing of the softc.


152960 30-Nov-2005 marius

Remove superfluous inclusion of upa.h.


152630 20-Nov-2005 alc

Eliminate pmap_init2(). It's no longer used.


152224 09-Nov-2005 alc

Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon. Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails. Previously, reclamation
destroyed mappings to both inactive and active pages. get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs. The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time. This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge


152022 03-Nov-2005 jhb

Add stoppcbs[] arrays on Alpha and sparc64 and have each CPU save its
current context in the IPI_STOP handler so that we can get accurate stack
traces of threads on other CPUs on these two archs like we do now on i386
and amd64.

Tested on: alpha, sparc64


151658 25-Oct-2005 jhb

Reorganize the interrupt handling code a bit to make a few things cleaner
and increase flexibility to allow various different approaches to be tried
in the future.
- Split struct ithd up into two pieces. struct intr_event holds the list
of interrupt handlers associated with interrupt sources.
struct intr_thread contains the data relative to an interrupt thread.
Currently we still provide a 1:1 relationship of events to threads
with the exception that events only have an associated thread if there
is at least one threaded interrupt handler attached to the event. This
means that on x86 we no longer have 4 bazillion interrupt threads with
no handlers. It also means that interrupt events with only INTR_FAST
handlers no longer have an associated thread either.
- Renamed struct intrhand to struct intr_handler to follow the struct
intr_foo naming convention. This did require renaming the powerpc
MD struct intr_handler to struct ppc_intr_handler.
- INTR_FAST no longer implies INTR_EXCL on all architectures except for
powerpc. This means that multiple INTR_FAST handlers can attach to the
same interrupt and that INTR_FAST and non-INTR_FAST handlers can attach
to the same interrupt. Sharing INTR_FAST handlers may not always be
desirable, but having sio(4) and uhci(4) fight over an IRQ isn't fun
either. Drivers can always still use INTR_EXCL to ask for an interrupt
exclusively. The way this sharing works is that when an interrupt
comes in, all the INTR_FAST handlers are executed first, and if any
threaded handlers exist, the interrupt thread is scheduled afterwards.
This type of layout also makes it possible to investigate using interrupt
filters ala OS X where the filter determines whether or not its companion
threaded handler should run.
- Aside from the INTR_FAST changes above, the impact on MD interrupt code
is mostly just 's/ithread/intr_event/'.
- A new MI ddb command 'show intrs' walks the list of interrupt events
dumping their state. It also has a '/v' verbose switch which dumps
info about all of the handlers attached to each event.
- We currently don't destroy an interrupt thread when the last threaded
handler is removed because it would suck for things like ppbus(8)'s
braindead behavior. The code is present, though, it is just under
#if 0 for now.
- Move the code to actually execute the threaded handlers for an interrrupt
event into a separate function so that ithread_loop() becomes more
readable. Previously this code was all in the middle of ithread_loop()
and indented halfway across the screen.
- Made struct intr_thread private to kern_intr.c and replaced td_ithd
with a thread private flag TDP_ITHREAD.
- In statclock, check curthread against idlethread directly rather than
curthread's proc against idlethread's proc. (Not really related to intr
changes)

Tested on: alpha, amd64, i386, sparc64
Tested on: arm, ia64 (older version of patch by cognet and marcel)


151354 15-Oct-2005 davidxu

Fix compiling.


151316 14-Oct-2005 davidxu

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


149925 10-Sep-2005 marcel

Move the prototypes of db_md_set_watchpoint(), db_md_clr_watchpoint()
and db_md_list_watchpoints() to ddb/ddb.h.


149768 03-Sep-2005 alc

Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.


148823 07-Aug-2005 marius

The system tick _compare_ register of USIII CPUs and up is ASR25, not
ASR24 (which is the system tick register).


148666 03-Aug-2005 jeff

- Add support for saving stack traces and displaying them via printf(9)
and KTR.

Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
Concept code from: Neal Fachan <neal@isilon.com>


147889 10-Jul-2005 davidxu

Validate if the value written into {FS,GS}.base is a canonical
address, writting non-canonical address can cause kernel a panic,
by restricting base values to 0..VM_MAXUSER_ADDRESS, ensuring
only canonical values get written to the registers.

Reviewed by: peter, Josepha Koshy < joseph.koshy at gmail dot com >
Approved by: re (scottl)


147217 10-Jun-2005 alc

Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields. Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations. Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel


146982 04-Jun-2005 marius

After some input from bde@ and rereading the datasheet use a MTX_SPIN
mutex instead of a MTX_DEF one in order to defer preemption while
reading the date and time registers. If we don't manage to read them
within the time slot where we are guaranteed that no updates occur we
might actually read them during an update in which case the output is
undefined.


146794 29-May-2005 marcel

Create nexus in configure_first() instead of in configure(). This
makes sure that sysinit tasks that run after configure_first(),
but before configure() have a nexus to hang devices off.


146792 29-May-2005 marcel

Call cninit_finish() from configure_final().


146726 28-May-2005 alc

pmap_enter() no longer requires Giant. Therefore, stop acquiring and
releasing it in pmap_enter_quick().

MFC after: 3 weeks


146480 21-May-2005 marius

o creator(4):
- Use register macros instead of magic values in the code. [1]
- Check the return values of OF_getprop() and other stuff that actually
can fail.
- Let the unimplemented video driver methods return ENODEV rather
than 0 so other code isn't tricked into thinking a certain operation
was successfull. In case of e.g. the video driver creator_ioctl()
this caused vidcontrol(1) to return random garbage information.
Remove the TODO macros in the unimplemented video driver methods
which did a printf("%s: unimplemented\n", __func__). Under certain
circumstances these managed to invoke a printf() when a low-level
console device wasn't attached, yet, causing a Fast Data Access MMU
Miss. These macros were only really usefull for development anyway.
- Set the struct video_adapter and struct video_info va_flags and
vi_flags etc. as appropriate.
- In creator_configure() don't rely on hitting the node which is the
chosen console device first when searching the OFW tree for adapters
compatible with this driver. Instead just check whether the chosen
console device is a viable target for this driver. Targets that are
not the console (including additional cards in multi-head configs)
will be attached through creator_upa_attach(). I think this how the
code in creator_configure() was actually meant to work.
Honour the VIO_PROBE_ONLY flag and don't initialise and register the
console device twice when creator_configure() is called a second time
during sc_probe_unit().
Let creator_configure() return the number of the found adapters,
i.e. 1 in case probing succeeds, as it's expected. The return values
of video adapter configure functions however currently aren't checked
so this doesn't make a difference at the moment.
- In creator_upa_attach() don't rely on probing and attaching the
adapter which is the console first, in case there are multiple
adpaters and one of them is the console this could lead into using
the video adapter unit 0 twice.
- Make the check for DACs with inverted cursor control a bit more
precise and actually honour that information when turning the cursor
on or off. Add a helper function creator_cursor_enable() for this
in order to keep code duplication low. [1]
- Don't bother with faking a hardware cursor in case a device is the
console. Apparently this was meant to start kernel output right after
where the firmware left. In general this isn't worth the fuzz and
also had no real effect as creator_set_mode() did clear the screen
in any case, not just in case a device was not the console.
- Implement creator_fill_rect() and use it to actually blank the
display in creator_blank_display() when the mode is V_DISPLAY_BLANK,
moving blanking the display out of creator_set_mode(). Use it also
to implement creator_set_border() so the border can be re-drawn
when switching to a VTY from X, exiting X, etc. (which leaves us
with a black border most of the time).
- Implement the video driver creator_ioctl(), moving the implementation
of the IOCTL interface from the fbN CDEV version of creator_ioctl()
into the video driver version and use the latter to implement the
former. Use fb_commonioctl() to handle most of the FBIO IOCTLs.
This gives programs like vidcontrol(1) which use the video driver
creator_ioctl() a chance of working.
Implement turning off the cursor via the FBIOSCURSOR IOCTL, which
Xorg uses to in order to inform the OS that it's taking over the
cursor. In creator_putm() check whether the cursor is enabled and
(re-)install it if necessary, moving installing the cursor out of
creator_init() and into a helper function creator_cursor_install().
This fixes the missing mouse pointer when switching to a VTY from X,
exiting X, etc.
- Some clean-up (remove unused/useless code, etc.).

o sparc64/creator/creator_upa.c / sparc64/sparc64/sc_machdep.c:
- Attach syscons(4) as an own pseudo-device on the nexus rather than
directly in creator_upa_attach(), similiar to attaching syscons(4)
as a pseudo-device on isa(4) on other archs. This makes it a whole
lot easier to do the right thing in multi-head configs, especially
with different types of graphics adapters. [2]
- Set SC_AUTODETECT_KBD by default so USB keyboards work out of the
box. [2]

Based on/obtained from: Xorg 'ffb' driver [1]
Based on/obtained from: FreeBSD/powerpc [2]


146474 21-May-2005 marius

- MFpowerpc: sys/powerpc/powerpc/nexus.c rev. 1.7 (partial)
Use bus_generic_probe() and add a bus_add_child() interface method to
allow device drivers to use the identify method to add themselves if
need be (e.g. syscons(4)).
- Use FBSDID.


146473 21-May-2005 marius

- Make sure that the OFW address properties that are going to be decode
consist of the expected number of address and size cells (we can't use
dynamic arrays here because at the point in the boot process when this
code is used malloc() doesn't work, yet). This fixes a Fast Data Access
MMU Miss when uart(4) (erroneously) calls OF_decode_addr() to decode
the address of PS/2 keyboards. PS/2 keyboards use a different and also
undocumented scheme at the first parent node than mapping at 'ranges'
properties. It's however not worth implementing that other scheme and
actually also fits atkbdc(4) better to just start at the first parent
node of PS/2 keyboards which is the 8042 controller (I have atkbdc(4)
working that way).
- Use FBSDID.

MFC after: 1 month


146417 19-May-2005 marius

o mc146818(4):
- Add locking.
- Account for if the MC146818_NO_CENT_ADJUST flag is set we don't need
to check wheter year < POSIX_BASE_YEAR.
- Add some comments about mapping the day of week from the range the
generic clock code uses to the range the chip uses and which I meant
to add in the initial version.
- Minor clean-up, use __func__ instead of hardcoded function names in
error strings.

o in the rtc(4) front-end additionally:
- Don't leak resources in case mc146818_attach() fails.
- Account for ebus(4) defaulting to SYS_RES_MEMORY for the memory
resources since ebus.c rev. 1.22.


146416 19-May-2005 marius

- Add locking.
- Add support for storing the century in MK48TXX_WDAY_CB on MK48Txx with
extended registers when the MK48TXX_NO_CENT_ADJUST flag is set (and which
is termed somewhat confusing as it actually means don't manually adjust
the century in the driver).
- Add the MI part of interfacing the watchdog functionality of MK48Txx with
extended registers with watchdog(9). This is inspired by the SunOS/Solaris
drivers for the 'eeprom' devices also having watchdog support. I actually
expected this to work out of the box on Sun Exx00 machines with 'eeprom'
devices which have a 'watchdog-enable' property. On terminal count of the
the watchdog timer however only the MK48TXX_FLAGS_WDF bit rises but the
reset signal and the interrupt respectively (depending on whether the
MK48TXX_WDOG_WDS bit of the chip and the MK48TXX_WDOG_ENABLE_WDS flag
of the driver respectively is set) goes nowhere. Apparently passing the
reset signal on to the WDR line of the CPUs has to be enabled somewhere
else but we don't have documentation for the Exx00 specific controllers.
I decided to commit this nevertheless so it can be enabled in the eeprom(4)
front-end later in e.g. 6.0-STABLE without breaking the API. Besides the
Exx00 the watchdog part of the MK48Txx should also work on E250 and E450.
Possibly also without extra fiddling on these machines but I haven't
found someone willing to give it a try on such a machine so far.
- Use uintXX_t instead of u_intXX_t, use __func__ instead of hardcoded
function names in error strings.


146411 19-May-2005 marius

- Collapse eeprom_ebus.c and eeprom_sbus.c into eeprom.c and
eeprom_ebus_attach() and eeprom_sbus_attach() into eeprom_attach()
respectively. Since the introduction of the ofw_bus interface some
time ago and now that ebus(4) also uses SYS_RES_MEMORY for the
memory resources since ebus.c rev. 1.22 there is no longer a
need to have separate front-ends for ebus(4), fhc(4) and sbus(4).
- Fail gracefully instead of panicing when the model can't be
determined.
- Don't leak resources when mk48txx_attach() fails.
- Use FBSDID.


145433 23-Apr-2005 davidxu

Change cpu_set_kse_upcall to more generic style, so we can reuse it
in other codes. Add cpu_set_user_tls, use it to tweak user register
and setup user TLS. I ever wanted to merge it into cpu_set_kse_upcall,
but since cpu_set_kse_upcall is also used by M:N threads which may
not need this feature, so I wrote a separated cpu_set_user_tls.


145153 16-Apr-2005 marius

- MFi386: sys/i386/i386/intr_machdep.c rev. 1.11
Don't use atomic ops to increment interrupt stats.
On sparc64 this reduces delay until tick interrupts are service by 1/10th
on average. In turn this reduces the clock drift caused by these delays
so there's less drift which has to be compensated in tick_hardclock().
This includes switching from atomically incrementing the global cnt.v_intr
to the asm equivalent of PCPU_LAZY_INC(cnt.v_intr) in exception.S
- Correct some comments to match the registers actually used.
- Correct some format specifiers, interrupt levels passed in are u_int.
- Use FBSDID.

Ok'ed by: jhb


145152 16-Apr-2005 marius

Some changes to intr_execute_handlers():
- Fix NULL pointer dereferences caused when an ithread or a handler is
NULL which happens when a stray interrupt triggers after the respective
device interrupt was torn down.
- Remove the critical section around INTR_FAST handlers which actually
was a nested critical section. Both tl0_intr() and tl1_intr() already
enter a critical section for calling intr_execute_handlers().

MFC after: 3 days


145151 16-Apr-2005 marius

- In sparc64_init() remove the call to tick_stop(). There's no need to
call tick_stop() again after tick_init() as tick interrupts already
have been disabled as part of tick_init().
- In spinlock_enter() replace the magic value for PIL TICK with the
respective macro.
- Use FBSDID.


145150 16-Apr-2005 marius

- Add a workaround for a bug in BlackBird CPUs (said to be part of the
SpitFire erratum #54) which can cause writes to the TICK_CMPR register
to fail. This seems to fix the dying clocks problem reported by jhb@
and kris@. [1]
- In tick_start() don't reset the tick counter of the boot processor to
zero. It's initially reset in _start() and afterwards but _before_
tick_start() is called on the BSP the APs synchronise with the tick
counter of the BSP in mp_startup(). Resetting the tick counter of the
BSP in tick_start() probably also was the cause of problems seen when
using the CPU tick counter as timecounter on SMP machines.
Not resetting the tick counter of the BSP in mp_startup() makes the
tick counters and tick interrupts between the BSP and APs be pretty
much in sync as it's supposed to be. This also means there's no longer
a real reason to have separate tick_start() and tick_start_ap() so
merge them and zap tick_start_ap(). This is also a first step in
simplifying the interface to the tick counters in preparation to use
alternate clock hardware where available.
- Switch to the algorithm used on FreeBSD/ia64 for updating the tick
interrupt register and which compensates the clock drift caused by
varying delays between when the tick interrupts actually trigger and
when they are serviced. Not compensating the clock drift mainly hurts
interactive performance especially when using WITNESS. [2]
For further information about the algorithm also see the commit log
of sys/ia64/ia64/interrupt.c rev. 1.38.
On sparc64 the sysctls for monitoring the behaviour of the tick
interrupts are machdep.tick.adjust_edges, machdep.tick.adjust_excess,
machdep.tick.adjust_missed and machdep.tick.adjust_ticks.
- In tick_init() just use tick_stop() for stopping the tick interrupts
until a proper handler is set up later. This also stops the system
tick interrupt on USIII systems earlier.
- In tick_start() check for a rough upper limit of HZ.
- Some minor changes, e.g. use FBSDID, remove unused headers, etc.

Info obtained from: Linux [1]
Ok'ed by: marcel [2]
Additional testing by: kris (earlier version of the workaround), jhb
X-MFC after: 3 days [1]


145085 14-Apr-2005 jhb

Close a race I introduced in the spinlock_* changes. We need to finish
disabling interrupts before updating the saved pil in the thread. If we
save the value first then it can be clobbered if an interrupt comes in
and the interrupt handler tries to acquire a spin lock.

Submitted by: marius


144971 12-Apr-2005 jhb

Use PCPU_LAZY_INC() for cnt.v_{intr,trap,syscalls} rather than atomic
operations in some places and simple non-per CPU math in others.


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more


143765 17-Mar-2005 iedowse

Split configure() into 3 separate steps like we do on other
architectures. This makes it possible to insert hooks before and
after the device attachment step.


143073 03-Mar-2005 marius

Remove the transition aid for the change of the sparc64 default system
call vector which was added in rev. 1.52. This change was done way before
sparc64 switched to a 64-bit time_t so all binaries are expected to have
been recompiled by now.


143024 02-Mar-2005 marius

- Allow multiple INTR_FAST handlers for the same source. The motivation
for this are the on-board SCCs and UARTs that use a shared IRQ. [1]
- Rework the interrupt counting code to account for shared interrupts. [1]
- In case ithread_add_handler() failed in inthand_add() just return with
the error code instead of setting up a non-fast handler regardless or
setting up a non-fast handler instead of a fast handler. I can't think
of a situation where the former behaviour would do the right thing.

Reviewed by: marcel [1]
Based on: sys/i386/i386/intr_machdep.c [1]


143021 02-Mar-2005 marius

Assorted style fixes and minor changes:
- Use FBSDID.
- Use uintXX_t instead of u_intXX_t.
- Be consistent with white-space.
- Mark some globals as static.
- Add a missing prototype.
- Remove a unused variable.
- etc.


142956 01-Mar-2005 wes

Attempt to doff the pointy hat: implement 'hw.realmem' on remaining
architectures. Pointed out by O'Brien, ScottL via email.

Reviewed by: obrien (various)


142869 01-Mar-2005 alc

Use the kernel pmap's lock to guarantee that only one thread at a time is
using either pmap_temp_map_1 or pmap_temp_map_2.

Tested by: kris@


142001 17-Feb-2005 marius

UltraSparc II[e,i] based systems come up with the tick compare register
loaded, the tick interrupt enabled and a handler that resets the tick
counter on every tick interrupt. While this isn't documented this can
cause DELAY() to wait for a value the tick counter will not reach when
used in early boot, i.e. before cpu_initclocks() is called, depending
on when in the cycle DELAY() is called, the delay value and the value
the tick compare register is set to. The excessive use of DELAY() in
uart(4) when probing Sun keyboards seems to always manage to trigger
this, resulting in a hang during boot.
Disable the tick interrupt in tick_init(), which is called early in
sparc64_init(), until the interrupt is enabled again in tick_start(),
called by cpu_initclocks(), with our own handler. This fixes the hang
during probing Sun keyboards on AXi boards and Ultra 10, with other
machines like Ultra 5 probably being affected but not tested.

Additional testing by: Matthias Muthmann
MFC after: 1 week


141753 12-Feb-2005 marius

- Re-write OF_decode_addr() with a bus-neutral approach, adding support
for nodes hanging off of Central (untested), FireHose (untested) and
PCI (tested) busses.
- Add an additional parameter to OF_decode_addr() which specifies the
index of the register bank to decode.

These should allow to eventually add support for the Z8530 hanging off of
FireHose to uart(4) and to write support for PCI-based graphics adapters.

Suggested by: tmm (back in '03)


141712 12-Feb-2005 alc

Add lock assertion.

Tested by: jhb


141378 06-Feb-2005 njl

Finish the job of sorting all includes and fix the build by including
malloc.h before proc.h on sparc64. Noticed by das@

Compiled on: alpha, amd64, i386, pc98, sparc64


141370 05-Feb-2005 alc

Acquire the source pmap's lock in pmap_copy().


141249 04-Feb-2005 njl

Sort includes a little so that bus.h comes before cpu.h (for device_t).


141237 04-Feb-2005 njl

Add an implementation of cpu_est_clockrate(9). This function estimates the
current clock frequency for the given CPU id in units of Hz.


141084 31-Jan-2005 scottl

Yikes! Fix a typo in a function name that managed to occur twice.

Submitted by: yongari


140485 19-Jan-2005 jhb

Add a small API to manage the MD user trap structures. Specifically, we
now use a pool mutex to manage the reference counts. This fixes races
resulting in use-after-free.

Tested by: kris, David Cornejo dave at dogwood dot com
Reported by: bmilekic's MemGuard
MFC after: 1 week


140281 15-Jan-2005 scottl

Add the bus_dmamap_load_mbuf_sg() function to sparc64.


139825 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


139264 24-Dec-2004 scottl

Identify USIIIi processors.

Submitted by: Gavin Atkinson
PR: 75468


139241 23-Dec-2004 alc

Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.


138897 15-Dec-2004 alc

In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead. To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep. Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock. (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)


138697 11-Dec-2004 alc

Pass VM_ALLOC_NOBUSY to vm_page_grab() so that we don't have to call
vm_page_flag_clear(PG_BUSY). The object lock is held the entire time.
Thus, whether or not the PG_BUSY flag is set is invisible to others.


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.

Tested on: alpha, amd64, i386, ia64, sparc64


138129 27-Nov-2004 das

Don't include sys/user.h merely for its side-effect of recursively
including other headers.


137917 20-Nov-2004 das

Remove references to U area and garbage collect includes.

Reviewed by: arch@


137912 20-Nov-2004 das

U areas are going away, so don't allocate one for process 0.

Reviewed by: arch@


137822 17-Nov-2004 marius

Add a front-end for the `rtc' device which is a MC146818 compatible
clock found on the ISA bus (some USIIe, USIIi and USIIIi models) and
EBus (USIII models) instead of a MK48Txx clock.

Testet by: Matthew T. Lager" <freebsd@trinetworks.com> on Sun Fire V100,
Xavier Beaudouin <kiwi@oav.net> on Netra X1 (initial version)


137813 17-Nov-2004 marius

o Sync with the NetBSD mk48txx driver (the result simplyfies some changes
I have in mind for the genclock interface):
- Recognize the MK48T18 as well (differs from the MK48T08 only in
packaging options and voltages).
- Allow MD code to provide functions for reading/writing NVRAM/RTC
locations.
If passed NULL, the old behaviour using bus_space_{read,write}_1() is
used. Otherwise, all access to the chip goes via the MD functions.
This is necessary for mvmeppc boards where the mk48txx NVRAM/RTC is
not directly addressable.
- Cleanup MI mk48txx(4) todclock driver:
- Prepare mk48txxvar.h and leave only register definitions in
mk48txxreg.h.
- Define struct mk48txx_softc as usual devices and allocate necessary
members in it.
- Change mk48txx_attach() to only take a device_t.
o While converting the sparc64 eeprom driver to the above changes:
- Remove some dead code and stale comments.
- Use the NVRAM size provided by the mk48txx driver instead of hardcoding
it as suggested by a comment.
- Add a comment about why it doesn't make much sense to read the hostid
directly from the NVRAM except for displaying it when attaching.
- Don't print the hostid if it reads all zero because it's stored
elsewhere.


137376 08-Nov-2004 alc

Correct a typo in the previous revision.


137372 08-Nov-2004 alc

Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc().
Change the spelling of the "catch" option to be consistent with the new
options. Implement the "no wait" option. An implementation of the "CPU
private" for i386 will be committed at a later date.


137168 03-Nov-2004 alc

The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy(). Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set. Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition. Typically,
this is when it is undergoing I/O.


137117 01-Nov-2004 jhb

- Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant. The variable
can be examined and changed in ddb as '$lines'. Setting the variable to
0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
newlines and carriage returns so that one can rub out content on the
current line via '\r \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
the routine exits.
- Add some aliases to the simple pager to make it more compatible with
more(1): 'e' and 'j' do a single line. 'd' does half a page, and
'f' does a full page.

MFC after: 1 month
Inspired by: kris


136325 09-Oct-2004 kensmith

Flush the register windows before we start changing the context.

Submitted by: Andrew Belashov <bel (at) orel.ru> (slightly modified)
Reviewed by: jake


135972 30-Sep-2004 kensmith

This along with v1.6 of counter.c fixes some timecounter issues on
MP machines (hopefully). CPU timers are OK on UP machines but we
don't keep the timers in sync on MP machines so if the CPU's timer
is chosen as the primary timecounter it's possible for time to
not be monotonically increasing because different CPU's counters
may be used at different times. But the CPU's counters are otherwise
one of the higher quality counters available. So, on UP machines
we'll use a relatively high quality value but on MP machines we'll
use a quality that should prevent the CPU's counters from being chosen.

Requested by: green (who did the first version of the patch)
Reviewed by: marius, green
MFC after: 1 week


135971 30-Sep-2004 kensmith

Set the tc_quality field of the struct before calling tc_init(), since
the structure space had been obtained from malloc() its contents is
random garbage. The choice of value being set is part of a larger effort
to solve some timecounter issues on MP machines (while working on that
we noticed this problem).

Noticed by: marius
Reviewed by: marius, green
MFC after: 3 days


135885 28-Sep-2004 kensmith

Add an assertion that the pcb_nsaved field of the pcb be less than
MAXWIN to the register window manipulation functions - rwindow_load()
calls rwindow_save() so this one addition should take care of both.
This should help find places that pcb_nsaved doesn't get initialized
properly.

Suggested by: jake


135855 27-Sep-2004 kensmith

Some minor print/panic message cleanups.


135853 27-Sep-2004 kensmith

Initialize the count of saved register windows to 0 in the pcb created
for the new thread. The rest of the fields in the pcb wind up being
written to before they're read as a normal part of the pcb usage but
this field may be read upon return to userland, having it be uninitialized
garbage is bad.

Submitted by: Andrew Belashov (bel at orel dot ru)
Reviewed by: jake
MFC after: 3 days


135529 20-Sep-2004 jhb

- Add support for "paging" in stack trace output. That is, when you do
a stack trace from ddb, the output will pause with a '--More--' prompt
every 18 lines. If you hit Enter, it will print another line and prompt
again. If you hit space it will output another page and then prompt.
If you hit 'q' or 'x' it will abort the rest of the stack trace.
- Fix the sparc64 userland stack trace to honor the total count of lines
to print. This is useful if your trace happens to walk back onto
0xdeadc0de and gets stuck in an endless loop.

MFC after: 1 month
Tested on: i386, alpha, sparc64


135030 10-Sep-2004 marcel

Better fix the busdma problem exposed by ATA. With the CMD 646 for
example the maximum segment size is 64K while the boundary is set
to 8K due to controller limitations. It is impossible to NOT cross
the boundary for any segment size that's larger than the boundary.
So, once we inherited the boundary from the parent tag, make sure
to reduce the maximum segment size to the boundary if it was larger.

MT5 candidate.


134934 08-Sep-2004 scottl

Fix a problem with tag->boundary inheritence that has existed since day one
and was propagated to nearly every platform. The boundary of the child needs
to consider the boundary of the parent and pick the minimum of the two, not
the maximum. However, if either is 0 then pick the appropriate one.
This bug was exposed by a recent change to ATA, which should now be fixed by
this change. The alignment and maxsegsz tag attributes likely also need
a similar review in the near future.

This is a MT5 candidate.

Reviewed by: marcel
Submitted by: sos (in part)


134791 05-Sep-2004 julian

Refactor a bunch of scheduler code to give basically the same behaviour
but with slightly cleaned up interfaces.

The KSE structure has become the same as the "per thread scheduler
private data" structure. In order to not make the diffs too great
one is #defined as the other at this time.

The KSE (or td_sched) structure is now allocated per thread and has no
allocation code of its own.

Concurrency for a KSEGRP is now kept track of via a simple pair of counters
rather than using KSE structures as tokens.

Since the KSE structure is different in each scheduler, kern_switch.c
is now included at the end of each scheduler. Nothing outside the
scheduler knows the contents of the KSE (aka td_sched) structure.

The fields in the ksegrp structure that are to do with the scheduler's
queueing mechanisms are now moved to the kg_sched structure.
(per ksegrp scheduler private data structure). In other words how the
scheduler queues and keeps track of threads is no-one's business except
the scheduler's. This should allow people to write experimental
schedulers with completely different internal structuring.

A scheduler call sched_set_concurrency(kg, N) has been added that
notifies teh scheduler that no more than N threads from that ksegrp
should be allowed to be on concurrently scheduled. This is also
used to enforce 'fainess' at this time so that a ksegrp with
10000 threads can not swamp a the run queue and force out a process
with 1 thread, since the current code will not set the concurrency above
NCPU, and both schedulers will not allow more than that many
onto the system run queue at a time. Each scheduler should eventualy develop
their own methods to do this now that they are effectively separated.

Rejig libthr's kernel interface to follow the same code paths as
linkse for scope system threads. This has slightly hurt libthr's performance
but I will work to recover as much of it as I can.

Thread exit code has been cleaned up greatly.
exit and exec code now transitions a process back to
'standard non-threaded mode' before taking the next step.
Reviewed by: scottl, peter
MFC after: 1 week


134571 31-Aug-2004 julian

Remove an unneeded argument..
The removed argument could trivially be derived from the remaining one.
That in turn should be the same as curthread, but it is possible that curthread could be expensive to derive on some syste,s so leave it as an argument.
Having both proc and thread as an argumen tjust gives an opportunity for
them to get out sync.

MFC after: 3 days


134568 31-Aug-2004 julian

Remove sched_free_thread() which was only used
in diagnostics. It has outlived its usefulness and has started
causing panics for people who turn on DIAGNOSTIC, in what is otherwise
good code.

MFC after: 2 days


134127 21-Aug-2004 alc

Properly free the temporary sf_buf in uiomove_fromphys() if a copyin or
copyout fails.

Obtained from: DragonFlyBSD


133862 16-Aug-2004 marius

Instead of "OpenFirmware", "openfirmware", etc. use the official spelling
"Open Firmware" from IEEE 1275 and OpenFirmware.org (no pun intended).

Ok'ed by: tmm


133774 15-Aug-2004 marius

Correct some uses of the wrong members of the *min()/*max()-familiy, e.g.
min() on unsigned long. None of these are believed to have been fatal though.

Reviewed by: tmm


133728 14-Aug-2004 marius

- Make OF_getetheraddr() honour the "local-mac-address?" system config
variable. If set to "true" OF_getetheraddr() will now return the unique
MAC address stored in the "local-mac-address" property of the device's
OFW node if present and the host address/system default MAC address if
the node doesn't doesn't have such a property. If set to "false" the
host address will be returned for all devices like before this change.
This brings the behaviour of device drivers for NICs with OFW support/
FCode, i.e. dc(4) for on-board DM9102A on Sun machines, gem(4) and hme(4),
regarding "local-mac-address?" in line with NetBSD and Solaris.
The man pages of the respective drivers will be updated separately to
reflect this change.
- Remove OF_getetheraddr2() which was used as a stopgap in dc(4). Its
functionality is now part of OF_getetheraddr().


133663 13-Aug-2004 alc

Add pmap locking to pmap_remove_all().


133589 12-Aug-2004 marius

- Introduce an ofw_bus kobj-interface for retrieving the OFW node and a
subset ("compatible", "device_type", "model" and "name") of the standard
properties in drivers for devices on Open Firmware supported busses. The
standard properties "reg", "interrupts" und "address" are not covered by
this interface because they are only of interest in the respective bridge
code. There's a remaining standard property "status" which is unclear how
to support properly but which also isn't used in FreeBSD at present.
This ofw_bus kobj-interface allows to replace the various (ebus_get_node(),
ofw_pci_get_node(), etc.) and partially inconsistent (central_get_type()
vs. sbus_get_device_type(), etc.) existing IVAR ones with a common one.
This in turn allows to simplify and remove code-duplication in drivers for
devices that can hang off of more than one OFW supported bus.
- Convert the sparc64 Central, EBus, FHC, PCI and SBus bus drivers and the
drivers for their children to use the ofw_bus kobj-interface. The IVAR-
interfaces of the Central, EBus and FHC are entirely replaced by this. The
PCI bus driver used its own kobj-interface and now also uses the ofw_bus
one. The IVARs special to the SBus, e.g. for retrieving the burst size,
remain.
Beware: this causes an ABI-breakage for modules of drivers which used the
IVAR-interfaces, i.e. esp(4), hme(4), isp(4) and uart(4), which need to be
recompiled.
The style-inconsistencies introduced in some of the bus drivers will be
fixed by tmm@ in a generic clean-up of the respective drivers later (he
requested to add the changes in the "new" style).
- Convert the powerpc MacIO bus driver and the drivers for its children to
use the ofw_bus kobj-interface. This invloves removing the IVARs related
to the "reg" property which were unused and a leftover from the NetBSD
origini of the code. There's no ABI-breakage caused by this because none
of these driver are currently built as modules.
There are other powerpc bus drivers which can be converted to the ofw_bus
kobj-interface, e.g. the PCI bus driver, which should be done together
with converting powerpc to use the OFW PCI code from sparc64.
- Make the SBus and FHC front-end of zs(4) and the sparc64 eeprom(4) take
advantage of the ofw_bus kobj-interface and simplify them a bit.

Reviewed by: grehan, tmm
Approved by: re (scottl)
Discussed with: tmm
Tested with: Sun AX1105, AXe, Ultra 2, Ultra 60; PPC cross-build on i386


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133451 10-Aug-2004 alc

Add pmap locking to many of the functions.

Implement the protection check required by the pmap_extract_and_hold()
specification.

Remove the acquisition and release of Giant from pmap_extract_and_hold() and
pmap_protect().

Many thanks to Ken Smith for resolving a sparc64-specific initialization
problem in my original patch.

Tested by: kensmith@


133143 04-Aug-2004 alc

- Push down the acquisition and release of Giant into pmap_enter_quick()
on those architectures without pmap locking.
- Eliminate the acquisition and release of Giant in vm_map_pmap_enter().


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132899 30-Jul-2004 alc

- Push down the acquisition and release of Giant into pmap_protect() on
those architectures without pmap locking.
- Eliminate the acquisition and release of Giant from vm_map_protect().

(Translation: mprotect(2) runs to completion without touching Giant on
alpha, amd64, i386 and ia64.)


132575 23-Jul-2004 alc

Use kmem_alloc_nofault() rather than kmem_alloc_pageable() for allocating
KVA for explicitly managed mappings, i.e., mappings created with
pmap_qenter().


132482 21-Jul-2004 marcel

Unify db_stack_trace_cmd(). All it did was look up the thread given
the thread ID and call db_trace_thread().
Since arm has all the logic in db_stack_trace_cmd(), rename the
new DB_COMMAND function to db_stack_trace to avoid conflicts on
arm.
While here, have db_stack_trace parse its own arguments so that
we can use a more natural radix for IDs. If the ID is not a thread
ID, or more precisely when no thread exists with the ID, try if
there's a process with that ID and return the first thread in it.
This makes it easier to print stack traces from the ps output.

requested by: rwatson@
tested on: amd64, i386, ia64


132220 15-Jul-2004 alc

Push down the acquisition and release of the page queues lock into
pmap_protect() and pmap_remove(). In general, they require the lock in
order to modify a page's pv list or flags. In some cases, however,
pmap_protect() can avoid acquiring the lock.


132088 13-Jul-2004 davidxu

Add ptrace_clear_single_step(), alpha already has it for years, the function
will be used by ptrace to clear a thread's single step state.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131950 10-Jul-2004 marcel

Update for the KDB framework:
o Make debugging code conditional upon KDB instead of DDB.
o Call kdb_enter() instead of Debugger().
o Remove implementation of Debugger().
o Check kdb_active instead of db_active.
o Call kdb_trap() according to the new world order.


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.


131537 03-Jul-2004 imp

These don't need RMAN_RESOURCE_VISIBLE now that rman is visible


131536 03-Jul-2004 imp

Really remove __RMAN_RESORUCE_VISIBLE


131535 03-Jul-2004 imp

Use the rman_* functions in preference to reaching into struct resource.
Remove __RMAN_RESOURCE_VISIBLE after compilation confirms it is now not
needed.


131481 02-Jul-2004 jhb

Implement preemption of kernel threads natively in the scheduler rather
than as one-off hacks in various other parts of the kernel:
- Add a function maybe_preempt() that is called from sched_add() to
determine if a thread about to be added to a run queue should be
preempted to directly. If it is not safe to preempt or if the new
thread does not have a high enough priority, then the function returns
false and sched_add() adds the thread to the run queue. If the thread
should be preempted to but the current thread is in a nested critical
section, then the flag TDF_OWEPREEMPT is set and the thread is added
to the run queue. Otherwise, mi_switch() is called immediately and the
thread is never added to the run queue since it is switch to directly.
When exiting an outermost critical section, if TDF_OWEPREEMPT is set,
then clear it and call mi_switch() to perform the deferred preemption.
- Remove explicit preemption from ithread_schedule() as calling
setrunqueue() now does all the correct work. This also removes the
do_switch argument from ithread_schedule().
- Do not use the manual preemption code in mtx_unlock if the architecture
supports native preemption.
- Don't call mi_switch() in a loop during shutdown to give ithreads a
chance to run if the architecture supports native preemption since
the ithreads will just preempt DELAY().
- Don't call mi_switch() from the page zeroing idle thread for
architectures that support native preemption as it is unnecessary.
- Native preemption is enabled on the same archs that supported ithread
preemption, namely alpha, i386, and amd64.

This change should largely be a NOP for the default case as committed
except that we will do fewer context switches in a few cases and will
avoid the run queues completely when preempting.

Approved by: scottl (with his re@ hat)


131376 30-Jun-2004 marius

These need __RMAN_RESOURCE_VISIBLE, too.


131224 28-Jun-2004 scottl

Retire BUS_DMAMAP_NSEGS for sparc64


131223 28-Jun-2004 scottl

Switch sparc64 busdma to use a dynamically allocated segment list rather
than a a stack-limited list. This removes the artifical limit on s/g list
size.
cvs: ----------------------------------------------------------------------


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130068 04-Jun-2004 phk

Add missing <sys/module.h> #includes


130028 03-Jun-2004 tjr

Remove checks for curthread == NULL - it can't happen.


130025 03-Jun-2004 phk

Add missing <sys/module.h> instances which were shadowed by the nested
include in <sys/kernel.h>


130023 03-Jun-2004 tjr

Move TDF_DEADLKTREAT into td_pflags (and rename it accordingly) to avoid
having to acquire sched_lock when manipulating it in lockmgr(), uiomove(),
and uiomove_fromphys().

Reviewed by: jhb


129906 31-May-2004 bmilekic

Bring in mbuma to replace mballoc.

mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
- Better layering between slab <-> zone caches; introduce
Keg structure which splits off slab cache away from the
zone structure and allows multiple zones to be stacked
on top of a single Keg (single type of slab cache);
perhaps we should look into defining a subset API on
top of the Keg for special use by malloc(9),
for example.
- UMA_ZONE_REFCNT zones can now be added, and reference
counters automagically allocated for them within the end
of the associated slab structures. uma_find_refcnt()
does a kextract to fetch the slab struct reference from
the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
- integrates mbuf & cluster allocations with extended UMA
and provides caches for commonly-allocated items; defines
several zones (two primary, one secondary) and two kegs.
- change up certain code paths that always used to do:
m_get() + m_clget() to instead just use m_getcl() and
try to take advantage of the newly defined secondary
Packet zone.
- netstat(1) and systat(1) quickly hacked up to do basic
stat reporting but additional stats work needs to be
done once some other details within UMA have been taken
care of and it becomes clearer to how stats will work
within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used. The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
- One report of 'ips' (ServeRAID) driver acting really
slow in conjunction with mbuma. Need more data.
Latest report is that ips is equally sucking with
and without mbuma.
- Giant leak in NFS code sometimes occurs, can't
reproduce but currently analyzing; brueffer is
able to reproduce but THIS IS NOT an mbuma-specific
problem and currently occurs even WITHOUT mbuma.
- Issues in network locking: there is at least one
code path in the rip code where one or more locks
are acquired and we end up in m_prepend() with
M_WAITOK, which causes WITNESS to whine from within
UMA. Current temporary solution: force all UMA
allocations to be M_NOWAIT from within UMA for now
to avoid deadlocks unless WITNESS is defined and we
can determine with certainty that we're not holding
any locks when we're M_WAITOK.
- I've seen at least one weird socketbuffer empty-but-
mbuf-still-attached panic. I don't believe this
to be related to mbuma but please keep your eyes
open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper is available detailing the change and considering
various performance issues, it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
rwatson,
brueffer,
Ketrien I. Saihr-Kesenchedra,
...
Reviewed by: Lots of people (for different parts)


129750 26-May-2004 tmm

Retire cpu_sched_exit(); it is not used any more.


129749 26-May-2004 tmm

Move the per-CPU vmspace pointer fixup that is required before a
struct vmspace is freed from cpu_sched_exit() to pmap_release().

This has the advantage of being able to rely on MI code to decide
when a free should occur, instead of having to inspect the reference
count ourselves.

At the same time, turn the per-CPU vmspace pointer into a pmap pointer,
so that pmap_release() can deal with pmaps exclusively.

Reviewed (and embrassing bug spotted) by: jake


129506 20-May-2004 tmm

In cpu_sched_exit(), we must check vm_refcnt against 0, not 1, since
exit1() decrements the reference count before calling this function.


129368 17-May-2004 peter

Oops, I left a duplicate 'relocbase' declaration.

Submitted by: Koop Mast <kwm@rainbow-runner.nl>


129282 16-May-2004 peter

Make a small revision to the api between the elf linker core and the
elf_reloc() backends for two reasons. First, to support the possibility
of there being two elf linkers in the kernel (eg: amd64), and second, to
pass the relocbase explicitly (for relocating .o format kld files).


129084 10-May-2004 mux

Prefer explicit ints to implicit ints in the prototype as well as in
the function definition.


129083 10-May-2004 mux

- Fix a typo in a printf(). [1]
- Fix some other style bugs while I'm here.

Submitted by: Koop Mast <kwm@rainbow-runner.nl> [1]
Fixes PR: sparc64/66448 [1]


129068 09-May-2004 alc

Correct the implementation of pmap_page_is_mapped(): It should return TRUE
only if the page has one or more managed mappings.


129053 08-May-2004 alc

Since revision 1.280 of vm/vm_page.c, vm_page_grab() always returns a
zeroed page when passed VM_ALLOC_ZERO. Thus, we can eliminate the check
against PG_ZERO from pmap_pinit().


129051 08-May-2004 marius

- Remove the old sparc64 OFW PCI code (as opposed to the former
"options OFW_NEWPCI").
This is a bit overdue, the new sparc64 OFW PCI code which is
meant to replace the old one is in place for 10 months and
enabled by default in GENERIC for 8 months. FreeBSD 5.2 and
5.2.1 also shipped with the new code enabled by default.
- Some minor clean-up, e.g. remove functions that encapsulated
the #ifdefs for OFW_NEWPCI, remove unused resp. no longer
required includes, etc.

Approved by: tmm, no objections on freebsd-sparc64


128939 04-May-2004 marius

Fix bug introduced in revision 1.9; in nexus_probe_nomatch() get device name
and type for printing info about the device that didn't probe from child, not
parent.
This fixes a panic on systems where not yet supported devices hang off of the
nexus, e.g. on E450.

Reported by: joerg


128776 30-Apr-2004 tmm

Some cleanups to the nexus code:
- Remove second license, the first was not that different and should be
fine.
- Add nexus_attach(), and do not perform its task in nexus_probe() any
more.
- Remove nexus_write_ivar(), since it was quite pointless.
- Remove superfluous devinfo members.
- Clean up some comments, minor style issues and prototypes.


128756 30-Apr-2004 marius

Update the reference to the FreeBSD sparc64 mailing list, its name has
changed a while back.


128712 28-Apr-2004 tmm

Fix the EBus driver to work with the new PCI code. Unlike other PCI
bridges, the EBus bridge has resource ranges it claims exclusively to
map its children into in its BARs. Hence, we need to allocate these
completely and manage them for the children, instead of just passing
allocations through to the PCI layer as we did before.

While being there, split ebus_probe(), which did also contain code
normally belonging into the attach method, into ebus_probe() and
ebus_attach(), and perform some minor cleanups.


128624 25-Apr-2004 tmm

Prefix a printf with the device name.


128103 11-Apr-2004 alc

Remove avail_end. It is not used.


127977 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


127869 05-Apr-2004 alc

Remove unused arguments from pmap_init().


127788 03-Apr-2004 alc

In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not. Add a new parameter so that the caller can specify which is
the case.

Reported by: dillon


127545 29-Mar-2004 kensmith

MFi386: correctly calculate the top-of-stack when a kthread is created
with a larger kernel stack. Remove inclusion of opt_kstack_pages.h now
that it's unused.

Reviewed by: marcel
Approved by: rwatson (mentor)


127344 23-Mar-2004 tmm

Correct the termination condition of the DVMA pruning loop in
iommu_dvma_vallocseg(), which I botched in r1.32. This bug could
cause an endless loop when a map was loaded and DVMA was scarce,
or that map had a stringent alignment or boundary.

Report and additional testing: Marius Strobl <marius@alchemy.franken.de>


127343 23-Mar-2004 tmm

Intitialize the frame pointer and return pc of a new process created
in cpu_fork(). This prevents the stack tracer from running past the
end of the stack (only the pc is checked in that case), which became
fatal when db_print_backtrace() was introduced and called outside
of ddb.

Additional testing: kris


127297 22-Mar-2004 alc

Add an implementation of uiomove_fromphys() to sparc64. This
implementation could be characterized as a hybrid of the amd64 and i386
implementations. Specifically, the direct virtual-to-physical mapping is
used if possible and sf_buf_alloc() is used if the direct map cannot.


127135 17-Mar-2004 njl

Convert callers to the new bus_alloc_resource_any(9) API.

Submitted by: Mark Santcroos <marks@ripe.net>
Reviewed by: imp, dfr, bde


127086 16-Mar-2004 alc

Refactor the existing machine-dependent sf_buf_free() into a machine-
dependent function by the same name and a machine-independent function,
sf_buf_mext(). Aside from the virtue of making more of the code machine-
independent, this change also makes the interface more logical. Before,
sf_buf_free() did more than simply undo an sf_buf_alloc(); it also
unwired and if necessary freed the page. That is now the purpose of
sf_buf_mext(). Thus, sf_buf_alloc() and sf_buf_free() can now be used
as a general-purpose emphemeral map cache.


126919 13-Mar-2004 scottl

Now that contigfree() does not require Giant, don't grab it in busdma.


126728 07-Mar-2004 alc

Retire pmap_pinit2(). Alpha was the last platform that used it. However,
ever since alpha/alpha/pmap.c revision 1.81 introduced the list allpmaps,
there has been no reason for having this function on Alpha. Briefly,
when pmap_growkernel() relied upon the list of all processes to find and
update the various pmaps to reflect a growth in the kernel's valid
address space, pmap_init2() served to avoid a race between pmap
initialization and pmap_growkernel(). Specifically, pmap_pinit2() was
responsible for initializing the kernel portions of the pmap and
pmap_pinit2() was called after the process structure contained a pointer
to the new pmap for use by pmap_growkernel(). Thus, an update to the
kernel's address space might be applied to the new pmap unnecessarily,
but an update would never be lost.


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


124259 08-Jan-2004 mux

Some integrated Davicom cards in sparc64 boxes have an all zeros
MAC address in the EEPROM, and we need to get it from OpenFirmware.
This isn't very pretty but time is lacking to do this in a better
way this near 5.2-RELEASE. This is a RELENG_5_2 candidate.

Original version by: Marius Strobl <marius@alchemy.franken.de>
Tested by: Pete Bentley <pete@sorted.org>
Reviewed by: jake


124092 03-Jan-2004 davidxu

Make sigaltstack as per-threaded, because per-process sigaltstack state
is useless for threaded programs, multiple threads can not share same
stack.
The alternative signal stack is private for thread, no lock is needed,
the orignal P_ALTSTACK is now moved into td_pflags and renamed to
TDP_ALTSTACK.
For single thread or Linux clone() based threaded program, there is no
semantic changed, because those programs only have one kernel thread
in every process.

Reviewed by: deischen, dfr


123929 28-Dec-2003 silby

Track three new sendfile-related statistics:
- The number of times sendfile had to do disk I/O
- The number of times sfbuf allocation failed
- The number of times sfbuf allocation had to wait


123920 28-Dec-2003 silby

Move the declaration of sfbufspeak and sfbufsused to mbuf.h,
and use imax instead of max, as sfbufspeak and sfbufsused
are signed.

Submitted by: bde


123884 27-Dec-2003 silby

Track current and peak sfbuf usage, export the values via sysctl.


123866 26-Dec-2003 obrien

Don't confuse NULL with 0.


123865 26-Dec-2003 obrien

Don't confuse NULL with 0.


123742 23-Dec-2003 peter

Add an additional field to the elf brandinfo structure to support
quicker exec-time replacement of the elf interpreter on an emulation
environment where an entire /compat/* tree isn't really warranted.


123126 03-Dec-2003 jhb

Fix all users of mp_maxid to use the same semantics, namely:

1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by: re (scottl)
Tested on: i386, amd64, alpha


122947 21-Nov-2003 jhb

- Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
actually present and sets mp_ncpus and all_cpus. Splitting these up
allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
setting mp_maxid to MAXCPU in cpu_mp_setmaxid(). This could allow the
CPU probing code to live in a module, for example, since modules
sysinit's in modules cannot be invoked prior to SI_SUB_KLD. This is
needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
its contents in a few places. Also, add a smp_cpu_enabled() function
to avoid duplicating some code. There is room for further code
reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
to before this change. i386 now sets mp_maxid to MAXCPU.

Tested on: alpha, amd64, i386, ia64, sparc64
Approved by: re (scottl)


122821 16-Nov-2003 alc

- Remove unnecessary synchronization from sf_buf_init(). (There is only
one active CPU when sf_buf_init() is performed.)


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.


122604 13-Nov-2003 simokawa

Respect RB_KDB flag.


122464 11-Nov-2003 jake

Fix a bug in the data access error recorvery. Before re-enabling the data
cache after a data access error we must discard all cache lines. When
disabled existing cache lines are not invalidated by stores to memory, so
we risk reading stale data that was cached before the data access error if
we don't flush them. This is especially fatal when the memory involved
is the active part of the kernel or user stack. For good measure we also
flush the instruction cache.

This fixes random crashes when the X server probes the PCI bus through
/dev/pci.


122462 11-Nov-2003 jake

Rearrange slightly so that DELAY(9) works during cninit.


122364 09-Nov-2003 marcel

Change the clear_ret argument of get_mcontext() to be a flags argument.
Since all callers either passed 0 or 1 for clear_ret, define bit 0 in
the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI
code for possible (but unlikely) future use. The remaining bits are for
use by MD code.

This change is triggered by a need on ia64 to have another knob for
get_mcontext().


121237 19-Oct-2003 peter

Add a stub cpu_idle() function for sparc64, alpha, powerpc. This is a
MI declared function so it should be everywhere.


120965 10-Oct-2003 robert

Add an 'include' directive to pull in <sys/ptrace.h>.


120937 09-Oct-2003 robert

Implement preliminary support for the PT_SYSCALL command to ptrace(2).


120722 03-Oct-2003 alc

Migrate pmap_prefault() into the machine-independent virtual memory layer.

A small helper function pmap_is_prefaultable() is added. This function
encapsulate the few lines of pmap_prefault() that actually vary from
machine to machine. Note: pmap_is_prefaultable() and pmap_mincore() have
much in common. Going forward, it's worth considering their merger.


120534 28-Sep-2003 alc

Add vm object locking to pmap_release().


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120287 20-Sep-2003 jake

Remove an invalid KASSERT. Apparently pmap_remove_all gets called on
unmanaged pages.


120008 12-Sep-2003 tmm

Handle ISA devices in OF_decode_addr(), with the same code that is
used in the EBus case.


119999 12-Sep-2003 alc

Add a new parameter to pmap_extract_and_hold() that is needed to eliminate
Giant from vmapbuf().

Idea from: tegge


119869 08-Sep-2003 alc

Introduce a new pmap function, pmap_extract_and_hold(). This function
atomically extracts and holds the physical page that is associated with the
given pmap and virtual address. Such a function is needed to make the
memory mapping optimizations used by, for example, pipes and raw disk I/O
MP-safe.

Reviewed by: tegge


119697 02-Sep-2003 marcel

Add function OF_decode_addr(). This function obtains the physical
address of the device identified by its phandle_t by traversing OFW's
device tree. The space and address returned by this function can
subsequently be passed to sparc64_fake_bustag() to construct a valid
tag and handle for use by the newbus I/O functions.

Use of this function is expected to be limited to pre-newbus access to
devices, such as consoles and keyboards.

Partially obtained from: tmm
Reviewed by: jake, jmg, tmm
SBus testing made possible by: jake
Tested with: LINT


119696 02-Sep-2003 marcel

Preparatory commit to allow prototypes in ofw_machdep.h to contain
both newbus types and OFW types. This involves either including
<machine/bus.h> or <dev/ofw/openfirm.h>.

Reviewed by: jake, jmg, tmm


119622 31-Aug-2003 jake

Implement cpu_set_upcall_kse. May need tweaking.


119563 29-Aug-2003 alc

Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy
sockets into machine-dependent files. The rationale for this
migration is illustrated by the modified amd64 allocator. It uses the
amd64's direct map to avoid emphemeral mappings in the kernel's
address space. On an SMP, the emphemeral mappings result in an IPI
for TLB shootdown for each transmitted page. Yuck.

Maintainers of other 64-bit platforms with direct maps should be able
to use the amd64 allocator as a reference implementation.


119398 24-Aug-2003 marcel

Allow bus barrier operations on fake tags. The purpose of a fake
bus tag is to allow bus space accesses prior to having newbus
fully initialized, such as would be the case for console drivers.
Since barriers are a fundamental part of bus space accesses, not
allowing them on fake tags would defeat the purpose of these tags.
We use the barrier function normally associated with nexus. This
is the barrier used when subordinates haven't defined a barrier
themselves.


119396 24-Aug-2003 jmg

reenable the caches when a PCI peek faults. Takes my kernel compile
from 3770 real down to 1250 real.

Submitted by: jake


119380 24-Aug-2003 jake

"md" files for syscons.


119352 23-Aug-2003 marcel

s#<mk48txx/mk48txxreg.h>#<dev/mk48txx/mk48txxreg.h>#


119338 23-Aug-2003 imp

s=include <ofw/=include <dev/ofw/= to reflect removal of -I$S/dev


119164 20-Aug-2003 alc

Lock the pmap's tsb object when performing vm_page_grab() on it.


119015 17-Aug-2003 gordon

Fixup the ELF branding information to point to the new home of rtld.


119004 16-Aug-2003 marcel

In vm_thread_swap{in|out}(), remove the alpha specific conditional
compilation and replace it with a call to cpu_thread_swap{in|out}().
This allows us to add similar code on ia64 without cluttering the
code even more.


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118768 11-Aug-2003 jake

Fix sparc64 LINT build. <blush>


118708 09-Aug-2003 jake

Use get_mcontext in sendsig and set_mcontext in sigreturn instead of
frobbing things directly.


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118239 31-Jul-2003 peter

Deal with 'options KSTACK_PAGES' being a global option.


118217 30-Jul-2003 tmm

Return 1 from pmap_protect_tte() instead of 0. When used with
tsb_foreach(), 0 signals to terminate the tsb traversal, so when
tsb_foreach() was used in pmap_protect() (which only happens when
the area to be protected is larger than PMAP_TSB_THRESH = 16MB), only
the first tsb entry in the specified range would be protected.

Reported by: Andrew Belashov <bel@orel.ru>


118090 27-Jul-2003 tmm

Respect BUS_DMA_ZERO in iommu_dvmamem_alloc().


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


117658 16-Jul-2003 jmg

add support for interrupt counting on sparc64. This copies part of the
code from i386. The code has a slight bogon that interrupts are counted
twice. Once on the ithread dispatch and once on the dispatch for the vector

vmstat -i and systat -vm now contains interrupt counts.

Reviewed by: jake


117600 15-Jul-2003 davidxu

Rename thread_siginfo to cpu_thread_siginfo.

Suggested by: jhb


117390 10-Jul-2003 tmm

Lock down the IOMMU bus_dma implementation to make it safe to use
without Giant held.

A quick outline of the locking strategy:
Since all IOMMUs are synchronized, there is a single lock, iommu_mtx,
which protects the hardware registers (where needed) and the global and
per-IOMMU software states. As soon as the IOMMUs are divorced, each struct
iommu_state will have its own mutex (and the remaining global state
will be moved into the struct).
The dvma rman has its own internal mutex; the TSB slots may only be
accessed by the owner of the corresponding resource, so neither needs
extra protection.
Since there is a second access path to maps via LRU queues, the consumer-
provided locking is not sufficient; therefore, each map which is on a
queue is additionally protected by iommu_mtx (in part, there is one
member which only the map owner may access). Each map on a queue may
be accessed and removed from or repositioned in a queue in any context as
long as the lock is held; only the owner may insert a map.
To reduce lock contention, some bus_dma functions remove the map from
the queue temporarily (on behalf of the map owner) for some operations and
reinsert it when they are done. Shorter operations and operations which are
not done on behalf of the lock owner are completely covered by the lock.

To facilitate the locking, reorganize the streaming buffer handling;
while being there, fix an old oversight which would cause the streaming
buffer to always be flushed, regardless of whether streaming was enabled
in the TSB entry. The streaming buffer is still disabled for now, since
there are a number of drivers which lack critical bus_dmamp_sync() calls.

Additional testing by: jake


117294 06-Jul-2003 alc

MFi386
Updates to cnt.v_wire_count, the global count of wired pages, should be
performed using atomic ops.


117206 03-Jul-2003 alc

Background: pmap_object_init_pt() premaps the pages of a object in
order to avoid the overhead of later page faults. In general, it
implements two cases: one for vnode-backed objects and one for
device-backed objects. Only the device-backed case is really
machine-dependent, belonging in the pmap.

This commit moves the vnode-backed case into the (relatively) new
function vm_map_pmap_enter(). On amd64 and i386, this commit only
amounts to code rearrangement. On alpha and ia64, the new machine
independent (MI) implementation of the vnode case is smaller and more
efficient than their pmap-based implementations. (The MI
implementation takes advantage of the fact that objects in -CURRENT
are ordered collections of pages.) On sparc64, pmap_object_init_pt()
hadn't (yet) been implemented.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


117119 01-Jul-2003 tmm

Add the new sparc64 OFW PCI framework, conditional on options OFW_NEWPCI
for now. It introduces a OFW PCI bus driver and a generic OFW PCI-PCI
bridge driver. By utilizing these, the PCI handling is much more elegant
now.

The advantages of the new approach are:
- Device enumeration should hopefully be more like on Solaris now,
so unit numbers should match what's printed on the box more
closely.
- Real interrupt routing is implemented now, so cardbus bridges
etc. have at least a chance to work.
- The quirk tables are gone and have been replaced by (hopefully
sufficient) heuristics.
- Much cleaner code.

There was also a report that previously bogus interrupt assignments
are fixed now, which can be attributed to the new heuristics.

A pitfall, and the reason why this is not the default yet, is that
it changes device enumeration, as mentioned above, which can make
it necessary to change the system configuration if more than one
unit of a device type is present (on a system with two hme cars,
for example, it is possible that hme0 becomes hme1 and vice versa
after enabling the option). Systems with multiple disk controllers
may need to be booted into single user (and require manual specification
of the root file system on boot) to adjust the fstab.
Nevertheless, I would like to encourage users to use this option,
so that it can be made the default soon.

In detail, the changes are:
- Introduce an OFW PCI bus driver; it inherits most methods from the
generic PCI bus driver, but uses the firmware for enumeration,
performs additional initialization for devices and firmware-specific
interrupt routing. It also implements an OFW-specific method to allow
child devices to get their firmware nodes.
- Introduce an OFW PCI-PCI bridge driver; again, it inherits most
of the generic PCI-PCI bridge driver; it has it's own method for
interrupt routing, as well as some sparc64-specific methods (one to
get the node again, and one to adjust the bridge bus range, since
we need to reenumerate all PCI buses).
- Convert the apb driver to the new way of handling things.
- Provide a common framework for OFW bridge drivers, used be the two
drivers above.
- Provide a small common framework for interrupt routing (for all
bridge types).
- Convert the psycho driver to the new framework; this gets rid of a
bunch of old kludges in pci_read_config(), and the whole
preinitialization (ofw_pci_init()).
- Convert the ISA MD part and the EBus driver to the new way
interrupts and nodes are handled.
- Introduce types for firmware interrupt properties.
- Rename the old sparcbus_if to ofw_pci_if by repo copy (it is only
required for PCI), and move it to a more correct location (new
support methodsx were also added, and an old one was deprecated).
- Fix a bunch of minor bugs, perform some cleanups.

In some cases, I introduced some minor code duplication to keep the
new code clean, in hopes that the old code will be unifdef'ed soon.

Reviewed in part by: imp
Tested by: jake, Marius Strobl <marius@alchemy.franken.de>,
Sergey Mokryshev <mokr@mokr.net>,
Chris Jackman <cjackNOSPAM@klatsch.org>
Info on u30 firmware provided by: kris


117045 29-Jun-2003 alc

- Export pmap_enter_quick() to the MI VM. This will permit the
implementation of a largely MI pmap_object_init_pt() for vnode-backed
objects. pmap_enter_quick() is implemented via pmap_enter() on sparc64
and powerpc.
- Correct a mismatch between pmap_object_init_pt()'s prototype and its
various implementations. (I plan to keep pmap_object_init_pt() as
the MD hook for device-backed objects on i386 and amd64.)
- Correct an error in ia64's pmap_enter_quick() and adjust its interface
to match the other versions. Discussed with: marcel


117003 28-Jun-2003 tmm

Small fixes for the IOMMU code:

1.) Handle maximum segment sizes which are smaller than the IOMMU page
size by splitting up pages across multiple segments if needed; this case
was previously unimplemented, and would cause panics.
2.) KASSERT that the physical address is in range; remove a KASSERT that
has become pointless.
3.) Add a comment describing what remains to be fixed in the IOMMU code;
I plan to address these issues soon.

Desired by: dwhite (1)


116958 28-Jun-2003 davidxu

Add a machine depended function thread_siginfo, SA signal code
will use the function to construct a siginfo structure and use
the result to export to userland.

Reviewed by: julian


116795 24-Jun-2003 jmg

remove unnecessary comment. We do what the comments says we need to.


116659 22-Jun-2003 jmg

add support for peeking at pci busses on UltraSparc systems. This prevents
data access errors when trying to read/write to non-existant PCI devices.

fix the psycho bridge to use peek for probing devices. This no longer
fakes it if the OFW node doesn't exist (and the reg == 0).

Reviewed by: jake, tmm


116589 19-Jun-2003 jake

Avoid using v8 opcodes; use ba instead of b for unconditional branches.


116567 19-Jun-2003 jake

- Rename the IPI_WAIT macro to IPI_DONE.
- Don't require all receivers of ipis to wait for all other receivers,
only that the sender wait for all receivers. This should reduce the
amount of time spent with interrupts disabled, which may be a cause
of ipi timeouts.

Discussed with: tmm


116543 18-Jun-2003 jake

Ignore fake ttes in pmap_copy, its too hard to deal with them not having
a real vm_page right now. This fixes a panic when processes with resident
device mappings fork, such as the X server.


116541 18-Jun-2003 tmm

Further cleanup of the sparc64 busdma implementation:
- Move prototypes for sparc64-specific helper functions from bus.h to
bus_private.h
- Move the method pointers from struct bus_dma_tag into a separate
structure; this saves some memory, and allows to use a single method
table for each busdma backend, so that the bus drivers need no longer
be changed if the methods tables need to be modified.
- Remove the hierarchical tag method lookup. It was never really useful,
since the layering is fixed, and the current implementations do not
need to call into parent implementations anyway. Each tag inherits
its method table pointer and cookie from the parent (or the root tag)
now, and the method wrapper macros directly use the method table
of the tag.
- Add a method table to the non-IOMMU backend, remove unnecessary
prototypes, remove the extra parent tag argument.
- Rename sparc64_dmamem_alloc_map() and sparc64_dmamem_free_map() to
sparc64_dma_alloc_map() and sparc64_dma_free_map(), move them to a
better place and use them for all map allocations and deallocations.
- Add a method table to the iommu backend, and staticize functions,
remove the extra parent tag argument.
- Change the psycho and sbus drivers to just set cookie and method table
in the root tag.
- Miscellaneous small fixes.


116510 18-Jun-2003 alc

Fix a performance bug in all of the various implementations of
uma_small_alloc(): They always zeroed the page regardless of what the
caller requested.


116508 17-Jun-2003 jake

Handle recursion on the vm_page_queue_mtx manually in pmap_qenter and
pmap_qremove, in order to avoid making the mutex recursable.

Discussed with: alc


116449 16-Jun-2003 jmg

free type too if we can't add the child.


116420 15-Jun-2003 jake

The page queue lock is already held in pmap_remove, change acquire/release
to assertion of ownership. Serves me right for not booting a witness
kernel.


116417 15-Jun-2003 jake

- Mirror vm_page_queue_mtx assertions added to the i386 pmap.
- Add vm page queue locking in certain places that are only needed on
sparc64.

This should make pmap_qenter and pmap_qremove MP-safe.

Discussed with: alc


116361 15-Jun-2003 davidxu

Rename P_THREADED to P_SA. P_SA means a process is using scheduler
activations.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


116328 14-Jun-2003 alc

Move the *_new_altkstack() and *_dispose_altkstack() functions out of the
various pmap implementations into the machine-independent vm. They were
all identical.


116213 11-Jun-2003 tmm

Remove the psycho and sbus iommu function stubs, and put the pointer
to the iommu_state structure directly into dt_cookie. The stubs have
not been needed for a long time now.


116188 11-Jun-2003 peter

GC unused cpu_wait() function


115971 07-Jun-2003 jake

- Declare sparc64_memreg and sparc64_nmemreg in machine/ofw_mem.h.
- On startup print the total physical memory, instead of what we're told is
free by the firmware, to avoid astonishing users.


115858 04-Jun-2003 marcel

Change the second (and last) argument of cpu_set_upcall(). Previously
we were passing in a void* representing the PCB of the parent thread.
Now we pass a pointer to the parent thread itself.
The prime reason for this change is to allow cpu_set_upcall() to copy
(parts of) the trapframe instead of having it done in MI code in each
caller of cpu_set_upcall(). Copying the trapframe cannot always be
done with a simply bcopy() or may not always be optimal that way. On
ia64 specifically the trapframe contains information that is specific
to an entry into the kernel and can only be used by the corresponding
exit from the kernel. A trapframe copied verbatim from another frame
is in most cases useless without some additional normalization.

Note that this change removes the assignment to td->td_frame in some
implementations of cpu_set_upcall(). The assignment is redundant.
A previous call to cpu_thread_setup() already did the exact same
assignment. An added benefit of removing the redundant assignment is
that we can now change td_pcb without nasty side-effects.

This change officially marks the ability on ia64 for 1:1 threading.

Not tested on: amd64, powerpc
Compile & boot tested on: alpha, sparc64
Functionally tested on: i386, ia64


115417 30-May-2003 tmm

Fix interrupt assignment for non-builtin PCI devices on e450s.

This machine uses a non-standard scheme to specify the interrupts to
be assigned for devices in PCI slots; instead of giving the INO
or full interrupt number (which is done for the other devices in this
box), the firmware interrupt properties contain intpin numbers, which
have to be swizzled as usual on PCI-PCI bridges; however, the PCI host
bridge nodes have no interrupt map, so we need to guess the
correct INO by slot number of the device or the closest PCI-PCI
bridge leading to it, and the intpin.

To do this, this fix makes the following changes:
- Add a newbus method for sparc64 PCI host bridges to guess
the INO, and glue code in ofw_pci_orb_callback() to invoke it based
on a new quirk entry. The guessing is only done for interrupt numbers
too low to contain any IGN found on e450s.
- Create another new quirk entry was created to prevent mapping of EBus
interrupts at PCI level; the e450 has full INOs in the interrupt
properties of EBus devices, so trying to remap them could cause
problems.
- Set both quirk entries for e450s; remove the no-swizzle entry.
- Determine the psycho half (bus A or B) a driver instance manages
in psycho_attach()
- Implement the new guessing method for psycho, using the slot number,
psycho half and property value (intpin).

Thanks go to the testers, especially Brian Denehy, who tested many kernels
for me until I had found the right workaround.

Tested by: Brian Denehy <B.Denehy@90east.com>, jake, fenner,
Marius Strobl <marius@alchemy.franken.de>,
Marian Dobre <mari@onix.ro>
Approved by: re (scottl)


115382 29-May-2003 tmm

Completely disable interrupts (not just raise %pil) when calculating the
value to be written into tick_compare in tick_hardclock(). While
we were taking care that the value to be written was at least TICK_GRACE
ticks in the future, a vector interrupt could happen between calculating
the value and writing it. If it took longer than TICK_GRACE to complete
(which is doubtful for a single device-triggered vector interrupt, but
quite likely for some IPIs), the value written would be in the past
and tick interrupts (which drive hardclock and statclock) would stop
until %tick wraps around, which takes a long time.
Also, increase TICK_GRACE from 1000 to 10000 for good measure.

Reported by: kris
Reviewed by: jake
Approved by: re (scottl)


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115323 26-May-2003 scottl

Fix two typos from the last commit


115321 26-May-2003 scottl

De-orbit bus_dmamem_alloc_size from here too.

Pointed out by: des
Pointy hat to: me


115316 26-May-2003 scottl

De-orbit bus_dmamem_alloc_size(). It's a hack and was never used anyways.
No need for it to pollute the 5.x API any further.

Approved by: re (bmah)


114983 13-May-2003 jhb

- Merge struct procsig with struct sigacts.
- Move struct sigacts out of the u-area and malloc() it using the
M_SUBPROC malloc bucket.
- Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(),
sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared().
- Remove the p_sigignore, p_sigacts, and p_sigcatch macros.
- Add a mutex to struct sigacts that protects all the members of the struct.
- Add sigacts locking.
- Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now
that sigacts is locked.
- Several in-kernel functions such as psignal(), tdsignal(), trapsignal(),
and thread_stopped() are now MP safe.

Reviewed by: arch@
Approved by: re (rwatson)


114650 04-May-2003 jake

Forgot to update string and signal tables when some of the trap types
changed.


114484 02-May-2003 tmm

- Reduce the DVMA preallocation limit from 128kB to 32kB. 128kB were
quite excessive, and caused the available space to be used up too
easily. The new limit should be a better estimation of how much the
caller will need at most.
- Double the IOTSB size 64kB, for a DVMA area size of 64MB.

This should fix DMA problems on e450s and other large machines due
to DVMA space exhaustion, which were introduced in my last IOMMU
code revision in January.

Reported and tested by: fenner


114374 01-May-2003 peter

Back out last commits. The elf64/elf32 kernel name thing was more pain
than it was worth.


114342 30-Apr-2003 peter

Fix transcription error. Use == NULL, not != NULL. Fortunately this
was harmless.


114340 30-Apr-2003 peter

Look for an elf32 kernel (powerpc) and elf64 kernel (sparc64) as well
as a plain "elf kernel".


114305 30-Apr-2003 jhb

Range check the syscall number before looking it up in the syscallnames[]
array.

Submitted by: pho


114257 29-Apr-2003 jake

Allow fast instruction and data access mmu miss traps to be handled by
user trap handlers.


114189 29-Apr-2003 jake

Use 16 byte alignment for internal labels, 32 bytes is excessive.


114188 29-Apr-2003 jake

- Fix placement of cvs ids in previous commit to match .S files in libc.
- gcc uses 32 byte alignment for functions regardless of profiling, so
follow suit.


114186 28-Apr-2003 jake

This file is unused.


114085 26-Apr-2003 obrien

I was wrong, the ENTRY bits in asm.h did have a purpose -- for userland.
Restore the bits and remove them from asmacros.h. *.S will now be asm.h
consumers.

Approved by: jake


114029 25-Apr-2003 jhb

- Push down Giant into the sysarch() calls that still need Giant.
- Standardize on EINVAL rather than EOPNOTSUPP if the sysarch op value is
invalid.


113998 25-Apr-2003 deischen

Add an argument to get_mcontext() which specified whether the
syscall return values should be cleared. The system calls
getcontext() and swapcontext() want to return 0 on success
but these contexts can be switched to at a later time so
the return values need to be cleared in the saved register
sets. Other callers of get_mcontext() would normally want
the context without clearing the return values.

Remove the i386-specific context saving from the KSE code.
get_mcontext() is not i386-specific any more.

Fix a bad pointer in the alpha get_mcontext() code. The
context was being bcopy()'d from &td->tf_frame, but tf_frame
is itself a pointer, so the thread was being copied instead.
Spotted by jake.

Glanced at by: jake
Reviewed by: bde (months ago)


113833 22-Apr-2003 davidxu

Remove single threading detecting code, these code really should be
replaced by thread_user_enter(), but current we don't want to enable
this in trap.


113686 18-Apr-2003 jhb

Use the proc lock to protect p_singlethread and a P_WEXIT test. This
fixes a couple of potential KSE panics on non-i386 arch's that weren't
holding the proc lock when calling thread_exit().


113453 13-Apr-2003 jake

- Move the routine for flushing all user mappings from the tlb from pmap to
the cpu dependent files. It will need to be done differently for USIII.
- Simplify the logic for detecting context rollovers. Instead of dealing
with it when the next context switch would cause the context numbers to
rollover, deal with it when they actually do rollover.
- Move some things around in cpu_switch so that we only do 1 membar #Sync
when switching address space, instead of 2.
- Detect kernel threads by comparing the new vm space to vmspace0, instead
if checking if the tlb context is 0.
- Removed some debug code.


113347 10-Apr-2003 mux

Change the operation parameter of bus_dmamap_sync() from an
enum to an int and redefine the BUS_DMASYNC_* constants as
flags. This allows us to specify several operations in one
call to bus_dmamap_sync() as in NetBSD.


113338 10-Apr-2003 jake

Print real memory/avail memory on startup like other platforms. Hide
printing the model under bootverbose.


113255 08-Apr-2003 des

Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


113238 08-Apr-2003 jake

Use vm_paddr_t for physical addresses.


113172 06-Apr-2003 jake

Remove a largely useless statistic (its kept elsewhere too).


113166 06-Apr-2003 jake

Use the vis block copy/zero functions for pmap_copy_page and pmap_zero_page.
These are called through function pointers so that different implementations
can be provided for cheetah, where the block load instructions may or may
not be a win, and so they can be disabled with the machdep.use_vis tunable.
In terms of raw bandwidth the integer versions are faster, but not allocating
lines in the L2 cache for useless data gives a measurable improvement in user
time for the benchmarks I tested (mostly buildworld with -j8).

As far as I can tell the instructions used are implemented on everything
back to UltraSPARC I, so there should not be a problem with different cpu
types.


113165 06-Apr-2003 jake

Ignore attempts to pmap_kremove or pmap_qremove pages which do not have
a valid mapping. This is bug for bug compatible with other platforms.


113090 04-Apr-2003 des

Define ovbcopy() as a macro which expands to the equivalent bcopy() call,
to take care of the KAME IPv6 code which needs ovbcopy() because NetBSD's
bcopy() doesn't handle overlap like ours.

Remove all implementations of ovbcopy().

Previously, bzero was a function pointer on i386, to save a jmp to
bzero_vector. Get rid of this microoptimization as it only confuses
things, adds machine-dependent code to an MD header, and doesn't really
save all that much.

This commit does not add my pagezero() / pagecopy() code.


113027 03-Apr-2003 jake

Add optimized block copy and zero functions using vis instructions, which
can do 64 bytes at a time and don't allocate lines in the L2 cache. These
assume that everything is 64 byte aligned, and that there's more than 128
bytes of data (best for whole pages). The block load and store instructions
don't follow normal memory ordering rules and require either a memory barrier
or move between registers before the data can actually be used. This
implementation correctly shuffles around 3 out of the 4 sets of registers
in order to avoid memory barriers expect for the last 2 blocks.


113024 03-Apr-2003 jake

Add support for saving and restoring kernel floating point state. The state
will be saved if we context switch as a result of an interrupt which occured
while using the floating point registers in the kernel (which actually can't
happen right now). This allows fp disabled traps in the kernel, which
normally shouldn't happen, so make sure the trapping code is what we expect
it is.


113023 03-Apr-2003 jake

- Add space for kernel floating point registers to the pcb. These will be
used to support block copy and zero operations in the kernel which use the
floating point registers.
- While I'm changing the size, improve the layout of struct pcb, sort by size,
then alphabetical etc.
- Add some assertions to validate assumptions made about how the pcb is
allocated.


113021 03-Apr-2003 jake

- Generally improve register usage in cpu_switch. Use the 'in' registers
for temporaries relating to the state of the new process instead of the
outs, so that functions can be called without fear of clobbering them.
- Use savefpctx instead of rolling our own.


113019 03-Apr-2003 jake

Don't assume the fp state is at offset 0 in the pcb.


113018 03-Apr-2003 jake

Fix typos (don't use * when taking the size of an array).


112993 02-Apr-2003 peter

Commit a partial lazy thread switch mechanism for i386. it isn't as lazy
as it could be and can do with some more cleanup. Currently its under
options LAZY_SWITCH. What this does is avoid %cr3 reloads for short
context switches that do not involve another user process. ie: we can
take an interrupt, switch to a kthread and return to the user without
explicitly flushing the tlb. However, this isn't as exciting as it could
be, the interrupt overhead is still high and too much blocks on Giant
still. There are some debug sysctls, for stats and for an on/off switch.

The main problem with doing this has been "what if the process that you're
running on exits while we're borrowing its address space?" - in this case
we use an IPI to give it a kick when we're about to reclaim the pmap.

Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a
few more things and get some more feedback before turning it on by default.

This is NOT a replacement for Bosko's lazy interrupt stuff. This was more
meant for the kthread case, while his was for interrupts. Mine helps a
little for interrupts, but his helps a lot more.

The stats are enabled with options SWTCH_OPTIM_STATS - this has been a
pseudo-option for years, I just added a bunch of stuff to it.

One non-trivial change was to select a new thread before calling
cpu_switch() in the first place. This allows us to catch the silly
case of doing a cpu_switch() to the current process. This happens
uncomfortably often. This simplifies a bit of the asm code in cpu_switch
(no longer have to call choosethread() in the middle). This has been
implemented on i386 and (thanks to jake) sparc64. The others will come
soon. This is actually seperate to the lazy switch stuff.

Glanced at by: jake, jhb


112968 02-Apr-2003 jake

Implement cpu_thread_setup. Fix cpu_set_upcall.


112961 01-Apr-2003 jake

- Set the version number in the mcontext in get_mcontext and check it in
set_mcontext.
- Don't make assumptions about the alignment of the mcontext inside of the
ucontext; we have to save the floating point registers to the pcb and then
copy to the mcontext.


112924 01-Apr-2003 jake

- Add a flags field to struct pcb. Use this to keep track of wether or
not the pcb has floating point registers saved in it.
- Implement get_mcontext and set_mcontext.


112922 01-Apr-2003 jake

- Don't allow tf_wstate to be set in set_regs.
- Clear FPRS_FEF in set_fpregs so the new registers will be reloaded.


112921 01-Apr-2003 jake

Implement cpu_set_upcall.


112920 01-Apr-2003 jake

- Rename pcb_fpstate to pcb_ufp (user floating point), and change it to
a simple array of 64 ints.
- Use a critical section when saving floating point state in cpu_fork
instead of sched_lock.


112917 01-Apr-2003 jake

Rename pcb_fp to pcb_sp, so as to not be confused with floating point
state.


112914 01-Apr-2003 jake

Implement casuptr.


112898 01-Apr-2003 jeff

- Define a new md function 'casuptr'. This atomically compares and sets
a pointer that is in user space. It will be used as the basic primitive
for a kernel supported user space lock implementation.
- Implement this function in x86's support.s
- Provide stubs that return -1 in all other architectures. Implementations
will follow along shortly.

Reviewed by: jake


112888 31-Mar-2003 jeff

- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with
a follow on commit to kern_sig.c
- signotify() now operates on a thread since unmasked pending signals are
stored in the thread.
- PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.


112883 31-Mar-2003 jeff

- Change trapsignal() to accept a thread and not a proc.
- Change all consumers to pass in a thread.

Right now this does not cause any functional changes but it will be important
later when signals can be delivered to specific threads.


112879 31-Mar-2003 jake

- Allow the physical memory size that will be actually used by the kernel to
be overridden by setting hw.physmem.
- Fix a vm_map_find arg, we don't want to find space.
- Add tracing and statistics for off colored pages.
- Detect "stupid" pmap_kenters (same virtual and physical as existing
mapping), and do nothing in that case.


112697 27-Mar-2003 jake

Handle the fictitious pages created by the device pager. For fictitious
pages which represent actual physical memory we must strip off the fake
page in order to allow illegal aliases to be detected. Otherwise we map
uncacheable in the virtual and physical caches and set the side effect bit,
as is required for mapping device memory.

This fixes gstat on sparc64, which wants to mmap kernel memory through a
character device.


112451 20-Mar-2003 jhb

Use td->td_ucred instead of td->td_proc->p_ucred.


112436 20-Mar-2003 mux

Use atomic operations to increment and decrement the refcount
in busdma tags. There are currently no tags shared accross
different drivers so this isn't needed at the moment, but it
will be required when we'll have a proper newbus method to get
the parent busdma tag.


112399 19-Mar-2003 jake

- Remove unused cache flushing routines. These will not necessary work
on future UltraSPARC cpus for which the data cache is not direct mapped.
- Move UltraSPARC I and II (spitfire, blackbird, sapphire, sabre) specific
functions to spitfire.c, and add cheetah.c for UltraSPARC III specific
functions. Initially just cache flushing, but there are a few other
functions that will need to move here.
- Add an ipi handler for data cache flushing on UltraSPARC III.
- Use function pointers to select the right cache flushing functions based
on cpu_impl.

With this it is possible to boot single user from an mfs root on UltraSPARC
III systems, including spinning up secondary processors. There is currently
no support for the host to pci bridge, and no documentation for it is
publically available.

Thanks to Oleg Derevenetz for providing access to a system with UltraSPARC
III+ cpus.


112398 19-Mar-2003 jake

- Set cpu_impl early in sparc64_init so that we can use it to detect
UltraSPARC III and higher cpus and do needed setup.
- Disable the "system tick" interrupt for UltraSPARC III. This avoids
an interrupt storm on startup since we're not prepared for these at
all. This feature has questionable use anyway.
- Clear tick on startup and then leave it alone.


112396 19-Mar-2003 jake

Remove a workaround for mysterious junk appearing in the tlb of secondary
cpus. It turned out to be a bug in the loader.


112395 19-Mar-2003 jake

Implement db_print_backtrace. This may need to flush out the windows
as well.


112349 17-Mar-2003 jake

Clean up /dev/mem now that pmap handles illegal aliases properly. Don't
allow access to device memory through /dev/mem, or try to make modifying
kernel text through /dev/mem safe (it is not).


112330 17-Mar-2003 jake

Ensure that kstack0 has physical colour equal to virtual colour, so that
illegal aliases will not be created in the data cache if its accessed
through another such mapping.


112306 15-Mar-2003 jake

Implement is_physical_memory. Accessing memory which doesn't exist causes
traps that are difficult to recover from, so we check against the memory
map returned by the prom.


112227 14-Mar-2003 jake

lock.h must be included before mutex.h.


112215 14-Mar-2003 mux

Oops, add missing includes. Pass me the pointy hat.

Reported by: jake


112196 13-Mar-2003 mux

Grab Giant around calls to contigmalloc() and contigfree() so
that drivers converted to be MP safe don't have to deal with it.


111883 04-Mar-2003 jhb

Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


111585 27-Feb-2003 julian

Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.


111583 27-Feb-2003 davidxu

cat KSE > /dev/null


111553 26-Feb-2003 mux

Unbreak the IOMMU code.

Pointy hat to: mux
Reviewed by: tmm


111462 25-Feb-2003 mux

Cleanup of the d_mmap_t interface.

- Get rid of the useless atop() / pmap_phys_address() detour. The
device mmap handlers must now give back the physical address
without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
int. Now we properly pass a vm_offset_t * and expect it to be
filled by the mmap handler when the mapping was successful. The
mmap handler must now return 0 when successful, any other value
is considered as an error. Previously, returning -1 was the only
way to fail. This change thus accidentally fixes some devices
which were bogusly returning errno constants which would have been
considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with: alc, phk, jake
Reviewed by: peter
Compile-tested on: LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on: i386


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111072 18-Feb-2003 jake

Add drivers for the central and fhc busses found in enterprise class
UltraSPARCs, and an eeprom attachment for fhc, which allows the date
to be set properly on these machines. Central is a wierd bus which
seems to only ever have 1 fhc attached to it. FHC (FireHose Controller)
is another wierd bus with various things on it depending where its attached.
The fhc attached to central has eeprom and zs, and the fhcs which attach
directly to nexus have simm-status, environment and other nodes, none of
which I'll probably ever have documentation for.

Thanks to Ade Lovett for providing access to an 8 cpu e4500.


111032 17-Feb-2003 julian

Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by: parts by davidxu@
Reviewed by: jeff@ mini@


111028 17-Feb-2003 jeff

- Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers. This greatly simplifies much the
KSE code.

Submitted by: davidxu


111024 17-Feb-2003 jeff

- Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc. These counters are only examined through calcru.

Submitted by: davidxu
Tested on: x86, alpha, UP/SMP


110335 04-Feb-2003 harti

Fix a problem in bus_dmamap_load_{mbuf,uio} when the first mbuf or the first
uio segment is empty. In this case no dma segment is create by
bus_dmamap_load_buffer, but the calling routine clears the first flag.
Under certain combinations of addresses of the first and second mbuf/uio
buffer this leads to corrupted DMA segment descriptors. This was already
fixed by tmm in sparc64/sparc64/iommu.c.

PR: kern/47733
Reviewed by: sam
Approved by: jake (mentor)


110296 03-Feb-2003 jake

Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway. This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by: jhb, tmm
Tested on: i386, sparc64


110190 01-Feb-2003 julian

Reversion of commit by Davidxu plus fixes since applied.

I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.


110047 29-Jan-2003 scottl

Fix some more missing dt_ prefixes for dma tag fields.


110031 29-Jan-2003 scottl

Fix a typo in dt_maxsize from the last commit


110030 29-Jan-2003 scottl

Implement bus_dmamem_alloc_size() and bus_dmamem_free_size() as
counterparts to bus_dmamem_alloc() and bus_dmamem_free(). This allows
the caller to specify the size of the allocation instead of it defaulting
to the max_size field of the busdma tag.

This is intended to aid in converting drivers to busdma. Lots of
hardware cannot understand scatter/gather lists, which forces the
driver to copy the i/o buffers to a single contiguous region
before sending it to the hardware. Without these new methods, this
would require a new busdma tag for each operation, or a complex
internal allocator/cache for each driver.

Allocations greater than PAGE_SIZE are rounded up to the next
PAGE_SIZE by contigmalloc(), so this is not suitable for multiple
static allocations that would be better served by a single
fixed-length subdivided allocation.

Reviewed by: jake (sparc64)


109910 26-Jan-2003 jake

Fix standard kse breakage of non-x86 platforms.

Pointy hat to: davidxu


109877 26-Jan-2003 davidxu

Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian


109860 26-Jan-2003 jake

Merge some code paths back together so that we only instantiate 1 copy of
the user tlb fault handlers.


109810 24-Jan-2003 jake

Moved some (gas) macros up so they can be used in more places.


109651 21-Jan-2003 tmm

Fixes for a number of problems in the IOMMU code:
1.) Fix an off-by-one in the DVMA space handling, which would make it
possible to allocate one page beyond the end of the DVMA area.
This page was aliased to the first page. Apparently, this bug was
responsible for the trashed nvram/eeprom some people were reporting,
in conjunction with a number of unfortunate coincidences.
2.) Fix broken boundary and and lowaddr calculations.
3.) Fix a memory leak on an error path.
4.) Update a outdated comment to reflect the introduction of IOMMU_MAX_PRE,
make the usage of IOMMU_MAX_PRE more consistent and KASSERT that the
preallocation size is not 0.
5.) Fix a case where an error return was lost.
6.) When signalling an error to the caller by invoking the callback, do
not use a segment pointer of NULL for compatability with existing
drivers.

Also, increase the maximum segment number to 64; it is rather arbitrary,
with the exception of the of the stack space consumed by the segment
array.

Special thanks go to Harti Brandt <brandt@fokus.fraunhofer.de> for
spotting 4 and 5, and testing many iterations of patches.

Pointy hats to: tmm


109647 21-Jan-2003 tmm

Fix iommu_dvmamap_sync(): it was still operating as if the BUS_DMASYNC_*
constants where flag bits (as in NetBSD), although they are consecutively
numbered in FreeBSD. This would cause unnecessary flushing in the
BUS_DMASYNC_POSTWRITE case, but was otherwise mostly harmless.


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109615 21-Jan-2003 jeff

- Add a VM_WAIT in the appropriate cases where vm_page_alloc() fails and flags
indicate that uma_small_alloc should not. This code should be refactored so
that there is not so much cross arch duplication.

Reviewed by: jake
Spotted by: tmm
Tested on: alpha, sparc64
Pointy hat to: jeff and everyone who cut and pasted the bad code. :-)


109605 21-Jan-2003 jake

Resolve relative relocations in klds before trying to parse the module's
metadata. This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by: Hartmut Brandt <brandt@fokus.gmd.de>
PR: 46732
Tested on: alpha (obrien), i386, sparc64


109375 16-Jan-2003 mdodd

The abs() function isn't defined locally; include a header file that
defines it.


109342 16-Jan-2003 dillon

Merge all the various copies of vm_fault_quick() into a single
portable copy.


109340 15-Jan-2003 dillon

Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy. Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).


109269 15-Jan-2003 mdodd

- GC a few more hand-rolled 'abs' macros.
- GC a few hand-rolled min()/max() macros while I'm here.


109036 10-Jan-2003 jake

Don't allow user process to set an invalid window state through sigreturn.

Spotted by: tmm


108830 06-Jan-2003 tmm

Change the iommu code to be able to handle more than one DVMA area per
map. Use this new feature to implement iommu_dvmamap_load_mbuf() and
iommu_dvmamap_load_uio() functions in terms of a new helper function,
iommu_dvmamap_load_buffer(). Reimplement the iommu_dvmamap_load()
to use it, too.
This requires some changes to the map format; in addition to that,
remove unused or redundant members.
Add SBus and Psycho wrappers for the new functions, and make them
available through the respective DMA tags.


108821 06-Jan-2003 tmm

- remove the unused parent DMA tag argument from
_nexus_dmamap_load_buffer()
- implement nexus_dmamap_load() in terms of _nexus_dmamap_load_buffer().
Note that this is untested, as this code is not currently used (but
might be later for UPA devices).
- move BUS_DMAMAP_NSEGS to bus_private.h
- disable the ecache flushing in nexus_dmamap_sync(); it should not be
needed, although the docs are not entirely clear on that.


108815 06-Jan-2003 tmm

Prefix the members of struct bus_space_tag and struct bus_dma_tag with
a uniqifier. No functional changes.


108810 06-Jan-2003 tmm

Style and comment fixes, no functional changes.


108802 06-Jan-2003 tmm

Some cleanup:
- move some constants into iommureg.h
- correct some comments
- use KASSERT() in one place instead of rolling our own
- take a sanity check out of #ifdef DIAGNOSTIC
- fix a syntax error in normally #ifdef'ed out debug code


108700 05-Jan-2003 jake

- Reorganize PMAP_STATS to scale a little better.
- Add some more stats for things that are now considered interesting.


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


108379 29-Dec-2002 jake

Use the meaningful mnemonics for ancillary state registers now that gas
is invoked properly to understand them.

%asr19 -> %gsr
%asr20 -> %set_softint
%asr21 -> %clear_softint


108378 28-Dec-2002 jake

Forgot this file in previous commit.


108377 28-Dec-2002 jake

- Moved storing %g1-%g5 in the trapframe until after interrupts are enabled.
- Restore %g6 and %g7 for kernel traps if we are returning to prom code.
This allows complex traps (ones that call into C code) to be handled from
the prom.


108374 28-Dec-2002 jake

Pass 0 in %o1 to tl0_trap for all non-interrupt traps. This will be used
to pass the pil when tl0_trap also handles interrupts.


108360 28-Dec-2002 alc

Hold the page queues lock around calls to vm_page_flag_clear() and
vm_page_wakeup().


108344 28-Dec-2002 alc

Use VM_ALLOC_WIRED in pmap_pinit().


108331 27-Dec-2002 jake

Teach /dev/kmem about direct mapped addresses.

Note that a better solution for how to make kernacc work for direct mapped
addresses is needed for all platforms that use them.


108302 27-Dec-2002 jake

Implement uma_small_alloc and uma_small_free. Not yet used.


108301 27-Dec-2002 jake

- Use direct mapped addresses for the message buffer, for the crash dump
mappings, and for pmap_map which is used to map the vm_page structures.
- Don't allocate kva space for any of the above.


108245 23-Dec-2002 jake

- Change the way the direct mapped region is implemented to be generally
useful for accessing more than 1 page of contiguous physical memory, and
to use 4mb tlb entries instead of 8k. This requires that the system only
use the direct mapped addresses when they have the same virtual colour as
all other mappings of the same page, instead of being able to choose the
colour and cachability of the mapping.
- Adapt the physical page copying and zeroing functions to account for not
being able to choose the colour or cachability of the direct mapped
address. This adds a lot more cases to handle. Basically when a page has
a different colour than its direct mapped address we have a choice between
bypassing the data cache and using physical addresses directly, which
requires a cache flush, or mapping it at the right colour, which requires
a tlb flush. For now we choose to map the page and do the tlb flush.

This will allows the direct mapped addresses to be used for more things
that don't require normal pmap handling, including mapping the vm_page
structures, the message buffer, temporary mappings for crash dumps, and will
provide greater benefit for implementing uma_small_alloc, due to the much
greater tlb coverage.


108195 23-Dec-2002 jake

- Fix a bug where the faulting address for an mmu miss could sometimes be
clobbered due to some debug code. This was harmless and just superfluous
soft faults.
- Update some comments.


108193 22-Dec-2002 jake

- Rearrange pmap_bootstrap slightly to be more in dependency order.
- Put the kernel tsb before before the kernel load address, below
VM_MIN_KERNEL_ADDRESS, instead of after the kernel where it consumes
usable kva. This is magic mapped so the virtual address is irrelevant,
it just needs to be out of the way.


108187 22-Dec-2002 jake

- Add a spin lock to single thread cache invalidation and tlb flush ipis,
which allows ipis to be sent outside of Giant.
- Remove the ap boot mutex, which is unused.


108166 21-Dec-2002 jake

- Add a pmap pointer to struct md_page, and use this to find the pmap that
a mapping belongs to by setting it in the vm_page_t structure that backs
the tsb page that the tte for a mapping is in. This allows the pmap that
a mapping belongs to to be found without keeping a pointer to it in the
tte itself.
- Remove the pmap pointer from struct tte and use the space to make the
tte pv lists doubly linked (TAILQs), like on other architectures. This
makes entering or removing a mapping O(1) instead of O(n) where n is the
number of pmaps a page is mapped by (including kernel_pmap).
- Use atomic ops for setting and clearing bits in the ttes, now that they
return the old value and can be easily used for this purpose.
- Use __builtin_memset for zeroing ttes instead of bzero, so that gcc will
inline it (4 inline stores using %g0 instead of a function call).
- Initially set the virtual colour for all the vm_page_ts to be equal to their
physical colour. This will be more useful once uma_small_alloc is
implemented, but basically pages with virtual colour equal to phsyical
colour are easier to handle at the pmap level because they can be safely
accessed through cachable direct virtual to physical mappings with that
colour, without fear of causing illegal dcache aliases.

In total these changes give a minor performance improvement, about 1%
reduction in system time during buildworld.


108157 21-Dec-2002 jake

Make pmap_qenter and pmap_qremove look more like the other pmaps.


108155 21-Dec-2002 jake

Removed unused pmap_qenter_flags.


108140 20-Dec-2002 jake

Add page queue locking around functions that call vm_page_flag_set. This
fixes a failed assertion early in boot on sparc64.

Reported by: Roderick van Domburg <r.s.a.vandomburg@student.utwente.nl>


107719 10-Dec-2002 julian

Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


107517 02-Dec-2002 tmm

Remove a workaround for a binutils bug that was fixed in the recent
import, as it breaks the relocation kernel modules built with the new
binutils.
Note that this, together with the binutils import, marks a kernel module
flag day on sparc64: modules built with the old binutils will not work
with new kernels and vice versa. Mismatches will result in panics.

Approved by: re


107470 01-Dec-2002 tmm

Do not panic when a dmamap is unloaded more then once, but just silently
ignore it. This is non-fatal on the other architectures, and some
drivers seem to do this.

Approved by: re


107211 24-Nov-2002 alc

Add page queues locking to vunmapbuf().

Approved by: re (blanket)


107180 22-Nov-2002 mux

Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit(). This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit(). We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by: re@
Suggested by: jhb


107103 20-Nov-2002 jhb

Fix compile in the case of SMP defined but DDB not defined.

Approved by: re (implicit, DP2 doesn't build w/o this)


107037 18-Nov-2002 jake

Run configure at SI_SUB_THIRD instead of SI_SUB_ANY like other
architectures.


106994 17-Nov-2002 jake

MFi386 r1.369. Clear the PG_WRITEABLE flag in pmap_clear_write; return
immediately if its already clear.

Suggested by: alc


106977 16-Nov-2002 deischen

Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin


106838 13-Nov-2002 alc

Move pmap_collect() out of the machine-dependent code, rename it
to reflect its new location, and add page queue and flag locking.

Notes: (1) alpha, i386, and ia64 had identical implementations
of pmap_collect() in terms of machine-independent interfaces;
(2) sparc64 doesn't require it; (3) powerpc had it as a TODO.


106616 08-Nov-2002 tmm

Remove physmem from here, too, as it is defined in vm_init.c since
r1.35 (forgotten in my last commit due to a botched patch).

Pointy hat to: tmm


106555 07-Nov-2002 tmm

Add two new workaround for firmware anomalies:
1. At least some Netra t1 models have PCI buses with no associated
interrupt map, but obviously expect the PCI swizzle to be done with
the interrupt number from the higher level as intpin. In this case,
the mapping also needs to continue at parent bus nodes.
To handle that, add a quirk table based on the "name" property of
the root node to avoid breaking other boxen. This property is now
retrieved and printed at boot.
2. On SPARCengine Ultra AX machines, interrupt numbers are not mapped
at all, and full interrupt numbers (not just INOs) are given in
the interrupt properties. This is more or less cosmetical; the
PCI interrupt numbers would be wrong, but the psycho resource
allocation method would pass the right numbers on anyway.

Tested by: mux (1), Maxim Mazurok <maxim@km.ua> (2)


106541 06-Nov-2002 mux

s/HZ/Hz/


106503 06-Nov-2002 jmallett

Remove what was a temporary bogus assignment of bits of siginfo_t, as it does
not look like the prerequisites to fill it in properly will be in the tree
for the upcoming release, but it's mostly done, so there is no need for these
to stay around to remind us.


106050 27-Oct-2002 jake

Don peril sensitive sun glasses and change the default system call vector
for sparc64 from trap #9 to trap #65. This is one of the ABI "blessed"
system call vectors and is different from any other system that we might
want to emulate, making the emulation easier by reducing the number of
code paths that need to be shared. Compatibility with old applications
is provided with COMPAT_FREEBSD4.
Add defines for a few special traps that we may need to implement for
compatibility with 32bit applications, and add comments on which vectors
are used for what in other systems, and which are available.
Pass magic flags to trap() for deprecated or unimplemented system call
vectors so they will deliver SIGSYS instead of SIGILL.

This piggy backs nicely with the recent sigaction(2) system call number
change, and provided the rules are followed for upgrading past it, this
change should not be noticed.


105996 26-Oct-2002 jake

Allow deprecated or unimplemented system call vectors to deliver SIGSYS,
as suggested by the sparc v9 ABI.


105995 26-Oct-2002 jake

Remove an unused macro.


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105946 25-Oct-2002 tmm

Initialize tick_MHz and related variables much earlier. After the last
revision of tick.c, this was done at SI_SUB_CLOCKS, which is too late
because tick_MHz is required for DELAY() to work.

Reviewed by: jake


105945 25-Oct-2002 tmm

Fix iommu_dvmamap_sync() to use the right address when flushing the
streaming cache. This bug could have the potential to cause data
corruption on systems with Psycho U2P bridges (Sabre bridges have no
streaming cache).
However, due to the usual driver architecture, it is believed that
corruption did occur only in rare cases (if at all).


105939 25-Oct-2002 jake

Greatly improve readability of trap() by using a table to convert between
trap types and signals to send. Rearrange KASSERTs to better handle faults
early before curthread is setup, or in the case that it gets corrupted or
set to 0.


105908 25-Oct-2002 jake

Minor cleanups.
- use fields in sysent instead of PS_STRINGS
- set TSTATE_PRIV in frame0.tf_tstate for what its worth


105900 24-Oct-2002 julian

Extract out KSE specific code from machine specific code
so that there is ony one copy of it. Fix that one copy
so that KSEs with no mailbox in a KSE program are not a cause
of page faults (this can legitmatly happen).

Submitted by: (parts) davidxu


105819 23-Oct-2002 jhb

We always need sys/pcpu.h now, not just for the SMP case.

Approved by: jake


105733 22-Oct-2002 jake

- Expand struct trapframe to 256 bytes, make all fields fixed width and the
same size. Add some fields that previously overlapped with something else
or were missing.
- Make struct regs and struct mcontext (minus floating point) the same as
struct trapframe so converting between them is easy (null).
- Add space for saving floating point state to struct mcontext. This requires
that it be 64 byte aligned.
- Add assertions that none of these structures change size, as they are part
of the ABI.
- Remove some dead code in sendsig().
- Save and restore %gsr in struct trapframe. Remember to restore %fsr.
- Add some comments to exception.S.


105678 22-Oct-2002 jake

Start tick at the correct time (cpu_init_clocks), instead of cpu_startup.


105569 20-Oct-2002 mux

Set kernelname in sparc64_init() so that the kern.bootfile
sysctl works. This stuff should probably be made MI.

Reviewed by: jake


105545 20-Oct-2002 tmm

Use microuptime() instead of microtime() to bound the flush wait to
avoid hiccups in case of system time adjustment.


105531 20-Oct-2002 tmm

Add kernel dump support, based on the ia64 version (which was committed
as sparc64/sparc64/dump_machdep.c a while back).
Other than ia64 (which uses ELF), sparc64 uses a homegrown format for
the dumps (headers are required because the physical address and size of
the tsb must be noted, and because physical memory may be discontiguous);
ELF would not offer any advantages here.

Reviewed by: jake


105501 20-Oct-2002 alc

- Lock page queue accesses in pmap_release().


105469 19-Oct-2002 marcel

Add two hooks to signal module load and module unload to MD code.
The primary reason for this is to allow MD code to process machine
specific attributes, segments or sections in the ELF file and
update machine specific state accordingly. An immediate use of this
is in the ia64 port where unwind information is updated to allow
debugging and tracing in/across modules. Note that this commit
does not add the functionality to the ia64 port. See revision 1.9
of ia64/ia64/elf_machdep.c.

Validated on: alpha, i386, ia64


105346 17-Oct-2002 tmm

When entering the firmware mappings into the kernel tlb, clear all 'soft'
bits that might be set in the firmware tte data field, and set the soft
flag TD_EXEC to mark the page executable. Failing to do the latter would
cause fatal instruction faults in the prom in certain situations.

Reviewed by: jake


105012 12-Oct-2002 jake

Removed unused tl0_syscall.


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Test on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104354 02-Oct-2002 scottl

Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by: jake, peter, jhb


104271 01-Oct-2002 jake

Get rid of the TODO macro in the few places that still need work; either
comment it out or change to explicit panics. It conflicts with things
like #if TODO in drivers.


104247 01-Oct-2002 jake

Use M_NOWAIT instead of M_WAITOK when allocating dmamaps; the allocations
functions may be called from a device strategy routine when sleeping is
bad.

Submitted by: phk
Reviewed by: tmm


104075 28-Sep-2002 jake

Renamed intr_enqueue to intr_vector and intr_dequeue to intr_fast, to
better reflect how they are called.


104074 28-Sep-2002 jake

Moved most interrupt related code to a new file, interrupt.S.


104072 27-Sep-2002 jake

Add a workaround for what seems to be confusion between binutils and the
sparc v9 ABI. The Elf_Rela records for local symbols appear to already
have the symbol's value added in to the addend field, even though the ABI
specifies we need to lookup the symbol and add its value too. This breaks
text relocations in klds because the symbol's value is added twice, and
the resulting address points off into nowhere land, so for now just use
the addend.

Tested by: rwatson


103922 25-Sep-2002 jake

Removed debug code.


103921 25-Sep-2002 jake

Pass the function to call (trap or syscall) to tl0_trap and tl1_trap in %o2.


103919 24-Sep-2002 jake

Rearrange tl1_trap slightly, also save and restore the out registers so
that instruction emulation is possible in kernel mode.


103916 24-Sep-2002 jake

Allocate stack space for the trapframe along with the normal register
frame in the save instruction, rather than doing a separate sub.


103897 24-Sep-2002 jake

Split user trap processing out into a separate routine so that traps which
never result in user traps don't have to plow through it.


103784 22-Sep-2002 jake

Call trap directly for exceptional cases that need more processing on
return to usermode, rather than branching back to a label before the
original call.


103773 22-Sep-2002 jake

Remove unneeded opt headers.

Noticed by: benno


103770 22-Sep-2002 jake

Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c
so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless
struct is not valid, rather than panicing later on. Call nfs_setup_diskless()
from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy
root mount support for sparc64, and enabled NFS_ROOT by default.


103654 19-Sep-2002 jhb

Use correct function name in previous commit.

Submitted by: jake
Pointy hat to: jhb


103646 19-Sep-2002 jhb

Implement db_print_backtrace() if DDB is compiled into the kernel. This
MD function is just a wrapper around db_stack_trace_cmd() that prints out
a backtrace of curthread. Currently, this function is only implemented
on i386 and alpha (and the alpha version isn't quite tested yet, will do
that in a bit). Other changes:

- For i386, fix a bug in the raw frame address case. The eip we extract
from the passed in frame address does not match the frame we received.
Thus, instead of printing a bogus frame with the wrong eip, go ahead
and advance frame down to the same frame as the eip we are using.
- For alpha, attempt to add a way of doing a raw trace for alpha. Instead
of passing a frame address in 'addr', pass in a pointer to a structure
containing PC and KSP and use those to start the backtrace. The alpha
db_print_backtrace() uses asm to read in the current PC and KSP values
into such a request.

Tested on: i386
Requested by: many


103494 17-Sep-2002 jake

Fix standard kse breakge of non-x86 platforms. sigh.

Pointy hat to: kse


103367 15-Sep-2002 julian

Allocate KSEs and KSEGRPs separatly and remove them from the proc structure.
next step is to allow > 1 to be allocated per process. This would give
multi-processor threads. (when the rest of the infrastructure is
in place)

While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc
are diverging more than they should.. corrective action needed soon.


103081 07-Sep-2002 jmallett

Fill out two fields (si_pid, si_uid) in the siginfo structure handed back
to userland in the signal handler that were not being iflled out before, but
should and can be.

This part of sendsig could be slightly refactored to use an MI interface, or
ideally, *sendsig*() would have an API change to accept a siginfo_t, which
would be filled out by an MI function in the level above sendsig, and said MI
function would make a small call into MD code to fill out the MD parts (some
of which may be bogus, such as the si_addr stuff in some places). This would
eventually make it possible for parts of the kernel sending signals to set up
a siginfo with meaningful information.

Reviewed by: mux
MFC after: 2 weeks


102873 02-Sep-2002 jake

Remove an unneeded PROC_LOCK, which caused lock recursion panics.
Print a warning about old applications with no signal trampoline.

Reported by: marius@alchemy.franken.de


102808 01-Sep-2002 jake

Added fields for VM_MIN_ADDRESS, PS_STRINGS and stack protections to
sysentvec. Initialized all fields of all sysentvecs, which will allow
them to be used instead of constants in more places. Provided stack
fixup routines for emulations that previously used the default.


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102557 29-Aug-2002 jake

Minor cleanup.


102555 29-Aug-2002 jake

Removed legacy signal trampoline.


102554 29-Aug-2002 jake

Removed support for in-kernel signal code.


102399 25-Aug-2002 alc

o Retire pmap_pageable(). It's an advisory routine that none
of our platforms implements.


102042 18-Aug-2002 jake

Forgot this in last commit.


102040 18-Aug-2002 jake

Add pmap support for user mappings of multiple page sizes (super pages).
This supports all hardware page sizes (8K, 64K, 512K, 4MB), but only 8k
pages are actually used as of yet.


101958 16-Aug-2002 jake

Use symbolic constants instead of magic address constants.


101956 16-Aug-2002 jake

Removed unneeded pmap_initialized flag.


101955 16-Aug-2002 jake

Demark sections of code that need special fault handling with labels.
Check if the trapped pc is inside of the demarked sections to implement
fault recovery for copyin etc, instead of pcb_onfault. Handle recovery
from data access exceptions as well as page faults.

Inspired by: bde's sys.dif


101899 15-Aug-2002 jake

Fix some confusion regarding traps that use mmu globals but don't really
have any reason to; force alternat globals instead, which is what we want.


101898 15-Aug-2002 jake

Store the number of itlb and dtlb entries separately; they may be different.
Find the prom node for the boot cpu earlier and store it in the per-cpu
area, so that cache_init can be called earlier.


101870 14-Aug-2002 jake

Set kernel_vm_end. Panic if we try to grow the kernel.


101653 10-Aug-2002 jake

Auto size available kernel virtual address space based on phsyical memory
size. This avoids blowing out kva in kmeminit() on large memory machines
(4 gigs or more).

Reviewed by: tmm


101642 10-Aug-2002 alc

o Remove the setting and clearing of the PG_MAPPED flag. (This flag is
obsolete.)


101346 05-Aug-2002 alc

o Don't set PG_MAPPED or PG_WRITEABLE when a page is mapped
using pmap_kenter() or pmap_qenter().
o Use VM_ALLOC_WIRED in pmap_new_thread().


101121 01-Aug-2002 jake

Modify the cache handling code to assume 2 virtual colours, which is much
simpler and easier to get right. Add comments. Add more statistic
gathering on cacheable and uncacheable mappings.


101119 31-Jul-2002 jake

Add some statistic gathering for cache flushes.


101082 31-Jul-2002 jake

These file are no longer used (moved to userland and/or merged into
pmap.c).


101072 31-Jul-2002 jake

These were repo-copied to have a .S extension.


100910 30-Jul-2002 jake

Add definitions for statistical and high-resolution profiling. The calling
conventions for _mcount and __cyg_profile_func_enter are different, so
statistical profiling kernels build and link but don't actually work.
IWBNI one could tell gcc to only generate calls to the former.

Define uintfptr_t properly for userland, but not for the kernel (I hope).


100909 30-Jul-2002 jake

The data cache on UltraSPARC III is not directly mapped, so don't assert
that. This breaks assumptions made by some of the cache flushing code,
but UltraSPARC III has different methods for invalidating cache lines
anyway.


100902 30-Jul-2002 jake

Panic if the data cache has too many virtual colors (more than 2).


100899 30-Jul-2002 jake

Use _ALIGN_DATA and _ALIGN_TEXT.


100844 29-Jul-2002 jake

Add routines needed for high resolution profiling.


100843 29-Jul-2002 jake

Add a symbol for btext.


100842 29-Jul-2002 jake

Remove a stale comment.


100841 29-Jul-2002 jake

Use _ALIGN_TEXT. Implement __cyg_profile_func_enter and
__cyg_profile_func_exit for GUPROF.


100839 29-Jul-2002 jake

Remove some stuff that snuck in last commit.


100830 28-Jul-2002 jake

Fix a bug introduced in previous commit. Due to the interaction of the
direct physical mappings with virtual page colour, we need to flush the
data cache when a page changes colour. I missed one case which broke
pipes.


100771 27-Jul-2002 jake

Implement a direct mapped address region, like alpha and ia64. This
basically maps all of physical memory 1:1 to a range of virtual addresses
outside of normal kva. The advantage of doing this instead of accessing
phsyical addresses directly is that memory accesses will go through the
data cache, and will participate in the normal cache coherency algorithm
for invalidating lines in our own and in other cpus' data caches. So
we don't have to flush the cache manually or send IPIs to do so on other
cpus. Also, since the mappings never change, we don't have to flush them
from the tlb manually.
This makes pmap_copy_page and pmap_zero_page MP safe, allowing the idle
zero proc to run outside of giant.

Inspired by: ia64


100718 26-Jul-2002 jake

Remove the tlb argument to tlb_page_demap (itlb or dtlb), in order to better
match the pmap_invalidate api.


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


100188 16-Jul-2002 tmm

When multiple IOMMUs are present in a system, use a single TSB for all
of them, and couple them by always performing all operations on all
present IOMMUs. This is required because with the current API there
is no way to determine on which bus a busdma operation is performed.

While being there, clean up the iommu code a bit.

This should be a step in the direction of allow some of larger machines
to work; tests have shown that there still seem to be problems left.


99999 14-Jul-2002 alc

o Lock page queue accesses by vm_page_wire().


99936 14-Jul-2002 jake

Try both upa-portid and portid properties when finding the module id of a
secondary cpu. Its called portid on UltraSPARCIII machines.


99935 14-Jul-2002 jake

Remove debug code.


99927 13-Jul-2002 alc

o Complete the locking of page queue accesses by vm_page_unwire().
o Assert that the page queues lock is held in vm_page_unwire().
o Make vm_page_lock_queues() and vm_page_unlock_queues() visible
to kernel loadable modules.


99900 13-Jul-2002 mini

Add additional cred_free_thread() calls that I had missed the first time.

Pointed out by: jhb


99896 13-Jul-2002 jake

Identify UltraSPARC-III and UltraSPARC-III+ cpus.


99887 12-Jul-2002 jhb

Set the thread state of the newly chosen to run thread to TDS_RUNNING in
choosethread() in MI C code instead of doing it in in assembly in all the
various cpu_switch() functions. This fixes problems on ia64 and sparc64.

Reviewed by: julian, peter, benno
Tested on: i386, alpha, sparc64


99830 11-Jul-2002 jhb

thread_exit() requires PROC_LOCK to be held, so lock it.


99571 08-Jul-2002 peter

Add a special page zero entry point intended to be called via the single
threaded VM pagezero kthread outside of Giant. For some platforms, this
is really easy since it can just use the direct mapped region. For others,
IPI sending is involved or there are other issues, so grab Giant when
needed.

We still have preemption issues to deal with, but Alan Cox has an
interesting suggestion on how to minimize the problem on x86.

Use Luigi's hack for preserving the (lack of) priority.

Turn the idle zeroing back on since it can now actually do something useful
outside of Giant in many cases.


99559 07-Jul-2002 peter

Collect all the (now equivalent) pmap_new_proc/pmap_dispose_proc/
pmap_swapin_proc/pmap_swapout_proc functions from the MD pmap code
and use a single equivalent MI version. There are other cleanups
needed still.

While here, use the UMA zone hooks to keep a cache of preinitialized
proc structures handy, just like the thread system does. This eliminates
one dependency on 'struct proc' being persistent even after being freed.
There are some comments about things that can be factored out into
ctor/dtor functions if it is worth it. For now they are mostly just
doing statistics to get a feel of how it is working.


99556 07-Jul-2002 peter

Fix (s/proc/thread/) some typos in two panic messages.


99422 05-Jul-2002 peter

Back out proc part of last commit. UMA manages the thread cache only, and
we just have to deal with the kstack when told to. We do not have a
UMA-managed cache for the proc struct and its associated upage yet. So,
go back to the old lazy mechanism. Note that if UMA destroys pages that
used to contain proc structures, we'll lose the corresponding upage
forever. (zones never did this - once a page was allocated, it stayed
attached to the proc zone forever)


99420 05-Jul-2002 peter

Take a shot at implementing changes from i386/pmap.c rev 1.328-1.331.


99095 29-Jun-2002 julian

Fix reverse ordering of locks. add a comment about locks on some platforms.

Submitted by: jhb@freebsd.org


99072 29-Jun-2002 julian

Part 1 of KSE-III

The ability to schedule multiple threads per process
(one one cpu) by making ALL system calls optionally asynchronous.
to come: ia64 and power-pc patches, patches for gdb, test program (in tools)

Reviewed by: Almost everyone who counts
(at various times, peter, jhb, matt, alfred, mini, bernd,
and a cast of thousands)

NOTE: this is still Beta code, and contains lots of debugging stuff.
expect slight instability in signals..


99025 29-Jun-2002 jake

Fix a deletion during traversal tailq bug.


98813 25-Jun-2002 jake

pmap_kremove can no longer be used to remove the magic device mappings
installed with pmap_kenter_flags, since the physical addresses may not
have an associated vm_page. Add a function to do this.

Tested by: Tomi Vainio <Tomi.Vainio@Sun.COM>


98765 24-Jun-2002 jake

Add an MD callout like cpu_exit, but which is called after sched_lock is
obtained, when all other scheduling activity is suspended. This is needed
on sparc64 to deactivate the vmspace of the exiting process on all cpus.
Otherwise if another unrelated process gets the exact same vmspace structure
allocated to it (same address), its address space will not be activated
properly. This seems to fix some spontaneous signal 11 problems with smp
on sparc64.


98727 24-Jun-2002 mini

Remove unused diagnostic function cread_free_thread().

Approved by: alfred


98678 23-Jun-2002 mux

Include machine/critical.h to get missing prototypes.

Reviewed by: tmm


98651 22-Jun-2002 jake

Fix a bug related to marking pages virtually uncacheable due to illegal
dcache aliasing. A page that already had more than 1 mapping of the
same virtual colour would not be correctly uncached.

Noticed by: Artur Grabowski <art@openbsd.org>


98635 22-Jun-2002 mux

Warning fix.

Reviewed by: peter


98511 20-Jun-2002 jake

{f,s}usword -> {f,s}uword16. Implement {f,s}uword32.

Requested by: peter


98480 20-Jun-2002 peter

Deorbit suibyte(). It was only used for split address space systems
for supporting UIO_USERISPACE (ie: it wasn't used).


98350 17-Jun-2002 jake

Add constants for the min and max prom addresses. Use these instead of
magic numbers. Use stxa_sync instead of stxa; membar #Sync; to ensure
that no instruction is placed between the two. This can cause random
corruption even though interrupts are already disabled.


98037 08-Jun-2002 jake

Add code to drop to ddb when a process gets a fatal signal that usually
suggests kernel bugs (4, 10, 11). Add a sysctl debug.debugger_on_signal
which turns this on and off, default off.


98033 08-Jun-2002 jake

Remove test code.


98032 08-Jun-2002 jake

Remove code from trap which is handled in userland now.


98031 08-Jun-2002 jake

Fix bizarre SMP problems. The secondary cpus sometimes start up with junk
in their tlb which the prom doesn't clear out, so we have to do so manually
before mapping the kernel page table or the cpu can hang due various
conditions which cause undefined behaviour from the tlb.


98001 07-Jun-2002 jhb

- Fixup / remove obsolete comments.
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha were we
skip the actual syscall invocation if error != 0. This fixes a bug
where if we the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.


97871 05-Jun-2002 jake

Use pmap_map instead of pmap_kenter to map the message buffer. Its too
early for pmap_kenter.


97511 29-May-2002 jake

Forgot to commit this file. Catch up to loader->kernel abi changes.


97450 29-May-2002 jake

Don't try to flush illegal alises from the data cache in vmapbuf and
vunmapbuf, this is handled by pmap now.


97449 29-May-2002 jake

Add an MD page flag for tracking if a page is cacheable or not, so that
we don't flush all mappings of a physical page in order to make it
virtually cachable again, if it is already cachable.


97448 29-May-2002 jake

Remove an unused variable.


97447 29-May-2002 jake

Merge the code in pv.c into pmap.c directly. Place all page mappings onto
the pv lists in the vm_page, even unmanaged kernel mappings. This is so
that the virtual cachability of these mappings can be tracked when a page
is mapped to more than one virtual address. All virtually cachable
mappings of a physical page must have the same virtual colour, or illegal
alises can be created in the data cache. This is a bit tricky because we
still have to recognize managed and unmanaged mappings, even though they
are all on the pv lists.


97446 29-May-2002 jake

Add pv list linkage and a pmap pointer to struct tte. Remove separately
allocated pv entries and use the linkage in the tte for pv operations.


97445 29-May-2002 jake

Use a contrived 'tlb_entry' structure for passing the mappings for the
kernel text and data from the loader to the kernel, so that the tte format
is not part of the loader->kernel ABI.


97444 29-May-2002 jake

Remove pmap.pm_pvlist and make the functions that use it no-ops. These are
all optimizations for architectures which have large sparse page tables,
and/or can't put the pv linkage inside of the page table entries.


97307 26-May-2002 dfr

Add declarations of suword32 and suword64. Add implementations of one or
the other (or both) to all the platforms. Similar for fuword32 and
fuword64.


97265 25-May-2002 jake

Convert the interrupt queue from an array to a linked list. Implement
intr_dequeue in asm so that it can easily be modified to do light weight
context switching.


97263 25-May-2002 jake

Try to handle "double faults" occuring at more trap levels (ie 4 :)).


97030 21-May-2002 jake

Rewrite pmap_enter to avoid copying ttes in all cases.
Pass the tte data to tsb_tte_enter instead of a whole tte, also to avoid
copying.


97027 21-May-2002 jake

Redefine the tte accessor macros to take a pointer to a tte, instead of the
value of the tag or data field.
Add macros for getting the page shift, size and mask for the physical page
that a tte maps (which may be one of several sizes).
Use the new cache functions for invalidating single pages.


97001 20-May-2002 jake

Add SMP aware cache flushing functions, which operate on a single physical
page. These send IPIs if necessary in order to keep the caches in sync on
all cpus.


96998 20-May-2002 jake

De-inline the tlb demap functions. These were so big that gcc3.1 refused
to inline them anyway. ;)


96757 16-May-2002 eric

Banish "priviledged" from kernel source.


96485 13-May-2002 jake

Add another copy of the ia64 dump_machdep.c.


96207 08-May-2002 jake

Make a macro for the guts of tl0_immu_miss, like dmmu_miss and prot.
Rearrange things slightly so that the contents of the tag access
register are read and restored outside of the macros. The intention
is to pass the page size to look up as an argument to the macros.


95744 29-Apr-2002 jake

Add support for an alternate signal trampoline; add a sysarch call to register
an alternate trampoling with the kernel.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95410 25-Apr-2002 marcel

Don't use the symbol name to lookup the symbol value when we can use
the symbol index defined by the relocation. The elf_lookup() support
function is to be used by elf_reloc() when symbol lookups need to be
done. The elf_lookup() function operates on the symbol index and
will do a symbol name based lookup when such is required, otherwise
it uses the symbol index directly. This solves the problem seen on
ia64 where the symbol hash table does not contain local symbols and
a symbol name based lookup would fail for those symbols.

Don't pass the symbol name to elf_reloc(), as it isn't used any more.


95232 21-Apr-2002 jake

Avoid using pmap_kenter "early", since it may need to dink with vm_page
structures, which may not be setup yet. Minor cleanups.


95139 20-Apr-2002 jake

MFi386 1.222. Remove vm_map_growstack and acquisition and release of Giant
from trap_pfault.


95134 20-Apr-2002 jake

Check the alignment of the stack pointer before copying in windows from the
user stack in response to a failed window fill, allowing the process to be
killed if its wrong. This caused user programs which misalign their stack
pointer to get stuck in an infinite loop at the kernel-userland boundary,
which is mostly harmless.

The same thing causes a fatal RED state exception on OpenBSD and probably
NetBSD.

Inspired by: art@openbsd.org


95133 20-Apr-2002 jake

Fix off by one errors in cache flush calls (mostly harmless).


95132 20-Apr-2002 jake

Add needed include of tick.h.


94777 15-Apr-2002 peter

Pass vm_page_t instead of physical addresses to pmap_zero_page[_area]()
and pmap_copy_page(). This gets rid of a couple more physical addresses
in upper layers, with the eventual aim of supporting PAE and dealing with
the physical addressing mostly within pmap. (We will need either 64 bit
physical addresses or page indexes, possibly both depending on the
circumstances. Leaving this to pmap itself gives more flexibilitly.)

Reviewed by: jake
Tested on: i386, ia64 and (I believe) sparc64. (my alpha was hosed)


94606 13-Apr-2002 alc

o Remove vm_map_growstack() and useracc() from sendsig(). Copyout() and
suword() will automatically grow the stack if needed.
o Add a comment that osigreturn() and sigreturn() are MPSAFE.


94257 09-Apr-2002 jake

Forgot these files in previous commit to frame.h. Also add needed include
of machine/emul.h.


94254 09-Apr-2002 jake

Rename some fields in struct frame to be compatible with NetBSD/OpenBSD,
and add some compatibility defines. Add fields for ins and locals to
struct reg also for the same reason; these aren't filled in yet because
getting at those registers sucks and I'd rather not save them in the
trapframe just for this. Reorder struct reg to be ABI compatible as
well. Add needed include of machine/emul.h.

This gets pmdb (poor man's debugger) from OpenBSD mostly compiling but it
doesn't work yet :(


94151 07-Apr-2002 phk

GC the "dumplo" variable, which is no longer used.

A lot of sys/*/*/machdep.c seems not to be.


93949 06-Apr-2002 jake

Provide an implementation of KTR_CPU that doesn't use pcpu, so we don't
crash and burn if its not setup yet. Add timestamp, cpu, and (fake) file
and line recording to the asm version of CTR.


93943 06-Apr-2002 jake

Remove invalid KASSERTS.


93839 04-Apr-2002 tmm

Add MD frontents for the mk48txx driver, ported from NetBSD, and remove
stub implementations of inittodr() and resettodr(), now that the MI ones
are used.


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93793 04-Apr-2002 bde

Moved signal handling and rescheduling from userret() to ast() so that
they aren't in the usual path of execution for syscalls and traps.
The main complication for this is that we have to set flags to control
ast() everywhere that changes the signal mask.

Avoid locking in userret() in most of the remaining cases.

Submitted by: luoqi (first part only, long ago, reorganized by me)
Reminded by: dillon


93702 02-Apr-2002 jhb

- Move the MI mutexes sched_lock and Giant from being declared in the
various machdep.c's to being declared in kern_mutex.c.
- Add a new function mutex_init() used to perform early initialization
needed for mutexes such as setting up thread0's contested lock list
and initializing MI mutexes. Change the various MD startup routines
to call this function instead of duplicating all the code themselves.

Tested on: alpha, i386


93687 02-Apr-2002 tmm

Fix crashes that would happen when more than one 4MB page was used to
hold the kernel text, data and loader metadata by not using a fixed slot
to store the TSB page(s) into. Enter fake 8k page entries into the kernel
TSB that cover the 4M kernel page(s), sot that pmap_kenter() will work
without having to treat these pages as a special case.

Problem reported by: mjacob, obrien
Problem spotted and 4M page handling proposed by: jake


93685 02-Apr-2002 tmm

Remove the superfluous second argument from the IOTSBSLOT() macro.


93683 02-Apr-2002 tmm

Set mp_maxid so that UMA works with SMP.

Submitted by: jake


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93503 01-Apr-2002 jake

ktr changes to improve performance and make writing a userland utility to
dump the trace buffer feasible.
- Remove KTR_EXTEND. This changes the format of the trace entries when
activated, making writing a userland tool which is not tied to a specific
kernel configuration difficult.
- Use get_cyclecount() for timestamps. nanotime() is much too heavy weight
and requires recursion protection due to ktr traces occuring as a result
of ktr traces. KTR_VERBOSE may still require recursion protection, which
is now conditional on it.
- Allow KTR_CPU to be overridden by MD code. This is so that it is possible
to trace early in startup before pcpu and/or curthread are setup.
- Add a version number for the ktr interface. A userland tool can check this
to detect mismatches.
- Use an array for the parameters to make decoding in userland easier.
- Add file and line recording to the non-extended traces now that the extended
version is no more.

These changes will break gdb macros to decode the extended version of the
trace buffer which are floating around. Users of these macros should either
use the show ktr command in ddb, or use the userland utility which can be run
on a core dump.

Approved by: jhb
Tested on: i386, sparc64


93467 31-Mar-2002 phk

Centralize the "bootdev" and "dumpdev" variables. They are still pretty
bogus all things considered, but at least now they don't camouflage as
being MD variables.


93453 30-Mar-2002 alc

Correct a comment: sendsig() calls the MI vm_map_growstack() but
the corresponding comment refers to a MD grow_stack() that doesn't exist.


93389 29-Mar-2002 jake

Remove abuse of intr_disable/restore in MI code by moving the loop in ast()
back into the calling MD code. The MD code must ensure no races between
checking the astpening flag and returning to usermode.

Submitted by: peter (ia64 bits)
Tested on: alpha (peter, jeff), i386, ia64 (peter), sparc64


93317 28-Mar-2002 obrien

Don't be too fancy with null'ed out functions.

Requested by: jake


93316 28-Mar-2002 obrien

Add sysbeep() for the msmith RAID drivers.


93312 28-Mar-2002 obrien

style(9)

Approved by: jake


93273 27-Mar-2002 jeff

Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks
with this flag. Remove the dup_list and dup_ok code from subr_witness. Now
we just check for the flag instead of doing string compares.

Also, switch the process lock, process group lock, and uma per cpu locks over
to this interface. The original mechanism did not work well for uma because
per cpu lock names are unique to each zone.

Approved by: jhb


93271 27-Mar-2002 jake

Fix style bugs.


93270 27-Mar-2002 jake

Fix breakage.


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93121 25-Mar-2002 tmm

Add missing includes for the KTRACE case.


93119 25-Mar-2002 tmm

Make this compile (submitted by jake), add a missing include.


93118 25-Mar-2002 tmm

Remove second copy of iommu_decode_fault() which I accidentially added.

Pointy hat to: tmm


93070 24-Mar-2002 tmm

Revamp the busdma implementation a bit:
- change the IOMMU support code so that it supports overcommittting the
available DVMA memory, while still allocating as lazily as possible.
This is achieved by limiting the preallocation, and deferring the
allocation to map load time when it fails. In the latter case, the
DVMA memory reserved for unloaded maps can be stolen to free up enough
memory for loading a map.
- allow NULL settings in the method tables, and search the parent tags
until an appropriate implementation is found. This allows to remove some
kluges in the old implementation.


93068 24-Mar-2002 tmm

Fix sparc64_bus_mem_unmap() to pass the right address to kmem_free().


93067 24-Mar-2002 tmm

Make the OpenFirmware interrupt mapping code more generic, to reduce
the bus-dependent code and to be able to support more systems. The core
of the new code is mostly obtained from NetBSD.
Kluge the interrupt routing methods of the psycho and apb drivers so
that an intline of 0 can be handled for now; real routing is still not
possible (all intline registers are preinitialized instead); this will
require a sparc64-specific adaption of the driver for generic PCI-PCI
bridges with a custom routing method to work right.


93053 23-Mar-2002 tmm

Add code to print the fault virtual address for uncorrectable DMA errors
caused by IOMMU misses to aid debugging. This will only work on
UltraSPARC-IIi and IIe.


93051 23-Mar-2002 tmm

De-K&R.


93050 23-Mar-2002 tmm

Fix syscall ktraceing.


93049 23-Mar-2002 tmm

Make this compile without options DDB; use intr_disable() instead of
fiddling with PSTATE_IE manually.


93047 23-Mar-2002 tmm

Decruft some #if 0'ed code.


93030 23-Mar-2002 jake

Machine must be non-static for COMPAT_43 to compile. This is used in bsd/os
1.x compatibility code, which I'm sure we all use every day.


93028 23-Mar-2002 jake

Cleanup the trace back routine slightly. Print the leaf return value so
that traps inside of leaf functions are less confusing. Add a function
to print a non-symbolic trace of the user stack.


93001 23-Mar-2002 jake

Backout intrusive ktr traces in tlb fault handlers which have served their
purpose.


92850 21-Mar-2002 jeff

Remove references to vm_zone.h and switch over to the new uma API.

Reviewed by: jake


92844 21-Mar-2002 alfred

Remove __P.

profile.h and bus.h were excluded because there is currently WIP.

Reviewed by: tmm


92824 20-Mar-2002 jhb

Change the way we ensure td_ucred is NULL if DIAGNOSTIC is defined.
Instead of caching the ucred reference, just go ahead and eat the
decerement and increment of the refcount. Now that Giant is pushed down
into crfree(), we no longer have to get Giant in the common case. In the
case when we are actually free'ing the ucred, we would normally free it on
the next kernel entry, so the cost there is not new, just in a different
place. This also removse td_cache_ucred from struct thread. This is
still only done #ifdef DIAGNOSTIC.

Tested on: i386, alpha


92654 19-Mar-2002 jeff

This is the first part of the new kernel memory allocator. This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by: arch@


92465 17-Mar-2002 jake

Don't demap the requested page from the tlb in pmap_kenter or pmap_kremove,
even on the local cpu. These are no longer used unsafely in MI code, and
the MD code has been adjusted to compensate.


92464 17-Mar-2002 jake

Fix a problem where kernel text could become unmapped when clearing out all
the user mappings from the tlb due to the context numbers rolling over. The
store to the internal mmu register must be followed by a membar #Sync before
much else happens to "avoid data corruption", so we use special inlines which
both disable interrupts and ensure that the compiler will not insert extra
instructions between the two. Also, load the tte tag and check if the context
is nucleus context, rather than relying on the priviledged bit which doesn't
actually serve any purpose in our design, and check the lock bit too for
sanity.


92463 17-Mar-2002 jake

Use the tlb data access register to map the kernel tsb, rather than the data
in register. The latter uses the random replacment algorithm to pick the
slot, we want a specific slot.


92211 13-Mar-2002 jake

Fix braino.


92205 13-Mar-2002 jake

Add support for starting and stopping cpus with ipis.
Stop the other cpus when shutting down or entering the debugger.

Submitted by: tmm


92204 13-Mar-2002 jake

Use intr_disable/intr_restore instead of doing it manually.

Submitted by: tmm


92202 13-Mar-2002 jake

Add support for driving the clocks on secondary cpus.

Submitted by: tmm


92201 13-Mar-2002 jake

Fix a bug where the wrong number of windows were copied for a failed fill
on return to user mode. We may not have frame pointers setup for more
than 1 on return from exec.


92200 13-Mar-2002 jake

White space.


92199 13-Mar-2002 jake

Make IPI_WAIT use a bit mask of the cpus that a pmap is active on and only
wait for those cpus, instead of all of them by using a count. Oops.
Make the pointer to the mask that the primary cpu spins on volatile, so
gcc doesn't optimize out an important load. Oops again.
Activate tlb shootdown ipi synchronization now that it works. We have
all involved cpus wait until all the others are done. This may not be
necessary, it is mostly for sanity.
Make the trigger level interrupt ipi handler work.

Submitted by: tmm


91970 09-Mar-2002 tmm

Add a driver for the mem and kmem devices, based off the i386 version.


91783 07-Mar-2002 jake

Implement delivery of tlb shootdown ipis. This is currently more fine grained
than the other implementations; we have complete control over the tlb, so we
only demap specific pages. We take advantage of the ranged tlb flush api
to send one ipi for a range of pages, and due to the pm_active optimization
we rarely send ipis for demaps from user pmaps.

Remove now unused routines to load the tlb; this is only done once outside
of the tlb fault handlers.
Minor cleanups to the smp startup code.

This boots multi user with both cpus active on a dual ultra 60 and on a
dual ultra 2.


91782 07-Mar-2002 jake

Modify the tlb demap API to take a pmap instead of a tlb context number.
Due to allocating tlb contexts on the fly, we only ever need to demap the
primary context, non-primary contexts have already been implicitly flushed
by context switching. All we really need to tell is if its a kernel demap
or not, and its easier just to compare against the kernel_pmap which is a
constant.


91781 07-Mar-2002 jake

Implement kthread context stealing. This is a bit of a misnomer because
the context is not actually stolen, as it would be for i386. Instead of
deactivating a user vmspace immediately when switching out, and recycling
its tlb context, wait until the next context switch to a different user
vmspace. In this way we can switch from a user process to any number of
kernel threads and back to the same user process again, without losing any
of its mappings in the tlb that would not already be knocked by the automatic
replacement algorithm. This is not expected to have a measurable performance
improvement on the machines we currently run on, but it sounds cool and makes
the sparc64 port SMPng buzz word compliant.


91617 04-Mar-2002 jake

Add support for starting secondary cpus in kernel, as opposed to relying
on the loader to do it. Improve smp startup code to be less racy and to
defer certain things until the right time. This almost boots single user
on my dual ultra 60, it is still very fragile:

SMP: AP CPU #1 Launched!
Enter full pathname of shell or RETURN for /bin/sh:
# ls
Debugger("trapsig")
Stopped at Debugger+0x1c: ta %xcc, 1
db> heh
No such command
db>


91616 04-Mar-2002 jake

Dig the information about which tlb slots were used to map the kernel out
of the metadata passed by the loader.


91613 04-Mar-2002 jake

Allocate tlb contexts on the fly in cpu_switch, instead of statically 1 to 1
with pmaps. When the context numbers wrap around we flush all user mappings
from the tlb. This makes use of the array indexed by cpuid to allow a pmap
to have a different context number on a different cpu. If the context numbers
are then divided evenly among cpus such that none are shared, we can avoid
sending tlb shootdown ipis in an smp system for non-shared pmaps. This also
removes a limit of 8192 processes (pmaps) that could be active at any given
time due to running out of tlb contexts.

Inspired by: the brown book
Crucial bugfix from: tmm


91612 04-Mar-2002 jake

Fix obscure problems with vfork where part of the parent's stack could be
clobbered by the child. This is more complicated than usual because the
window that could get clobbered is pushed in kernel mode, so a lot of
registers would have to be saved in other registers in userland and we
don't have enough. What we do have is space in the pcb to temporarily
store user windows that were spilled in kernel mode, but could not be
immediately stored to the user stack. So we copy in the parent's topmost
window and save it in the pcb, and arrange for it to be copied back out
when the child is done frobbing the stack.

Reviewed by: tmm


91532 01-Mar-2002 jake

We don't need KTR_COMPILE in assym.s, its already in opt_global.h. Add
assyms for more ktr trace classes.


91531 01-Mar-2002 jake

Use a better trace class for ktr traces in the tlb fault handlers, which are
rather loud.


91504 28-Feb-2002 arr

- Move a comment from being on the same line as a #ifdef to the line
following it. This should have gone in the previous commit, but
misviewed Bruce's patch.

Requested by: bde


91475 28-Feb-2002 arr

- Fix panic() message and a couple style nits that snuck in from the
recent diagnostics commit (rev. 1.84).


91471 28-Feb-2002 silby

Fix a minor swap leak.

Previously, the UPAGES/KSTACK area of processes/threads would leak memory
at the time that a previously swapped process was terminated. Lukcily, the
leak was only 12K/proc, so it was unlikely to be a major problem unless you
had an undersized swap partition.

Submitted by: dillon
Reviewed by: silby
MFC after: 1 week


91403 27-Feb-2002 silby

Fix a horribly suboptimal algorithm in the vm_daemon.

In order to determine what to page out, the vm_daemon checks
reference bits on all pages belonging to all processes. Unfortunately,
the algorithm used reacted badly with shared pages; each shared page
would be checked once per process sharing it; this caused an O(N^2)
growth of tlb invalidations. The algorithm has been changed so that
each page will be checked only 16 times.

Prior to this change, a fork/sleepbomb of 1300 processes could cause
the vm_daemon to take over 60 seconds to complete, effectively
freezing the system for that time period. With this change
in place, the vm_daemon completes in less than a second. Any system
with hundreds of processes sharing pages should benefit from this change.

Note that the vm_daemon is only run when the system is under extreme
memory pressure. It is likely that many people with loaded systems saw
no symptoms of this problem until they reached the point where swapping
began.

Special thanks go to dillon, peter, and Chuck Cranor, who helped me
get up to speed with vm internals.

PR: 33542, 20393
Reviewed by: dillon
MFC after: 1 week


91360 27-Feb-2002 jake

Parameterize the number of pages to allocate for the per-cpu area on
PCPU_PAGES.


91359 27-Feb-2002 jake

Make cpu_identify take the value of the ver register and cpuid as arguments
so we can print nice things about non-current cpus.


91339 27-Feb-2002 jake

Minor cleanup.


91337 27-Feb-2002 jake

Use pcpu.pc_cpumask instead of computing 1 << cpuid.


91336 27-Feb-2002 jake

Add a macro for shift of an integer (1 << shift == sizeof). Move the pointer
define to live alongside it. For kicks assert at compile time that they are
correct. Use these instead of magic numbers.


91335 27-Feb-2002 jake

Wrap long lines.


91316 26-Feb-2002 jake

Apparently gcc3.1 is now using deprcated v8 instructions in v9 code
due to them being faster in certain cases. Therefore we need to save
and restore the v8 %y register around traps in kernel mode as well as
traps in usermode.

Tested by: obrien, tmm


91288 26-Feb-2002 jake

Convert pmap.pm_context to an array of contexts indexed by cpuid. This
doesn't make sense for SMP right now, but it is a means to an end.


91286 26-Feb-2002 jake

Pu back a call to pmap_context_destroy which was accidentily removed
in the previous commit.

Spotted by: tmm


91274 26-Feb-2002 jake

Allow the user tsb to span multiple pages. Make the default 2 pages for now
until we do some testing to see what's best. This gives a massive reduction
in system time for processes with a relatively large working set. The size
of the tsb directly affects the rss size that a user process can keep mapped.
When it starts to get full replacements occur and the process takes a lot of
soft vm faults. Increasing the default from 1 page to 2 gives the following
before and after numbers for compiling vfs_bio.c:

before:
14.27 real 6.56 user 5.69 sys
after:
8.57 real 6.11 user 1.62 sys

This should make self hosted builds more tolerable.


91257 25-Feb-2002 jake

Remove code to lock the user tsb into the tlb. We can handle faults on it
now, as we do for normal wired kernel memory.


91246 25-Feb-2002 jake

Implement a nested window state. This avoids attempting to spill a user
window to the user stack while in a nested kernel trap. We do this for
entry to the kernel from user mode, but if we get an interrupt in kernel
mode while there are still user windows in the cpu, and we attempt to spill
to the user stack, we may take too many nested traps and overflow the trap
stack, causing a red state exception. This is needed by upcoming changes
to allow the user tsb to not be locked in the tlb.

Reviewed by: tmm


91224 25-Feb-2002 jake

Modify the tte format to not include the tlb context number and to store the
virtual page number in a much more convenient way; all in one piece. This
greatly simplifies the comparison for a matching tte, and allows the fault
handlers to be much simpler due to not having to load wierd masks.
Rewrite the tlb fault handlers to account for the new format. These are also
written to allow faults on the user tsb inside of the fault handlers; the
kernel fault handler must be aware of this and not clobber the other's
registers. The faults do not yet occur due to other support that is needed
(and still under my desk).

Bug fixes from: tmm


91177 23-Feb-2002 jake

Make use of the ranged tlb demap operations where ever possible. Use
pmap_qenter and pmap_qremove in preference to pmap_kenter/pmap_kremove.
The former maps in multiple pages at a time, and so can do a ranged
flush. Don't assume that pmap_kenter and pmap_kremove will flush the tlb,
even though they still do. It will not once the MI code is updated to use
pmap_qenter and pmap_qremove.


91176 23-Feb-2002 jake

Add needed include of ucontext.h.


91169 23-Feb-2002 jake

Use PCB_REG instead of loading the pcb from curthread. This fixes a bug
where %g6 could be inconsistent for 1 instruction.


91168 23-Feb-2002 jake

Adapt the tsb_foreach interface to take a source and a destination pmap so
that it can be used for pmap_copy. Other consumers ignore the second pmap.
Add statistics gathering for tsb_foreach.
Implement pmap_copy.


91167 23-Feb-2002 jake

Add statistic gathering for various tsb operations.

Submitted by: tmm


91166 23-Feb-2002 jake

Remove debug code.


91165 23-Feb-2002 jake

Add statistic gathering for various pmap operations.

Submitted by: tmm


91164 23-Feb-2002 jake

Remove CADDR1 and CADDR2 which are no longer used. On other architectures
these are used for copy and zeroing physical pages; we use physical addresses
directly.


91158 23-Feb-2002 jake

1. Setup the user stack pointer before returning to a user trap handler.
If we don't do this here there's a 1 instruction race where an interrupt
could come in and crash the user process due to having no stack.
2. Pass %fsr to the user trap handler in %l4. Since %fsr can only be loaded
from or stored to memory, we need to do some contortions and temporarily
save it to the alternate global stack.
3. Reload the pcb and pcpu registers for traps in kernel mode, for sanity.

Submitted by: tmm (1, 2)


91156 23-Feb-2002 jake

Add needed include of ucontext.h. Fix braino setting curpcb.


91090 22-Feb-2002 julian

Add some DIAGNOSTIC code.
While in userland, keep the thread's ucred reference in a shadow
field so that the usual place to store it is NULL.
If DIAGNOSTIC is not set, the thread ucred is kept valid until the next
kernel entry, at which time it is checked against the process cred
and possibly corrected. Produces a BIG speedup in
kernels with INVARIANTS set. (A previous commit corrected it
for the non INVARIANTS case already)

Reviewed by: dillon@freebsd.org


91066 22-Feb-2002 phk

Convert p->p_runtime and PCPU(switchtime) to bintime format.


90894 19-Feb-2002 julian

Catch up with i386 change I forgot to commit.


90625 13-Feb-2002 tmm

Calculate physmem before calling init_param2().

Submitted by: jake


90624 13-Feb-2002 tmm

Avoid crashing in early boot when WITNESS is enabled by moving the
mtx_init() for intr_table_lock after the globaldata pointer
initialization.


90622 13-Feb-2002 tmm

Add the counter-timer node to the exclusion list, as it is handled
specially. While being there, sort that list.


90621 13-Feb-2002 tmm

Use stxa_sync() when accessing the LSU control register to avoid undefined
behaviour.


90620 13-Feb-2002 tmm

Use stxa_sync() when accessing the diagnostic registers to invalidate
caches; this is needed to avoid undefined behaviour.
Clean up a bit.


90619 13-Feb-2002 tmm

Add support for the counter-timer which is included in the Sun U2S and
U2P bridges as a time counter.


90616 13-Feb-2002 tmm

Merge r1.42 of iommu.c and r1.9 of iommuvar.h from NetBSD (this adds
support for managing both streaming caches on psycho pairs).
Use explicit bus space accesses instead of mapping the device memory into
kva.
Move DVMA allocation to the map creation/dma memory allocation functions.


90615 13-Feb-2002 tmm

Clean up bus space debugging support; change sparc64_bus_mem_map() to
take a bus tag and handle as argument instead of a i/o space id and a
physical address, now that nexus handles device memory resource
allocations.


90614 13-Feb-2002 tmm

Define constants for the CPU implementation id; export the dectected id
as cpu_impl.


90361 07-Feb-2002 julian

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


90065 01-Feb-2002 bde

Compile osigreturn() unconditionally since it will always be needed on
some arches and the syscall table is machine-independent. It was
(bogusly) conditional on COMPAT_43, so this usually makes no difference.

ia64: in addition:
- replace the bogus cloned comment before osigreturn() by a correct one.
osigreturn() is just a stub fo ia64's.
- fix the formatting of cloned comment before sigreturn().
- fix the return code. use nosys() instead of returning ENOSYS to get
the same semantics as if the syscall is not in the syscall table.
Generating SIGSYS is actually correct here.
- fix style bugs.

powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.

sparc64: copy the cleaned up the ia64 stub, since there was no stub before.


89052 08-Jan-2002 jake

Catch up to the latest and greatest.


89051 08-Jan-2002 jake

Add initial smp support. This gets as far as allowing the secondary
cpu(s) into the kernel, and sync-ing them up to "kernel" mode so we can
send them ipis, which also work.

Thanks to John Baldwin for providing me with access to the hardware
that made this possible.

Parts obtained from: bsd/os


89050 08-Jan-2002 jake

Setup the normal global pcb register as well on entry from user land.
Call critical_enter/critical_exit around (fast) interrupt handlers. All
non-threaded interrupts are fast, and the threaded interrupt scheduler is
itself a fast interrupt.
Assert that an interrupt handler we are about to call is non-zero.
Be paranoid about restoring the users global registers. Do it as the
last thing before switching to alternate globals (when we magically get
our preloaded registers back), and do it with interrupts disabled. Any
kind of kernel trap when the globals are not setup properly is bad news.
Don't save and restore the kernel g6, it invariably points to the current
pcb now.


89049 08-Jan-2002 jake

Adapt the vectored interrupt handler for receiving ipis. If the second
data word in an interrupt packet is non-zero, it points to code to execute
to handle the ipi, so jump to it instead of enqueueing the packet. It
is unclear if we will need queued ipis.
Interrupt g7 now points to pcpu, instead of to the per-cpu interrupt queue
itself, so use that instead. Interrupt g6 is no longer reserved.


89048 08-Jan-2002 jake

Use the per-cpu panic stack in the case of a fault with a bad kernel
stack.


89047 08-Jan-2002 jake

Remove ATOMIC_INC_INT macro which has moved elsewhere.


89046 08-Jan-2002 jake

Catch up to change in compile time assertion interface.


89045 08-Jan-2002 jake

Make this code more robust in the event of stray interrupts. Handle
stray level interrupts as well.


89044 08-Jan-2002 jake

Use cpufunc macros instead of using inline asm directly.


89043 08-Jan-2002 jake

Set the normal global pcb register when context switching.


89038 08-Jan-2002 jake

Update comments about _start, the kernel entry point, to reflect new
parameters needed for smp support.
If we are not the boot processor, jump to the smp startup code instead.
Implement a per-cpu panic stack, which is used for bootstrapping both
primary and secondary processors and during faults on the kernel stack.
Arrange the per-cpu page like the pcb, with the struct pcpu at the end
of the page and the panic stack before it.
Use the boot processor's panic stack for calling sparc64_init.
Split the code to set preloaded global registers and to map the kernel
tsb out into functions, which non-boot processors can call.
Allocate the kstack for thread0 dynamically in pmap_bootstrap, and give
it a guard page too.


89036 08-Jan-2002 jake

Fix qsort callouts used for sorting memory regions during boot. vm_offset_t
is unsigned, so we can't use signed arithmetic.

Tripped over by: jhb


88903 05-Jan-2002 peter

Convert a bunch of 1 << PCPU_GET(cpuid) to PCPU_GET(cpumask).


88826 02-Jan-2002 tmm

1. Implement an optimization for pmap_remove() and pmap_protect(): if a
substantial fraction of the number of entries of tte's in the tsb
would need to be looked up, traverse the tsb instead. This is crucial
in some places, e.g. when swapping out a process, where a certain
pmap_remove() call would take very long time to complete without this.
2. Implement pmap_qenter_flags(), which will become used later
3. Reactivate the instruction cache flush done when mapping as executable.
This is required e.g. when executing files via NFS, but is known to
cause problems on UltraSPARC-IIe CPU's. If you have such a CPU, you
will need to comment this call out for now.

Submitted by: jake (3)


88823 02-Jan-2002 tmm

Correct the defintion of struct ofw_upa_regs, and use it instead of
struct ofw_nexus_reg. Implement UPA device memory management in the
nexus driver.
Adapt the psycho driver to these changes, and do some minor cleanup work
while being there.


88822 02-Jan-2002 tmm

Close a window of time during early boot in which an interrupt would
cause a panic.

Reported and tested (in another version) by: Jamey Wood <Jamey.Wood@Sun.COM>


88791 01-Jan-2002 jake

Correctly identify the cpu in certain ultra 1s.

Noticed by: Jamey Wood <Jamey.Wood@Sun.COM>
Submitted by: tmm


88788 01-Jan-2002 jake

Add constants needed by user trap code.


88786 01-Jan-2002 jake

Enable virtual caching for kernel pages. When we enabled virtual caching
for certain user pages, stores to kernel pages would not update the
affected cache lines, which would sometimes cause the wrong data to be
returned for loads from kernel pages. This was especially fatal when
the addresses affected held the kernel stack pointer, and a random
value was loaded into it.
Fix a harmless off by one error in a dcache_inval_phys call.


88785 01-Jan-2002 jake

Add some more info to traces.
Fix a potential race in setting up the per-cpu pointer if the special
restore fails on return to user mode fails and we need to trap back
into the kernel to fault in more stack.
Remove debug code.


88784 01-Jan-2002 jake

Ensure that the syscall trap vector is properly aligned.


88782 01-Jan-2002 jake

Implement user trap delivery as specified by the sparc abi. This provides
an efficient way for the kernel to bounce certain mundane traps back to
userland for handling there. A user trap handler returns directly to the
trapping user code, rather than going through the kernel again. Only a
handful of instructions are actually executed in kernel mode.
Implement sysarch(SPARC_UTRAP_INSTALL).
Add code to handle sharing of the user trap table across forks and unsharing
at exec.

This can be used to implement efficient tracking of floating point register
usage in userland, fe by a thread library, and to handle alignment fault
fixups and instruction emulation in userland, for which the code may need
to be different for 32bit and 64bit binaries.


88781 01-Jan-2002 jake

Add a panic stack, which is used as a known good stack when there is
something wrong with the kernel stack.
Add code to check the kernel stack pointer in various important places
and try hard not to go down in flames if its wrong.


88780 01-Jan-2002 jake

Add a soft trap for restoring the fpu registers from the pcb.


88779 01-Jan-2002 jake

Fix long lines in the trap table due to the abi specificied trap types
having overly long names.


88667 29-Dec-2001 jake

Make these compile.


88666 29-Dec-2001 jake

Remove local change that crept in.


88662 29-Dec-2001 jake

Print the correct v9 opcodes.

Submitted by: tmm


88657 29-Dec-2001 jake

Update to new constants.


88656 29-Dec-2001 jake

Make cont in ddb work.


88649 29-Dec-2001 jake

Remove support for multi level tsbs, making this code much simpler and
much less magic, fragile, broken. Use ttes rather than sttes.
We still use the replacement scheme used by the original code, which
is pretty cool.

Many crucial bug fixes from: tmm


88648 29-Dec-2001 jake

Implement pv entries as separate structures from the ttes, as other
architectures do.

Many crucial bug fixes from: tmm


88647 29-Dec-2001 jake

Great pmap rewrite to use a much simpler one level tsb of ttes, instead
of sttes. Also removes many differences between this and the other pmaps.
Reserve the kva space used by the openfirmware translations.
Use physical addresses directly in pmap_zero_page and pmap_copy_page, now
that we have the cache line shooting support.
Add code to track the virtual cachability of mapped pages. The dmmu
requires that multiple mappings of the same phsyical address have the
save virtual address bits up to a colour boundary. Violating this
requires all mappings to be mapped uncacheable. We do not yet handle
the case of a badly aliased mapping becoming cachable again.

Many crucial bug fixes from: tmm


88645 29-Dec-2001 jake

Use fprs to track floating register usage. Clear it once we've saved
the registers so we don't uselessly save them over and over again for
each context switch until another floating point instruction is executed.
Use a non-specific tlb slot for the tsb, which needs to have a locked
entry.
Remove overly verbose traces.


88644 29-Dec-2001 jake

Add .register directives for gcc3.
Add macros to atomically increment an integer variable in the data
section and to atomically set a bit in a tte. Note that the latter
does not return the new value.
Rewrite RESUME_SPILLFILL_MAGIC to use more sensical calculations, and
to preserve all alternate globals religiously. Must now be called on
alternate globals.
Defer switching to the kernel stack until inside the syscall, trap,
interrupt wrappers. Splitting the windows is all that's really urgent.
Adapt to new trap types.
Add %xcc where appropriate in order to not use v8 opcodes inadvertantly
(which work fine).
Modify the low level tlb fault handlers to operate on a tsb made up of
ttes, not sttes. This effectively makes the tsb twice as large.
After atomically updating tte bits in memory, also set the bit in the
register that holds the data which will be loaded into the tlb. The
macro returns the old value.
Use the preloaded mmu global which holds the address of the current
user tsb.
Add back a low level protection fault handler instead of just punting
into the vm system. This effectively saves a soft fault per COW fault.
Add a trace to intr_enqueue.
Pass arguments to the trap, interrupt, syscall wrappers in the out
registers instead of some on the stack, some in registers.
Use the preloaded alternate global pcb register.


88643 29-Dec-2001 jake

Add needed include of fsr.h.
Use fprs to track floating point usage.
Remove misguided comment.
Clear fprs in a child process's new trapframe.


88642 29-Dec-2001 jake

1. Adapt to new trap types.
2. Make trap_pfault more like it is on other architectures.
3. Fix a bug in syscall() which caused system calls with more than
six arguments that are called through the wild card syscall to
have their arguments scrambled. This affected mmap due to the
(bogus) wrapper in libc.

Submitted by: tmm (3)


88641 29-Dec-2001 jake

Add .register directives for gcc3.
Add some traces that can be useful but are also very loud.
Use defines for offsets into jmpbuf instead of magic numbers.
Fix a style bug.
Fixup comments.


88640 29-Dec-2001 jake

Assert at compile time that structures which need to be a power of
2 really are.
Move a trace to before flushw in case it goes off the deep end.


88639 29-Dec-2001 jake

Remove a debug printf.
Setup new dedicated global register (alternate, and mmu).
Make setregs readable again.
Adapt to moving of many things into trapframe.


88638 29-Dec-2001 jake

intr_handlers is an array of function pointers, not small structures.
Assert at compile time that structures which need to be a power of 2
in size really are.


88637 29-Dec-2001 jake

Add .register directive for gcc3.
Enforce in hardware the non-use of floating point in the kernel.


88636 29-Dec-2001 jake

1. Make this more name space friendly (* -> emul_*).
2. Don't whine about unaligned fixups by default.
3. Adapt to removal of mmuframe.

Submitted by: tmm (1, 2)


88635 29-Dec-2001 jake

Be paranoid about the sizeof passed to db_get_value, not all trapframe
fields are u_long.
Print useful information about traps in backtraces.
GC some dead code.


88634 29-Dec-2001 jake

Implement dcache_inval_phys, which shoots the cache lines that correspond
to a specific physical address. This is used for page copy and zero
routines which use physical addresses directly.

Submitted by: tmm


88437 23-Dec-2001 jake

Newer versions of gcc have a bug where switch statements with only
a default: label cause a segmentation fault. So just return EINVAL
from sysarch.


88436 23-Dec-2001 jake

- Add a file for machine dependant loader metdata types. Include this in
machdep.c.
- Adapt to critical_* changes.


88435 23-Dec-2001 jake

Define our own version of abs now that we compile with -ffreestanding by
default.


88377 21-Dec-2001 tmm

Use the new rman_reserve_resource_bound() function to get boundaries
for DVMA allocations right, instead of trying to kluge around it.
Use the correct tag to pass the dmamap unload call up to. Some minor
cleanups.


88370 21-Dec-2001 tmm

Fix a bug that was indroduced while moving this code around (use the
correct length for ethernet addresses).


88368 21-Dec-2001 tmm

Add partial support for NFS_ROOT for sparc64 (only supported in in
connection with BOOTP_NFSROOT right now).


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87546 09-Dec-2001 dillon

Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after: 1 week


86530 18-Nov-2001 jake

1. Split fp.h into fp.h and fsr.h so that the latter can be included
in asm files.
2. Temporarily cause subnormal operands in floating point operations
to be treated as zeros so that comlpetion of the operation does not
need to be emulated.
3. Catch fp_exception_other and correctly skip over the unfinished
instruction, but basically ignore them. Emulating the instruction
is not yet supported.
4. Zero td_retval[1] as well in syscall().

Submitted by: tmm (2, 3)


86528 18-Nov-2001 jake

Avoid missing ticks and hardclock stopping.

Submitted by: tmm


86527 18-Nov-2001 jake

Catch up to new constants. (These commit messages should have a song.)


86525 18-Nov-2001 jake

1. Remove kdbframe. Bad idea.
2. Add a TF_DONE macro, which fiddles a trapframe to make the retry on
return from traps act like a done (advance past the trapping
instruction instead of re-executing).
3. Flush the windows before entering the debugger, since it is no
longer done in the breakpoint trap vector.
4. Print a warning if trace <pid> is attempted, it is not yet implemented.
5. Print traps better and decode system calls in traces.

Submitted by: rwatson (4)


86523 18-Nov-2001 jake

1. Convert the tstate saved in the pcb to a pstate and test for PSTATE_PEF
to determine if a process is using floating point. in order to avoid
sign extending a 13 bit immediate.
2. We don't need to context switch cwp anymore, it is better to just
fiddle the save tstate on return from traps. See exception.s 1.10
and 1.12.
3. Completely remove pcb_cwp.
4. Implement vmapbuf, vunmapbuf and vm_fault_quick. Completely remove
TODOs from vm_machdep.c (yay!).

Submitted by: tmm (1, 3, 4)
Obtained from: existing archs (4)


86522 18-Nov-2001 jake

Implement hw.machine and hw.model sysctls.

Submitted by: tmm


86521 18-Nov-2001 jake

1. Remove bootinfo and just pass loader metadata to the kernel.
2. Remove mcontext.mc_sp, it is redundant. Adjust spare space to make
ucontext_t a nice size.
3. Raise pil in the debugger.

Submitted by: tmm (3)


86520 18-Nov-2001 jake

1. Implement ascopyto() and ascopyfrom() for copying to an alternate address
space from kernel space and from an alternate address space to kernel
space.
2. Remove the unused and unprototyped physcopy() and physzero() and replace
with the more versatile ascopy() and aszero(), inspired by the above.
These can be used to copy and zero physical pages of memory without mapping
them into kernel space first.
3. Use magic numbers for the offsets in the jmpbuf structure like other
platforms.
4. Use SET.

Submitted by: tmm (1, 4)


86519 18-Nov-2001 jake

1. Fix a bug where the offsets of the alignment and mmu fault recorvery code
in the window trap vectors were mixed up. All this did is cause unnecesary
traps and look wierd in traces. Superfluous traps happen a lot in normal
operation, so we are rather good at recovering from them.
2. Store the arguments for a ktr trace in the right place.
3. Use a generic trap vector for breakpoints. It should not be special.
4. Save the frame pointer in the trap frame for kernel traps if DDB is compiled
in, otherwsie we don't save the out registers for kernel traps and stack
traces can't go through nested traps.
5. Apply the same fix to the return from kernel mode trap code as for user
mode traps. Ensure that the window we're returning to is the same one
that we restore to by fiddling the cwp in the saved tstate. This requires
that we transfer the values loaded from the trap frame into alternate
globals before restore-ing, but doing so is not very expensive and not
worth worrying about. Not changing the saved cwp can result in the register
values magically changing on return from traps if we happen to have slept
and the windows don't work out exactly the same. Fix the trace just before
the retry to account for different register usage.
6. Use a SET macro for loading address constants rather than a variation of
set and setx. set only works for 32 bit constants, while setx works for
64 bit constants as well, but produces bloated code when unnecessary.
Gas always generates the canonical 2 register, 6 instruction form, even
when it could be optimized; set uses 1 register and 2 instructions. At
the moment we assume that the kernel binary is below 4GB so set is
always sufficient, but the macro allows it to be configured. Note that
this has nothing to do with 32 vs. 64 bit address space, it only applies
to addresses of symbols which are known at compile/link time.

Submitted by: tmm (6)


86234 09-Nov-2001 tmm

Add a file forgotten in the previous commit (a kobj interface that
defines methods that need to be implemented by sparc64 host bridge drivers).


86230 09-Nov-2001 tmm

Support for the UltraSpac DVMA MMU (IOMMU), ported from NetBSD.


86229 09-Nov-2001 tmm

Add some OpenFirmware bus support code and definitions.


86228 09-Nov-2001 tmm

Add bus_space and busdma support for sparc64.


86227 09-Nov-2001 tmm

Add a nexus device for sparc64, which uses the OpenFirmware to attach UPA
devices (mostly host bridges) and handles interrupt allocation and setup.


86221 09-Nov-2001 tmm

Add cache handling code for sparc64.


86147 06-Nov-2001 tmm

Add a special OpenFirmware entry point for terminating the kernel (in
this case, the firmware trap table needs to be restored). Make use of
it in cpu_halt() and cpu_reset(), and make cpu_reset() reboot the kernel
that was used previously insead of behaving like cpu_halt().
Add a shutdown_final event handler that turns the power off if requested.


86146 06-Nov-2001 tmm

Add code to emulate unimplemented (non-fp) instructions and to fixup
unaligned accesses, and instr.h, which contrains definitions for the
sparc64 instruction set (partly from NetBSD).
Make use of some definitions from instr.h in db_disasm.c.


86144 06-Nov-2001 tmm

Add optimized implementations of in_cksum_skip() and related functions
for sparc64.


86143 06-Nov-2001 tmm

Fix the intial setup of the stray interrupt handler (it takes a struct
*intr_vec as argument now, not the vector number).


85586 27-Oct-2001 jake

Implement elf_reloc. This makes klds work.

Obtained from: netbsd


85585 27-Oct-2001 jake

Handle instruction access mmu miss faults in kernel mode. These can only
be generated by non-preloaded klds.


85525 26-Oct-2001 jhb

Add a per-thread ucred reference for syscalls and synchronous traps from
userland. The per thread ucred reference is immutable and thus needs no
locks to be read. However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on: x86 (SMP), alpha, sparc64


85420 24-Oct-2001 jlemon

Remove call to cninit_finish().


85297 21-Oct-2001 des

Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85294 21-Oct-2001 des

[partially forced commit due to pilot error in earlier commit attempt]

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85262 20-Oct-2001 jake

Add missing include.


85258 20-Oct-2001 jake

Add missing includes.


85257 20-Oct-2001 jake

Remove interrupt queue array. Its in globaldata now.


85247 20-Oct-2001 jake

Use KTR_PMAP instead of KTR_CT1.


85246 20-Oct-2001 jake

Catch up to changing entry point names so traces through traps
mostly work right. This catches recursive traps too early, but
generally such traps are fatal and we won't get this far anyway.


85244 20-Oct-2001 jake

Catch up to new assembly language code.


85243 20-Oct-2001 jake

Fix a bug in the kernel entry window handling where the wrong register
was used. This resulted in bogus bad window traps (invalid wstate).

Add a trace to sfsr traps (alignment among other things).

Use KTR_TRAP instead of KTR_CT1.

Use the right registers when storing the values of various
mmu registers into the trap frame. This fixes a bug where sometimes
the context number reported by a fault would be garbage. Sometimes
it would be zero for faults on user address space so the kernel would
wrongly think that it was a fault on kernel address space and fail.

Use the preloaded registers in the vectored interrupt trap instead
of reading pointers from memory. Remove traces due to register
pressure and excess verbosity. We can probably still sneak in one
trace. Remove some debug code.

Go back to using the tsb register during kernel page table lookups.
This is the best way to not have to have the address of the kernel tsb be
a compile time constant. We lie and say we have 1 page tsb when really
its much larger. This way the hardware provides bits 13-22 of the
virtual address (the lower 9 bits of the virtual page number) in the
form of the address of the tte corresponding to the fault address in
the (1 page) kernel tsb. With some clever arithmetic we can then get
bits 22 and up from the tte tag and add them to the tte address in
order to index massive tsbs (basically unlimited).

Add traps for physical address hardware watchpoints.

Don't try to pass the window state from the trap table entry point
all the way down to the common trap code. Its too easy to clobber
and reading it again doesn't cost much.

Fixup some traces.

Fiddle the cwp bits on return from the kernel to user mode so that
the window we are returning to is always the same as the one we
restore to in the trap code. Strictly speaking this is not necessary,
it only affects return from fork and exec, but setting up the windows
right would require hard coding the right cwp values in cpu_fork and
setregs, basically hard coding the number of frames between syscall and
tl0_ret. The result of getting it wrong is usually a spill to an invalid
stack pointer; either 0 or pointing into kernel space. This should also
alleviate the need to context switch the cwp.

Transfer the trap state from locals to alternate globals in the trap
return code so that we can do a restore and rotate the windows before
reloading the trap registers. If the restore fails we'll trap back
into the kernel, so there's no point in loading the trap registers
before hand. Its is crucial that the window trap recovery code not
clobber the alternate globals.


85242 20-Oct-2001 jake

Align the symbol that demarks the end of the signal code on a 16 byte
boundary. It must be on at least an 8 byte boundary so that the length
of the signal code is a multiple of 8 (well aligned). The size is used
in the calculation of the address of the argument and environment vectors
on the user stack; getting it wrong results in the string pointers being
misaligned and causes alignment faults in getenv() among other things.

Allocate a regular stack frame below the signal frame on the user stack
and join up the frame pointer to the previous frame. This fixes longjmp-ing
out of signal handlers. Longjmp traverses the stack upwards in order to
find the right frame to return to, so the frame pointers must join up
seamlessly. I thought this would just work, but obviously the frame
needs to be below the signal frame, not above it like before. Account
for the extra space in the signal code.

Preload pointers to interrupt data structures in interrupt globals.
This avoids the need to load the pointers from memory in the vectored
interrupt trap handler.

Transfer the first 2 out registers into td_retval in setregs. We use
the same registers for system call arguments as return values, so these
registers got clobbered by the system call return values on return from
execve. They now get clobbered by the right values. We must put the values
in both the out registers in the trapframe and in td_retval because init
calls exec but fails to transfer the return value into the out registers.
This fixes a bug where the first exec after init would pass junk to the
c runtime, instead of a pointer to the argument strings. A better solution
would be to return EJUSTRETURN on success from execve.

Adjust for change in pmap_bootstraps prototype.

Map the message buffer after the trap table is setup. We will fault
on it immediately.


85241 20-Oct-2001 jake

Parameterize the size of the kernel virtual address space on KVA_PAGES.
Don't use a hard coded address constant for the virtual address of the
kernel tsb. Allocate kernel virtual address space for the kernel tsb
at runtime.
Remove unused parameter to pmap_bootstrap.
Adapt pmap.c to use KVA_PAGES.
Map the message buffer too.
Add some traces.
Implement pmap_protect.


85240 20-Oct-2001 jake

Remove hardcoded cwp value.


85239 20-Oct-2001 jake

Use KTR_PROC instead of KTR_CT1 in traces.


85238 20-Oct-2001 jake

Return zero on success from su*. Apparently no one checks the return
values.
Add traces to fubyte, subyte, etc. These are useful for catching errors.
due to alignment since its usually not checked for by the caller.


85236 20-Oct-2001 jake

Add support for physical address hardware watchpoints.


85235 20-Oct-2001 jake

Change the stray count in struct intr_vector to a vector number that can
be used to index tables of counters.
Remove intr_dispatch() inline, it is implemented directly in tl*_intr now.
Count stray interrupts in a table of counters like intrcnt.
Disable interrupts briefly when setting up the interrupt vector table.
We must disable interrupts completely, not just raise the pil.
Pass pointers to the intr_vector structures rather than a vector number
to sched_ithd and intr_stray.


84849 12-Oct-2001 tmm

Add inthand_add() and inthand_remove() for use by the MD bus code and
some glue code.


84848 12-Oct-2001 tmm

Fix some warnings.


84847 12-Oct-2001 tmm

Save the floating point context to the right pcb in cpu_fork(), and add
an empty stub for is_physical_memory().


84845 12-Oct-2001 tmm

Implement DELAY() using the %tick register.


84844 12-Oct-2001 tmm

Add pmap_kenter_flags(), which is used by MD bus code that will be
committed soon, add a stub form pmap_kenter_temporary(), and implement
pmap_extract() and pmap_kextract().


84637 07-Oct-2001 des

Dissociate ptrace from procfs.

Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested. Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".


84193 30-Sep-2001 jake

Optimize bcopy and bzero etc to use 64 bit loads and stores if possible.
Handle overlap in bcopy.
Add routines for copying and zeroing pages using physical addresses
directly.
Remove all the hacks to account for calling the firmware on its own
trap table, we use the kernel trap table. There is still a problem
with OF_exit().


84192 30-Sep-2001 jake

Use %ver to identify the cpu instead of openfirmware.

Submitted by: robert


84191 30-Sep-2001 jake

Remove some debug code, add traces.


84190 30-Sep-2001 jake

Return EIO for procfs_*_dbregs.


84186 30-Sep-2001 jake

Split the low level trap code into trap, interrupt and syscall, its
easier and hopefully this code is done changing radically.

Don't use the mmu tlb register to address the kernel page table, nor
the 8k pointer register. The hardware will do some of the page table
lookup by storing the the base address in an internal register and
calculating the address of the tte in the table. However it is limited
to a 1 meg tsb, which only maps 512 megs. The kernel page table only
has one level, so its easy to just do it by hand, which has the advantage
of supporting abitrary amounts of kvm and only costs a few more instructions.

Increase kvm to 1 gig now that its easy to do so and so we don't waste
most of a 4 meg page.

Fix some traces. Fix more proc locking.

Call tsb_stte_promote if we get a soft fault on a mapping in the upper
levels of the tsb. If there is an invalid or unreferenced mapping
in the primary tsb, it will be replaced.

Immediately fail for faults occuring in {f,s}uswintr.


84185 30-Sep-2001 jake

Implement sysarch().


84184 30-Sep-2001 jake

Fix some traces. td->p_comm doesn't exist.


84183 30-Sep-2001 jake

Move the kernel to end of the first 4 gigabytes of address space, so that
one 4 meg page can map both the kernel and the openfirmware mappings.
Add the openfirmware mappings to the kernel tsb so we can call the firmware
on the kernel trap table and access kernel memory normally.
Implement pmap_swapout_proc, pmap_swapin_proc, pmap_swapout_thread,
pmap_swapin_thread, pmap_activate, pmap_page_exists, and pmap_phys_address.


84181 30-Sep-2001 jake

Include <machine/setjmp.h> instead of <setjmp.h>.


84178 30-Sep-2001 jake

Move the pcb the to the top of the kernel stack.
Add a guard page at the bottom of the kernel stack. Its unclear how easy
it will be to detect these faults and do something useful.
Setup the registers on exec how the c runtime expects.
Implement various {fill,set}_*regs.
Fix proc locking.


83756 21-Sep-2001 jake

Add kernbase symbol and use it instead of magic numbers in the
linker script.


83442 14-Sep-2001 peter

Set thread0->td_pcb, this is probably why jake was getting a null deref.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83276 10-Sep-2001 peter

Rip some well duplicated code out of cpu_wait() and cpu_exit() and move
it to the MI area. KSE touched cpu_wait() which had the same change
replicated five ways for each platform. Now it can just do it once.
The only MD parts seemed to be dealing with fpu state cleanup and things
like vm86 cleanup on x86. The rest was identical.

XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional
stub in place.

Reviewed by: jake, tmm, dillon


83088 05-Sep-2001 obrien

style(9) the structure definitions.


82939 04-Sep-2001 peter

Zap #if 0'ed map init code that got moved to the MI area.
Convert the powerpc tree to use the common code.


82913 04-Sep-2001 jake

Make this compile.


82910 03-Sep-2001 jake

Remove some stale definitions and update for new assembler code.


82909 03-Sep-2001 jake

Add ktr traces to copy{in,out} and cpu_switch.

Context switch the cwp value. The register usage in cpu_switch will
be updated shortly to better reflect the fact that the current window
may change.


82908 03-Sep-2001 jake

Add comments following what other architectures have.

Fiddle the register values in the trapframe so children returning from
fork() return 0 (and success).


82907 03-Sep-2001 jake

Change tf_arg to uintptr_t from void * to reflect the fact that
non-pointer values may be passed in it. Add appropriate casts.

The interrupt type is now passed in tf_arg instead tf_type.


82906 03-Sep-2001 jake

Implement a slightly different window spill/fill algorithm for dealing
with user windows in kernel mode. We split the windows using %otherwin,
but instead of spilling user window directly to the pcb, we attempt to
spill to user space. If this fails because a stack page is not resident
(or the stack is smashed), the fault handler at tl 2 will detect the
situation and resume at tl 1 again where recovery code can spill to the
pcb. Any windows that have been saved to the pcb will be copied out to
the user stack on return from kernel mode.

Add a first stab at 32 bit window handling. This uses much of the same
recovery code as above because the alignment of the stack pointer is used
to detect 32 bit code. Attempting to spill a 32 bit window to a 64 bit
stack, or vice versa, will cause an alignment fault. The recovery code
then changes the window state to vector to a 32 bit spill/fill handler
and retries the faulting instruction.

Add ktr traces in useful places during trap processing.

Adjust comments to reflect new code and add many more.


82903 03-Sep-2001 jake

Implement pv_bit_count which is used by pmap_ts_referenced.

Remove the modified tte bit and add a softwrite bit. Mappings are only
writeable if they have been written to, thus in general modify just
duplicates the write bit. The softwrite bit makes it easier to distinguish
mappings which should be writeable but are not yet modified.

Move the exec bit down one, it was being sign extended when used as an
immediate operand.

Use the lock bit to mean tsb page and remove the tsb bit. These are the
only form of locked (tsb) entries we support and we need to conserve bits
where possible.

Implement pmap_copy_page and pmap_is_modified and friends.

Detect mappings that are being being upgraded from read-only to read-write
due to copy-on-write and update the write bit appropriately.

Make trap_mmu_fault do the right thing for protection faults, which is
necessary to implement copy on write correctly. Also handle a bunch
more userland trap types and add ktr traces.


82902 03-Sep-2001 jake

Implement signals.


82633 31-Aug-2001 peter

Converge with i386/alpha/etc pmap.c for pmap_new_proc/pmap_dispose_proc().


82585 30-Aug-2001 dillon

Remove the MPSAFE keyword from the parser for syscalls.master.
Instead introduce the [M] prefix to existing keywords. e.g.
MSTD is the MP SAFE version of STD. This is prepatory for a
massive Giant lock pushdown. The old MPSAFE keyword made
syscalls.master too messy.

Begin comments MP-Safe procedures with the comment:
/*
* MPSAFE
*/
This comments means that the procedure may be called without
Giant held (The procedure itself may still need to obtain
Giant temporarily to do its thing).

sv_prepsyscall() is now MP SAFE and assumed to be MP SAFE
sv_transtrap() is now MP SAFE and assumed to be MP SAFE

ktrsyscall() and ktrsysret() are now MP SAFE (Giant Pushdown)
trapsignal() is now MP SAFE (Giant Pushdown)

Places which used to do the if (mtx_owned(&Giant)) mtx_unlock(&Giant)
test in syscall[2]() in */*/trap.c now do not. Instead they
explicitly unlock Giant if they previously obtained it, and then
assert that it is no longer held to catch broken system calls.

Rebuild syscall tables.


82016 21-Aug-2001 jake

Use register g6 to point to a small stack for svaing alternate globals
during trap handlers.
Implement ptrace_set_pc FWIW.
Initialize the pcb window scratch area in setregs(), and setup user
registers as specified by the SCD.

Submitted by: tmm


82014 21-Aug-2001 jake

Handle the pcb window scratch area in cpu_fork.
Implement cpu_exit.

Submitted by: tmm


82013 21-Aug-2001 jake

Save and restore %fprs and %y, which are unused by kernel code, but
may be used by 32bit userland code.
Implement cpu_throw().

Submitted by: tmm


82012 21-Aug-2001 jake

Disable interrupts when calling openfirmware.


82011 20-Aug-2001 jake

Rename fp_init_pcb to fp_init_proc. Set the FEF bit in fprs register;
according the SCD it should be set if no user trap handler in set.

Submitted by: tmm


82010 20-Aug-2001 jake

Add definitions for new assembler code.


82009 20-Aug-2001 jake

Catch up with new trap entry point names.


82007 20-Aug-2001 jake

Add code for supporting hardware watch points.

Submitted by: tmm


82006 20-Aug-2001 jake

Add a system call trap type and syscall() call request handler.
Also add support for hardware watch point traps.

Submitted by: tmm


82005 20-Aug-2001 jake

Add support for splitting the register windows on entry to the
kernel from usermode. The remaining user windows are spilled
to the pcb as necessary. The user land window fault handlers
fill directly from the pcb on return.
Add system call entry points.

Submitted by: tmm


82000 20-Aug-2001 obrien

Sync globals.h up with the other platforms. There is still some cruft in
here, but now all the platforms have the same cruft. Consistantly spell
the `struct globaldata *' "globalp".

Reviewed by: peter


81899 18-Aug-2001 jake

Don't needlessly duplicate what's basically the same copyright.


81896 18-Aug-2001 jake

Implement cpu_wait().


81895 18-Aug-2001 jake

Increase the size of the phys_avail memory map. Implement pmap_dispose_proc.
Turn some more potentially import functions into nops so we can do stuff
until they matter.


81614 14-Aug-2001 jake

Add some definitions that got left out, *blush*.


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81392 10-Aug-2001 jake

Correct copyright language.


81391 10-Aug-2001 jake

Add code to program the tick register and to setup its interrupt handler.


81390 10-Aug-2001 jake

Add early code to support interrupts.


81389 10-Aug-2001 jake

Fake up the frame pointers on a process's initial stack so they can be
restored correctly from the trapframe.

Submitted by: tmm


81388 10-Aug-2001 jake

Handle all types of mmu misses from user mode.
Pass a context argument to tlb functions.


81387 10-Aug-2001 jake

Use the macro for getting the trap type from the trapframe.
Only set sticks (and acquire sched_lock) on entry from user mode.
Add handlers for all kinds of mmu misses, and for interrupts from
user mode.
Acquire Giant before calling into the vm system so this runs with
invariants.
Try to get the restrictions for page faults on user memory from
kernel mode right.
Only set pcb_onfault and return to the alternate return code if
this is actually a fault on user memory from kernel mode.


81386 10-Aug-2001 jake

Store 8 bytes instead of 4 in suword. Use a temporary stack that's known
to be locked in the tlb for calling openfirmware.

Submitted by: tmm


81384 10-Aug-2001 jake

Pass a context to tlb_store_slot, use a member(Sync) after setting the
secondary context register.


81383 10-Aug-2001 jake

1. Start the clock running early for testing.
2. Use the upcoming "tick" interface.
3. Save a call frame as well as a trap frame on proc0's initial stack.
4. Setup a pointer to the per-cpu interrupt queue.
5. Install the per-cpu pointer in interrupt and alternate globals as well.
6. Flush out setregs so exec works.

Submitted by: tmm (3, 5, 6)


81382 10-Aug-2001 jake

Set the pil to something sane on startup.


81381 10-Aug-2001 jake

Add definitions needed by new assembler code.


81380 10-Aug-2001 jake

1. Add code to handle traps and interrupts from user mode.
2. Add spill and fill handlers for spills to the user stack on entry
to the kernel.
3. Add code to handle instruction mmu misses from user mode.
4. Add code to handle level interrupts from kernel mode and vectored
interrupt traps from either.
5. Save the pil in the trapframe on entry from kernel mode and restore
it on return.

Submitted by: tmm (1, 2)


81379 10-Aug-2001 jake

Add code to handle stack traces that go all the way back to userland.
Use a better algorithm for finding out if an address is in the kernel.

Submitted by: tmm


81337 09-Aug-2001 obrien

The author isn't a [UC] Regents. Correct the copyright language.


81336 09-Aug-2001 obrien

Fix VCS ID spamage.


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


81186 06-Aug-2001 jake

Handle dmmu protection faults as well as misses. Enable tracking of
the modify and reference tte bits. Implementing allocating of tsb pages.
Make tsb_stte_lookup do the right thing with the kernel pmap.


81185 06-Aug-2001 jake

Add page fault and high level tsb miss handlers.


81184 06-Aug-2001 jake

Handle switching switching mmu contexts and mapping the new primary tsb.
Rework some register usage and code placement. Comment.


81183 06-Aug-2001 jake

Save the primary mmu context around calls to the prom, and install
nucleus context. The prom runs at trap level 0, so there's no
implicit nucleus context and we have to force it.


81182 06-Aug-2001 jake

Remove some debug code.


81181 06-Aug-2001 jake

Handle managed and unmanaged mapping better. Allocate an vm object for
the tsb pages.


81180 06-Aug-2001 jake

Add trap handlers for dmmu faults from user mode, and for faults from
accessing user address space in kernel mode.


81147 05-Aug-2001 tmm

Sigh. Add two files needed for the sparc64 fp contect switching code
that were forgotten in the last commit.

Pointy hat to: tmm


81135 04-Aug-2001 tmm

Add floating point context switching code for sparc64.

Reviewed by: jake


81087 03-Aug-2001 jake

Move some code related to managing pv entries from the pmap module to
the pv module. It works now that vtophys for sttes works.


81085 03-Aug-2001 jake

Define proc0paddr. Call init_param() as early as possible.


80709 31-Jul-2001 jake

Flesh out the sparc64 port considerably. This contains:
- mostly complete kernel pmap support, and tested but currently turned
off userland pmap support
- low level assembly language trap, context switching and support code
- fully implemented atomic.h and supporting cpufunc.h
- some support for kernel debugging with ddb
- various header tweaks and filling out of machine dependent structures


80708 31-Jul-2001 jake

Add skeleton machine dependent headers and c files for a port of freebsd
to a new architecture. This is the base of the sparc64 port, but contains
limited machine dependent code, and can be used a base for ports. Included
are:
- standard machine dependent headers, tweaked for a 64 bit, big endian
architecture, including empty versions of all the machine dependent
structures
- a machine independent atomic.h, which can be used until a port has
support for interrupts and the operations really need to be atomic
- stub versions of all the machine dependent functions, which panic
when called and print out the name of the function that needs to
be implemented. functions which are normally in assembly files are
not included, but this should reduce the number of different undefined
references on the first few compiles from hundreds to 5 or 6
Given minimal startup code and console support it should be trivial to
make this compile and run the first few sysinits on almost any architecture.

Requested by: alfred, imp, jhb