History log of /freebsd-10-stable/sys/ia64/include/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
310508 24-Dec-2016 avg

define Maxmem for ia64, the only platform that didn't have it

This is a direct commit to stable/10 as the platform was removed
in the newer branches.

Maxmem is required for compiling fwohci(4) on ia64 since commit
r310081, MFC of r277511.
It was easier to add Maxmem than to make a special case for ia64
in fwohci.

Reported by: jhb, gjb
Discussed with: kib, jhb

292348 16-Dec-2015 ken

MFC r291716, r291724, r291741, r291742

In addition to those revisions, add this change to a file that is not in
head:

sys/ia64/include/bus.h:
Guard kernel-only parts of the ia64 machine/bus.h header with
#ifdef _KERNEL.

This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.

------------------------------------------------------------------------
r291716 | ken | 2015-12-03 15:54:55 -0500 (Thu, 03 Dec 2015) | 257 lines

Add asynchronous command support to the pass(4) driver, and the new
camdd(8) utility.

CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and
completed CCBs may be retrieved via the CAMIOGET ioctl. User
processes can use poll(2) or kevent(2) to get notification when
I/O has completed.

While the existing CAMIOCOMMAND blocking ioctl interface only
supports user virtual data pointers in a CCB (generally only
one per CCB), the new CAMIOQUEUE ioctl supports user virtual and
physical address pointers, as well as user virtual and physical
scatter/gather lists. This allows user applications to have more
flexibility in their data handling operations.

Kernel memory for data transferred via the queued interface is
allocated from the zone allocator in MAXPHYS sized chunks, and user
data is copied in and out. This is likely faster than the
vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in
configurations with many processors (there are more TLB shootdowns
caused by the mapping/unmapping operation) but may not be as fast
as running with unmapped I/O.

The new memory handling model for user requests also allows
applications to send CCBs with request sizes that are larger than
MAXPHYS. The pass(4) driver now limits queued requests to the I/O
size listed by the SIM driver in the maxio field in the Path
Inquiry (XPT_PATH_INQ) CCB.

There are some things things would be good to add:

1. Come up with a way to do unmapped I/O on multiple buffers.
Currently the unmapped I/O interface operates on a struct bio,
which includes only one address and length. It would be nice
to be able to send an unmapped scatter/gather list down to
busdma. This would allow eliminating the copy we currently do
for data.

2. Add an ioctl to list currently outstanding CCBs in the various
queues.

3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do
that.

4. Test physical address support. Virtual pointers and scatter
gather lists have been tested, but I have not yet tested
physical addresses or scatter/gather lists.

5. Investigate multiple queue support. At the moment there is one
queue of commands per pass(4) device. If multiple processes
open the device, they will submit I/O into the same queue and
get events for the same completions. This is probably the right
model for most applications, but it is something that could be
changed later on.

Also, add a new utility, camdd(8) that uses the asynchronous pass(4)
driver interface.

This utility is intended to be a basic data transfer/copy utility,
a simple benchmark utility, and an example of how to use the
asynchronous pass(4) interface.

It can copy data to and from pass(4) devices using any target queue
depth, starting offset and blocksize for the input and ouptut devices.
It currently only supports SCSI devices, but could be easily extended
to support ATA devices.

It can also copy data to and from regular files, block devices, tape
devices, pipes, stdin, and stdout. It does not support queueing
multiple commands to any of those targets, since it uses the standard
read(2)/write(2)/writev(2)/readv(2) system calls.

The I/O is done by two threads, one for the reader and one for the
writer. The reader thread sends completed read requests to the
writer thread in strictly sequential order, even if they complete
out of order. That could be modified later on for random I/O patterns
or slightly out of order I/O.

camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from
the pass(4) driver and also to send request notifications internally.

For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR)
per CAM CCB on the reading side, and a scatter/gather list
(CAM_DATA_SG) on the writing side. In addition to testing both
interfaces, this makes any potential reblocking of I/O easier. No
data is copied between the reader and the writer, but rather the
reader's buffers are split into multiple I/O requests or combined
into a single I/O request depending on the input and output blocksize.

For the file I/O path, camdd(8) also uses a single buffer (read(2),
write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list
(readv(2), writev(2), preadv(2), pwritev(2)) on writes.

Things that would be nice to do for camdd(8) eventually:

1. Add support for I/O pattern generation. Patterns like all
zeros, all ones, LBA-based patterns, random patterns, etc. Right
Now you can always use /dev/zero, /dev/random, etc.

2. Add support for a "sink" mode, so we do only reads with no
writes. Right now, you can use /dev/null.

3. Add support for automatic queue depth probing, so that we can
figure out the right queue depth on the input and output side
for maximum throughput. At the moment it defaults to 6.

4. Add support for SATA device passthrough I/O.

5. Add support for random LBAs and/or lengths on the input and
output sides.

6. Track average per-I/O latency and busy time. The busy time
and latency could also feed in to the automatic queue depth
determination.

sys/cam/scsi/scsi_pass.h:
Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue
and fetch asynchronous CAM CCBs respectively.

Although these ioctls do not have a declared argument, they
both take a union ccb pointer. If we declare a size here,
the ioctl code in sys/kern/sys_generic.c will malloc and free
a buffer for either the CCB or the CCB pointer (depending on
how it is declared). Since we have to keep a copy of the
CCB (which is fairly large) anyway, having the ioctl malloc
and free a CCB for each call is wasteful.

sys/cam/scsi/scsi_pass.c:
Add asynchronous CCB support.

Add two new ioctls, CAMIOQUEUE and CAMIOGET.

CAMIOQUEUE adds a CCB to the incoming queue. The CCB is
executed immediately (and moved to the active queue) if it
is an immediate CCB, but otherwise it will be executed
in passstart() when a CCB is available from the transport layer.

When CCBs are completed (because they are immediate or
passdone() if they are queued), they are put on the done
queue.

If we get the final close on the device before all pending
I/O is complete, all active I/O is moved to the abandoned
queue and we increment the peripheral reference count so
that the peripheral driver instance doesn't go away before
all pending I/O is done.

The new passcreatezone() function is called on the first
call to the CAMIOQUEUE ioctl on a given device to allocate
the UMA zones for I/O requests and S/G list buffers. This
may be good to move off to a taskqueue at some point.
The new passmemsetup() function allocates memory and
scatter/gather lists to hold the user's data, and copies
in any data that needs to be written. For virtual pointers
(CAM_DATA_VADDR), the kernel buffer is malloced from the
new pass(4) driver malloc bucket. For virtual
scatter/gather lists (CAM_DATA_SG), buffers are allocated
from a new per-pass(9) UMA zone in MAXPHYS-sized chunks.
Physical pointers are passed in unchanged. We have support
for up to 16 scatter/gather segments (for the user and
kernel S/G lists) in the default struct pass_io_req, so
requests with longer S/G lists require an extra kernel malloc.

The new passcopysglist() function copies a user scatter/gather
list to a kernel scatter/gather list. The number of elements
in each list may be different, but (obviously) the amount of data
stored has to be identical.

The new passmemdone() function copies data out for the
CAM_DATA_VADDR and CAM_DATA_SG cases.

The new passiocleanup() function restores data pointers in
user CCBs and frees memory.

Add new functions to support kqueue(2)/kevent(2):

passreadfilt() tells kevent whether or not the done
queue is empty.

passkqfilter() adds a knote to our list.

passreadfiltdetach() removes a knote from our list.

Add a new function, passpoll(), for poll(2)/select(2)
to use.

Add devstat(9) support for the queued CCB path.

sys/cam/ata/ata_da.c:
Add support for the BIO_VLIST bio type.

sys/cam/cam_ccb.h:
Add a new enumeration for the xflags field in the CCB header.
(This doesn't change the CCB header, just adds an enumeration to
use.)

sys/cam/cam_xpt.c:
Add a new function, xpt_setup_ccb_flags(), that allows specifying
CCB flags.

sys/cam/cam_xpt.h:
Add a prototype for xpt_setup_ccb_flags().

sys/cam/scsi/scsi_da.c:
Add support for BIO_VLIST.

sys/dev/md/md.c:
Add BIO_VLIST support to md(4).

sys/geom/geom_disk.c:
Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size
limiting code in g_disk_start() a bit.

sys/kern/subr_bus_dma.c:
Change _bus_dmamap_load_vlist() to take a starting offset and
length.

Add a new function, _bus_dmamap_load_pages(), that will load a list
of physical pages starting at an offset.

Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios.
Allow unmapped I/O to start at an offset.

sys/kern/subr_uio.c:
Add two new functions, physcopyin_vlist() and physcopyout_vlist().

sys/pc98/include/bus.h:
Guard kernel-only parts of the pc98 machine/bus.h header with
#ifdef _KERNEL.

This allows userland programs to include <machine/bus.h> to get the
definition of bus_addr_t and bus_size_t.

sys/sys/bio.h:
Add a new bio flag, BIO_VLIST.

sys/sys/uio.h:
Add prototypes for physcopyin_vlist() and physcopyout_vlist().

share/man/man4/pass.4:
Document the CAMIOQUEUE and CAMIOGET ioctls.

usr.sbin/Makefile:
Add camdd.

usr.sbin/camdd/Makefile:
Add a makefile for camdd(8).

usr.sbin/camdd/camdd.8:
Man page for camdd(8).

usr.sbin/camdd/camdd.c:
The new camdd(8) utility.

Sponsored by: Spectra Logic

------------------------------------------------------------------------
r291724 | ken | 2015-12-03 17:07:01 -0500 (Thu, 03 Dec 2015) | 6 lines

Fix typos in the camdd(8) usage() function output caused by an error in
my diff filter script.

Sponsored by: Spectra Logic

------------------------------------------------------------------------
r291741 | ken | 2015-12-03 22:38:35 -0500 (Thu, 03 Dec 2015) | 10 lines

Fix g_disk_vlist_limit() to work properly with deletes.

Add a new bp argument to g_disk_maxsegs(), and add a new function,
g_disk_maxsize() tha will properly determine the maximum I/O size for a
delete or non-delete bio.

Submitted by: will
Sponsored by: Spectra Logic

------------------------------------------------------------------------
------------------------------------------------------------------------
r291742 | ken | 2015-12-03 22:44:12 -0500 (Thu, 03 Dec 2015) | 5 lines

Fix a style issue in g_disk_limit().

Noticed by: bdrewery

------------------------------------------------------------------------

Sponsored by: Spectra Logic

283910 02-Jun-2015 jhb

MFC 281266:
Move the 32-bit compatible procfs types from freebsd32.h to <sys/procfs.h>
and export them to userland.
- Define __HAVE_REG32 on platforms that define a reg32 structure and check
for this in <sys/procfs.h> to control when to export prstatus32, etc.
- Add prstatus32_t and prpsinfo32_t typedefs for the 32-bit structures.
libbfd looks for these types, and having them fixes 'gcore' in gdb of a
32-bit process on a 64-bit platform.
- Use the structure definitions from <sys/procfs.h> in gcore's elf32 core
dump code instead of duplicating the definitions.

274648 18-Nov-2014 kib

Merge the fueword(9) and casueword(9). In particular,

MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.

MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.

MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used. The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.

MFC r273788 (by jkim):
Actually install casuword(9) to fix build.

MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).

271239 07-Sep-2014 marcel

Fix previous commit: unbreak build of libkvm by including sys/systm.h
only when _KERNEL is defined.

Approved by: re@ (implicit)

271211 06-Sep-2014 marcel

Fix the PCPU access macros. It was found that the PCPU pointer, when
held in register r13, is used outside the bounds of critical_enter()
and critical_exit() by virtue of optimizations performed by the
compiler. The net effect being that address computations of fields
in the PCPU structure could be relative to the PCPU structure of the
CPU on which the address computation was performed and not related
to the CPU that executes the actual load or store operation.
The typical failure mode being that the per-CPU cache of UMA got
corrupted due to accesses from other CPUs.

Adding more volatile decorating to the register expression does not
help. The thinking being that volatile is assumed to work on memory
references and not register references. Thus, the fix is to perform
the address computation using a volatile inline assembly statement.

Additionally, since the reference is fundamentally non-atomic on ia64
by virtue of have a distinct address computation followed by the
actual load or store operation, it is required to wrap the entire
PCPU access in a critical section.

With PCPU_GET and friends requiring curthread now that they're in a
critical section, low-level use of these macros in functions like
cpu_switch() is not possible anymore. Consequently, a second order
set of changes is needed to avoid using PCPU_GET and friends where
curthread is either not set yet, or in the process of being changed.
In those cases, explicit dereferencing of pcpup is needed. In those
cases it is also possible to do that.

This is a direct commit to stable/10.

Approved by: re@ (marius)

270296 21-Aug-2014 emaste

MFC r263815, r263872:

Move ia64 efi.h to sys in preparation for amd64 UEFI support

Prototypes specific to ia64 have been left in this file for now, under
__ia64__, rather than moving them to a new header under sys/ia64.
I anticipate that (some of) the corresponding functions will be shared
by the amd64, arm64, i386, and ia64 architectures, and we can adjust
this as EFI support on other than ia64 continues to develop.

Fix missed efi.h header change in r263815

Sponsored by: The FreeBSD Foundation

268201 03-Jul-2014 marcel

MFC r263380 & r268185: Add KTR events for the PMAP interface functions.

268200 02-Jul-2014 marcel

MFC r263323: Fix and improve exception tracing.

268199 02-Jul-2014 marcel

MFC r263254: Move the implementation of kdb_cpu_trap() from <machine/kdb.h>
to machdep.c.

268184 02-Jul-2014 marcel

MFC r257477: Purge the translation cache of APs before we unleash them.

266204 16-May-2014 ian

MFC r257854 (discussed with alc@)

As of r257209, all architectures have defined
VM_KMEM_SIZE_SCALE. In other words, every architecture is now
auto-sizing the kmem arena. This revision changes kmeminit() so
that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and
the definition of VM_KMEM_SIZE becomes optional.

Replace or eliminate all existing definitions of VM_KMEM_SIZE.
With auto-sizing enabled, VM_KMEM_SIZE effectively became an
alternate spelling for VM_KMEM_SIZE_MIN on most architectures.
Use VM_KMEM_SIZE_MIN for clarity.

264496 15-Apr-2014 tijl

MFC r263998:

Rename __wchar_t so it no longer conflicts with __wchar_t from clang 3.4
-fms-extensions.

262004 16-Feb-2014 marcel

MFC r260175:
Implement atomic_swap_<type>.

261985 16-Feb-2014 marcel

MFC r257487:
Use LOG2_ID_PAGE_SIZE again for the identity mapping in regions 6 & 7.

256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255289 06-Sep-2013 glebius

On those machines, where sf_bufs do not represent any real object, make
sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely
empty functions.

Reviewed by: alc, kib, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


254300 13-Aug-2013 jkim

Tidy up global locks for ACPICA. There is no functional change.


253750 28-Jul-2013 avg

Revert r253748,253749

This WIP should not have been committed yet.

Pointyhat to: avg


253748 28-Jul-2013 avg

put contents of cpu.h under _KERNEL

no userland-serviceable parts inside

MFC after: 20 days


252434 01-Jul-2013 kib

Fix issues with zeroing and fetching the counters, on x86 and ppc64.
Issues were noted by Bruce Evans and are present on all architectures.

On i386, a counter fetch should use atomic read of 64bit value,
otherwise carry from the increment on other CPU could be lost for the
given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not
available on the machine, it cannot be SMP and it is enough to disable
preemption around read to avoid the split read.

On x86 the counter increment is not atomic on purpose, which makes it
possible for the store of the incremented result to override just
zeroed per-cpu slot. The effect would be a counter going off by
arbitrary value after zeroing. Perform the counter zeroing on the
same processor which does the increments, making the operations
mutually exclusive. On i386, same as for the fetching, if the
cmpxchg8b is not available, machine is not SMP and we disable
preemption for zeroing.

PowerPC64 is treated the same as amd64.

For other architectures, the changes made to allow the compilation to
succeed, without fixing the issues with zeroing or fetching. It
should be possible to handle them by using the 64bit loads and stores
atomic WRT preemption (assuming the architectures also converted from
using critical sections to proper asm). If architecture does not
provide the facility, using global (spin) mutex would be non-optimal
but working solution.

Noted by: bde
Sponsored by: The FreeBSD Foundation


252280 27-Jun-2013 jkim

Move definitions required by userland applications out of acpica_machdep.h.


250338 07-May-2013 attilio

Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in
order to match the MAXCPU concept. The change should also be useful
for consolidation and consistency.

Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc


249268 08-Apr-2013 glebius

Merge from projects/counters: counter(9).

Introduce counter(9) API, that implements fast and raceless counters,
provided (but not limited to) for gathering of statistical data.

See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html
for more details.

In collaboration with: kib
Reviewed by: luigi
Tested by: ae, ray
Sponsored by: Nginx, Inc.


249265 08-Apr-2013 glebius

Merge from projects/counters:

Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by: Nginx, Inc.


247251 25-Feb-2013 marcel

kernacc() expects all KVAs to be covered in the kernel map. With the
introduction of the PBVM, this stopped being the case. Redefine the
VM parameters so that the PBVM is included in the kernel map. In
particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base
of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of
region 4 to include the PBVM.
While here define KERNBASE to the actual link address of the kernel as
is intended.

PR: 169926


246714 12-Feb-2013 marcel

Eliminate padding by moving 'narg' next to 'code'. Both are 32-bit
entities in the syscall_args structure that otherwise has 64-bit
only fields.


242218 28-Oct-2012 kib

Fix compilation on ia64 when page size is configured for 16KB.

Reviewed by: alc, marcel


242121 26-Oct-2012 alc

Port the new PV entry allocator from amd64/i386. This allocator has two
advantages. First, PV entries are roughly half the size. Second, this
allocator doesn't access the paging queues, and thus it allows for the
removal of the page queues lock from this pmap.

Replace all uses of the page queues lock by a R/W lock that is private
to this pmap.

Tested by: marcel


238257 08-Jul-2012 marcel

Move PCPU initialization to a new function called cpu_pcpu_setup().
This makes it easier to add additional CPU or platform information
to the per-CPU structure without duplicated code.


238190 07-Jul-2012 marcel

Implement ia64_physmem_alloc() and use it consistently to get memory
before VM has been initialized. This includes:
1. Replacing pmap_steal_memory(),
2. Replace the handcrafted logic to allocate a naturally aligned VHPT,
3. Properly allocate the DPCPU for the BSP.

Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't
cross into the next PBVM page. If we were to cross into the next
page, then there wouldn't be a PTE entry on the page table for it
and we would end up with a MCA following a page fault. As such,
this commit fixes MCAs occasionally seen.


238184 07-Jul-2012 marcel

Hide the creation of phys_avail behind an API to make it easier to do it
correctly. We now iterate the EFI memory descriptors once and collect all
the information in a single pass. This includes:
1. The I/O port base address,
2. The PAL memory region. Have the physmem API track this.
3. Memory descriptors of memory we can't use, like bad memory, runtime
services code & data, etc. Have the physmem API track these.
4. memory descriptors of memory we can use or re-use, such as free
memory, boot time services code & data, loader code & data, etc.
These are added by the physmem API.

Since the PBVM page table and pages are in memory described as loader
data, inform the physmem API of chunks that need to be delated from the
available physical memory.

While here, remove Maxmem and replace it with the better named paddr_max.
Maxmem was defined as physmem, which is generally wrong. Now, paddr_max
is properly defined as the largesty physical address.

The upshot of all this is that:
1. We properly determine realmem.
2. We maximize physmem by re-using memory where possible.
3. We remove complexity from ia64_init() in machdep.c.
4. Remove confusion about realmem, physmem & Maxmem.

The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c,
as well as replace the handcrafted allocation of the VHPT for the BSP in
pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation
of phys_avail after it is being created.


237517 24-Jun-2012 andrew

Make the wchar_t type machine dependent.

This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the
ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an
unsigned short with the former preferred.

Because of this requirement we need to move the definition of __wchar_t to
a machine dependent header. It also cleans up the macros defining the limits
of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine
dependent header then using them to define WCHAR_MIN and WCHAR_MAX
respectively.

Discussed with: bde


237433 22-Jun-2012 kib

Implement mechanism to export some kernel timekeeping data to
usermode, using shared page. The structures and functions have vdso
prefix, to indicate the intended location of the code in some future.

The versioned per-algorithm data is exported in the format of struct
vdso_timehands, which mostly repeats the content of in-kernel struct
timehands. Usermode reading of the structure can be lockless.
Compatibility export for 32bit processes on 64bit host is also
provided. Kernel also provides usermode with indication about
currently used timecounter, so that libc can fall back to syscall if
configured timecounter is unknown to usermode code.

The shared data updates are initiated both from the tc_windup(), where
a fast task is queued to do the update, and from sysctl handlers which
change timecounter. A manual override switch
kern.timecounter.fast_gettime allows to turn off the mechanism.

Only x86 architectures export the real algorithm data, and there, only
for tsc timecounter. HPET counters page could be exported as well, but
I prefer to not further glue the kernel and libc ABI there until
proper vdso-based solution is developed.

Minimal stubs neccessary for non-x86 architectures to still compile
are provided.

Discussed with: bde
Reviewed by: jhb
Tested by: flo
MFC after: 1 month


237430 22-Jun-2012 kib

Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to
timekeeping information.

MFC after: 1 week


237168 16-Jun-2012 alc

The page flag PGA_WRITEABLE is set and cleared exclusively by the pmap
layer, but it is read directly by the MI VM layer. This change introduces
pmap_page_is_write_mapped() in order to completely encapsulate all direct
access to PGA_WRITEABLE in the pmap layer.

Aesthetics aside, I am making this change because amd64 will likely begin
using an alternative method to track write mappings, and having
pmap_page_is_write_mapped() in place allows me to make such a change
without further modification to the MI VM layer.

As an added bonus, tidy up some nearby comments concerning page flags.

Reviewed by: kib
MFC after: 6 weeks


235941 24-May-2012 bz

MFp4 bz_ipv6_fast:

in_cksum.h required ip.h to be included for struct ip. To be
able to use some general checksum functions like in_addword()
in a non-IPv4 context, limit the (also exported to user space)
IPv4 specific functions to the times, when the ip.h header is
present and IPVERSION is defined (to 4).

We should consider more general checksum (updating) functions
to also allow easier incremental checksum updates in the L3/4
stack and firewalls, as well as ponder further requirements by
certain NIC drivers needing slightly different pseudo values
in offloading cases. Thinking in terms of a better "library".

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


234785 29-Apr-2012 dim

Add a convenience macro for the returns_twice attribute, and apply it to
the prototypes of the appropriate functions (getcontext, savectx,
setjmp, sigsetjmp and vfork).

MFC after: 2 weeks


233125 18-Mar-2012 tijl

Eliminate ia32_reg.h by moving its contents to x86 and ia64 reg.h.

Reviewed by: kib


230475 23-Jan-2012 das

Add C11 macros describing subnormal numbers to float.h.

Reviewed by: bde


230366 20-Jan-2012 das

Add parentheses where required. Without them, `sizeof LDBL_MAX'
is a syntax error and shouldn't be, while `1 FLT_ROUNDS' isn't a
syntax error and should be. Thanks to bde for the examples.


228469 13-Dec-2011 ed

Replace __signed by signed.

The signed keyword is an integral part of the C syntax. There's no need
to use __signed.


226607 21-Oct-2011 das

People porting FreeBSD to new architectures ought not have to
implement a deprecated FPU control interface in addition to the
standard one. To make this clearer, further deprecate ieeefp.h
by not declaring the function prototypes except on architectures
that implement them already.

Currently i386 and amd64 implement the ieeefp.h interface for
compatibility, and for fp[gs]etprec(), which doesn't exist on
most other hardware. Powerpc, sparc64, and ia64 partially implement
it and probably shouldn't, and other architectures don't implement it
at all.


226112 07-Oct-2011 kib

Remove unused define.

MFC after: 1 month


224217 19-Jul-2011 attilio

Bump MAXCPU for amd64, ia64 and XLP mips appropriately.
From now on, default values for FreeBSD will be 64 maxiumum supported
CPUs on amd64 and ia64 and 128 for XLP. All the other architectures
seem already capped appropriately (with the exception of sparc64 which
needs further support on jalapeno flavour).

Bump __FreeBSD_version in order to reflect KBI/KPI brekage introduced
during the infrastructure cleanup for supporting MAXCPU > 32. This
covers cpumask_t retiral too.

The switch is considered completed at the present time, so for whatever
bug you may experience that is reconducible to that area, please report
immediately.

Requested by: marcel, jchandra
Tested by: pluknet, sbruno
Approved by: re (kib)


224207 19-Jul-2011 attilio

Add the possibility to specify from kernel configs MAXCPU value.
This patch is going to help in cases like mips flavours where you
want a more granular support on MAXCPU.

No MFC is previewed for this patch.

Tested by: pluknet
Approved by: re (kib)


224112 16-Jul-2011 marcel

Add a few more helper functions for working with memory descriptors:
o efi_md_find() - returns the md that covers the given address
o efi_md_last() - returns the last md in the list
o efi_md_prev() - returns the md that preceeds the given md.


223873 08-Jul-2011 marcel

Implement basic support for memory attributes. At this time we only
distinguish between UC and WB memory so that we can map the page to
either a region 6 address (for UC) or a region 7 address (for WB).

This change is only now possible, because previously we would map
regions 6 and 7 with 256MB translations and on top of that had the
kernel mapped in region 7 using a wired translation. The introduction
of the PBVM moved the kernel into its own region and freed up region
7 and allowed us to revert to standard page-sized translations.

This commit inroduces pmap_page_to_va() that respects the attribute.


223526 25-Jun-2011 marcel

Switch to the event timers infrastructure. This includes:
o Setting td_intr_frame to the XIVs trap frame because it's referenced
by the ET event handler.
o Signal EOI to the CPU before calling the registered XIV handlers.
This prevents lost ITC interrupts, which cause starvation in one-shot
mode.
o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o Have the APs call cpu_initclocks() so as to limited the scattering of
clock related initialization. cpu_initclocks() calls the <self>_bsp()
or <self>_ap() version accordingly.
o Uncomment the ET clock handling in cpu_idle().
o Update the DDB 'show pcpu' output for the new MD fields.
o Entirely rewritten ia64_ih_clock(). Note that we don't create as many
clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
We can only have 240 XIVs and we can have more CPUs than that. There's
a single intrcnt index for the cumulative clock ticks and we keep per
CPU counts in the PCPU stats structure.
o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).

Open issues:
o Clock interrupts can still be lost. Some tweaking is still necessary.

Thanks to: mav@ for his support, feedback and explanations.

ET stats while committing:
eris% sysctl machdep.cpu | grep nclks

machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock 108599 50


223170 17-Jun-2011 marcel

Properly serialize the global shootdown with the instruction
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.


222813 07-Jun-2011 attilio

etire the cpumask_t type and replace it with cpuset_t usage.

This is intended to fix the bug where cpu mask objects are
capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever
value. Anyway, as long as several structures in the kernel are
statically allocated and sized as MAXCPU, it is suggested to keep it
as low as possible for the time being.

Technical notes on this commit itself:
- More functions to handle with cpuset_t objects are introduced.
The most notable are cpusetobj_ffs() (which calculates a ffs(3)
for a cpuset_t object), cpusetobj_strprint() (which prepares a string
representing a cpuset_t object) and cpusetobj_strscan() (which
creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are target to be removed soon.
With the moving from cpumask_t to cpuset_t they are now inefficient
and not really useful. Anyway, for the time being, please note that
access to pcpu datas is protected by sched_pin() in order to avoid
migrating the CPU while reading more than one (possible) word
- Please note that size of cpuset_t objects may differ between kernel
and userland. While this is not directly related to the patch itself,
it is good to understand that concept and possibly use the patch
as a reference on how to deal with cpuset_t objects in userland, when
accessing kernland members.
- KTR_CPUMASK is changed and now is represented through a string, to be
set as the example reported in NOTES.

Please additively note that no MAXCPU is bumped in this patch, but
private testing has been done until to MAXCPU=128 on a real 8x8x2(htt)
machine (amd64).

Please note that the FreeBSD version is not yet bumped because of
the upcoming pcpu changes. However, note that this patch is not
targeted for MFC.

People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested
several revision of the patches and really helped in improving
stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed
patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the
patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of
the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific
implementations of the patch.
- Other people have made contributions on other patches that have been
already committed and have been listed separately.

Companies that should be mentioned for having participated at several
degrees:
- Yahoo! for having offered the machines used for testing on big
count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance,
which has been instrumental.
- Sandvine for having offered offices and infrastructure during
development.

(I really hope I didn't forget anyone, if it happened I apologize in
advance).


221890 14-May-2011 marcel

Be pedantic: mark the pcpu pointer (= register r13) itself as volatile.


221889 14-May-2011 marcel

Turn ia64_srlz() and ia64_srlz_i() into defines so that the code is
still correct when inlining is disabled.


221855 13-May-2011 mdf

Move the ZERO_REGION_SIZE to a machine-dependent file, as on many
architectures (i386, for example) the virtual memory space may be
constrained enough that 2MB is a large chunk. Use 64K for arches
other than amd64 and ia64, with special handling for sparc64 due to
differing hardware.

Also commit the comment changes to kmem_init_zero_region() that I
missed due to not saving the file. (Darn the unfamiliar development
environment).

Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you
see fit.

Requested by: alc
MFC after: 1 week
MFC with: r221853


221334 02-May-2011 marcel

Don't use the whole region 5 for KVA, because the CPU may not implement all
of the 61 bits available within the region for virtual addressing. Since
there's no good way for us to map out the gap in the virtual address space,
limit KVA to the architectural minimum implemented address bits. This still
gives us 1 petabyte of KVA, so no worries.


221271 30-Apr-2011 marcel

Stop linking against a direct-mapped virtual address and instead
use the PBVM. This eliminates the implied hardcoding of the
physical address at which the kernel needs to be loaded. Using the
PBVM makes it possible to load the kernel irrespective of the
physical memory organization and allows us to replicate kernel text
on NUMA machines.

While here, reduce the direct-mapped page size to the kernel's
page size so that we can support memory attributes better.


221035 25-Apr-2011 marcel

Remove prototypes of non-existent functions.


220313 04-Apr-2011 marcel

Use the new arch_loadaddr I/F to align ELF objects to PBVM page
boundaries. For good measure, align all other objects to cache
lines boundaries.

Use the new arch_loadseg I/F to keep track of kernel text and
data so that we can wire as much of it as is possible. It is
the responsibility of the kernel to link critical (read IVT
related) code and data at the front of the respective segment
so that it's covered by TRs before the kernel has a chance to
add more translations.

Use a better way of determining whether we're loading a legacy
kernel or not. We can't check for the presence of the PBVM page
table, because we may have unloaded that kernel and loaded an
older (legacy) kernel after that. Simply use the latest load
address for it.


220042 26-Mar-2011 alc

Eliminate an unused definition.

Reviewed by: marcel


219841 21-Mar-2011 marcel

Fix switching to physical mode as part of calling into EFI runtime
services or PAL procedures. The new implementation is based on
specific functions that are known to be called in certain scenarios
only. This in particular fixes the PAL call to obtain information
about translation registers. In general, the new implementation does
not bank on virtual addresses being direct-mapped and will work when
the kernel uses PBVM.

When new scenarios need to be supported, new functions are added if
the existing functions cannot be changed to handle the new scenario.
If a single generic implementation is possible, it will become clear
in due time.

While here, change bootinfo to a pointer type in anticipation of
future development.


219808 21-Mar-2011 marcel

Change region 4 to be part of the kernel. This serves 2 purposes:
1. The PBVM is in region 4, so if we want to make use of it, we
need region 4 freed up.
2. Region 4 and above cannot be represented by an off_t by virtue
of that type being signed. This is problematic for truss(1),
ktrace(1) and other such programs.


219741 18-Mar-2011 marcel

Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about
the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though
it's not used anywhere in the source tree.


219691 16-Mar-2011 marcel

MFaltix:
Add support for Pre-Boot Virtual Memory (PBVM) to the loader.

PBVM allows us to link the kernel at a fixed virtual address without
having to make any assumptions about the physical memory layout. On
the SGI Altix 350 for example, there's no usuable physical memory
below 192GB. Also, the PBVM allows us to control better where we're
going to physically load the kernel and its modules so that we can
make sure we load the kernel in memory that's close to the BSP.

The PBVM is managed by a simple page table. The minimum size of the
page table is 4KB (EFI page size) and the maximum is currently set
to 1MB. A page in the PBVM is 64KB, as that's the maximum alignment
one can specify in a linker script. The bottom line is that PBVM is
between 64KB and 8GB in size.

The loader maps the PBVM page table at a fixed virtual address and
using a single translations. The PBVM itself is also mapped using a
single translation for a maximum of 32MB.

While here, increase the heap in the EFI loader from 512KB to 2MB
and set the stage for supporting relocatable modules.


218773 17-Feb-2011 alc

Remove pmap fields that are either unused or not fully implemented.

Discussed with: kib


217515 17-Jan-2011 jkim

Add reader/writer lock around mem_range_attr_get() and mem_range_attr_set().
Compile sys/dev/mem/memutil.c for all supported platforms and remove now
unnecessary dev_mem_md_init(). Consistently define mem_range_softc from
mem.c for all platforms. Add missing #include guards for machine/memdev.h
and sys/memrange.h. Clean up some nearby style(9) nits.

MFC after: 1 month


217192 09-Jan-2011 kib

Move repeated MAXSLP definition from machine/vmparam.h to sys/vmmeter.h.
Update the outdated comments describing MAXSLP and the process
selection algorithm for swap out.

Comments wording and reviewed by: alc


217180 09-Jan-2011 das

The highest-precision floating point type on ia64 has 64 bits of
precision, so DECIMAL_DIG should be 21, as on i386/amd64.


217147 08-Jan-2011 tijl

On mixed 32/64 bit architectures (mips, powerpc) use __LP64__ rather than
architecture macros (__mips_n64, __powerpc64__) when 64 bit types (and
corresponding macros) are different from 32 bit. [1]

Correct the type of INT64_MIN, INT64_MAX and UINT64_MAX.

Define (U)INTMAX_C as an alias for (U)INT64_C matching the type definition
for (u)intmax_t. Do this on all architectures for consistency.

Suggested by: bde [1]
Approved by: kib (mentor)


217145 08-Jan-2011 tijl

Fix types of some values in machine/_limits.h.

On some architectures UCHAR_MAX and USHRT_MAX had type unsigned int.
However, lacking integer suffixes for types smaller than int, their type
should correspond to that of an object of type unsigned char (or short)
when used in an expression with objects of type int. In that case unsigned
char (short) are promoted to int (i.e. signed) so the type of UCHAR_MAX and
USHRT_MAX should also be int.

Where MIN/MAX constants implicitly have the correct type the suffix has
been removed.

While here, correct some comments.

Reviewed by: bde
Approved by: kib (mentor)


217097 07-Jan-2011 kib

Add AT_STACKPROT elf aux vector. Will be used to inform rtld about the
initial stack protection set by the kernel image activator.


216143 03-Dec-2010 brucec

Revert r216134. This checkin broke platforms where bus_space are macros:
they need to be a single statement, and do { } while (0) doesn't work in this
situation so revert until a solution can be devised.


216134 02-Dec-2010 brucec

Disallow passing in a count of zero bytes to the bus_space(9) functions.

Passing a count of zero on i386 and amd64 for [I386|AMD64]_BUS_SPACE_MEM
causes a crash/hang since the 'loop' instruction decrements the counter
before checking if it's zero.

PR: kern/80980
Discussed with: jhb


216094 01-Dec-2010 alc

phys_avail[] is correctly defined as an array of vm_paddr_t's in
machdep.c. Use that same type, and not vm_offset_t, in this include file.


215054 09-Nov-2010 jhb

- Remove <machine/mutex.h>. Most of the headers were empty, and the
contents of the ones that were not empty were stale and unused.
- Now that <machine/mutex.h> no longer exists, there is no need to allow it
to override various helper macros in <sys/mutex.h>.
- Rename various helper macros for low-level operations on mutexes to live
in the _mtx_* or __mtx_* namespaces. While here, change the names to more
closely match the real API functions they are backing.
- Drop support for including <sys/mutex.h> in assembly source files.

Suggested by: bde (1, 2)


213157 25-Sep-2010 imp

Remove clauses 3 and 4, per changes to NetBSD versions of these files.


211412 17-Aug-2010 kib

Supply some useful information to the started image using ELF aux vectors.
In particular, provide pagesize and pagesizes array, the canary value
for SSP use, number of host CPUs and osreldate.

Tested by: marius (sparc64)
MFC after: 1 month


210939 06-Aug-2010 jhb

Add a new ipi_cpu() function to the MI IPI API that can be used to send an
IPI to a specific CPU by its cpuid. Replace calls to ipi_selected() that
constructed a mask for a single CPU with calls to ipi_cpu() instead. This
will matter more in the future when we transition from cpumask_t to
cpuset_t for CPU masks in which case building a CPU mask is more expensive.

Submitted by: peter, sbruno
Reviewed by: rookie
Obtained from: Yahoo! (x86)
MFC after: 1 month


210623 29-Jul-2010 jhb

Mark the __curthread() functions as __pure2 and remove the volatile keyword
from the inline assembly. This allows the compiler to cache invocations of
curthread since it's value does not change within a thread context.

Submitted by: zec (i386)
MFC after: 1 week


210550 27-Jul-2010 jhb

Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.

Reviewed by: alc


210369 22-Jul-2010 kib

When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by: jhb, nwhitehorn
MFC after: 2 weeks


209779 07-Jul-2010 marcel

Add acpi_find_table() -- a convenience function for looking up an
ACPI table given the signature.


209671 03-Jul-2010 marcel

Allocate and setup an interrupt vector for corrected machine checks.
For now, just print when we get the interrupt, but eventually we need
to collect the details and provide a more useful report.


209618 01-Jul-2010 marcel

When compiling with profiling, we define PROF for userspace and GPROF
for the kernel.


209617 30-Jun-2010 marcel

While functions are ideally aligned to a 32-byte boundary, don't
assume this to be the case.


209026 11-Jun-2010 marcel

Bump MAX_BPAGES from 256 to 1024. It seems that a few drivers, bge(4)
in particular, do not handle deferred DMA map load operations at all.
Any error, and especially EINPROGRESS, is treated as a hard error and
typically abort the current operation. The fact that the busdma code
queues the load operation for when resources (i.e. bounce buffers in
this particular case) are available makes this especially problematic.
Bounce buffering, unlike what the PR synopsis would suggest, works
fine.

While on the subject, properly implement swi_vm().

PR: 147502
MFC after: 1 week


208514 24-May-2010 kib

Change ia64' struct syscall_args definition so that args is a pointer to
the arguments array instead of array itself. ia64 syscall arguments are
readily available in the frame, point args to it, do not do unnecessary
bcopy. Still reserve the array in syscall_args for ia32 emulation.

Suggested and reviewed by: marcel
MFC after: 1 month


208453 23-May-2010 kib

Reorganize syscall entry and leave handling.

Extend struct sysvec with three new elements:
sv_fetch_syscall_args - the method to fetch syscall arguments from
usermode into struct syscall_args. The structure is machine-depended
(this might be reconsidered after all architectures are converted).
sv_set_syscall_retval - the method to set a return value for usermode
from the syscall. It is a generalization of
cpu_set_syscall_retval(9) to allow ABIs to override the way to set a
return value.
sv_syscallnames - the table of syscall names.

Use sv_set_syscall_retval in kern_sigsuspend() instead of hardcoding
the call to cpu_set_syscall_retval().

The new functions syscallenter(9) and syscallret(9) are provided that
use sv_*syscall* pointers and contain the common repeated code from
the syscall() implementations for the architecture-specific syscall
trap handlers.

Syscallenter() fetches arguments, calls syscall implementation from
ABI sysent table, and set up return frame. The end of syscall
bookkeeping is done by syscallret().

Take advantage of single place for MI syscall handling code and
implement ptrace_lwpinfo pl_flags PL_FLAG_SCE, PL_FLAG_SCX and
PL_FLAG_EXEC. The SCE and SCX flags notify the debugger that the
thread is stopped at syscall entry or return point respectively. The
EXEC flag augments SCX and notifies debugger that the process address
space was changed by one of exec(2)-family syscalls.

The i386, amd64, sparc64, sun4v, powerpc and ia64 syscall()s are
changed to use syscallenter()/syscallret(). MIPS and arm are not
converted and use the mostly unchanged syscall() implementation.

Reviewed by: jhb, marcel, marius, nwhitehorn, stas
Tested by: marcel (ia64), marius (sparc64), nwhitehorn (powerpc),
stas (mips)
MFC after: 1 month


208283 19-May-2010 marcel

Switch to C99 exact-width types.


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207329 28-Apr-2010 attilio

- Extract the IODEV_PIO interface from ia64 and make it MI.
In the end, it does help fixing /dev/io usage from multithreaded
processes.
- On i386 and amd64 the old behaviour is kept but multithreaded
processes must use the new interface in order to work well.
- Support for the other architectures is greatly improved, where
necessary, by the necessity to define very small things now.

Manpage update will happen shortly.

Sponsored by: Sandvine Incorporated
PR: threads/116181
Reviewed by: emaste, marcel
MFC after: 3 weeks


207269 27-Apr-2010 kib

Style: use #define<TAB> instead of #define<SPACE>.

Noted by: bde, pluknet gmail com
MFC after: 11 days


207152 24-Apr-2010 kib

Move the constants specifying the size of struct kinfo_proc into
machine-specific header files. Add KINFO_PROC32_SIZE for struct
kinfo_proc32 for architectures providing COMPAT_FREEBSD32. Add
CTASSERT for the size of struct kinfo_proc32.

Submitted by: pluknet
Reviewed by: imp, jhb, nwhitehorn
MFC after: 2 weeks


206570 13-Apr-2010 marcel

Populate the sysctl tree with any MCA records we collected.
The sequence number is used as the name of a sysctl node,
under which we add the MCA records using the CPU id as the
leaf name.

Add the hw.mca.inject sysctl to provide a way to inject
MC errors and trigger machine checks.

PR: ia64/113102


206556 13-Apr-2010 marcel

o s/u_int64_t/uint64_t/g
o style(9) fixes.


206541 13-Apr-2010 marcel

Sync up to SDM 2.2.


205726 27-Mar-2010 marcel

Implement interrupt to CPU binding. Assign interrupts to CPUs in a
round-robin fashion, starting with the highest priority interrupt
on the highest-numbered CPU and cycling downwards.


205723 27-Mar-2010 marcel

Remove nx_pcibus from the nexus resource. Nexus is not involved
with PCI busses. Remove nexus_read_ivar() and nexus_write_ivar()
to give default behaviour. Remove <machine/nexusvar.h> as well,
because there's nothing in it that's being used.


205713 26-Mar-2010 marcel

Rename disable_intr() to ia64_disable_intr() and rename enable_intr()
to ia64_enable_intr(). This reduces confusion with intr_disable() and
intr_restore().

Have configure_final() call ia64_finalize_intr() instead of enable_intr()
in preparation of adding support for binding interrupts to all CPUs.


205665 26-Mar-2010 marcel

Only use the interval timer for clock interrupts on the BSP and
have the BSP use IPIs to trigger clock interrupts on the APs.
This allows us to run on hardware configurations for which the
ITC has non-uniform frequencies across CPUs.

While here, change the clock XIV to type IPI so as to protect
the interrupt delivery against CPU re-balancing once that's
implemented.


205431 22-Mar-2010 marcel

Define curthread as an inline function that loads the thread pointer
directly from r13, the pcpu pointer. This guarantees correct behaviour
when the thread migrates to a different CPU.


205428 21-Mar-2010 marcel

Don't include <machine/_regset.h> when _MACHINE_REGSET_H_ in defined.
This is not for multiple inclusion purposes, because _regset.h already
handles this, but to enable inclusion of the MD header by cross-tools
on non-ia64 installations. The cross-tool can include _regset.h itself
before including MD headers that depend on it.


205234 17-Mar-2010 marcel

Revamp the interrupt code based on the previous commit:
o Introduce XIV, eXternal Interrupt Vector, to differentiate from
the interrupts vectors that are offsets in the IVT (Interrupt
Vector Table). There's a vector for external interrupts, which
are based on the XIVs.

o Keep track of allocated and reserved XIVs so that we can assign
XIVs without hardcoding anything. When XIVs are allocated, an
interrupt handler and a class is specified for the XIV. Classes
are:
1. architecture-defined: XIV 15 is returned when no external
interrupt are pending,
2. platform-defined: SAL reports which XIV is used to wakeup
an AP (typically 0xFF, but it's 0x12 for the Altix 350).
3. inter-processor interrupts: allocated for SMP support and
non-redirectable.
4. device interrupts (i.e. IRQs): allocated when devices are
discovered and are redirectable.

o Rewrite the central interrupt handler to call the per-XIV
interrupt handler and rename it to ia64_handle_intr(). Move
the per-XIV handler implementation to the file where we have
the XIV allocation/reservation. Clock interrupt handling is
moved to clock.c. IPI handling is moved to mp_machdep.c.

o Drop support for the Intel 8259A because it was broken. When
XIV 0 is received, the CPU should initiate an INTA cycle to
obtain the interrupt vector of the 8259-based interrupt. In
these cases the interrupt controller we should be talking to
WRT to masking on signalling EOI is the 8259 and not the I/O
SAPIC. This requires adriver for the Intel 8259A which isn't
available for ia64. Thus stop pretending to support ExtINTs
and instead panic() so that if we come across hardware that
has an Intel 8259A, so have something real to work with.

o With XIVs for IPIs dynamically allocatedi and also based on
priority, define the IPI_* symbols as variables rather than
constants. The variable holds the XIV allocated for the IPI.

o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV
assigned to IPI_STOP is delivered.


205014 11-Mar-2010 nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by: kib, jhb


204646 03-Mar-2010 joel

The NetBSD Foundation has granted permission to remove clause 3 and 4 from
the software.

Obtained from: NetBSD


204425 27-Feb-2010 marcel

Interrupt related cleanups:
o Assign vectors based on priority, because vectors have
implied priority in hardware.
o Use unordered memory accesses to the I/O sapic and use
the acceptance form of the mf instruction.
o Remove the sapicreg.h and sapicvar.h headers. All definitions
in sapicreg.h are private to sapic.c and all definitions in
sapicvar.h are either private or interface functions. Move the
interface functions to intr.h.
o Hide the definition of struct sapic.


204182 21-Feb-2010 marcel

Remove pm_active from struct pmap as it serves no purpose.

MFC after: 1 week


203884 14-Feb-2010 marcel

Some code cleanups:
o s/u_int32_t/uint32_t/g
o Add multiple-inclusion protection.
o Break long lines.


203883 14-Feb-2010 marcel

Some code churn:
o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used
by calling either bus_space_map() or pmap_mapdev().
o Implement bus_space_map() in terms of pmap_mapdev() and implement
bus_space_unmap() in terms of pmap_unmapdev().
o Have ia64_pib hold the uncached virtual address of the processor interrupt
block throughout the kernel's life and access the elements of the PIB
through this structure pointer.

This is a non-functional change with the exception of using ia64_ld1() and
ia64_st8() to write to the PIB. We were still using assignments, for which
the compiler generates semaphore reads -- which cause undefined behaviour
for uncacheable memory. Note also that the memory barriers in ipi_send() are
critical for proper functioning.

With all the mapping of uncached memory done by pmap_mapdev(), we can keep
track of the translations and wire them in the CPU. This then eliminates
the need to reserve a whole region for uncached I/O and it eliminates
translation traps for device I/O accesses.


202273 14-Jan-2010 marcel

Add ioctl requests to /dev/io on ia64 for reading and writing
EFI variables. The primary reason for this is that it allows
sysinstall(8) to add a boot menu item for the newly installed
FreeBSD image.


202272 14-Jan-2010 marcel

Fix previous commitr:. efi_var_set() was copied from efi_var_get(),
but wasn't actually changed.


202271 14-Jan-2010 marcel

Add wrappers for the RT Variable Services. While here, translate the
EFI status into a standard errno value and change efi_set_time() to
return a standard error.

MFC after: 1 week


202097 11-Jan-2010 marcel

Use io(4) for I/O port access on ia64, rather than through sysarch(2).
I/O port access is implemented on Itanium by reading and writing to a
special region in memory. To hide details and avoid misaligned memory
accesses, a process did I/O port reads and writes by making a MD system
call. There's one fatal problem with this approach: unprivileged access
was not being prevented. /dev/io serves that purpose on amd64/i386, so
employ it on ia64 as well. Use an ioctl for doing the actual I/O and
remove the sysarch(2) interface.

Backward compatibility is not being considered. The sysarch(2) approach
was added to support X11, but support for FreeBSD/ia64 was never fully
implemented in X11. Thus, nothing gets broken that didn't need more work
to begin with.

MFC after: 1 week


201373 02-Jan-2010 marcel

Change BUS_SPACE_MAXADDR from 2^32-1 to 2^64-1. 2^32-1 is representative
for its origin, more than for its accuracy.

MFC after: 1 week


201269 30-Dec-2009 marcel

Revamp bus_space access functions:
o Optimize for memory mapped I/O by making all I/O port acceses function
calls and marking the test for the IA64_BUS_SPACE_IO tag with
__predict_false(). Implement the I/O port access functions in a new
file, called bus_machdep.c.
o Change the bus_space_handle_t for memory mapped I/O to the virtual
address rather than the physical address. This eliminates the PA->VA
translation for every I/O access. The handle for I/O port access is
still the port number.
o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string
variants from cpufunc.h and define them in bus.h. On ia64 these are
not CPU functions at all. In bus.h they are merely aliases for the
new I/O port access functions defined in bus_machdep.h.
o Handle the ACPI resource bug in nexus_set_resource(). There we can
do it once so that we don't have to worry about it whenever we need
to write to an I/O port that is really a memory mapped address.

The upshot of this change is that the KBI is better defined and that I/O
port access always involves a function call, allowing us to change the
actual implementation without breaking the KBI. For memory mapped I/O the
virtual address is abstracted, so that we can change the VA->PA mapping
in the kernel without causing an KBI breakage. The exception at this time
is for bus_space_map() and bus_space_unmap().

MFC after: 1 week.


201032 26-Dec-2009 marcel

Use unordered memory loads and stores for the in* and out*
family of functions.


200889 23-Dec-2009 marcel

Export the bus, cpu and itc frequencies under the hw.freq sysctl node.
The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The
frequencies are rounded to the nearest whole MHz.

While here, rename and re-type bus_frequency, processor_frequency and
itc_frequency to bus_freq, cpu_freq and itc_freq and make them static.
As unsigned integers, the hw.freq.cpu sysctl can more easily be made
generic (across all architectures) making porting easier.

MFC after: 3 days


200888 23-Dec-2009 marcel

Add a bit definition for invalid timestamp in the record header.


200207 07-Dec-2009 marcel

Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id
excluded, as it's used by MI code) and mode the sysctl variables from
pcpu_stats to pcpu_md.
Adjust all references accordingly.

While nearby, change the PCPU sysctl tree so that they match the CPU
device sysctl tree -- they are now children of a static node called
"machdep.cpu" and are named only with their cpu ID.


200200 07-Dec-2009 marcel

Allocate the VHPT for each CPU in cpu_mp_start(), rather than
allocating MAXCPU VHPTs up-front. This allows us to max-out MAXCPU
without memory waste -- MAXCPU is now 32 for SMP kernels.

This change also eliminates the VHPT scaling based in the total
memory in the system. It's the workload that determines the best size
of the VHPT. The workload can be affected by the amount of memory,
but not necessarily. For example, there's no performance difference
between VHPT sizes of 256KB, 512KB and 1MB when building the LINT
kernel. This was observed with a system that has 8GB of memory.
By default the kernel will allocate a 1MB VHPT. The user can tune the
system with the "machdep.vhpt.log2size" tunable.


200051 03-Dec-2009 marcel

Make sure bus space accesses use unorder memory loads and stores.
Memory accesses are posted in program order by virtue of the
uncacheable memory attribute.
Since GCC, by default, adds acquire and release semantics to
volatile memory loads and stores, we need to use inline assembly
to guarantee it. With inline assembly, we don't need volatile
pointers anymore.

Itanium does not support semaphore instructions to uncacheable
memory.


199941 29-Nov-2009 marcel

Move the sysctl related fields to the end of the structure and
make them conditional upon _KERNEL. libkvm includes <sys/pcpu.h>
and <sys/sysctl.h> does not expose the structure definitions to
userland.


199893 28-Nov-2009 marcel

Eliminate teh use of MAXCPU in static arrays of interrupt counters by
adding statistics counters to the PCPU structure. Export the counters
through sysctl by giving each PCPU structure its own sysctl context.

While here, fix cnt.v_intr by not just having it count clock interrupts,
but every interrupt and add more counters for each interrupt source.


199719 23-Nov-2009 marcel

Revert previous commit. The problem was not related to overrunning
the kernel stack at all. The new USB stack simply caused a change
in timing that triggered a firmware bug more often. The addition
of PRINTF_BUFR_SIZE apparently triggered the same firmware bug
even more reliably.

But even with KSTACK_PAGES=5, one instance of the firmware bug
remained: booting with a CD inserted. This problem was run into
by accident after installing Debian and having to boot FreeBSD
to fixup the GPT partitioning (Thanks... not). After bumping
KSTACK_PAGES to 5, it was pretty unbelievable that the stack was
still being too small.

After updating the firmware we could boot with a CD inserted and
KSTACK_PAGES could be lowered back to 4 pages without problems.

Note: It is believed to be a timing related firmware bug, because
the machine check information showed access to the serial console
on one CPU and access to the EHCI HCD on the other CPU. Since
both are devices on the management unit and thus virtualized in
some way, any execution trace that does not include concurrent
access to the BMC from both CPUs is fine.

Note also that it's not understood exactly how increasing the
kernel stack avoided hitting the firmware bug. A change in page
faults does change timing, but it's not known if that's what's
happening here.

In any case: the problem is being monitored. Reverting back to
4 pages for the kernel stack is preferred, because it makes it
easier to switch to 16K pages (double the page size) without
wasting too much memory by not being able to half the number of
pages...


198733 31-Oct-2009 marcel

Reimplement the lazy FP context switching:
o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o use msleep() to wait for the other CPU to save the high
FP registers.

This change fixes the high FP inconsistency panics.

A single global lock typically serializes too much, which may
be noticable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronuously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.


198451 24-Oct-2009 marcel

A 32KB kernel stack is not quite enough. The new USB stack is a bit
more stack hungry as compared to the old one that my RX2660 gets
a machine check and spontaneously reboots at the time the USB DVD
drive is found and attached to CAM as a mass storage device. This
doesn't happen always, but definitely varies per kernel build.
Likewise when using a 128-byte printf buffer. The additional 128
bytes that printf needs seems to be enough to have the memory stack
and register stack collide and causing a machine check.

Thus: Bump KSTACK_PAGES from 4 to 5.


198338 21-Oct-2009 marcel

o Align function on a 32-byte boundary so that the core's front-end
can deliver 2 bundles per cycle to the back-end.
o Mark syscall stubs with a special unwind ABI tag so that unwind
libraries know how to unwind.


197933 10-Oct-2009 kib

Define architectural load bases for PIE binaries. Addresses were selected
by looking at the bases used for non-relocatable executables by gnu ld(1),
and adjusting it slightly.

Discussed with: bz
Reviewed by: kan
Tested by: bz (i386, amd64), bsam (linux)
MFC after: some time


197316 18-Sep-2009 alc

Add a new sysctl for reporting all of the supported page sizes.

Reviewed by: jhb
MFC after: 3 weeks


196994 08-Sep-2009 phk

Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.


196196 13-Aug-2009 attilio

* Completely Remove the option STOP_NMI from the kernel. This option
has proven to have a good effect when entering KDB by using a NMI,
but it completely violates all the good rules about interrupts
disabled while holding a spinlock in other occasions. This can be the
cause of deadlocks on events where a normal IPI_STOP is expected.
* Adds an new IPI called IPI_STOP_HARD on all the supported architectures.
This IPI is responsible for sending a stop message among CPUs using a
privileged channel when disponible. In other cases it just does match a
normal IPI_STOP.
Right now the IPI_STOP_HARD functionality uses a NMI on ia32 and amd64
architectures, while on the other has a normal IPI_STOP effect. It is
responsibility of maintainers to eventually implement an hard stop
when necessary and possible.
* Use the new IPI facility in order to implement a new userend SMP kernel
function called stop_cpus_hard(). That is specular to stop_cpu() but
it does use the privileged channel for the stopping facility.
* Let KDB use the newly introduced function stop_cpus_hard() and leave
stop_cpus() for all the other cases
* Disable interrupts on CPU0 when starting the process of APs suspension.
* Style cleanup and comments adding

This patch should fix the reboot/shutdown deadlocks many users are
constantly reporting on mailing lists.

Please don't forget to update your config file with the STOP_NMI
option removal

Reviewed by: jhb
Tested by: pho, bz, rink
Approved by: re (kib)


195649 12-Jul-2009 alc

Add support to the virtual memory system for configuring machine-
dependent memory attributes:

Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the
fact that there are machine-dependent memory attributes that have
nothing to do with controlling the cache's behavior.

Introduce vm_object_set_memattr() for setting the default memory
attributes that will be given to an object's pages.

Introduce and use pmap_page_{get,set}_memattr() for getting and
setting a page's machine-dependent memory attributes. Add full
support for these functions on amd64 and i386 and stubs for them on
the other architectures. The function pmap_page_set_memattr() is also
responsible for any other machine-dependent aspects of changing a
page's memory attributes, such as flushing the cache or updating the
direct map. The uses include kmem_alloc_contig(), vm_page_alloc(),
and the device pager:

kmem_alloc_contig() can now be used to allocate kernel memory with
non-default memory attributes on amd64 and i386.

vm_page_alloc() and the device pager will set the memory attributes
for the real or fictitious page according to the object's default
memory attributes.

Update the various pmap functions on amd64 and i386 that map pages to
incorporate each page's memory attributes in the mapping.

Notes: (1) Inherent to this design are safety features that prevent
the specification of inconsistent memory attributes by different
mappings on amd64 and i386. In addition, the device pager provides a
warning when a device driver creates a fictitious page with memory
attributes that are inconsistent with the real page that the
fictitious page is an alias for. (2) Storing the machine-dependent
memory attributes for amd64 and i386 as a dedicated "int" in "struct
md_page" represents a compromise between space efficiency and the ease
of MFCing these changes to RELENG_7.

In collaboration with: jhb

Approved by: re (kib)


195376 05-Jul-2009 sam

Cleanup ALIGNED_POINTER:
o add to platforms where it was missing (arm, i386, powerpc, sparc64, sun4v)
o define as "1" on amd64 and i386 where there is no restriction
o make the type returned consistent with ALIGN
o remove _ALIGNED_POINTER
o make associated comments consistent

Reviewed by: bde, imp, marcel
Approved by: re (kensmith)


195060 26-Jun-2009 alc

Correct the #endif comment.

Noticed by: jmallett
Approved by: re (kib)


195033 26-Jun-2009 alc

This change is the next step in implementing the cache control functionality
required by video card drivers. Specifically, this change introduces
vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
architectures. In addition, this changes adds a vm_cache_mode_t parameter
to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the
interfaces for allocating mapped kernel memory and physical memory,
respectively, with non-default cache modes.

In collaboration with: jhb


192324 18-May-2009 marcel

Rename ia64_invalidate_icache() to ia64_sync_icache(). We're
not invalidating anything.


191309 20-Apr-2009 rwatson

Don't conditionally define CACHE_LINE_SHIFT, as we anticipate sizing
a fair number of static data structures, making this an unlikely
option to try to change without also changing source code. [1]

Change default cache line size on ia64, sparc64, and sun4v to 128
bytes, as this was what rtld-elf was already using on those
platforms. [2]

Suggested by: bde [1], jhb [2]
MFC after: 2 weeks


191278 19-Apr-2009 rwatson

Add description and cautionary note regarding CACHE_LINE_SIZE.

MFC after: 2 weeks
Suggested by: alc


191276 19-Apr-2009 rwatson

For each architecture, define CACHE_LINE_SHIFT and a derived
CACHE_LINE_SIZE constant. These constants are intended to
over-estimate the cache line size, and be used at compile-time
when a run-time tuning alternative isn't appropriate or
available.

Defaults for all architectures are 64 bytes, except powerpc
where it is 128 bytes (used on G5 systems).

MFC after: 2 weeks
Discussed on: arch@


189926 17-Mar-2009 kib

Add AT_EXECPATH ELF auxinfo entry type. The value's a_ptr is a pointer
to the full path of the image that is being executed.
Increase AT_COUNT.

Remove no longer true comment about types used in Linux ELF binaries,
listed types contain FreeBSD-specific entries.

Reviewed by: kan


188119 04-Feb-2009 jhb

Tweak the ia64 machine check handling code to not register new sysctl nodes
while holding a spin mutex. Instead, it now shoves the machine check
records onto a queue that is later drained to add sysctl nodes for each
record. While a routine to drain the queue is present, it is not currently
called.

Reviewed by: marcel


186212 17-Dec-2008 imp

AT_DEBUG and AT_BRK were OBE like 10 years ago, so retire them.

Reviewed by: peter


185163 22-Nov-2008 marcel

Define mb(), rmb() and wmb() for real.


185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


183439 28-Sep-2008 marius

Remove ipi_all() and ipi_self() as the former hasn't been used at
all to date and the latter also is only used in ia64 and powerpc
code which no longer serves a real purpose after bring-up and just
can be removed as well. Note that architectures like sun4u also
provide no means of implementing IPI'ing a CPU itself natively
in the first place.

Suggested by: jhb
Reviewed by: arch, grehan, jhb


181875 19-Aug-2008 jhb

Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after: 1 week


180354 07-Jul-2008 marcel

Add inline function ia64_fc_i() to abstract inline assembly.
Use the new inline function in ia64_invalidate_icache().
While there, add proper synchronization so that we know
the fc.i instructions have taken effect when we return.


179990 25-Jun-2008 ed

Remove the unused major/minor numbers from iodev and memdev.

Now that st_rdev is being automatically generated by the kernel, there
is no need to define static major/minor numbers for the iodev and
memdev. We still need the minor numbers for the memdev, however, to
distinguish between /dev/mem and /dev/kmem.

Approved by: philip (mentor)


179382 28-May-2008 marcel

Work-around a compiler optimization bug, that broke libthr. Massive
inlining resulted in constant propagation to the extend that cmpval
was known to the compiler to be URWLOCK_WRITE_OWNER (= 0x80000000U).
Unfortunately, instead of zero-extending the unsigned constant, it
was sign-extended. As such, the cmpxchg instruction was comparing
0x0000000080000000LU to 0xffffffff80000000LU and obviously didn't
perform the exchange.
But, since the value returned by cmpxhg equalled cmpval (when zero-
extended), the _thr_rtld_lock_release() function thought the exchange
did happen and as such returned as if having released the lock. This
was not the case. Subsequent locking requests found rw_state non-zero
and the thread in question entered the kernel and block indefinitely.

The work-around is to zero-extend by casting to uint64_t.


178296 18-Apr-2008 marcel

Remove cruft we got from Alpha, which was probably inherited
from NetBSD. I.e. make it more like a FreeBSD header.


177769 30-Mar-2008 marcel

Better implement I-cache invalidation. The previous implementation
was a kluge. This implementation matches the behaviour on powerpc
and sparc64.
While on the subject, make sure to invalidate the I-cache after
loading a kernel module.

MFC after: 2 weeks


177661 27-Mar-2008 jb

When building a kernel module, define MAXCPU the same as SMP so
that modules work with and without SMP.


177642 26-Mar-2008 phk

The "free-lance" timer in the i8254 is only used for the speaker
these days, so de-generalize the acquire_timer/release_timer api
to just deal with speakers.

The new (optional) MD functions are:
timer_spkr_acquire()
timer_spkr_release()
and
timer_spkr_setfreq()

the last of which configures the timer to generate a tone of a given
frequency, in Hz instead of 1/1193182th of seconds.

Drop entirely timer2 on pc98, it is not used anywhere at all.

Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if
they exist, and do nothing otherwise.

Remove prototypes and empty acquire-/release-timer() and sysbeep()
functions from the non-beeping archs.

This eliminate the need for the speaker driver to know about
i8254frequency at all. In theory this makes the speaker driver MI,
contingent on the timer_spkr_*() functions existing but the driver
does not know this yet and still attaches to the ISA bus.

Syscons is more tricky, in one function, sc_tone(), it knows the hz
and things are just fine.

In the other function, sc_bell() it seems to get the period from
the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode
the 1193182 and leave it at that. It's probably not important.

Change a few other sysbeep() uses which obviously knew that the
argument was in terms of i8254 frequency, and leave alone those
that look like people thought sysbeep() took frequency in hertz.

This eliminates the knowledge of i8254_freq from all but the actual
clock.c code and the prof_machdep.c on amd64 and i386, where I think
it would be smart to ask for help from the timecounters anyway [TBD].


177276 16-Mar-2008 pjd

Implement atomic_fetchadd_long() for all architectures and document it.

Reviewed by: attilio, jhb, jeff, kris (as a part of the uidinfo_waitfree.patch)


177091 12-Mar-2008 jeff

Remove kernel support for M:N threading.

While the KSE project was quite successful in bringing threading to
FreeBSD, the M:N approach taken by the kse library was never developed
to its full potential. Backwards compatibility will be provided via
libmap.conf for dynamically linked binaries and static binaries will
be broken.


175959 04-Feb-2008 marcel

Allocate a stack for thread0 and switch to it before calling
mi_startup(). This frees up kstack for static PAL/SAL calls
and double-fault handling.


174938 27-Dec-2007 alc

Add configuration knobs for the superpage reservation system. Initially,
the reservation will only be enabled on amd64.


174405 07-Dec-2007 jkoshy

Add stubs to unbreak LINT.


173970 27-Nov-2007 jasone

Define atomic_readandclear_ptr.


172317 25-Sep-2007 alc

Change the management of cached pages (PQ_CACHE) in two fundamental
ways:

(1) Cached pages are no longer kept in the object's resident page
splay tree and memq. Instead, they are kept in a separate per-object
splay tree of cached pages. However, access to this new per-object
splay tree is synchronized by the _free_ page queues lock, not to be
confused with the heavily contended page queues lock. Consequently, a
cached page can be reclaimed by vm_page_alloc(9) without acquiring the
object's lock or the page queues lock.

This solves a problem independently reported by tegge@ and Isilon.
Specifically, they observed the page daemon consuming a great deal of
CPU time because of pages bouncing back and forth between the cache
queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
this problem turned out to be a deadlock avoidance strategy employed
when selecting a cached page to reclaim in vm_page_select_cache().
However, the root cause was really that reclaiming a cached page
required the acquisition of an object lock while the page queues lock
was already held. Thus, this change addresses the problem at its
root, by eliminating the need to acquire the object's lock.

Moreover, keeping cached pages in the object's primary splay tree and
memq was, in effect, optimizing for the uncommon case. Cached pages
are reclaimed far, far more often than they are reactivated. Instead,
this change makes reclamation cheaper, especially in terms of
synchronization overhead, and reactivation more expensive, because
reactivated pages will have to be reentered into the object's primary
splay tree and memq.

(2) Cached pages are now stored alongside free pages in the physical
memory allocator's buddy queues, increasing the likelihood that large
allocations of contiguous physical memory (i.e., superpages) will
succeed.

Finally, as a result of this change long-standing restrictions on when
and where a cached page can be reclaimed and returned by
vm_page_alloc(9) are eliminated. Specifically, calls to
vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and
return a formerly cached page. Consequently, a call to malloc(9)
specifying M_NOWAIT is less likely to fail.

Discussed with: many over the course of the summer, including jeff@,
Justin Husted @ Isilon, peter@, tegge@
Tested by: an earlier version by kris@
Approved by: re (kensmith)


171739 06-Aug-2007 marcel

Keep interrupts disabled while handling external interrupts.
There's no advantage in allowing nested external interrupts.
In fact, it leads to a potential stack overrun.

While here, put the interrupt vector in the trapframe, so as
to compensate for the 36 cycle latency of reading cr.ivr.

Further simplify assembly code by dealing with ASTs from C.

Approved by: re (blanket)


171735 05-Aug-2007 marcel

In ia64_set_rr(), don't perform data serialization. This allows
us to do the data serializations once after writing multiple
region registers, as is done in pmap_switch(). All existing
calls to ia64_set_rr() are followed with calls to ia64_srlz_d().

Approved by: re (blanket)


171718 04-Aug-2007 marcel

Add ia64_srlz_d() and ia64_srlz_i() functions to aid in serialization.

Approved by: re (blanket)


171664 30-Jul-2007 marcel

Rework the interrupt code and add support for interrupt filtering
(INTR_FILTER). This includes:
o Save a pointer to the sapic structure and IRQ for every vector,
so that we can quickly EOI, mask and unmask the interrupt.
o Add locking to the sapic code now that we can reprogram a
sapic on multiple CPUs at the same time.
o Use u_int for the vector and IRQ. We only have 256 vectors, so
using a 64-bit type for it is rather excessive.
o Properly handle concurrent registration of a handler for the
same vector.

Since vectors have a corresponding priority, we should not map
IRQs to vectors in a linear fashion, but rather pick a vector
that has a priority in line with the interrupt type. This is left
for later. The vector/IRQ interchange has been untangled as much
as possible to make this easier.

Approved by: re (blacket)


171663 30-Jul-2007 marcel

Explicitly map the VHPT on all processors. Previously we were
merely lucky that the VHPT was mapped as a side-effect of
mapping the kernel, but when there's enough physical memory,
this may not at all be the case.

Approved by: re (blanket)


171662 30-Jul-2007 marcel

Add casts to some of the more commonly used pointer-type atomic
operations. We really should be able to make those inline functions,
but this would break its use for sx_locks.

Approved by: re (blanket)


170519 10-Jun-2007 alc

Add the machine-specific definitions for configuring the new physical
memory allocator.

Set the size of phys_avail[] using one of these definitions.

Approved by: re


170507 10-Jun-2007 marcel

Work around a firmware bug in the HP rx2660, where in ACPI an I/O port
is really a memory mapped I/O address. The bug is in the GAS that
describes the address and in particular the SpaceId field. The field
should not say the address is an I/O port when it clearly is not.

With an additional check for the IA64_BUS_SPACE_IO case in the bus
access functions, and the fact that I/O ports pretty much not used
in general on ia64, make the calculation of the I/O port address a
function. This avoids inlining the work-around into every driver,
and also helps reduce overall code bloat.


170473 09-Jun-2007 marcel

Add kdb_cpu_sync_icache(), intended to synchronize instruction
caches with data caches after writing to memory. This typically
is required to make breakpoints work on ia64 and powerpc. For
those architectures the function is implemented.


170291 04-Jun-2007 attilio

Rework the PCPU_* (MD) interface:
- Rename PCPU_LAZY_INC into PCPU_INC
- Add the PCPU_ADD interface which just does an add on the pcpu member
given a specific value.

Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
This is a point that needs some discussions/work in the next days.

Reviewed by: alc, bde
Approved by: jeff (mentor)


170033 27-May-2007 alc

Eliminate an unused definition.


170026 27-May-2007 marcel

Have the processor defer all faults and exceptions for control
speculative loads. This at least makes control speculative loads
work. In the future we should analyze which faults/exceptions
we want to handle rather than defer to avoid having to call the
recovery code when it's not strictly necessary.


169291 05-May-2007 alc

Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory. The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used. Previously,
only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR: 112194


168920 21-Apr-2007 sepotvin

Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc


167814 22-Mar-2007 jkim

Catch up with ACPI-CA 20070320 import.


167429 11-Mar-2007 alc

Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb


166901 23-Feb-2007 piso

o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@


165967 12-Jan-2007 imp

Remove 3rd clause, renumber, ok per email


164392 18-Nov-2006 marcel

Now that printf() needs the PCPU, set it up before we call printf().
Change the pc_pcb field from a pointer to struct pcb to struct pcb
so that sizeof(struct pcb) includes the PCB we use for IPI_STOP.
Statically declare early_pcb so that we don't have to allocate the
PCB for thread0. This way we can setup the PCPU before cninit()
and thus before we use printf().


164250 13-Nov-2006 ru

Fix a comment.


163016 04-Oct-2006 jb

PR:
Submitted by:
Reviewed by:
Approved by:
Obtained from:
MFC after:
Security:
Move the relocation definitions to the common elf header so that DTrace
can use them on one architecture targeted to a different one.

Add the additional ELF types defines in Sun's "Linker and Libraries"
manual.


162954 02-Oct-2006 phk

First part of a little cleanup in the calendar/timezone/RTC handling.

Move relevant variables to <sys/clock.h> and fix #includes as necessary.

Use libkern's much more time- & spamce-efficient BCD routines.


162487 21-Sep-2006 kan

Use __builtin_va_start instead of __builtin_stdarg_start. GCC4 obsoletes
the former and __builtin_va_start was present in all GCC version 3.1 and
later.


161628 25-Aug-2006 alc

Eliminate unused definitions. (They came from NetBSD.)

Discussed with: cognet, grehan, marcel


161223 11-Aug-2006 jhb

First pass at allowing memory to be mapped using cache modes other than
WB (write-back) on x86 via control bits in PTEs and PDEs (including making
use of the PAT MSR). Changes include:
- A new pmap_mapdev_attr() function for amd64 and i386 which takes an
additional parameter (relative to pmap_mapdev()) specifying the cache
mode for this mapping. Note that on amd64 only WB mappings are done with
the direct map, all other modes result in a private mapping.
- pmap_mapdev() on i386 and amd64 now defaults to using UC (uncached)
mappings rather than WB. Previously we relied on the BIOS setting up
MTRR's to enforce memio regions being treated as UC. This might make
hw.cbb_start_memory unnecessary in some cases now for example.
- A new pmap_mapbios()/pmap_unmapbios() API has been added to allow places
that used pmap_mapdev() to map non-device memory (such as ACPI tables)
to do so using WB as before.
- A new pmap_change_attr() function for amd64 and i386 that changes the
caching mode for a range of KVA.

Reviewed by: alc


160105 05-Jul-2006 bde

Fixed FP_R*. fp{get_set}round() apparently never worked on ia64, since
the alpha values were used and are quite different.

Fixed some style bugs by copying from the i386 version where it is better.


160040 29-Jun-2006 marcel

Partial support for branch long emulation. This only emulates the
branch long jump and not the branch long call. Support for that is
forthcoming.


158459 11-May-2006 marcel

Fix braino in previous commit: Don't redefine OID_AUTO to something
not equal to -1, or at all for that matter.


158445 11-May-2006 phk

Clean out sysctl machdep.* related defines.

The cmos clock related stuff should really be in MI code.


157448 03-Apr-2006 marcel

Eliminate HAVE_STOPPEDPCBS. On ia64 the PCPU holds a pointer to the
PCB in which the context of stopped CPUs is stored. To access this
PCB from KDB, we introduce a new define, called KDB_STOPPEDPCB. The
definition, when present, lives in <machine/kdb.h> and abstracts
where MD code saves the context. Define KDB_STOPPEDPCB on i386,
amd64, alpha and sparc64 in accordance to previous code.


154958 28-Jan-2006 marcel

s/DT_IA64_PLT_RESERVE/DT_IA_64_PLT_RESERVE/


154496 18-Jan-2006 marcel

o Add missing relocations.
o Minor white-space fixups.


154491 17-Jan-2006 marcel

s/R_IA64_/R_IA_64_/g as per the ia64 psABI.


154128 09-Jan-2006 imp

By popular demand, move __HAVE_ACPI and __PCI_REROUTE_INTERRUPT into
param.h. Per request, I've placed these just after the
_NO_NAMESPACE_POLLUTION ifndef. I've not renamed anything yet, but
may since we don't need the __.

Submitted by: bde, jhb, scottl, many others.


153955 01-Jan-2006 imp

Define __HAVE_ACPI and/or __PCI_REROUTE_INTERRUPT, as appropriate for
each platform. These will be used in the pci code in preference to
the complicated #ifdefs we have there now.


153666 22-Dec-2005 jhb

Tweak how the MD code calls the fooclock() methods some. Instead of
passing a pointer to an opaque clockframe structure and requiring the
MD code to supply CLKF_FOO() macros to extract needed values out of the
opaque structure, just pass the needed values directly. In practice this
means passing the pair (usermode, pc) to hardclock() and profclock() and
passing the boolean (usermode) to hardclock_cpu() and hardclock_process().
Other details:
- Axe clockframe and CLKF_FOO() macros on all architectures. Basically,
all the archs were taking a trapframe and converting it into a clockframe
one way or another. Now they can just extract the PC and usermode values
directly out of the trapframe and pass it to fooclock().
- Renamed hardclock_process() to hardclock_cpu() as the latter is more
accurate.
- On Alpha, we now run profclock() at hz (profhz == hz) rather than at
the slower stathz.
- On Alpha, for the TurboLaser machines that don't have an 8254
timecounter, call hardclock() directly. This removes an extra
conditional check from every clock interrupt on Alpha on the BSP.
There is probably room for even further pruning here by changing Alpha
to use the simplified timecounter we use on x86 with the lapic timer
since we don't get interrupts from the 8254 on Alpha anyway.
- On x86, clkintr() shouldn't ever be called now unless using_lapic_timer
is false, so add a KASSERT() to that affect and remove a condition
to slightly optimize the non-lapic case.
- Change prototypeof arm_handler_execute() so that it's first arg is a
trapframe pointer rather than a void pointer for clarity.
- Use KCOUNT macro in profclock() to lookup the kernel profiling bucket.

Tested on: alpha, amd64, arm, i386, ia64, sparc64
Reviewed by: bde (mostly)


153179 06-Dec-2005 jhb

- Cleanup whitespace and extra ()s in vtophys() macros.
- Move vtophys() macros next to vtopte() where vtopte() exists to match
comments above vtopte().
- Remove references to the alternate address space in the comment above
vtopte(). amd64 never had the alternate address space, and i386 lost it
prior to PAE support being added.
- s/entires/entries/ in comments.

Reviewed by: alc


153168 06-Dec-2005 ru

Drop _MACHINE_ARCH and _MACHINE defines (not to be confused with
MACHINE_ARCH and MACHINE). Their purpose was to be able to test
in cpp(1), but cpp(1) only understands integer type expressions.
Using such unsupported expressions introduced a number of subtle
bugs, which were discovered by compiling with -Wundef.


150627 27-Sep-2005 jhb

Add a new atomic_fetchadd() primitive that atomically adds a value to a
variable and returns the previous value of the variable.

Tested on: i386, alpha, sparc64, arm (cognet)
Reviewed by: arch@
Submitted by: cognet (arm)
MFC after: 1 week


150008 11-Sep-2005 alc

Eliminate unused definitions.


149777 03-Sep-2005 marcel

o s/vhpt_size/pmap_vhpt_log2size/g
o s/vhpt_base/pmap_vhpt_base/g
o s/vhpt_bucket/pmap_vhpt_bucket/g
o Declare the above in <machine/pmap.h>
o Move the vm.stats.vhpt.* sysctls to machdep.vhpt.*
o Create a tunable machdep.vhpt.log2size, with corresponding sysctl.
The tunable allows the user to specify the VHPT size from the loader.
o Don't keep track of the number of PTEs in the VHPT. Calculate the
population when necessary by iterating the buckets and summing up
the length of the buckets.
o Don't perform the tpa instruction with a bucket lock held. The
instruction can (theoretically) fault and locking is not needed.


149337 20-Aug-2005 stefanf

Move MINSIGSTKSZ from <machine/signal.h> to <machine/_limits.h> and rename
it to __MINSIGSTKSZ. Define MINSIGSTKSZ in <sys/signal.h>.

This is done in order to use MINSIGSTKSZ for the macro PTHREAD_STACK_MIN
in <pthread.h> (soon <limits.h>) without having to include the whole
<sys/signal.h> header.

Discussed with: bde


148807 06-Aug-2005 marcel

Improve SMP support:
o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
uses to look up translations it can't find in the TLB. As such,
the VHPT serves as a level 1 cache (the TLB being a level 0 cache)
and best results are obtained when it's not shared between CPUs.
The collision chain (i.e. the hash bucket) is shared between CPUs,
as all buckets together constitute our collection of PTEs. To
achieve this, the collision chain does not point to the first PTE
in the list anymore, but to a hash bucket head structure. The
head structure contains the pointer to the first PTE in the list,
as well as a mutex to lock the bucket. Thus, each bucket is locked
independently of each other. With at least 1024 buckets in the VHPT,
this provides for sufficiently finei-grained locking to make the
ssolution scalable to large SMP machines.
o Add synchronisation to the lazy FP context switching. We do this
with a seperate per-thread lock. On SMP machines the lazy high FP
context switching without synchronisation caused inconsistent
state, which resulted in a panic. Since the use of the high FP
registers is not common, it's possible that races exist. The ia64
package build has proven to be a good stress test, so this will
get plenty of exercise in the near future.
o Don't use the local ID of the processor we want to send the IPI to
as the argument to ipi_send(). use the struct pcpu pointer instead.
The reason for this is that IPI delivery is unreliable. It has been
observed that sending an IPI to a CPU causes it to receive a stray
external interrupt. As such, we need a way to make the delivery
reliable. The intended solution is to queue requests in the target
CPU's per-CPU structure and use a single IPI to inform the CPU that
there's a new entry in the queue. If that IPI gets lost, the CPU
can check it's queue at any convenient time (such as for each
clock interrupt). This also allows us to send requests to a CPU
without interrupting it, if such would be beneficial.

With these changes SMP is almost working. There are still some random
process crashes and the machine can hang due to having the IPI lost
that deals with the high FP context switch.

The overhead of introducing the hash bucket head structure results
in a performance degradation of about 1% for UP (extra pointer
indirection). This is surprisingly small and is offset by gaining
reasonably/good scalable SMP support.


148805 06-Aug-2005 marcel

Reduce the default MAXCPU from 16 to 4. This is in preparation of
allocating a VHPT per CPU. Since we don't yet know how many CPUs
are actually in the system at the time we need to allocate the
VHPTs, we allocate for MAXCPU processors. This can result in a
lot of wasted space for 2-way machines. So, for now, limit MAXCPU
to something smaller until we have something more dynamic.


148804 06-Aug-2005 marcel

For ia64_ptc_{e,g,ga,l}(), use instruction serialization. We
typically don't know what the TLB described and need to assume
that it affects the fetching of instructions.


148067 15-Jul-2005 jhb

Convert the atomic_ptr() operations over to operating on uintptr_t
variables rather than void * variables. This makes it easier and simpler
to get asm constraints and volatile keywords correct.

MFC after: 3 days
Tested on: i386, alpha, sparc64
Compiled on: ia64, powerpc, amd64
Kernel toolchain busted on: arm


147773 05-Jul-2005 marcel

Enhance ia64_flush_dirty() to handle the case in which td != curthread.
This case is triggered with ptrace(2) and the PT_SETREGS function.
Change the return type of the function to int so that errors can be
passed on to the caller.

Approved by: re (scottl)


147745 02-Jul-2005 marcel

Implement functions calls from within DDB on ia64. On ia64 a function
pointer doesn't point to the first instruction of that function, but
rather to a descriptor. The descriptor has the address of the first
instruction, as well as the value of the global pointer. The symbol
table doesn't know anything about descriptors, so if you lookup the
name of a function you get the address of the first instruction. The
cast from the address, which is the result of the symbol lookup, to a
function pointer as is done in db_fncall is therefore invalid.
Abstract this detail behind the DB_CALL macro. By default DB_CALL is
defined as db_fncall_generic, which yields the old behaviour. On ia64
the macro is defined as db_fncall_ia64, in which a descriptor is
constructed to yield a valid function pointer.

While here, introduce DB_MAXARGS. DB_MAXARGS replaces the existing
(local) MAXARGS. The DB_MAXARGS macro can be defined by platforms to
create a convenient maximum. By default this will be the legacy 10.
On ia64 we define this macro to be 8, for 8 is the maximum number of
arguments that can be passed in registers. This avoids having to
implement spilling of arguments on the memory stack.

Approved by: re (dwhite)


147322 12-Jun-2005 marcel

Define IPI_PREEMPT. Update a nearby comment while I'm here.


147191 09-Jun-2005 jkoshy

MFP4:

- Implement sampling modes and logging support in hwpmc(4).

- Separate MI and MD parts of hwpmc(4) and allow sharing of
PMC implementations across different architectures.
Add support for P4 (EMT64) style PMCs to the amd64 code.

- New pmcstat(8) options: -E (exit time counts) -W (counts
every context switch), -R (print log file).

- pmc(3) API changes, improve our ability to keep ABI compatibility
in the future. Add more 'alias' names for commonly used events.

- bug fixes & documentation.


146734 29-May-2005 nyan

Remove bus_{mem,p}io.h and related code for a micro-optimization on i386
and amd64. The optimization is a trivial on recent machines.

Reviewed by: -arch (imp, marcel, dfr)


146037 10-May-2005 marcel

Don't define _MACHINE_BUS_MEMIO_H_ nor _MACHINE_BUS_PIO_H_.


145389 22-Apr-2005 marcel

Sanity the RTC code:
o Remove the clock interface. Not only does it conflict with the MI
version when device genclock is added to the kernel, it was also
not possible to have more than 1 clock device. This of course would
have been a problem if we actually had more than 1 clock device.
In short: we don't need a clock interface and if we do eventually,
we should be using the MI one.
o Rewrite inittodr() and resettodr() to take into account that:
1) We use the EFI interface directly.
2) time_t is 64-bit and we do need to make sure we can determine
leap years from year 2100 and on. Add a nice explanation of
where leap years come from and why.
3) This rewrite happened in 2005 so any date prior to 1/1/2005
(either M/D/Y or D/M/Y) is bogus. Reprogram the EFI clock with
1/1/2005 in that case.
4) The EFI clock has a high probability of being correct, so
only (further) correct the EFI clock when the file system time
is larger. That should never happen in a time-synchronised world.
Complain when EFI lost 2 days or more.

Replace the copyright notice now that I (pretty much) rewrote all of
this file.


145332 20-Apr-2005 marcel

Add empty header (except of the multiple-inclusion protection) to
get hwpmc(4) to compile on this platform.


145253 18-Apr-2005 imp

Break out the definition of bus_space_{tag,handle}_t and a few other types
into _bus.h to help with name space polution from including all of bus.h.
In a few days, I'll commit changes to the MI code to take advantage of thse
sepration (after I've made sure that these changes don't break anything in
the main tree, I've tested in my trees, but you never know...).

Suggested by: bde (in 2002 or 2003 I think)
Reviewed in principle by: jhb


144637 04-Apr-2005 jhb

Divorce critical sections from spinlocks. Critical sections as denoted by
critical_enter() and critical_exit() are now solely a mechanism for
deferring kernel preemptions. They no longer have any affect on
interrupts. This means that standalone critical sections are now very
cheap as they are simply unlocked integer increments and decrements for the
common case.

Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter()
and spinlock_exit(). This KPI is responsible for providing whatever MD
guarantees are needed to ensure that a thread holding a spin lock won't
be preempted by any other code that will try to lock the same lock. For
now all archs continue to block interrupts in a "spinlock section" as they
did formerly in all critical sections. Note that I've also taken this
opportunity to push a few things into MD code rather than MI. For example,
critical_fork_exit() no longer exists. Instead, MD code ensures that new
threads have the correct state when they are created. Also, we no longer
try to fixup the idlethreads for APs in MI code. Instead, each arch sets
the initial curthread and adjusts the state of the idle thread it borrows
in order to perform the initial context switch.

This change is largely a big NOP, but the cleaner separation it provides
will allow for more efficient alternative locking schemes in other parts
of the kernel (bare critical sections rather than per-CPU spin mutexes
for per-CPU data for example).

Reviewed by: grehan, cognet, arch@, others
Tested on: i386, alpha, sparc64, powerpc, arm, possibly more


143598 14-Mar-2005 scottl

Refactor the bus_dma header files so that the interface is described in
sys/bus_dma.h instead of being copied in every single arch. This slightly
reorders a flag that was specific to AXP and thus changes the ABI there.
The interface still relies on bus_space definitions found in <machine/bus.h>
so it cannot be included on its own yet, but that will be fixed at a later
date. Add an MD <machine/bus_dma.h> for ever arch for consistency and to
allow for future MD augmentation of the API. sparc64 makes heavy use of
this right now due to its different bus_dma implemenation.


143063 02-Mar-2005 joerg

netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild. Extension to other compilers is supposed
to be possible, of course.

Submitted by: netchild
Reviewed by: various developers on arch@, some time ago


142107 19-Feb-2005 ru

Use a common multi-inclusion protection, and add such a
protection to alpha/include/exec.h.


141085 31-Jan-2005 imp

nit in /*-


140311 15-Jan-2005 scottl

Add bus_dmamap_load_mbuf_sg() to ia64


139790 06-Jan-2005 imp

/* -> /*- for copyright notices, minor format tweaks as necessary


139554 02-Jan-2005 marcel

Further enhance the handling of misaligned loads and stores:
o implement double-extended and single precision loads and stores,
o implement double precision stores,
o replace the machdep.unaligned_print sysctl with debug.unaligned_print
and change the default value to 0,
o replace the machdep.unaligned_sigbus sysctl with debug.unaligned_test,
o Remmove the fillfd() function. The function is trvial enough for
inline assembly.

The debug.unaligned_test sysctl is used to test the emulation of
misaligned loads and stores. When PSR.ac is 0, the CPU will handle
misaligned memory accesses itselfi and we don't get an exception
for it. When PSR.ac is 1, the process needs to be signalled and we
should not emulate. The sysctl takes effect when PSR.ac is 1 and
tells us that we should emulate and not send a signal.

PR: 72268
MFC after: 1 week


138674 11-Dec-2004 marcel

Use primitive types to avoid creating an artificial header dependency:
o s/u_long/unsigned long/
o s/uint32_t/unsigned int/g
o s/uint64_t/unsigned long/g

Trigger case: multimedia/mpeg2codec


138543 08-Dec-2004 marcel

Don't obtain the HCDP address directly from the bootinfo structure.
Use a function to keep the details at arms length from uart(4).


138253 01-Dec-2004 marcel

Change gdb_cpu_setreg() to not take the value to which to set the
specified register, but a pointer to the in-memory representation of
that value. The reason for this is twofold:
1. Not all registers can be represented by a register_t. In particular
FP registers fall in that category. Passing the new register value
by reference instead of by value makes this point moot.
2. When we receive a G or P packet, both are for writing a register,
the packet will have the register value in target-byte order and
in the memory representation (modulo the fact that bytes are sent
as 2 printable hexadecimal numbers of course). We only need to
decode the packet to have a pointer to the register value.

This change fixes the bug of extracting the register value of the P
packet as a hexadecimal number instead of as a bit array. The quick
(and dirty) fix to bswap the register value in gdb_cpu_setreg() as
it has been added on i386 and amd64 can therefore be removed and has
in fact been that.

Tested on: alpha, amd64, i386, ia64, sparc64


138145 28-Nov-2004 marcel

Whitespace fixes:
o Remove a bogus comment that relates to alpha.
o s/u_int64_t/uint64_t/g
o Add bi_spare2 to make the internal padding explicit.
o Move BOOTINFO_MAGIC after the field it applies to.


137978 21-Nov-2004 marcel

Remove struct ia64_itir and use a plain old uint64_t instead.


137914 20-Nov-2004 das

Remove UAREA_PAGES.

Reviewed by: arch@


136366 11-Oct-2004 njl

Move the code for halting the CPU (acpi_cpu_c1) into machdep files.
This removes the last MD portion of acpi_cpu.c.

MFC after: 2 weeks


135783 25-Sep-2004 marcel

Move the IA-32 trap handling from trap() to ia32_trap(). Move the
ia32_syscall() function along with it to ia32_trap.c. When COMPAT_IA32
is not defined, we'll raise SIGEMT instead.


135590 23-Sep-2004 marcel

Redefine a PTE as a 64-bit integral type instead of a struct of
bit-fields. Unify the PTE defines accordingly and update all
uses.


135583 22-Sep-2004 marcel

For the atomic_{add|clear|set|subtract} family of inlines, return the
old or previous value instead of void. This is not as is documented
in atomic(9), but is API (and ABI) compatible and simply makes sense.
This feature will primarily be used for atomic PTE updates in PMAP/ng.


135581 22-Sep-2004 marcel

MFp4: various style fixes, including
o s/u_int/uint/g
o s/#define<sp>/#define<tab>/g
o indent macro definitions
o Improve vertical spacing
o Globally align line continuation character


135453 19-Sep-2004 marcel

MFp4:
Completely remove the remaining EFI includes and add our own (type)
definitions instead. While here, abstract more of the internals by
providing interface functions.


135405 17-Sep-2004 marcel

Provide our own FPSWA definitions, instead of depending on the Intel
EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers
conflict with the Intel ACPI headers (duplicate type definitions), so
are being phased out in the kernel.


134398 27-Aug-2004 marcel

Move the kernel-specific logic to adjust frompc from MI to MD. For
these two reasons:
1. On ia64 a function pointer does not hold the address of the first
instruction of a functions implementation. It holds the address
of a function descriptor. Hence the user(), btrap(), eintr() and
bintr() prototypes are wrong for getting the actual code address.
2. The logic forces interrupt, trap and exception entry points to
be layed-out contiguously. This can not be achieved on ia64 and is
generally just bad programming.

The MCOUNT_FROMPC_USER macro is used to set the frompc argument to
some kernel address which represents any frompc that falls outside
the kernel text range. The macro can expand to ~0U to bail out in
that case.
The MCOUNT_FROMPC_INTR macro is used to set the frompc argument to
some kernel address to represent a call to a trap or interrupt
handler. This to avoid that the trap or interrupt handler appear to
be called from everywhere in the call graph. The macro can expand
to ~0U to prevent adjusting frompc. Note that the argument is selfpc,
not frompc.

This commit defines the macros on all architectures equivalently to
the original code in sys/libkern/mcount.c. People can take it from
here...

Compile-tested on: alpha, amd64, i386, ia64 and sparc64
Boot-tested on: i386


134289 25-Aug-2004 marcel

Get a step closer to profiling the kernel by fixing the definitions
of the MCOUNT_ENTER, MCOUNT_EXIT and MCOUNT_DECL defines. Also make
sure there's a prototype of _MCOUNT_DECL(). This allows us to build
a kernel. There are still unresolved symbols, so linking fails.


134287 25-Aug-2004 marcel

Make profiling actually work. The gcc compiler emits a call to the
_mcount() stub when profiling is enabled. Emit this code sequence
for assembly routines as welli (MCOUNT definition in <machine/asm.h>.
We do not pass the GOT entry however as the 4th argument, because it's
not used. The _mcount() stub calls __mcount(), which does the actual
work. Define _MCOUNT_DECL to define __mcount. We do not have an
implementation of mcount(), so we define MCOUNT as empty, but have a
weak alias to _mcount() in _mcount.S.
Note that the _mcount() stub in the kernel is slightly different from
the stub in userland. This is because we do not have to worry about
nested routines in the kernel.


133879 16-Aug-2004 marcel

As I said: the previous commit was untested... Remove an #endif which
should have ceased to exist when its corresponding #if was removed.


133878 16-Aug-2004 marcel

Catch up with the drive-by renaming of IA32 to COMPAT_IA32. It must
have been rush hour...

While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on
amd64. Consequently, it's unsafe to use the option in pcb.h. We now
unconditionally have the ia32 specific registers in the PCB.

This commit is untested.


133464 11-Aug-2004 marcel

Add __elfN(dump_thread). This function is called from __elfN(coredump)
to allow dumping per-thread machine specific notes. On ia64 we use this
function to flush the dirty registers onto the backingstore before we
write out the PRSTATUS notes.

Tested on: alpha, amd64, i386, ia64 & sparc64
Not tested on: arm, powerpc


133405 09-Aug-2004 marcel

Better preserve the original protection for the mappings we maintain.
The hardware always gives read access for privilege level 0, which
means that we cannot use the hardware access rights and privilege
level in the PTE to test whether there's a change in protection. So,
we save the original vm_prot_t in the PTE as well.
Add pmap_pte_prot() to set the proper access rights and privilege
level on the PTE given a pmap and the requested protection.

The above allows us to compare the protection in pmap_extract_and_hold()
which was missing. While in pmap_extract_and_hold(), add pmap locking.

While here, clean up most (i.e. all but one) PTE macros we inherited
from alpha. They were either unused, used inconsistently, badly named
or simply weren't beneficial. We save the wired and managed state of
the PTE in distinct (bit) fields.

While in pte.h, s/u_int64_t/uint64_t/g

pmap locking obtained from: alc@
feedback & review by: alc@


133285 07-Aug-2004 marcel

De-inline gdb_cpu_signal() because we need to convert the trap vectors
related to breakpoints and single stepping into SIGTRAP so gdb(1) knows
why the remote target has stopped. In particular, gdb(1) needs to know
if the reason is something of its own doing.


133084 03-Aug-2004 mux

Instead of calling ia32_pause() conditionally on __i386__ or __amd64__
being defined, define and use a new MD macro, cpu_spinwait(). It only
expands to something on i386 and amd64, so the compiled code should be
identical.

Name of the macro found by: jhb
Reviewed by: jhb


132965 01-Aug-2004 markm

Remove extraneous ';'.


132956 01-Aug-2004 markm

Break out the MI part of the /dev/[k]mem and /dev/io drivers into
their own directory and module, leaving the MD parts in the MD
area (the MD parts _are_ part of the modules). /dev/mem and /dev/io
are now loadable modules, thus taking us one step further towards
a kernel created entirely out of modules. Of course, there is nothing
preventing the kernel from having these statically compiled.


132871 30-Jul-2004 marcel

Fix -O builds with gcc 3.4 by defining ffs as __builtin_ffs instead of
creating an inline function that just calls __builtin_ffs.


132700 27-Jul-2004 rwatson

Pass a thread argument into cpu_critical_{enter,exit}() rather than
dereference curthread. It is called only from critical_{enter,exit}(),
which already dereferences curthread. This doesn't seem to affect SMP
performance in my benchmarks, but improves MySQL transaction throughput
by about 1% on UP on my Xeon.

Head nodding: jhb, bmilekic


132383 19-Jul-2004 das

Make FLT_ROUNDS correctly reflect the dynamic rounding mode.


132378 19-Jul-2004 alc

Add partial pmap locking.

Tested by: marcel@


132235 16-Jul-2004 alc

Remove unused fields from the pmap.


131952 10-Jul-2004 marcel

Mega update for the KDB framework: turn DDB into a KDB backend.
Most of the changes are a direct result of adding thread awareness.
Typically, DDB_REGS is gone. All registers are taken from the
trapframe and backtraces use the PCB based contexts. DDB_REGS was
defined to be a trapframe on all platforms anyway.
Thread awareness introduces the following new commands:
thread X switch to thread X (where X is the TID),
show threads list all threads.

The backtrace code has been made more flexible so that one can
create backtraces for any thread by giving the thread ID as an
argument to trace.

With this change, ia64 has support for breakpoints.


131945 10-Jul-2004 marcel

Update for the KDB framework:
o ksym_start and ksym_end changed type to vm_offset_t.
o Make debugging support conditional upon KDB instead of DDB.
o Call kdb_enter() instead of breakpoint().
o Remove implementation of Debugger().
o Call kdb_trap() according to the new world order.

unwinder:
o s/db_active/kdb_active/g
o Various s/ddb/kdb/g
o Add support for unwinding from the PCB as well as the trapframe.
Abuse a spare field in the special register set to flag whether
the PCB was actually constructed from a trapframe so that we can
make the necessary adjustments.

md_var.h:
o Add RSE convenience macros.
o Add ia64_bsp_adjust() to add or subtract from BSP while taking
NaT collections into account.


131905 10-Jul-2004 marcel

Implement makectx(). The makectx() function is used by KDB to create
a PCB from a trapframe for purposes of unwinding the stack. The PCB
is used as the thread context and all but the thread that entered the
debugger has a valid PCB.
This function can also be used to create a context for the threads
running on the CPUs that have been stopped when the debugger got
entered. This however is not done at the time of this commit.


131903 10-Jul-2004 marcel

Introduce the KDB debugger frontend. The frontend provides a framework
in which multiple (presumably different) debugger backends can be
configured and which provides basic services to those backends.
Besides providing services to backends, it also serves as the single
point of contact for any and all code that wants to make use of the
debugger functions, such as entering the debugger or handling of the
alternate break sequence. For this purpose, the frontend has been
made non-optional.
All debugger requests are forwarded or handed over to the current
backend, if applicable. Selection of the current backend is done by
the debug.kdb.current sysctl. A list of configured backends can be
obtained with the debug.kdb.available sysctl. One can enter the
debugger by writing to the debug.kdb.enter sysctl.


131899 10-Jul-2004 marcel

Introduce the GDB debugger backend for the new KDB framework. The
backend improves over the old GDB support in the following ways:
o Unified implementation with minimal MD code.
o A simple interface for devices to register themselves as debug
ports, ala consoles.
o Compression by using run-length encoding.
o Implements GDB threading support.


130965 23-Jun-2004 alc

- Remove unused definitions.
- Move a definition inside the scope of a #ifdef _KERNEL.


130764 20-Jun-2004 bde

Backed out previous commit. Blind substitution of dev_t by `struct cdev *'
was just wrong here because the dev_t's are user dev_t's.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


129444 19-May-2004 bde

Moved most of the "MI" definitions and declarations from <machine/profile.h>
to <sys/gmon.h>. Cleaned them up a little by not attempting to ifdef
for incomplete and out of date support for GUPROF in userland, as in
the sparc64 version.


129393 18-May-2004 stefanf

<stdint.h> should define WINT_M{AX,IN} independent from whether WCHAR_MIN is
defined. Otherwise first including <wchar.h> and then <stdint.h> leads to no
WINT_M{AX,IN} at all.

PR: 64956
Approved by: das (mentor)


128979 05-May-2004 njl

Add an MI implementation of the ACPI global lock routines and retire the
individual asm versions. The global lock is shared between the BIOS and
OS and thus cannot use our mutexes. It is defined in section 5.2.9.1 of
the ACPI specification.

Reviewed by: marcel, bde, jhb


128629 25-Apr-2004 das

Hide FLT_EVAL_METHOD and DECIMAL_DIG in pre-C99 compilation
environments.

PR: 63935
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>


128393 18-Apr-2004 alc

MFamd64
Simplify the sf_buf implementation. In short, make it a veneer
over the direct virtual-to-physical mapping.


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127922 06-Apr-2004 alc

Remove avail_end. As of yesterday, it is unused.


127875 05-Apr-2004 alc

Remove avail_start on those platforms that no longer use it. (Only amd64
does anything with it beyond simple initialization.)


127253 21-Mar-2004 marcel

In breakpoint(), use a different immediate to make sure we can
distinguish between debugger inserted breakpoints and fixed
breakpoints. While here, make sure the break instruction never
ends up in the last slot of a bundle by forcing it to be an
M-unit instruction. This makes it easier for use to skip over
it.


127239 20-Mar-2004 marcel

Introduce the cpumask_t type. The purpose of the type is to create a
level of abstraction for any and all CPU mask and CPU bitmap variables
so that platforms have the ability to break free from the hard limit
of 32 CPUs, simply because we don't have more bits in an u_int. Note
that the type is not supposed to solve massive parallelism, where
the number of CPUs can be larger than the width of the widest integral
type. As such, cpumask_t is not supposed to be a compound type. If
such would be necessary in the future, we can deal with the issues
then and there. For now, it can be assumed that the type is integral
and unsigned.

With this commit, all MD definitions start off as u_int. This allows
us to phase-in cpumask_t at our leasure without breaking anything.
Once cpumask_t is used consistently, platforms can switch to wider
(or smaller) types if such would be beneficial (or not; whatever :-)

Compile-tested on: i386


127219 20-Mar-2004 marcel

Replace uint64_t with unsigned long in struct dbreg.


126715 07-Mar-2004 alc

Remove unused declarations. (Some time ago, these variables became fields
of vm/vm.h's struct kva_md_info.)


126649 05-Mar-2004 le

Fix syntax errors and wrong function prototypes in several MD header
files when using non-GNUC compilers.

PR: kern/58515
Submitted by: Stefan Farfeleder <stefan@fafoe.narf.at>
Approved by: grog (mentor), obrien


126106 22-Feb-2004 marcel

Do not pre-map the I/O port space. On the Intel Tiger 4 this conflicts
with a memory mapped I/O range that's immediately before it and is
not 256MB aligned. As a result, when an address is accessed in the
memory mapped range and a direct mapping is added for it, it overlaps
with the pre-mapped I/O port space and causes a machine check.

Based on a patch from: arun@


124478 13-Jan-2004 des

Whitespace nit.


124296 09-Jan-2004 nectar

Provide sysarch(2) prototypes in the MD sysarch.h headers. While I'm
at it, use the ANSI C generic pointer type for the second argument,
thus matching the documentation.

Remove the now extraneous (and now conflicting) function declarations
in various libc sources. Remove now unnecessary casts.

Reviewed by: bde


123791 24-Dec-2003 peter

GC the unused <machine/kse.h> file.


123420 10-Dec-2003 peter

Fix last second typo.


123419 10-Dec-2003 peter

Use gcc's superior ffs() builtin.


123418 10-Dec-2003 peter

Use ffs(x) == popcnt(x ^ (x - 1)) to implement 64 bit ffsl(). gcc's
ffs() builtin uses this already but truncates the upper 32 bits.


123288 08-Dec-2003 obrien

Move the bktr(4) <arch>/include/ioctl_{bt848,meteor}.h files to dev/bktr
as these ioctl's aren't MD. This also means they are installed in
/usr/include/dev/bktr now. Also provide compatability wrappers for
where these headers lived in 4.x.


123255 07-Dec-2003 marcel

Simplify the contexts created by the kernel and remove the related
flags. We now create asynchronous contexts or syscall contexts only.
Syscall contexts differ from the minimal ABI dictated contexts by
having the scratch registers saved and restored because that's where
we keep the syscall arguments and syscall return values.
Since this change affects KSE, have it use kse_switchin(2) for the
"new" syscall context.


123223 07-Dec-2003 imp

Ooops. These are still used by the bktr driver. David O'Brien has
plans for dealing, but I'll let him deal.

Pointy hat to: imp@


123213 07-Dec-2003 imp

Remote meteor driver. It hasn't compiled in over 3 years. If someone
makes it compile again, and can test it, we can restore the driver to
the tree.


122829 17-Nov-2003 bde

Fixed a pedantic syntax error (a stray semicolon at the end of
PCPU_MD_FIELDS).


122780 16-Nov-2003 alc

- Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
- Move the sf_buf API to its own header file; make struct sf_buf's
definition machine dependent. In this commit, we remove an
unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
Ultimately, we may eliminate struct sf_buf on those architecures
except as an opaque pointer that references a vm page.


122763 15-Nov-2003 njl

Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to
dereference the softc.


122525 12-Nov-2003 marcel

Remove ia64_highfp_load() now that it's unused.


122368 09-Nov-2003 marcel

Use get_mcontext() to construct the signal context in sendsig() and
use set_mcontext() to restore the context in sigreturn(). Since we
put the syscall number and the syscall arguments in the trapframe
(we don't save the scratch registers for syscalls, which allows us
to reuse the space to our advantage), create a MD specific flag so
that we save the scratch registers even for syscalls. We would not
be able to restart a syscall otherwise.

The signal trampoline does not need to flush the regiters anymore,
because get_mcontext() already handles that. In fact, if we set up
the context correctly, we do not need to have a trampoline at all.
This change however only minimally changes the trampoline code. In
follow-up commits this can be further optimized.

Note that normally we preserve cfm and iip in the trapframe created
by the EPC syscall path when we restore a context in set_mcontext()
because those fields are not normally set for a synchronuous context.
The kernel puts the return address and frame info of the syscall
stub in there. By preserving these fields we hide this detail from
userland which allows us to use setcontext(2) for user created
contexts. However, sigreturn() is commonly called from the trampoline,
which means that if we preserve cfm and iip in all cases, we would
return to the trampoline after the sigreturn(), which means we hit
the safety net: we call exit(2). So, we do not preserve cfm and iip
when we have a synchronous context that also has scratch registers
(the uncommon context created by sendsig() only), under the assumption
that if such a context is created in userland, something special is
going on and the use of cfm and iip is then just another quirk. All
this is invisible in the common case.


122266 07-Nov-2003 scottl

Document the lockfunc and lockfuncarg arguments to bus_dma_tag_create() in
the busdma headers.


121926 03-Nov-2003 marcel

Add a bogus definition of __va_list for use by lint. Make it visible
only when lint is defined to protect builds with non-GNU compilers.


121892 02-Nov-2003 marcel

Remove headers copied from i386 and either useless or wrong on ia64.
An example of useless is bios.h. An example of wrong is msdos.h (due
to the use of long for 32-bit fields).

display.h cannot be removed because it's used by syscons. That header
however has no platform dependency and shouldn't really be here.

Removal if these headers may cause build failures in the ports tree.
It's the ports that need fixing in that case.

Tested with: buildworld, LINT


121622 27-Oct-2003 marcel

The previous commit removed both clause 3 and clause 4 from the UCB
license. Only clause 3 has been revoked. Restore the fourth clause
as clause 3.

Pointed out by: das@

Remove my name as a copyright holder since I don't use a BSD license
compatible or comparable to the UCB license. I choose not to add a
complete second license for my work for aesthetic reasons, nor to
replace the UCB license on grounds of rewriting more than 90% of the
source files. The rewrite can also be seen as an enhancement and since
the files were practically empty, it's rather trivial to have changed
90% of the files.


121600 27-Oct-2003 marcel

Add support for userland to access I/O port space. This is primarily
added for XFree86. There are 2 reasons for doing this with sysarch():
1. The memory mapped I/O space is not at a fixed physical address. An
application has to use some interface to get the base address. It
gets worse if the machine has multiple memory mapped I/O spaces.
2. Access to the memory mapped I/O space needs to happen through a
translation that is flagged as uncachable. There's no interface
that allows a process to do uncached memory I/O, other than though
/dev/mem (possibly).

So, until we either disallow direct access to I/O or bus space from
userland or have a better way of doing this, sysarch() has the least
negative impact on existing interfaces.


121459 24-Oct-2003 marcel

Remove unused header. See also ia64/disasm/disasm.h.


121416 23-Oct-2003 marcel

Cleanup. Remove the md_flags for threads. It's not used. The flags
we had were bogus.
While here, reassign the copyright to the Project. There's nothing
in this files that originates from NetBSD, especially now that the
FreeBSD/alpha bits have been removed, but even then the amount of
inherited code that we actually used was nil.


121411 23-Oct-2003 marcel

Add prototypes for spillfd() and unaligned_fixup().


121294 21-Oct-2003 marcel

Remove md_bspstore from the MD fields of struct thread. Now that
the backing store is at a fixed address, there's no need for a
per-thread variable.


121268 20-Oct-2003 marcel

Put the RSE backing store at a fixed address. This change is triggered
by libguile that needs to know the base of the RSE backing store. We
currently do not export the fixed address to userland by means of a
sysctl so user code needs to hardcode it for now. This will be revisited
later.

The RSE backing store is now at the bottom of region 4. The memory stack
is at the top of region 4. This means that the whole region is usable
for the stacks, giving a 61-bit stack space.

Port: lang/guile (depended of x11/gnome2)


120831 06-Oct-2003 bms

Move pmap_resident_count() from the MD pmap.h to the MI pmap.h.
Add a definition of pmap_wired_count().
Add a definition of vmspace_wired_count().

Reviewed by: truckman
Discussed with: peter


120540 28-Sep-2003 marcel

Drop any and all support for varargs. There's no history to worry
about because we're still tier 2 and our current compiler, as well
as future compilers will not support varargs. This is mostly a
no-op in practice, because <sys/varargs.h> should already cause
compile failures.


120422 25-Sep-2003 peter

Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bit
systems where the data/stack/etc limits are too big for a 32 bit process.

Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c.

Supply an ia32_fixlimits function. Export the clip/default values to
sysctl under the compat.ia32 heirarchy.

Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max
value rather than the sysctl tweakable variable. This allows mmap to
place mappings at sensible locations when limits have been reduced.

Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same
method as mmap(0, ...) now does.

Note that we cannot remove all references to the sysctl tweakable
maxdsiz etc variables because /etc/login.conf specifies a datasize
of 'unlimited'. And that causes exec etc to fail since it can no
longer find space to mmap things.


120375 23-Sep-2003 nyan

Implement the bus_space_map() function to allocate resources and initialize
a bus_handle, but currently it does only initializing a bus_handle.


120222 19-Sep-2003 marcel

Change TRAPF_USERMODE and CLOCKF_USERMODE to not test for CPL == 3,
but for CPL != 0. For some reason yet unknown it is possible for the
CPL to be 2. This would previously be counted as kernel mode, which
resulted in nasty panics. By changing the test it is now treated as
user mode, which is more correct. We still need to figure out how it
is possible that the privilege level can be 2 (or 1 for that matter),
because it's not used by us. We only use 3 (user mode) and 0 (kernel
mode).


119970 10-Sep-2003 marcel

Rewrite the SAPIC initialization to always program the RTEs with what
we think is the correct trigger mode and polarity. This allows us to
implement BUS_CONFIG_INTR() as an update of the RTE in question.
Consequently, we can trust the RTE when we enable an interrupt and
avoids that we need to know about the trigger mode and polarity at
that time.


119906 09-Sep-2003 marcel

Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The
latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn
determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.

The constants are used instead of the literal hardcoding (in its
various forms) of the size of the direct mappings created in region
6 and 7. The default and probably only workable size is still 256M,
but for kicks we use 128M for LINT.


119787 05-Sep-2003 marcel

Fix a place where I forgot to change the code that checks whether
we return to kernel or userland. This triggered a panic in a KSE
application when TDF_USTATCLOCK was set in the case userland was
interrupted, but we never called ast() on our way out. As such,
we called ast() at some other time. Unfortunately, TDF_USTATCLOCK
handling assumes running in the interrupt thread. This was not
the case anymore.

To avoid making the same mistake later, interrupt() now returns
to its caller whether we interrupted userland or not. This avoids
that we have to duplicate the check in assembly, where it's bound
to fall off the scope. Now we simply check the return value and
call ast() if appropriate.

Run into this: davidxu


119347 23-Aug-2003 marcel

Remove PAGE_SIZE_4K, PAGE_SIZE_8K and PAGE_SIZE_16K and replace them
with LOG2_PAGE_SIZE. A single option is better to LINT than multiple
mutual exclusive ones.


119159 20-Aug-2003 marcel

Undo the mistake made in revision 1.77 of trap.c and which was the
ultimate trigger for the follow-up fixes in revisions 1.78, 1.80,
1.81 and 1.82 of trap.c. I was simply too pre-occupied with the
gateway page and how it blurs kernel space with user space and
vice versa that I couldn't see that it was all a load of bollocks.

It's not the IP address that matters, it's the privilege level that
counts. We never run in user space with lifted permissions and we
sure can not run in kernel space without it. Sure, the gateway page
is the exception, but not if you look at the privilege level. It's
user space if you run with user permissions and kernel space otherwise.

So, we're back to looking at the privilege level like it should be.
There's no other way.

Pointy hat: marcel


118990 16-Aug-2003 marcel

Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI
prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to
cpu.h. This affects db_command.c and kern_shutdown.c.

ia64: move all MD prototypes from cpu.h to md_var.h. This affects
madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory().
It's not used (vm_machdep.c).

alpha: the MD prototypes have been left in cpu.h with a comment
that they should be there. Moving them is left for later. It was
expected that the impact would be significant enough to be done in
a seperate commit.

powerpc: MD prototypes left in cpu.h. Comment added.

Suggested by: bde
Tested with: make universe (pc98 incomplete)


118934 15-Aug-2003 marcel

Add an instruction group break after the move to application register
and the move to control register to avoid dependency violations when
these functions are used. Note that explicit data and instruction
serialization also need to be in a subsequent instruction group.
This too requires that we have an igrp break here.


118933 15-Aug-2003 marcel

Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and
PT_SETKSTACK. These requests allow the tracing process to access the
dirty registers of the traced process that are on the kernel stack.

Note that there's currently no way to access the rnat register for
those dirty registers that are not (yet) covered by a nat collection
point. The interface for this is still being slept on.

Also note that implied by these requests is the division of work:
The tracing process has to keep track of where registers are spilled
and is responsible to figure out where the NaT bit of the stacked
registers are at any time during the execution of the traced process.
The kernel provides the interfaces but will not abstract the fact
that the register stack can be split. This model does not follow
the approach taken in Linux where PT_PEEK and PT_POKE deals with
this automagically.


118848 12-Aug-2003 imp

Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's
copyrighted files.

Approved by: Matt Dillon


118811 12-Aug-2003 marcel

Cleanup prototypes in cpu.h, including fswintrberr and any references
to it. Sort the remaining prototypes in cpu.h.

No functional change.


118798 11-Aug-2003 marcel

Cleanup and style(9) fixes. No functional change.


118607 07-Aug-2003 jhb

Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort. In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by: bde (kern_ktrace.c)


118590 07-Aug-2003 marcel

Better define the flags in the mcontext_t and properly set the flags
when we create contexts. The meaning of the flags are documented in
<machine/ucontext.h>. I only list them here to help browsing the
commit logs:
_MC_FLAGS_ASYNC_CONTEXT
_MC_FLAGS_HIGHFP_VALID
_MC_FLAGS_KSE_SET_MBOX
_MC_FLAGS_RETURN_VALID
_MC_FLAGS_SCRATCH_VALID

Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)


118443 04-Aug-2003 jhb

- Since td_critnest is now initialized in MI code, it doesn't have to be
set in cpu_critical_fork_exit() anymore.
- As far as I can tell, cpu_thread_link() has never been used, not even
when it was originally added, so remove it.


118414 04-Aug-2003 marcel

Cleanup the clock code. This includes:
o Remove alpha specific timer code (mc146818A) and compiled-out
calibration of said timer.
o Remove i386 inherited timer code (i8253) and related acquire and
release functions.
o Move sysbeep() from clock.c to machdep.c and have it return
ENODEV. Console beeps should be implemented using ACPI or if no
such device is described, using the sound driver.
o Move the sysctls related to adjkerntz, disable_rtc_set and
wall_cmos_clock from machdep.c to clock.c, where the variables
are.
o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't
bother faking a stathz that's 1/8 of that. Keep it simple: hz
defaults to HZ and stathz equals hz. This is also how it's done
for sparc64.
o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj)
to calculate ITC skew and corrections. On average, we adjust the
ITC match register once every ~1500 interrupts for a duration of
2 consequtive interruprs. This is to correct the non-deterministic
behaviour of the ITC interrupt (there's a delay between the match
and the raising of the interrupt).
o Add 4 debugging sysctls to monitor clock behaviour. Those are
debug.clock_adjust_edges, debug.clock_adjust_excess,
debug.clock_adjust_lost and debug.clock_adjust_ticks. The first
counts the individual adjustment cycles (when the skew first
crosses the threshold), the second counts the number of times the
adjustment was excessive (any non-zero value is to be considered
a bug), the third counts lost clock interrupts and the last counts
the number of interrupts for which we applied an adjustment
(debug.clock_adjust_ticks / debug.clock_adjust_edges gives the
avarage duration of an individual adjustment -- should be ~2).

While here, remove some nearby (trivial) left-overs from alpha and
other cleanups.


118382 03-Aug-2003 obrien

Style sync.


118333 02-Aug-2003 marcel

Don't use uint64_t. Use unsigned long instead. One is supposed to use
ucontext_t without having to include headers other than <ucontext.h>.


118239 31-Jul-2003 peter

Deal with 'options KSTACK_PAGES' being a global option.


118081 27-Jul-2003 mux

- Introduce a new busdma flag BUS_DMA_ZERO to request for zero'ed
memory in bus_dmamem_alloc(). This is possible now that
contigmalloc() supports the M_ZERO flag.
- Remove the locking of Giant around calls to contigmalloc() since
contigmalloc() now grabs Giant itself.


118050 26-Jul-2003 marcel

Remove prototype of ia64_pa_access(). The function has been moved to
mem.c where it's been made static.


118048 26-Jul-2003 marcel

Avoid using __aligned(16). Instead define the jmp_buf in terms of
long doubles. This gives us 16-byte alignment. Add a CTASSERT for
the size of the jmp_buf to detect ABI breakages.


118046 26-Jul-2003 marcel

Unbreak ia64 builds now -Werror is enabled again. Avoid obsolete
memory operand construct.


118034 25-Jul-2003 marcel

Revert previous commit. We don't use setjmp()/longjmp() for context
switching anymore, so there's no need to save and restore GP. This
change breaks threaded applications linked against libc_r. Pull the
tier 2 card again: relink. This will link against libthr instead.


117999 25-Jul-2003 marcel

Remove __aligned(16) from the definition of struct _ia64_fpreg. It's
a non-standard construct. Instead, redefine struct _ia64_fpreg as a
union and put a long double in it. On ia64 and for LP64, this is
defined by the ABI to have 16-byte alignment. For ILP32 a long double
has 4-byte alignment, but we don't support ILP32.

Note that the in-memory image of a long double does not match the in-
memory image of spilled FP registers. This means that one cannot use
the fpr_flt field to interpet the bits. For this reason we continue
to use an aggregate type.


117908 23-Jul-2003 marcel

We sloppily created an array for the high FP registers (f32-f127),
but this just created a weird inconsistency when porting gdb(1).
Instead, we name each high FP register seperately, like we do for
all the other registers.


117499 13-Jul-2003 marcel

Enable the high FP registers when we call the FPSWA handler and disable
them again afterwards. This fixes a disabled FP fault while in the FPSWA
handler.
While here, merge the FP fault and FP trap handling code to reduce code
duplication. Where code was different, it was not sure it should be.

Trigger case: ports/math/atlas


117467 12-Jul-2003 marcel

Add logic to trace across/over a trapframe. We have ABI markers in
our unwind information for functions that are entry points into the
kernel. When stepping to the next frame, the unwinder will let us
know when sych a marker was encountered. We use this to stop the
current unwind session, query the trapframe and restart a new
unwind session based on the new trapframe.

The implementation is a bit sloppy, but at this time there are
bigger fish to fry.


117267 05-Jul-2003 marcel

Don't call malloc() and free() while in the debugger and unwinding
to get a stacktrace. This does not work even with M_NOWAIT when we
have WITNESS and is generally a bad idea (pointed out by bde@). We
allocate an 8K heap for use by the unwinder when ddb is active. A
stack trace roughly takes up half of that in any case, so we have
some room for complex unwind situations. We don't want to waste too
much space though. Due to the nature of unwinding, we don't worry
too much about fragmentation or performance of unwinding while in
the debugger. For now we have our own heap management, but we may
be able to leverage from existing code at some later time.

While here:
o Make sure we actually free the unwind environment after unwinding.
This fixes a memory leak.
o Replace Doug's license with mine in unwind.c and unwind.h. Both
files don't have much, if any, of Doug's code left since the EPC
syscall overhaul and the import of the unwinder.
o Remove dead code.
o Replace M_NOWAIT with M_WAITOK for all remaining malloc() calls.


117126 01-Jul-2003 scottl

Mega busdma API commit.

Add two new arguments to bus_dma_tag_create(): lockfunc and lockfuncarg.
Lockfunc allows a driver to provide a function for managing its locking
semantics while using busdma. At the moment, this is used for the
asynchronous busdma_swi and callback mechanism. Two lockfunc implementations
are provided: busdma_lock_mutex() performs standard mutex operations on the
mutex that is specified from lockfuncarg. dftl_lock() is a panic
implementation and is defaulted to when NULL, NULL are passed to
bus_dma_tag_create(). The only time that NULL, NULL should ever be used is
when the driver ensures that bus_dmamap_load() will not be deferred.
Drivers that do not provide their own locking can pass
busdma_lock_mutex,&Giant args in order to preserve the former behaviour.

sparc64 and powerpc do not provide real busdma_swi functions, so this is
largely a noop on those platforms. The busdma_swi on is64 is not properly
locked yet, so warnings will be emitted on this platform when busdma
callback deferrals happen.

If anyone gets panics or warnings from dflt_lock() being called, please
let me know right away.

Reviewed by: tmm, gibbs


116570 19-Jun-2003 marcel

Add TLS related relocation.


116355 14-Jun-2003 alc

Migrate the thread stack management functions from the machine-dependent
to the machine-independent parts of the VM. At the same time, this
introduces vm object locking for the non-i386 platforms.

Two details:

1. KSTACK_GUARD has been removed in favor of KSTACK_GUARD_PAGES. The
different machine-dependent implementations used various combinations
of KSTACK_GUARD and KSTACK_GUARD_PAGES. To disable guard page, set
KSTACK_GUARD_PAGES to 0.

2. Remove the (unnecessary) clearing of PG_ZERO in vm_thread_new. In
5.x, (but not 4.x,) PG_ZERO can only be set if VM_ALLOC_ZERO is passed
to vm_page_alloc() or vm_page_grab().


115913 06-Jun-2003 marcel

Have TRAPF_USERMODE() take into account that the gateway page is not
always kernel space. It should be treated as user space when run with
user privileges (which is the case for the signal trampolines). This
fixes its only use in a KASSERT in subr_trap.c.


115564 31-May-2003 marcel

Make the regset pointers const pointers for the context restore functions.
This works better with set_mcontext() and is more precise in general.


115416 30-May-2003 hmp

Rename BUS_DMAMEM_NOSYNC to BUS_DMA_COHERENT.

The current name is confusing, because it indicates to
the client that a bus_dmamap_sync() operation is not
necessary when the flag is specified, which is wrong.

The main purpose of this flag is to hint the underlying
architecture that DMA memory should be mapped in a coherent
way, but the architecture can ignore it. But if the
architecture does supports coherent mapping of memory, then
it makes bus_dmamap_sync() calls cheap.

This flag is the same as the one in NetBSD's Bus DMA.

Reviewed by: gibbs, scottl, des (implicitly)
Approved by: re@ (jhb)


115378 29-May-2003 marcel

Move the sysctls of the misalignment handler to where they belong
and use OID_AUTO instead of fixed IDs.

Approved by: re@ (blanket)


115343 27-May-2003 scottl

Bring back bus_dmasync_op_t. It is now a typedef to an int, though the
BUS_DMASYNC_ definitions remain as before. The does not change the ABI,
and reverts the API to be a bit more compatible and flexible. This has
survived a full 'make universe'.

Approved by: re (bmah)


115295 24-May-2003 marcel

Be more careful how we restore interrupts. Don't rewrite most of the
PSR only to achieve setting PSR.i back to it's previous value. It
makes it impossible to change any of the 30+ other unrelated bits
when done between intr_disable() and intr_restore(). That's bad.

Instead have intr_disable() return 1 when interrupts were previously
enabled and 0 otherwise and only enable interrupts in intr_restore()
when given a non-0 value.

This change specifically disallows using intr_restore() to disable
interrupts. The reason is simple: interrupts only need to be restored
after they are being disabled, which means that intr_restore() is
called with interrupts disabled and we only need to enable them if
they were previously enabled.

This change does not fix any bugs, other than that it bugged me...

Approved by: re@ (blanket)


115294 24-May-2003 marcel

Consistently us the same metric to differentiate between kernel mode
and user mode. We need to take into account that the EPC syscall path
introduces a grey area in which one can argue either way, including a
third: neither.

We now use the region in which the IP address lies. Regions 5, 6 and 7
are kernel VA regions and if the IP lies any any of those regions we
assume we're in kernel mode. Hence, we can be in kernel mode even if
we're not on the kernel stack and/or have user privileges. There're
gremlins living in the twilight zone :-)

For the EPC syscall path this particularly means that the process
leaves user mode the moment it calls into the gateway page. This
makes the most sense because from a process' point of view the call
represents a request to the kernel for some service and that service
has been performed if the call returns. With the metric we picked,
this also means that we're back in user mode IFF the call returns.

Approved by: re@ (blanket)


115276 24-May-2003 marcel

Fix an alpha inheritance bug:

On alpha, PAL is involved in context management and after wiring
the CPU (in alpha_init()) a context switch was performed to tell
PAL about the context. This was bogusly brought over to ia64
where it introduced bugs, because we restored the context from
a mostly uninitialized PCB.

The cleanup constitutes:
o Remove the unused arguments from ia64_init().
o Don't return from ia64_init(), but instead call mi_startup()
directly. This reduces the amount of muckery in assembly and
also allows for the next bullet:
o Save our currect context prior to calling mi_startup(). The
reason for this is that many threads are created from thread0
by cloning the PCB. By saving our context in the PCB, we have
something sane to clone. It also ensures that a cloned thread
that does not alter the context in any way will return to
the saved context, where we're ready for the eventuality with
a nice, user unfriendly panic().

The cleanup fixes at least the following bugs:
o Entering mi_startup() with the RSE in enforced lazy mode.
o Re-execution of ia64_init() in certain "lab" conditions.

While here, add proper unwind directives to __start() so that
the unwind knows it has reached the bottom of the (call) stack.

Approved by: re@ (blanket)


115164 19-May-2003 kan

sys/sys/limits.h:

- Fix visibilty test for LONG_BIT and WORD_BIT. `#if defined(__FOO_VISIBLE)'
is alays wrong because __FOO_VISIBLE is always defined (to 0 for
invisibility).

sys/<arch>/include/limits.h
sys/<arch>/include/_limits.h:

- Style fixes.

Submitted by: bde
Reviewed by: bsdmike
Approved by: re (scottl)


115148 19-May-2003 marcel

pmap_install() needs to be atomic WRT to context switching. Protect
switching user regions (region 0-4) with schedlock. Avoid unnecessary
recursion on schedlock by moving the core functionality to another
function (pmap_switch()) where we assert schedlock is held. Turn
pmap_install() into a wrapper that grabs schedlock. This minimizes
the number of callsites that need to be changed.
Since we already have schedlock in cpu_switch() and cpu_throw(),
have them call pmap_switch() directly. These were also the only two
calls to pmap_install() outside pmap.c, so make pmap_install() static
and remove its prototype from pmap.h

Approved by: re (blanket)


115094 17-May-2003 marcel

Remove unused files. cpu_switch() and cpu_throw(), normally in swtch.s,
can be found in machdep.c.

Approved: re@


115084 16-May-2003 marcel

Revamp of the syscall path, exception and context handling. The
prime objectives are:
o Implement a syscall path based on the epc inststruction (see
sys/ia64/ia64/syscall.s).
o Revisit the places were we need to save and restore registers
and define those contexts in terms of the register sets (see
sys/ia64/include/_regset.h).

Secundairy objectives:
o Remove the requirement to use contigmalloc for kernel stacks.
o Better handling of the high FP registers for SMP systems.
o Switch to the new cpu_switch() and cpu_throw() semantics.
o Add a good unwinder to reconstruct contexts for the rare
cases we need to (see sys/contrib/ia64/libuwx)

Many files are affected by this change. Functionally it boils
down to:
o The EPC syscall doesn't preserve registers it does not need
to preserve and places the arguments differently on the stack.
This affects libc and truss.
o The address of the kernel page directory (kptdir) had to
be unstaticized for use by the nested TLB fault handler.
The name has been changed to ia64_kptdir to avoid conflicts.
The renaming affects libkvm.
o The trapframe only contains the special registers and the
scratch registers. For syscalls using the EPC syscall path
no scratch registers are saved. This affects all places where
the trapframe is accessed. Most notably the unaligned access
handler, the signal delivery code and the debugger.
o Context switching only partly saves the special registers
and the preserved registers. This affects cpu_switch() and
triggered the move to the new semantics, which additionally
affects cpu_throw().
o The high FP registers are either in the PCB or on some
CPU. context switching for them is done lazily. This affects
trap().
o The mcontext has room for all registers, but not all of them
have to be defined in all cases. This mostly affects signal
delivery code now. The *context syscalls are as of yet still
unimplemented.

Many details went into the removal of the requirement to use
contigmalloc for kernel stacks. The details are mostly CPU
specific and limited to exception_save() and exception_restore().
The few places where we create, destroy or switch stacks were
mostly simplified by not having to construct physical addresses
and additionally saving the virtual addresses for later use.

Besides more efficient context saving and restoring, which of
course yields a noticable speedup, this also fixes the dreaded
SMP bootup problem as a side-effect. The details of which are
still not fully understood.

This change includes all the necessary backward compatibility
code to have it handle older userland binaries that use the
break instruction for syscalls. Support for break-based syscalls
has been pessimized in favor of a clean implementation. Due to
the overall better performance of the kernel, this will still
be notived as an improvement if it's noticed at all.

Approved by: re@ (jhb)


115019 15-May-2003 marcel

This file creates register sets based on the runtime specification.
The advantage of using register sets is that you don't focus on each
register seperately, but instead instroduce a level of abstraction.
This reduces the chance of errors, and also simplifies the code.
The register sers form the basis of everything register.
The sets in this file are:

struct _special
contains all of the control related registers, such as instruction
pointer and stack pointer. It also contains interrupt specific registers
like the faulting address. The set is roughly split in 3 groups. The
first contains the registers that define a context or thread. This is
the only group that the kernel needs to switch threads. The second group
contains registers needed in addition to the first group needed to switch
userland threads. This group contains the thread pointer and the FP control
register. The third group contains those registers we need for execption
handling and are used on top of the first two groups.

struct _callee_saved, struct _callee_saved_fp
These sets contain the preserved registers, including the NaT after
spilling. The general registers (including branch registers) are
seperated from the FP registers for ptrace(2).

struct _caller_saved, struct _caller_saved_fp
These sets contain the scratch registers based on SDM 2.1, This means that
both ar.csd and ar.ccd are included here, even though they contain ia32
segment register descriptions. We keep seperate NaT bits for scratch and
preserved registers, because they are never saved/restored at the same
time.

struct _high_fp
The upper 96 FP registers that can be enabled/disabled seperately on
the CPU from the lower 32 FP registers. Due to the size of this set,
we treat them specially, even though they are defined as scratch
registers.

CVS ----------------------------------------------------------------------


114678 04-May-2003 kan

Style fixes.
Remove DBL_DIG, DBL_MIN, DBL_MAX and their FLT_ counterparts, they
were marked for deprecation ever since SUSv1 at least.
Only define ULLONG_MIN/MAX and LLONG_MAX if long long type is
supported.
Restore a lost comment in MI _limits.h file and remove it from
sys/limits.h where it does not belong.


114347 30-Apr-2003 marcel

Kill MID_MACHINE, its a.out specific, the only platform that supports
it is i386. All of the other platforms should remove it too.
-- peter@


114216 29-Apr-2003 kan

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


114208 29-Apr-2003 marcel

Revamp the newbus functions:
o do not use the in* and out* functions. These functions are used by
legacy drivers and thus must have ia32 compatible behaviour. Hence,
they need to have fences. Using these functions for newbus would
then pessimize performance.
o remove the conditional compilation of PIO and/or MEMIO support. It's
a PITA without having any significant benefit. We always support them
both. Since there are no I/O ports on ia64 (they are simulated by the
chipset by translating memory mapped I/O to predefined uncacheable
memory regions) the only difference between PIO and MEMIO is in the
address calculation. There should be enough ILP that can be exploited
here that making these computations compile-time conditional is not
worth it. We now also don't use the read* and write* functions.
o Add the missing *_8 variants. They were missing, although not missed.
It's for completeness.
o Do not add the fences that were present in the low-level support
functions here. We're using uncacheable memory, which means that
accesses are in program order. Change the barrier implementation
to not only do a memory fence, but also an acceptance fence. This
should more reliably synchronize drivers with the hardware. The
memory fence enforces ordering, but does not imply visibility (ie
the access does not necessarily have happened). This is what the
acceptance deals with.

cpufunc.h cleanup:
o Remove the low-level memory mapped I/O support functions. They are
not used. Keep the low-level I/O port access functions for legacy
drivers and add fences to ensure ia32 compatibility.
o Remove the syscons specific functions now that we have moved the
proper definitions where they belong.
o Replace the ia64_port_address() and ia64_memory_address() functions
with macros. There's a bigger change inline functions get inlined
when there aren't function callsi and the calculations are simply
enough to do it with macros.

Replace the one reference to ia64_memory address in mp_machdep.c to
use the macro.


113941 23-Apr-2003 kan

Add a new sys/limits.h file which in turn depends on machine/_limits.h
to get actual constant values. This is in preparation for machine/limits.h
retirement.

Discussed on: standards@
Submitted by: Craig Rodrigues <rodrigc@attbi.com> (*)
Modified by: kan


113831 22-Apr-2003 marcel

Don't use the tpa instruction to implement pmap_kextract. The tpa
instruction requires that a translation is present in the TC. This
may trigger a TLB miss and a subsequent call to vm_fault().
This implementation is deliberately non-inline for debugging and
profiling purposes. Partial or full inlining should eventually be
done.

Valuable insights by: jake


113350 10-Apr-2003 mux

I deserve a big pointy hat for having missed all those references
to bus_dmasync_op_t in my last commit.


112721 27-Mar-2003 das

Correct LDBL_* constants based on values from i386.


112569 25-Mar-2003 jake

- Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
with PAE.
- Use this to represent physical addresses in the MI vm system and in the
i386 pmap code. This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by: DARPA, Network Associates Laboratories
Discussed with: re, phk (cdevsw change)


112312 16-Mar-2003 jake

Made the prototypes for pmap_kenter and pmap_kremove MD. These functions
are machine dependent because they are not required to update the tlb when
mappings are added or removed, and doing so is machine dependent.
In addition, an implementation may require that pages mapped with pmap_kenter
have a backing vm_page_t, which is not necessarily true of all physical
pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the
physical address in order to make this requirement clear.


111897 05-Mar-2003 marcel

Fix threaded applications on ia64 that are linked dynamicly. We did
not save (restore) the global pointer (GP) in the jmpbuf in setjmp
(longjmp) because it's not needed in general. GP is considered a
scratch register at callsites and hence is always restored after a
call (when it's possible that the call resolves to a symbol in a
different loadmodule; otherwise GP does not have to be saved and
restored at all), including calls to setjmp/longjmp. There's just
one problem with this now that we use setjmp/longjmp for context
switching: A new context must have GP defined properly for the
thread's entry point. This means that we need to put GP in the
jmpbuf and consequently that we have to restore is in longjmp.
This automaticly requires us to save it as well.

When setjmp/longjmp isn't used for context switching, this can be
reverted again.


111894 05-Mar-2003 marcel

ABI breaker: Move the J_SIGMASK field in the jmpbuf before
the J_SIG0 field. While here, rename J_SIG0 to J_SIGSET and
remove J_SIG1. The main reason for this change is that the
128-bit sigset_t is now aligned on a 16-byte boundary, which
allows us to use 16-byte atomic loads and stores on CPUs that
support it. The removal of J_SIG1 is done to avoid confusion:
it is never accessed and should not be. Renaming J_SIG0 to
J_SIGSET is the icing on the cake that's better done now than
later.


111697 01-Mar-2003 alc

MFi386 revision 1.88
Remove some long unused declarations.


111524 26-Feb-2003 mux

Correctly set BUS_SPACE_MAXSIZE in all the busdma backends.
It was bogusly set to 64 * 1024 or 128 * 1024 because it was
bogusly reused in the BUS_DMAMAP_NSEGS definition.


111031 17-Feb-2003 marcel

Define _ALIGNBYTES to be 15. This should have been done right away.


110566 08-Feb-2003 mike

Implement fpclassify():
o Add a MD header private to libc called _fpmath.h; this header
contains bitfield layouts of MD floating-point types.
o Add a MI header private to libc called fpmath.h; this header
contains bitfield layouts of MI floating-point types.
o Add private libc variables to lib/libc/$arch/gen/infinity.c for
storing NaN values.
o Add __double_t and __float_t to <machine/_types.h>, and provide
double_t and float_t typedefs in <math.h>.
o Add some C99 manifest constants (FP_ILOGB0, FP_ILOGBNAN, HUGE_VALF,
HUGE_VALL, INFINITY, NAN, and return values for fpclassify()) to
<math.h> and others (FLT_EVAL_METHOD, DECIMAL_DIG) to <float.h> via
<machine/float.h>.
o Add C99 macro fpclassify() which calls __fpclassify{d,f,l}() based
on the size of its argument. __fpclassifyl() is never called on
alpha because (sizeof(long double) == sizeof(double)), which is good
since __fpclassifyl() can't deal with such a small `long double'.

This was developed by David Schultz and myself with input from bde and
fenner.

PR: 23103
Submitted by: David Schultz <dschultz@uclink.Berkeley.EDU>
(significant portions)
Reviewed by: bde, fenner (earlier versions)


110211 01-Feb-2003 marcel

Remove special casing for running in the simulator from the kernel
and instead add platform, firmware and EFI stubs to the loader.
The net effect of this change is that besides a special console and
disk driver, the kernel has no knowledge of the simulator. This has
the following advantages:
o Simulator support is much harder to break,
o It's easier to make use of more feature complete simulators.
This would only need a change in the simulator specific loader,
o Running SMP kernels within the simulator. Note that ski at this
time does not simulate IPIs, so there's no way to start APs.

The platform, firmware and EFI stubs describe the following hardware:
o 4 CPU Itanium,
o 128 MB RAM within the 4GB address space,
o 64 MB RAM above the 4GB address space.

NOTE: The stubs in the skiloader describe a machine that should in
parts be defined by the simulator. Things like processor interrupt
block and AP wakeup vector cannot be choosen at random because they
require interpretation by the simulator. Currently the simulator is
ignorant of this.

This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS
which is ignored by the simulator.

Tested with: ski (version 0.943 for linux)


108756 06-Jan-2003 marcel

Replace the hardcoding of 255 as the clock interrupt vector with
CLOCK_VECTOR and define it as 254, not 255. Vector 255 is already
in use as the AP wakeup vector on the HP rx2600.

This needs to be made more dynamic. The likelyhood of vector 254
being in use is pretty small, but we already have code to assign
vectors to IPIs (see sal.c) and it's preobably better to have a
centralized "vector manager" that hands out vectors based on
some imput (like priority).


108751 06-Jan-2003 marcel

Manually inline handleclock(). There's only a single caller and
handleclock itself is trivial.

While here, replace (itc_frequency+hz/2)/hz with itm_reload for
consistency. There's now a single place where we determine the
ITM reload value.


108737 05-Jan-2003 marcel

Don't hardcode the address of the local (S)APIC (aka processor
interrupt block). We use the previously hardcoded address as a
default only, but will otherwise use whatever ACPI tells us.
The address can be found in the MADT table header or in the
LAPIC override table entry.


108734 05-Jan-2003 marcel

Bump the number of interrupts from 65 to 257. This is a waste of
space most of the time, but handles machines with lots of I/O
(S)APICs. We cannot make this more dynamic without breaking the
interface with vmstat. Hence, we need to fix the interface first.


108730 05-Jan-2003 marcel

Make all memory I/O addresses (explicitly) 64-bit. Memory mapped
devices aren't necessarily mapped within 4GB. I/O port addresses
are offsets into the memory mapped I/O port space, which is not
larger than 16MB. No need to convert those to 64 bit types.


108728 05-Jan-2003 marcel

Provide a null-implementation for bus_space_unmap, like i386.
bus_space_unmap is required for puc(4).


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108175 22-Dec-2002 tjr

MB_LEN_MAX is not MD, move it to the MI limits.h.


108049 18-Dec-2002 marcel

More MFp4: DIG64 structures.


107685 08-Dec-2002 marcel

Use one of the bi_spare entries for the DIG64 HCDP table address.
The HCDP table is one (non-proprietary) way for the platform to
inform the OS about headless operation. This field would normally
hold the address as can be found by scanning the EFI system table,
which we also pass to the kernel. The apparent duplication allows
us to synthesize a HCDP table in the loader by whatever means we
can think of, including relocating the platform table into pre-
mapped address space. In short: it gives us more freedom.

Approved by: re (blanket)


107395 29-Nov-2002 marcel

Implement bus_space_subregion(). Identical to i386.

Approved by: re (carte blanc)


107206 24-Nov-2002 marcel

MFp4:
Add function map_port_space() to map the memory mapped I/O port
range as uncacheable virtual memory and call it prior to probing
for a console. This removes the dependency on the loader to have
done this for us. Note that this change does not include doing
the same for APs.

Approved by: re (blanket)


106964 15-Nov-2002 peter

Test the water. Make time_t long (64 bit) on ia64 since we do not have
to worry about ABI vs released systems yet. This is mostly transparent
since there is no significant exposure in the syscall interface. The
things that go wrong are mostly userland stuff - time(&intvariable).

Reviewed by: dfr, marcel
Approved by: re (jhb)


106755 11-Nov-2002 marcel

ia64 ABI breaker:
Don't force 16-byte alignment at run-time. Do it at compile-time.
This saves us the pointer fiddling by the setjmp functions and
reduces complexity. While here, increase the jmp_buf by 16 bytes
to an even 512 bytes. Coincidentally, due to the way alignment
was handled prior to this change, the jmp_buf has not changed in
size, but only in how the space is used. Prior to this change
the 16 bytes were reserved for enforcing alignment; now they are
reserved by us for future extensions.
Therefore, this ABI breaker is relatively save: the failure is
always an alignment trap.


106486 06-Nov-2002 marcel

Define UMA_MD_SMALL_ALLOC so that we can allocate memory with region
7 addresses for use by page tables and kernel stacks.

Obtained from: peter


106189 30-Oct-2002 marcel

Rewrite cpu_switch(). The most notable change is the fact that we now
have f16-f31 as part of the context. The PCB has been reorganized to
better match how we save and restore the (preserved) registers. This
commit also moves the context restoriation to its own function (named
pcb_restore), as we did with pcb_save.

Only minimal effort has been put in writing optimal assembly. The
expectation is that there will be more rounds of changes.


106067 28-Oct-2002 marcel

Remove mf.a (the acceptance form of the memory fence instruction)
from all low-level bus space support functions. There's no need
to actually force the read/write to be accepted by the platform
before we can do anything else. We still have the mf instruction
there, which forces ordering. This too is not required given the
semantices of the bus space I/O functions, but it's not at all
clear to me if there are any poorly written device drivers that
depend on the strict ordering by the processor. The motto here is
to take small steps...


106066 28-Oct-2002 marcel

Make vmstat -i work:
o Properly set the pointer to the counter for each interrupt and
update the intrnames table.
o Remove Alpha cruft from intrcnt.h.
o Create INTRNAME_LEN as the single entity that defines the width
of the names in the intrnames table (incl. terminatinf '\0').


105950 25-Oct-2002 peter

Split 4.x and 5.x signal handling so that we can keep 4.x signal
handling clean and functional as 5.x evolves. This allows some of the
nasty bandaids in the 5.x codepaths to be unwound.

Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an
anti-foot-shooting measure in place, 5.x folks need this for a while) and
finish encapsulating the older stuff under COMPAT_43. Since the ancient
stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *'
to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn
is supposed to take), add a compile time check to prevent foot shooting
there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.

Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago).
Approved by: re


105470 19-Oct-2002 marcel

Update the unwind information when modules are loaded and unloaded
by using the linker hooks. Since these hooks are called for the
kernel as well, we don't need to deal with that with a special
SYSINIT. The initialization implicitly performed on the first
update of the unwind information is made explicit with a SYSINIT.
We now don't need the _ia64_unwind_{start|end} symbols.


105138 15-Oct-2002 peter

The a.out md_coredump stuff isn't referenced anywhere anymore, and
hasn't been filled in for ages.. Nuked.


105049 13-Oct-2002 mike

struct ia64_fpreg needs to be available outside of the kernel for
struct sigcontext.

Pointy hat to: mike


105014 13-Oct-2002 mike

Add standards visibility conditionals. Change any uses of sigset_t to
struct __sigset to avoid depending on objects from <sys/signal.h>.


104998 12-Oct-2002 marcel

Remove the dependency on ia64_cpu.h by not defining pmap_kextract()
as a trivial function that only calls ia64_tpa() and hence requires
the prototype of ia64_tpa(), but by defining pmap_kextract as
ia64_tpa. This solves the inclusion ordering issue in ddb/db_watch.c


104584 06-Oct-2002 mike

Add conditionals to allow va_list to be defined in other headers.


104583 06-Oct-2002 mike

o Add conditionals to allow va_list to be defined in other headers.
o Standardize on _MACHINE_STDARG_H_ to allow multiple header includes.
o Restrict the definition of va_copy() to C99 environments.


104551 06-Oct-2002 obrien

It appears CPU_MAXID should be 1 more than the number of CPU_* defines.


104505 05-Oct-2002 mike

Fix namespace issues by using visibility conditionals from
<sys/cdefs.h>.


104493 04-Oct-2002 mike

style(9) <machine/setjmp.h> headers so they look mostly the same.


104486 04-Oct-2002 sam

New bus_dma interfaces for use by crypto device drivers:

o bus_dmamap_load_mbuf
o bus_dmamap_load_uio

Test on i386. Known to compile on alpha and sparc64, but not tested.
Otherwise untried.


104439 04-Oct-2002 peter

Gah, spell extern correctly. Do not trust cut/paste via old mozilla
builds.


104438 04-Oct-2002 peter

List the IO SAPIC delivery mode definitions.


104436 04-Oct-2002 peter

Declare itc_frequency and itm_reload.


103870 23-Sep-2002 alfred

use __packed.


103834 23-Sep-2002 peter

At great personal risk, add a __packed and __aligned(x) define that
expand to __attribute__((packed)) and __attribute__((aligned(x)))
respectively. Replace the handful of gcc-ism's that use
__attribute__((aligned(16))) etc around the kernel with __aligned(16).

There are over 400 __attribute__((packed)) to deal with, that can come
later. I just want to use __packed in new code rather than add more
gcc-ism's.


103814 23-Sep-2002 mike

Be careful not to define GCC-specific optimizations in the non-GCC
case.


103526 18-Sep-2002 mike

Implement C99's va_copy() macro.


103436 17-Sep-2002 peter

Initiate deorbit burn for the i386-only a.out related support. Moves are
under way to move the remnants of the a.out toolchain to ports. As the
comment in src/Makefile said, this stuff is deprecated and one should not
expect this to remain beyond 4.0-REL. It has already lasted WAY beyond
that.

Notable exceptions:
gcc - I have not touched the a.out generation stuff there.
ldd/ldconfig - still have some code to interface with a.out rtld.
old as/ld/etc - I have not removed these yet, pending their move to ports.
some includes - necessary for ldd/ldconfig for now.

Tested on: i386 (extensively), alpha


102883 03-Sep-2002 peter

Make this compile


102874 03-Sep-2002 mike

Now that _BSD_CLK_TCK_ and _BSD_CLOCKS_PER_SEC_ are the same on all
architectures, move the definition directly into <time.h> and finish
the removal of <machine/ansi.h>.


102869 02-Sep-2002 mike

Align _BSD_CLK_TCK_ and _BSD_CLOCKS_PER_SEC_ with most other
platforms. This introduces some binary incompatibilities for
dynamically linked programs which make use of clock(3) and times(3).


102600 30-Aug-2002 peter

Change hw.physmem and hw.usermem to unsigned long like they used to be
in the original hardwired sysctl implementation.

The buf size calculator still overflows an integer on machines with large
KVA (eg: ia64) where the number of pages does not fit into an int. Use
'long' there.

Change Maxmem and physmem and related variables to 'long', mostly for
completeness. Machines are not likely to overflow 'int' pages in the
near term, but then again, 640K ought to be enough for anybody. This
comes for free on 32 bit machines, so why not?


102561 29-Aug-2002 jake

Renamed poorly named setregs to exec_setregs. Moved its prototype to
imgact.h with the other exec support functions.


102330 23-Aug-2002 marcel

s/_BSD_VA_LIST_/__va_list/. The former type doesn't exist anymore.


102315 23-Aug-2002 mike

Move several MI types from <machine/_types.h> to <sys/_types.h>.
These types are unlikely to ever become very MD. They include:
clockid_t, ct_rune_t, fflags_t, intrmask_t, mbstate_t, off_t, pid_t,
rune_t, socklen_t, timer_t, wchar_t, and wint_t.

While moving them, make a few adjustments (submitted by bde):
o __ct_rune_t needs to be precisely `int', not necessarily __int32_t,
since the arg type of the ctype functions is int.
o __rune_t, __wchar_t and __wint_t inherit this via a typedef of
__ct_rune_t.
o Some minor wording changes in the comment blocks for ct_rune_t and
mbstate_t.

Submitted by: bde (partially)


102278 22-Aug-2002 mux

Convert NEXUS_ACCESSOR to use the __BUS_ACCESSOR
macro instead of reimplementing it.

Approved by: peter


102227 21-Aug-2002 mike

o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
architectures.
o Change all headers to make use of this. This mainly involves
changing:
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
#endif
to:
#ifndef _FOO_T_DECLARED
typedef __foo_t foo_t;
#define _FOO_T_DECLARED
#endif

Concept by: bde
Reviewed by: jake, obrien


101479 07-Aug-2002 alc

o Introduce pmap_page_is_mapped(). Its purpose is to obsolete
the PG_MAPPED flag.


100969 30-Jul-2002 iwasaki

Resolve conflicts arising from the ACPI CA 20020725 import.


100882 29-Jul-2002 mike

Create a new header <machine/_stdint.h> for storing MD parts of
<stdint.h>. Previously, parts were defined in <machine/ansi.h> and
<machine/limits.h>. This resulted in two problems:
(1) Defining macros in <machine/ansi.h> gets in the way of that
header only defining types.
(2) Defining C99 limits in <machine/limits.h> adds pollution to
<limits.h>.


100384 20-Jul-2002 peter

Infrastructure tweaks to allow having both an Elf32 and an Elf64 executable
handler in the kernel at the same time. Also, allow for the
exec_new_vmspace() code to build a different sized vmspace depending on
the executable environment. This is a big help for execing i386 binaries
on ia64. The ELF exec code grows the ability to map partial pages when
there is a page size difference, eg: emulating 4K pages on 8K or 16K
hardware pages.

Flesh out the i386 emulation support for ia64. At this point, the only
binary that I know of that fails is cvsup, because the cvsup runtime
tries to execute code in pages not marked executable.

Obtained from: dfr (mostly, many tweaks from me).


99733 10-Jul-2002 mike

Remove label_t and physadr, which seem to have never been used in
FreeBSD.

Submitted by: bde


99630 09-Jul-2002 mike

Remove an unused type.


99594 08-Jul-2002 mike

Move __offsetof() macro from <machine/ansi.h> to <sys/cdefs.h>. It's
hardly MD, since all our platforms share the same macro. It's not
really compiler dependent either, but this helps in reducing
<machine/ansi.h> to only type definitions.


99117 30-Jun-2002 mike

Since printf(3) now supports the `j' conversion specifier, use that
when printing intmax_t and uintmax_t.

Forgotten by: mike
Noticed by: bde


99077 29-Jun-2002 julian

Add a copy of the sparc64 machine/kse.h to satisfy depencies..
dfr will fill in the correct contents at a later time.


98469 20-Jun-2002 peter

Move the "- 1" into the RQB_FFS(mask) macro itself so that
implementations can provide a base zero ffs function if they wish.
This changes
#define RQB_FFS(mask) (ffs64(mask))
foo = RQB_FFS(mask) - 1;
to
#define RQB_FFS(mask) (ffs64(mask) - 1)
foo = RQB_FFS(mask);
On some platforms we can get the "- 1" for free, eg: those that use the
C code for ffs64().

Reviewed by: jake (in principle)


97564 30-May-2002 dfr

Move the definition of ElfN_Hashelt to common headers. The only platform
which has a different definition for this is alpha.


97443 29-May-2002 marcel

Remove the definition of struct mca_guid and use the generic
struct uuid defined in <sys/uuid.h>.

Use uuid/UUID instead of guid/GUID to emphasize that the
identifiers are DCE version 1 identifiers and also to avoid
inconsistencies as much a possible.


97261 25-May-2002 jake

Make the run queue parameters machine dependent. Optimize 64 bit
architectures by using a 64 bit word for the bit array which keeps
track of non-empty queues.

Reviewed by: peter


97090 22-May-2002 marcel

o Add records for PCI bus and PCI device errors.
o Rename mem_platform_id to mem_oem_id.
o Minor style fixes.


96973 20-May-2002 marcel

Flesh-out ptrace support. This obviously needs more work.


96956 19-May-2002 marcel

Simplify IA64_CMPXCHG to avoid having braced-groups in expressions.
As a minor positive side-effect, code at -O0 is more optimal. As a
minor negative side-effect, certain boundary cases yield no better
code than non-boundary cases. For example, atomic_set_acq_32(p, 0)
does a useless logical OR with value 0. This was previously elimina-
ted as part of if/while optimizations. Non-boundary cases yield
identical code at -O1 and -O2.


96921 19-May-2002 marcel

Add record definition for memory checks.


96912 19-May-2002 marcel

o Remove namespace pollution from param.h:
- Don't include ia64_cpu.h and cpu.h
- Guard definitions by _NO_NAMESPACE_POLLUTION
- Move definition of KERNBASE to vmparam.h

o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h
o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h

o While here, remove some left-over Alpha references.


96907 19-May-2002 marcel

o Move prototypes for restorectx and savectx from cpu.h to pcb.h,
o Remove Alpha specific contents of struct md_coredump.


96606 14-May-2002 phk

Move MI stuff out of MD param.h files.

It can all still be overridden in the MD files should need suddenly arise.


96604 14-May-2002 phk

Remove the unused definitions of ctod() and dotc().


96495 13-May-2002 marcel

s/_ALPHA_/_MACHINE_/


96494 13-May-2002 marcel

Remove reference to the "Alpha Calling Standard".


96442 12-May-2002 marcel

o Rename ia64_count_aps to ia64_count_cpus and reimplement the
function to return the total number of CPUs and not the highest
CPU id.
o Define mp_maxid based on the minimum of the actual number of
CPUs in the system and MAXCPU.
o In cpu_mp_add, when the CPU id of the CPU we're trying to add
is larger than mp_maxid, don't add the CPU. Formerly this was
based on MAXCPU. Don't count CPUs when we add them. We already
know how many CPUs exist.
o Replace MAXCPU with mp_maxid when used in loops that iterate
over the id space. This avoids a couple of useless iterations.
o In cpu_mp_unleash, use the number of CPUs to determine if we
need to launch the CPUs.
o Remove mp_hardware as it's not used anymore.
o Move the IPI vector array from mp_machdep.c to sal.c. We use
the array as a centralized place to collect vector assignments.
Note that we still assign vectors to SMP specific IPIs in
non-SMP configurations. Rename the array from mp_ipi_vector to
ipi_vector.
o Add IPI_MCA_RENDEZ and IPI_MCA_CMCV. These are used by MCA.
Note that IPI_MCA_CMCV is not SMP specific.
o Initialize the ipi_vector array so that we place the IPIs in
sensible priority classes. The classes are relative to where
the AP wake-up vector is located to guarantee that it's the
highest priority (external) interrupt. Class assignment is
as follows:
class IPI notes
x AP wake-up (normally x=15)
x-1 MCA rendezvous
x-2 AST, Rendezvous, stop
x-3 CMCV, test


96336 10-May-2002 marcel

Add missing #endif


96318 10-May-2002 obrien

Gcc 3.1 varargs support.


96146 07-May-2002 marcel

o Add ar.lc to the pcb.
o Create pcb_save as the backend for savectx and cpu_switch.
o While here, use explicit bundling for pcb_save and optimize
for compactness (~87% density).

o Not part of the commit is a backend pcb_restore. restorectx()
still jumps halfway into cpu_switch().


96062 05-May-2002 marcel

o Add struct mca_guid
o Add currently known GUIDs
o Slight restyling


96058 05-May-2002 marcel

o Move definition of struct ia64_fdesc here to remove duplication.
o Add prototype of os_boot_rendez.


95929 02-May-2002 dfr

The width of segsz_t should be 64, not 32 on ia64.


95768 30-Apr-2002 marcel

Add ar.lc and ar.ec to the trapframe. These are not saved for syscalls,
only for exceptions.

While adding this to exception_save and exception_restore, it was hard
to find a good place to put the instructions. The code sequence was
sufficiently arbitrarily ordered that the density was low (roughly 67%).
No explicit bundling was used.
Thus, I rewrote the functions to optimize for density (close to 80% now),
and added explicit bundles and nop instructions. The immediate operand
on the nop instruction has been incremented with each instance, to make
debugging a bit easier when looking at recurring patterns. Redundant
stops have been removed as much as possible. Future optimizations can
focus more on performance. A well-placed lfetch can make all the
difference here!

Also, the FRAME_Fxx defines in frame.h were mostly bogus. FRAME_F10 to
FRAME_F15 were copied from FRAME_F9 and still had the same index. We
don't use them yet, so nothing was broken.


95710 29-Apr-2002 peter

Tidy up some loose ends.
i386/ia64/alpha - catch up to sparc64/ppc:
- replace pmap_kernel() with refs to kernel_pmap
- change kernel_pmap pointer to (&kernel_pmap_store)
(this is a speedup since ld can set these at compile/link time)
all platforms (as suggested by jake):
- gc unused pmap_reference
- gc unused pmap_destroy
- gc unused struct pmap.pm_count
(we never used pm_count - we track address space sharing at the vmspace)


95515 26-Apr-2002 marcel

Machine Check Architecture (MCA) structures and constants.


95244 22-Apr-2002 marcel

Add state information types.


94494 12-Apr-2002 marcel

Fix definition of va_start: We don't need to take the address of
va_list. It's a builtin type. gcc 3.1 doesn't care either way,
but gcc 3.2 is more picky and doesn't like the former.


94375 10-Apr-2002 dfr

Add ucode values for SIGFPE etc. Copied from i386/include/signal.h.


94374 10-Apr-2002 dfr

Add fields for saving/restoring the IA-32 state.


94373 10-Apr-2002 dfr

Add definitions for IA-32 exceptions, interrupts and intercepts.


94363 10-Apr-2002 mike

Remove the hack for segsz_t from <sys/types.h>; use the normal
_BSD_FOO_T_ method for defining segsz_t.


94362 10-Apr-2002 mike

Add manifest constants: _LITTLE_ENDIAN, _BIG_ENDIAN, _PDP_ENDIAN, and
_BYTE_ORDER. These are far more useful than their non-underscored
equivalents as these can be used in restricted namespace environments.
Mark the non-underscored variants as deprecated.


94271 09-Apr-2002 dfr

Define a complete set of accessors for application and control registers.


93961 06-Apr-2002 dfr

Merge fixes for dbtob() and btodb() from alpha/include/param.h. This stops
ffs_snapshot() from using negative numbers for byte offsets in large file
systems.


93757 04-Apr-2002 marcel

o Add architecture specific segment types.
o Add architecture specific segment attributes.


93607 01-Apr-2002 dillon

Stage-2 commit of the critical*() code. This re-inlines cpu_critical_enter()
and cpu_critical_exit() and moves associated critical prototypes into their
own header file, <arch>/<arch>/critical.h, which is only included by the
three MI source files that need it.

Backout and re-apply improperly comitted syntactical cleanups made to files
that were still under active development. Backout improperly comitted program
structure changes that moved localized declarations to the top of two
procedures. Partially re-apply one of the program structure changes to
move 'mask' into an intermediate block rather then in three separate
sub-blocks to make the code more readable. Re-integrate bug fixes that Jake
made to the sparc64 code.

Note: In general, developers should not gratuitously move declarations out
of sub-blocks. They are where they are for reasons of structure, grouping,
readability, compiler-localizability, and to avoid developer-introduced bugs
similar to several found in recent years in the VFS and VM code.

Reviewed by: jake


93264 27-Mar-2002 dillon

Compromise for critical*()/cpu_critical*() recommit. Cleanup the interrupt
disablement assumptions in kern_fork.c by adding another API call,
cpu_critical_fork_exit(). Cleanup the td_savecrit field by moving it
from MI to MD. Temporarily move cpu_critical*() from <arch>/include/cpufunc.h
to <arch>/<arch>/critical.c (stage-2 will clean this up).

Implement interrupt deferral for i386 that allows interrupts to remain
enabled inside critical sections. This also fixes an IPI interlock bug,
and requires uses of icu_lock to be enclosed in a true interrupt disablement.

This is the stage-1 commit. Stage-2 will occur after stage-1 has stabilized,
and will move cpu_critical*() into its own header file(s) + other things.
This commit may break non-i386 architectures in trivial ways. This should
be temporary.

Reviewed by: core
Approved by: core


93256 27-Mar-2002 marcel

o Revert previous commit in asm.h. There's no need to undefine
__FBSDID first, because it should not be defined at all,
o Remove inclusion of cdefs.h in locore.s.

Pointed out by: peter


93192 26-Mar-2002 obrien

Get the guarding right. The IA-64 has a different organization for this
than our other platforms.


93092 24-Mar-2002 obrien

Guard against redefining __gnuc_va_list.


93088 24-Mar-2002 marcel

Undefine __FBSDID before defining it as it's already defined at
that point.


92998 23-Mar-2002 obrien

ASM versions of __FBSDID.


92870 21-Mar-2002 dfr

Change critical_t to register_t for intr_disable/restore.


92869 21-Mar-2002 dfr

Change cpu_critical_enter/exit to intr_disable/restore.


92843 20-Mar-2002 alfred

Remove __P.

Reviewd by: peter


92805 20-Mar-2002 dfr

Change intr_enable to intr_restore for consistency with sparc64.


92781 20-Mar-2002 dfr

Recreate intr_disable/intr_enable and implement cpu_critical_enter/exit
in terms of that (for now).


92675 19-Mar-2002 peter

Move a couple of prototypes together instead of being incompletely
scattered around.


92670 19-Mar-2002 peter

Believe it or not, I ran into the 32MB stack size limit using a natively
hosted gcc.


92383 16-Mar-2002 des

Move the definition of PT_[GS]ET{,DB,FP}REGS from the MD ptrace.h to the
MI ptrace.h, since all platforms define them. Keep the MD ptrace.h around
for FIX_SSTEP (which is currently only needed on Alpha).


92280 14-Mar-2002 dfr

Add a field to hold the current pmap of a thread.


92271 14-Mar-2002 dfr

Add ia64_sync_i(), ia64_get_tpr() and ia64_set_tpr().


92267 14-Mar-2002 dfr

Add debug code to print SAPIC registers.


92124 12-Mar-2002 peter

Fix some -Wunused warnings by "using" a macro argument


91959 09-Mar-2002 mike

o Don't require long long support in bswap64() functions.
o In i386's <machine/endian.h>, macros have some advantages over
inlines, so change some inlines to macros.
o In i386's <machine/endian.h>, ungarbage collect word_swap_int()
(previously __uint16_swap_uint32), it has some uses on i386's with
PDP endianness.

Submitted by: bde

o Move a comment up in <machine/endian.h> that was accidentially moved
down a few revisions ago.
o Reenable userland's use of optimized inline-asm versions of
byteorder(3) functions.
o Fix ordering of prototypes vs. redefinition of byteorder(3)
functions, so that the non-GCC (libc asm) case has proper
prototypes.
o Add proper prototypes for byteorder(3) functions in <sys/param.h>.
o Prevent redundant duplicate prototypes by making use of the
_BYTEORDER_PROTOTYPED define.
o Move the bswap16(), bswap32(), bswap64() C functions into MD space
for platforms in which asm versions don't exist. This significantly
reduces the complexity of some things at the cost of duplicate code.

Reviewed by: bde


91394 27-Feb-2002 tmm

Add the following functions/macros to support byte order conversions and
device drivers for bus system with other endinesses than the CPU (using
interfaces compatible to NetBSD):

- bwap16() and bswap32(). These have optimized implementations on some
architectures; for those that don't, there exist generic implementations.
- macros to convert from a certain byte order to host byte order and vice
versa, using a naming scheme like le16toh(), htole16().
These are implemented using the bswap functions.
- stream bus space access functions, which do not perform a byte order
conversion (while the normal access functions would if the bus endianess
differs from the CPU endianess).

htons(), htonl(), ntohs() and ntohl() are implemented using the new
functions above for kernel usage. None of the above interfaces is currently
exported to user land.

Make use of the new functions in a few places where local implementations
of the same functionality existed.

Reviewed by: mike, bde
Tested on alpha by: mike


90885 19-Feb-2002 mike

Add C++ support.


90868 18-Feb-2002 mike

o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on: alpha, i386
Reviewed by: bde, jake, tmm


90711 15-Feb-2002 wollman

Resurrect one of the easiest changes from my big include files roll-up
patch from a year ago: give file flags their own type. This does not
(yet) change the type used by system calls or library functions.
The underlying type was chosen to match what is returned by stat().


89491 18-Jan-2002 marcel

Declare ddb_regs as extern to avoid creating a tentative definition.
This fixes the link failure caused by disabling the emission of
common symbols.


88691 30-Dec-2001 marcel

Cleanup the IPIs.


88690 30-Dec-2001 marcel

Remove unused MD fields (pc_pending_ipis, pc_next_asn and
pc_current_asngen) and add SMP specific fields (pc_pcb,
pc_lid and pc_awake).


88441 23-Dec-2001 dfr

Fix CRITICAL_FORK so that it compiles.


88088 18-Dec-2001 jhb

Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
count and a per-thread critical section saved state set when entering
a critical section while at nesting level 0 and restored when exiting
to nesting level 0. This moves the saved state out of spin mutexes so
that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
cpu_critical_enter/exit. MI code such as device drivers and spin
mutexes use the MI wrappers. Note that since the MI wrappers store
the state in the current thread, they do not have any return values or
arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
assigned to curthread->td_savecrit during fork_exit().

Tested on: i386, alpha


87702 11-Dec-2001 jhb

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


87572 09-Dec-2001 obrien

style(9)


87454 06-Dec-2001 jhb

Add multiple inclusion protection.


87158 01-Dec-2001 mike

o Stop abusing MD headers with non-MD types.
o Hide nonstandard functions and types in <netinet/in.h> when
_POSIX_SOURCE is defined.
o Add some missing types (required by POSIX.1-200x) to <netinet/in.h>.
o Restore vendor ID from Rev 1.1 in <netinet/in.h> and make use of new
__FBSDID() macro.
o Fix some miscellaneous issues in <arpa/inet.h>.
o Correct final argument for the inet_ntop() function (POSIX.1-200x).
o Get rid of the namespace pollution from <sys/types.h> in
<arpa/inet.h>.

Reviewed by: fenner
Partially submitted by: bde


86592 19-Nov-2001 peter

Initial cut at calling the EFI-provided FPSWA (Floating Point Software
Assist) driver to handle the "messy" floating point cases which
cause traps to the kernel for handling.


86587 19-Nov-2001 peter

Use some (now) spare space for passing through a pointer to the FPSWA
Interface provided by EFI (Floating Point SoftWare Assist).


86586 19-Nov-2001 peter

Remove bootinfo.bi_kernel. It isn't used by the kernel. struct bootinfo
should go away on ia64, we should be loader metadata based since that is
the only way we can boot (loader, skiload).


86213 09-Nov-2001 dfr

* Make sure we increment pm_stats.resident_count in pmap_enter_quick
* Re-organise RID allocation so that we don't accidentally give a RID
to two different processes. Also randomise the order to try to reduce
collisions in VHPT and TLB. Don't allocate RIDs for regions which are
unused.
* Allocate space for VHPT based on the size of physical memory. More
tuning is needed here.
* Add sysctl instrumentation for VHPT - see sysctl vm.stats.vhpt
* Fix a bug in pmap_prefault() which prevented it from actually adding
pages to the pmap.
* Remove ancient dead debugging code.
* Add DDB commands for examining translation registers and region
registers.

The first change fixes the 'free/cache page %p was dirty' panic which I
have been seeing when the system is put under moderate load. It also
fixes the negative RSS values in ps which have been confusing me for a
while.

With this set of changes the ia64 port is reliable enough to build its
own kernels, even with a 20-way parallel build. Next stop buildworld.


86209 09-Nov-2001 dfr

Define PS and VE fields of region register correctly.


85973 03-Nov-2001 dfr

Implement <machine/ieeefp.h>


85892 02-Nov-2001 mike

o Add new header <sys/stdint.h>.
o Make <stdint.h> a symbolic link to <sys/stdint.h>.
o Move most of <sys/inttypes.h> into <sys/stdint.h>, as per C99.
o Remove <sys/inttypes.h>.
o Adjust includes in sys/types.h and boot/efi/include/ia64/efibind.h
to reflect new location of integer types in <sys/stdint.h>.
o Remove previously symbolicly linked <inttypes.h>, instead create a
new file.
o Add MD headers <machine/_inttypes.h> from NetBSD.
o Include <sys/stdint.h> in <inttypes.h>, as required by C99; and
include <machine/_inttypes.h> in <inttypes.h>, to fill in the
remaining requirements for <inttypes.h>.
o Add additional integer types in <machine/ansi.h> and
<machine/limits.h> which are included via <sys/stdint.h>.

Partially obtain from: NetBSD
Tested on: alpha, i386
Discussed on: freebsd-standards@bostonradio.org
Reviewed by: bde, fenner, obrien, wollman


85685 29-Oct-2001 dfr

* Factor out common code for manipulating the RSE backing store.
* Implement a fairly simplistic parser for unwinding stack frames.
* Use unwind records for DDB's 'trace' command. Also add support for
tracing past exceptions to the context which generated the exception.

The stack unwind code requires a toolchain based on binutils-2.11.2 or
later and gcc-3.0.1 or later.


85680 29-Oct-2001 peter

The size of the ELF hash table changed from 64 bits in the prototype
toolchains to 32 bits in 2.11.2.

Obtained from: dfr


85673 29-Oct-2001 marcel

Add an IPI used for testing proper operation of delivering IPIs.


85656 29-Oct-2001 marcel

o Do not parse the MADT as a side-effect in AcpiOsGetRootPointer,
do it as a side-effect of probing for MP hardware. This allows
us to scan for local SAPICs early (especially before MBUF
initialization).
o Fix the Local SAPIC structure so that matches the Local SAPIC
table entry. Now that the Local SAPIC info is the same as the
Local APIC info, stop dumping the Local APIC entries.
o For every Local SAPIC entry in the MADT that's not disabled,
let the SMP code know about it. They represent actual CPUs.
o Register the OS_BOOT_RENDEZ entry point and provide a (bogus)
implementation for the entry point.
o Provide a mapping for internal IPI numbers to ExtINT vectors.
o In a MP system, announce the CPUs and start them by sending
IPI_AP_WAKEUP to each of them. Not that it makes a difference
at this time :-)
o Miscellaneous style fixes and other adjustments.


85356 23-Oct-2001 dfr

Add data serialisations after ptc and mov to rr[] instructions.


85335 23-Oct-2001 mike

Remove funky right justification.

Pointed out by: bde


85294 21-Oct-2001 des

[partially forced commit due to pilot error in earlier commit attempt]

{set,fill}_{,fp,db}regs() fixup:

- Add dummy {set,fill}_dbregs() on architectures that don't have them.

- KSEfy the powerpc versions (struct proc -> struct thread).

- Some architectures had the prototypes in md_var.h, some in reg.h, and
some in both; for consistency, move them to reg.h on all platforms.

These functions aren't really MD (the implementation is MD, but the interface
is MI), so they should move to an MI header, but I haven't figured out which
one yet.

Run-tested on i386, build-tested on Alpha, untested on other platforms.


85282 21-Oct-2001 dfr

Add ia64_set_fpsr().


85269 21-Oct-2001 marcel

Add define for the PIB default address and include a reference to
the SDM.


85230 20-Oct-2001 dfr

Reserve space for signal state.


85210 20-Oct-2001 marcel

Make this compile under option SMP.


85189 19-Oct-2001 ru

Try two on the preprocessing logic.

Reviewed by: obrien


85183 19-Oct-2001 obrien

Blah, fix braino where ru had to remind me of proper preprocessor syntax.
Bad fingers, no cookie.


85142 19-Oct-2001 dfr

Rework pmap so that it separates the PTE structure from the pv_entry
structure. This makes it possible to pre-allocate PTEs for the kernel,
which is necessary for a reliable implementation of pmap_kenter(). This
also avoids wasting space (about 48 bytes per page) for kernel mappings
and user mappings of memory-mapped devices.

This also fixes a bug with the previous version where the implementation
required the pv_entry structure to be physically contiguous but did not
enforce this (the structure size was not a power of two). This meant
that the pv_entry free list was quickly corrupted as soon as the system
was even mildly loaded.


85109 18-Oct-2001 dfr

Shift the code which packs and unpacks instruction bundles out of DDB
since it is useful for various emulations duties (e.g. unaligned trap
handling).


85085 18-Oct-2001 obrien

Add support for "__gnuc_va_list". Some overly "smart" libraries assume
the existence of the __gnuc_va_list type[*] because our compiler is GCC.

[*] __gnuc_va_list is defined in the GCC ginclude/stdarg.h replacement
headerwhich we don't use.


84801 11-Oct-2001 dfr

Implement MCOUNT hook for assembler. Probably doesn't work right.


84800 11-Oct-2001 dfr

Implement mcount trampoline (untested).


84783 10-Oct-2001 ps

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


84751 10-Oct-2001 dfr

Add a definition for the ia64's special PLT_RESERVE entry in the _DYNAMIC
section.


84638 07-Oct-2001 dfr

Implement inline versions of ntohl etc.


84590 06-Oct-2001 dfr

Assume round-to-nearest mode for floating point.


84581 06-Oct-2001 marcel

o Change ia64_memory_address to explicitly take a u_int64_t
o Add memcpy_fromio, memcpy_io, memcpy_toio, memset_io,
memsetw and memsetw_io. I'm not sure this is the right
place for it, though.


84541 05-Oct-2001 dfr

Wire up most of the interrupt handling infrastructure. Not sure it works
right yet but its enough for the ATA probe to work. The SCSI probes which
follow are broken though.


84534 05-Oct-2001 dfr

Add ia64_get_lid().


84477 04-Oct-2001 dfr

Don't pretend the argument to clockattach is a device - it isn't.


84129 29-Sep-2001 dfr

Add a couple of arguments to ia64_init. I'll use them later to improve
the method of passing bootinfo from the loader.


84124 29-Sep-2001 dfr

Start hooking up devices.


84123 29-Sep-2001 dfr

Add pmap_unmapdev().


84122 29-Sep-2001 dfr

Fill out the firmware interfaces somewhat.


83905 24-Sep-2001 dfr

We need different call stubs for static and stacked calling conventions.


83900 24-Sep-2001 dfr

Factor out PTE and related definitions from pmap.h - they are useful in
the loader.


83856 23-Sep-2001 dfr

Add definitions of SAL System Table.


83836 22-Sep-2001 dfr

Add implementations of readx() and writex().


83835 22-Sep-2001 dfr

Add declaration of ia64_running_in_simulator().


83766 21-Sep-2001 dfr

Add ia64_fc().


83644 18-Sep-2001 jhb

Whitespace fixes.


83643 18-Sep-2001 jhb

- If we ever do the per-cpu KTR stuff, the index won't be volatile as it
will be private to each CPU.
- Re-style(9) the globaldata structures. There really needs to be a MI
struct pcpu that has a MD struct mdpcpu member at some point.


83622 18-Sep-2001 dfr

Add ia64_get_cpuid().


83511 15-Sep-2001 dfr

Implement inx() and outx() functions for accessing I/O ports.


83510 15-Sep-2001 dfr

Add ia64_mf_a() which executes an mf.a instruction.


83506 15-Sep-2001 dfr

Fill out some gaps in ia64 DDB support. This involves generalising DDB's
breakpoint handling slightly to cope with the fact that ia64 instructions
are not located on byte boundaries.


83407 13-Sep-2001 dfr

* Enable dynamically linked kernel. This involves adding a self-relocator
to locore to process the @fptr relocations in the dynamic executable.
* Don't initialise the timer until *after* we install the timecounter to
avoid a race between timecounter initialisation and hardclock.
* Tidy up bootinfo somewhat including adding sanity checks for when the
kernel is loaded without a recognisable bootinfo.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83301 10-Sep-2001 dfr

* Make a start on a realistic definition for bootinfo.
* Switch to proc0's stack and backing store before calling ia64_init
so that we don't rely on the loader's stack at all.
* Change kernel entry point name from locorestart to __start.


83198 07-Sep-2001 dfr

Add options to select between 4k, 8k and 16k page sizes on ia64. The
default is now 8k.


83156 06-Sep-2001 dfr

Add struct tags to avoid warnings in kernel code.


83047 05-Sep-2001 obrien

style(9) the structure definitions.


82867 03-Sep-2001 dfr

Add a working version of setjmp/longjmp.

Obtained from: Intel's EFI toolkit.


82530 30-Aug-2001 mike

o Remove some GCCisms in src/powerpc/include/endian.h.
o Unify <machine/endian.h>'s across all architectures.
o Make bswapXX() functions use a different spelling of u_int16_t and
friends to reduce namespace pollution. The bswapXX() functions
don't actually exist, but we'll probably import these at some
point. Atleast one driver (if_de) depends on bswapXX() for big
endian cases.
o Deprecate byteorder(3) prototypes from <sys/types.h>, these are
now prototyped indirectly in <arpa/inet.h>.
o Deprecate in_addr_t and in_port_t typedefs in <sys/types.h>, these
are now typedef'd in <arpa/inet.h>.
o Change byteorder(3) prototypes to use standards compliant uint32_t
(spelled __uint32_t to reduce namespace pollution).
o Document new preferred headers and standards compliance.

Discussed with: bde
PR: 29946
Reviewed by: bmilekic


82313 25-Aug-2001 peter

vm_page_zero_idle() is no longer MD.


82107 21-Aug-2001 peter

Strip out some #if's for old implementations of global data pointers.


81763 16-Aug-2001 obrien

style(9) and make consistent across platforms


81727 15-Aug-2001 ache

OFF_T -> OFF (more standard style)


81720 15-Aug-2001 ache

Add OFF_T_MAX/OFF_T_MIN


81675 15-Aug-2001 obrien

Style changes to commonize the various platforms.


81493 10-Aug-2001 jhb

- Close races with signals and other AST's being triggered while we are in
the process of exiting the kernel. The ast() function now loops as long
as the PS_ASTPENDING or PS_NEEDRESCHED flags are set. It returns with
preemption disabled so that any further AST's that arrive via an
interrupt will be delayed until the low-level MD code returns to user
mode.
- Use u_int's to store the tick counts for profiling purposes so that we
do not need sched_lock just to read p_sticks. This also closes a
problem where the call to addupc_task() could screw up the arithmetic
due to non-atomic reads of p_sticks.
- Axe need_proftick(), aston(), astoff(), astpending(), need_resched(),
clear_resched(), and resched_wanted() in favor of direct bit operations
on p_sflag.
- Fix up locking with sched_lock some. In addupc_intr(), use sched_lock
to ensure pr_addr and pr_ticks are updated atomically with setting
PS_OWEUPC. In ast() we clear pr_ticks atomically with clearing
PS_OWEUPC. We also do not grab the lock just to test a flag.
- Simplify the handling of Giant in ast() slightly.

Reviewed by: bde (mostly)


81265 08-Aug-2001 peter

Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they
are a really nasty interface that should have been killed long ago
when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they
operate on (struct user) will not be around much longer since it
is part-per-process and part-per-thread in a post-KSE world.

gdb does not actually use this except for the obscure 'info udot'
command which does a hexdump of as much of the child's 'struct user'
as it can get. It carries its own #defines so it doesn't break
compiles.


80700 31-Jul-2001 jake

Use a machine dependent type, Elf_Hashelt, for the elements of the elf
dynamic symbol table buckets and chains. The sparc64 toolchain uses 32
bit .hash entries, unlike other 64 bits architectures (alpha), which use
64 bit entries.

Discussed with: dfr, jdp


78983 29-Jun-2001 jhb

Move ast() and userret() to sys/kern/subr_trap.c now that they are MI.


78962 29-Jun-2001 jhb

Add a new MI pointer to the process' trapframe p_frame instead of using
various differently named pointers buried under p_md.

Reviewed by: jake (in principle)


77931 09-Jun-2001 obrien

Fix style of defines.


77800 06-Jun-2001 joerg

Nuke the various poorly maintained copies of ioctl_fd.h. The file is
not machine-dependant, thus it has been moved out (repo-copied) into
<sys/fdcio.h>.


77626 02-Jun-2001 phk

Properly wrap mtx_intr_enable() macro in "do $bla while (0)"


77582 01-Jun-2001 tmm

Clean up the code exporting interrupt statistics via sysctl a bit:
- move the sysctl code to kern_intr.c
- do not use INTRCNT_COUNT, but rather eintrcnt - intrcnt to determine
the length of the intrcnt array
- move the declarations of intrnames, eintrnames, intrcnt and eintrcnt
from machine-dependent include files to sys/interrupt.h
- remove the hw.nintr sysctl, it is not needed.
- fix various style bugs

Requested by: bde
Reviewed by: bde (some time ago)


76784 18-May-2001 obrien

Style changes -- revert ordering to mostly two revs ago.
Embellish some comments, fix tab'ing.

Requested by: bde


76701 16-May-2001 obrien

Consistently define the rune types.
Follow NetBSD's lead and add a _BSD_MBSTATE_T_ type.


76700 16-May-2001 obrien

Move the int typedefs to the top so they can be used in defining other types.
Ensure every platform has __offsetof.
Make multiple inclusion detection consistent with other
<platform>/include/*.h files.


76651 15-May-2001 jhb

"Sir, the deorbit burn completed succesfully."

RIP {sys/machine}/ipl.h.


76078 27-Apr-2001 jhb

Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
into hardclock_process() and statclock_process() respectively. hardclock()
and statclock() call the *_process() functions for the current process so
that UP systems will run as before. For SMP systems, it is simply necessary
to ensure that all other processors execute the *_process() functions when the
main clock functions are triggered on one CPU by an interrupt. For the alpha
4100, clock interrupts are delievered in a staggered broadcast fashion, so
we simply call hardclock/statclock on the boot CPU and call the *_process()
functions on the secondaries. For x86, we call statclock and hardclock as
usual and then call forward_hardclock/statclock in the MD code to send an IPI
to cause the AP's to execute forwared_hardclock/statclock which then call the
*_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
involve less hackery. Now the cpu doing the forward sets any flags, etc. and
sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically
return so that they can execute ast() and don't bother with setting the
astpending or needresched flags themselves. This also removes the loop in
forward_signal() as sched_lock closes the race condition that the loop worked
around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
a process to act on rather than assuming curproc so that they can be used to
implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
header sys/smp.h declares MI SMP variables and API's. The IPI API's from
machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
SLIST of globaldata structures has become MI and moved into subr_smp.c.
Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by: jake, peter
Looked over by: eivind


75666 18-Apr-2001 dfr

Implement a simple stack trace for DDB. This will have to be redone
if/when we change to a more modern toolchain.


75421 11-Apr-2001 jhb

Rename the IPI API from smp_ipi_* to ipi_* since the smp_ prefix is just
"redundant noise" and to match the IPI constant namespace (IPI_*).

Requested by: bde


74912 28-Mar-2001 jhb

Rework the witness code to work with sx locks as well as mutexes.
- Introduce lock classes and lock objects. Each lock class specifies a
name and set of flags (or properties) shared by all locks of a given
type. Currently there are three lock classes: spin mutexes, sleep
mutexes, and sx locks. A lock object specifies properties of an
additional lock along with a lock name and all of the extra stuff needed
to make witness work with a given lock. This abstract lock stuff is
defined in sys/lock.h. The lockmgr constants, types, and prototypes have
been moved to sys/lockmgr.h. For temporary backwards compatability,
sys/lock.h includes sys/lockmgr.h.
- Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin
locks held. By making this per-cpu, we do not have to jump through
magic hoops to deal with sched_lock changing ownership during context
switches.
- Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with
proc->p_sleeplocks, which is a list of held sleep locks including sleep
mutexes and sx locks.
- Add helper macros for logging lock events via the KTR_LOCK KTR logging
level so that the log messages are consistent.
- Add some new flags that can be passed to mtx_init():
- MTX_NOWITNESS - specifies that this lock should be ignored by witness.
This is used for the mutex that blocks a sx lock for example.
- MTX_QUIET - this is not new, but you can pass this to mtx_init() now
and no events will be logged for this lock, so that one doesn't have
to change all the individual mtx_lock/unlock() operations.
- All lock objects maintain an initialized flag. Use this flag to export
a mtx_initialized() macro that can be safely called from drivers. Also,
we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness
performs the corresponding checks using the initialized flag.
- The lock order reversal messages have been improved to output slightly
more accurate file and line numbers.


74900 28-Mar-2001 jhb

- Switch from using save/disable/restore_intr to using critical_enter/exit
and change the u_int mtx_saveintr member of struct mtx to a critical_t
mtx_savecrit.
- On the alpha we no longer need a custom _get_spin_lock() macro to avoid
an extra PAL call, so remove it.
- Partially fix using mutexes with WITNESS in modules. Change all the
_mtx_{un,}lock_{spin,}_flags() macros to accept explicit file and line
parameters and rename them to use a prefix of two underscores. Inside
of kern_mutex.c, generate wrapper functions for
_mtx_{un,}lock_{spin,}_flags() (only using a prefix of one underscore)
that are called from modules. The macros mtx_{un,}lock_{spin,}_flags()
are mapped to the __mtx_* macros inside of the kernel to inline the
usual case of mutex operations and map to the internal _mtx_* functions
in the module case so that modules will use WITNESS and KTR logging if
the kernel is compiled with support for it.


74897 28-Mar-2001 jhb

- Add the new critical_t type used to save state inside of critical
sections.
- Add implementations of the critical_enter() and critical_exit() functions
and remove restore_intr() and save_intr().
- Remove the somewhat bogus disable_intr() and enable_intr() functions on
the alpha as the alpha actually uses a priority level and not simple bit
flag on the CPU.


74746 24-Mar-2001 ume

Unbreak build on alpha.
- Move in_port_t to sys/types.h.
- Nuke in_addr_t from each endian.h.

Reported by: jhb


74733 24-Mar-2001 jhb

- Define and use MAXCPU like the alpha and i386 instead of NCPUS.
- Sort the sys/mutex.h include in mp_machdep.c into a closer to correct
location.


74732 24-Mar-2001 jhb

Stick a prototype for handleclock() in machine/clock.h and include it
interrupt.c to quiet a warning.


74670 23-Mar-2001 tmm

Export intrnames and intrcnt as sysctls (hw.nintr, hw.intrnames and
hw.intrcnt).

Approved by: rwatson


74384 17-Mar-2001 peter

Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash).
Make the name cache hash as well as the nfsnode hash use it.

As a special tweak, create an unsigned version of register_t. This allows
us to use a special tweak for the 64 bit versions that significantly
speeds up the i386 version (ie: int64 XOR int64 is slower than int64
XOR int32).

The code layout is a little strange for the string function, but I was
able to get between 5 to 10% improvement over the original version I
started with. The layout affects gcc code generation choices and this way
was fastest on x86 and alpha.

Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is
around 45% faster with -march=pentiumpro on a p6 cpu.


74016 09-Mar-2001 jhb

Fix mtx_legal2block. The only time that it is bad to block on a mutex is
if we hold a spin mutex, since we can trivially get into deadlocks if we
start switching out of processes that hold spinlocks. Checking to see if
interrupts were disabled was a sort of cheap way of doing this since most
of the time interrupts were only disabled when holding a spin lock. At
least on the i386. To fix this properly, use a per-process counter
p_spinlocks that counts the number of spin locks currently held, and
instead of checking to see if interrupts are disabled in the witness code,
check to see if we hold any spin locks. Since child processes always
start up with the sched lock magically held in fork_exit(), we initialize
p_spinlocks to 1 for child processes. Note that proc0 doesn't go through
fork_exit(), so it starts with no spin locks held.

Consulting from: cp


73555 04-Mar-2001 dfr

Fix a couple of typos which became obvious when I started to actually use
this on real hardware.


72988 24-Feb-2001 jhb

Clockframes have a trapframe stored in a cf_tf member, not ct_tf.


72985 24-Feb-2001 jhb

Whitespace nits.


72984 24-Feb-2001 jhb

Pass in process to mark ast on to aston().


72907 22-Feb-2001 jhb

Axe pcb_schednest as it is no longer used.


72906 22-Feb-2001 jhb

Rename switch_trampoline() to fork_trampoline() on the alpha and ia64.

Suggested by: dfr


72896 22-Feb-2001 jhb

- Axe the now unused ASS_* assertions for interrupt status.
- Use ia64_get_psr() instead of save_intr() in mtx_legal2block().


72895 22-Feb-2001 jhb

Add a inline function to read the psr.


72894 22-Feb-2001 jhb

Add a mtx_intr_enable() macro.


72893 22-Feb-2001 jhb

Axe the astpending per-cpu variable.


72892 22-Feb-2001 jhb

Add TRAPF_PC() and TRAPF_USERMODE() macros and redefine CLKF_PC() and
CLKF_USERMODE() in terms of them.


72884 22-Feb-2001 jhb

Catch up to new MI astpending and need_resched handling.


72569 17-Feb-2001 ume

Correct disordering which is corresponding to bde's fix to
i386/include/ansi.h.


72510 15-Feb-2001 ume

Correct 2nd argument of getnameinfo(3) to socklen_t.

Reviewed by: itojun


72358 11-Feb-2001 markm

RIP <machine/lock.h>.

Some things needed bits of <i386/include/lock.h> - cy.c now has its
own (only) copy of the COM_(UN)LOCK() macros, and IMASK_(UN)LOCK()
has been moved to <i386/include/apic.h> (AKA <machine/apic.h>).
Reviewed by: jhb


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


71803 29-Jan-2001 dfr

Flesh out EFI support somewhat.


71731 28-Jan-2001 marcel

Add gd_witness_spin_check.


71730 28-Jan-2001 marcel

Fix typo.


71596 24-Jan-2001 dfr

Change cpuno to cpuid.


71576 24-Jan-2001 jasone

Convert all simplelocks to mutexes and remove the simplelock implementations.


71554 24-Jan-2001 jhb

- Proc locking.
- P_OWEUPC -> PS_OWEUPC.


71337 21-Jan-2001 jake

Make intr_nesting_level per-process, rather than per-cpu. Setup
interrupt threads to run with it always >= 1, so that malloc can
detect M_WAITOK from "interrupt" context. This is also necessary
in order to context switch from sched_ithd() directly.

Reviewed By: peter


71103 16-Jan-2001 phk

These files have been on deathrow for a couple of months, no appeal.


70952 12-Jan-2001 jake

Remove unused per-cpu variables inside_intr and ss_eflags.


70928 11-Jan-2001 jake

- Remove compatibility macros for accessing per-cpu variables.
__FreeBSD_version 500015 can be used to detect their disappearance.
- Move the symbols for SMP_prvspace and lapic from globals.s to
locore.s.
- Remove globals.s with extreme prejudice.


70789 08-Jan-2001 obrien

Put VCS ids in a consistent place and form.


70786 08-Jan-2001 obrien

Remove seconds types we don't use that came in thru the NetBSD heiratage.


70723 06-Jan-2001 jake

Implement accessors for per-cpu variables which don't depend on the
symbols in globals.s.

PCPU_GET(name) returns the value of the per-cpu variable
PCPU_PTR(name) returns a pointer to the per-cpu variable
PCPU_SET(name, val) sets the value of the per-cpu variable

In general these are not yet used, compatibility macros remain.

Unifdef SMP struct globaldata, this makes variables such as cpuid
available for UP as well.

Rebuilding modules is probably a good idea, but I believe old
modules will still work, as most of the old infrastructure
remains.


70508 30-Dec-2000 dfr

Merge ALIGN changes from alpha code.


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69379 30-Nov-2000 marcel

Don't use p->p_sigstk.ss_flags to keep state of whether the
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.

We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.

PR: 22286


69003 21-Nov-2000 markm

Add a consistent API to a feature that most modern CPUs have; a fast
counter register in-CPU.

This is to be used as a fast "timer", where linearity is more important
than time, and multiple lines in the linearity caused by multiple CPUs
in an SMP machine is not a problem.

This adds no code whatsoever to the FreeBSD kernel until it is actually
used, and then as a single-instruction inline routine (except for the
80386 and 80486 where it is some more inline code around nanotime(9).

Reviewed by: bde, kris, jhb


68974 20-Nov-2000 phk

Make programs which still #include <machine/{mouse,console}.h> fail
at compiletime, with an explanatory error message. Previously they
would only get a warning.

These files will be finally removed 2001-01-15


68520 09-Nov-2000 marcel

Make MINSIGSTKSZ machine dependent, and have the sigaltstack
syscall compare against a variable sv_minsigstksz in struct
sysentvec as to properly take the size of the machine- and
ABI dependent struct sigframe into account.

The SVR4 and iBCS2 modules continue to have a minsigstksz of
8192 to preserve behavior. The real values (if different) are
not known at this time. Other ABI modules use the real
values.

The native MINSIGSTKSZ is now defined as follows:

Arch MINSIGSTKSZ
---- -----------
alpha 4096
i386 2048
ia64 12288

Reviewed by: mjacob
Suggested by: bde


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67540 25-Oct-2000 jhb

Implement atomic_{set,clear,add,subtract}_{acq_,rel_,}_ptr()


67522 24-Oct-2000 dfr

* Various fixes to breakage introduced by the atomic and mutex reorgs.
* Fixes to the signal delivery code. Not quite right yet.

I would have preferred to wait until I have signal delivery actually
working but the current kernel in CVS doesn't build.


67488 24-Oct-2000 obrien

Adjust comments
Submitted by: bde

Add ISO C99's long long type limits.
Reviewed by: bde


67474 23-Oct-2000 mjacob

CURTHD now defines in globals.h


67403 20-Oct-2000 jhb

Define the mtx_legal2block() macro used in the witness code that managed
to get lost during the MI mutex conversion.

Reported by: Steve Kargl <sgk@troutmask.apl.washington.edu>


67352 20-Oct-2000 jhb

- Make the mutex code almost completely machine independent. This greatly
reducues the maintenance load for the mutex code. The only MD portions
of the mutex code are in machine/mutex.h now, which include the assembly
macros for handling mutexes as well as optionally overriding the mutex
micro-operations. For example, we use optimized micro-ops on the x86
platform #ifndef I386_CPU.
- Change the behavior of the SMP_DEBUG kernel option. In the new code,
mtx_assert() only depends on INVARIANTS, allowing other kernel developers
to have working mutex assertiions without having to include all of the
mutex debugging code. The SMP_DEBUG kernel option has been renamed to
MUTEX_DEBUG and now just controls extra mutex debugging code.
- Abolish the ugly mtx_f hack. Instead, we dynamically allocate
seperate mtx_debug structures on the fly in mtx_init, except for mutexes
that are initiated very early in the boot process. These mutexes
are declared using a special MUTEX_DECLARE() macro, and use a new
flag MTX_COLD when calling mtx_init. This is still somewhat hackish,
but it is less evil than the mtx_f filler struct, and the mtx struct is
now the same size with and without mutex debugging code.
- Add some micro-micro-operation macros for doing the actual atomic
operations on the mutex mtx_lock field to make it easier for other archs
to override/optimize mutex ops if needed. These new tiny ops also clean
up the code in some places by replacing long atomic operation function
calls that spanned 2-3 lines with a short 1-line macro call.
- Don't call mi_switch() from mtx_enter_hard() when we block while trying
to obtain a sleep mutex. Calling mi_switch() would bogusly release
Giant before switching to the next process. Instead, inline most of the
code from mi_switch() in the mtx_enter_hard() function. Note that when
we finally kill Giant we can back this out and go back to calling
mi_switch().


67351 20-Oct-2000 jhb

- Expand the set of atomic operations to optionally include memory barriers
in most of the atomic operations. Now for these operations, you can
use the normal atomic operation, you can use the operation with a read
barrier, or you can use the operation with a write barrier. The function
names follow the same semantics used in the ia64 instruction set. An
atomic operation with a read barrier has the extra suffix 'acq', due to
it having "acquire" semantics. An atomic operation with a write barrier
has the extra suffix 'rel'. These suffixes are inserted between the
name of the operation to perform and the typename. For example, the
atomic_add_int() function now has 3 variants:
- atomic_add_int() - this is the same as the previous function
- atomic_add_acq_int() - this function combines the add operation with a
read memory barrier
- atomic_add_rel_int() - this function combines the add operation with a
write memory barrier
- Add 'ptr' to the list of types that we can perform atomic operations
on. This allows one to do atomic operations on uintptr_t's. This is
useful in the mutex code, for example, because the actual mutex lock is
a pointer.
- Add two new operations for doing loads and stores with memory barriers.
The new load operations use a read barrier before the load, and the
new store operations use a write barrier after the load. For example,
atomic_load_acq_int() will atomically load an integer as well as
enforcing a read barrier.


67350 20-Oct-2000 jhb

Axe the barrier_{read,write,rw}() helper functions as this method of
doing memory barriers doesn't really scale well for the ia64. Also,
memory barriers are more a property of the CPU than bus space.

Requested by: dfr


67285 18-Oct-2000 jhb

Add in a simple API for memory barriers to machine/bus.h:
- barrier_read() enforces a memory read barrier
- barrier_write() enforces a memory write barrier
- barrier_rw() enforces a memory read/write barrier


67247 17-Oct-2000 ps

Implement write combining for crashdumps. This is useful when
write caching is disabled on both SCSI and IDE disks where large
memory dumps could take up to an hour to complete.

Taking an i386 scsi based system with 512MB of ram and timing (in
seconds) how long it took to complete a dump, the following results
were obtained:

Before: After:
WCE TIME WCE TIME
------------------ ------------------
1 141.820972 1 15.600111
0 797.265072 0 65.480465

Obtained from: Yahoo!
Reviewed by: peter


67199 16-Oct-2000 dfr

* Correct some of my misunderstandings about how best to switch to the
kernel backing store.
* Implement syscalls via break instructions.
* Fix backing store copying in cpu_fork() so that the child gets the right
register values.

This thing is actually starting to work now. This set of changes takes me
up to the second execve (the one which runs the first shell). Next stop
single-user mode :-).


67158 15-Oct-2000 phk

Move DELAY() from <machine/clock.h> to <sys/systm.h>


67032 12-Oct-2000 dfr

Implement a rudimentary interrupt handling system which should be good
enough for clock interrupts in SKI.


67020 12-Oct-2000 dfr

* Fix exception handling so that it actually works. We can now handle
exceptions from both kernel and user mode.
* Fix context switching so that we can switch back to a proc which we
switched away from (we were saving the state in the wrong place).
* Implement lazy switching of the high-fp state. This needs to be looked
at again for SMP to cope with the case of a process migrating from one
processor to another while it has the high-fp state.
* Make setregs() work properly. I still think this should be called
cpu_exec() or something.
* Various other minor fixes.

With this lot, we can execve() /sbin/init and we get all the way up to its
first syscall. At that point, we stop because syscall handling is not done
yet.


66937 10-Oct-2000 dfr

* Add rudimentary DDB support (no kgdb, no backtrace, no single step).
* Track recent changes to SWI code.
* Allocate RIDs for pmaps (untested).
* Implement assembler version of cpu_switch - its cleaner that way.


66860 09-Oct-2000 phk

Initiate deorbit burn sequence for <machine/mouse.h>.

Replace all in-tree uses with <sys/mouse.h> which repo-copied a few
moments ago from src/sys/i386/include/mouse.h by peter.
This is also the appropriate fix for exo-tree sources.

Put warnings in <machine/mouse.h> to discourage use.
November 15th 2000 the warnings will be converted to errors.
January 15th 2001 the <machine/mouse.h> files will be removed.


66834 08-Oct-2000 phk

Initiate deorbit burn sequence for <machine/console.h>.

Replace all in-tree uses with necessary subset of <sys/{fb,kb,cons}io.h>.
This is also the appropriate fix for exo-tree sources.

Put warnings in <machine/console.h> to discourage use.
November 15th 2000 the warnings will be converted to errors.
January 15th 2001 the <machine/console.h> files will be removed.


66802 08-Oct-2000 bmilekic

Cleanup comment in machine/param.h regarding mbuf-related sizes, and get rid
of MCLOFSET, which does not appear to be used anywhere anymore, and if it is,
it probably shouldn't be.


66739 06-Oct-2000 bde

Work around a bug by adding struct tags. gcc-2.95 apparently gets the
check in the [basic.link] section of the C++ standard wrong. gcc-2.7.2.3
apparently doesn't do the check, so the bug doesn't affect RELENG_3.

PR: 16170, 21427
Submitted by: Max Khon <fjoe@lark.websci.ru> (i386 version)
Discussed with: jdp


66721 06-Oct-2000 jasone

Reduce userland namespace polution.

#include <proc.h>, since curproc is needed.


66633 04-Oct-2000 dfr

Next round of fixes to the ia64 code. This includes simulated clock and
disk drivers along with a load of fixes to context switching, fork
handling and a load of other stuff I can't remember now. This takes us as
far as start_init() before it dies. I guess now I will have to finish off
the VM system and syscall handling :-).


66486 30-Sep-2000 dfr

Next round of ia64 work, including fixes to context switching,
implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these
changes, my test kernel reaches the mountroot prompt.


66462 29-Sep-2000 dfr

Bodge the simplelocks in a way which works UP.


66459 29-Sep-2000 dfr

Add a few more files for the ia64 port.


66458 29-Sep-2000 dfr

This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will
not work on any real hardware (or fully work on any simulator). Much more
needs to happen before this is actually functional but its nice to see
the FreeBSD copyright message appear in the ia64 simulator.