Cross Reference: /freebsd-10.0-release/sys/ia64/ia64/machdep.c

History log of /freebsd-10.0-release/sys/ia64/ia64/machdep.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 259065	07-Dec-2013	gjb	- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1 [1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.0-release /freebsd-10.0-release/sys/conf/newvers.sh /freebsd-10.0-release/sys/sys/param.h
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 247454	28-Feb-2013	davide	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.
# 246715	12-Feb-2013	marcel	Eliminate the PC_CURTHREAD symbol and load the current thread's thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu. The only non-atomic operations left were those related to process- space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations. The casuword function has been re-structured more complete than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.
# 238257	08-Jul-2012	marcel	Move PCPU initialization to a new function called cpu_pcpu_setup(). This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.
# 238190	07-Jul-2012	marcel	Implement ia64_physmem_alloc() and use it consistently to get memory before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP. Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.
# 238184	06-Jul-2012	marcel	Hide the creation of phys_avail behind an API to make it easier to do it correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API. Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be delated from the available physical memory. While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largesty physical address. The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem. The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.
# 232250	28-Feb-2012	gavin	Correct capitalization of "Hz" in user-visible text (manpages, printf(), etc). MFC after: 3 days
# 227309	07-Nov-2011	ed	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 223526	25-Jun-2011	marcel	Switch to the event timers infrastructure. This includes: o Setting td_intr_frame to the XIVs trap frame because it's referenced by the ET event handler. o Signal EOI to the CPU before calling the registered XIV handlers. This prevents lost ITC interrupts, which cause starvation in one-shot mode. o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters. o Have the APs call cpu_initclocks() so as to limited the scattering of clock related initialization. cpu_initclocks() calls the <self>_bsp() or <self>_ap() version accordingly. o Uncomment the ET clock handling in cpu_idle(). o Update the DDB 'show pcpu' output for the new MD fields. o Entirely rewritten ia64_ih_clock(). Note that we don't create as many clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale. We can only have 240 XIVs and we can have more CPUs than that. There's a single intrcnt index for the cumulative clock ticks and we keep per CPU counts in the PCPU stats structure. o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order). Open issues: o Clock interrupts can still be lost. Some tweaking is still necessary. Thanks to: mav@ for his support, feedback and explanations. ET stats while committing: eris% sysctl machdep.cpu \| grep nclks machdep.cpu.0.nclks: 24007 machdep.cpu.1.nclks: 22895 machdep.cpu.2.nclks: 13523 machdep.cpu.3.nclks: 9342 machdep.cpu.4.nclks: 9103 machdep.cpu.5.nclks: 9298 machdep.cpu.6.nclks: 10039 machdep.cpu.7.nclks: 9479 eris% vmstat -i \| grep clock clock 108599 50
# 223478	23-Jun-2011	marcel	Unblock the outgoing thread after we performed pmap_switch() to switch the region registers. pmap_switch() returns the pmap for which the region register are currently programmed, which needs to be re-programmed on the CPU the ougoing thread gets switched in. This change does not noticibly change anything or fix known bugs, but does give me a warm fuzzy feeling by being more correct.
# 222971	11-Jun-2011	marcel	Add the model number for the Montvale processor (marketed as Itanium 2 9100). At this time we're missing just one: Tukwila (Itanium 2 9300).
# 222800	06-Jun-2011	marcel	Call set_cputicker() to have the time counter use the ITC register. Note that the ITC frequency is fixed.
# 222769	06-Jun-2011	marcel	Improve cpu_idle(): o cpu_idle_hook is expected to be called with interrupts disabled and re-enables interrupts on return. o sync with x86: don't idle when the CPU has runnable tasks o have callers of ia64_call_pal_static() disable interrupts and re-enable interrupts. o add, but compile-out, support for idle mode. This will be enabled at some later time, after proper testing.
# 222531	31-May-2011	nwhitehorn	On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
# 221271	30-Apr-2011	marcel	Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.
# 219841	21-Mar-2011	marcel	Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.
# 219741	18-Mar-2011	marcel	Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.
# 219523	11-Mar-2011	mdf	Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array. Keep one change from strncpy to strlcpy.
# 219468	10-Mar-2011	mdf	Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistenly use strlcpy(). Suggested by: Warner Losh
# 217688	21-Jan-2011	pluknet	Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize. Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
# 215052	09-Nov-2010	jhb	Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.
# 214835	05-Nov-2010	jhb	Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month
# 209613	30-Jun-2010	jhb	Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
# 205660	25-Mar-2010	nwhitehorn	Fix the ia64 build. Pointy hat to: me
# 205642	25-Mar-2010	nwhitehorn	Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb
# 205357	20-Mar-2010	marcel	Don't check for boot_verbose in the environment. The loader does that already and sets RB_VERBOSE. The loader has always done it.
# 205234	16-Mar-2010	marcel	Revamp the interrupt code based on the previous commit: o Introduce XIV, eXternal Interrupt Vector, to differentiate from the interrupts vectors that are offsets in the IVT (Interrupt Vector Table). There's a vector for external interrupts, which are based on the XIVs. o Keep track of allocated and reserved XIVs so that we can assign XIVs without hardcoding anything. When XIVs are allocated, an interrupt handler and a class is specified for the XIV. Classes are: 1. architecture-defined: XIV 15 is returned when no external interrupt are pending, 2. platform-defined: SAL reports which XIV is used to wakeup an AP (typically 0xFF, but it's 0x12 for the Altix 350). 3. inter-processor interrupts: allocated for SMP support and non-redirectable. 4. device interrupts (i.e. IRQs): allocated when devices are discovered and are redirectable. o Rewrite the central interrupt handler to call the per-XIV interrupt handler and rename it to ia64_handle_intr(). Move the per-XIV handler implementation to the file where we have the XIV allocation/reservation. Clock interrupt handling is moved to clock.c. IPI handling is moved to mp_machdep.c. o Drop support for the Intel 8259A because it was broken. When XIV 0 is received, the CPU should initiate an INTA cycle to obtain the interrupt vector of the 8259-based interrupt. In these cases the interrupt controller we should be talking to WRT to masking on signalling EOI is the 8259 and not the I/O SAPIC. This requires adriver for the Intel 8259A which isn't available for ia64. Thus stop pretending to support ExtINTs and instead panic() so that if we come across hardware that has an Intel 8259A, so have something real to work with. o With XIVs for IPIs dynamically allocatedi and also based on priority, define the IPI_* symbols as variables rather than constants. The variable holds the XIV allocated for the IPI. o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV assigned to IPI_STOP is delivered.
# 205172	15-Mar-2010	marcel	Have cpu_throw() loop on blocked_lock as well. This bug has existed a long time and has gone unnoticed just as long, because I kept using sched_4bsd (due to sched_ule not working with preemption), but GENERIC had sched_ule by default -- including SMP. While here, remove unused inclusion of <machine/clock.h>, remove totally bogus inclusion of <i386/include/specialreg.h>.
# 205014	11-Mar-2010	nwhitehorn	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
# 203883	14-Feb-2010	marcel	Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.
# 203054	27-Jan-2010	marcel	In cpu_switch(), use an atomic operation to set the td_lock of the old thread to the mutex that's passed. Pointed out by: attilio, jhb
# 202904	23-Jan-2010	marcel	Remove cpu_boot() and call efi_reset_system() directly from cpu_reset().
# 201269	30-Dec-2009	marcel	Revamp bus_space access functions: o Optimize for memory mapped I/O by making all I/O port acceses function calls and marking the test for the IA64_BUS_SPACE_IO tag with __predict_false(). Implement the I/O port access functions in a new file, called bus_machdep.c. o Change the bus_space_handle_t for memory mapped I/O to the virtual address rather than the physical address. This eliminates the PA->VA translation for every I/O access. The handle for I/O port access is still the port number. o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string variants from cpufunc.h and define them in bus.h. On ia64 these are not CPU functions at all. In bus.h they are merely aliases for the new I/O port access functions defined in bus_machdep.h. o Handle the ACPI resource bug in nexus_set_resource(). There we can do it once so that we don't have to worry about it whenever we need to write to an I/O port that is really a memory mapped address. The upshot of this change is that the KBI is better defined and that I/O port access always involves a function call, allowing us to change the actual implementation without breaking the KBI. For memory mapped I/O the virtual address is abstracted, so that we can change the VA->PA mapping in the kernel without causing an KBI breakage. The exception at this time is for bus_space_map() and bus_space_unmap(). MFC after: 1 week.
# 200889	23-Dec-2009	marcel	Export the bus, cpu and itc frequencies under the hw.freq sysctl node. The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The frequencies are rounded to the nearest whole MHz. While here, rename and re-type bus_frequency, processor_frequency and itc_frequency to bus_freq, cpu_freq and itc_freq and make them static. As unsigned integers, the hw.freq.cpu sysctl can more easily be made generic (across all architectures) making porting easier. MFC after: 3 days
# 200207	07-Dec-2009	marcel	Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id excluded, as it's used by MI code) and mode the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly. While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.
# 200051	03-Dec-2009	marcel	Make sure bus space accesses use unorder memory loads and stores. Memory accesses are posted in program order by virtue of the uncacheable memory attribute. Since GCC, by default, adds acquire and release semantics to volatile memory loads and stores, we need to use inline assembly to guarantee it. With inline assembly, we don't need volatile pointers anymore. Itanium does not support semaphore instructions to uncacheable memory.
# 199893	28-Nov-2009	marcel	Eliminate teh use of MAXCPU in static arrays of interrupt counters by adding statistics counters to the PCPU structure. Export the counters through sysctl by giving each PCPU structure its own sysctl context. While here, fix cnt.v_intr by not just having it count clock interrupts, but every interrupt and add more counters for each interrupt source.
# 198733	31-Oct-2009	marcel	Reimplement the lazy FP context switching: o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o use msleep() to wait for the other CPU to save the high FP registers. This change fixes the high FP inconsistency panics. A single global lock typically serializes too much, which may be noticable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronuously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use the high FP registers.
# 198507	27-Oct-2009	kib	In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month
# 196268	15-Aug-2009	marcel	Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be sparse, which causes a kernel assert. Approved by: re (kensmith)
# 194784	23-Jun-2009	jeff	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
# 192324	18-May-2009	marcel	Rename ia64_invalidate_icache() to ia64_sync_icache(). We're not invalidating anything.
# 192323	18-May-2009	marcel	Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet. Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.
# 180354	07-Jul-2008	marcel	Add inline function ia64_fc_i() to abstract inline assembly. Use the new inline function in ia64_invalidate_icache(). While there, add proper synchronization so that we know the fc.i instructions have taken effect when we return.
# 179229	23-May-2008	alc	The VM system no longer uses setPQL2(). Remove it and its helpers.
# 179173	21-May-2008	marcel	We can call ia64_flush_dirty() when the corresponding process is locked or not. As such, use PROC_LOCKED() to determine which case it is and lock the process when not.
# 178494	25-Apr-2008	marcel	Unbreak previous commit. While here, refactor the code a bit.
# 178471	25-Apr-2008	jeff	- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted. Sponsored by: Nokia
# 178429	22-Apr-2008	phk	Now that all platforms use genclock, shuffle things around slightly for better structure. Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait. In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only. Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h> <sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof. Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references. Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere. Remove a lot of unnecessary <sys/clock.h> includes. Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
# 178215	15-Apr-2008	marcel	Support and switch to the ULE scheduler: o Implement IPI_PREEMPT, o Set td_lock for the thread being switched out, o For ULE & SMP, loop while td_lock points to blocked_lock for the thread being switched in, o Enable ULE by default in GENERIC and SKI,
# 177769	30-Mar-2008	marcel	Better implement I-cache invalidation. The previous implementation was a kluge. This implementation matches the behaviour on powerpc and sparc64. While on the subject, make sure to invalidate the I-cache after loading a kernel module. MFC after: 2 weeks
# 177642	26-Mar-2008	phk	The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers. The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq() the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds. Drop entirely timer2 on pc98, it is not used anywhere at all. Move sysbeep() to kern/tty_cons.c and use the timer_spkr() if they exist, and do nothing otherwise. Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs. This eliminate the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_() functions existing but the driver does not know this yet and still attaches to the ISA bus. Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine. In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important. Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz. This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].
# 177253	16-Mar-2008	rwatson	In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
# 177126	12-Mar-2008	jeff	- Fix build breakage; there was a reference to a removed syscall in a KASSERT(). Attempt to cleanup the comment to reflect reality.
# 177091	12-Mar-2008	jeff	Remove kernel support for M:N threading. While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
# 176286	14-Feb-2008	marcel	On Montecito processors, the instruction cache is in fact not coherent with the data caches. Implement a quick fix to allow us to boot on Montecito, while I'm working on a better fix in the mean time. Commit made on Montecito-based Itanium...
# 175959	04-Feb-2008	marcel	Allocate a stack for thread0 and switch to it before calling mi_startup(). This frees up kstack for static PAL/SAL calls and double-fault handling.
# 174898	25-Dec-2007	rwatson	Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run. Assign approximate why values to all current consumers of the kdb_enter() interface.
# 173615	14-Nov-2007	marcel	o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc(). i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc(). PR: ia64/118024
# 173361	05-Nov-2007	kib	Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
# 171720	04-Aug-2007	marcel	Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add data-serialization after writing to cr.tpr. Approved by: re (blanket)
# 171663	30-Jul-2007	marcel	Explicitly map the VHPT on all processors. Previously we were merely lucky that the VHPT was mapped as a side-effect of mapping the kernel, but when there's enough physical memory, this may not at all be the case. Approved by: re (blanket)
# 170519	10-Jun-2007	alc	Add the machine-specific definitions for configuring the new physical memory allocator. Set the size of phys_avail[] using one of these definitions. Approved by: re
# 170507	10-Jun-2007	marcel	Work around a firmware bug in the HP rx2660, where in ACPI an I/O port is really a memory mapped I/O address. The bug is in the GAS that describes the address and in particular the SpaceId field. The field should not say the address is an I/O port when it clearly is not. With an additional check for the IA64_BUS_SPACE_IO case in the bus access functions, and the fact that I/O ports pretty much not used in general on ia64, make the calculation of the I/O port address a function. This avoids inlining the work-around into every driver, and also helps reduce overall code bloat.
# 170444	08-Jun-2007	marcel	Physical memory regions can be larger than INT_MAX. Change size1 from an int to a long to avoid printing negative byte and page counts.
# 170403	07-Jun-2007	marcel	Remove remaining references to pc_curtid missed in previous commit.
# 170390	06-Jun-2007	davidxu	Fix compiling error.
# 170306	04-Jun-2007	jeff	Commit 13/14 of sched_lock decomposition. - Add a new parameter to cpu_switch() that is used to release the lock on the outgoing thread and properly acquire the lock on the incoming thread. This parameter is not required for schedulers that don't do per-cpu locking and architectures which do not support it may continue to use the 4BSD scheduler. This feature is presently not supported on ia64 Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
# 170170	31-May-2007	attilio	Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately. Requested by: alc Approved by: jeff (mentor)
# 169667	18-May-2007	jeff	- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines. Contributed by: Attilio Rao <attilio@FreeBSD.org>
# 169291	05-May-2007	alc	Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing. This change is a combination of Nathan Whitehorn's patch and my own work in perforce. Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194
# 165369	20-Dec-2006	davidxu	Add a lwpid field into per-cpu structure, the lwpid represents current running thread's id on each cpu. This allow us to add in-kernel adaptive spin for user level mutex. While spinning in user space is possible, without correct thread running state exported from kernel, it hardly can be implemented efficiently without wasting cpu cycles, however exporting thread running state unlikely will be implemented soon as it has to design and stablize interfaces. This implementation is transparent to user space, it can be disabled dynamically. With this change, mutex ping-pong program's performance is improved massively on SMP machine. performance of mysql super-smack select benchmark is increased about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems which have bunch of cpus and system-call overhead is low (athlon64, opteron, and core-2 are known to be fast), the adaptive spin does help performance. Added sysctls: kern.threads.umtx_dflt_spins if the sysctl value is non-zero, a zero umutex.m_spincount will cause the sysctl value to be used a spin cycle count. kern.threads.umtx_max_spins the sysctl sets upper limit of spin cycle count. Tested on: Athlon64 X2 3800+, Dual Xeon 5130
# 164936	06-Dec-2006	julian	Threading cleanup.. part 2 of several. Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it. Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable. The ULE scheduler compiles again but I have no idea if it works. The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit. Tested by David Xu, and Dan Eischen using libthr and libpthread.
# 164395	18-Nov-2006	marcel	Since printf also has at least one critical section, we need to initialize pc_curthread. While here, rename early_pcpu to pcpu0 to be conistent (compare thread0 and proc0).
# 164392	18-Nov-2006	marcel	Now that printf() needs the PCPU, set it up before we call printf(). Change the pc_pcb field from a pointer to struct pcb to struct pcb so that sizeof(struct pcb) includes the PCB we use for IPI_STOP. Statically declare early_pcb so that we don't have to allocate the PCB for thread0. This way we can setup the PCPU before cninit() and thus before we use printf().
# 163928	03-Nov-2006	marcel	Make sure kern_envp is never NULL. If we don't get a pointer to the environment from the loader, use the static environment.
# 163709	26-Oct-2006	jb	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@
# 159850	21-Jun-2006	marcel	Identify the cual-core Montecito. MFC after: 3 days
# 155922	22-Feb-2006	jhb	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the processing read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simualted via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while the mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week
# 155680	14-Feb-2006	jhb	Fix the hw.realmem sysctl. The global realmem variable is a count of pages, not a count of bytes. The sysctl handler for hw.realmem already uses ctob() to convert realmem from pages to bytes. Thus, on archs that were storing a byte count in the realmem variable, hw.realmem was inflated. Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha) MFC after: 3 days
# 153940	31-Dec-2005	netchild	MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible) MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's) Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not. Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
# 153165	06-Dec-2005	ru	Fix -Wundef warnings from compiling GENERIC and LINT kernels of all architectures.
# 151316	14-Oct-2005	davidxu	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
# 149915	09-Sep-2005	marcel	Change the High FP lock from a sleep lock to a spin lock. We can take the lock from interrupt context, which causes an implicit lock order reversal. We've been using the lock carefully enough that making it a spin lock should not be harmful.
# 148807	06-Aug-2005	marcel	Improve SMP support: o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU uses to look up translations it can't find in the TLB. As such, the VHPT serves as a level 1 cache (the TLB being a level 0 cache) and best results are obtained when it's not shared between CPUs. The collision chain (i.e. the hash bucket) is shared between CPUs, as all buckets together constitute our collection of PTEs. To achieve this, the collision chain does not point to the first PTE in the list anymore, but to a hash bucket head structure. The head structure contains the pointer to the first PTE in the list, as well as a mutex to lock the bucket. Thus, each bucket is locked independently of each other. With at least 1024 buckets in the VHPT, this provides for sufficiently finei-grained locking to make the ssolution scalable to large SMP machines. o Add synchronisation to the lazy FP context switching. We do this with a seperate per-thread lock. On SMP machines the lazy high FP context switching without synchronisation caused inconsistent state, which resulted in a panic. Since the use of the high FP registers is not common, it's possible that races exist. The ia64 package build has proven to be a good stress test, so this will get plenty of exercise in the near future. o Don't use the local ID of the processor we want to send the IPI to as the argument to ipi_send(). use the struct pcpu pointer instead. The reason for this is that IPI delivery is unreliable. It has been observed that sending an IPI to a CPU causes it to receive a stray external interrupt. As such, we need a way to make the delivery reliable. The intended solution is to queue requests in the target CPU's per-CPU structure and use a single IPI to inform the CPU that there's a new entry in the queue. If that IPI gets lost, the CPU can check it's queue at any convenient time (such as for each clock interrupt). This also allows us to send requests to a CPU without interrupting it, if such would be beneficial. With these changes SMP is almost working. There are still some random process crashes and the machine can hang due to having the IPI lost that deals with the high FP context switch. The overhead of introducing the hash bucket head structure results in a performance degradation of about 1% for UP (extra pointer indirection). This is surprisingly small and is offset by gaining reasonably/good scalable SMP support.
# 147773	05-Jul-2005	marcel	Enhance ia64_flush_dirty() to handle the case in which td != curthread. This case is triggered with ptrace(2) and the PT_SETREGS function. Change the return type of the function to int so that errors can be passed on to the caller. Approved by: re (scottl)
# 144637	04-Apr-2005	jhb	Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case. Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch. This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example). Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
# 143057	02-Mar-2005	marcel	Make sure fpswa_iface equals NULL when bootinfo.bi_fpswa equals 0. We need to be able to test for the (possible) non-existence of the FPSWA code. PR: ia64/77591 Submitted by: Christian Kandeler (christian dot kandeler at hob dot de) MFC after: 1 day
# 142956	01-Mar-2005	wes	Attempt to doff the pointy hat: implement 'hw.realmem' on remaining architectures. Pointed out by O'Brien, ScottL via email. Reviewed by: obrien (various)
# 141378	05-Feb-2005	njl	Finish the job of sorting all includes and fix the build by including malloc.h before proc.h on sparc64. Noticed by das@ Compiled on: alpha, amd64, i386, pc98, sparc64
# 141248	04-Feb-2005	marcel	Include sys/bus.h before sys/cpu.h. The latter needs device_t.
# 141237	04-Feb-2005	njl	Add an implementation of cpu_est_clockrate(9). This function estimates the current clock frequency for the given CPU id in units of Hz.
# 138543	08-Dec-2004	marcel	Don't obtain the HCDP address directly from the bootinfo structure. Use a function to keep the details at arms length from uart(4).
# 138129	27-Nov-2004	das	Don't include sys/user.h merely for its side-effect of recursively including other headers.
# 137912	20-Nov-2004	das	U areas are going away, so don't allocate one for process 0. Reviewed by: arch@
# 136183	06-Oct-2004	marcel	Add the Madison II, which is the second generation Madison. The Madison II is model 2 in the Itanium 2 family and has up to 9MB of L3 cache and clocks higher than 1.5Ghz. There's no LV variant AFAICT.
# 135590	22-Sep-2004	marcel	Redefine a PTE as a 64-bit integral type instead of a struct of bit-fields. Unify the PTE defines accordingly and update all uses.
# 135453	19-Sep-2004	marcel	MFp4: Completely remove the remaining EFI includes and add our own (type) definitions instead. While here, abstract more of the internals by providing interface functions.
# 135405	17-Sep-2004	marcel	Provide our own FPSWA definitions, instead of depending on the Intel EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers conflict with the Intel ACPI headers (duplicate type definitions), so are being phased out in the kernel.
# 134791	05-Sep-2004	julian	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
# 133878	16-Aug-2004	marcel	Catch up with the drive-by renaming of IA32 to COMPAT_IA32. It must have been rush hour... While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on amd64. Consequently, it's unsafe to use the option in pcb.h. We now unconditionally have the ia32 specific registers in the PCB. This commit is untested.
# 133472	11-Aug-2004	marcel	In set_regs(), flush the dirty registers onto the backingstore before we update the registers. That way we don't have any dirty registers to worry about and also know that bsp=bspstore, which makes updating the RSE related registers predictable. This is not the end of it. We need more validity checks, but for now this allows us to complete the gdb testsuite without crashing the kernel.
# 133464	11-Aug-2004	marcel	Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
# 133291	07-Aug-2004	marcel	Implement single stepping when we leave the kernel through the EPC syscall path. The basic problem is that we cannot set the single stepping flag directly, because we don't leave the kernel via an interrupt return. So, we need another way to set the single stepping flag. The way we do this is by enabling the lower-privilege transfer trap, which gets raised when we drop the privilege level. However, since we're still running in kernel space (sec), we're not yet done. We clear the lower- privilege transfer trap, enable the taken-branch trap and continue exiting the kernel until we branch into user space. Given the current code, there's a total of two traps this way before we can raise SIGTRAP.
# 132088	13-Jul-2004	davidxu	Add ptrace_clear_single_step(), alpha already has it for years, the function will be used by ptrace to clear a thread's single step state.
# 131945	10-Jul-2004	marcel	Update for the KDB framework: o ksym_start and ksym_end changed type to vm_offset_t. o Make debugging support conditional upon KDB instead of DDB. o Call kdb_enter() instead of breakpoint(). o Remove implementation of Debugger(). o Call kdb_trap() according to the new world order. unwinder: o s/db_active/kdb_active/g o Various s/ddb/kdb/g o Add support for unwinding from the PCB as well as the trapframe. Abuse a spare field in the special register set to flag whether the PCB was actually constructed from a trapframe so that we can make the necessary adjustments. md_var.h: o Add RSE convenience macros. o Add ia64_bsp_adjust() to add or subtract from BSP while taking NaT collections into account.
# 131905	10-Jul-2004	marcel	Implement makectx(). The makectx() function is used by KDB to create a PCB from a trapframe for purposes of unwinding the stack. The PCB is used as the thread context and all but the thread that entered the debugger has a valid PCB. This function can also be used to create a context for the threads running on the CPUs that have been stopped when the debugger got entered. This however is not done at the time of this commit.
# 130344	11-Jun-2004	phk	Deorbit COMPAT_SUNOS. We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
# 126825	10-Mar-2004	marcel	Identify the Deerfield processor. Deerfield is a low-voltage variant based on the Madison core and targeting the low end of the spectrum. Its clock frequency is 1Ghz, whereas Madison starts at 1.3Ghz. Since the CPUID information is the same for Madison and Deerfield, we use the clock frequency to identify the processor. Supposedly the Deerfield only uses 62W, which seems to be less than modern Xeon processors (about 70W) and about half what a Madison would need.
# 126106	22-Feb-2004	marcel	Do not pre-map the I/O port space. On the Intel Tiger 4 this conflicts with a memory mapped I/O range that's immediately before it and is not 256MB aligned. As a result, when an address is accessed in the memory mapped range and a direct mapping is added for it, it overlaps with the pre-mapped I/O port space and causes a machine check. Based on a patch from: arun@
# 124092	03-Jan-2004	davidxu	Make sigaltstack as per-threaded, because per-process sigaltstack state is useless for threaded programs, multiple threads can not share same stack. The alternative signal stack is private for thread, no lock is needed, the orignal P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single thread or Linux clone() based threaded program, there is no semantic changed, because those programs only have one kernel thread in every process. Reviewed by: deischen, dfr
# 123528	13-Dec-2003	marcel	In set_mcontext(), take into account that kse_switchin(2) will eventually be passed an async. context as well as a syscall context. While here, fix a serious bug in that if the trapframe is a syscall frame, but we're restoring an async context, we need to clear the FRAME_SYSCALL flag so that we leave the kernel via exception_restore.
# 123255	07-Dec-2003	marcel	Simplify the contexts created by the kernel and remove the related flags. We now create asynchronous contexts or syscall contexts only. Syscall contexts differ from the minimal ABI dictated contexts by having the scratch registers saved and restored because that's where we keep the syscall arguments and syscall return values. Since this change affects KSE, have it use kse_switchin(2) for the "new" syscall context.
# 122918	20-Nov-2003	marcel	Set the ACPI processor Id in the PCPU structure so that CPU idling on SMP systems has a chance of working. This was a loose end of the implementation of the ACPI Cx idle states. Since our logical CPU Id is the ACPI processor Id, we do not need to jump through hoops to obtain it. Approved: re@ (jhb)
# 122763	15-Nov-2003	njl	Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to dereference the softc.
# 122525	12-Nov-2003	marcel	Remove ia64_highfp_load() now that it's unused.
# 122518	11-Nov-2003	marcel	Further work-out the handling of the high FP registers. The most important change is in cpu_switch() where we disable the high FP registers for the thread that we switch-out if the CPU currently has its high FP registers. This avoids that the high FP registers remain enabled for the thread even when the CPU has unloaded them or the thread migrated to another processor. Likewise, when we switch-in a thread of that has its high FP registers on the CPU, we enable them. This avoids an otherwise harmless, but unnecessary trap to have them enabled. The code that handles the disabled high FP trap (in trap()) has been turned into a critical section for the most part to avoid being preempted. If there's a race, we bail out and have the processor trap again if necessary. Avoid using the generic ia64_highfp_save() function when the context is predictable. The function adds unnecessary overhead. Don't use ia64_highfp_load() for the same reason. The function is now unused and can be removed. These changes make the lazy context switching of the high FP registers in an UP kernel functional.
# 122480	11-Nov-2003	marcel	Save and restore the high FP registers in {g\|s}_mcontext(). Note that we currently do not keep track of whether the thread has actually used the high FP registers before. If not, we should not save them in the context which automaticly means that we also would not restore them from the context. For now, do it unconditionally so that we can reach functional completeness.
# 122479	11-Nov-2003	marcel	Fix a nasty bug that got exposed when the sendsig() and sigreturn() functions switched to using {g\|s}et_mcontext(). The problem is that sigreturn(), being a syscall, can be given an async. context (i.e. one corresponding to an interrupt or trap). When this happens, we try to return to user mode via epc_syscall_return with a trapframe that can only be used to return to user mode via exception_restore. To fix this, we check the frame's flags immediately prior to epc_syscall_return and branch to exception_restore for non-syscall frames. Modify the assertion in set_mcontext() to check that if there's a mismatch, it's because of sigreturn().
# 122389	10-Nov-2003	marcel	In get_mcontext(), do not update bspstore and ndirty in the trapframe. Only update them in the newly created context to reflect the state after copying the dirty registers onto the user stack. If we were to update the trapframe, we lose the state at entry into the kernel. We may need that after we create the context, such as for KSE upcalls. We have to update the trapframe after writing the dirty registers to the user stack for signal delivery to work. But this is best done in sendsig() itself where it applies, not in get_mcontext() where it's done unconditionally.
# 122368	09-Nov-2003	marcel	Use get_mcontext() to construct the signal context in sendsig() and use set_mcontext() to restore the context in sigreturn(). Since we put the syscall number and the syscall arguments in the trapframe (we don't save the scratch registers for syscalls, which allows us to reuse the space to our advantage), create a MD specific flag so that we save the scratch registers even for syscalls. We would not be able to restart a syscall otherwise. The signal trampoline does not need to flush the regiters anymore, because get_mcontext() already handles that. In fact, if we set up the context correctly, we do not need to have a trampoline at all. This change however only minimally changes the trampoline code. In follow-up commits this can be further optimized. Note that normally we preserve cfm and iip in the trapframe created by the EPC syscall path when we restore a context in set_mcontext() because those fields are not normally set for a synchronuous context. The kernel puts the return address and frame info of the syscall stub in there. By preserving these fields we hide this detail from userland which allows us to use setcontext(2) for user created contexts. However, sigreturn() is commonly called from the trampoline, which means that if we preserve cfm and iip in all cases, we would return to the trampoline after the sigreturn(), which means we hit the safety net: we call exit(2). So, we do not preserve cfm and iip when we have a synchronous context that also has scratch registers (the uncommon context created by sendsig() only), under the assumption that if such a context is created in userland, something special is going on and the use of cfm and iip is then just another quirk. All this is invisible in the common case.
# 122364	09-Nov-2003	marcel	Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code. This change is triggered by a need on ia64 to have another knob for get_mcontext().
# 121635	28-Oct-2003	marcel	When switching the RSE to use the kernel stack as backing store, keep the RNAT bit index constant. The net effect of this is that there's no discontinuity WRT NaT collections which greatly simplifies certain operations. The cost of this is that there can be up to 504 bytes of unused stack between the true base of the kernel stack and the start of the RSE backing store. The cost of adjusting the backing store pointer to keep the RNAT bit index constant, for each kernel entry, is negligible. The primary reasons for this change are: 1. Asynchronuous contexts in KSE processes have the disadvantage of having to copy the dirty registers from the kernel stack onto the user stack. The implementation we had so far copied the registers one at a time without calculating NaT collection values. A process that used speculation would not work. Now that the RNAT bit index is constant, we can block-copy the registers from the kernel stack to the user stack without having to worry about NaT collections. They will be in the right place on the user stack. 2. The ndirty field in the trapframe is now also usable in userland. This was previously not the case because ndirty also includes the space occupied by NaT collections. The value could be off by 8, depending on the discontinuity. Now that the RNAT bit index is contants, we have exactly the same number of NaT collection points on the kernel stack as we would have had on the user stack if we didn't switch backing stores. 3. Debuggers and other applications that use ptrace(2) can now copy the dirty registers from the kernel stack (using ptrace(2)) and copy them whereever they want them (onto the user stack of the inferior as might be the case for gdb) without having to worry about NaT collections in the same way the kernel doesn't have to worry about them. There's a second order effect caused by the randomization of the base of the backing store, for it depends on the number of dirty registers the processor happened to have at the time of entry into the kernel. The second order effect is that the RSE will have a better cache utilization as compared to having the backing store always aligned at page boundaries. This has not been measured and may be in practice only minimally beneficial, if at all measurable.
# 121457	24-Oct-2003	marcel	Remove ia64_pack_bundle() and ia64_unpack_bundle(). They are not used anymore.
# 121452	24-Oct-2003	arun	Use a TR of size 1 << IA64_ID_PAGE_SHIFT instead of 16M to avoid overlapping TR/TC entries (which results in a machine check). Note that we don't look at the size of the memory descriptor, because it doesn't guarantee non-overlap. With this change, a UP kernel could boot on a Intel Tiger4 machine with the following options: options LOG2_ID_PAGE_SIZE=26 # 64M options LOG2_PAGE_SIZE=14 # 16K Approved by: marcel
# 121294	20-Oct-2003	marcel	Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.
# 121228	18-Oct-2003	njl	Add the cpu_idle_hook() function pointer so that other idlers can be hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This prepares the way for further ACPI sleep states.
# 121148	17-Oct-2003	marcel	Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.
# 120683	03-Oct-2003	marcel	Swap the syscall caller frame info (i.e. the return pointer and frame marker) and the syscall stub frame info in the trap frame. Previously we stored the stub frame info in (rp,pfs) and the caller frame info in (iip,cfm). This ends up being suboptimal for the following reasons: 1. When we create a new context, such as for an execve(2), we had to set the (rp,pfs) pair for the entry point when using the syscall path out of the kernel but we need to set the (iip,cfm) pair when we take the interrupt way out. This is mostly just an inconsistency from the kernel's point of view, but an ugly irregularity from gdb(1)'s point of view. 2. The getcontext(2) and setcontext(2) syscalls had to swap the (rp,pfs) and (iip,cfm) pairs to make the context compatible with one created purely in userland. Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal handlers that actually peek at the mcontext_t and to gdb(1). Since this change is made for gdb(1) and we don't care about signal handlers that peek at the mcontext_t because we're still a tier 2 platform, this ABI breakage is academic at this moment in time. Note that there was no real reason to save the caller frame info in (iip,cfm) and the stub frame info in (rp,pfs).
# 120296	20-Sep-2003	marcel	Fix the last remaining problem encountered by KSE: apparently it is not guaranteed that the RSE writes the NaT collection immediately, sort of atomically, to the backing store when it writes the register immediately prior to the NaT collection point. This means that we cannot assume that the low 9 bits of the backingstore pointer do not point to the NaT collection. This is rather a surprise and I don't know at this time if it's a bug in the Merced or that it's actually a valid condition of the architecture. A quick scan over the sources does not indicate that we depend on the false assumption elsewhere, but it's something to keep in mind. The fix is to write the saved contents of the ar.rnat register to the backingstore prior to entering the loop that copies the dirty registers from the kernel stack to the user stack.
# 120252	19-Sep-2003	marcel	Fix the most significant KSE breakage caused by not restoring the restart instruction bits in the PSR. As such, we were returning from interrupt to the instruction in the bundle that caused us to enter the kernel, only now we're returning to a completely different bundle. While close here: add two KASSERTs to make sure that we restore sync contexts only when entered the kernel through a syscall and restore an async context only when entered the kernel through an interrupt, trap or fault. While not exactly here, but close enough: use suword64() when we copy the dirty registers from the kernel stack to the user stack. The code was intended to be be replaced shortly after being added, but that was a couple of weeks ago. I might as well avoid that it is a source for panics until it's replaced.
# 119906	09-Sep-2003	marcel	Introduce IA64_ID_PAGE_{MASK\|SHIFT\|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.
# 119649	01-Sep-2003	marcel	Use pmap_steal_memory() for the msgbuf instead of trying to squeeze it in the last chunk (phys_avail block). The last chunk very often is not larger than one or two pages, resulting in a msgbuf that's too small to hold a complete verbose boot. Note that pmap_steal_memory() will bzero the memory it "allocates". Consequently, ia64 will never preserve previous msgbufs. This is not a noticable difference in practice. If the msgbuf could be reused, it was invariably too small to have anything preserved anyway.
# 119337	22-Aug-2003	marcel	Remove unused inclusion of opt_acpi.h
# 118818	12-Aug-2003	marcel	Extend identifycpu(): o Differentiate between CPU family and CPU model. There are multiple Itanium 2 models and it's nice to differentiate between them. o Seperately export the CPU family and CPU model with sysctl. o Merced is the only model in the Itanium family. o Add Madison to the Itanium 2 family. We already knew about McKinley. o Print the CPU family between parenthesis, like we do with the i386 CPU class. My prototype now identifies itself as: CPU: Merced (800.03-Mhz Itanium) pluto1 and pluto2 will eventually identify themselves as: CPU: McKinley (900.00-Mhz Itanium 2)
# 118739	10-Aug-2003	marcel	o move cpu_reset() from vm_machdep.c to machdep.c. o reorder cpu_boot(), cpu_halt() and identifycpu(). No functional change.
# 118717	10-Aug-2003	marcel	Now that we can ignore up to 8KB of dirty registers, remove the RSE magic from exec_setregs(). In set_mcontext() we now also don't have to worry that we entered the kernel with more that 512 bytes of dirty registers on the kernel stack. Note that we cannot make any assumptions anymore WRT to NaT collection points in exec_setregs(), so we have to deal with them now.
# 118590	07-Aug-2003	marcel	Better define the flags in the mcontext_t and properly set the flags when we create contexts. The meaning of the flags are documented in <machine/ucontext.h>. I only list them here to help browsing the commit logs: _MC_FLAGS_ASYNC_CONTEXT _MC_FLAGS_HIGHFP_VALID _MC_FLAGS_KSE_SET_MBOX _MC_FLAGS_RETURN_VALID _MC_FLAGS_SCRATCH_VALID Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
# 118503	05-Aug-2003	marcel	o Put the syscall return registers in the context. Not only do we need this for swapcontext(), KSE upcalls initiated from ast() also need to save them so that we properly return the syscall results after having had a context switch. Note that we don't use r11 in the kernel. However, the runtime specification has defined r8-r11 as return registers, so we put r11 in the context as well. I think deischen@ was trying to tell me that we should save the return registers before. I just wasn't ready for it :-) o The EPC syscall code has 2 return registers and 2 frame markers to save. The first (rp/pfs) belongs to the syscall stub itself. The second (iip/cfm) belongs to the caller of the syscall stub. We want to put the second in the context (note that iip and cfm relate to interrupts. They are only being misused by the syscall code, but are not part of a regular context). This way, when the context is switched to again, we return to the caller of setcontext(2) as one would expect. o Deal with dirty registers on the kernel stack. The getcontext() syscall will flush the RSE, so we don't expect any dirty registers in that case. However, in thread_userret() we also need to save the context in certain cases. When that happens, we are sure that there are dirty registers on the kernel stack. This implementation simply copies the registers, one at a time, from the kernel stack to the user stack. NAT collections are not dealt with. Hence we don't preserve NaT bits. A better solution needs to be found at some later time. We also don't deal with this in all cases in set_mcontext. No temporay solution is implemented because it's not a showstopper. The problem is that we need to ignore the dirty registers and we automaticly do that for at most 62 registers. When there are more than 62 dirty registers we have a memory "leak". This commit is fundamental for KSE support.
# 118414	04-Aug-2003	marcel	Cleanup the clock code. This includes: o Remove alpha specific timer code (mc146818A) and compiled-out calibration of said timer. o Remove i386 inherited timer code (i8253) and related acquire and release functions. o Move sysbeep() from clock.c to machdep.c and have it return ENODEV. Console beeps should be implemented using ACPI or if no such device is described, using the sound driver. o Move the sysctls related to adjkerntz, disable_rtc_set and wall_cmos_clock from machdep.c to clock.c, where the variables are. o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't bother faking a stathz that's 1/8 of that. Keep it simple: hz defaults to HZ and stathz equals hz. This is also how it's done for sparc64. o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj) to calculate ITC skew and corrections. On average, we adjust the ITC match register once every ~1500 interrupts for a duration of 2 consequtive interruprs. This is to correct the non-deterministic behaviour of the ITC interrupt (there's a delay between the match and the raising of the interrupt). o Add 4 debugging sysctls to monitor clock behaviour. Those are debug.clock_adjust_edges, debug.clock_adjust_excess, debug.clock_adjust_lost and debug.clock_adjust_ticks. The first counts the individual adjustment cycles (when the skew first crosses the threshold), the second counts the number of times the adjustment was excessive (any non-zero value is to be considered a bug), the third counts lost clock interrupts and the last counts the number of interrupts for which we applied an adjustment (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the avarage duration of an individual adjustment -- should be ~2). While here, remove some nearby (trivial) left-overs from alpha and other cleanups.
# 118296	01-Aug-2003	marcel	Write the preserved registers to (and read them from) struct reg and struct fpreg.
# 118238	30-Jul-2003	peter	Cosmetic: fix some disorder of #include "opt_...." files
# 117993	25-Jul-2003	marcel	Move ia64_pa_access() from machdep.c to mem.c and declare it static. It's only used in mem.c and cannot accidentally be used elsewhere this way.
# 117608	15-Jul-2003	marcel	Rename thread_siginfo to cpu_thread_siginfo.
# 116958	28-Jun-2003	davidxu	Add a machine depended function thread_siginfo, SA signal code will use the function to construct a siginfo structure and use the result to export to userland. Reviewed by: julian
# 116227	11-Jun-2003	marcel	Make sure pcpu->pc_pcb is pointing to a 16-byte aligned address. The PCB contains FP registers, whose alignment must be 16 bytes at least. Since the PCB pointed to by pc_pcb is immediately after the PCPU itself, round-up the size of thge PCPU to a multiple of 16 bytes. The PCPU is page aligned. This fixes a misalignment trap caused by stopping a CPU in a SMP kernel, such as been done when entering the debugger. Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>
# 115652	01-Jun-2003	marcel	Improve set_mcontext: o Don't copy psr verbatim from the user supplied context. Only allow userland to change the processor settings that are part of the user mask.
# 115566	31-May-2003	marcel	Implement set_mcontext() and get_mcontext(). Just as for sendsig() and sigreturn(), we cheat and assume the preserved registers are still on-chip and unmodified. This is actually the case, but more by accident than by design. We need to use unwinding eventually or explicitly compile the kernel in a way that the compiler steers clear from using the preserved registers completely.
# 115558	31-May-2003	marcel	Make sure we have all the dirty registers in user frames on the backing store before we discard them. It is possible that we enter the kernel (due to an execve in this case) with a lot of dirty user registers and that the RSE has only partially spilled them (to make room for new frames). We cannot move the backing store pointer down (to discard user registers) when not all of the user registers are on the backing store. So, we flush the register stack IFF this happens. Unconditionally doing the flush is too costly, because the condition in which we need to flush is very rare. This change appears to fix the SIGSEGV that sometimes happen for newly executed processes and so far also appears to fix the last of the corruption. It is possible, although not likely, that this change prevents some other bug from happening, even though it is itself not a fix. Hence the uncertainty. We'll know in a couple of months I guess :-)
# 115378	29-May-2003	marcel	Move the sysctls of the misalignment handler to where they belong and use OID_AUTO instead of fixed IDs. Approved by: re@ (blanket)
# 115341	26-May-2003	marcel	Fix fu{byte\|word} and su{byte\|word}: o If the address was not within user space we jumped to fusufault where we would clear pcb_onfault and return 0. There are two bugs here: 1. We never got to the point where we assigned the address of pcb_onfault to r15, which means that we would clobber some random memory location, including I/O space or ROM. 2. We're supposed to return -1 on error. o Make sure we have proper memory ordering for setting pcb_onfault, doing the memory access to user space and clearing pcb_onfault. For the fu* family of functions this means that we need a mf instruction, because we don't have acquire semantics on stores and release semantics on loads (hence st;ld cannot be ordered without intermediate mf). While here, implement casuptr() so that we are a (small) step closer to supporting libthr and deobfuscate the non-implementation of {f\|s}uswintr. Approved by: re@ (blanket)
# 115276	23-May-2003	marcel	Fix an alpha inheritance bug: On alpha, PAL is involved in context management and after wiring the CPU (in alpha_init()) a context switch was performed to tell PAL about the context. This was bogusly brought over to ia64 where it introduced bugs, because we restored the context from a mostly uninitialized PCB. The cleanup constitutes: o Remove the unused arguments from ia64_init(). o Don't return from ia64_init(), but instead call mi_startup() directly. This reduces the amount of muckery in assembly and also allows for the next bullet: o Save our currect context prior to calling mi_startup(). The reason for this is that many threads are created from thread0 by cloning the PCB. By saving our context in the PCB, we have something sane to clone. It also ensures that a cloned thread that does not alter the context in any way will return to the saved context, where we're ready for the eventuality with a nice, user unfriendly panic(). The cleanup fixes at least the following bugs: o Entering mi_startup() with the RSE in enforced lazy mode. o Re-execution of ia64_init() in certain "lab" conditions. While here, add proper unwind directives to __start() so that the unwind knows it has reached the bottom of the (call) stack. Approved by: re@ (blanket)
# 115148	19-May-2003	marcel	pmap_install() needs to be atomic WRT to context switching. Protect switching user regions (region 0-4) with schedlock. Avoid unnecessary recursion on schedlock by moving the core functionality to another function (pmap_switch()) where we assert schedlock is held. Turn pmap_install() into a wrapper that grabs schedlock. This minimizes the number of callsites that need to be changed. Since we already have schedlock in cpu_switch() and cpu_throw(), have them call pmap_switch() directly. These were also the only two calls to pmap_install() outside pmap.c, so make pmap_install() static and remove its prototype from pmap.h Approved by: re (blanket)
# 115084	16-May-2003	marcel	Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc inststruction (see sys/ia64/ia64/syscall.s). o Revisit the places were we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h). Secundairy objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx) Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented. Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use. Besides more efficient context saving and restoring, which of course yields a noticable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood. This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be notived as an improvement if it's noticed at all. Approved by: re@ (jhb)
# 114983	13-May-2003	jhb	- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe. Reviewed by: arch@ Approved by: re (rwatson)
# 114553	02-May-2003	marcel	Option KADB does not exist. It came from alpha, where it still exists.
# 113998	24-Apr-2003	deischen	Add an argument to get_mcontext() which specified whether the syscall return values should be cleared. The system calls getcontext() and swapcontext() want to return 0 on success but these contexts can be switched to at a later time so the return values need to be cleared in the saved register sets. Other callers of get_mcontext() would normally want the context without clearing the return values. Remove the i386-specific context saving from the KSE code. get_mcontext() is not i386-specific any more. Fix a bad pointer in the alpha get_mcontext() code. The context was being bcopy()'d from &td->tf_frame, but tf_frame is itself a pointer, so the thread was being copied instead. Spotted by jake. Glanced at by: jake Reviewed by: bde (months ago)
# 112898	31-Mar-2003	jeff	- Define a new md function 'casuptr'. This atomically compares and sets a pointer that is in user space. It will be used as the basic primitive for a kernel supported user space lock implementation. - Implement this function in x86's support.s - Provide stubs that return -1 in all other architectures. Implementations will follow along shortly. Reviewed by: jake
# 112888	31-Mar-2003	jeff	- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
# 112882	31-Mar-2003	jeff	- Use sigexit() instead of twiddling the signal mask, catch, ignore, and action bits to allow SIGILL to work as expected. This brings this file in line with other architectures.
# 111030	17-Feb-2003	marcel	Print two new processor features: o Spontaneous deferral (A feature required by dutch railways :-) o 16-byte atomic operations (ld, st, cmpxchg)
# 110211	01-Feb-2003	marcel	Remove special casing for running in the simulator from the kernel and instead add platform, firmware and EFI stubs to the loader. The net effect of this change is that besides a special console and disk driver, the kernel has no knowledge of the simulator. This has the following advantages: o Simulator support is much harder to break, o It's easier to make use of more feature complete simulators. This would only need a change in the simulator specific loader, o Running SMP kernels within the simulator. Note that ski at this time does not simulate IPIs, so there's no way to start APs. The platform, firmware and EFI stubs describe the following hardware: o 4 CPU Itanium, o 128 MB RAM within the 4GB address space, o 64 MB RAM above the 4GB address space. NOTE: The stubs in the skiloader describe a machine that should in parts be defined by the simulator. Things like processor interrupt block and AP wakeup vector cannot be choosen at random because they require interpretation by the simulator. Currently the simulator is ignorant of this. This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS which is ignored by the simulator. Tested with: ski (version 0.943 for linux)
# 107206	24-Nov-2002	marcel	MFp4: Add function map_port_space() to map the memory mapped I/O port range as uncacheable virtual memory and call it prior to probing for a console. This removes the dependency on the loader to have done this for us. Note that this change does not include doing the same for APs. Approved by: re (blanket)
# 106977	16-Nov-2002	deischen	Add getcontext, setcontext, and swapcontext as system calls. Previously these were libc functions but were requested to be made into system calls for atomicity and to coalesce what might be two entrances into the kernel (signal mask setting and floating point trap) into one. A few style nits and comments from bde are also included. Tested on alpha by: gallatin
# 106697	09-Nov-2002	des	Print real / avail memory in megabytes rather than kilobytes.
# 106605	07-Nov-2002	tmm	Move the definitions of the hw.physmem, hw.usermem and hw.availpages sysctls to MI code; this reduces code duplication and makes all of them available on sparc64, and the latter two on powerpc. The semantics by the i386 and pc98 hw.availpages is slightly changed: previously, holes between ranges of available pages would be included, while they are excluded now. The new behaviour should be more correct and brings i386 in line with the other architectures. Move physmem to vm/vm_init.c, where this variable is used in MI code.
# 106503	06-Nov-2002	jmallett	Remove what was a temporary bogus assignment of bits of siginfo_t, as it does not look like the prerequisites to fill it in properly will be in the tree for the upcoming release, but it's mostly done, so there is no need for these to stay around to remind us.
# 106189	30-Oct-2002	marcel	Rewrite cpu_switch(). The most notable change is the fact that we now have f16-f31 as part of the context. The PCB has been reorganized to better match how we save and restore the (preserved) registers. This commit also moves the context restoriation to its own function (named pcb_restore), as we did with pcb_save. Only minimal effort has been put in writing optimal assembly. The expectation is that there will be more rounds of changes.
# 105950	25-Oct-2002	peter	Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound. Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext ' to the current sigreturn(2), instead of the 'ucontext_t ' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc. Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
# 105891	24-Oct-2002	jhb	Oops, I missed a few changes in 'device acpica' -> 'device acpi' change. Submitted by: Hiten Pandya <hiten@angelica.unixdaemons.com>
# 105470	19-Oct-2002	marcel	Update the unwind information when modules are loaded and unloaded by using the linker hooks. Since these hooks are called for the kernel as well, we don't need to deal with that with a special SYSINIT. The initialization implicitly performed on the first update of the unwind information is made explicit with a SYSINIT. We now don't need the _ia64_unwind_{start\|end} symbols.
# 105432	19-Oct-2002	marcel	Make this compile when DDB is not defined by conditionally compiling all references to ksym_start and ksym_end.
# 104433	03-Oct-2002	peter	Do a bit of rude hackery to get clock interrupts on all CPUs. This is partly based on the Alpha system which duplicates the clock to each cpu, instead of doing a clock roundrobin like on i386. This means we get hz * ncpu clocks per second and so we have to seperate clock sampling from actual 'do the work' clock processing. The BSP runs the complete processing, the rest just sample state etc. Using the on-cpu interval timer is not ideal as it will drift. There is more to be done here, we should use an external clock source.
# 103703	20-Sep-2002	phk	For reasons now lost in historical fog, the bounds_check_with_label() function were put in i386/i386/machdep.c from where it has been cut and pasted to other architectures with only minor corruption. Disklabel is really a MI format in many ways, at least it certainly is when you operate on struct disklabel. Put bounds_check_with_label() back in subr_disklabel.c where it belongs. Sponsored by: DARPA & NAI Labs.
# 103367	15-Sep-2002	julian	Allocate KSEs and KSEGRPs separatly and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place) While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
# 103081	07-Sep-2002	jmallett	Fill out two fields (si_pid, si_uid) in the siginfo structure handed back to userland in the signal handler that were not being iflled out before, but should and can be. This part of sendsig could be slightly refactored to use an MI interface, or ideally, sendsig() would have an API change to accept a siginfo_t, which would be filled out by an MI function in the level above sendsig, and said MI function would make a small call into MD code to fill out the MD parts (some of which may be bogus, such as the si_addr stuff in some places). This would eventually make it possible for parts of the kernel sending signals to set up a siginfo with meaningful information. Reviewed by: mux MFC after: 2 weeks
# 102666	31-Aug-2002	peter	Take a shot at fixing up a whole stack of style and other embarresing unforced errors that Bruce identified. I have not yet addressed all of his concerns.
# 102600	30-Aug-2002	peter	Change hw.physmem and hw.usermem to unsigned long like they used to be in the original hardwired sysctl implementation. The buf size calculator still overflows an integer on machines with large KVA (eg: ia64) where the number of pages does not fit into an int. Use 'long' there. Change Maxmem and physmem and related variables to 'long', mostly for completeness. Machines are not likely to overflow 'int' pages in the near term, but then again, 640K ought to be enough for anybody. This comes for free on 32 bit machines, so why not?
# 102561	29-Aug-2002	jake	Renamed poorly named setregs to exec_setregs. Moved its prototype to imgact.h with the other exec support functions.
# 101251	03-Aug-2002	peter	Ignore memory above 4GB for now due to unpleasant pci issues.
# 97443	29-May-2002	marcel	Remove the definition of struct mca_guid and use the generic struct uuid defined in <sys/uuid.h>. Use uuid/UUID instead of guid/GUID to emphasize that the identifiers are DCE version 1 identifiers and also to avoid inconsistencies as much a possible.
# 96973	20-May-2002	marcel	Flesh-out ptrace support. This obviously needs more work.
# 96912	19-May-2002	marcel	o Remove namespace pollution from param.h: - Don't include ia64_cpu.h and cpu.h - Guard definitions by _NO_NAMESPACE_POLLUTION - Move definition of KERNBASE to vmparam.h o Move definitions of IA64_RR_{BASE\|MASK} to vmparam.h o Move definitions of IA64_PHYS_TO_RR{6\|7} to vmparam.h o While here, remove some left-over Alpha references.
# 95919	02-May-2002	marcel	PCPU(current_pmap) is initialized in pmap_bootstrap. No need to do it again.
# 95863	01-May-2002	peter	Connect up kern_envp before we use it for getenv() and console probing. It is a bit late after that when we have no consoles. :-] Also, fix a comment nit and print a warning about missing metadata.
# 95762	30-Apr-2002	marcel	Make this work for ski again. Don't call ia64_mca_init() when we're in the simulator.
# 95519	26-Apr-2002	marcel	Initialize MCA in cpu_startup() so that it's ready before we wake-up the application processors. This allows us to collect unconsumed AP specific error records as part of the wake-up.
# 95458	25-Apr-2002	marcel	The official name for McKinley is: Itanium 2
# 95025	19-Apr-2002	marcel	Remove the bootinfo kludge. We get the address of the bootinfo block from the loader.
# 94936	17-Apr-2002	mux	Rework the kernel environment subsystem. We now convert the static environment needed at boot time to a dynamic subsystem when VM is up. The dynamic kernel environment is protected by an sx lock. This adds some new functions to manipulate the kernel environment : freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be called after every getenv() when you have finished using the string. testenv() only tests if an environment variable is present, and doesn't require a freeenv() call. setenv() and unsetenv() are self explanatory. The kenv(2) syscall exports these new functionalities to userland, mainly for kenv(1). Reviewed by: peter
# 94639	14-Apr-2002	peter	Allow a kernel to be compiled with both SKI and acpica and still work on real hardware. (SKI used to break the sapic probes)
# 94628	13-Apr-2002	alc	Add comment that sigreturn() is MPSAFE.
# 94496	12-Apr-2002	dfr	Initialise ar.cflg, which contains the IA-32 registers cr0 and cr4. Since all IA-32 processes use the same values for cr0 and cr4, we initialise them at system startup.
# 94481	12-Apr-2002	peter	Really fix uniprocessor on IA64. Note to self: do not use variables before they are initialized. I had correctly figured out that the UP problem was the pcpu current_pmap thing, but didn't fix it right last time.
# 94275	09-Apr-2002	phk	GC various bits and pieces of USERCONFIG from all over the place.
# 93793	04-Apr-2002	bde	Moved signal handling and rescheduling from userret() to ast() so that they aren't in the usual path of execution for syscalls and traps. The main complication for this is that we have to set flags to control ast() everywhere that changes the signal mask. Avoid locking in userret() in most of the remaining cases. Submitted by: luoqi (first part only, long ago, reorganized by me) Reminded by: dillon
# 93761	04-Apr-2002	alc	o Kill the MD grow_stack(). Call the MI vm_map_growstack() in its place. o Eliminate the use of useracc() and grow_stack() from sendsig(). Reviewed by: peter
# 93713	03-Apr-2002	marcel	o GC dumplo o Replace the string lit. "ia64" with MACHINE
# 93702	02-Apr-2002	jhb	- Move the MI mutexes sched_lock and Giant from being declared in the various machdep.c's to being declared in kern_mutex.c. - Add a new function mutex_init() used to perform early initialization needed for mutexes such as setting up thread0's contested lock list and initializing MI mutexes. Change the various MD startup routines to call this function instead of duplicating all the code themselves. Tested on: alpha, i386
# 93627	02-Apr-2002	marcel	o GC totalphysmem and resvmem. o Rephrase comment describing that the memory region can contain the kernel.
# 93458	30-Mar-2002	marcel	Transition to a model where the loader passes the address of the bootinfo block in register r8. In locore.s we save the address in the global variable 'pa_bootinfo'. In machdep.c we compare this value against the hardwired address, but don't depend on its validity yet (ie: we still expect the bootinfo block to be at the hardwired address). After a small amount of time, we'll flip the switch and depend on the loader to pass us the address. From that moment on the loader is free to put it anywhere it likes, provided the machine itself likes it as well. Add some verbosity to aid in the transition. We emit a message if the loader didn't pass the address and we also emit a message if there's no bootinfo block at the hardwired address. While in locore.s, reduce the number of redundant serialization instructions. A srlz.i is a proper superset of a srlz.d and thus is a valid replacement. Also slightly reorder the movl instructions to improve bundle density.
# 93273	27-Mar-2002	jeff	Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares. Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone. Approved by: jhb
# 92865	21-Mar-2002	peter	In UP mode, the primary cpu's per-cpu current_pmap was not initialized - this was only done as a side effect of calling cpu_mp_start(). I haven't actually tested that this fixes UP kernels, but it feels about right.
# 92843	20-Mar-2002	alfred	Remove __P. Reviewd by: peter
# 92677	19-Mar-2002	peter	My ia64 box for some reason likes to fragment the beginning/end of memory a bit before handing it over to the OS. I occasionally have 11 segments with several 8K or so fragments depending on nvram settings and what I have done under loader(8) before booting. This needs to be revisited.
# 92675	19-Mar-2002	peter	Move a couple of prototypes together instead of being incompletely scattered around.
# 92262	14-Mar-2002	dfr	Move the call to pmap_bootstrap to after the initialisation of thread0. This allows us to use mutexes in pmap safely. Also initialise fpcurthread for cpu0 so that ia64_fpstate_check doesn't barf during boot.
# 92123	11-Mar-2002	peter	Fix a warning (make ucontext_t *ucp a const)
# 92122	11-Mar-2002	peter	Stop concatenating __func__ with strings
# 91598	03-Mar-2002	dfr	* Include <sys/ucontext.h> so that this compiles again. * Move the section which manipulates ia64_pal_base to after cninit() so that we don't risk printing anything before we have a console. * Don't call ia64_probe_sapics() for a SKI build. This should really be dependant on ACPICA being present or something.
# 90361	07-Feb-2002	julian	Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
# 90065	01-Feb-2002	bde	Compile osigreturn() unconditionally since it will always be needed on some arches and the syscall table is machine-independent. It was (bogusly) conditional on COMPAT_43, so this usually makes no difference. ia64: in addition: - replace the bogus cloned comment before osigreturn() by a correct one. osigreturn() is just a stub fo ia64's. - fix the formatting of cloned comment before sigreturn(). - fix the return code. use nosys() instead of returning ENOSYS to get the same semantics as if the syscall is not in the syscall table. Generating SIGSYS is actually correct here. - fix style bugs. powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment. sparc64: copy the cleaned up the ia64 stub, since there was no stub before.
# 89492	18-Jan-2002	marcel	Remove the definition of bootverbose. This fixes the link failure caused by disabling the emission of common symbols.
# 88693	30-Dec-2001	marcel	o Reimplement map_pal_code to work with a global variable ia64_pal_base instead of scanning the EFI tables. This way AP startup code can more easily use the function. o Initialize ia64_pal_base in ia64_init(). When the PAL code doesn't need explicit mapping or no PAL code has been found, ia64_pal_base will be 0. o Remove some unused global variables. o Also in ia64_init(), allocate only 1 page for struct pcpu and remove some Alpha leftovers. o Initialize pc_pcb in cpu_pcpu_init().
# 87702	11-Dec-2001	jhb	Overhaul the per-CPU support a bit: - The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list. Tested on: alpha, i386 Reviewed by: peter, jake
# 87546	08-Dec-2001	dillon	Allow maxusers to be specified as 0 in the kernel config, which will cause the system to auto-size to between 32 and 512 depending on the amount of memory. MFC after: 1 week
# 86592	19-Nov-2001	peter	Initial cut at calling the EFI-provided FPSWA (Floating Point Software Assist) driver to handle the "messy" floating point cases which cause traps to the kernel for handling.
# 86291	12-Nov-2001	marcel	o os_boot_rendez is responsible for clearing the IRR bit by reading cr.ivr, as well as writing to cr.eoi. o use global variables to pass information to os_boot_rendez so that it doesn't have to jump through hoops to find it out. This avoids traps on the AP without it even being initialized. This fixes SMP configurations. o Move the probing of the MADT to the end of cpu_startup, instead of at the start of cpu_mp_probe. We need to probe the MADT for non-SMP configurations as well. This fixes uniprocessor configurations. o Serialize AP wake-up by waiting for the AP. We need to do this since we use global variables to for the AP to use. As a side-effect, we can use printf() more easily to see what's going on.
# 86286	12-Nov-2001	peter	Remove #if 0'ed code that was replaced by vm_ksubmap_init() and GC'ed on other platforms.
# 86211	09-Nov-2001	dfr	Reserve more space for phys_avail. Really need to be more careful about overflowing phys_avail.
# 85852	01-Nov-2001	peter	argh! cut/paste typo. :-( (committed on a different machine to what I was testing it on)
# 85850	01-Nov-2001	peter	"Fix" a problem that got copied from alpha to ia64 and broke there. When we truncate the msgbuf size because the last chunk is too small, correctly terminate the phys_avail[] array - the VM system tests the end for zero, not the start. This leads the VM startup to attempt to recreate a duplicate set of pages for all physical memory. XXX the msgbuf handling is suspiciously different on i386 vs alpha/ia64...
# 85685	29-Oct-2001	dfr	* Factor out common code for manipulating the RSE backing store. * Implement a fairly simplistic parser for unwinding stack frames. * Use unwind records for DDB's 'trace' command. Also add support for tracing past exceptions to the context which generated the exception. The stack unwind code requires a toolchain based on binutils-2.11.2 or later and gcc-3.0.1 or later.
# 85383	23-Oct-2001	peter	Fix RAW dependency violation when compiled with gcc-3 Warning: Use of 'br.ret.sptk.many' violates RAW dependency 'PSR.tb' (data)
# 85360	23-Oct-2001	peter	Turn off the single-user override. We've been running multi-user for some time. Having a machine boot unattended is useful. :-)
# 85294	21-Oct-2001	des	[partially forced commit due to pilot error in earlier commit attempt] {set,fill}_{,fp,db}regs() fixup: - Add dummy {set,fill}_dbregs() on architectures that don't have them. - KSEfy the powerpc versions (struct proc -> struct thread). - Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms. These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which one yet. Run-tested on i386, build-tested on Alpha, untested on other platforms.
# 85293	21-Oct-2001	des	{set,fill}_{,fp,db}regs() fixup: - Add dummy {set,fill}_dbregs() on architectures that don't have them. - KSEfy the powerpc versions (struct proc -> struct thread). - Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms. These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which {set,fill}_{,fp,db}regs() fixup: - Add dummy {set,fill}_dbregs() on architectures that don't have them. - KSEfy the powerpc versions (struct proc -> struct thread). - Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms. These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which one yet. Run-tested on i386, build-tested on Alpha, untested on other platforms.
# 85283	21-Oct-2001	dfr	Use ia64_set_fpsr() instead of __asm to set ar.fpsr.
# 85109	18-Oct-2001	dfr	Shift the code which packs and unpacks instruction bundles out of DDB since it is useful for various emulations duties (e.g. unaligned trap handling).
# 84966	15-Oct-2001	marcel	When compiling with SKI support, create the fake memory regions when either the memory descriptor in the bootinfo is NULL or the descriptor count is 0.
# 84798	11-Oct-2001	dfr	* Change the calling convention for execve so that it conforms to normal C calling conventions. This allows crt1.c to be written nearly without any inline assembler. * Initialise cpu_model[] so that the hw.model sysctl works properly.
# 84621	07-Oct-2001	dfr	Remove bogus include.
# 84592	06-Oct-2001	dfr	Move console probes until after we set boothowto so that 'boot -h' works.
# 84128	29-Sep-2001	dfr	Various changes to use the firmware on a real machine.
# 83834	22-Sep-2001	dfr	* Turn off memory descriptor debugging - its served its purpose. * Don't get confused when memory regions don't lie on page boundaries - remember our page size is typically larger than the firmware's page size. * Add a function ia64_running_in_simulator() which is intended to detect whether the kernel is running in SKI or on real hardware.
# 83611	18-Sep-2001	dfr	Flesh out identifycpu().
# 83522	15-Sep-2001	dfr	Rearrange so we search for I/O port space as early as possible (i.e. before console probing). Also fix a confusion between EFI's page size which is fixed at 4096 and our own page size which is variable at compile time.
# 83512	15-Sep-2001	dfr	Use the MI console code to initialise the console.
# 83509	15-Sep-2001	dfr	* Use Intel's EFI headers instead of home-grown ones. * Use the bootinfo's memory map if present instead of hard-coding SKI's memory map. * Record the location of the I/O Port Space if present in the memory map.
# 83407	13-Sep-2001	dfr	* Enable dynamically linked kernel. This involves adding a self-relocator to locore to process the @fptr relocations in the dynamic executable. * Don't initialise the timer until after we install the timecounter to avoid a race between timecounter initialisation and hardclock. * Tidy up bootinfo somewhat including adding sanity checks for when the kernel is loaded without a recognisable bootinfo.
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 83301	10-Sep-2001	dfr	* Make a start on a realistic definition for bootinfo. * Switch to proc0's stack and backing store before calling ia64_init so that we don't rely on the loader's stack at all. * Change kernel entry point name from locorestart to __start.
# 83163	06-Sep-2001	jhb	Call sendsig() with the proc lock held and return with it held.
# 82785	02-Sep-2001	peter	Merge from i386: various cleanups including moving the map calculations to MI code. This gets ia64 to compile again.
# 82393	27-Aug-2001	peter	Enable hardwiring of things like tunables from embedded enironments that do not start from loader(8).
# 82025	21-Aug-2001	peter	Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this stuff.
# 81265	08-Aug-2001	peter	Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they are a really nasty interface that should have been killed long ago when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they operate on (struct user) will not be around much longer since it is part-per-process and part-per-thread in a post-KSE world. gdb does not actually use this except for the obscure 'info udot' command which does a hexdump of as much of the child's 'struct user' as it can get. It carries its own #defines so it doesn't break compiles.
# 81197	06-Aug-2001	dfr	Remove usage of nonexistent vm_mtx.
# 80421	26-Jul-2001	peter	Call the early tunable setup functions as soon as kern_envp is available. Some things depend on hz being set not long after this.
# 78962	29-Jun-2001	jhb	Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md. Reviewed by: jake (in principle)
# 78888	27-Jun-2001	jhb	Catch up to mbuf allocator changes from last September so this compiles again.
# 77448	29-May-2001	jhb	- Catch up to the VM mutex changes. - Sort includes in a few places.
# 77031	23-May-2001	ru	- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs. - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs. - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS. - Install header files for the above file systems. - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
# 76770	17-May-2001	jhb	- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT. - Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be toggled after boot. - Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just afer the display of the copyright message instead of doing it by hand in three MD places.
# 76440	10-May-2001	jhb	- Split out the support for per-CPU data from the SMP code. UP kernels have per-CPU data and gdb on the i386 at least needs access to it. - Clean up includes in kern_idle.c and subr_smp.c. Reviewed by: jake
# 76078	27-Apr-2001	jhb	Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP. - It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the _process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the _process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the _process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the _process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in. Reviewed by: jake, peter Looked over by: eivind
# 75913	24-Apr-2001	dfr	Align stack pointer and backing store pointer to 16 byte boundary when delivering signals.
# 75002	29-Mar-2001	obrien	Reduce the emasculation of bounds_check_with_label() by one line, so we propagate a bio error condition to the caller and above.
# 74912	28-Mar-2001	jhb	Rework the witness code to work with sx locks as well as mutexes. - Introduce lock classes and lock objects. Each lock class specifies a name and set of flags (or properties) shared by all locks of a given type. Currently there are three lock classes: spin mutexes, sleep mutexes, and sx locks. A lock object specifies properties of an additional lock along with a lock name and all of the extra stuff needed to make witness work with a given lock. This abstract lock stuff is defined in sys/lock.h. The lockmgr constants, types, and prototypes have been moved to sys/lockmgr.h. For temporary backwards compatability, sys/lock.h includes sys/lockmgr.h. - Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin locks held. By making this per-cpu, we do not have to jump through magic hoops to deal with sched_lock changing ownership during context switches. - Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with proc->p_sleeplocks, which is a list of held sleep locks including sleep mutexes and sx locks. - Add helper macros for logging lock events via the KTR_LOCK KTR logging level so that the log messages are consistent. - Add some new flags that can be passed to mtx_init(): - MTX_NOWITNESS - specifies that this lock should be ignored by witness. This is used for the mutex that blocks a sx lock for example. - MTX_QUIET - this is not new, but you can pass this to mtx_init() now and no events will be logged for this lock, so that one doesn't have to change all the individual mtx_lock/unlock() operations. - All lock objects maintain an initialized flag. Use this flag to export a mtx_initialized() macro that can be safely called from drivers. Also, we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness performs the corresponding checks using the initialized flag. - The lock order reversal messages have been improved to output slightly more accurate file and line numbers.
# 74030	09-Mar-2001	dfr	Adjust a comment slightly.
# 73929	07-Mar-2001	jhb	Grab the process lock while calling psignal and before calling psignal.
# 72226	09-Feb-2001	jhb	Move the initailization of the proc lock for proc0 very early into the MD startup code.
# 72221	09-Feb-2001	jhb	Remove bogus #if 0'd code that dinked with the saved interrupt state in sched_lock.
# 72200	09-Feb-2001	bmilekic	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# 71984	04-Feb-2001	peter	All the world is not an i386. Merge rev 1.438 of i386/i386/machdep.c. Make buffer_map a system map.
# 71803	29-Jan-2001	dfr	Flesh out EFI support somewhat.
# 71684	26-Jan-2001	dfr	Initialise proc0.p_heldmtx and proc0.p_contested and call mtx_enter(&Giant, MTX_DEF) after Giant is initialised. Reviewed by: jhb
# 71552	24-Jan-2001	jhb	- Proc locking. - P_FOO -> PS_FOO.
# 71320	21-Jan-2001	jasone	Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization. This change is necessary in order to avoid some circular bootstrapping dependencies.
# 71228	18-Jan-2001	bmilekic	Implement MTX_RECURSE flag for mtx_init(). All calls to mtx_init() for mutexes that recurse must now include the MTX_RECURSE bit in the flag argument variable. This change is in preparation for an upcoming (further) mutex API cleanup. The witness code will call panic() if a lock is found to recurse but the MTX_RECURSE bit was not set during the lock's initialization. The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to MTX_RECURSED, which is more appropriate given its meaning. The following locks have been made "recursive," thus far: eventhandler, Giant, callout, sched_lock, possibly some others declared in the architecture-specific code, all of the network card driver locks in pci/, as well as some other locks in dev/ stuff that I've found to be recursive. Reviewed by: jhb
# 69586	04-Dec-2000	jake	Remove the last of the MD netisr code. It is now all MI. Remove spending, which was unused now that all software interrupts have their own thread. Make the legacy schednetisr use an atomic op for setting bits in the netisr mask. Reviewed by: jhb
# 69379	30-Nov-2000	marcel	Don't use p->p_sigstk.ss_flags to keep state of whether the process is on the alternate stack or not. For compatibility with sigstack(2) state is being updated if such is needed. We now determine whether the process is on the alternate stack by looking at its stack pointer. This allows a process to siglongjmp from a signal handler on the alternate stack to the place of the sigsetjmp on the normal stack. When maintaining state, this would have invalidated the state information and causing a subsequent signal to be delivered on the normal stack instead of the alternate stack. PR: 22286
# 69207	26-Nov-2000	jlemon	Add 'mpsafe' parameter to callout_init() in MD bits. Reminded by: jake
# 68889	19-Nov-2000	jake	- Protect the callout wheel with a separate spin mutex, callout_lock. - Use the mutex in hardclock to ensure no races between it and softclock. - Make softclock be INTR_MPSAFE and provide a flag, CALLOUT_MPSAFE, which specifies that a callout handler does not need giant. There is still no way to set this flag when regstering a callout. Reviewed by: -smp@, jlemon
# 67708	27-Oct-2000	phk	Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc. Define __offsetof() in <machine/ansi.h> Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h> Remove myriad of local offsetof() definitions. Remove includes of <stddef.h> in kernel code. NB: Kernelcode should never include from /usr/include ! Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API. Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001. Paritials reviews by: various. Significant brucifications by: bde
# 67522	24-Oct-2000	dfr	* Various fixes to breakage introduced by the atomic and mutex reorgs. * Fixes to the signal delivery code. Not quite right yet. I would have preferred to wait until I have signal delivery actually working but the current kernel in CVS doesn't build.
# 67357	20-Oct-2000	jhb	- machine/mutex.h -> sys/mutex.h - Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.
# 67325	19-Oct-2000	dfr	Don't force bootverbose anymore.
# 67199	16-Oct-2000	dfr	* Correct some of my misunderstandings about how best to switch to the kernel backing store. * Implement syscalls via break instructions. * Fix backing store copying in cpu_fork() so that the child gets the right register values. This thing is actually starting to work now. This set of changes takes me up to the second execve (the one which runs the first shell). Next stop single-user mode :-).
# 67032	12-Oct-2000	dfr	Implement a rudimentary interrupt handling system which should be good enough for clock interrupts in SKI.
# 67020	12-Oct-2000	dfr	* Fix exception handling so that it actually works. We can now handle exceptions from both kernel and user mode. * Fix context switching so that we can switch back to a proc which we switched away from (we were saving the state in the wrong place). * Implement lazy switching of the high-fp state. This needs to be looked at again for SMP to cope with the case of a process migrating from one processor to another while it has the high-fp state. * Make setregs() work properly. I still think this should be called cpu_exec() or something. * Various other minor fixes. With this lot, we can execve() /sbin/init and we get all the way up to its first syscall. At that point, we stop because syscall handling is not done yet.
# 66937	10-Oct-2000	dfr	* Add rudimentary DDB support (no kgdb, no backtrace, no single step). * Track recent changes to SWI code. * Allocate RIDs for pmaps (untested). * Implement assembler version of cpu_switch - its cleaner that way.
# 66633	04-Oct-2000	dfr	Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).
# 66486	30-Sep-2000	dfr	Next round of ia64 work, including fixes to context switching, implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these changes, my test kernel reaches the mountroot prompt.
# 66458	29-Sep-2000	dfr	This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but its nice to see the FreeBSD copyright message appear in the ia64 simulator.