#
259065 |
|
07-Dec-2013 |
gjb |
- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1
[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
248084 |
|
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
#
247454 |
|
28-Feb-2013 |
davide |
MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days.
The commmit is not targeted for MFC.
|
#
246715 |
|
12-Feb-2013 |
marcel |
Eliminate the PC_CURTHREAD symbol and load the current thread's thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu.
The only non-atomic operations left were those related to process- space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations.
The casuword function has been re-structured more complete than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.
|
#
238257 |
|
08-Jul-2012 |
marcel |
Move PCPU initialization to a new function called cpu_pcpu_setup(). This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.
|
#
238190 |
|
07-Jul-2012 |
marcel |
Implement ia64_physmem_alloc() and use it consistently to get memory before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP.
Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.
|
#
238184 |
|
06-Jul-2012 |
marcel |
Hide the creation of phys_avail behind an API to make it easier to do it correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API.
Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be delated from the available physical memory.
While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largesty physical address.
The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem.
The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.
|
#
232250 |
|
28-Feb-2012 |
gavin |
Correct capitalization of "Hz" in user-visible text (manpages, printf(), etc).
MFC after: 3 days
|
#
227309 |
|
07-Nov-2011 |
ed |
Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
|
#
225617 |
|
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
#
223526 |
|
25-Jun-2011 |
marcel |
Switch to the event timers infrastructure. This includes: o Setting td_intr_frame to the XIVs trap frame because it's referenced by the ET event handler. o Signal EOI to the CPU before calling the registered XIV handlers. This prevents lost ITC interrupts, which cause starvation in one-shot mode. o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters. o Have the APs call cpu_initclocks() so as to limited the scattering of clock related initialization. cpu_initclocks() calls the <self>_bsp() or <self>_ap() version accordingly. o Uncomment the ET clock handling in cpu_idle(). o Update the DDB 'show pcpu' output for the new MD fields. o Entirely rewritten ia64_ih_clock(). Note that we don't create as many clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale. We can only have 240 XIVs and we can have more CPUs than that. There's a single intrcnt index for the cumulative clock ticks and we keep per CPU counts in the PCPU stats structure. o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).
Open issues: o Clock interrupts can still be lost. Some tweaking is still necessary.
Thanks to: mav@ for his support, feedback and explanations.
ET stats while committing: eris% sysctl machdep.cpu | grep nclks
machdep.cpu.0.nclks: 24007 machdep.cpu.1.nclks: 22895 machdep.cpu.2.nclks: 13523 machdep.cpu.3.nclks: 9342 machdep.cpu.4.nclks: 9103 machdep.cpu.5.nclks: 9298 machdep.cpu.6.nclks: 10039 machdep.cpu.7.nclks: 9479 eris% vmstat -i | grep clock clock 108599 50
|
#
223478 |
|
23-Jun-2011 |
marcel |
Unblock the outgoing thread after we performed pmap_switch() to switch the region registers. pmap_switch() returns the pmap for which the region register are currently programmed, which needs to be re-programmed on the CPU the ougoing thread gets switched in. This change does not noticibly change anything or fix known bugs, but does give me a warm fuzzy feeling by being more correct.
|
#
222971 |
|
11-Jun-2011 |
marcel |
Add the model number for the Montvale processor (marketed as Itanium 2 9100). At this time we're missing just one: Tukwila (Itanium 2 9300).
|
#
222800 |
|
06-Jun-2011 |
marcel |
Call set_cputicker() to have the time counter use the ITC register. Note that the ITC frequency is fixed.
|
#
222769 |
|
06-Jun-2011 |
marcel |
Improve cpu_idle(): o cpu_idle_hook is expected to be called with interrupts disabled and re-enables interrupts on return. o sync with x86: don't idle when the CPU has runnable tasks o have callers of ia64_call_pal_static() disable interrupts and re-enable interrupts. o add, but compile-out, support for idle mode. This will be enabled at some later time, after proper testing.
|
#
222531 |
|
31-May-2011 |
nwhitehorn |
On multi-core, multi-threaded PPC systems, it is important that the threads be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end.
Reviewed by: jhb
|
#
221271 |
|
30-Apr-2011 |
marcel |
Stop linking against a direct-mapped virtual address and instead use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines.
While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.
|
#
219841 |
|
21-Mar-2011 |
marcel |
Fix switching to physical mode as part of calling into EFI runtime services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM.
When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time.
While here, change bootinfo to a pointer type in anticipation of future development.
|
#
219741 |
|
18-Mar-2011 |
marcel |
Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk about the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.
|
#
219523 |
|
11-Mar-2011 |
mdf |
Mostly revert r219468, as I had misremembered the C standard regarding the size of an extern array.
Keep one change from strncpy to strlcpy.
|
#
219468 |
|
10-Mar-2011 |
mdf |
Use MAXPATHLEN rather than the size of an extern array when copying the kernel name. Also consistenly use strlcpy().
Suggested by: Warner Losh
|
#
217688 |
|
21-Jan-2011 |
pluknet |
Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.
Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
|
#
215052 |
|
09-Nov-2010 |
jhb |
Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.
|
#
214835 |
|
05-Nov-2010 |
jhb |
Adjust the order of operations in spinlock_enter() and spinlock_exit() to work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers.
In cooperation with: bde MFC after: 1 month
|
#
209613 |
|
30-Jun-2010 |
jhb |
Move prototypes for kern_sigtimedwait() and kern_sigprocmask() to <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
|
#
205660 |
|
25-Mar-2010 |
nwhitehorn |
Fix the ia64 build.
Pointy hat to: me
|
#
205642 |
|
25-Mar-2010 |
nwhitehorn |
Change the arguments of exec_setregs() so that it receives a pointer to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms.
Reviewed by: jhb
|
#
205357 |
|
20-Mar-2010 |
marcel |
Don't check for boot_verbose in the environment. The loader does that already and sets RB_VERBOSE. The loader has always done it.
|
#
205234 |
|
16-Mar-2010 |
marcel |
Revamp the interrupt code based on the previous commit: o Introduce XIV, eXternal Interrupt Vector, to differentiate from the interrupts vectors that are offsets in the IVT (Interrupt Vector Table). There's a vector for external interrupts, which are based on the XIVs.
o Keep track of allocated and reserved XIVs so that we can assign XIVs without hardcoding anything. When XIVs are allocated, an interrupt handler and a class is specified for the XIV. Classes are: 1. architecture-defined: XIV 15 is returned when no external interrupt are pending, 2. platform-defined: SAL reports which XIV is used to wakeup an AP (typically 0xFF, but it's 0x12 for the Altix 350). 3. inter-processor interrupts: allocated for SMP support and non-redirectable. 4. device interrupts (i.e. IRQs): allocated when devices are discovered and are redirectable.
o Rewrite the central interrupt handler to call the per-XIV interrupt handler and rename it to ia64_handle_intr(). Move the per-XIV handler implementation to the file where we have the XIV allocation/reservation. Clock interrupt handling is moved to clock.c. IPI handling is moved to mp_machdep.c.
o Drop support for the Intel 8259A because it was broken. When XIV 0 is received, the CPU should initiate an INTA cycle to obtain the interrupt vector of the 8259-based interrupt. In these cases the interrupt controller we should be talking to WRT to masking on signalling EOI is the 8259 and not the I/O SAPIC. This requires adriver for the Intel 8259A which isn't available for ia64. Thus stop pretending to support ExtINTs and instead panic() so that if we come across hardware that has an Intel 8259A, so have something real to work with.
o With XIVs for IPIs dynamically allocatedi and also based on priority, define the IPI_* symbols as variables rather than constants. The variable holds the XIV allocated for the IPI.
o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV assigned to IPI_STOP is delivered.
|
#
205172 |
|
15-Mar-2010 |
marcel |
Have cpu_throw() loop on blocked_lock as well. This bug has existed a long time and has gone unnoticed just as long, because I kept using sched_4bsd (due to sched_ule not working with preemption), but GENERIC had sched_ule by default -- including SMP.
While here, remove unused inclusion of <machine/clock.h>, remove totally bogus inclusion of <i386/include/specialreg.h>.
|
#
205014 |
|
11-Mar-2010 |
nwhitehorn |
Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms.
Reviewed by: kib, jhb
|
#
203883 |
|
14-Feb-2010 |
marcel |
Some code churn: o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer.
This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning.
With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.
|
#
203054 |
|
27-Jan-2010 |
marcel |
In cpu_switch(), use an atomic operation to set the td_lock of the old thread to the mutex that's passed.
Pointed out by: attilio, jhb
|
#
202904 |
|
23-Jan-2010 |
marcel |
Remove cpu_boot() and call efi_reset_system() directly from cpu_reset().
|
#
201269 |
|
30-Dec-2009 |
marcel |
Revamp bus_space access functions: o Optimize for memory mapped I/O by making all I/O port acceses function calls and marking the test for the IA64_BUS_SPACE_IO tag with __predict_false(). Implement the I/O port access functions in a new file, called bus_machdep.c. o Change the bus_space_handle_t for memory mapped I/O to the virtual address rather than the physical address. This eliminates the PA->VA translation for every I/O access. The handle for I/O port access is still the port number. o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string variants from cpufunc.h and define them in bus.h. On ia64 these are not CPU functions at all. In bus.h they are merely aliases for the new I/O port access functions defined in bus_machdep.h. o Handle the ACPI resource bug in nexus_set_resource(). There we can do it once so that we don't have to worry about it whenever we need to write to an I/O port that is really a memory mapped address.
The upshot of this change is that the KBI is better defined and that I/O port access always involves a function call, allowing us to change the actual implementation without breaking the KBI. For memory mapped I/O the virtual address is abstracted, so that we can change the VA->PA mapping in the kernel without causing an KBI breakage. The exception at this time is for bus_space_map() and bus_space_unmap().
MFC after: 1 week.
|
#
200889 |
|
23-Dec-2009 |
marcel |
Export the bus, cpu and itc frequencies under the hw.freq sysctl node. The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The frequencies are rounded to the nearest whole MHz.
While here, rename and re-type bus_frequency, processor_frequency and itc_frequency to bus_freq, cpu_freq and itc_freq and make them static. As unsigned integers, the hw.freq.cpu sysctl can more easily be made generic (across all architectures) making porting easier.
MFC after: 3 days
|
#
200207 |
|
07-Dec-2009 |
marcel |
Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_id excluded, as it's used by MI code) and mode the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly.
While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.
|
#
200051 |
|
03-Dec-2009 |
marcel |
Make sure bus space accesses use unorder memory loads and stores. Memory accesses are posted in program order by virtue of the uncacheable memory attribute. Since GCC, by default, adds acquire and release semantics to volatile memory loads and stores, we need to use inline assembly to guarantee it. With inline assembly, we don't need volatile pointers anymore.
Itanium does not support semaphore instructions to uncacheable memory.
|
#
199893 |
|
28-Nov-2009 |
marcel |
Eliminate teh use of MAXCPU in static arrays of interrupt counters by adding statistics counters to the PCPU structure. Export the counters through sysctl by giving each PCPU structure its own sysctl context.
While here, fix cnt.v_intr by not just having it count clock interrupts, but every interrupt and add more counters for each interrupt source.
|
#
198733 |
|
31-Oct-2009 |
marcel |
Reimplement the lazy FP context switching: o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o use msleep() to wait for the other CPU to save the high FP registers.
This change fixes the high FP inconsistency panics.
A single global lock typically serializes too much, which may be noticable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronuously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use the high FP registers.
|
#
198507 |
|
27-Oct-2009 |
kib |
In r197963, a race with thread being selected for signal delivery while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls.
Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race.
Reviewed by: davidxu Tested by: pho MFC after: 1 month
|
#
196268 |
|
15-Aug-2009 |
marcel |
Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can be sparse, which causes a kernel assert.
Approved by: re (kensmith)
|
#
194784 |
|
23-Jun-2009 |
jeff |
Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_*. Requires only one extra instruction more than PCPU_* and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent.
Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas
|
#
192324 |
|
18-May-2009 |
marcel |
Rename ia64_invalidate_icache() to ia64_sync_icache(). We're not invalidating anything.
|
#
192323 |
|
18-May-2009 |
marcel |
Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet.
Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.
|
#
180354 |
|
07-Jul-2008 |
marcel |
Add inline function ia64_fc_i() to abstract inline assembly. Use the new inline function in ia64_invalidate_icache(). While there, add proper synchronization so that we know the fc.i instructions have taken effect when we return.
|
#
179229 |
|
23-May-2008 |
alc |
The VM system no longer uses setPQL2(). Remove it and its helpers.
|
#
179173 |
|
21-May-2008 |
marcel |
We can call ia64_flush_dirty() when the corresponding process is locked or not. As such, use PROC_LOCKED() to determine which case it is and lock the process when not.
|
#
178494 |
|
25-Apr-2008 |
marcel |
Unbreak previous commit. While here, refactor the code a bit.
|
#
178471 |
|
25-Apr-2008 |
jeff |
- Add an integer argument to idle to indicate how likely we are to wake from idle over the next tick. - Add a new MD routine, cpu_wake_idle() to wakeup idle threads who are suspended in cpu specific states. This function can fail and cause the scheduler to fall back to another mechanism (ipi). - Implement support for mwait in cpu_idle() on i386/amd64 machines that support it. mwait is a higher performance way to synchronize cpus as compared to hlt & ipis. - Allow selecting the idle routine by name via sysctl machdep.idle. This replaces machdep.cpu_idle_hlt. Only idle routines supported by the current machine are permitted.
Sponsored by: Nokia
|
#
178429 |
|
22-Apr-2008 |
phk |
Now that all platforms use genclock, shuffle things around slightly for better structure.
Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait.
In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only.
Parts of kernel code does however care: badly designed filesystems store timestamps in local time and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, and sometimes in local timezone instead of UTC. For this we have <sys/clock.h>
<sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof.
Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms, it is a MD call between machdep.c/clock.c. Remove references to it elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
|
#
178215 |
|
15-Apr-2008 |
marcel |
Support and switch to the ULE scheduler: o Implement IPI_PREEMPT, o Set td_lock for the thread being switched out, o For ULE & SMP, loop while td_lock points to blocked_lock for the thread being switched in, o Enable ULE by default in GENERIC and SKI,
|
#
177769 |
|
30-Mar-2008 |
marcel |
Better implement I-cache invalidation. The previous implementation was a kluge. This implementation matches the behaviour on powerpc and sparc64. While on the subject, make sure to invalidate the I-cache after loading a kernel module.
MFC after: 2 weeks
|
#
177642 |
|
26-Mar-2008 |
phk |
The "free-lance" timer in the i8254 is only used for the speaker these days, so de-generalize the acquire_timer/release_timer api to just deal with speakers.
The new (optional) MD functions are: timer_spkr_acquire() timer_spkr_release() and timer_spkr_setfreq()
the last of which configures the timer to generate a tone of a given frequency, in Hz instead of 1/1193182th of seconds.
Drop entirely timer2 on pc98, it is not used anywhere at all.
Move sysbeep() to kern/tty_cons.c and use the timer_spkr*() if they exist, and do nothing otherwise.
Remove prototypes and empty acquire-/release-timer() and sysbeep() functions from the non-beeping archs.
This eliminate the need for the speaker driver to know about i8254frequency at all. In theory this makes the speaker driver MI, contingent on the timer_spkr_*() functions existing but the driver does not know this yet and still attaches to the ISA bus.
Syscons is more tricky, in one function, sc_tone(), it knows the hz and things are just fine.
In the other function, sc_bell() it seems to get the period from the KDMKTONE ioctl in terms if 1/1193182th second, so we hardcode the 1193182 and leave it at that. It's probably not important.
Change a few other sysbeep() uses which obviously knew that the argument was in terms of i8254 frequency, and leave alone those that look like people thought sysbeep() took frequency in hertz.
This eliminates the knowledge of i8254_freq from all but the actual clock.c code and the prof_machdep.c on amd64 and i386, where I think it would be smart to ask for help from the timecounters anyway [TBD].
|
#
177253 |
|
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
|
#
177126 |
|
12-Mar-2008 |
jeff |
- Fix build breakage; there was a reference to a removed syscall in a KASSERT(). Attempt to cleanup the comment to reflect reality.
|
#
177091 |
|
12-Mar-2008 |
jeff |
Remove kernel support for M:N threading.
While the KSE project was quite successful in bringing threading to FreeBSD, the M:N approach taken by the kse library was never developed to its full potential. Backwards compatibility will be provided via libmap.conf for dynamically linked binaries and static binaries will be broken.
|
#
176286 |
|
14-Feb-2008 |
marcel |
On Montecito processors, the instruction cache is in fact not coherent with the data caches. Implement a quick fix to allow us to boot on Montecito, while I'm working on a better fix in the mean time.
Commit made on Montecito-based Itanium...
|
#
175959 |
|
04-Feb-2008 |
marcel |
Allocate a stack for thread0 and switch to it before calling mi_startup(). This frees up kstack for static PAL/SAL calls and double-fault handling.
|
#
174898 |
|
25-Dec-2007 |
rwatson |
Add a new 'why' argument to kdb_enter(), and a set of constants to use for that argument. This will allow DDB to detect the broad category of reason why the debugger has been entered, which it can use for the purposes of deciding which DDB script to run.
Assign approximate why values to all current consumers of the kdb_enter() interface.
|
#
173615 |
|
14-Nov-2007 |
marcel |
o Rename cpu_thread_setup() to cpu_thread_alloc() to better communicate that it relates to (is called by) thread_alloc() o Add cpu_thread_free() which is called from thread_free() to counter-act cpu_thread_alloc().
i386: Have cpu_thread_free() call cpu_thread_clean() to preserve behaviour. ia64: Have cpu_thread_free() call mtx_destroy() for the mutex initialized in cpu_thread_alloc().
PR: ia64/118024
|
#
173361 |
|
05-Nov-2007 |
kib |
Fix for the panic("vm_thread_new: kstack allocation failed") and silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL.
As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done.
The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper).
In collaboration with: Peter Holm Reviewed by: jhb
|
#
171720 |
|
04-Aug-2007 |
marcel |
Replace "__asm __volatile()" by equivalent support functions from ia64_cpu.h. This improves readability and consistency and aids in auditing the code. Add data-serialization after writing to cr.tpr.
Approved by: re (blanket)
|
#
171663 |
|
30-Jul-2007 |
marcel |
Explicitly map the VHPT on all processors. Previously we were merely lucky that the VHPT was mapped as a side-effect of mapping the kernel, but when there's enough physical memory, this may not at all be the case.
Approved by: re (blanket)
|
#
170519 |
|
10-Jun-2007 |
alc |
Add the machine-specific definitions for configuring the new physical memory allocator.
Set the size of phys_avail[] using one of these definitions.
Approved by: re
|
#
170507 |
|
10-Jun-2007 |
marcel |
Work around a firmware bug in the HP rx2660, where in ACPI an I/O port is really a memory mapped I/O address. The bug is in the GAS that describes the address and in particular the SpaceId field. The field should not say the address is an I/O port when it clearly is not.
With an additional check for the IA64_BUS_SPACE_IO case in the bus access functions, and the fact that I/O ports pretty much not used in general on ia64, make the calculation of the I/O port address a function. This avoids inlining the work-around into every driver, and also helps reduce overall code bloat.
|
#
170444 |
|
08-Jun-2007 |
marcel |
Physical memory regions can be larger than INT_MAX. Change size1 from an int to a long to avoid printing negative byte and page counts.
|
#
170403 |
|
07-Jun-2007 |
marcel |
Remove remaining references to pc_curtid missed in previous commit.
|
#
170390 |
|
06-Jun-2007 |
davidxu |
Fix compiling error.
|
#
170306 |
|
04-Jun-2007 |
jeff |
Commit 13/14 of sched_lock decomposition. - Add a new parameter to cpu_switch() that is used to release the lock on the outgoing thread and properly acquire the lock on the incoming thread. This parameter is not required for schedulers that don't do per-cpu locking and architectures which do not support it may continue to use the 4BSD scheduler. This feature is presently not supported on ia64
Tested by: kris, current@ Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc. Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
|
#
170170 |
|
31-May-2007 |
attilio |
Revert VMCNT_* operations introduction. Probabilly, a general approach is not the better solution here, so we should solve the sched_lock protection problems separately.
Requested by: alc Approved by: jeff (mentor)
|
#
169667 |
|
18-May-2007 |
jeff |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
#
169291 |
|
05-May-2007 |
alc |
Define every architecture as either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE depending on whether the physical address space is densely or sparsely populated with memory. The effect of this definition is to determine which of two implementations of vm_page_array and PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by defining VM_PHYSSEG_DENSE, and a new implementation that trades off time for space is obtained by defining VM_PHYSSEG_SPARSE. For now, all architectures except for ia64 and sparc64 define VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety of my Itanium 2's memory to be used. Previously, only the first 1 GB could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based systems to boot without crashing.
This change is a combination of Nathan Whitehorn's patch and my own work in perforce.
Discussed with: kmacy, marius, Nathan Whitehorn PR: 112194
|
#
165369 |
|
20-Dec-2006 |
davidxu |
Add a lwpid field into per-cpu structure, the lwpid represents current running thread's id on each cpu. This allow us to add in-kernel adaptive spin for user level mutex. While spinning in user space is possible, without correct thread running state exported from kernel, it hardly can be implemented efficiently without wasting cpu cycles, however exporting thread running state unlikely will be implemented soon as it has to design and stablize interfaces. This implementation is transparent to user space, it can be disabled dynamically. With this change, mutex ping-pong program's performance is improved massively on SMP machine. performance of mysql super-smack select benchmark is increased about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems which have bunch of cpus and system-call overhead is low (athlon64, opteron, and core-2 are known to be fast), the adaptive spin does help performance.
Added sysctls: kern.threads.umtx_dflt_spins if the sysctl value is non-zero, a zero umutex.m_spincount will cause the sysctl value to be used a spin cycle count. kern.threads.umtx_max_spins the sysctl sets upper limit of spin cycle count.
Tested on: Athlon64 X2 3800+, Dual Xeon 5130
|
#
164936 |
|
06-Dec-2006 |
julian |
Threading cleanup.. part 2 of several.
Make part of John Birrell's KSE patch permanent.. Specifically, remove: Any reference of the ksegrp structure. This feature was never fully utilised and made things overly complicated. All code in the scheduler that tried to make threaded programs fair to unthreaded programs. Libpthread processes will already do this to some extent and libthr processes already disable it.
Also: Since this makes such a big change to the scheduler(s), take the opportunity to rename some structures and elements that had to be moved anyhow. This makes the code a lot more readable.
The ULE scheduler compiles again but I have no idea if it works.
The 4bsd scheduler still reqires a little cleaning and some functions that now do ALMOST nothing will go away, but I thought I'd do that as a separate commit.
Tested by David Xu, and Dan Eischen using libthr and libpthread.
|
#
164395 |
|
18-Nov-2006 |
marcel |
Since printf also has at least one critical section, we need to initialize pc_curthread. While here, rename early_pcpu to pcpu0 to be conistent (compare thread0 and proc0).
|
#
164392 |
|
18-Nov-2006 |
marcel |
Now that printf() needs the PCPU, set it up before we call printf(). Change the pc_pcb field from a pointer to struct pcb to struct pcb so that sizeof(struct pcb) includes the PCB we use for IPI_STOP. Statically declare early_pcb so that we don't have to allocate the PCB for thread0. This way we can setup the PCPU before cninit() and thus before we use printf().
|
#
163928 |
|
03-Nov-2006 |
marcel |
Make sure kern_envp is never NULL. If we don't get a pointer to the environment from the loader, use the static environment.
|
#
163709 |
|
26-Oct-2006 |
jb |
Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE).
Reviewed by: davidxu@
|
#
159850 |
|
21-Jun-2006 |
marcel |
Identify the cual-core Montecito.
MFC after: 3 days
|
#
155922 |
|
22-Feb-2006 |
jhb |
Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the processing read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simualted via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while the mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD.
Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week
|
#
155680 |
|
14-Feb-2006 |
jhb |
Fix the hw.realmem sysctl. The global realmem variable is a count of pages, not a count of bytes. The sysctl handler for hw.realmem already uses ctob() to convert realmem from pages to bytes. Thus, on archs that were storing a byte count in the realmem variable, hw.realmem was inflated.
Reported by: Valerio daelli valerio dot daelli at gmail dot com (alpha) MFC after: 3 days
|
#
153940 |
|
31-Dec-2005 |
netchild |
MI changes: - provide an interface (macros) to the page coloring part of the VM system, this allows to try different coloring algorithms without the need to touch every file [1] - make the page queue tuning values readable: sysctl vm.stats.pagequeue - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible)
MD changes: - detection of the cache size: only IA32 and AMD64 (untested) contains cache size detection code, every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring) - print some more info on Intel CPU's (like we do on AMD and Transmeta CPU's)
Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not.
Based upon work by: Chad David <davidc@acns.ab.ca> [1] Reviewed by: alc, arch (in 2004) Discussed with: alc, Chad David, arch (in 2004)
|
#
153165 |
|
06-Dec-2005 |
ru |
Fix -Wundef warnings from compiling GENERIC and LINT kernels of all architectures.
|
#
151316 |
|
14-Oct-2005 |
davidxu |
1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code.
2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread.
3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc.
4. Add td_sigqueue to thread structure to hold all signals delivered to thread.
5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed.
6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals.
Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64
|
#
149915 |
|
09-Sep-2005 |
marcel |
Change the High FP lock from a sleep lock to a spin lock. We can take the lock from interrupt context, which causes an implicit lock order reversal. We've been using the lock carefully enough that making it a spin lock should not be harmful.
|
#
148807 |
|
06-Aug-2005 |
marcel |
Improve SMP support: o Allocate a VHPT per CPU. The VHPT is a hash table that the CPU uses to look up translations it can't find in the TLB. As such, the VHPT serves as a level 1 cache (the TLB being a level 0 cache) and best results are obtained when it's not shared between CPUs. The collision chain (i.e. the hash bucket) is shared between CPUs, as all buckets together constitute our collection of PTEs. To achieve this, the collision chain does not point to the first PTE in the list anymore, but to a hash bucket head structure. The head structure contains the pointer to the first PTE in the list, as well as a mutex to lock the bucket. Thus, each bucket is locked independently of each other. With at least 1024 buckets in the VHPT, this provides for sufficiently finei-grained locking to make the ssolution scalable to large SMP machines. o Add synchronisation to the lazy FP context switching. We do this with a seperate per-thread lock. On SMP machines the lazy high FP context switching without synchronisation caused inconsistent state, which resulted in a panic. Since the use of the high FP registers is not common, it's possible that races exist. The ia64 package build has proven to be a good stress test, so this will get plenty of exercise in the near future. o Don't use the local ID of the processor we want to send the IPI to as the argument to ipi_send(). use the struct pcpu pointer instead. The reason for this is that IPI delivery is unreliable. It has been observed that sending an IPI to a CPU causes it to receive a stray external interrupt. As such, we need a way to make the delivery reliable. The intended solution is to queue requests in the target CPU's per-CPU structure and use a single IPI to inform the CPU that there's a new entry in the queue. If that IPI gets lost, the CPU can check it's queue at any convenient time (such as for each clock interrupt). This also allows us to send requests to a CPU without interrupting it, if such would be beneficial.
With these changes SMP is almost working. There are still some random process crashes and the machine can hang due to having the IPI lost that deals with the high FP context switch.
The overhead of introducing the hash bucket head structure results in a performance degradation of about 1% for UP (extra pointer indirection). This is surprisingly small and is offset by gaining reasonably/good scalable SMP support.
|
#
147773 |
|
05-Jul-2005 |
marcel |
Enhance ia64_flush_dirty() to handle the case in which td != curthread. This case is triggered with ptrace(2) and the PT_SETREGS function. Change the return type of the function to int so that errors can be passed on to the caller.
Approved by: re (scottl)
|
#
144637 |
|
04-Apr-2005 |
jhb |
Divorce critical sections from spinlocks. Critical sections as denoted by critical_enter() and critical_exit() are now solely a mechanism for deferring kernel preemptions. They no longer have any affect on interrupts. This means that standalone critical sections are now very cheap as they are simply unlocked integer increments and decrements for the common case.
Spin mutexes now use a separate KPI implemented in MD code: spinlock_enter() and spinlock_exit(). This KPI is responsible for providing whatever MD guarantees are needed to ensure that a thread holding a spin lock won't be preempted by any other code that will try to lock the same lock. For now all archs continue to block interrupts in a "spinlock section" as they did formerly in all critical sections. Note that I've also taken this opportunity to push a few things into MD code rather than MI. For example, critical_fork_exit() no longer exists. Instead, MD code ensures that new threads have the correct state when they are created. Also, we no longer try to fixup the idlethreads for APs in MI code. Instead, each arch sets the initial curthread and adjusts the state of the idle thread it borrows in order to perform the initial context switch.
This change is largely a big NOP, but the cleaner separation it provides will allow for more efficient alternative locking schemes in other parts of the kernel (bare critical sections rather than per-CPU spin mutexes for per-CPU data for example).
Reviewed by: grehan, cognet, arch@, others Tested on: i386, alpha, sparc64, powerpc, arm, possibly more
|
#
143057 |
|
02-Mar-2005 |
marcel |
Make sure fpswa_iface equals NULL when bootinfo.bi_fpswa equals 0. We need to be able to test for the (possible) non-existence of the FPSWA code.
PR: ia64/77591 Submitted by: Christian Kandeler (christian dot kandeler at hob dot de) MFC after: 1 day
|
#
142956 |
|
01-Mar-2005 |
wes |
Attempt to doff the pointy hat: implement 'hw.realmem' on remaining architectures. Pointed out by O'Brien, ScottL via email.
Reviewed by: obrien (various)
|
#
141378 |
|
05-Feb-2005 |
njl |
Finish the job of sorting all includes and fix the build by including malloc.h before proc.h on sparc64. Noticed by das@
Compiled on: alpha, amd64, i386, pc98, sparc64
|
#
141248 |
|
04-Feb-2005 |
marcel |
Include sys/bus.h before sys/cpu.h. The latter needs device_t.
|
#
141237 |
|
04-Feb-2005 |
njl |
Add an implementation of cpu_est_clockrate(9). This function estimates the current clock frequency for the given CPU id in units of Hz.
|
#
138543 |
|
08-Dec-2004 |
marcel |
Don't obtain the HCDP address directly from the bootinfo structure. Use a function to keep the details at arms length from uart(4).
|
#
138129 |
|
27-Nov-2004 |
das |
Don't include sys/user.h merely for its side-effect of recursively including other headers.
|
#
137912 |
|
20-Nov-2004 |
das |
U areas are going away, so don't allocate one for process 0.
Reviewed by: arch@
|
#
136183 |
|
06-Oct-2004 |
marcel |
Add the Madison II, which is the second generation Madison. The Madison II is model 2 in the Itanium 2 family and has up to 9MB of L3 cache and clocks higher than 1.5Ghz. There's no LV variant AFAICT.
|
#
135590 |
|
22-Sep-2004 |
marcel |
Redefine a PTE as a 64-bit integral type instead of a struct of bit-fields. Unify the PTE defines accordingly and update all uses.
|
#
135453 |
|
19-Sep-2004 |
marcel |
MFp4: Completely remove the remaining EFI includes and add our own (type) definitions instead. While here, abstract more of the internals by providing interface functions.
|
#
135405 |
|
17-Sep-2004 |
marcel |
Provide our own FPSWA definitions, instead of depending on the Intel EFI headers and put them all in <machine/fpu.h>. The Intel EFI headers conflict with the Intel ACPI headers (duplicate type definitions), so are being phased out in the kernel.
|
#
134791 |
|
05-Sep-2004 |
julian |
Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces.
The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time.
The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own.
Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens.
Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure.
The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring.
A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated.
Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can.
Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week
|
#
133878 |
|
16-Aug-2004 |
marcel |
Catch up with the drive-by renaming of IA32 to COMPAT_IA32. It must have been rush hour...
While here, move COMPAT_IA32 from opt_global.h to opt_compat.h like on amd64. Consequently, it's unsafe to use the option in pcb.h. We now unconditionally have the ia32 specific registers in the PCB.
This commit is untested.
|
#
133472 |
|
11-Aug-2004 |
marcel |
In set_regs(), flush the dirty registers onto the backingstore before we update the registers. That way we don't have any dirty registers to worry about and also know that bsp=bspstore, which makes updating the RSE related registers predictable. This is not the end of it. We need more validity checks, but for now this allows us to complete the gdb testsuite without crashing the kernel.
|
#
133464 |
|
11-Aug-2004 |
marcel |
Add __elfN(dump_thread). This function is called from __elfN(coredump) to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes.
Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
|
#
133291 |
|
07-Aug-2004 |
marcel |
Implement single stepping when we leave the kernel through the EPC syscall path. The basic problem is that we cannot set the single stepping flag directly, because we don't leave the kernel via an interrupt return. So, we need another way to set the single stepping flag. The way we do this is by enabling the lower-privilege transfer trap, which gets raised when we drop the privilege level. However, since we're still running in kernel space (sec), we're not yet done. We clear the lower- privilege transfer trap, enable the taken-branch trap and continue exiting the kernel until we branch into user space. Given the current code, there's a total of two traps this way before we can raise SIGTRAP.
|
#
132088 |
|
13-Jul-2004 |
davidxu |
Add ptrace_clear_single_step(), alpha already has it for years, the function will be used by ptrace to clear a thread's single step state.
|
#
131945 |
|
10-Jul-2004 |
marcel |
Update for the KDB framework: o ksym_start and ksym_end changed type to vm_offset_t. o Make debugging support conditional upon KDB instead of DDB. o Call kdb_enter() instead of breakpoint(). o Remove implementation of Debugger(). o Call kdb_trap() according to the new world order.
unwinder: o s/db_active/kdb_active/g o Various s/ddb/kdb/g o Add support for unwinding from the PCB as well as the trapframe. Abuse a spare field in the special register set to flag whether the PCB was actually constructed from a trapframe so that we can make the necessary adjustments.
md_var.h: o Add RSE convenience macros. o Add ia64_bsp_adjust() to add or subtract from BSP while taking NaT collections into account.
|
#
131905 |
|
10-Jul-2004 |
marcel |
Implement makectx(). The makectx() function is used by KDB to create a PCB from a trapframe for purposes of unwinding the stack. The PCB is used as the thread context and all but the thread that entered the debugger has a valid PCB. This function can also be used to create a context for the threads running on the CPUs that have been stopped when the debugger got entered. This however is not done at the time of this commit.
|
#
130344 |
|
11-Jun-2004 |
phk |
Deorbit COMPAT_SUNOS.
We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither a sparc32 port nor a SunOS4.x compatibility desire these days.
|
#
126825 |
|
10-Mar-2004 |
marcel |
Identify the Deerfield processor. Deerfield is a low-voltage variant based on the Madison core and targeting the low end of the spectrum. Its clock frequency is 1Ghz, whereas Madison starts at 1.3Ghz. Since the CPUID information is the same for Madison and Deerfield, we use the clock frequency to identify the processor. Supposedly the Deerfield only uses 62W, which seems to be less than modern Xeon processors (about 70W) and about half what a Madison would need.
|
#
126106 |
|
22-Feb-2004 |
marcel |
Do not pre-map the I/O port space. On the Intel Tiger 4 this conflicts with a memory mapped I/O range that's immediately before it and is not 256MB aligned. As a result, when an address is accessed in the memory mapped range and a direct mapping is added for it, it overlaps with the pre-mapped I/O port space and causes a machine check.
Based on a patch from: arun@
|
#
124092 |
|
03-Jan-2004 |
davidxu |
Make sigaltstack as per-threaded, because per-process sigaltstack state is useless for threaded programs, multiple threads can not share same stack. The alternative signal stack is private for thread, no lock is needed, the orignal P_ALTSTACK is now moved into td_pflags and renamed to TDP_ALTSTACK. For single thread or Linux clone() based threaded program, there is no semantic changed, because those programs only have one kernel thread in every process.
Reviewed by: deischen, dfr
|
#
123528 |
|
13-Dec-2003 |
marcel |
In set_mcontext(), take into account that kse_switchin(2) will eventually be passed an async. context as well as a syscall context. While here, fix a serious bug in that if the trapframe is a syscall frame, but we're restoring an async context, we need to clear the FRAME_SYSCALL flag so that we leave the kernel via exception_restore.
|
#
123255 |
|
07-Dec-2003 |
marcel |
Simplify the contexts created by the kernel and remove the related flags. We now create asynchronous contexts or syscall contexts only. Syscall contexts differ from the minimal ABI dictated contexts by having the scratch registers saved and restored because that's where we keep the syscall arguments and syscall return values. Since this change affects KSE, have it use kse_switchin(2) for the "new" syscall context.
|
#
122918 |
|
20-Nov-2003 |
marcel |
Set the ACPI processor Id in the PCPU structure so that CPU idling on SMP systems has a chance of working. This was a loose end of the implementation of the ACPI Cx idle states. Since our logical CPU Id is the ACPI processor Id, we do not need to jump through hoops to obtain it.
Approved: re@ (jhb)
|
#
122763 |
|
15-Nov-2003 |
njl |
Add the pc_acpi_id PCPU member. The new acpi_cpu driver uses this to dereference the softc.
|
#
122525 |
|
12-Nov-2003 |
marcel |
Remove ia64_highfp_load() now that it's unused.
|
#
122518 |
|
11-Nov-2003 |
marcel |
Further work-out the handling of the high FP registers. The most important change is in cpu_switch() where we disable the high FP registers for the thread that we switch-out if the CPU currently has its high FP registers. This avoids that the high FP registers remain enabled for the thread even when the CPU has unloaded them or the thread migrated to another processor. Likewise, when we switch-in a thread of that has its high FP registers on the CPU, we enable them. This avoids an otherwise harmless, but unnecessary trap to have them enabled.
The code that handles the disabled high FP trap (in trap()) has been turned into a critical section for the most part to avoid being preempted. If there's a race, we bail out and have the processor trap again if necessary.
Avoid using the generic ia64_highfp_save() function when the context is predictable. The function adds unnecessary overhead. Don't use ia64_highfp_load() for the same reason. The function is now unused and can be removed.
These changes make the lazy context switching of the high FP registers in an UP kernel functional.
|
#
122480 |
|
11-Nov-2003 |
marcel |
Save and restore the high FP registers in {g|s}_mcontext(). Note that we currently do not keep track of whether the thread has actually used the high FP registers before. If not, we should not save them in the context which automaticly means that we also would not restore them from the context. For now, do it unconditionally so that we can reach functional completeness.
|
#
122479 |
|
11-Nov-2003 |
marcel |
Fix a nasty bug that got exposed when the sendsig() and sigreturn() functions switched to using {g|s}et_mcontext(). The problem is that sigreturn(), being a syscall, can be given an async. context (i.e. one corresponding to an interrupt or trap). When this happens, we try to return to user mode via epc_syscall_return with a trapframe that can only be used to return to user mode via exception_restore.
To fix this, we check the frame's flags immediately prior to epc_syscall_return and branch to exception_restore for non-syscall frames. Modify the assertion in set_mcontext() to check that if there's a mismatch, it's because of sigreturn().
|
#
122389 |
|
10-Nov-2003 |
marcel |
In get_mcontext(), do not update bspstore and ndirty in the trapframe. Only update them in the newly created context to reflect the state after copying the dirty registers onto the user stack. If we were to update the trapframe, we lose the state at entry into the kernel. We may need that after we create the context, such as for KSE upcalls.
We have to update the trapframe after writing the dirty registers to the user stack for signal delivery to work. But this is best done in sendsig() itself where it applies, not in get_mcontext() where it's done unconditionally.
|
#
122368 |
|
09-Nov-2003 |
marcel |
Use get_mcontext() to construct the signal context in sendsig() and use set_mcontext() to restore the context in sigreturn(). Since we put the syscall number and the syscall arguments in the trapframe (we don't save the scratch registers for syscalls, which allows us to reuse the space to our advantage), create a MD specific flag so that we save the scratch registers even for syscalls. We would not be able to restart a syscall otherwise.
The signal trampoline does not need to flush the regiters anymore, because get_mcontext() already handles that. In fact, if we set up the context correctly, we do not need to have a trampoline at all. This change however only minimally changes the trampoline code. In follow-up commits this can be further optimized.
Note that normally we preserve cfm and iip in the trapframe created by the EPC syscall path when we restore a context in set_mcontext() because those fields are not normally set for a synchronuous context. The kernel puts the return address and frame info of the syscall stub in there. By preserving these fields we hide this detail from userland which allows us to use setcontext(2) for user created contexts. However, sigreturn() is commonly called from the trampoline, which means that if we preserve cfm and iip in all cases, we would return to the trampoline after the sigreturn(), which means we hit the safety net: we call exit(2). So, we do not preserve cfm and iip when we have a synchronous context that also has scratch registers (the uncommon context created by sendsig() only), under the assumption that if such a context is created in userland, something special is going on and the use of cfm and iip is then just another quirk. All this is invisible in the common case.
|
#
122364 |
|
09-Nov-2003 |
marcel |
Change the clear_ret argument of get_mcontext() to be a flags argument. Since all callers either passed 0 or 1 for clear_ret, define bit 0 in the flags for use as clear_ret. Reserve bits 1, 2 and 3 for use by MI code for possible (but unlikely) future use. The remaining bits are for use by MD code.
This change is triggered by a need on ia64 to have another knob for get_mcontext().
|
#
121635 |
|
28-Oct-2003 |
marcel |
When switching the RSE to use the kernel stack as backing store, keep the RNAT bit index constant. The net effect of this is that there's no discontinuity WRT NaT collections which greatly simplifies certain operations. The cost of this is that there can be up to 504 bytes of unused stack between the true base of the kernel stack and the start of the RSE backing store. The cost of adjusting the backing store pointer to keep the RNAT bit index constant, for each kernel entry, is negligible.
The primary reasons for this change are: 1. Asynchronuous contexts in KSE processes have the disadvantage of having to copy the dirty registers from the kernel stack onto the user stack. The implementation we had so far copied the registers one at a time without calculating NaT collection values. A process that used speculation would not work. Now that the RNAT bit index is constant, we can block-copy the registers from the kernel stack to the user stack without having to worry about NaT collections. They will be in the right place on the user stack. 2. The ndirty field in the trapframe is now also usable in userland. This was previously not the case because ndirty also includes the space occupied by NaT collections. The value could be off by 8, depending on the discontinuity. Now that the RNAT bit index is contants, we have exactly the same number of NaT collection points on the kernel stack as we would have had on the user stack if we didn't switch backing stores. 3. Debuggers and other applications that use ptrace(2) can now copy the dirty registers from the kernel stack (using ptrace(2)) and copy them whereever they want them (onto the user stack of the inferior as might be the case for gdb) without having to worry about NaT collections in the same way the kernel doesn't have to worry about them.
There's a second order effect caused by the randomization of the base of the backing store, for it depends on the number of dirty registers the processor happened to have at the time of entry into the kernel. The second order effect is that the RSE will have a better cache utilization as compared to having the backing store always aligned at page boundaries. This has not been measured and may be in practice only minimally beneficial, if at all measurable.
|
#
121457 |
|
24-Oct-2003 |
marcel |
Remove ia64_pack_bundle() and ia64_unpack_bundle(). They are not used anymore.
|
#
121452 |
|
24-Oct-2003 |
arun |
Use a TR of size 1 << IA64_ID_PAGE_SHIFT instead of 16M to avoid overlapping TR/TC entries (which results in a machine check). Note that we don't look at the size of the memory descriptor, because it doesn't guarantee non-overlap.
With this change, a UP kernel could boot on a Intel Tiger4 machine with the following options:
options LOG2_ID_PAGE_SIZE=26 # 64M options LOG2_PAGE_SIZE=14 # 16K
Approved by: marcel
|
#
121294 |
|
20-Oct-2003 |
marcel |
Remove md_bspstore from the MD fields of struct thread. Now that the backing store is at a fixed address, there's no need for a per-thread variable.
|
#
121228 |
|
18-Oct-2003 |
njl |
Add the cpu_idle_hook() function pointer so that other idlers can be hooked at runtime. Make C1 sleep (e.g., HLT) be the default. This prepares the way for further ACPI sleep states.
|
#
121148 |
|
17-Oct-2003 |
marcel |
Implement cpu_idle() on ia64. We put the processor in a lightweight halt state that minimizes power consumption while still preserving cache and TLB coherency. Halting the processor is not conditional at this time. Tested with UP and SMP kernels.
|
#
120683 |
|
03-Oct-2003 |
marcel |
Swap the syscall caller frame info (i.e. the return pointer and frame marker) and the syscall stub frame info in the trap frame. Previously we stored the stub frame info in (rp,pfs) and the caller frame info in (iip,cfm). This ends up being suboptimal for the following reasons: 1. When we create a new context, such as for an execve(2), we had to set the (rp,pfs) pair for the entry point when using the syscall path out of the kernel but we need to set the (iip,cfm) pair when we take the interrupt way out. This is mostly just an inconsistency from the kernel's point of view, but an ugly irregularity from gdb(1)'s point of view. 2. The getcontext(2) and setcontext(2) syscalls had to swap the (rp,pfs) and (iip,cfm) pairs to make the context compatible with one created purely in userland.
Swapping the (rp,pfs) and (iip,cfm) pairs is visible to signal handlers that actually peek at the mcontext_t and to gdb(1). Since this change is made for gdb(1) and we don't care about signal handlers that peek at the mcontext_t because we're still a tier 2 platform, this ABI breakage is academic at this moment in time.
Note that there was no real reason to save the caller frame info in (iip,cfm) and the stub frame info in (rp,pfs).
|
#
120296 |
|
20-Sep-2003 |
marcel |
Fix the last remaining problem encountered by KSE: apparently it is not guaranteed that the RSE writes the NaT collection immediately, sort of atomically, to the backing store when it writes the register immediately prior to the NaT collection point. This means that we cannot assume that the low 9 bits of the backingstore pointer do not point to the NaT collection. This is rather a surprise and I don't know at this time if it's a bug in the Merced or that it's actually a valid condition of the architecture. A quick scan over the sources does not indicate that we depend on the false assumption elsewhere, but it's something to keep in mind.
The fix is to write the saved contents of the ar.rnat register to the backingstore prior to entering the loop that copies the dirty registers from the kernel stack to the user stack.
|
#
120252 |
|
19-Sep-2003 |
marcel |
Fix the most significant KSE breakage caused by not restoring the restart instruction bits in the PSR. As such, we were returning from interrupt to the instruction in the bundle that caused us to enter the kernel, only now we're returning to a completely different bundle.
While close here: add two KASSERTs to make sure that we restore sync contexts only when entered the kernel through a syscall and restore an async context only when entered the kernel through an interrupt, trap or fault.
While not exactly here, but close enough: use suword64() when we copy the dirty registers from the kernel stack to the user stack. The code was intended to be be replaced shortly after being added, but that was a couple of weeks ago. I might as well avoid that it is a source for panics until it's replaced.
|
#
119906 |
|
09-Sep-2003 |
marcel |
Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE.
The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.
|
#
119649 |
|
01-Sep-2003 |
marcel |
Use pmap_steal_memory() for the msgbuf instead of trying to squeeze it in the last chunk (phys_avail block). The last chunk very often is not larger than one or two pages, resulting in a msgbuf that's too small to hold a complete verbose boot. Note that pmap_steal_memory() will bzero the memory it "allocates". Consequently, ia64 will never preserve previous msgbufs. This is not a noticable difference in practice. If the msgbuf could be reused, it was invariably too small to have anything preserved anyway.
|
#
119337 |
|
22-Aug-2003 |
marcel |
Remove unused inclusion of opt_acpi.h
|
#
118818 |
|
12-Aug-2003 |
marcel |
Extend identifycpu(): o Differentiate between CPU family and CPU model. There are multiple Itanium 2 models and it's nice to differentiate between them. o Seperately export the CPU family and CPU model with sysctl. o Merced is the only model in the Itanium family. o Add Madison to the Itanium 2 family. We already knew about McKinley. o Print the CPU family between parenthesis, like we do with the i386 CPU class.
My prototype now identifies itself as: CPU: Merced (800.03-Mhz Itanium)
pluto1 and pluto2 will eventually identify themselves as: CPU: McKinley (900.00-Mhz Itanium 2)
|
#
118739 |
|
10-Aug-2003 |
marcel |
o move cpu_reset() from vm_machdep.c to machdep.c. o reorder cpu_boot(), cpu_halt() and identifycpu().
No functional change.
|
#
118717 |
|
10-Aug-2003 |
marcel |
Now that we can ignore up to 8KB of dirty registers, remove the RSE magic from exec_setregs(). In set_mcontext() we now also don't have to worry that we entered the kernel with more that 512 bytes of dirty registers on the kernel stack. Note that we cannot make any assumptions anymore WRT to NaT collection points in exec_setregs(), so we have to deal with them now.
|
#
118590 |
|
07-Aug-2003 |
marcel |
Better define the flags in the mcontext_t and properly set the flags when we create contexts. The meaning of the flags are documented in <machine/ucontext.h>. I only list them here to help browsing the commit logs: _MC_FLAGS_ASYNC_CONTEXT _MC_FLAGS_HIGHFP_VALID _MC_FLAGS_KSE_SET_MBOX _MC_FLAGS_RETURN_VALID _MC_FLAGS_SCRATCH_VALID
Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
|
#
118503 |
|
05-Aug-2003 |
marcel |
o Put the syscall return registers in the context. Not only do we need this for swapcontext(), KSE upcalls initiated from ast() also need to save them so that we properly return the syscall results after having had a context switch. Note that we don't use r11 in the kernel. However, the runtime specification has defined r8-r11 as return registers, so we put r11 in the context as well. I think deischen@ was trying to tell me that we should save the return registers before. I just wasn't ready for it :-)
o The EPC syscall code has 2 return registers and 2 frame markers to save. The first (rp/pfs) belongs to the syscall stub itself. The second (iip/cfm) belongs to the caller of the syscall stub. We want to put the second in the context (note that iip and cfm relate to interrupts. They are only being misused by the syscall code, but are not part of a regular context). This way, when the context is switched to again, we return to the caller of setcontext(2) as one would expect.
o Deal with dirty registers on the kernel stack. The getcontext() syscall will flush the RSE, so we don't expect any dirty registers in that case. However, in thread_userret() we also need to save the context in certain cases. When that happens, we are sure that there are dirty registers on the kernel stack. This implementation simply copies the registers, one at a time, from the kernel stack to the user stack. NAT collections are not dealt with. Hence we don't preserve NaT bits. A better solution needs to be found at some later time. We also don't deal with this in all cases in set_mcontext. No temporay solution is implemented because it's not a showstopper. The problem is that we need to ignore the dirty registers and we automaticly do that for at most 62 registers. When there are more than 62 dirty registers we have a memory "leak".
This commit is fundamental for KSE support.
|
#
118414 |
|
04-Aug-2003 |
marcel |
Cleanup the clock code. This includes: o Remove alpha specific timer code (mc146818A) and compiled-out calibration of said timer. o Remove i386 inherited timer code (i8253) and related acquire and release functions. o Move sysbeep() from clock.c to machdep.c and have it return ENODEV. Console beeps should be implemented using ACPI or if no such device is described, using the sound driver. o Move the sysctls related to adjkerntz, disable_rtc_set and wall_cmos_clock from machdep.c to clock.c, where the variables are. o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't bother faking a stathz that's 1/8 of that. Keep it simple: hz defaults to HZ and stathz equals hz. This is also how it's done for sparc64. o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj) to calculate ITC skew and corrections. On average, we adjust the ITC match register once every ~1500 interrupts for a duration of 2 consequtive interruprs. This is to correct the non-deterministic behaviour of the ITC interrupt (there's a delay between the match and the raising of the interrupt). o Add 4 debugging sysctls to monitor clock behaviour. Those are debug.clock_adjust_edges, debug.clock_adjust_excess, debug.clock_adjust_lost and debug.clock_adjust_ticks. The first counts the individual adjustment cycles (when the skew first crosses the threshold), the second counts the number of times the adjustment was excessive (any non-zero value is to be considered a bug), the third counts lost clock interrupts and the last counts the number of interrupts for which we applied an adjustment (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the avarage duration of an individual adjustment -- should be ~2).
While here, remove some nearby (trivial) left-overs from alpha and other cleanups.
|
#
118296 |
|
01-Aug-2003 |
marcel |
Write the preserved registers to (and read them from) struct reg and struct fpreg.
|
#
118238 |
|
30-Jul-2003 |
peter |
Cosmetic: fix some disorder of #include "opt_...." files
|
#
117993 |
|
25-Jul-2003 |
marcel |
Move ia64_pa_access() from machdep.c to mem.c and declare it static. It's only used in mem.c and cannot accidentally be used elsewhere this way.
|
#
117608 |
|
15-Jul-2003 |
marcel |
Rename thread_siginfo to cpu_thread_siginfo.
|
#
116958 |
|
28-Jun-2003 |
davidxu |
Add a machine depended function thread_siginfo, SA signal code will use the function to construct a siginfo structure and use the result to export to userland.
Reviewed by: julian
|
#
116227 |
|
11-Jun-2003 |
marcel |
Make sure pcpu->pc_pcb is pointing to a 16-byte aligned address. The PCB contains FP registers, whose alignment must be 16 bytes at least. Since the PCB pointed to by pc_pcb is immediately after the PCPU itself, round-up the size of thge PCPU to a multiple of 16 bytes. The PCPU is page aligned.
This fixes a misalignment trap caused by stopping a CPU in a SMP kernel, such as been done when entering the debugger.
Reported by: Alan Robinson <alan.robinson@fujitsu-siemens.com>
|
#
115652 |
|
01-Jun-2003 |
marcel |
Improve set_mcontext: o Don't copy psr verbatim from the user supplied context. Only allow userland to change the processor settings that are part of the user mask.
|
#
115566 |
|
31-May-2003 |
marcel |
Implement set_mcontext() and get_mcontext(). Just as for sendsig() and sigreturn(), we cheat and assume the preserved registers are still on-chip and unmodified. This is actually the case, but more by accident than by design. We need to use unwinding eventually or explicitly compile the kernel in a way that the compiler steers clear from using the preserved registers completely.
|
#
115558 |
|
31-May-2003 |
marcel |
Make sure we have all the dirty registers in user frames on the backing store before we discard them. It is possible that we enter the kernel (due to an execve in this case) with a lot of dirty user registers and that the RSE has only partially spilled them (to make room for new frames). We cannot move the backing store pointer down (to discard user registers) when not all of the user registers are on the backing store. So, we flush the register stack IFF this happens. Unconditionally doing the flush is too costly, because the condition in which we need to flush is very rare.
This change appears to fix the SIGSEGV that sometimes happen for newly executed processes and so far also appears to fix the last of the corruption. It is possible, although not likely, that this change prevents some other bug from happening, even though it is itself not a fix. Hence the uncertainty. We'll know in a couple of months I guess :-)
|
#
115378 |
|
29-May-2003 |
marcel |
Move the sysctls of the misalignment handler to where they belong and use OID_AUTO instead of fixed IDs.
Approved by: re@ (blanket)
|
#
115341 |
|
26-May-2003 |
marcel |
Fix fu{byte|word*} and su{byte|word*}: o If the address was not within user space we jumped to fusufault where we would clear pcb_onfault and return 0. There are two bugs here: 1. We never got to the point where we assigned the address of pcb_onfault to r15, which means that we would clobber some random memory location, including I/O space or ROM. 2. We're supposed to return -1 on error. o Make sure we have proper memory ordering for setting pcb_onfault, doing the memory access to user space and clearing pcb_onfault. For the fu* family of functions this means that we need a mf instruction, because we don't have acquire semantics on stores and release semantics on loads (hence st;ld cannot be ordered without intermediate mf).
While here, implement casuptr() so that we are a (small) step closer to supporting libthr and deobfuscate the non-implementation of {f|s}uswintr.
Approved by: re@ (blanket)
|
#
115276 |
|
23-May-2003 |
marcel |
Fix an alpha inheritance bug:
On alpha, PAL is involved in context management and after wiring the CPU (in alpha_init()) a context switch was performed to tell PAL about the context. This was bogusly brought over to ia64 where it introduced bugs, because we restored the context from a mostly uninitialized PCB.
The cleanup constitutes: o Remove the unused arguments from ia64_init(). o Don't return from ia64_init(), but instead call mi_startup() directly. This reduces the amount of muckery in assembly and also allows for the next bullet: o Save our currect context prior to calling mi_startup(). The reason for this is that many threads are created from thread0 by cloning the PCB. By saving our context in the PCB, we have something sane to clone. It also ensures that a cloned thread that does not alter the context in any way will return to the saved context, where we're ready for the eventuality with a nice, user unfriendly panic().
The cleanup fixes at least the following bugs: o Entering mi_startup() with the RSE in enforced lazy mode. o Re-execution of ia64_init() in certain "lab" conditions.
While here, add proper unwind directives to __start() so that the unwind knows it has reached the bottom of the (call) stack.
Approved by: re@ (blanket)
|
#
115148 |
|
19-May-2003 |
marcel |
pmap_install() needs to be atomic WRT to context switching. Protect switching user regions (region 0-4) with schedlock. Avoid unnecessary recursion on schedlock by moving the core functionality to another function (pmap_switch()) where we assert schedlock is held. Turn pmap_install() into a wrapper that grabs schedlock. This minimizes the number of callsites that need to be changed. Since we already have schedlock in cpu_switch() and cpu_throw(), have them call pmap_switch() directly. These were also the only two calls to pmap_install() outside pmap.c, so make pmap_install() static and remove its prototype from pmap.h
Approved by: re (blanket)
|
#
115084 |
|
16-May-2003 |
marcel |
Revamp of the syscall path, exception and context handling. The prime objectives are: o Implement a syscall path based on the epc inststruction (see sys/ia64/ia64/syscall.s). o Revisit the places were we need to save and restore registers and define those contexts in terms of the register sets (see sys/ia64/include/_regset.h).
Secundairy objectives: o Remove the requirement to use contigmalloc for kernel stacks. o Better handling of the high FP registers for SMP systems. o Switch to the new cpu_switch() and cpu_throw() semantics. o Add a good unwinder to reconstruct contexts for the rare cases we need to (see sys/contrib/ia64/libuwx)
Many files are affected by this change. Functionally it boils down to: o The EPC syscall doesn't preserve registers it does not need to preserve and places the arguments differently on the stack. This affects libc and truss. o The address of the kernel page directory (kptdir) had to be unstaticized for use by the nested TLB fault handler. The name has been changed to ia64_kptdir to avoid conflicts. The renaming affects libkvm. o The trapframe only contains the special registers and the scratch registers. For syscalls using the EPC syscall path no scratch registers are saved. This affects all places where the trapframe is accessed. Most notably the unaligned access handler, the signal delivery code and the debugger. o Context switching only partly saves the special registers and the preserved registers. This affects cpu_switch() and triggered the move to the new semantics, which additionally affects cpu_throw(). o The high FP registers are either in the PCB or on some CPU. context switching for them is done lazily. This affects trap(). o The mcontext has room for all registers, but not all of them have to be defined in all cases. This mostly affects signal delivery code now. The *context syscalls are as of yet still unimplemented.
Many details went into the removal of the requirement to use contigmalloc for kernel stacks. The details are mostly CPU specific and limited to exception_save() and exception_restore(). The few places where we create, destroy or switch stacks were mostly simplified by not having to construct physical addresses and additionally saving the virtual addresses for later use.
Besides more efficient context saving and restoring, which of course yields a noticable speedup, this also fixes the dreaded SMP bootup problem as a side-effect. The details of which are still not fully understood.
This change includes all the necessary backward compatibility code to have it handle older userland binaries that use the break instruction for syscalls. Support for break-based syscalls has been pessimized in favor of a clean implementation. Due to the overall better performance of the kernel, this will still be notived as an improvement if it's noticed at all.
Approved by: re@ (jhb)
|
#
114983 |
|
13-May-2003 |
jhb |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.
Reviewed by: arch@ Approved by: re (rwatson)
|
#
114553 |
|
02-May-2003 |
marcel |
Option KADB does not exist. It came from alpha, where it still exists.
|
#
113998 |
|
24-Apr-2003 |
deischen |
Add an argument to get_mcontext() which specified whether the syscall return values should be cleared. The system calls getcontext() and swapcontext() want to return 0 on success but these contexts can be switched to at a later time so the return values need to be cleared in the saved register sets. Other callers of get_mcontext() would normally want the context without clearing the return values.
Remove the i386-specific context saving from the KSE code. get_mcontext() is not i386-specific any more.
Fix a bad pointer in the alpha get_mcontext() code. The context was being bcopy()'d from &td->tf_frame, but tf_frame is itself a pointer, so the thread was being copied instead. Spotted by jake.
Glanced at by: jake Reviewed by: bde (months ago)
|
#
112898 |
|
31-Mar-2003 |
jeff |
- Define a new md function 'casuptr'. This atomically compares and sets a pointer that is in user space. It will be used as the basic primitive for a kernel supported user space lock implementation. - Implement this function in x86's support.s - Provide stubs that return -1 in all other architectures. Implementations will follow along shortly.
Reviewed by: jake
|
#
112888 |
|
31-Mar-2003 |
jeff |
- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
|
#
112882 |
|
31-Mar-2003 |
jeff |
- Use sigexit() instead of twiddling the signal mask, catch, ignore, and action bits to allow SIGILL to work as expected. This brings this file in line with other architectures.
|
#
111030 |
|
17-Feb-2003 |
marcel |
Print two new processor features: o Spontaneous deferral (A feature required by dutch railways :-) o 16-byte atomic operations (ld, st, cmpxchg)
|
#
110211 |
|
01-Feb-2003 |
marcel |
Remove special casing for running in the simulator from the kernel and instead add platform, firmware and EFI stubs to the loader. The net effect of this change is that besides a special console and disk driver, the kernel has no knowledge of the simulator. This has the following advantages: o Simulator support is much harder to break, o It's easier to make use of more feature complete simulators. This would only need a change in the simulator specific loader, o Running SMP kernels within the simulator. Note that ski at this time does not simulate IPIs, so there's no way to start APs.
The platform, firmware and EFI stubs describe the following hardware: o 4 CPU Itanium, o 128 MB RAM within the 4GB address space, o 64 MB RAM above the 4GB address space.
NOTE: The stubs in the skiloader describe a machine that should in parts be defined by the simulator. Things like processor interrupt block and AP wakeup vector cannot be choosen at random because they require interpretation by the simulator. Currently the simulator is ignorant of this.
This change introduces an unofficial SSC call SSC_SAL_SET_VECTORS which is ignored by the simulator.
Tested with: ski (version 0.943 for linux)
|
#
107206 |
|
24-Nov-2002 |
marcel |
MFp4: Add function map_port_space() to map the memory mapped I/O port range as uncacheable virtual memory and call it prior to probing for a console. This removes the dependency on the loader to have done this for us. Note that this change does not include doing the same for APs.
Approved by: re (blanket)
|
#
106977 |
|
16-Nov-2002 |
deischen |
Add getcontext, setcontext, and swapcontext as system calls. Previously these were libc functions but were requested to be made into system calls for atomicity and to coalesce what might be two entrances into the kernel (signal mask setting and floating point trap) into one.
A few style nits and comments from bde are also included.
Tested on alpha by: gallatin
|
#
106697 |
|
09-Nov-2002 |
des |
Print real / avail memory in megabytes rather than kilobytes.
|
#
106605 |
|
07-Nov-2002 |
tmm |
Move the definitions of the hw.physmem, hw.usermem and hw.availpages sysctls to MI code; this reduces code duplication and makes all of them available on sparc64, and the latter two on powerpc. The semantics by the i386 and pc98 hw.availpages is slightly changed: previously, holes between ranges of available pages would be included, while they are excluded now. The new behaviour should be more correct and brings i386 in line with the other architectures.
Move physmem to vm/vm_init.c, where this variable is used in MI code.
|
#
106503 |
|
06-Nov-2002 |
jmallett |
Remove what was a temporary bogus assignment of bits of siginfo_t, as it does not look like the prerequisites to fill it in properly will be in the tree for the upcoming release, but it's mostly done, so there is no need for these to stay around to remind us.
|
#
106189 |
|
30-Oct-2002 |
marcel |
Rewrite cpu_switch(). The most notable change is the fact that we now have f16-f31 as part of the context. The PCB has been reorganized to better match how we save and restore the (preserved) registers. This commit also moves the context restoriation to its own function (named pcb_restore), as we did with pcb_save.
Only minimal effort has been put in writing optimal assembly. The expectation is that there will be more rounds of changes.
|
#
105950 |
|
25-Oct-2002 |
peter |
Split 4.x and 5.x signal handling so that we can keep 4.x signal handling clean and functional as 5.x evolves. This allows some of the nasty bandaids in the 5.x codepaths to be unwound.
Encapsulate 4.x signal handling under COMPAT_FREEBSD4 (there is an anti-foot-shooting measure in place, 5.x folks need this for a while) and finish encapsulating the older stuff under COMPAT_43. Since the ancient stuff is required on alpha (longjmp(3) passes a 'struct osigcontext *' to the current sigreturn(2), instead of the 'ucontext_t *' that sigreturn is supposed to take), add a compile time check to prevent foot shooting there too. Add uniform COMPAT_43 stubs for ia64/sparc64/powerpc.
Tested on: i386, alpha, ia64. Compiled on sparc64 (a few days ago). Approved by: re
|
#
105891 |
|
24-Oct-2002 |
jhb |
Oops, I missed a few changes in 'device acpica' -> 'device acpi' change.
Submitted by: Hiten Pandya <hiten@angelica.unixdaemons.com>
|
#
105470 |
|
19-Oct-2002 |
marcel |
Update the unwind information when modules are loaded and unloaded by using the linker hooks. Since these hooks are called for the kernel as well, we don't need to deal with that with a special SYSINIT. The initialization implicitly performed on the first update of the unwind information is made explicit with a SYSINIT. We now don't need the _ia64_unwind_{start|end} symbols.
|
#
105432 |
|
19-Oct-2002 |
marcel |
Make this compile when DDB is not defined by conditionally compiling all references to ksym_start and ksym_end.
|
#
104433 |
|
03-Oct-2002 |
peter |
Do a bit of rude hackery to get clock interrupts on all CPUs. This is partly based on the Alpha system which duplicates the clock to each cpu, instead of doing a clock roundrobin like on i386. This means we get hz * ncpu clocks per second and so we have to seperate clock sampling from actual 'do the work' clock processing. The BSP runs the complete processing, the rest just sample state etc.
Using the on-cpu interval timer is not ideal as it will drift. There is more to be done here, we should use an external clock source.
|
#
103703 |
|
20-Sep-2002 |
phk |
For reasons now lost in historical fog, the bounds_check_with_label() function were put in i386/i386/machdep.c from where it has been cut and pasted to other architectures with only minor corruption.
Disklabel is really a MI format in many ways, at least it certainly is when you operate on struct disklabel.
Put bounds_check_with_label() back in subr_disklabel.c where it belongs.
Sponsored by: DARPA & NAI Labs.
|
#
103367 |
|
15-Sep-2002 |
julian |
Allocate KSEs and KSEGRPs separatly and remove them from the proc structure. next step is to allow > 1 to be allocated per process. This would give multi-processor threads. (when the rest of the infrastructure is in place)
While doing this I noticed libkvm and sys/kern/kern_proc.c:fill_kinfo_proc are diverging more than they should.. corrective action needed soon.
|
#
103081 |
|
07-Sep-2002 |
jmallett |
Fill out two fields (si_pid, si_uid) in the siginfo structure handed back to userland in the signal handler that were not being iflled out before, but should and can be.
This part of sendsig could be slightly refactored to use an MI interface, or ideally, *sendsig*() would have an API change to accept a siginfo_t, which would be filled out by an MI function in the level above sendsig, and said MI function would make a small call into MD code to fill out the MD parts (some of which may be bogus, such as the si_addr stuff in some places). This would eventually make it possible for parts of the kernel sending signals to set up a siginfo with meaningful information.
Reviewed by: mux MFC after: 2 weeks
|
#
102666 |
|
31-Aug-2002 |
peter |
Take a shot at fixing up a whole stack of style and other embarresing unforced errors that Bruce identified. I have not yet addressed all of his concerns.
|
#
102600 |
|
30-Aug-2002 |
peter |
Change hw.physmem and hw.usermem to unsigned long like they used to be in the original hardwired sysctl implementation.
The buf size calculator still overflows an integer on machines with large KVA (eg: ia64) where the number of pages does not fit into an int. Use 'long' there.
Change Maxmem and physmem and related variables to 'long', mostly for completeness. Machines are not likely to overflow 'int' pages in the near term, but then again, 640K ought to be enough for anybody. This comes for free on 32 bit machines, so why not?
|
#
102561 |
|
29-Aug-2002 |
jake |
Renamed poorly named setregs to exec_setregs. Moved its prototype to imgact.h with the other exec support functions.
|
#
101251 |
|
03-Aug-2002 |
peter |
Ignore memory above 4GB for now due to unpleasant pci issues.
|
#
97443 |
|
29-May-2002 |
marcel |
Remove the definition of struct mca_guid and use the generic struct uuid defined in <sys/uuid.h>.
Use uuid/UUID instead of guid/GUID to emphasize that the identifiers are DCE version 1 identifiers and also to avoid inconsistencies as much a possible.
|
#
96973 |
|
20-May-2002 |
marcel |
Flesh-out ptrace support. This obviously needs more work.
|
#
96912 |
|
19-May-2002 |
marcel |
o Remove namespace pollution from param.h: - Don't include ia64_cpu.h and cpu.h - Guard definitions by _NO_NAMESPACE_POLLUTION - Move definition of KERNBASE to vmparam.h
o Move definitions of IA64_RR_{BASE|MASK} to vmparam.h o Move definitions of IA64_PHYS_TO_RR{6|7} to vmparam.h
o While here, remove some left-over Alpha references.
|
#
95919 |
|
02-May-2002 |
marcel |
PCPU(current_pmap) is initialized in pmap_bootstrap. No need to do it again.
|
#
95863 |
|
01-May-2002 |
peter |
Connect up kern_envp *before* we use it for getenv() and console probing. It is a bit late after that when we have no consoles. :-]
Also, fix a comment nit and print a warning about missing metadata.
|
#
95762 |
|
30-Apr-2002 |
marcel |
Make this work for ski again. Don't call ia64_mca_init() when we're in the simulator.
|
#
95519 |
|
26-Apr-2002 |
marcel |
Initialize MCA in cpu_startup() so that it's ready before we wake-up the application processors. This allows us to collect unconsumed AP specific error records as part of the wake-up.
|
#
95458 |
|
25-Apr-2002 |
marcel |
The official name for McKinley is: Itanium 2
|
#
95025 |
|
19-Apr-2002 |
marcel |
Remove the bootinfo kludge. We get the address of the bootinfo block from the loader.
|
#
94936 |
|
17-Apr-2002 |
mux |
Rework the kernel environment subsystem. We now convert the static environment needed at boot time to a dynamic subsystem when VM is up. The dynamic kernel environment is protected by an sx lock.
This adds some new functions to manipulate the kernel environment : freeenv(), setenv(), unsetenv() and testenv(). freeenv() has to be called after every getenv() when you have finished using the string. testenv() only tests if an environment variable is present, and doesn't require a freeenv() call. setenv() and unsetenv() are self explanatory.
The kenv(2) syscall exports these new functionalities to userland, mainly for kenv(1).
Reviewed by: peter
|
#
94639 |
|
14-Apr-2002 |
peter |
Allow a kernel to be compiled with both SKI and acpica and still work on real hardware. (SKI used to break the sapic probes)
|
#
94628 |
|
13-Apr-2002 |
alc |
Add comment that sigreturn() is MPSAFE.
|
#
94496 |
|
12-Apr-2002 |
dfr |
Initialise ar.cflg, which contains the IA-32 registers cr0 and cr4. Since all IA-32 processes use the same values for cr0 and cr4, we initialise them at system startup.
|
#
94481 |
|
12-Apr-2002 |
peter |
Really fix uniprocessor on IA64. Note to self: do not use variables before they are initialized. I had correctly figured out that the UP problem was the pcpu current_pmap thing, but didn't fix it right last time.
|
#
94275 |
|
09-Apr-2002 |
phk |
GC various bits and pieces of USERCONFIG from all over the place.
|
#
93793 |
|
04-Apr-2002 |
bde |
Moved signal handling and rescheduling from userret() to ast() so that they aren't in the usual path of execution for syscalls and traps. The main complication for this is that we have to set flags to control ast() everywhere that changes the signal mask.
Avoid locking in userret() in most of the remaining cases.
Submitted by: luoqi (first part only, long ago, reorganized by me) Reminded by: dillon
|
#
93761 |
|
04-Apr-2002 |
alc |
o Kill the MD grow_stack(). Call the MI vm_map_growstack() in its place. o Eliminate the use of useracc() and grow_stack() from sendsig().
Reviewed by: peter
|
#
93713 |
|
03-Apr-2002 |
marcel |
o GC dumplo o Replace the string lit. "ia64" with MACHINE
|
#
93702 |
|
02-Apr-2002 |
jhb |
- Move the MI mutexes sched_lock and Giant from being declared in the various machdep.c's to being declared in kern_mutex.c. - Add a new function mutex_init() used to perform early initialization needed for mutexes such as setting up thread0's contested lock list and initializing MI mutexes. Change the various MD startup routines to call this function instead of duplicating all the code themselves.
Tested on: alpha, i386
|
#
93627 |
|
02-Apr-2002 |
marcel |
o GC totalphysmem and resvmem. o Rephrase comment describing that the memory region can contain the kernel.
|
#
93458 |
|
30-Mar-2002 |
marcel |
Transition to a model where the loader passes the address of the bootinfo block in register r8. In locore.s we save the address in the global variable 'pa_bootinfo'. In machdep.c we compare this value against the hardwired address, but don't depend on its validity yet (ie: we still expect the bootinfo block to be at the hardwired address). After a small amount of time, we'll flip the switch and depend on the loader to pass us the address. From that moment on the loader is free to put it anywhere it likes, provided the machine itself likes it as well.
Add some verbosity to aid in the transition. We emit a message if the loader didn't pass the address and we also emit a message if there's no bootinfo block at the hardwired address.
While in locore.s, reduce the number of redundant serialization instructions. A srlz.i is a proper superset of a srlz.d and thus is a valid replacement. Also slightly reorder the movl instructions to improve bundle density.
|
#
93273 |
|
27-Mar-2002 |
jeff |
Add a new mtx_init option "MTX_DUPOK" which allows duplicate acquires of locks with this flag. Remove the dup_list and dup_ok code from subr_witness. Now we just check for the flag instead of doing string compares.
Also, switch the process lock, process group lock, and uma per cpu locks over to this interface. The original mechanism did not work well for uma because per cpu lock names are unique to each zone.
Approved by: jhb
|
#
92865 |
|
21-Mar-2002 |
peter |
In UP mode, the primary cpu's per-cpu current_pmap was not initialized - this was only done as a side effect of calling cpu_mp_start(). I haven't actually tested that this fixes UP kernels, but it feels about right.
|
#
92843 |
|
20-Mar-2002 |
alfred |
Remove __P.
Reviewd by: peter
|
#
92677 |
|
19-Mar-2002 |
peter |
My ia64 box for some reason likes to fragment the beginning/end of memory a bit before handing it over to the OS. I occasionally have 11 segments with several 8K or so fragments depending on nvram settings and what I have done under loader(8) before booting. This needs to be revisited.
|
#
92675 |
|
19-Mar-2002 |
peter |
Move a couple of prototypes together instead of being incompletely scattered around.
|
#
92262 |
|
14-Mar-2002 |
dfr |
Move the call to pmap_bootstrap to after the initialisation of thread0. This allows us to use mutexes in pmap safely. Also initialise fpcurthread for cpu0 so that ia64_fpstate_check doesn't barf during boot.
|
#
92123 |
|
11-Mar-2002 |
peter |
Fix a warning (make ucontext_t *ucp a const)
|
#
92122 |
|
11-Mar-2002 |
peter |
Stop concatenating __func__ with strings
|
#
91598 |
|
03-Mar-2002 |
dfr |
* Include <sys/ucontext.h> so that this compiles again. * Move the section which manipulates ia64_pal_base to after cninit() so that we don't risk printing anything before we have a console. * Don't call ia64_probe_sapics() for a SKI build. This should really be dependant on ACPICA being present or something.
|
#
90361 |
|
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
90065 |
|
01-Feb-2002 |
bde |
Compile osigreturn() unconditionally since it will always be needed on some arches and the syscall table is machine-independent. It was (bogusly) conditional on COMPAT_43, so this usually makes no difference.
ia64: in addition: - replace the bogus cloned comment before osigreturn() by a correct one. osigreturn() is just a stub fo ia64's. - fix the formatting of cloned comment before sigreturn(). - fix the return code. use nosys() instead of returning ENOSYS to get the same semantics as if the syscall is not in the syscall table. Generating SIGSYS is actually correct here. - fix style bugs.
powerpc: copy the cleaned up ia64 stub. This mainly fixes a bogus comment.
sparc64: copy the cleaned up the ia64 stub, since there was no stub before.
|
#
89492 |
|
18-Jan-2002 |
marcel |
Remove the definition of bootverbose. This fixes the link failure caused by disabling the emission of common symbols.
|
#
88693 |
|
30-Dec-2001 |
marcel |
o Reimplement map_pal_code to work with a global variable ia64_pal_base instead of scanning the EFI tables. This way AP startup code can more easily use the function. o Initialize ia64_pal_base in ia64_init(). When the PAL code doesn't need explicit mapping or no PAL code has been found, ia64_pal_base will be 0. o Remove some unused global variables. o Also in ia64_init(), allocate only 1 page for struct pcpu and remove some Alpha leftovers. o Initialize pc_pcb in cpu_pcpu_init().
|
#
87702 |
|
11-Dec-2001 |
jhb |
Overhaul the per-CPU support a bit:
- The MI portions of struct globaldata have been consolidated into a MI struct pcpu. The MD per-CPU data are specified via a macro defined in machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the interface would be cleaner (PCPU_GET(my_md_field) vs. PCPU_GET(md.md_my_md_field)). - All references to globaldata are changed to pcpu instead. In a UP kernel, this data was stored as global variables which is where the original name came from. In an SMP world this data is per-CPU and ideally private to each CPU outside of the context of debuggers. This also included combining machine/globaldata.h and machine/globals.h into machine/pcpu.h. - The pointer to the thread using the FPU on i386 was renamed from npxthread to fpcurthread to be identical with other architectures. - Make the show pcpu ddb command MI with a MD callout to display MD fields. - The globaldata_register() function was renamed to pcpu_init() and now init's MI fields of a struct pcpu in addition to registering it with the internal array and list. - A pcpu_destroy() function was added to remove a struct pcpu from the internal array and list.
Tested on: alpha, i386 Reviewed by: peter, jake
|
#
87546 |
|
08-Dec-2001 |
dillon |
Allow maxusers to be specified as 0 in the kernel config, which will cause the system to auto-size to between 32 and 512 depending on the amount of memory.
MFC after: 1 week
|
#
86592 |
|
19-Nov-2001 |
peter |
Initial cut at calling the EFI-provided FPSWA (Floating Point Software Assist) driver to handle the "messy" floating point cases which cause traps to the kernel for handling.
|
#
86291 |
|
12-Nov-2001 |
marcel |
o os_boot_rendez is responsible for clearing the IRR bit by reading cr.ivr, as well as writing to cr.eoi. o use global variables to pass information to os_boot_rendez so that it doesn't have to jump through hoops to find it out. This avoids traps on the AP without it even being initialized. This fixes SMP configurations. o Move the probing of the MADT to the end of cpu_startup, instead of at the start of cpu_mp_probe. We need to probe the MADT for non-SMP configurations as well. This fixes uniprocessor configurations. o Serialize AP wake-up by waiting for the AP. We need to do this since we use global variables to for the AP to use. As a side-effect, we can use printf() more easily to see what's going on.
|
#
86286 |
|
12-Nov-2001 |
peter |
Remove #if 0'ed code that was replaced by vm_ksubmap_init() and GC'ed on other platforms.
|
#
86211 |
|
09-Nov-2001 |
dfr |
Reserve more space for phys_avail. Really need to be more careful about overflowing phys_avail.
|
#
85852 |
|
01-Nov-2001 |
peter |
argh! cut/paste typo. :-( (committed on a different machine to what I was testing it on)
|
#
85850 |
|
01-Nov-2001 |
peter |
"Fix" a problem that got copied from alpha to ia64 and broke there. When we truncate the msgbuf size because the last chunk is too small, correctly terminate the phys_avail[] array - the VM system tests the *end* for zero, not the start. This leads the VM startup to attempt to recreate a duplicate set of pages for all physical memory.
XXX the msgbuf handling is suspiciously different on i386 vs alpha/ia64...
|
#
85685 |
|
29-Oct-2001 |
dfr |
* Factor out common code for manipulating the RSE backing store. * Implement a fairly simplistic parser for unwinding stack frames. * Use unwind records for DDB's 'trace' command. Also add support for tracing past exceptions to the context which generated the exception.
The stack unwind code requires a toolchain based on binutils-2.11.2 or later and gcc-3.0.1 or later.
|
#
85383 |
|
23-Oct-2001 |
peter |
Fix RAW dependency violation when compiled with gcc-3 Warning: Use of 'br.ret.sptk.many' violates RAW dependency 'PSR.tb' (data)
|
#
85360 |
|
23-Oct-2001 |
peter |
Turn off the single-user override. We've been running multi-user for some time. Having a machine boot unattended is useful. :-)
|
#
85294 |
|
21-Oct-2001 |
des |
[partially forced commit due to pilot error in earlier commit attempt]
{set,fill}_{,fp,db}regs() fixup:
- Add dummy {set,fill}_dbregs() on architectures that don't have them.
- KSEfy the powerpc versions (struct proc -> struct thread).
- Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms.
These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which one yet.
Run-tested on i386, build-tested on Alpha, untested on other platforms.
|
#
85293 |
|
21-Oct-2001 |
des |
{set,fill}_{,fp,db}regs() fixup:
- Add dummy {set,fill}_dbregs() on architectures that don't have them.
- KSEfy the powerpc versions (struct proc -> struct thread).
- Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms.
These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which {set,fill}_{,fp,db}regs() fixup:
- Add dummy {set,fill}_dbregs() on architectures that don't have them.
- KSEfy the powerpc versions (struct proc -> struct thread).
- Some architectures had the prototypes in md_var.h, some in reg.h, and some in both; for consistency, move them to reg.h on all platforms.
These functions aren't really MD (the implementation is MD, but the interface is MI), so they should move to an MI header, but I haven't figured out which one yet.
Run-tested on i386, build-tested on Alpha, untested on other platforms.
|
#
85283 |
|
21-Oct-2001 |
dfr |
Use ia64_set_fpsr() instead of __asm to set ar.fpsr.
|
#
85109 |
|
18-Oct-2001 |
dfr |
Shift the code which packs and unpacks instruction bundles out of DDB since it is useful for various emulations duties (e.g. unaligned trap handling).
|
#
84966 |
|
15-Oct-2001 |
marcel |
When compiling with SKI support, create the fake memory regions when either the memory descriptor in the bootinfo is NULL or the descriptor count is 0.
|
#
84798 |
|
11-Oct-2001 |
dfr |
* Change the calling convention for execve so that it conforms to normal C calling conventions. This allows crt1.c to be written nearly without any inline assembler. * Initialise cpu_model[] so that the hw.model sysctl works properly.
|
#
84621 |
|
07-Oct-2001 |
dfr |
Remove bogus include.
|
#
84592 |
|
06-Oct-2001 |
dfr |
Move console probes until after we set boothowto so that 'boot -h' works.
|
#
84128 |
|
29-Sep-2001 |
dfr |
Various changes to use the firmware on a real machine.
|
#
83834 |
|
22-Sep-2001 |
dfr |
* Turn off memory descriptor debugging - its served its purpose. * Don't get confused when memory regions don't lie on page boundaries - remember our page size is typically larger than the firmware's page size. * Add a function ia64_running_in_simulator() which is intended to detect whether the kernel is running in SKI or on real hardware.
|
#
83611 |
|
18-Sep-2001 |
dfr |
Flesh out identifycpu().
|
#
83522 |
|
15-Sep-2001 |
dfr |
Rearrange so we search for I/O port space as early as possible (i.e. before console probing). Also fix a confusion between EFI's page size which is fixed at 4096 and our own page size which is variable at compile time.
|
#
83512 |
|
15-Sep-2001 |
dfr |
Use the MI console code to initialise the console.
|
#
83509 |
|
15-Sep-2001 |
dfr |
* Use Intel's EFI headers instead of home-grown ones. * Use the bootinfo's memory map if present instead of hard-coding SKI's memory map. * Record the location of the I/O Port Space if present in the memory map.
|
#
83407 |
|
13-Sep-2001 |
dfr |
* Enable dynamically linked kernel. This involves adding a self-relocator to locore to process the @fptr relocations in the dynamic executable. * Don't initialise the timer until *after* we install the timecounter to avoid a race between timecounter initialisation and hardclock. * Tidy up bootinfo somewhat including adding sanity checks for when the kernel is loaded without a recognisable bootinfo.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
83301 |
|
10-Sep-2001 |
dfr |
* Make a start on a realistic definition for bootinfo. * Switch to proc0's stack and backing store before calling ia64_init so that we don't rely on the loader's stack at all. * Change kernel entry point name from locorestart to __start.
|
#
83163 |
|
06-Sep-2001 |
jhb |
Call sendsig() with the proc lock held and return with it held.
|
#
82785 |
|
02-Sep-2001 |
peter |
Merge from i386: various cleanups including moving the map calculations to MI code. This gets ia64 to compile again.
|
#
82393 |
|
27-Aug-2001 |
peter |
Enable hardwiring of things like tunables from embedded enironments that do not start from loader(8).
|
#
82025 |
|
21-Aug-2001 |
peter |
Make COMPAT_43 optional again. XXX we need COMPAT_FBSD3 etc for this stuff.
|
#
81265 |
|
08-Aug-2001 |
peter |
Zap 'ptrace(PT_READ_U, ...)' and 'ptrace(PT_WRITE_U, ...)' since they are a really nasty interface that should have been killed long ago when 'ptrace(PT_[SG]ETREGS' etc came along. The entity that they operate on (struct user) will not be around much longer since it is part-per-process and part-per-thread in a post-KSE world.
gdb does not actually use this except for the obscure 'info udot' command which does a hexdump of as much of the child's 'struct user' as it can get. It carries its own #defines so it doesn't break compiles.
|
#
81197 |
|
06-Aug-2001 |
dfr |
Remove usage of nonexistent vm_mtx.
|
#
80421 |
|
26-Jul-2001 |
peter |
Call the early tunable setup functions as soon as kern_envp is available. Some things depend on hz being set not long after this.
|
#
78962 |
|
29-Jun-2001 |
jhb |
Add a new MI pointer to the process' trapframe p_frame instead of using various differently named pointers buried under p_md.
Reviewed by: jake (in principle)
|
#
78888 |
|
27-Jun-2001 |
jhb |
Catch up to mbuf allocator changes from last September so this compiles again.
|
#
77448 |
|
29-May-2001 |
jhb |
- Catch up to the VM mutex changes. - Sort includes in a few places.
|
#
77031 |
|
23-May-2001 |
ru |
- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs.
- Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
- Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
- Install header files for the above file systems.
- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
|
#
76770 |
|
17-May-2001 |
jhb |
- Move the setting of bootverbose to a MI SI_SUB_TUNABLES SYSINIT. - Attach a writable sysctl to bootverbose (debug.bootverbose) so it can be toggled after boot. - Move the printf of the version string to a SI_SUB_COPYRIGHT SYSINIT just afer the display of the copyright message instead of doing it by hand in three MD places.
|
#
76440 |
|
10-May-2001 |
jhb |
- Split out the support for per-CPU data from the SMP code. UP kernels have per-CPU data and gdb on the i386 at least needs access to it. - Clean up includes in kern_idle.c and subr_smp.c.
Reviewed by: jake
|
#
76078 |
|
27-Apr-2001 |
jhb |
Overhaul of the SMP code. Several portions of the SMP kernel support have been made machine independent and various other adjustments have been made to support Alpha SMP.
- It splits the per-process portions of hardclock() and statclock() off into hardclock_process() and statclock_process() respectively. hardclock() and statclock() call the *_process() functions for the current process so that UP systems will run as before. For SMP systems, it is simply necessary to ensure that all other processors execute the *_process() functions when the main clock functions are triggered on one CPU by an interrupt. For the alpha 4100, clock interrupts are delievered in a staggered broadcast fashion, so we simply call hardclock/statclock on the boot CPU and call the *_process() functions on the secondaries. For x86, we call statclock and hardclock as usual and then call forward_hardclock/statclock in the MD code to send an IPI to cause the AP's to execute forwared_hardclock/statclock which then call the *_process() functions. - forward_signal() and forward_roundrobin() have been reworked to be MI and to involve less hackery. Now the cpu doing the forward sets any flags, etc. and sends a very simple IPI_AST to the other cpu(s). AST IPIs now just basically return so that they can execute ast() and don't bother with setting the astpending or needresched flags themselves. This also removes the loop in forward_signal() as sched_lock closes the race condition that the loop worked around. - need_resched(), resched_wanted() and clear_resched() have been changed to take a process to act on rather than assuming curproc so that they can be used to implement forward_roundrobin() as described above. - Various other SMP variables have been moved to a MI subr_smp.c and a new header sys/smp.h declares MI SMP variables and API's. The IPI API's from machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h. - The globaldata_register() and globaldata_find() functions as well as the SLIST of globaldata structures has become MI and moved into subr_smp.c. Also, the globaldata list is only available if SMP support is compiled in.
Reviewed by: jake, peter Looked over by: eivind
|
#
75913 |
|
24-Apr-2001 |
dfr |
Align stack pointer and backing store pointer to 16 byte boundary when delivering signals.
|
#
75002 |
|
29-Mar-2001 |
obrien |
Reduce the emasculation of bounds_check_with_label() by one line, so we propagate a bio error condition to the caller and above.
|
#
74912 |
|
28-Mar-2001 |
jhb |
Rework the witness code to work with sx locks as well as mutexes. - Introduce lock classes and lock objects. Each lock class specifies a name and set of flags (or properties) shared by all locks of a given type. Currently there are three lock classes: spin mutexes, sleep mutexes, and sx locks. A lock object specifies properties of an additional lock along with a lock name and all of the extra stuff needed to make witness work with a given lock. This abstract lock stuff is defined in sys/lock.h. The lockmgr constants, types, and prototypes have been moved to sys/lockmgr.h. For temporary backwards compatability, sys/lock.h includes sys/lockmgr.h. - Replace proc->p_spinlocks with a per-CPU list, PCPU(spinlocks), of spin locks held. By making this per-cpu, we do not have to jump through magic hoops to deal with sched_lock changing ownership during context switches. - Replace proc->p_heldmtx, formerly a list of held sleep mutexes, with proc->p_sleeplocks, which is a list of held sleep locks including sleep mutexes and sx locks. - Add helper macros for logging lock events via the KTR_LOCK KTR logging level so that the log messages are consistent. - Add some new flags that can be passed to mtx_init(): - MTX_NOWITNESS - specifies that this lock should be ignored by witness. This is used for the mutex that blocks a sx lock for example. - MTX_QUIET - this is not new, but you can pass this to mtx_init() now and no events will be logged for this lock, so that one doesn't have to change all the individual mtx_lock/unlock() operations. - All lock objects maintain an initialized flag. Use this flag to export a mtx_initialized() macro that can be safely called from drivers. Also, we on longer walk the all_mtx list if MUTEX_DEBUG is defined as witness performs the corresponding checks using the initialized flag. - The lock order reversal messages have been improved to output slightly more accurate file and line numbers.
|
#
74030 |
|
09-Mar-2001 |
dfr |
Adjust a comment slightly.
|
#
73929 |
|
07-Mar-2001 |
jhb |
Grab the process lock while calling psignal and before calling psignal.
|
#
72226 |
|
09-Feb-2001 |
jhb |
Move the initailization of the proc lock for proc0 very early into the MD startup code.
|
#
72221 |
|
09-Feb-2001 |
jhb |
Remove bogus #if 0'd code that dinked with the saved interrupt state in sched_lock.
|
#
72200 |
|
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
#
71984 |
|
04-Feb-2001 |
peter |
All the world is not an i386. Merge rev 1.438 of i386/i386/machdep.c. Make buffer_map a system map.
|
#
71803 |
|
29-Jan-2001 |
dfr |
Flesh out EFI support somewhat.
|
#
71684 |
|
26-Jan-2001 |
dfr |
Initialise proc0.p_heldmtx and proc0.p_contested and call mtx_enter(&Giant, MTX_DEF) after Giant is initialised.
Reviewed by: jhb
|
#
71552 |
|
24-Jan-2001 |
jhb |
- Proc locking. - P_FOO -> PS_FOO.
|
#
71320 |
|
21-Jan-2001 |
jasone |
Remove MUTEX_DECLARE() and MTX_COLD. Instead, postpone full mutex initialization until after malloc() is safe to call, then iterate through all mutexes and complete their initialization.
This change is necessary in order to avoid some circular bootstrapping dependencies.
|
#
71228 |
|
18-Jan-2001 |
bmilekic |
Implement MTX_RECURSE flag for mtx_init(). All calls to mtx_init() for mutexes that recurse must now include the MTX_RECURSE bit in the flag argument variable. This change is in preparation for an upcoming (further) mutex API cleanup. The witness code will call panic() if a lock is found to recurse but the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far: eventhandler, Giant, callout, sched_lock, possibly some others declared in the architecture-specific code, all of the network card driver locks in pci/, as well as some other locks in dev/ stuff that I've found to be recursive.
Reviewed by: jhb
|
#
69586 |
|
04-Dec-2000 |
jake |
Remove the last of the MD netisr code. It is now all MI. Remove spending, which was unused now that all software interrupts have their own thread. Make the legacy schednetisr use an atomic op for setting bits in the netisr mask.
Reviewed by: jhb
|
#
69379 |
|
30-Nov-2000 |
marcel |
Don't use p->p_sigstk.ss_flags to keep state of whether the process is on the alternate stack or not. For compatibility with sigstack(2) state is being updated if such is needed.
We now determine whether the process is on the alternate stack by looking at its stack pointer. This allows a process to siglongjmp from a signal handler on the alternate stack to the place of the sigsetjmp on the normal stack. When maintaining state, this would have invalidated the state information and causing a subsequent signal to be delivered on the normal stack instead of the alternate stack.
PR: 22286
|
#
69207 |
|
26-Nov-2000 |
jlemon |
Add 'mpsafe' parameter to callout_init() in MD bits.
Reminded by: jake
|
#
68889 |
|
19-Nov-2000 |
jake |
- Protect the callout wheel with a separate spin mutex, callout_lock. - Use the mutex in hardclock to ensure no races between it and softclock. - Make softclock be INTR_MPSAFE and provide a flag, CALLOUT_MPSAFE, which specifies that a callout handler does not need giant. There is still no way to set this flag when regstering a callout.
Reviewed by: -smp@, jlemon
|
#
67708 |
|
27-Oct-2000 |
phk |
Convert all users of fldoff() to offsetof(). fldoff() is bad because it only takes a struct tag which makes it impossible to use unions, typedefs etc.
Define __offsetof() in <machine/ansi.h>
Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>
Remove myriad of local offsetof() definitions.
Remove includes of <stddef.h> in kernel code.
NB: Kernelcode should *never* include from /usr/include !
Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.
Deprecate <struct.h> with a warning. The warning turns into an error on 01-12-2000 and the file gets removed entirely on 01-01-2001.
Paritials reviews by: various. Significant brucifications by: bde
|
#
67522 |
|
24-Oct-2000 |
dfr |
* Various fixes to breakage introduced by the atomic and mutex reorgs. * Fixes to the signal delivery code. Not quite right yet.
I would have preferred to wait until I have signal delivery actually working but the current kernel in CVS doesn't build.
|
#
67357 |
|
20-Oct-2000 |
jhb |
- machine/mutex.h -> sys/mutex.h - Use MUTEX_DECLARE() and MTX_COLD for Giant and sched_lock.
|
#
67325 |
|
19-Oct-2000 |
dfr |
Don't force bootverbose anymore.
|
#
67199 |
|
16-Oct-2000 |
dfr |
* Correct some of my misunderstandings about how best to switch to the kernel backing store. * Implement syscalls via break instructions. * Fix backing store copying in cpu_fork() so that the child gets the right register values.
This thing is actually starting to work now. This set of changes takes me up to the second execve (the one which runs the first shell). Next stop single-user mode :-).
|
#
67032 |
|
12-Oct-2000 |
dfr |
Implement a rudimentary interrupt handling system which should be good enough for clock interrupts in SKI.
|
#
67020 |
|
12-Oct-2000 |
dfr |
* Fix exception handling so that it actually works. We can now handle exceptions from both kernel and user mode. * Fix context switching so that we can switch back to a proc which we switched away from (we were saving the state in the wrong place). * Implement lazy switching of the high-fp state. This needs to be looked at again for SMP to cope with the case of a process migrating from one processor to another while it has the high-fp state. * Make setregs() work properly. I still think this should be called cpu_exec() or something. * Various other minor fixes.
With this lot, we can execve() /sbin/init and we get all the way up to its first syscall. At that point, we stop because syscall handling is not done yet.
|
#
66937 |
|
10-Oct-2000 |
dfr |
* Add rudimentary DDB support (no kgdb, no backtrace, no single step). * Track recent changes to SWI code. * Allocate RIDs for pmaps (untested). * Implement assembler version of cpu_switch - its cleaner that way.
|
#
66633 |
|
04-Oct-2000 |
dfr |
Next round of fixes to the ia64 code. This includes simulated clock and disk drivers along with a load of fixes to context switching, fork handling and a load of other stuff I can't remember now. This takes us as far as start_init() before it dies. I guess now I will have to finish off the VM system and syscall handling :-).
|
#
66486 |
|
30-Sep-2000 |
dfr |
Next round of ia64 work, including fixes to context switching, implementing cpu_fork(), copy*str(), bcopy(), copy{in,out}(). With these changes, my test kernel reaches the mountroot prompt.
|
#
66458 |
|
29-Sep-2000 |
dfr |
This is the first snapshot of the FreeBSD/ia64 kernel. This kernel will not work on any real hardware (or fully work on any simulator). Much more needs to happen before this is actually functional but its nice to see the FreeBSD copyright message appear in the ia64 simulator.
|