History log of /freebsd-current/sys/compat/linux/linux_misc.c
Revision Date Author Comments
# c48820a4 29-May-2024 Warner Losh <imp@FreeBSD.org>

minor style tweak.

checkstyle9 doesn't check for this construct...

Fixes: 6d849754b996


# 6d849754 28-May-2024 Son Phan Trung <phantrungson17@gmail.com>

linux: implement PR_CHILD_SET_SUBREAPER

Reviewed by: imp, dchagin
Pull Request: https://github.com/freebsd/freebsd-src/pull/1260


# 9a9677ec 19-May-2024 Ricardo Branco <rbranco@suse.de>

linux: Update linux manpage to mention mqueuefs

Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248


# 97add684 17-May-2024 Ricardo Branco <rbranco@suse.de>

linux: Support POSIX message queues

Reviewed by: imp, kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/1248


# a7cc56b2 10-May-2024 Ricardo Branco <rbranco@suse.de>

linux: Adjust rlimit SIGPENDING & MSGQUEUE behaviour to match linprocfs

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1227


# 3221c44d 02-Feb-2024 rilysh <nightquick@proton.me>

sys/compat/linux/linux_misc.c: remove an extra semicolon

Signed-off-by: rilysh <nightquick@proton.me>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/959


# 199e397e 03-Oct-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deorbit linux_nosys

Differential Revision: https://reviews.freebsd.org/D41901
MFC after: 1 week


# 2a1cf1b6 05-Sep-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate mmap2

To help porting the Linux emulation layer to a new platforms start using
Linux names for conditional builds instead of architecture-specific ifdefs.

MFC after: 1 week


# 553b1a4e 05-Sep-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate mprotect, madvise

MFC after: 1 week


# 3460fab5 18-Aug-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# bbe017e0 04-Aug-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a dedicated ioprio system calls

On Linux these system calls have an effect only when used in conjuction
with an I/O scheduler that supports I/O priorities. If no I/O scheduler
has been set for a thread, then by defaut the I/O priority will follow
the CPU nice value. Due to FreeBSD lack of I/O scheduler facilities, the
default Linux behavior is implemented.

Ubuntu 23.04 debootstrap requires Linux ionice which depends on these
syscalls.

Differential Revision: https://reviews.freebsd.org/D41153
MFC after: 1 month


# 8340b034 28-May-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a dedicated linux_exec_copyin_args()

Because Linux allows to exec binaries with 0 argc.

Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D40148
MFC after: 2 month


# fd745e1d 29-May-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use pwd_altroot() to tell namei() about ABI root path

PR: 72920
Differential Revision: https://reviews.freebsd.org/D40090
MFC after: 2 month


# 166e2e5a 28-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Uniformly dev_t arguments translation

The two main uses of dev_t are in struct stat and as a parameter of the
mknod system calls.
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit quantity
with 12 bits set asaid for the major number and 20 for the minor number.
The in-kernel dev_t encoded as MMMmmmmm, where M is a hex digit of the
major number and m is a hex digit of the minor number.
The user-space dev_t encoded as mmmM MMmm, where M and m is the major
and minor numbers accordingly. This is downward compatible with legacy
systems where dev_t is 16 bits wide, encoded as MMmm.
In glibc dev_t is a 64-bit quantity, with 32-bit major and minor numbers,
encoded as MMMM Mmmm mmmM MMmm. This is downward compatible with the Linux
kernel and with legacy systems where dev_t is 16 bits wide.
In the FreeBSD dev_t is a 64-bit quantity. The major and minor numbers
are encoded as MMMmmmMm, therefore conversion of the device numbers between
Linux user-space and FreeBSD kernel required.


# e185d83f 26-Apr-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use inlined LINUX_KERNVER for tests to improve readability

MFC after: 1 month


# c8a79231 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Rename linux_timer.h to linux_time.h

To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains implementation of all stuff declared
in linux_time.h.

MFC after: 2 weeks


# d8e53d94 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup includes under compat/linux

Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after: 2 weeks


# acbbd5c0 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup sys/uio.h where linux_uitl.h is included

MFC after: 2 weeks


# 50c85a32 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Move uselib() to i386

This obsolete system call is not supported by glibc. In ancient libc
versions (before glibc 2.0), uselib() was used to load the shared
libraries with names found in an array of names in the binary.
On Linux, since 3.15, this system call is available only when
the kernel is configured with the CONFIG_USELIB option.

It doesn't look like anyone needs this syscall for others Linuxulators,
so move it to the corresponding MD Linuxulator.

MFC after: 2 weeks


# 513eb69e 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup sys/sysent.h from linux_util

Include sys/sysent.h directly where it needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.

MFC after: 2 weeks


# 31e938c5 14-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Cleanup vm includes from linux_util.h

Include vm headers directly where they needed. The linux_util.h included
in a most source files of the Linuxulator, avoid collecting a rarely used
includes here.

MFC after: 2 weeks


# 10d16789 12-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Get rid of the opt_compat.h include.

Since e013e369 COMPAT_LINUX, COMPAT_LINUX32 build options are removed,
so include of opt_compat.h is no more needed.

MFC after: 2 weeks


# 3e0c56a7 03-Feb-2023 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use designated initializers.

MFC after: 1 week


# 93107373 22-Jun-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Trace Linux l_sigset_t.

MFC after: 2 weeks


# 5e872c27 30-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Use the copyin_sigset() in the remaining places

MFC after: 2 weeks


# d46174cd 28-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Finish cpuset_getaffinity() after f35093f8

Split cpuset_getaffinity() into a two counterparts, where the
user_cpuset_getaffinity() is intended to operate on the cpuset_t from
user va, while kern_cpuset_getaffinity() expects the cpuset from kernel
va.
Accordingly, the code that clears the high bits is moved to the
user_cpuset_getaffinity(). Linux sched_getaffinity() syscall returns
the size of set copied to the user-space and then glibc wrapper clears
the high bits.

MFC after: 2 weeks


# 26700ac0 23-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Deduplicate execve

As linux_execve is common across archs, except amd64 32-bit Linuxulator,
move it under compat/linux.

Noted by: andrew@
MFC after: 2 weeks


# 4a3e5133 20-May-2022 Mark Johnston <markj@FreeBSD.org>

cpuset: Fix the KASAN and KMSAN builds

Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to
something less generic, since sanitizers define interceptors for
copyin() and copyout() using #define.

Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com
Fixes: 47a57144af25 ("cpuset: Byte swap cpuset for compat32 on big endian architectures")
Sponsored by: The FreeBSD Foundation


# 89737eb8 19-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Fix the build after 47a57144


# 5326ebfd 11-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Revert c7ef7c3 as it's wrong at all.

Reported by: trasz


# f35093f8 11-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

Use Linux semantics for the thread affinity syscalls.

Linux has more tolerant checks of the user supplied cpuset_t's.

Minimum cpuset_t size that the Linux kernel permits in case of
getaffinity() is the maximum CPU id, present in the system / NBBY,
the maximum size is not limited.
For setaffinity(), Linux does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where
the upper bound is the maximum CPU id, present in the system, no larger
than the size of the kernel cpuset_t.
Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(),
so clear it in the sched_setaffinity() and Linuxulator itself.

Reviewed by: Pau Amma (man pages)
In collaboration with: jhb
Differential revision: https://reviews.freebsd.org/D34849
MFC after: 2 weeks


# 707e567a 08-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a helper intended for copying timespec's from the userspace.

There are many places where we copyin Linux timespec from the userspace
and then convert it to the kernel timespec. To avoid code duplication
add a tiny halper for doing this.

MFC after: 2 weeks


# 1579b320 08-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Zero out high order bits of nanoseconds in the compat mode.

Assuming the kernel would use random data, the 64-bit Linux kernel ignores
upper 32 bits of tv_nsec of struct timespec64 for 32-bit binaries.

MFC after: 2 weeks


# 9a9482f8 08-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add a helper intended for copying timespec's to the userspace.

There are many places where we convert natvie timespec and copyout it to
the userspace. To avoid code duplication add a tiny halper for doing this.

MFC after: 2 weeks


# 8c84ca65 04-May-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement sched_rr_get_interval_time64 syscall.

MFC after: 2 weeks


# 5171ed79 26-Apr-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Copyout pselect timeout.

According to pselect6 manual, on error timeout becomes undefined, by fact
Linux modifies the timeout and ignore EFAULT error if so.

MFC after: 2 weeks


# fe894a37 25-Apr-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Check that the thread tid in the thread group pid in linux_tdfind().

MFC after: 2 weeks


# d5dc757e 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add compat.linux32.emulate_i386 knob.

Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.

Reviewed by: Pau Amma (doc), emaste
Differential revision: https://reviews.freebsd.org/D34708
MFC after: 2 weeks


# 9103c558 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): wait4() returns ESRCH if pid is INT_MIN.

Weird and undocumented patch was added to the Linux kernel in 2017,
fixes wait403 LTP test.

MFC after: 2 weeks


# 4e3aefb9 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Fixup waitid handling P_PGID idtype.

Since Linux 5.4, if id is zero, then wait for any child that is in the same
process grop as the caller's process group.

Differential revision: https://reviews.freebsd.org/D31567
MFC after: 2 weeks


# be1e4a0b 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Return ENOSYS for unsupported P_PIDFD waitid idtype.

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31559
MFC after: 2 weeks


# c7ef7c3f 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Add support for __WALL wait option bit.

As FreeBSD does not have __WALL option bit analogue explicitly set all
possible option bits to emulate Linux __WALL wait option bit.

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31555
MFC after: 2 weeks


# d7e1e138 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Fixup options value validation in linux_waitid().

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31554
MFC after: 2 weeks


# 2fd480b3 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Eliminate bogus options value check validation.

The kernel do it for us.

Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D31553
MFC after: 2 weeks


# 0c6b1ff7 31-Mar-2022 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Consolidate wait* facility into linux_common_wait().

Also fix bug in waitid() implementation, use wru_self not wru_children.

Differential revision: https://reviews.freebsd.org/D31552
MFC after: 2 weeks


# bb92cd7b 24-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)


# 99454d3e 28-Jan-2022 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: Provide dummy seccomp(2)

Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().

Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808


# 9caeb82e 20-Jan-2022 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert "linux: Provide dummy seccomp(2)"

This reverts commit 56981629f91fcdd358ccb41081ff6dcc2edac12f.

Wrong patch; fails to build on i386.


# 56981629 25-Jan-2022 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: Provide dummy seccomp(2)

Don't emit warnings; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().

Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808


# 0c8d7eeb 18-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

linux: plug set-but-not-used vars

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 7e1d3eef 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the unused thread argument from NDINIT*

See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.


# af4051d2 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

linux: remove the always curthread argument from lconvpath


# 74a0e24f 24-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

linux: plug set-but-not-unused vars

Sponsored by: Rubicon Communications, LLC ("Netgate")


# a90ff3c4 07-Nov-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: Add ptrace(2) support on arm64

This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.

Relnotes: yes
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32868


# 2f514e6f 03-Jul-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux(4): implement PR_SET_NO_NEW_PRIVS

This makes prctl(2) support PR_SET_NO_NEW_PRIVS, by mapping it
to the native PROC_NO_NEW_PRIVS_CTL procctl(2).

Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30973


# c1da89fe 21-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Retire linux_kplatform.

Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().

I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.

This is the first stage of Linuxulator's vdso revision.

Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks


# 2eff670f 21-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement poll system call via linux_common_ppol()
for the sake of converting events to/from native.

MFC after: 2 weeks


# 26795a03 21-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Rework Linux ppoll system call.

For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.

At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.

Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.

Differential Revision: https://reviews.freebsd.org/D30716
MFC after: 2 weeks


# ed61e0ce 10-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement ppoll_time64 system call.

MFC after: 2 weeks


# f6d075ec 10-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement pselect6_time64 system call.

MFC after: 2 weeks


# e4bffb80 06-Jun-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Implement utimensat_time64 system call.

MFC after: 2 weeks


# 2a0fa277 31-May-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Microoptimize futimesat, utimes, utime.
While here wrap long line.

Differential Revision: https://reviews.freebsd.org/D30488
MFC after: 2 weeks


# b4f9b6ee 31-May-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Handle AT_EMPTY_PATH in the utimensat syscall.

Differential Revision: https://reviews.freebsd.org/D30518
MFC after: 2 weeks


# 8505eb5d 31-May-2021 Dmitry Chagin <dchagin@FreeBSD.org>

linux(4): Convert flags before use in utimensat.

Differential Revision: https://reviews.freebsd.org/D30487
MFC after: 2 weeks


# ec2700e0 12-Jan-2021 Edward Tomasz Napierala <trasz@FreeBSD.org>

linux: mute the "unsupported prctl option 23" warnings

Make the PR_CAPBSET_READ prctl(2) return EINVAL without logging
any warnings; this is way too noisy with Focal.

Sponsored by: The FreeBSD Foundation


# 76b2bfed 06-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

linux(4): Fix loadable modules after r367395

Move dtrace SDT definitions into linux_common module code. Also, build
linux_dummy.c into the linux_common kld -- we don't need separate
versions of these stubs for 32- and 64-bit emulation.

Reported by: several
PR: 250897
Discussed with: emaste, trasz
Tested by: John Kennedy, Yasuhiro KIMURA, Oleg Sidorkin
X-MFC-With: r367395
Differential Revision: https://reviews.freebsd.org/D27124


# eaa5afce 02-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

linux(4) prctl(2): Implement PR_[GS]ET_DUMPABLE

Proxy the flag to the roughly analogous FreeBSD procctl 'TRACE'.

TRACE-disabled processes are not coredumped, and Linux !DUMPABLE processes
can not be ptraced. There are some additional semantics around ownership of
files in the /proc/[pid] pseudo-filesystem, which we do not attempt to
emulate correctly at this time.

Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D27015


# a98f0378 02-Nov-2020 Conrad Meyer <cem@FreeBSD.org>

linux(4): style: Eliminate dead 'break' after 'return'

No functional change.


# 1024de70 26-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

linux: add missing conversions for compat.linux.use_emul_path handling


# 62b1382f 24-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Further improve prctl(2) debug.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26916


# 1c748137 22-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Improve prctl(2) debug.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26899


# 54669eb7 18-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add compat.linux.dummy_rlimits, and disable by default.

Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814


# 139c0978 16-Oct-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make linux getrlimit(2) and prlimit(2) return something reasonable
for linux-specific limits. Fixes prlimit (util-linux-2.31.1-0.4ubuntu3.7).

Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26777


# 1a180032 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

compat: clean up empty lines in .c and .h files


# a125ed50 18-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

linux: add sysctl compat.linux.use_emul_path

This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.

It defaults to on (== current behavior).

make -C /root/linux-5.3-rc8 -s -j 1 bzImage:

use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total


# 3d8dd983 15-Jun-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make Linux uname(2) return x86_64 to 32-bit apps. This helps Steam.

PR: kern/240432
Analyzed by by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25248


# 479f70ef 10-Jun-2020 Mark Johnston <markj@FreeBSD.org>

Fix a couple of nits in Linux sysinfo(2) emulation.

- Use the same definition of free memory as Linux.
- Rename the totalbig and freebig fields to match the corresponding
names on Linux.

Discussed with: alc
MFC after: 1 week


# 27e4374d 10-Jun-2020 Mark Johnston <markj@FreeBSD.org>

Add a comment reflecting the commit log for r361945.

Suggested by: alc
Reviewed by: alc
MFC with: r361945


# 3e5fae34 08-Jun-2020 Mark Johnston <markj@FreeBSD.org>

Stop computing a "sharedram" value when emulating Linux sysinfo(2).

The previous code was computing an incorrect value in a very expensive
manner. "sharedram" is supposed to be the amount of memory used by
named swap objects, which on FreeBSD basically corresponds to memory
usage by shared memory objects (including, for example, GEM objects) and
tmpfs. We currently have no cheap way to count such pages. The
previous code tried to determine the number of copy-on-write pages
shared between processes.

Just replace the computed value with 0. illumos reportedly does the
same thing. Linux itself did not populate this field until a 2014
commit, "mm: export NR_SHMEM via sysinfo(2) / si_meminfo() interfaces".

Reported by: mjg
MFC after: 1 week


# b4147bf6 05-Mar-2020 Tijl Coosemans <tijl@FreeBSD.org>

Move compat.linux.map_sched_prio sysctl definition to linux_mib.c so it is
only defined by linux_common kernel module and not both linux and linux64
modules.

Reported by: Yuri Pankov <ypankov@fastmail.com>


# f8b9b299 01-Mar-2020 Tijl Coosemans <tijl@FreeBSD.org>

linuxulator: Map scheduler priorities to Linux priorities.

On Linux the valid range of priorities for the SCHED_FIFO and SCHED_RR
scheduling policies is [1,99]. For SCHED_OTHER the single valid priority is
0. On FreeBSD it is [0,31] for all policies. Programs are supposed to
query the valid range using sched_get_priority_(min|max), but of course some
programs assume the Linux values are valid.

This commit adds a tunable compat.linux.map_sched_prio. When enabled
sched_get_priority_(min|max) return the Linux values and sched_setscheduler
and sched_(get|set)param translate between FreeBSD and Linux values.

Because there are more Linux levels than FreeBSD levels, multiple Linux
levels map to a single FreeBSD level, which means pre-emption might not
happen as it does on Linux, so the tunable allows to disable this behaviour.
It is enabled by default because I think it is unlikely that anyone runs
real-time software under Linux emulation on FreeBSD that critically relies
on correct pre-emption.

This fixes FMOD, a commercial sound library used by several games.

PR: 240043
Tested by: Alex S <iwtcex@gmail.com>
Reviewed by: dchagin
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23790


# 46209cea 14-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Make linux getcpu(2) report the domain.

Submitted by: markj
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23144


# ca603bb1 12-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

dd kern_getpriority(), make Linuxulator use it.

Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22842


# 7a0ef283 12-Jan-2020 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_setpriority(), use it in Linuxulator.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22841


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# cc503330 31-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add basic getcpu(2) support to linuxulator. The purpose of this
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on. The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.

This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22972


# ee0fe82e 29-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Implement Linux syslog(2) syscall; just enough to make Linux dmesg(8)
utility work.

MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22465


# be2cfdbc 13-Dec-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_getsid() and use it in Linuxulator; no functional changes.

Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22647


# bb9e2184 18-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Change locking requirements for VOP_UNSET_TEXT().

Require the vnode to be locked for the VOP_UNSET_TEXT() call. This
will be used by the following bug fix for a tmpfs issue.

Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 62375ca7 11-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

compat/linux: Remove obsoleted and somewhat confusing comments related to COMPAT_43.

Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21200


# 2478d444 04-Jul-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix linuxulator prlimit64(2) with pid == 0. This makes 'ulimit -a'
return something reasonable, and helps linux binaries which attempt
to close all the files, eg apt(8).

Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20692


# d49fb289 18-May-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Implement PTRACE_O_TRACESYSGOOD. This makes Linux strace(1) work.

Reviewed by: dchagin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20200


# c5156c77 13-May-2019 Dmitry Chagin <dchagin@FreeBSD.org>

Linuxulator depends on a fundamental kernel settings such as SMP. Many
of them listed in opt_global.h which is not generated while building
modules outside of a kernel and such modules never match real cofigured
kernel.

So, we should prevent our users from building obviously defective modules.

Therefore, remove the root cause of the building of modules outside of a
kernel - the possibility of building modules with DEBUG or KTR flags.
And remove all of DEBUG printfs as it is incomplete and in threaded
programms not informative, also a half of system call does not have DEBUG
printf. For debuging Linux programms we have dtrace, ktr and ktrace ability.

PR: 222861
Reviewed by: trasz
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20178


# 78022527 05-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Switch to use shared vnode locks for text files during image activation.

kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923


# 5d520d7f 30-Apr-2019 Dmitry Chagin <dchagin@FreeBSD.org>

Follow the FreeBSD and implement PDEATH_SIG prctl ops in the Linuxulator.
It was first introduced in r163734 and missied by me in r283383.

MFC after: 1 week


# 347a8ed1 21-Jan-2019 Ed Maste <emaste@FreeBSD.org>

linuxulator: fix stack memory disclosure in linux_sigaltstack

Most siginfo_to_lsiginfo callers already zeroed the l_siginfo_t before
callit it, but linux_waitid did not. Instead of zeroing in the called
function to address linux_waitid (as in commit 2e6ebe70), just do it in
linux_waitid.

admbugs: 765
Reported by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Reviewed by: Andrew
MFC after: 1 day
Security: Kernel stack memory disclosure
Sponsored by: The FreeBSD Foundation


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# 6040822c 30-Jul-2018 Alan Somers <asomers@FreeBSD.org>

Make timespecadd(3) and friends public

The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725


# e8a1ec3e 27-Jun-2018 Ed Maste <emaste@FreeBSD.org>

Split kern_break from sys_break and use it in linuxulator

Previously the linuxulator's linux_brk invoked the FreeBSD sys_break
syscall implementation directly. Instead, move the bulk of the existing
implementation to kern_break, and call that from both sys_break and
linux_brk.

This also addresses a minor bug in linux_brk in that we now return the
actual (rounded up) break address, rather than the requested value.

Reviewed by: brooks (earlier version)
Sponsored by: Turing Robotic Industries
Differential Revision: https://reviews.freebsd.org/D16019


# aec1e6d3 19-Jun-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: handle V3 capget/capset

Linux 2.6.26 introduced 64-bit capability sets. Extend our stub
implementation to handle both 32- and 64-bit. (We still report no
capabilities in capget, and disallow any in capset.)

Reviewed by: chuck
Sponsored by: Turing Robotic Industries Inc.
Differential Revision: https://reviews.freebsd.org/D15887


# 645f3d43 18-Jun-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: add debugging for invalid capget/capset version

Sponsored by: Turing Robotic Industries Inc.


# 931e2a1a 15-Jun-2018 Ed Maste <emaste@FreeBSD.org>

linuxulator: do not include legacy syscalls on arm64

Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files. We may need finer grained control in the future but this is
sufficient for now.

Reviewed by: andrew
Sponsored by: Turing Robotic Industries
Differential Revision: https://reviews.freebsd.org/D15237


# 9da5364e 14-Jun-2018 Brooks Davis <brooks@FreeBSD.org>

Name the implementation of brk and sbrk sys_break().

The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.

Reviewed by: emaste, kib (older version)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15638


# eae594f7 21-Feb-2018 Ed Maste <emaste@FreeBSD.org>

Correct proper nouns in the Linuxulator

- Capitalize Linux
- Spell FreeBSD out in full
- Address some style(9) on changed lines

Sponsored by: Turing Robotic Industries Inc.


# e958ad4c 12-Feb-2018 Jeff Roberson <jeff@FreeBSD.org>

Make v_wire_count a per-cpu counter(9) counter. This eliminates a
significant source of cache line contention from vm_page_alloc(). Use
accessors and vm_page_unwire_noq() so that the mechanism can be easily
changed in the future.

Reviewed by: markj
Discussed with: kib, glebius
Tested by: pho (earlier version)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14273


# 132f90c6 05-Feb-2018 Ed Maste <emaste@FreeBSD.org>

Linuxolator whitespace cleanup

A version of each of the MD files by necessity exists for each CPU
architecture supported by the Linuxolator. Clean these up so that new
architectures do not inherit whitespace issues.

Clean up shared Linuxolator files while here.

Sponsored by: Turing Robotic Industries Inc.


# 7f2d13d6 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/compat: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 51d93426 04-Jun-2017 Dmitry Chagin <dchagin@FreeBSD.org>

On success, getrandom() Linux system call returns the number of bytes that
were copied to the buffer supplied by the user.

PR: 219464
Submitted by: Maciej Pasternacki
Reported by: Maciej Pasternacki
MFC after: 1 week


# e2e6a2a1 04-Jun-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Revert r319053 due to lack of sence. As pointed out by kib@ opt_global.h
contains such fundamental settings as e.g. SMP option and fake
opt_global.h almost never match real configured kernels.

Reported by: kib@


# 9ecc1abc 28-May-2017 Dmitry Chagin <dchagin@FreeBSD.org>

On success, getrandom() Linux system call returns the number of bytes that
were copied to the buffer supplied by the user.

Also fix getrandom() if Linuxulator modules are built without the kernel.

PR: 219464
Submitted by: Maciej Pasternacki
Reported by: Maciej Pasternacki
MFC after: 1 week


# faa9679e 30-Mar-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Use kern_mincore() helper instead of abusing syscall entry.

Suggested by: kib@
Reviewed by: kib@
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D10143


# 6c2a934b 25-Mar-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Implement Linux mincore() system call.
This is necessary for the upcoming drm-next.

Suggested by: hselasky@
MFC after: 1 month


# b1ba0846 18-Mar-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Implement getrandom() syscall.
Note. GRND_RANDOM option is not supported for now.

MFC after: 1 month


# 0670e972 26-Feb-2017 Dmitry Chagin <dchagin@FreeBSD.org>

Return EOVERFLOW error in case then the size of tv_sec field of struct timespec
in COMPAT_LINUX32 Linuxulator's not equal to the size of native tv_sec.

MFC after: 1 month


# 496ab053 13-Feb-2017 Konstantin Belousov <kib@FreeBSD.org>

Rework r313352.

Rename kern_vm_* functions to kern_*. Move the prototypes to
syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as
needed, to avoid headers pollution.

Requested by: alc, jhb
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D9535


# 69cdfcef 06-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_vm_mmap2(), kern_vm_mprotect(), kern_vm_msync(), kern_vm_munlock(),
kern_vm_munmap(), and kern_vm_madvise(), and use them in various compats
instead of their sys_*() counterparts.

Reviewed by: ed, dchagin, kib
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9378


# 96ee4310 05-Feb-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add kern_cpuset_getaffinity() and kern_cpuset_getaffinity(),
and use it in compats instead of their sys_*() counterparts.

Reviewed by: kib, jhb, dchagin
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9383


# 5db72ef2 31-Jan-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix linux_getppid() to debug the actual parent, even it was reparented
by debugger.

Reviewed by: dchagin@
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D9361


# 23e8912c 10-Jul-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.

READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.


# 34e05ebe 31-May-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Fix kernel stack disclosures in the Linux and 4.3BSD compat layers.

Submitted by: CTurt
Security: SA-16:20
Security: SA-16:21


# 1ce4275d 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/compat/linux*: spelling fixes.

Mostly on comments but there are some user-visible messages as well.

MFC after: 2 weeks


# ae26eab1 03-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Fix indentation oops.


# 8bc21baf 03-Apr-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Move Linux specific times tests up to guarantee the values are defined.

CID: 1305178
Submitted by: pfg@
MFC after: 1 week


# 4525bb82 20-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Rework r296543:

1. Limit secs to INT32_MAX / 2 to avoid errors from kern_setitimer().
Assert that kern_setitimer() returns 0.
Remove bogus cast of secs.
Fix style(9) issues.

2. Increment the return value if the remaining tv_usec value more than 500000 as a Linux does.

Pointed out by: [1] Bruce Evans

MFC after: 1 week


# a87488d1 08-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Better english.

Submitted by: Kevin P. Neal
MFC after: 1 week


# 649ca5e9 08-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

Put a commit message from r296502 about Linux alarm() system call
behaviour to the source.

Suggested by: emaste@

MFC after: 1 week


# 15c3b371 08-Mar-2016 Dmitry Chagin <dchagin@FreeBSD.org>

According to POSIX and Linux implementation the alarm() system call
is always successfull.
So, ignore any errors and return 0 as a Linux do.

XXX. Unlike POSIX, Linux in case when the invalid seconds value specified
always return 0, so in that case Linux does not return proper remining time.

MFC after: 1 week


# c8358c6e 14-Jan-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Call crextend() before copying old credentials to the new credentials
and replace crcopysafe by crcopy as crcopysafe is is not intended to be
safe in a threaded environment, it drops PROC_LOCK() in while() that
can lead to unexpected results, such as overwrite kernel memory.

In my POV crcopysafe() needs special attention. For now I do not see
any problems with this function, but who knows.

Submitted by: dchagin
Found by: trinity
Security: SA-16:04.linux


# cd0a26c5 23-Oct-2015 Konstantin Belousov <kib@FreeBSD.org>

Fix build for the KTR-enabled kernels.

Sponsored by: The FreeBSD Foundation


# b4490c6e 18-Jul-2015 Konstantin Belousov <kib@FreeBSD.org>

The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig. p_xexit contains complete status
and copied out into si_status.

Requested by: Joerg Schilling
Reviewed by: jilles (previous version), pho
Tested by: pho
Sponsored by: The FreeBSD Foundation


# f6f6d240 10-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

Implement lockless resource limits.

Use the same scheme implemented to manage credentials.

Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.

Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.


# d9cbe8f0 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Properly check tv_nsec value. The tv_nsec field can also be one
of the special value UTIME_NOW or UTIME_OMIT.


# 19d8b461 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Add utimensat() system call.

The patch developed by Jilles Tjoelker and Andrew Wilcox and
adopted for lemul branch by me.


# 4ab7403b 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Rework signal code to allow using it by other modules, like linprocfs:

1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.

2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.

3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.

4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.

PR: 197216


# 68098228 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Do not use struct l_timespec without conversion. While here move
args->timeout handling before acquiring the futex key at FUTEX_WAIT path.

Differential Revision: https://reviews.freebsd.org/D1520
Reviewed by: trasz


# a6b40812 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement ppoll() system call.

Differential Revision: https://reviews.freebsd.org/D1105
Reviewed by: trasz


# 3e89b641 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Put the correct value for the abi_nfdbits parameter of kern_select() for
all supported Linuxulators.

Differential Revision: https://reviews.freebsd.org/D1093
Reviewed by: trasz


# 2245df38 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Convert Linux wait options to the FreeBSD.
Check wait options as a Linux do.
Linux always set WEXITED option not a WUNTRACED|WNOHANG
which is a strange bug.

Differential Revision: https://reviews.freebsd.org/D1085
Reviewed by: trasz


# 7a7a6efc 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Set WIFCONTINUED to the wait status if needed.

Differential Revision: https://reviews.freebsd.org/D1083
Reviewed by: trasz


# e0d3ea8c 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Where possible we will use M_LINUX malloc(9) type.
Move M_FUTEX defines to the linux_common.ko.

Differential Revision: https://reviews.freebsd.org/D1077
Reviewed by: emaste


# bc273677 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Refund the proc emuldata struct for future use. For now move flags from
thread emuldata to proc emuldata as it was originally intended.

As we can have both 64 & 32 bit Linuxulator running any eventhandler
can be called twice for us. To prevent this move eventhandlers code
from linux_emul.c to the linux_common.ko module.

Differential Revision: https://reviews.freebsd.org/D1073


# 67d39748 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Introduce a new module linux_common.ko which is intended for the
following primary purposes:

1. Remove the dependency of linsysfs and linprocfs modules from linux.ko,
which will be architecture specific on amd64.

2. Incorporate into linux_common.ko general code for platforms on which
we'll support two Linuxulator modules (for both instruction set - 32 & 64 bit).

3. Move malloc(9) declaration to linux_common.ko, to enable getting memory
usage statistics properly.

Currently linux_common.ko incorporates a code from linux_mib.c and linux_util.c
and linprocfs, linsysfs and linux kernel modules depend on linux_common.ko.

Temporarily remove dtrace garbage from linux_mib.c and linux_util.c

Differential Revision: https://reviews.freebsd.org/D1072
In collaboration with: Vassilis Laganakos.

Reviewed by: trasz


# 4ca75bed 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Fix compilation with -DDEBUG option.

Differential Revision: https://reviews.freebsd.org/D1070
Reviewed by: trasz


# 7f8f1d7f 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Disable i386 call for x86-64 Linux.

Differential Revision: https://reviews.freebsd.org/D1067
Reviewed by: trasz


# 0020bdf1 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Put linux_platform into the vdso to avoid copying it onto the stack at
every exec.

Differential Revision: https://reviews.freebsd.org/D1062
Reviewed by: trasz


# ae50b4d7 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement pselect6() system call.

Differential Revision: https://reviews.freebsd.org/D1051
Reviewed by: trasz


# c3978c7b 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement prlimit64() system call.

Differential Revision: https://reviews.freebsd.org/D1050
Reviewed by: emaste, trasz


# 44e93b23 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Sched_rr_get_interval returns EINVAL in case when the invalid pid
specified. This silence the ltp tests.

Differential Revision: https://reviews.freebsd.org/D1048
Reviewed by: trasz


# e5fe4ccf 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement waitid() system call.

Differential Revision: https://reviews.freebsd.org/D1046


# 001398c4 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

To reduce code duplication introduce linux_copyout_rusage() method.
Use it in linux_wait4() system call and move linux_wait4() to the MI path.
While here add a prototype for the static bsd_to_linux_rusage().

Differential Revision: https://reviews.freebsd.org/D2138
Reviewed by: trasz


# a7ae3c55 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Add a function for converting wait options.

Differential Revision: https://reviews.freebsd.org/D1045
Reviewed by: trasz


# 81338031 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Switch linuxulator to use the native 1:1 threads.

The reasons:
1. Get rid of the stubs/quirks with process dethreading,
process reparent when the process group leader exits and close
to this problems on wait(), waitpid(), etc.
2. Reuse our kernel code instead of writing excessive thread
managment routines in Linuxulator.

Implementation details:

1. The thread is created via kern_thr_new() in the clone() call with
the CLONE_THREAD parameter. Thus, everything else is a process.
2. The test that the process has a threads is done via P_HADTHREADS
bit p_flag of struct proc.
3. Per thread emulator state data structure is now located in the
struct thread and freed in the thread_dtor() hook.
Mandatory holdig of the p_mtx required when referencing emuldata
from the other threads.
4. PID mangling has changed. Now Linux pid is the native tid
and Linux tgid is the native pid, with the exception of the first
thread in the process where tid and pid are one and the same.

Ugliness:

In case when the Linux thread is the initial thread in the thread
group thread id is equal to the process id. Glibc depends on this
magic (assert in pthread_getattr_np.c). So for system calls that
take thread id as a parameter we should use the special method
to reference struct thread.

Differential Revision: https://reviews.freebsd.org/D1039


# 2003907d 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

Implement a Linux version of sched_getparam() && sched_setparam().
Temporarily use the first thread in proc.

Differential Revision: https://reviews.freebsd.org/D1036
Reviewed by: trasz


# 1aa90eca 24-May-2015 Dmitry Chagin <dchagin@FreeBSD.org>

In preparation for switching linuxulator to the use the native 1:1
threads refactor kern_sched_rr_get_interval() and sys_sched_rr_get_interval().
Add a kern_sched_rr_get_interval() counterpart which takes a targettd
parameter to allow specify target thread directly by callee (new Linuxulator).

Linuxulator temporarily uses first thread in proc.

Move linux_sched_rr_get_interval() to the MI part.

Differential Revision: https://reviews.freebsd.org/D1032
Reviewed by: trasz


# daf63fd2 15-Mar-2015 Mateusz Guzik <mjg@FreeBSD.org>

cred: add proc_set_cred helper

The goal here is to provide one place altering process credentials.

This eases debugging and opens up posibilities to do additional work when such
an action is performed.


# 5c7bebf9 26-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

The process spin lock currently has the following distinct uses:

- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason of this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.

- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().

- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().

- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().

The split is done mostly for code clarity, and should not affect
scalability.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 6e646651 13-Nov-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove the no-at variants of the kern_xx() syscall helpers. E.g., we
have both kern_open() and kern_openat(); change the callers to use
kern_openat().

This removes one (sometimes two) levels of indirection and
consolidates arguments checks.

Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 44f1c916 22-Mar-2014 Bryan Drewery <bdrewery@FreeBSD.org>

Rename global cnt to vm_cnt to avoid shadowing.

To reduce the diff struct pcu.cnt field was not renamed, so
PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in
kvm(3) and vmstat(8). The goal was to not affect externally used KPI.

Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the
the global cnt variable.

Exp-run revealed no ports using it directly.

No objection from: arch@
Sponsored by: EMC / Isilon Storage Division


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# edb572a3 09-Sep-2013 John Baldwin <jhb@FreeBSD.org>

Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use
an address in the first 2GB of the process's address space. This flag should
have the same semantics as the same flag on Linux.

To facilitate this, add a new parameter to vm_map_find() that specifies an
optional maximum virtual address. While here, fix several callers of
vm_map_find() to use a VMFS_* constant for the findspace argument instead of
TRUE and FALSE.

Reviewed by: alc
Approved by: re (kib)


# 5df87b21 07-Aug-2013 Jeff Roberson <jeff@FreeBSD.org>

Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division


# 140dedb8 02-Nov-2012 Konstantin Belousov <kib@FreeBSD.org>

The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem. There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount. Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode. Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode. Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by: pho
MFC after: 3 weeks


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 877d24ac 28-Sep-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix the mis-handling of the VV_TEXT on the nullfs vnodes.

If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by: pho (previous version)
MFC after: 2 weeks


# 19e252ba 05-May-2012 Alexander Leidinger <netchild@FreeBSD.org>

- >500 static DTrace probes for the linuxulator
- DTrace scripts to check for errors, performance, ...
they serve mostly as examples of what you can do with the static probe;s
with moderate load the scripts may be overwhelmed, excessive lock-tracing
may influence program behavior (see the last design decission)

Design decissions:
- use "linuxulator" as the provider for the native bitsize; add the
bitsize for the non-native emulation (e.g. "linuxuator32" on amd64)
- Add probes only for locks which are acquired in one function and released
in another function. Locks which are aquired and released in the same
function should be easy to pair in the code, inter-function
locking is more easy to verify in DTrace.
- Probes for locks should be fired after locking and before releasing to
prevent races (to provide data/function stability in DTrace, see the
man-page of "dtrace -v ..." and the corresponding DTrace docs).


# 3494f31a 17-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix misuse of the kernel map in miscellaneous image activators.
Vnode-backed mappings cannot be put into the kernel map, since it is a
system map.

Use exec_map for transient mappings, and remove the mappings with
kmem_free_wakeup() to notify the waiters on available map space.

Do not map the whole executable into KVA at all to copy it out into
usermode. Directly use vn_rdwr() for the case of not page aligned
binary.

There is one place left where the potentially unbounded amount of data
is mapped into exec_map, namely, in the COFF image activator
enumeration of the needed shared libraries.

Reviewed by: alc
MFC after: 2 weeks


# 9a14aa01 15-Jan-2012 Ulrich Spörlein <uqs@FreeBSD.org>

Convert files to UTF-8


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# 1ba5ad42 05-Apr-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add accounting for most of the memory-related resources.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 605da56b 26-Mar-2011 Andriy Gapon <avg@FreeBSD.org>

linux compat: improve and fix sendmsg/recvmsg compatibility

- implement baseic stubs for capget, capset, prctl PR_GET_KEEPCAPS
and prctl PR_SET_KEEPCAPS.
- add SCM_CREDS support to sendmsg and recvmsg
- modify sendmsg to ignore control messages if not using UNIX
domain sockets

This should allow linux pulse audio daemon and client work on FreeBSD
and interoperate with native counter-parts modulo the differences in
pulseaudio versions.

PR: kern/149168
Submitted by: John Wehle <john@feith.com>
Reviewed by: netchild
MFC after: 2 weeks


# d207e753 15-Feb-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Put the macro declaration in the relevant include file for future use.


# 596ba1bd 28-Jan-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Style(9) fixes.

MFC after: 1 Month.


# adc7ece0 28-Jan-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Implement a variation of the linux_common_wait() which should
be used by linuxolator itself.

Move linux_wait4() to MD path as it requires native struct
rusage translation to struct l_rusage on linux32/amd64.

MFC after: 1 Month.


# d908c2d2 27-Jan-2011 Dmitry Chagin <dchagin@FreeBSD.org>

Style(9) fix.

MFC after: 1 month.


# bb63fdde 22-Nov-2010 Alexander Leidinger <netchild@FreeBSD.org>

By using the 32-bit Linux version of Sun's Java Development Kit 1.6
on FreeBSD (amd64), invocations of "javac" (or "java") eventually
end with the output of "Killed" and exit code 137.

This is caused by:
1. After calling exec() in multithreaded linux program threads are not
destroyed and continue running. They get killed after program being
executed finishes.

2. linux_exit_group doesn't return correct exit code when called not
from group leader. Which happens regularly using sun jvm.

The submitters fix this in a similar way to how NetBSD handles this.

I took the PRs away from dchagin, who seems to be out of touch of
this since a while (no response from him).

The patches committed here are from [2], with some little modifications
from me to the style.

PR: 141439 [1], 144194 [2]
Submitted by: Stefan Schmidt <stefan.schmidt@stadtbuch.de>, gk
Reviewed by: rdivacky (in april 2010)
MFC after: 5 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 3c48c089 24-Feb-2010 Brooks Davis <brooks@FreeBSD.org>

MFC r202143,202163,202341,202342,204278

Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
somewhere in the neighborhood of INT_MAX/4 one a system with sufficent
RAM and memory bandwidth. Given that the Windows group limit is
1024, this range should be sufficient for most applications

r202342:
Only allocate the space we need before calling kern_getgroups instead
of allocating what ever the user asks for up to "ngroups_max + 1". On
systems with large values of kern.ngroups this will be more efficient.

The now redundant check that the array is large enough in
kern_getgroups() is deliberate to allow this change to be merged to
stable/8 without breaking potential third party consumers of the API.


# 3ef5ae2d 15-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

Since all other comparisons involving ngroups_max use
"ngroups_max + 1", use ">= ngroups_max+1" instead of the equivalent
"> ngroups_max" to reduce confusion.


# 412f9500 12-Jan-2010 Brooks Davis <brooks@FreeBSD.org>

Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1. Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after: 1 month


# 9f1fab50 16-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

MFC r197049:
Calculate the amount of bytes to copy for select filedescriptor masks
taking into account size of fd_set for the current process ABI.

Approved by: re (kensmith)


# b55ef216 09-Sep-2009 Konstantin Belousov <kib@FreeBSD.org>

kern_select(9) copies fd_set in and out of userspace in quantities of
longs. Since 32bit processes longs are 4 bytes, 64bit kernel may copy in
or out 4 bytes more then the process expected.

Calculate the amount of bytes to copy taking into account size of fd_set
for the current process ABI.

Diagnosed and tested by: Peter Jeremy <peterjeremy acm org>
Reviewed by: jhb
MFC after: 1 week


# 838d9858 19-Jun-2009 Brooks Davis <brooks@FreeBSD.org>

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867


# 7455b100 12-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Add counterparts to getcredhostname:
getcreddomainname, getcredhostuuid, getcredhostid

Suggested by: rmacklem
Approved by: bz


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# 76ca6f88 29-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


# 8d30f381 10-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Do not export AT_CLKTCK when emulating Linux kernel prior
to 2.4.0, as it has appeared in the 2.4.0-rc7 first time.
Being exported, AT_CLKTCK is returned by sysconf(_SC_CLK_TCK),
glibc falls back to the hard-coded CLK_TCK value when aux entry
is not present.

Glibc versions prior to 2.2.1 always use hard-coded CLK_TCK value.

For older applications/libc's which depends on hard-coded CLK_TCK
value user should set compat.linux.osrelease less than 2.4.0.

Approved by: kib (mentor)


# 1ca16454 10-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Rework r189362, r191883.
The frequency of the statistics clock is given by stathz.
Use stathz if it is available, otherwise use hz.

Pointed out by: bde

Approved by: kib (mentor)


# c65b9bfa 07-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Linux exports HZ value to user space via AT_CLKTCK auxiliary vector entry,
which is available for Glibc as sysconf(_SC_CLK_TCK). If AT_CLKTCK entry is
not exported, Glibc uses 100.

linux_times() shall use the value that is exported to user space.

Pointyhat to: dchagin

PR: kern/134251
Approved by: kib (mentor)
MFC after: 2 weeks


# 4d706dcc 06-May-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Change linux struct tms definition to match actual linux one.

Approved by: kib (mentor)
MFC after: 2 weeks


# 4d7c2e8a 03-Mar-2009 Dmitry Chagin <dchagin@FreeBSD.org>

Add AT_PLATFORM, AT_HWCAP and AT_CLKTCK auxiliary vector entries which
are used by glibc. This silents the message "2.4+ kernel w/o ELF notes?"
from some programs at start, among them are top and pkill.

Do the assignment of the vector entries in elf_linux_fixup()
as it is done in glibc.

Fix some minor style issues.

Submitted by: Marcin Cieslak <saper at SYSTEM PL>
Approved by: kib (mentor)
MFC after: 1 week


# ddf9d243 28-Dec-2008 Ed Schouten <ed@FreeBSD.org>

Push down Giant inside sysctl. Also add some more assertions to the code.

In the existing code we didn't really enforce that callers hold Giant
before calling userland_sysctl(), even though there is no guarantee it
is safe. Fix this by just placing Giant locks around the call to the oid
handler. This also means we only pick up Giant for a very short period
of time. Maybe we should add MPSAFE flags to sysctl or phase it out all
together.

I've also added SYSCTL_LOCK_ASSERT(). We have to make sure sysctl_root()
and name2oid() are called with the sysctl lock held.

Reviewed by: Jille Timmermans <jille quis cx>


# a1b5a895 09-Nov-2008 Ed Schouten <ed@FreeBSD.org>

Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.

Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by: rdivacky, kib


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 68da8b22 04-Oct-2008 Konstantin Belousov <kib@FreeBSD.org>

Current linux_fooaffinity() emulation fails, as the FreeBSD affinity
syscalls expect the bitmap size in the range from 32 to 128. Old glibc
always assumed size 1024, while newer glibc searches for approriate
size, starting from 1024 and going up.

For now, use FreeBSD size of cpuset_t for bitmap size parameter and
return EINVAL if length of user space bitmap less than our size of
cpuset_t.

Submitted by: dchagin
MFC after: 1 week
[This requires MFC of the actual linux affinity syscalls]


# 8b615593 02-Oct-2008 Marko Zec <zec@FreeBSD.org>

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


# 9545354e 22-Sep-2008 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix usage of mac_vnode_check_open() in linuxulator - last argument
should be VREAD, not FREAD.

Approved by: rwatson (mentor)


# 0d621709 11-Sep-2008 Roman Divacky <rdivacky@FreeBSD.org>

The ERESTART to EINTR conversion is already done in
kern_select so there is no need to repeat it in
linux_select().

Submitted by: Dmitry Chagin <dchagin@>
MFC after: 1 week
Approved by: kib (mentor)


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# 603724d3 17-Aug-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


# 0864e2a4 23-Jul-2008 Roman Divacky <rdivacky@FreeBSD.org>

Fix linux_alarm, the linux behaviour is to limit the
secs to INT_MAX when the passed in parameter is bigger
than INT_MAX.

Submitted by: Dmitry Chagin <chagin.dmitry gmail com>
Approved by: kib (mentor)


# 4f7d1876 05-Jul-2008 Robert Watson <rwatson@FreeBSD.org>

Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks


# 4732e446 13-May-2008 Roman Divacky <rdivacky@FreeBSD.org>

Implement robust futexes. Most of the code is modelled after
what Linux does. This is because robust futexes are mostly
userspace thing which we cannot alter. Two syscalls maintain
pointer to userspace list and when process exits a routine
walks this list waking up processes sleeping on futexes
from that list.

Reviewed by: kib (mentor)
MFC after: 1 month


# 48b05c3f 08-Apr-2008 Konstantin Belousov <kib@FreeBSD.org>

Implement the linux syscalls
openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat,
renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat.

Submitted by: rdivacky
Sponsored by: Google Summer of Code 2007
Tested by: pho


# d7a38db6 25-Mar-2008 Ruslan Ermilov <ru@FreeBSD.org>

Fix build.

Reported by: ache, tinderbox


# 5dfb6881 16-Mar-2008 Roman Divacky <rdivacky@FreeBSD.org>

Implement sched_setaffinity and get_setaffinity using
real cpu affinity setting primitives.

Reviewed by: jeff
Approved by: kib (mentor)


# cbd2c621 22-Feb-2008 Konstantin Belousov <kib@FreeBSD.org>

Sanitize arguments to linux_mremap().
Check that only MREMAP_FIXED and MREMAP_MAYMOVE flags are specified.
Check for the page alignment of the addr argument.

Submitted by: rdivacky
MFC after: 1 week


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# b6e645c9 27-Aug-2007 Konstantin Belousov <kib@FreeBSD.org>

Implement fake linux sched_getaffinity() syscall to enable java to work
with Linux 2.6 emulation. This shall be reimplemented once FreeBSD gets
native scheduler affinity syscalls.

Submitted by: rdivacky
Reviewed by: jkim
Sponsored by: Google Summer of Code 2007
Approved by: re (kensmith)


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# a1fe14bc 09-Jun-2007 Attilio Rao <attilio@FreeBSD.org>

rufetch and calcru sometimes should be called atomically together.
This patch fixes places where they should be called atomically changing
their locking requirements (both assume per-proc spinlock held) and
introducing rufetchcalc which wrappers both calls to be performed in
atomic way.

Reviewed by: jeff
Approved by: jeff (mentor)


# 2feb50bf 31-May-2007 Attilio Rao <attilio@FreeBSD.org>

Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)


# 9e223287 31-May-2007 Konstantin Belousov <kib@FreeBSD.org>

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# 222d0195 18-May-2007 Jeff Roberson <jeff@FreeBSD.org>

- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.

Contributed by: Attilio Rao <attilio@FreeBSD.org>


# 802e08a3 24-Feb-2007 Alexander Leidinger <netchild@FreeBSD.org>

Partial MFp4 of 114977:
Whitespace commit: Fix grammar, spelling and punctuation.

Submitted by: "Scot Hetzel" <swhetzel@gmail.com>


# 1a26db0a 23-Feb-2007 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (114193 (i386 part), 114194, 114195, 114200):
- Dont "return" in linux_clone() after we forked the new process in a case
of problems.
- Move the copyout of p2->p_pid outside the emul_lock coverage in
linux_clone().
- Cache the em->pdeath_signal in a local variable and move the copyout
out of the emul_lock coverage.
- Move the free() out of the emul_shared_lock coverage in a preparation
to switch emul_lock to non-sleepable lock (mutex).

Submitted by: rdivacky


# 75ee4e54 01-Feb-2007 Konstantin Belousov <kib@FreeBSD.org>

No need to lock emul_lock in exit_group() because em->shared
cannot change (because its referenced by curthread). This fixes
a LOR caused by acquiring emul_shared_lock while holding emul_lock.

Fix typo in comment.

Submitted by: rdivacky


# a8494019 07-Jan-2007 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (112646):
Now (ok it's been a while...) that FreeBSD has RLIMIT_AS too, we can use
it in the linuxolator instead of ignoring it.

This fixes a LTP test.

Submitted by: rdivacky


# 0ed6f09c 07-Jan-2007 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (112534):
Dont lock em in a case of just using em->shared->group_pid because
the group_pid never changes.

Submitted by: rdivacky
Reviewed by: kib
Glanced at by: jhb


# 1c65504c 07-Jan-2007 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (112498):
Rename the locking flags to EMUL_DOLOCK and EMUL_DONTLOCK to prevent confusion.

Submitted by: rdivacky


# c9447c75 31-Dec-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (111746, 108671, 108945, 112352):
- add linux utimes syscall [1]
- add linux rt_sigtimedwait syscall [2]

Submitted by: "Scot Hetzel" <swhetzel@gmail.com> [1]
Submitted by: Bruce Becker <hostmaster@whois.gts.net> [2]
PR: 93199 [2]


# 9ce8f9bc 30-Dec-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFp4 (111746+):
Redo the checking for 2.6 emulation. We now cache the value of
use26 and replace calls to linux_get_osrelease() + parsing with
a call to linux_use26(). Typical path is lockless now.

Pointed out by: kib

This allows to ship RELENG_7_0 with a default osrelease of 2.4.2 and the
possibility to enable 2.6.x emulation without the possible performance
impact of the previous version of the check.

Submitted by: rdivacky


# ef95cfea 31-Dec-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFp4:
- semi-automatic style fixes
- spelling fixes in comments
- add some comments


# b34608fe 04-Dec-2006 Jung-uk Kim <jkim@FreeBSD.org>

MFP4: 109653

Linux mknod(2) can open any files, not just char/block or fifo files.
This fixes Linux Test Project test cases mknod01, mknod07 and mknod09.


# f6018b14 02-Dec-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFP4 (108673, 110519, 110874):
- Currently LINUX_MAX_COMM_LEN is smaller than MAXCOMLEN, but in case
this will change we have a buffer overflow. Apply some defensive
programming to DTRT when this should happen.
- Use copyinstr() instead of copyin where appropriate.
* Fallback to copyin() in case of ENAMETOOLONG. [1]
* Use the right source and destination (it was wrong before).
- Use strlcpy instead of strcpy.
- Properly lock the read case (PR_GET_NAME) like the write case.

Reviewed by: rwatson (except [1])
Suggested by: rwatson [1]


# cce15146 18-Nov-2006 Konstantin Belousov <kib@FreeBSD.org>

Sync struct sysinfo with real one from linux.

Submitted by: rdivacky


# d559d181 18-Nov-2006 Konstantin Belousov <kib@FreeBSD.org>

Add debuging printfs to syscalls that do not contain it yet. In
sethostname do not print the hostname because it would require to copyin
the string. Sethostname is not very frequently used.

Submitted by: rdivacky


# f472c6e3 18-Nov-2006 Konstantin Belousov <kib@FreeBSD.org>

Remove unecessary locking of process in linux_getpid.

Suggested by: jhb
Submitted by: rdivacky


# 0132096d 15-Nov-2006 Konstantin Belousov <kib@FreeBSD.org>

In rev 1.188 of linux_misc.c the added check for valid options ommited
__WCLONE. This fixes it thus fixing skype/teamspeak to not keep zombies
after exit.

Submitted by: rdivacky
Reported by: Bakul Shah (bakul at bitblocks com)


# 6aeb05d7 11-Nov-2006 Tom Rhodes <trhodes@FreeBSD.org>

Merge posix4/* into normal kernel hierarchy.

Reviewed by: glanced at by jhb
Approved by: silence on -arch@ and -standards@


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# c4ce314b 28-Oct-2006 Alexander Leidinger <netchild@FreeBSD.org>

Fix style(9).

Noticed by: rwatson


# 955d762a 28-Oct-2006 Alexander Leidinger <netchild@FreeBSD.org>

MFP4:
Implement prctl().

Submitted by: rdivacky
Tested with: LTP


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# 7660ace1 08-Oct-2006 Alexander Leidinger <netchild@FreeBSD.org>

- Replace homegrown check for FIFO with S_ISFIFO. [1]
- Check the status of the options before messing with it.

Inspired by: NetBSD [1]
Submitted by: rdivacky
Tested with: LTP


# 18f81b3d 16-Sep-2006 Alexander Leidinger <netchild@FreeBSD.org>

- don't reboot() when feed with wrong parameters (and enough permissions) [1]
- add support to power off the system [2]
- check the linux magic values [3]

Submitted by: Marcin Cieslak <saper@SYSTEM.PL> [1,2]
Modelled after: linux man page of the reboot() syscall [3]
Found by: LTP testcase "reboot02" [1]
Tested with: LTP testcase "reboot02" [1,3]
MFC after: 1 week


# 3e8df637 25-Aug-2006 Robert Watson <rwatson@FreeBSD.org>

Don't call suser_cred() directly from linux_sethostname(), as it just
wraps userland_sysctl(), which performs necessary privilege checks as
part of its normal operation.

MFC after: 1 week


# 1a28c0df 20-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Sync the MI parts for amd64 with i386 and remove the corresponding special
handling for amd64 in the common code. The MD parts for amd64 are still
outstanding, but at least this fixes some panics on amd64.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Tested by: bsam


# 590e3a06 17-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

- disable some more code when osrelease=2.4.2
- protect td->td_proc->p_pid with the proc lock in linux_getpid
in the amd64 (= non i386) case [1]

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: netchild [1]


# 94cb2ecf 17-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Move some stuff into headers where they belong.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Noticed by: jhb, ssouhlal


# a43eeaab 15-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Disable some parts of the code on amd64 for now to prevent a panic. A better
fix will come later.

Sponsored by: Google SoC 2006
Submitted by: rdivacky


# 9b44bfc5 14-Aug-2006 Alexander Leidinger <netchild@FreeBSD.org>

Add the linux 2.6.x stuff (not used by default!):
- TLS - complete
- pid/tid mangling - complete
- thread area - complete
- futexes - complete with issues
- clone() extension - complete with some possible minor issues
- mq*/timer*/clock* stuff - complete but untested and the mq* stuff is
disabled when not build as part of the kernel with native FreeBSD mq*
support (module support for this will come later)

Tested with:
- linux-firefox - works, tested
- linux-opera - works, tested
- linux-realplay - doesnt work, issue with futexes
- linux-skype - doesnt work, issue with futexes
- linux-rt2-demo - works, tested
- linux-acroread - doesnt work, unknown reason (coredump) and sometimes
issue with futexes
- various unix utilities in linux-base-gentoo3 and linux-base-fc4:
everything tried worked

On amd64 not everything is supported like on i386, the catchup is planned for
later when the remaining bugs in the new functions are fixed.

To test this new stuff, you have to run
sysctl compat.linux.osrelease=2.6.16
to switch back use
sysctl compat.linux.osrelease=2.4.2

Don't switch while running a linux program, strange things may or may not
happen.

Sponsored by: Google SoC 2006
Submitted by: rdivacky
Some suggestions/help by: jhb, kib, manu@NetBSD.org, netchild


# b4c63329 21-Jul-2006 John Baldwin <jhb@FreeBSD.org>

- Pass the MPSAFE flag to namei() in linux_uselib() and handle conditional
Giant VFS locking in that function.
- Remove bogus code to handle the case where namei() returns success but a
NULL vnode pointer.
- Note that this code duplicates exec_check_permissions() and annotate
where it differs.
- Hold the vnode lock longer to protect the write to set VV_TEXT in
v_vflag.
- Mark linux_uselib() MPSAFE.

Reviewed by: rwatson


# 555f86b8 23-Jun-2006 Alexander Leidinger <netchild@FreeBSD.org>

The linux times syscall can be called with a NULL pointer, so keep cool
and don't panic.

This fix is different from the patch submitted as it not only prevents
a NULL-pointer dereference, but also skips some work in this case.

Noticed by: Dmitry Ganenko <dima@apk-inform.com>
Reviewed by: rdivacky (the original version as in emulation@)
MFC after: 1 week
Security: This is a RELENG_x_y candidate (local DoS).
Go ahead by: secteam (cperciva)


# 01e0ffba 10-May-2006 Alexander Leidinger <netchild@FreeBSD.org>

Now that we don't have a linuxolator on alpha anymore:
- unifdef __alpha__
- revert rev. 1.66 of linux_socket.c


# d9d46ed2 27-Mar-2006 Tai-hwa Liang <avatar@FreeBSD.org>

Unbreaking build by removing a now unused variable.


# b77619bd 27-Mar-2006 John Baldwin <jhb@FreeBSD.org>

Use td_ucred rather than p_ucred to avoid panics and general unhappiness.

Pointy hat to: netchild


# aefce619 19-Mar-2006 Ruslan Ermilov <ru@FreeBSD.org>

Unbreak COMPAT_LINUX32 option support on amd64.

Broken by: netchild


# d4a3f5dd 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

Fixup some problems in my previous commit (COMPAT_43).

Pointyhat to: netchild


# 5c8919ad 18-Mar-2006 Alexander Leidinger <netchild@FreeBSD.org>

Get rid of the need of COMPAT_43 in the linuxolator.

Submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Obtained from: DragonFly (some parts)


# 0e36e11d 28-Dec-2005 Tom Rhodes <trhodes@FreeBSD.org>

Cast tv_sec to intmax_t and print with %jd in some ifdef'ed code.


# 9104847f 13-Oct-2005 David Xu <davidxu@FreeBSD.org>

1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most
changes in MD code are trivial, before this change, trapsignal and
sendsig use discrete parameters, now they uses member fields of
ksiginfo_t structure. For sendsig, this change allows us to pass
POSIX realtime signal value to user code.

2. Remove cpu_thread_siginfo, it is no longer needed because we now always
generate ksiginfo_t data and feed it to libpthread.

3. Add p_sigqueue to proc structure to hold shared signals which were
blocked by all threads in the proc.

4. Add td_sigqueue to thread structure to hold all signals delivered to
thread.

5. i386 and amd64 now return POSIX standard si_code, other arches will
be fixed.

6. In this sigqueue implementation, pending signal set is kept as before,
an extra siginfo list holds additional siginfo_t data for signals.
kernel code uses psignal() still behavior as before, it won't be failed
even under memory pressure, only exception is when deleting a signal,
we should call sigqueue_delete to remove signal from sigqueue but
not SIGDELSET. Current there is no kernel code will deliver a signal
with additional data, so kernel should be as stable as before,
a ksiginfo can carry more information, for example, allow signal to
be delivered but throw away siginfo data if memory is not enough.
SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can
not be caught or masked.
The sigqueue() syscall allows user code to queue a signal to target
process, if resource is unavailable, EAGAIN will be returned as
specification said.
Just before thread exits, signal queue memory will be freed by
sigqueue_flush.
Current, all signals are allowed to be queued, not only realtime signals.

Earlier patch reviewed by: jhb, deischen
Tested on: i386, amd64


# 8d948cd1 07-Jul-2005 John Baldwin <jhb@FreeBSD.org>

Fix the computation of uptime for linux_sysinfo(). Before it was returning
the uptime in seconds mod 60 which wasn't very useful.

Approved by: re (scottl)


# bc165ab0 08-Jun-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Properly convert FreeBSD priority values into Linux values in the
getpriority(2) syscall.

PR: kern/81951
Submitted by: Andriy Gapon <avg@icyb.net.ua>


# 7625cbf3 27-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.


# 98df9218 01-Apr-2005 John Baldwin <jhb@FreeBSD.org>

- Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
vnodes and anonymous memory. Note that mmaping a cdev directly does not
currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
cdev the ioctl is acting on rather than trying to find a suitable vnode
to map from.

Reviewed by: alc, arch@


# e3478fe0 06-Mar-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Handle unimplemented syscall by instantly returning ENOSYS instead of sending
signal first and only then returning ENOSYS to match what real linux does.

PR: kern/74302
Submitted by: Travis Poppe <tlp@LiquidX.org>


# 12dd959a 07-Feb-2005 John Baldwin <jhb@FreeBSD.org>

Use kern_setitimer() to implement linux_alarm() instead of fondling the
real interval timer directly.


# cfa0efe7 25-Jan-2005 Maxim Sobolev <sobomax@FreeBSD.org>

Split out kernel side of {get,set}itimer(2) into two parts: the first that
pops data from the userland and pushes results back and the second which does
actual processing. Use the latter to eliminate stackgap in the linux wrappers
of those syscalls.

MFC after: 2 weeks


# 1997c537 13-Jan-2005 David E. O'Brien <obrien@FreeBSD.org>

Match the LINUX32's style with existing style
Submitted by: Jung-uk Kim <jkim@niksun.com>

Use positive, not negative logic.


# 9c0552ce 13-Jan-2005 David E. O'Brien <obrien@FreeBSD.org>

Fix Linux compat 'uname -m' on AMD64.

Submitted by: Jung-uk Kim <jkim@niksun.com>
(patch reworked by me)


# 78c85e8d 05-Oct-2004 John Baldwin <jhb@FreeBSD.org>

Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
times it needs rather than calling getrusage() twice with associated
stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
for user, system, and interrupt time as well as a bintime of the total
runtime. A new p_rux field in struct proc replaces the same inline fields
from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux
field in struct proc contains the "raw" child time usage statistics.
ruadd() has been changed to handle adding the associated rusage_ext
structures as well as the values in rusage. Effectively, the values in
rusage_ext replace the ru_utime and ru_stime values in struct rusage. These
two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
calculates appropriate timevals for user and system time as well as updating
the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a
copy of the process' p_rux structure to compute the timevals after updating
the runtime appropriately if any of the threads in that process are
currently executing. It also now only locks sched_lock internally while
doing the rux_runtime fixup. calcru() now only requires the caller to
hold the proc lock and calcru1() only requires the proc lock internally.
calcru() also no longer allows callers to ask for an interrupt timeval
since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
calling calcru1() on p_crux. Note that this means that any code that wants
child times must now call this function rather than reading from p_cru
directly. This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
proctree lock is used to protect the process group rather than the process
group lock. By holding this lock until the end of the function we now
ensure that the process/thread that we pick to dump info about will no
longer vanish while we are trying to output its info to the console.

Submitted by: bde (mostly)
MFC after: 1 month


# 4a16b489 16-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

Fix the 'DEBUG' argument code to unbreak the amd64 LINT build.


# 3a2e3a4a 16-Aug-2004 David E. O'Brien <obrien@FreeBSD.org>

Fix the 'DEBUG' argument code to unbreak the LINT build.


# 4af27623 16-Aug-2004 Tim J. Robbins <tjr@FreeBSD.org>

Changes to MI Linux emulation code necessary to run 32-bit Linux binaries
on AMD64, and the general case where the emulated platform has different
size pointers than we use natively:
- declare certain structure members as l_uintptr_t and use the new PTRIN
and PTROUT macros to convert to and from native pointers.
- declare some structures __packed on amd64 when the layout would differ
from that used on i386.
- include <machine/../linux32/linux.h> instead of <machine/../linux/linux.h>
if compiling with COMPAT_LINUX32. This will need to be revisited before
32-bit and 64-bit Linux emulation support can coexist in the same kernel.
- other small scattered changes.

This should be a no-op on i386 and Alpha.


# ae8e14a6 14-Aug-2004 Tim J. Robbins <tjr@FreeBSD.org>

Replace linux_getitimer() and linux_setitimer() with implementations
based on those in freebsd32_misc.c, removing the assumption that Linux
uses the same layout for struct itimerval as we use natively.


# d1d6dbf1 14-Aug-2004 Tim J. Robbins <tjr@FreeBSD.org>

Avoid assuming that l_timeval is the same as the native struct timeval
in linux_select().


# 56f21b9d 26-Jul-2004 Colin Percival <cperciva@FreeBSD.org>

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# 1930e303 11-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# b7e23e82 17-Mar-2004 John Baldwin <jhb@FreeBSD.org>

- Replace wait1() with a kern_wait() function that accepts the pid,
options, status pointer and rusage pointer as arguments. It is up to
the caller to copyout the status and rusage to userland if needed. This
lets us axe the 'compat' argument and hide all that functionality in
owait(), by the way. This also cleans up some locking in kern_wait()
since it no longer has to drop locks around copyout() since all the
copyout()'s are deferred.
- Convert owait(), wait4(), and the various ABI compat wait() syscalls to
use kern_wait() rather than wait1() or wait4(). This removes a bit
more stackgap usage.

Tested on: i386
Compiled on: i386, alpha, amd64


# 91d5354a 04-Feb-2004 John Baldwin <jhb@FreeBSD.org>

Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit
structure is treated similarly to struct ucred in that is is always copy
on write, so having a reference to a structure is sufficient to read from
it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading
limits from a process to keep the limit structure from changing out from
under you while reading from it.
- Various global limits that are ints are not protected by a lock since
int writes are atomic on all the archs we support and thus a lock
wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted
behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return
either an rlimit, or the current or max individual limit of the specified
resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match existing style of
other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit()
(it didn't used the stackgap when it should have) but uses lim_rlimit()
and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls,
but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It
also no longer uses the stackgap for accessing sysctl's for the
ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result,
ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.

Submitted by: mtm (mostly, I only did a few cleanups and catchups)
Tested on: i386
Compiled on: alpha, amd64


# 277b6204 02-Jan-2004 Alan Cox <alc@FreeBSD.org>

Lock the traversal of the vm object list. Use TAILQ_FOREACH consistently.


# d09c47ac 16-Nov-2003 Maxim Sobolev <sobomax@FreeBSD.org>

Pull latest changes from OpenBSD:

- improve sysinfo(2) syscall;
- add dummy fadvise64(2) syscall;
- add dummy *xattr(2) family of syscalls;
- add protos for the syscalls 222-225, 238-249 and 253-267;
- add exit_group(2) syscall, which is currently just wired to exit(2).

Obtained from: OpenBSD
MFC after: 2 weeks


# 1d2d5501 21-Oct-2003 Tim J. Robbins <tjr@FreeBSD.org>

Reject negative ngrp arguments in linux_setgroups() and linux_setgroups16();
stops users being able to cause setgroups to clobber the kernel stack by
copying in data past the end of the linux_gidset array.


# 34eec0a1 07-Sep-2003 Bruce Evans <bde@FreeBSD.org>

Restored a non-egregious cast so that this file compiles on i386's
with 64-bit longs again. This was fixed in rev.1.42 but the fix
rotted non-fatally in rev.1.105 and fatally in rev.1.137.

Many more non-egregrious casts are strictly required for conversions
from semi-opaque types to pointers, but we avoid most of them by using
types that are almost certain to be compatible with uintptr_t for
representing pointers (e.g., vm_offset_t). Here we don't really want
the u_longs, but we have them because a.out.h and its support code
doesn't use typedefs (it uses unsigned in V7 and unsigned long in
FreeBSD) and is too obsolete to fix now.


# 7576b4b4 29-Jul-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Try to make 'uname -a' look more like it does on Linux:

- cut the version string at the newline, suppressing information about
who built the kernel and in what directory. Most of this information
was already lost to truncation.

- on i386, return the precise CPU class (if known) rather than just
"i386". Linux software which uses this information to select
which binary to run often does not know what to make of "i386".


# a8d43c90 26-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


# 567104a1 18-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a new function swap_pager_status() which reports the total size of the
paging space and how much of it is in use (in pages).

Use this interface from the Linuxolator instead of groping around in the
internals of the swap_pager.


# 16dbc7f2 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 104a9b7e 29-Apr-2003 Alexander Kabaev <kan@FreeBSD.org>

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


# 8804bf6b 17-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Use local struct proc variables to reduce repeated td->td_proc dereferences
and improve readability.


# 20b04da8 16-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Explicitly cast a l_ulong to an unsigned long to make all arch's happy
with the printf format.


# 760eb2e0 16-Apr-2003 John Baldwin <jhb@FreeBSD.org>

Fix printf format in a debug printf.


# b62f75cf 13-Mar-2003 John Baldwin <jhb@FreeBSD.org>

- Change the linux_[gs]et_os{name, release, s_version}() functions to
take a thread instead of a proc for their first argument.
- Add a mutex to protect the system-wide Linux osname, osrelease, and
oss_version variables.
- Change linux_get_prison() to take a thread instead of a proc for its
first argument and to use td_ucred rather than p_ucred. This is ok
because a thread's prison does not change even though it's ucred might.
- Also, change linux_get_prison() to return a struct prison * instead of
a struct linux_prison * since it returns with the struct prison locked
and this makes it easier to safely unlock the prison when we are done
messing with it.


# 1d062e2b 03-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

Clean up whitespace and remove register keyword.


# 4b7ef73d 03-Mar-2003 Dag-Erling Smørgrav <des@FreeBSD.org>

More caddr_t removal, in conjunction with copy{in,out}(9) this time.
Also clean up some egregious casts and incorrect use of sizeof.


# 96d7f8ef 17-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

Use the proc lock to protect p_realtimer instead of Giant, and obtain
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.


# fb30aed1 14-Feb-2003 Tim J. Robbins <tjr@FreeBSD.org>

Obtain proc lock around modification of p_siglist in linux_wait4().


# 75e8f2da 17-Oct-2002 Robert Drehmel <robert@FreeBSD.org>

- Use strlcpy() rather than strncpy() to copy NUL terminated
strings.
- Pass the correct buffer size to getcredhostname().


# 1d9c5696 01-Oct-2002 Juli Mallett <jmallett@FreeBSD.org>

Back our kernel support for reliable signal queues.

Requested by: rwatson, phk, and many others


# 1226f694 30-Sep-2002 Juli Mallett <jmallett@FreeBSD.org>

First half of implementation of ksiginfo, signal queues, and such. This
gets signals operating based on a TailQ, and is good enough to run X11,
GNOME, and do job control. There are some intricate parts which could be
more refined to match the sigset_t versions, but those require further
evaluation of directions in which our signal system can expand and contract
to fit our needs.

After this has been in the tree for a while, I will make in kernel API
changes, most notably to trapsignal(9) and sendsig(9), to use ksiginfo
more robustly, such that we can actually pass information with our
(queued) signals to the userland. That will also result in using a
struct ksiginfo pointer, rather than a signal number, in a lot of
kern_sig.c, to refer to an individual pending signal queue member, but
right now there is no defined behaviour for such.

CODAFS is unfinished in this regard because the logic is unclear in
some places.

Sponsored by: New Gold Technology
Reviewed by: bde, tjr, jake [an older version, logic similar]


# 0fa89fc7 24-Sep-2002 Jeff Roberson <jeff@FreeBSD.org>

- Hold the vn lock over vm_mmap().


# c5afa587 19-Sep-2002 Matthew N. Dodd <mdodd@FreeBSD.org>

Pass flags to msync() accounting for differences in the definition of
MS_SYNC on FreeBSD and Linux.

Submitted by: Christian Zander <zander@minion.de>


# 367797e0 04-Sep-2002 Bruce Evans <bde@FreeBSD.org>

Do not cast from a pointer to an integer of a possibly different size.
This fixes a warning on i386's with 64-bit longs.


# 85422e62 05-Sep-2002 Bruce Evans <bde@FreeBSD.org>

Include <sys/malloc.h> instead of depending on namespace pollution 2
layers deep in <sys/proc.h> or <sys/vnode.h>.

Removed unused includes. Sorted includes.


# 206a5d3a 01-Sep-2002 Ian Dowse <iedowse@FreeBSD.org>

Use the new kern_* functions to avoid the need to store arguments
in the stack gap. This converts most VFS and signal related system
calls, as well as select().

Discussed on: -arch
Approved by: marcel


# e6e370a7 04-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


# eddc160e 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points for a number of VFS-related
operations in the Linux ABI module. In particular, handle uselib
in a manner similar to open() (more work is probably needed here),
as well as handle statfs(), and linux readdir()-like calls.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# fa3b8ffb 14-Jun-2002 Robert Watson <rwatson@FreeBSD.org>

Add a comment about how we should use vn_open() here instead of directly
invoking VOP_OPEN(). This would reduce code redundancy with the rest
of the kernel, and also is required for MAC to work properly.


# 21dc7d4f 02-Jun-2002 Jens Schweikhardt <schweikh@FreeBSD.org>

Fix typo in the BSD copyright: s/withough/without/

Spotted and suggested by: des
MFC after: 3 weeks


# 4924b9dd 30-Apr-2002 Peter Wemm <peter@FreeBSD.org>

Zap some stale unused headers, including one machine/psl.h (which is
a stub on alpha). Compile tested on alpha and x86.


# b099af16 20-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Add an XXX: linux_uselib() should be using vn_open() rather than invoking
VOP_OPEN() and doing lots of manual checking. This would further
centralize use of the name functions, and once the MAC code is integrated,
meaning few extraneous MAC checks scattered all over the place. I don't
have time to fix this now, but want to make sure it doesn't get
forgotten. Anyone interested in fixing this should feel free.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 094a9455 13-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Rework logic of syscalls that modify process credentials as described in
rev 1.152 of sys/kern/kern_prot.c.


# 0af24d51 11-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Use td_ucred in a few spots.


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# 85103150 20-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

Remove references to vm_zone.h and switch over to the new uma API.


# a854ed98 27-Feb-2002 John Baldwin <jhb@FreeBSD.org>

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 668ae588 27-Feb-2002 Robert Drehmel <robert@FreeBSD.org>

Use the updated getcredhostname() function.


# 5597f0cc 27-Feb-2002 Robert Drehmel <robert@FreeBSD.org>

Use the getcredhostname function to fill the hostname into
the linux_newuname_args structure. This should fix the case
of jailed linux processes not using the jail's hostname.

PR: 35336
Reviewed by: phk


# 21e06996 23-Jan-2002 Andrew Gallatin <gallatin@FreeBSD.org>

Linux/alpha uses the same BSDish return mechanism we do for
getpid, getuid, getgid and pipe, since they bootstrapped from
OSF/1 and never cleaned up. Switch to the native syscalls
on alpha so that the above functions work

MFC after: 7 days


# 01137630 03-Dec-2001 Robert Watson <rwatson@FreeBSD.org>

o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.

Reviewed by: jhb


# c798b362 24-Nov-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Revert incorrect KSEfication: realitexpire expects a struct proc *, not a
struct thread *.


# cbc89bfb 10-Oct-2001 Paul Saab <ps@FreeBSD.org>

Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by: peter
MFC after: 2 weeks


# 9b130a99 27-Sep-2001 Marcel Moolenaar <marcel@FreeBSD.org>

Remove linux_getpgid(). We map the syscall natively now.

PR: kern/21402


# b8febfd1 15-Sep-2001 Michael Reifenberger <mr@FreeBSD.org>

Add a wrapper for linux_getsid -> getsid Syscall.


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 5002a60f 08-Sep-2001 Marcel Moolenaar <marcel@FreeBSD.org>

Round of cleanups and enhancements. These include (in random order):

o Introduce private types for use in linux syscalls for two reasons:
1. establish type independence for ease in porting and,
2. provide a visual queue as to which syscalls have proper
prototypes to further cleanup the i386/alpha split.
Linuxulator types are prefixed by 'l_'. void and char have not
been "virtualized".

o Provide dummy functions for all syscalls and remove dummy functions
or implementations of truely obsolete syscalls.

o Sanitize the shm*, sem* and msg* syscalls.

o Make a first attempt to implement the linux_sysctl syscall. At this
time it only returns one MIB (KERN_VERSION), but most importantly,
it tells us when we need to add additional sysctls :-)

o Bump the kenel version up to 2.4.2 (this is not the same as the
KERN_VERSION MIB, BTW).

o Implement new syscalls, of which most are specific to i386. Our
syscall table is now up to date with Linux 2.4.2. Some highlights:
- Implement the 32-bit uid_t and gid_t bases syscalls.
- Implement a couple of 64-bit file size/offset bases syscalls.

o Fix or improve numerous syscalls and prototypes.

o Reduce style(9) violations while I'm here. Especially indentation
inconsistencies within the same file are addressed. Re-indenting
did not obfuscate actual changes to the extend that it could not
be combined.

NOTE: I spend some time testing these changes and found that if there
were regressions, they were not caused by these changes AFAICT.
It was observed that installing a RH 7.1 runtime environment
did make matters worse. Hangs and/or reboots have been observed
with and without these changes, so when it failed to make life
better in cases it doesn't look like it made it worse.


# 814c9526 23-Jul-2001 Jim Pirzyk <pirzyk@FreeBSD.org>

Added the linux_sysinfo function to implement sysinfo(2).

PR: kern/27759
Reviewed by: marcel
Approved by: marcel
MFC after: 1 week


# 2e17a059 15-Jun-2001 Peter Wemm <peter@FreeBSD.org>

Fix warning:
413: warning: long unsigned int format, vm_offset_t arg (arg 2)


# b1fc0ec1 25-May-2001 Robert Watson <rwatson@FreeBSD.org>

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# c7e18870 24-Apr-2001 Robert Watson <rwatson@FreeBSD.org>

o Change a suser() call to a suser_xxx(..., PRISON_ROOT) call in the
linuxulator so as to allow privileged processes within a jail() to
invoke the Linux initgroups() system call. This allows the Linux
"su" to work properly (better) when running a complete Linux
environment under jail(). This problem was reported by Attila
Nagy <bra@fsn.hu>.

Reviewed by: marcel


# 33a9ed9d 23-Apr-2001 John Baldwin <jhb@FreeBSD.org>

Change the pfind() and zpfind() functions to lock the process that they
find before releasing the allproc lock and returning.

Reviewed by: -smp, dfr, jake


# 21c8cdfb 31-Mar-2001 Alan Cox <alc@FreeBSD.org>

Add linux_sched_get_priority_max() and linux_sched_get_priority_min(): The
policy parameter requires translation.


# 6d4aa00a 23-Mar-2001 Andrew Gallatin <gallatin@FreeBSD.org>

fix linux_times() to take into account linux's value of CLK_TCK on the alpha.
Previously, results were off by a factor of 10

Tested by: Yoriaki FUJIMORI <fujimori@grafin.fujimori.cache.waseda.ac.jp>


# 24593369 16-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Allow debugging output to be controlled on a per-syscall granularity.
Also clean up debugging output in a slightly more uniform fashion.

The default behavior remains the same (all debugging output is turned on)


# 705deb78 16-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Add mount syscall to linux emulation. Also improve emulation of reboot.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# ba88dfc7 26-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Back out proc locking to protect p_ucred for obtaining additional
references along with the actual obtaining of additional references.


# fb29c3e0 23-Jan-2001 John Baldwin <jhb@FreeBSD.org>

Protect calcru() with sched_lock.


# 216af822 15-Dec-2000 John Baldwin <jhb@FreeBSD.org>

Lock access to proc members.

Glanced over by: marcel


# b4c6727a 02-Dec-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Don't auto-generate the syscalls.


# 4f559836 27-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Use callout_reset instead of timeout(9). Most callouts are statically
allocated, 2 have been added to struct proc for setitimer and sleep.

Reviewed by: jhb, jlemon


# ebea8660 10-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Revert auto-generation. The Alpha port is broken.
Syncing with it is wrong.


# 2da829a0 09-Nov-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Sync with Alpha:
Do not use sysent.c, proto.h and syscall.h in source tree;
use auto-generated versions.


# 5231fb20 01-Nov-2000 David E. O'Brien <obrien@FreeBSD.org>

The MI/MD split wasn't perfect and the MI files need hacks for the
AlphaLinux compat bits. This will be better cleaned up soon.

Agreed to what ever was necessary by: marcel


# 4a22d850 25-Aug-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Fix bug in previous commit. We need to trim the limits to fit
the datatype (= long). Use ULONG_MAX and LONG_MAX to avoid
creating MD code.


# eebc2a07 25-Aug-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Re-implement linux_{g|s}etrlimit in terms of {g|s}etrlimit
instead of the o{g|s}etrlimit so that the dependency on
COMPAT_43 is removed.


# a751315c 21-Aug-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Update include directives.

Move linux_select to MD code (i386 compat. syscall).

Move linux_fork, linux_vfork, linux_clone, linux_mmap,
linux_pipe, linux_ioperm, linux_iopl and linux_modify_ldt
to MD code.


# 03567510 23-Jul-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Add bounds checking to stackgap_alloc. Previously it was possible
to construct a path that was long enough (ie longer than
SPARE_USRSPACE bytes) and trash the stack.

Note that SPARE_USRSPACE is much smaller than MAXPATHLEN so that
the Linuxulator will now return ENAMETOOLONG even if the path
is smaller than MAXPATHLEN.

PR: 12749


# a603fe5a 19-Jul-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Revert implementation of setfsuid and setfsgid due to security
issues.

Requested by: rwatson
Backed by: kris


# ddb48608 16-Jul-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Implement setfsuid and setfsgid. Implementation derived from patch
in PR.

PR: 16993
Submitted by: Bjoern Groenvall <bg@sics.se>


# 6f6b2cd0 15-Jun-2000 Martin Cracauer <cracauer@FreeBSD.org>

Linux allows to mmap annonymous with a file descriptor passed, FreeBSD
doesn't. In the Linux emulation layer, ignore the fd passed when
MAP_ANON is specified.

Known application to be fixed: Xanalys/Harlequin Lispworks

Also improve debug output for mmap, now showing what the emulation
layer mapped to what (-DDEBUG).

Reviewed by: marcel


# 2c9b67a8 30-Apr-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unneeded #include <vm/vm_zone.h>

Generated by: src/tools/tools/kerninclude


# 3c1124cf 09-Mar-2000 Marcel Moolenaar <marcel@FreeBSD.org>

Fix bug in linux_wait4 and linux_waitpid where garbage in the status
argument could panic the kernel.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Prompted by: jkh, gallatin
Approved by: prompters


# 762e6b85 15-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Introduce NDFREE (and remove VOP_ABORTOP)


# 923502ff 29-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

useracc() the prequel:

Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.


# 956d3333 29-Sep-1999 Marcel Moolenaar <marcel@FreeBSD.org>

sigset_t change (part 4 of 5)
-----------------------------

The compatibility code and/or emulators have been updated:

iBCS2 now mostly uses the older syscalls. SVR4 now properly
handles all signals. This has been achieved by using the
new sigset_t throughout the emulator. The Linuxulator has
been severely updated. Internally the new Linux sigset_t is
made the default. These are then mapped to and from the
new FreeBSD sigset_t.

Also, rt_sigsuspend has been implemented in the Linuxulator.
Implementing this syscall basicly caused all this sigset_t
changing in the first place and the syscall has been used
throughout the change as a means for testing. It basicly is
too much work to undo the implementation so that it can
later be added again.

A special note on the use of sv_sigtbl and sv_sigsize in
struct sysentvec:
Every signal larger than sv_sigsize is not translated and is
passed on to the signal handler unmodified. Signals in the
range 1 upto and including sv_sigsize are translated.
The rationale is that only the system defined signals need to
be translated.

The emulators also have been updated so that the translation
tables are only indexed for valid (system defined) signals.
This change also fixes the translation bug already in the
SVR4 emulator.


# 2323686a 22-Sep-1999 Luoqi Chen <luoqi@FreeBSD.org>

Implement linux_ioperm() syscall. Fix linux_iopl() to use the level argument.
SVGAlib should now work.

Reviewed by: marcel


# 6771d803 03-Sep-1999 Marcel Moolenaar <marcel@FreeBSD.org>

I missed the namechange of field desc in struct i386_ldt_args into descs while
reviewing luoqi's changes...

Pointed out by: luoqi


# ff78e850 02-Sep-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Implementation of the modify_ldt syscall. Use the sysarch() interface to do
the actual work. When USER_LDT is not defined for a kernel, sysarch returns
EOPNOTSUPP. Display a message in that case and return ENOSYS to userland.

Reviewed by: luoqi


# d4c45842 29-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix a missing '-1' in the size argument of copyout in getgroups. Spotted while
reviewing the MFC in -stable.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# c6dfea0e 27-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Add sysctl variables for the Linuxulator. These reside under `compat.linux' as
discussed on current.

The following variables are defined (for now):

osname (defaults to "Linux")
Allow users to change the name of the OS as returned by uname(2),
specially added for all those Linux Netscape users and statistics
maniacs :-) We now have what we all wanted!

osrelease (defaults to "2.2.5")
Allow users to change the version of the OS as returned by uname(2).
Since -current supports glibc2.1 now, change the default to 2.2.5
(was 2.0.36).

oss_version (defaults to 198144 [0x030600])
This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
can commit now that we have the MIB. The default version number is the
lowest version possible with the current 'encoding'.

A note about imprisoned processes (see jail(2)):
These variables are copy-on-write (as suggested by phk). This means that
imprisoned processes will use the system wide value unless it is written/set
by the process. From that moment on, a copy local to the prison will be
used.

A note about the implementation:
I choose to add a single pointer to struct prison, because I didn't like the
idea of changing struct prison every time I come up with a new variable. As
a side effect, the extra storage is only needed when a variable is set from
within the prison. This also minimizes kernel bloat when the Linuxulator is
not used; both compiled in or as a module.

Reviewed by: bde (first version only) and phk


# c85f6717 25-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix {g|s}etgroups semantics. We use cr_groups[0] to hold egid. This means that
egid will be twice in the set and that setting cr_groups[0] will change egid.
This is simply solved by ignoring cr_groups[0]. That is; linux_getgroups does
not return cr_groups[0] and linux_setgroups does not touch it.

Noticed by: bde
Brought to my attention by: sheldonh


# 2fdc82e0 25-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Change all UNIMPL syscalls to STD and add them to linux_dummy. Now we always
know if and when an unimplemented or obsoleted syscall is being used. Make the
message more end-user friendly.

And as long as we're here, rename some unimplemeted syscalls (linux_phys ->
linux_umount2, linux_vm86 -> linux_vm86old, linux_new_vm86 -> linux_vm86).

Change prototype for linux_newuname from `struct linux_newuname_t *' into
`struct linux_new_utsname *'. This change is reflected in linux.h and
linux_misc.c.


# ce2b2a92 17-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix bug in the debug-printf of the vfork syscall, where the format specifier
didn't match the argument (p->p_pid).

While I'm at it, also fix the dupo in the format string and fix the annoying
inconsistency in all the debug-printfs wrt p_pid arguments. Change all of them
to use the %ld format specifier and cast the p_pid arguments to long.

Submitted by: billf


# 42035021 16-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Implement linux_vfork() syscall by calling vfork(). Analogous to the
linux_fork() implementation.


# a171f5ad 15-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Provide wrappers for sched_{s|g}etscheduler. We need to convert the policy
argument.

PR: 12006
Originator: Jean-Claude MICHOT <jcmichot@teaser.fr>


# 20c661be 15-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Include opt_compat.h so that COMPAT_43 is defined. This gives us the proper
prototypes of o{s|g}etrlimit (from sys/sysproto.h). Update linux_{s|g}etrlimit
so that the arguments to o{s|g}etrlimit are corresponding the prototypes.

Pointed out by: bde


# 175db64b 11-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Do not map {s|g}etrlimit onto FreeBSD syscalls. The arguments don't match.
The linux syscalls translate the arguments first before invoking the
FreeBSD native syscalls.

PR: kern/9591
Originator: John Plevyak <jplevyak@inktomi.com>


# 6a6ea79a 08-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Fix page fault in linux_uselib syscall.

PR: 12910
Submitted by: Peter Holm <peter@holm.cc>


# 19e52096 05-Jul-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Let newuname return "Linux" as the OS name and not "FreeBSD". Also, return a
more sensible (for Linux applications) release number. Hardcoding a release
number has its drawbacks, but it will do for now.


# d5558c00 06-May-1999 Peter Wemm <peter@FreeBSD.org>

Fix up a few easy 'assignment used as truth value' and 'suggest parens
around && within ||' type warnings. I'm pretty sure I have not masked
any problems here, I've committed real problem fixes seperately.


# 5206bca1 27-Apr-1999 Luoqi Chen <luoqi@FreeBSD.org>

Enable vmspace sharing on SMP. Major changes are,
- %fs register is added to trapframe and saved/restored upon kernel entry/exit.
- Per-cpu pages are no longer mapped at the same virtual address.
- Each cpu now has a separate gdt selector table. A new segment selector
is added to point to per-cpu pages, per-cpu global variables are now
accessed through this new selector (%fs). The selectors in gdt table are
rearranged for cache line optimization.
- fask_vfork is now on as default for both UP and SMP.
- Some aio code cleanup.

Reviewed by: Alan Cox <alc@cs.rice.edu>
John Dyson <dyson@iquest.net>
Julian Elischer <julian@whistel.com>
Bruce Evans <bde@zeta.org.au>
David Greenman <dg@root.com>


# 1c308b81 26-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Change suser_xxx() to suser() where it applies.


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# db42d908 19-Apr-1999 Peter Wemm <peter@FreeBSD.org>

unifdef -DVM_STACK - it's been on for a while for x86 and was checked
and appeared to be working for the Alpha some time ago.


# 4ac9ae70 01-Mar-1999 Julian Elischer <julian@FreeBSD.org>

Fix thread/process tracking and differentiation for Linux threads emulation.

Submitted by: Richard Seaman, Jr." <dick@tar.com>

Also clean some compiler warnings in surrounding code.


# 88c5ea45 25-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Enable Linux threads support by default.
This takes the conditionals out of the code that has been tested by
various people for a while.
ps and friends (libkvm) will need a recompile as some proc structure
changes are made.

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 2267af78 06-Jan-1999 Julian Elischer <julian@FreeBSD.org>

Add (but don't activate) code for a special VM option to make
downward growing stacks more general.
Add (but don't activate) code to use the new stack facility
when running threads, (specifically the linux threads support).
This allows people to use both linux compiled linuxthreads, and also the
native FreeBSD linux-threads port.

The code is conditional on VM_STACK. Not using this will
produce the old heavily tested system.

Submitted by: Richard Seaman <dick@tar.com>


# 397e4760 30-Dec-1998 Søren Schmidt <sos@FreeBSD.org>

Commit #2 of

PR: 9235
Submitted by: marcel@scc.nl <Marcel Moolenaar>


# 1b88e5d7 24-Dec-1998 Julian Elischer <julian@FreeBSD.org>

According to the author..

"I've been having a problem running the patches [committed to current]
installed with the COMPAT_LINUX_THREADS option along
with the VM_STACK patches I did. I'm not sure what
the problem is, since it seemed to work before.

In any event, the attached patch fixes the problem for
me. While I've had no report of problems from anyone
else, possibly it would be wise to commit the patch
until the problem is found.

Also, there was some left-over junk in the linux_misc.c
file from some earlier work I did. The attached patch
cleans that up too."

Submitted by: "Richard Seaman, Jr." <dick@tar.com>


# 6626c604 18-Dec-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: Luoqi Chen, Jordan Hubbard
Submitted by: "Richard Seaman, Jr." <lists@tar.com>
Obtained from: linux :-)

Code to allow Linux Threads to run under FreeBSD.

By default not enabled
This code is dependent on the conditional
COMPAT_LINUX_THREADS (suggested by Garret)
This is not yet a 'real' option but will be within some number of hours.


# 57da30bf 10-Dec-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

linux_pipe does not preserve the edx register. Linux and
programs using glibc expect edx to be preserved accross syscalls.
As a result, linux programs running in emulation mode can
have whatever value may be represented by edx clobbered.

PR: 9038
Submitted-By: Richard Seaman, Jr. <dick@tar.com>


# 2127f260 04-Dec-1998 Archie Cobbs <archie@FreeBSD.org>

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


# 5dd99c3b 04-Oct-1998 Søren Schmidt <sos@FreeBSD.org>

In linux_newuname bzero the right type of struct (linux_newuname_t).


# 9587f05d 24-Sep-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

MF22: revert time bogon.


# c9297a73 23-Sep-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

return time in proper format for linux.


# 86a14a7a 15-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Use [u]intptr_t instead of [u_]long for casts between pointers and
integers. Don't forget to cast to (void *) as well.


# 882fdeae 05-Aug-1998 Bruce Evans <bde@FreeBSD.org>

Converted the second last instance of hzto() to tvtohz().

Fixed nearby bugs (in linux_alarm()):
- the itimer for the alarm was relative to the epoch instead of relative
to the boot time. This was harmless because the itimer's interval is 0.
- the seconds arg was not checked for validity before converting it to a
possibly different value.
- printf format errors.

Improvements:
Don't use splclock(). splsoftclock() suffices. Don't complicate things
by micro-optimizing interrupt latency.

Minor improvements:
Various micro-optimizations to exploit the specialness of the alarm itimer
and the value 0.


# e4e6ae13 29-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Fixed print format errors.


# d14897d3 10-Jul-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

Quick and dirty support for Linux's mremap. Not used by anything
but quake2 AFAIK.

Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>


# c21410e1 17-May-1998 Poul-Henning Kamp <phk@FreeBSD.org>

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


# 4cf41af3 06-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by: bde


# cc6447a3 04-Apr-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Use microruntime() rather than doing it by hand.


# 227ee8a1 30-Mar-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# 770a5f7e 24-Feb-1998 Bruce Evans <bde@FreeBSD.org>

Removed redundant test against MAXDSIZ (the rlimit test is stronger).


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 1d07b128 30-Oct-1997 KATO Takenori <kato@FreeBSD.org>

Securelevel and formatting fixes, and trapframe simplification.

Reviewed by: sos
Submitted by: bde


# 404c835d 29-Oct-1997 KATO Takenori <kato@FreeBSD.org>

Implement linux_iopl and linux_nice.


# 35442183 21-Sep-1997 Justin T. Gibbs <gibbs@FreeBSD.org>

Update for changes in the callout interface.


# 293a9e51 20-Jul-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# 28f6972b 27-Apr-1997 Mike Smith <msmith@FreeBSD.org>

Always include PROT_READ for Linux mmap operations.
Submitted by: Hannu Savolainen <hannu@voxware.pp.fi> via jkh


# 3f39dbc5 01-Apr-1997 Bruce Evans <bde@FreeBSD.org>

Removed potentially harmful garbage <vm/lock.h> and fixed bogus
use of it. It was actually harmless because the use was null due
to fortuitous include orders and identical (wrong) idempotency
macros.


# fce002fd 24-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


# 3ac4d1ef 22-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# b49b1215 10-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Make this compile again after the Lite2 merge.

VOP_UNLOCK was being called with the wrong mumber of arguments.

Also silenced a -Wall warning.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# c23670e2 11-Jun-1996 Gary Palmer <gpalmer@FreeBSD.org>

Clean up -Wunused warnings.

Reviewed by: bde


# f8845af0 02-May-1996 Poul-Henning Kamp <phk@FreeBSD.org>

First pass at cleaning up macros relating to pages, clusters and all that.


# 6ffde942 07-Apr-1996 Bruce Evans <bde@FreeBSD.org>

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


# ede8dc43 19-Mar-1996 Bruce Evans <bde@FreeBSD.org>

Fixed unsigned longs that should have been vm_offset_t.

vm_offset_t is currently unsigned long but should probably be plain
unsigned for i386's to match the choice of minimal types to represent
for fixed-width types in Lite2. Anyway, it shouldn't be assumed
to be unsigned long.

I only fixed the type mismatches that were detected when I changed
vm_offset_t to unsigned. Only pointer type mismatches were detected.


# 71d7d1b1 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Remove references to MAP_FILE.. That is now "default" and is only
a "#define MAP_FILE 0" that is still there for net-2 source compatability.


# 0946c36c 10-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Fix the vm_map_remove and vm_map_protect calls.. Somewhere along the
line, these had got (start, length) arguments instead of (start, end)
args. This could be the cause of Robert Sanders lockups with ZMAGIC
binaries.


# dbc09a63 04-Mar-1996 Peter Wemm <peter@FreeBSD.org>

update linux_times() and linux_utime() emulation,
fix sigsuspend() (actually back out my recent change there)
and regen the syscall tables..


# d66a5066 02-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Mega-commit for Linux emulator update.. This has been stress tested under
netscape-2.0 for Linux running all the Java stuff. The scrollbars are now
working, at least on my machine. (whew! :-)

I'm uncomfortable with the size of this commit, but it's too
inter-dependant to easily seperate out.

The main changes:

COMPAT_LINUX is *GONE*. Most of the code has been moved out of the i386
machine dependent section into the linux emulator itself. The int 0x80
syscall code was almost identical to the lcall 7,0 code and a minor tweak
allows them to both be used with the same C code. All kernels can now
just modload the lkm and it'll DTRT without having to rebuild the kernel
first. Like IBCS2, you can statically compile it in with "options LINUX".

A pile of new syscalls implemented, including getdents(), llseek(),
readv(), writev(), msync(), personality(). The Linux-ELF libraries want
to use some of these.

linux_select() now obeys Linux semantics, ie: returns the time remaining
of the timeout value rather than leaving it the original value.

Quite a few bugs removed, including incorrect arguments being used in
syscalls.. eg: mixups between passing the sigset as an int, vs passing
it as a pointer and doing a copyin(), missing return values, unhandled
cases, SIOC* ioctls, etc.

The build for the code has changed. i386/conf/files now knows how
to build linux_genassym and generate linux_assym.h on the fly.

Supporting changes elsewhere in the kernel:

The user-mode signal trampoline has moved from the U area to immediately
below the top of the stack (below PS_STRINGS). This allows the different
binary emulations to have their own signal trampoline code (which gets rid
of the hardwired syscall 103 (sigreturn on BSD, syslog on Linux)) and so
that the emulator can provide the exact "struct sigcontext *" argument to
the program's signal handlers.

The sigstack's "ss_flags" now uses SS_DISABLE and SS_ONSTACK flags, which
have the same values as the re-used SA_DISABLE and SA_ONSTACK which are
intended for sigaction only. This enables the support of a SA_RESETHAND
flag to sigaction to implement the gross SYSV and Linux SA_ONESHOT signal
semantics where the signal handler is reset when it's triggered.

makesyscalls.sh no longer appends the struct sysentvec on the end of the
generated init_sysent.c code. It's a lot saner to have it in a seperate
file rather than trying to update the structure inside the awk script. :-)

At exec time, the dozen bytes or so of signal trampoline code are copied
to the top of the user's stack, rather than obtaining the trampoline code
the old way by getting a clone of the parent's user area. This allows
Linux and native binaries to freely exec each other without getting
trampolines mixed up.


# 5297fc55 16-Feb-1996 Peter Wemm <peter@FreeBSD.org>

This is an extract of changes from what I am currently running...
- Optimise the linux a.out loading and uselib system calls so they
take advantage of some of John's recent interface improvements.
Basically, this means they make far less map changes than before.
- Attempt to plug some potentially nasty kernel_map memory leaks..
- Improve support for QMAGIC libs (I only use QMAGIC (ie: a.out libraries from
the slackware 3.0 dist) but this depends on other changes to enhance
the /compat/linux support)
- uselib goes out through a single exit as part of the resource tracking
that I did when closing the resource leaks on errors. This could be
cleaner than what I did, but making a 30-deep nested if/else was not my
idea of fun, neither did I want to repeat the same code 30 times over for
each failure possibility. I guess this function needs to be split into
smaller functions to solve this.

I've been running the Linux Netscape-2.0 (with Java) to test this, and apart
from the long-standing problem with the missing scrollbars, it appears to
still work as before with ZMAGIC libs (and the leaks).. However, I've
been using it with mods for the signal trampoline code for native linux stack
frames on signals and exterminated the blasted sigreturn printf() problem,
so I can't be certain that there is not a dependency on something else.


# a4fc5c1a 19-Jan-1996 John Dyson <dyson@FreeBSD.org>

Fixed vm_map_find for new vm updates.


# 0de2e98f 14-Jan-1996 Søren Schmidt <sos@FreeBSD.org>

Add linux_mknod so that it will do mkfifo if needed...


# d3cc2bd2 14-Dec-1995 Peter Wemm <peter@FreeBSD.org>

Initial attempt at getting Linux QMAGIC shared lib support. I have
successfully run linux netscape 2.0b3 with a QMAGIC ld.so and libc/libm
that I found on some linux machine that I _think_ is running slackware 3.0.

There are still problems.. ld.so claims the libraries are the wrong
format, but it still runs anyway.. :-/ The QMAGIC ld.so also screams
about needing ld.so.cache, and running a linux ldconfig is quite
educational. You soon learn to run "chroot /compat/linux /bin/ldconfig"
where ldconfig is living in /compat/linux/bin. :-]

(Lets just say that it puts loads of symlinks in /usr/lib otherwise :-)


# ef04503d 14-Dec-1995 Peter Wemm <peter@FreeBSD.org>

Clean up some warnings by using the generated structures in <sys/sysproto.h>
for passing to the bsd system calls, rather than inveninting our own
equivalent structures.


# e0067d71 14-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Restored a vm #include.


# ac9a8f2f 09-Dec-1995 Peter Wemm <peter@FreeBSD.org>

Attempt to make the Linux LKM compile again after the recent VM include
de-nesting changes...
(I figured this might be usefulif it actually built, since I've told
everybody to rebuild it or die.. :-)


# d973c055 06-Dec-1995 Bruce Evans <bde@FreeBSD.org>

Include <vm/vm.h> explicitly to avoid breaking when vnode_if.h doesn't
include vm stuff.


# 1f3dad5a 22-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Completed function declarations and added prototypes.

Removed some unnecessary #includes.

Fixed warnings about nested externs.


# c52007c2 05-Nov-1995 David Greenman <dg@FreeBSD.org>

All:
Changed vnodep -> vp for consistency with the rest of the kernel, and
changed iparams -> imgp for brevity.

kern_exec.c:
Explicitly initialized some additional parts of the image_params struct
to avoid bzeroing it. Rewrote the set-id code to reduce the number of
logical tests. The rewrite exposed a mostly benign bug in the algorithm:
traced set-id images would get ktracing disabled even if the set-id didn't
happen for other reasons.


# 00c6cada 04-Oct-1995 Julian Elischer <julian@FreeBSD.org>

Submitted by: Juergen Lock <nox@jelal.hb.north.de>
Obtained from: other people on the net ?

1. stepping over syscalls (gdb ni) sends you to DDB, and returned
to the wrong address afterwards, with or without DDB. patch in
i386/i386/trap.c below.

2. the linux emulator (modload'ed) still causes panics with DIAGNOSTIC,
re-applied a patch posted to one of the lists...


# c21dee17 25-Jun-1995 Søren Schmidt <sos@FreeBSD.org>

First incarnation of our Linux emulator or rather compatibility code.
This first shot only incorporaties so much functionality that DOOM
can run (the X version), signal handling is VERY weak, so is many
other things. But it meets my milestone number one (you guessed it
- running DOOM).

Uses /compat/linux as prefix for loading shared libs, so it won't
conflict with our own libs.

Kernel must be compiled with "options COMPAT_LINUX" for this to work.