Copyright 2000 Massachusetts Institute of Technology

Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that both the above copyright notice and this
permission notice appear in all copies, that both the above
copyright notice and this permission notice appear in all
supporting documentation, and that the name of M.I.T. not be used
in advertising or publicity pertaining to distribution of the
software without specific, written prior permission. M.I.T. makes
no representations about the suitability of this software for any
purpose. It is provided "as is" without express or implied
warranty.

THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS
ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT
SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

.Dd January 30, 2023 .Dt SHM_OPEN 2 .Os .Sh NAME .Nm memfd_create , shm_create_largepage , shm_open , shm_rename, shm_unlink .Nd "shared memory object operations" .Sh LIBRARY .Lb libc .Sh SYNOPSIS n sys/types.h n sys/mman.h n fcntl.h .Ft int .Fn memfd_create "const char *name" "unsigned int flags" .Ft int .Fo shm_create_largepage .Fa "const char *path" .Fa "int flags" .Fa "int psind" .Fa "int alloc_policy" .Fa "mode_t mode" .Fc .Ft int .Fn shm_open "const char *path" "int flags" "mode_t mode" .Ft int .Fn shm_rename "const char *path_from" "const char *path_to" "int flags" .Ft int .Fn shm_unlink "const char *path" .Sh DESCRIPTION The .Fn shm_open function opens (or optionally creates) a POSIX shared memory object named .Fa path . The .Fa flags argument contains a subset of the flags used by .Xr open 2 . An access mode of either .Dv O_RDONLY or .Dv O_RDWR must be included in .Fa flags . The optional flags .Dv O_CREAT , .Dv O_EXCL , and .Dv O_TRUNC may also be specified.

p If .Dv O_CREAT is specified, then a new shared memory object named .Fa path will be created if it does not exist. In this case, the shared memory object is created with mode .Fa mode subject to the process' umask value. If both the .Dv O_CREAT and .Dv O_EXCL flags are specified and a shared memory object named .Fa path already exists, then .Fn shm_open will fail with .Er EEXIST .

p Newly created objects start off with a size of zero. If an existing shared memory object is opened with .Dv O_RDWR and the .Dv O_TRUNC flag is specified, then the shared memory object will be truncated to a size of zero. The size of the object can be adjusted via .Xr ftruncate 2 and queried via .Xr fstat 2 .

p The new descriptor is set to close during .Xr execve 2 system calls; see .Xr close 2 and .Xr fcntl 2 .

p The constant .Dv SHM_ANON may be used for the .Fa path argument to .Fn shm_open . In this case, an anonymous, unnamed shared memory object is created. Since the object has no name, it cannot be removed via a subsequent call to .Fn shm_unlink , or moved with a call to .Fn shm_rename . Instead, the shared memory object will be garbage collected when the last reference to the shared memory object is removed. The shared memory object may be shared with other processes by sharing the file descriptor via .Xr fork 2 or .Xr sendmsg 2 . Attempting to open an anonymous shared memory object with .Dv O_RDONLY will fail with .Er EINVAL . All other flags are ignored.

p The .Fn shm_create_largepage function behaves similarly to .Fn shm_open , except that the .Dv O_CREAT flag is implicitly specified, and the returned .Dq largepage object is always backed by aligned, physically contiguous chunks of memory. This ensures that the object can be mapped using so-called .Dq superpages , which can improve application performance in some workloads by reducing the number of translation lookaside buffer (TLB) entries required to access a mapping of the object, and by reducing the number of page faults performed when accessing a mapping. This happens automatically for all largepage objects.

p An existing largepage object can be opened using the .Fn shm_open function. Largepage shared memory objects behave slightly differently from non-largepage objects: l -bullet -offset indent t Memory for a largepage object is allocated when the object is extended using the .Xr ftruncate 2 system call, whereas memory for regular shared memory objects is allocated lazily and may be paged out to a swap device when not in use. t The size of a mapping of a largepage object must be a multiple of the underlying large page size. Most attributes of such a mapping can only be modified at the granularity of the large page size. For example, when using .Xr munmap 2 to unmap a portion of a largepage object mapping, or when using .Xr mprotect 2 to adjust protections of a mapping of a largepage object, the starting address must be large page size-aligned, and the length of the operation must be a multiple of the large page size. If not, the corresponding system call will fail and set .Va errno to .Er EINVAL . .El

p The .Fa psind argument to .Fn shm_create_largepage specifies the size of large pages used to back the object. This argument is an index into the page sizes array returned by .Xr getpagesizes 3 . In particular, all large pages backing a largepage object must be of the same size. For example, on a system with large page sizes of 2MB and 1GB, a 2GB largepage object will consist of either 1024 2MB pages, or 2 1GB pages, depending on the value specified for the .Fa psind argument. The .Fa alloc_policy parameter specifies what happens when an attempt to use .Xr ftruncate 2 to allocate memory for the object fails. The following values are accepted: l -tag -offset indent -width SHM_ t Dv SHM_LARGEPAGE_ALLOC_DEFAULT If the (non-blocking) memory allocation fails because there is insufficient free contiguous memory, the kernel will attempt to defragment physical memory and try another allocation. The subsequent allocation may or may not succeed. If this subsequent allocation also fails, .Xr ftruncate 2 will fail and set .Va errno to .Er ENOMEM . t Dv SHM_LARGEPAGE_ALLOC_NOWAIT If the memory allocation fails, .Xr ftruncate 2 will fail and set .Va errno to .Er ENOMEM . t Dv SHM_LARGEPAGE_ALLOC_HARD The kernel will attempt defragmentation until the allocation succeeds, or an unblocked signal is delivered to the thread. However, it is possible for physical memory to be fragmented such that the allocation will never succeed. .El

p The .Dv FIOSSHMLPGCNF and .Dv FIOGSHMLPGCNF .Xr ioctl 2 commands can be used with a largepage shared memory object to get and set largepage object parameters. Both commands operate on the following structure: d -literal struct shm_largepage_conf { int psind; int alloc_policy; }; .Ed The .Dv FIOGSHMLPGCNF command populates this structure with the current values of these parameters, while the .Dv FIOSSHMLPGCNF command modifies the largepage object. Currently only the .Va alloc_policy parameter may be modified. Internally, .Fn shm_create_largepage works by creating a regular shared memory object using .Fn shm_open , and then converting it into a largepage object using the .Dv FIOSSHMLPGCNF ioctl command.

p The .Fn shm_rename system call atomically removes a shared memory object named .Fa path_from and relinks it at .Fa path_to . If another object is already linked at .Fa path_to , that object will be unlinked, unless one of the following flags are provided: l -tag -offset indent -width Er t Er SHM_RENAME_EXCHANGE Atomically exchange the shms at .Fa path_from and .Fa path_to . t Er SHM_RENAME_NOREPLACE Return an error if an shm exists at .Fa path_to , rather than unlinking it. .El

p The .Fn shm_unlink system call removes a shared memory object named .Fa path .

p The .Fn memfd_create function creates an anonymous shared memory object, identical to that created by .Fn shm_open when .Dv SHM_ANON is specified. Newly created objects start off with a size of zero. The size of the new object must be adjusted via .Xr ftruncate 2 .

p The .Fa name argument must not be .Dv NULL , but it may be an empty string. The length of the .Fa name argument may not exceed .Dv NAME_MAX minus six characters for the prefix .Dq memfd: , which will be prepended. The .Fa name argument is intended solely for debugging purposes and will never be used by the kernel to identify a memfd. Names are therefore not required to be unique.

p The following .Fa flags may be specified to .Fn memfd_create : l -tag -width MFD_ALLOW_SEALING t Dv MFD_CLOEXEC Set .Dv FD_CLOEXEC on the resulting file descriptor. t Dv MFD_ALLOW_SEALING Allow adding seals to the resulting file descriptor using the .Dv F_ADD_SEALS .Xr fcntl 2 command. t Dv MFD_HUGETLB This flag is currently unsupported. .El .Sh RETURN VALUES If successful, .Fn memfd_create and .Fn shm_open both return a non-negative integer, and .Fn shm_rename and .Fn shm_unlink return zero. All functions return -1 on failure, and set .Va errno to indicate the error. .Sh COMPATIBILITY The .Fn shm_create_largepage and .Fn shm_rename functions are .Fx extensions, as is support for the .Dv SHM_ANON value in .Fn shm_open .

p The .Fa path , .Fa path_from , and .Fa path_to arguments do not necessarily represent a pathname (although they do in most other implementations). Two processes opening the same .Fa path are guaranteed to access the same shared memory object if and only if .Fa path begins with a slash

q Ql / character.

p Only the .Dv O_RDONLY , .Dv O_RDWR , .Dv O_CREAT , .Dv O_EXCL , and .Dv O_TRUNC flags may be used in portable programs.

p POSIX specifications state that the result of using .Xr open 2 , .Xr read 2 , or .Xr write 2 on a shared memory object, or on the descriptor returned by .Fn shm_open , is undefined. However, the .Fx kernel implementation explicitly includes support for .Xr read 2 and .Xr write 2 .

p .Fx also supports zero-copy transmission of data from shared memory objects with .Xr sendfile 2 .

p Neither shared memory objects nor their contents persist across reboots.

p Writes do not extend shared memory objects, so .Xr ftruncate 2 must be called before any data can be written. See .Sx EXAMPLES . .Sh EXAMPLES This example fails without the call to .Xr ftruncate 2 : d -literal -compact uint8_t buffer[getpagesize()]; ssize_t len; int fd; fd = shm_open(SHM_ANON, O_RDWR | O_CREAT, 0600); if (fd < 0) err(EX_OSERR, "%s: shm_open", __func__); if (ftruncate(fd, getpagesize()) < 0) err(EX_IOERR, "%s: ftruncate", __func__); len = pwrite(fd, buffer, getpagesize(), 0); if (len < 0) err(EX_IOERR, "%s: pwrite", __func__); if (len != getpagesize()) errx(EX_IOERR, "%s: pwrite length mismatch", __func__); .Ed .Sh ERRORS .Fn memfd_create fails with these error codes for these conditions: l -tag -width Er t Bq Er EBADF The .Fa name argument was NULL. t Bq Er EINVAL The .Fa name argument was too long.

p An invalid or unsupported flag was included in .Fa flags . t Bq Er EMFILE The process has already reached its limit for open file descriptors. t Bq Er ENFILE The system file table is full. t Bq Er ENOSYS In .Fa memfd_create , .Dv MFD_HUGETLB was specified in .Fa flags , and this system does not support forced hugetlb mappings. .El

p .Fn shm_open fails with these error codes for these conditions: l -tag -width Er t Bq Er EINVAL A flag other than .Dv O_RDONLY , .Dv O_RDWR , .Dv O_CREAT , .Dv O_EXCL , or .Dv O_TRUNC was included in .Fa flags . t Bq Er EMFILE The process has already reached its limit for open file descriptors. t Bq Er ENFILE The system file table is full. t Bq Er EINVAL .Dv O_RDONLY was specified while creating an anonymous shared memory object via .Dv SHM_ANON . t Bq Er EFAULT The .Fa path argument points outside the process' allocated address space. t Bq Er ENAMETOOLONG The entire pathname exceeds 1023 characters. t Bq Er EINVAL The .Fa path does not begin with a slash

q Ql / character. t Bq Er ENOENT .Dv O_CREAT is not specified and the named shared memory object does not exist. t Bq Er EEXIST .Dv O_CREAT and .Dv O_EXCL are specified and the named shared memory object does exist. t Bq Er EACCES The required permissions (for reading or reading and writing) are denied. t Bq Er ECAPMODE The process is running in capability mode (see .Xr capsicum 4 ) and attempted to create a named shared memory object. .El

p .Fn shm_create_largepage can fail for the reasons listed above. It also fails with these error codes for the following conditions: l -tag -width Er t Bq Er ENOTTY The kernel does not support large pages on the current platform. .El

p The following errors are defined for .Fn shm_rename : l -tag -width Er t Bq Er EFAULT The .Fa path_from or .Fa path_to argument points outside the process' allocated address space. t Bq Er ENAMETOOLONG The entire pathname exceeds 1023 characters. t Bq Er ENOENT The shared memory object at .Fa path_from does not exist. t Bq Er EACCES The required permissions are denied. t Bq Er EEXIST An shm exists at .Fa path_to , and the .Dv SHM_RENAME_NOREPLACE flag was provided. .El

p .Fn shm_unlink fails with these error codes for these conditions: l -tag -width Er t Bq Er EFAULT The .Fa path argument points outside the process' allocated address space. t Bq Er ENAMETOOLONG The entire pathname exceeds 1023 characters. t Bq Er ENOENT The named shared memory object does not exist. t Bq Er EACCES The required permissions are denied. .Fn shm_unlink requires write permission to the shared memory object. .El .Sh SEE ALSO .Xr posixshmcontrol 1 , .Xr close 2 , .Xr fstat 2 , .Xr ftruncate 2 , .Xr ioctl 2 , .Xr mmap 2 , .Xr munmap 2 , .Xr sendfile 2 .Sh STANDARDS The .Fn memfd_create function is expected to be compatible with the Linux system call of the same name.

p The .Fn shm_open and .Fn shm_unlink functions are believed to conform to .St -p1003.1b-93 . .Sh HISTORY The .Fn memfd_create function appeared in .Fx 13.0 .

p The .Fn shm_open and .Fn shm_unlink functions first appeared in .Fx 4.3 . The functions were reimplemented as system calls using shared memory objects directly rather than files in .Fx 8.0 .

p .Fn shm_rename first appeared in .Fx 13.0 as a .Fx extension. .Sh AUTHORS .An Garrett A. Wollman Aq Mt wollman@FreeBSD.org (C library support and this manual page)

p .An Matthew Dillon Aq Mt dillon@FreeBSD.org

q Dv MAP_NOSYNC

p .An Matthew Bryan Aq Mt matthew.bryan@isilon.com

q Dv shm_rename implementation