332928 |
24-Apr-2018 |
hselasky |
MFC r329372 and r329464: Implement enable_irq() and disable_irq() in the LinuxKPI and add checks for valid IRQ tag before setting up or tearing down an interrupt handler in the LinuxKPI. This is needed when the interrupt handler is disabled before freeing the interrupt.
Submitted by: Johannes Lundberg <johalun0@gmail.com> Sponsored by: Mellanox Technologies |
332922 |
24-Apr-2018 |
hselasky |
MFC r331355: Clear old MSIX IRQ numbers in the LinuxKPI.
When disabling the MSIX IRQ vectors for a PCI device through the LinuxKPI, make sure any old MSIX IRQ numbers are no longer visible to the linux_pci_find_irq_dev() function; otherwise IRQs can be requested from the wrong PCI device.
Sponsored by: Mellanox Technologies |
328656 |
01-Feb-2018 |
hselasky |
MFC r328623: Properly implement the cond_resched() function macro in the LinuxKPI.
Sponsored by: Mellanox Technologies |
326054 |
21-Nov-2017 |
hselasky |
MFC r299674 and r299931: Handle case of class being set, but not parent when calling device_register() in the LinuxKPI.
Requested by: Chelsio Sponsored by: Mellanox Technologies |
325937 |
17-Nov-2017 |
hselasky |
MFC r325533: Make the dma_alloc_coherent() function in the LinuxKPI NULL safe with regard to the "dev" argument.
Submitted by: Krishnamraju Eraparaju @ Chelsio Sponsored by: Chelsio Communications |
325613 |
09-Nov-2017 |
hselasky |
MFC r325278: Unconditionally include "opt_inet6.h" in the LinuxKPI. This makes sure the INET6 macro gets properly defined, also for kernel module builds.
Sponsored by: Mellanox Technologies |
325611 |
09-Nov-2017 |
hselasky |
MFC r324792: The remote DMA TCP portspace selector, RDMA_PS_TCP, is used for both iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP cannot be used to indicate iWarp protocol use. Backport the proper IB device capabilities from Linux upstream to distinguish between iWarp and RoCE. Only allocate the additional socket required for iWarp for RDMA IDs when at least one iWarp device is present. This resolves interoperability issues between iWarp and RoCE in ibcore.
Reviewed by: np @ Differential Revision: https://reviews.freebsd.org/D12563 Sponsored by: Mellanox Technologies |
324685 |
17-Oct-2017 |
hselasky |
MFC r289568, r300676, r300677, r300719, r300720 and r300721: Implement LinuxKPI module parameters as SYSCTLs.
The bool module parameter is no longer supported, because there is no equivalent in FreeBSD 10-stable. These are converted into "int" type.
There are two macros available which control the behaviour of the LinuxKPI module parameters:
- LINUXKPI_PARAM_PARENT allows the consumer to set the SYSCTL parent where the module parameters will be created.
- LINUXKPI_PARAM_PREFIX defines a parameter name prefix, which is added to all created module parameters.
The LinuxKPI module parameters also have a permissions value. If any write bits are set, the module parameter may be modified at runtime. Reflect this when creating the static SYSCTL nodes.
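The permission rule described above can be sketched in plain C. This is only an illustration: the enum and the function name are hypothetical and do not correspond to the real CTLFLAG_* flags or to the LinuxKPI source.

```c
/* Hypothetical stand-ins for read-only and read-write SYSCTL access. */
enum param_access { PARAM_RDONLY, PARAM_RDWR };

/*
 * Any write bit (owner, group or other) in the Linux-style permission
 * value makes the parameter writable at runtime.
 */
static enum param_access
param_access_from_perm(unsigned int perm)
{
	return ((perm & 0222) != 0 ? PARAM_RDWR : PARAM_RDONLY);
}
```

With this sketch a parameter declared with mode 0444 maps to read-only, while 0644 maps to read-write.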
The module_param_call() function is no longer supported.
Sponsored by: Mellanox Technologies |
324527 |
11-Oct-2017 |
hselasky |
MFC r315404: Add basic support for VIMAGE to the LinuxKPI and ibcore.
Support is implemented by mapping Linux's "struct net" into FreeBSD's "struct vnet". Currently only vnet0 is supported by ibcore.
Sponsored by: Mellanox Technologies |
324525 |
11-Oct-2017 |
hselasky |
MFC r315405, r323351 and r323364: Add helper function similar to ip_dev_find() to the LinuxKPI to look up a network device by its IPv6 address in the given VNET.
Sponsored by: Mellanox Technologies |
322500 |
14-Aug-2017 |
hselasky |
MFC r314878: Add support for constant pointer constructs to READ_ONCE() in the LinuxKPI. When the type of the argument is constant the temporary variable cannot be assigned after the barrier. Instead assign the temporary variable by initialization.
Approved by: re (kib) Sponsored by: Mellanox Technologies |
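The point made in r314878 can be illustrated with a small sketch. It assumes GNU C extensions (`__typeof__`, statement expressions, inline assembly), and the macro names are hypothetical; it is not the LinuxKPI definition. Because the temporary is initialized in its declaration, it may carry a const qualifier.

```c
/* Compiler barrier; GNU C inline assembly is an assumption of this sketch. */
#define SKETCH_BARRIER()	__asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical READ_ONCE-like macro. The temporary gets its value by
 * initialization, never by a later assignment, so it still compiles when
 * the type of "x" is const-qualified.
 */
#define SKETCH_READ_ONCE(x) ({					\
	__typeof__(x) tmp_ = *(volatile __typeof__(x) *)&(x);	\
	SKETCH_BARRIER();					\
	tmp_;							\
})
```

A separate `tmp_ = ...;` statement after the declaration would be rejected by the compiler for an argument such as `int *const p`.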
322165 |
07-Aug-2017 |
hselasky |
MFC r321782: Remove some dead statistics related code and a structure field from the mlx4en driver which is used by its Linux counterpart, but not under FreeBSD.
Sponsored by: Mellanox Technologies |
322162 |
07-Aug-2017 |
hselasky |
MFC r321772: Fix broken usage of the mlx4_read_clock() function:
- return value has too small width
- cycle_t is unsigned and cannot be less than zero
Sponsored by: Mellanox Technologies |
321020 |
15-Jul-2017 |
dchagin |
MFC r281436 (by mjg@):
fd: remove filedesc argument from fdclose
Just accept a thread instead. This makes it consistent with fdalloc.
No functional changes. |
318536 |
19-May-2017 |
hselasky |
MFC r313555: Flexible and asymmetric allocation of EQs and MSI-X vectors for PF/VFs.
Previously, the mlx4 driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted.
Obtained from: Linux (dual BSD/GPLv2 licensed) Submitted by: Dexuan Cui @ microsoft . com Differential Revision: https://reviews.freebsd.org/D8867 Sponsored by: Mellanox Technologies |
318533 |
19-May-2017 |
hselasky |
MFC r313556: Change mlx4 QP allocation scheme.
When using Blue-Flame, BF, the QPN overrides the VLAN, CV, and SV fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
The current ethernet driver code reserves a TX QP range with 256b alignment.
This is wrong because if there are more than 64 TX QPs in use, QPNs >= base + 64 will have bit 6 or 7 set.
This problem is not specific to the Ethernet driver: any entity that tries to reserve more than 64 BF-enabled QPs should fail. Also, using ranges is not necessary here and is wasteful.
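The bit constraint on Blue-Flame QPNs can be checked with a few lines of C. This is a sketch only; the mask and function names are hypothetical and are not taken from the mlx4 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 6 and 7 of the QPN; hypothetical name for this sketch. */
#define SKETCH_BF_QPN_MASK	0xc0u

/* A QPN is usable with Blue-Flame only if bits 6 and 7 are both clear. */
static bool
qpn_bf_eligible(uint32_t qpn)
{
	return ((qpn & SKETCH_BF_QPN_MASK) == 0);
}
```

For a 256-aligned base such as 0x100, only the first 64 QPNs of the range pass this check, which is why a contiguous range cannot hold more than 64 BF-enabled QPs.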
The new mechanism introduced here will support reservation for "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs (when hypervisors support WC in VMs). The flow we use is:
1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation, and request "BF enabled QPs" if BF is supported for the function
2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0] - number of QPs
b. param1[31:24] - flags controlling QPs reservation
Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits 6 and 7 unset in order to be used in Ethernet.
Bits 24-30 of the flags are currently reserved.
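The param1 layout described above (count in bits 23:0, flags in bits 31:24, bit 31 for Eth Blue-Flame QPs) could be packed and unpacked roughly as follows. The macro and function names are hypothetical; only the bit positions come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_QP_COUNT_MASK	0x00ffffffu	/* param1[23:0]: number of QPs */
#define SKETCH_QP_FLAG_ETH_BF	(1u << 31)	/* bit 31: Eth BF-capable QPs */

/* Combine a QP count and reservation flags into one 32-bit param1 value. */
static uint32_t
sketch_pack_param1(uint32_t count, uint32_t flags)
{
	return ((count & SKETCH_QP_COUNT_MASK) |
	    (flags & ~SKETCH_QP_COUNT_MASK));
}

/* Extract the number of QPs from param1. */
static uint32_t
sketch_param1_count(uint32_t param1)
{
	return (param1 & SKETCH_QP_COUNT_MASK);
}

/* Report whether the caller asked for Blue-Flame capable Ethernet QPs. */
static bool
sketch_param1_wants_bf(uint32_t param1)
{
	return ((param1 & SKETCH_QP_FLAG_ETH_BF) != 0);
}
```

Requesting a single BF-capable QP under this layout yields the value 0x80000001.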
When a function tries to allocate a QP, it states the required attributes for this QP. Those attributes are considered "best-effort". If an attribute, such as Ethernet BF enabled QP, is a must-have attribute, the function has to check that the attribute is supported before trying to do the allocation.
In a lower layer of the code, mlx4_qp_reserve_range masks out the bits which are unsupported. If SRIOV is used, the PF validates those attributes and masks out unsupported attributes as well. In order to notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's mailbox is filled by the PF, which notifies which QP allocation attributes it supports.
Obtained from: Linux (dual BSD/GPLv2 licensed) Submitted by: Dexuan Cui @ microsoft . com Differential Revision: https://reviews.freebsd.org/D8868 Sponsored by: Mellanox Technologies |
314667 |
04-Mar-2017 |
avg |
MFC r283291: don't use CALLOUT_MPSAFE with callout_init()
The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs. |
309378 |
01-Dec-2016 |
jhb |
MFC 273806,289103,289201,289338,289578,293185,294474,294610,297124,297368, 297406,300875,300888,301158,301896,301897,304838:
Pull in most of the Chelsio and iWARP related changes from stable/11 into stable/10. A few changes from 278886 (OFED 1.2) were also included though the full merge is not:
- The find_gid_port() function in infiniband/core/cma.c.
- Addition of the 'ord' and 'ird' fields to 'struct iw_cm_event'.
273806: Userspace library for Chelsio's Terminator 5 based iWARP RNICs (pretty much every T5 card that does _not_ have "-SO" in its name is RDMA capable).
This plugs into the OFED verbs framework and allows userspace RDMA applications to work over T5 RNICs. Tested with rping.
289103: iw_cxgbe: fix for page fault in cm_close_handler().
This is roughly the iw_cxgbe equivalent of https://github.com/torvalds/linux/commit/be13b2dff8c4e41846477b22cc5c164ea5a6ac2e
-----------------
RDMA/cxgb4: Connect_request_upcall fixes
When processing an MPA Start Request, if the listening endpoint is DEAD, then abort the connection.
If the IWCM returns an error, then we must abort the connection and release resources. Also abort_connection() should not post a CLOSE event, so clean that up too.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-----------------
289201: iw_cxgbe: MPA v2 is always available.
289338: iw_cxgbe: use correct RFC number.
289578: Merge LinuxKPI changes from DragonflyBSD:
- Define the kref structure identical to the one found in Linux.
- Update clients referring inside the kref structure.
- Implement kref_sub() for FreeBSD.
293185: iw_cxgbe: Shut down the socket but do not close the fd in case of error. The fd is closed later in this case. This fixes a "SS_NOFDREF on enter" panic.
294474: iw_cxgbe: fix a couple of problems in the RDMA_TERMINATE handler.
a) Look for the CPL in the payload buffer instead of the descriptor.
b) Retrieve the socket associated with the tid with the inpcb lock held.
294610: Fix for iWARP servers that listen on INADDR_ANY.
The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to represent an iWARP endpoint when the connection is over TCP. For servers the current approach is to invoke create_listen callback for each iWARP RNIC registered with the CM. This doesn't work too well for INADDR_ANY because a listen on any TCP socket already notifies all hardware TOEs/RNICs of the new listener. This patch fixes the server side of things for FreeBSD. We've tried to keep all these modifications in the iWARP/TCP specific parts of the OFED infrastructure as much as possible.
297124: iw_cxgbe/libcxgb4: Pull in many applicable fixes from the upstream Linux iWARP driver and userspace library to the FreeBSD iw_cxgbe and libcxgb4.
This commit includes internal changesets 6785 8111 8149 8478 8617 8648 8650 9110 9143 9440 9511 9894 10164 10261 10450 10980 10981 10982 11730 11792 12218 12220 12222 12223 12225 12226 12227 12228 12229 12654.
297368: cxgbe/iw_cxgbe: Fix for stray "start_ep_timer timer already started!" messages.
297406: Remove unnecessary dequeue_mutex (added in r294610) from the iWARP connection manager. Examining so_comp without synchronization with iw_so_event_handler is a harmless race.
300875: iw_cxgbe: Use vmem(9) to manage PBL and RQT allocations.
300888: iw_cxgbe: Plug a lock leak in process_mpa_request().
If the parent is DEAD or connect_request_upcall() fails, the parent mutex is left locked. This leads to a hang when process_mpa_request() is called again for another child of the listening endpoint.
301158: iw_cxgbe: Fix panic that occurs when c4iw_ev_handler tries to acquire comp_handler_lock but c4iw_destroy_cq has already freed the CQ memory (which is where the lock resides).
301896: Fix bug in iwcm that caused a panic in iw_cm_wq when krping is run repeatedly in a tight loop.
301897: iw_cxgbe: Make sure that send_abort results in a TCP RST and not a FIN. Release the hold on ep->com immediately after sending the RST. This fixes a bug that sometimes leaves userspace iWARP tools hung when the user presses ^C.
304838: Do not free an uninitialized pointer on soaccept failure in the iWARP connection manager.
Submitted by: Krishnamraju Eraparaju @ Chelsio (original patch) Sponsored by: Chelsio Communications |
307011 |
11-Oct-2016 |
sephe |
MFC 306480
linuxkpi: Fix PCI BAR lazy allocation support.
FreeBSD supports lazy allocation of PCI BAR, that is, when a device driver's attach method is invoked, even if the device's PCI BAR address wasn't initialized, the invocation of bus_alloc_resource_any() (the call chain: pci_alloc_resource() -> pci_alloc_multi_resource() -> pci_reserve_map() -> pci_write_bar()) would allocate a proper address for the PCI BAR and write this 'lazy allocated' address into the PCI BAR.
This model works fine for native FreeBSD device drivers, but _not_ for device drivers shared with Linux (e.g. dev/mlx5/mlx5_core/mlx5_main.c and ofed/drivers/net/mlx4/main.c). Both of them use pci_request_regions(), which doesn't work properly with the PCI BAR lazy allocation, because pci_resource_type() -> _pci_get_rle() always returns NULL, so pci_request_regions() doesn't have the opportunity to invoke bus_alloc_resource_any(). We now use pci_find_bar() in pci_resource_type(), which is able to locate all available PCI BARs even if some of them will be lazy allocated.
Submitted by: Dexuan Cui <decui microsoft com> Reviewed by: hps Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D8071 |
306950 |
10-Oct-2016 |
hselasky |
MFC r306451: The IORESOURCE_XXX defines should resemble a bitmask while SYS_RES_XXX are not bitmasks. Fix return value of pci_resource_flags() to reflect this change.
Sponsored by: Mellanox Technologies |
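The distinction drawn in r306451 can be sketched as a translation from sequential resource type identifiers to bitmask flags. All names and numeric values below are illustrative assumptions chosen for this sketch, not the definitions from the FreeBSD or Linux headers.

```c
/* Sequential identifiers in the style of SYS_RES_XXX (illustrative values). */
enum sk_res_type { SK_RES_IRQ = 1, SK_RES_MEMORY = 3, SK_RES_IOPORT = 4 };

/* Bitmask flags in the style of IORESOURCE_XXX (illustrative values). */
#define SK_IORESOURCE_IO	0x100UL
#define SK_IORESOURCE_MEM	0x200UL
#define SK_IORESOURCE_IRQ	0x400UL

/* Translate a sequential resource type into a flag that can be bit-tested. */
static unsigned long
sk_resource_flags(enum sk_res_type type)
{
	switch (type) {
	case SK_RES_IRQ:
		return (SK_IORESOURCE_IRQ);
	case SK_RES_MEMORY:
		return (SK_IORESOURCE_MEM);
	case SK_RES_IOPORT:
		return (SK_IORESOURCE_IO);
	}
	return (0);
}
```

Testing the raw identifiers with `&` gives wrong answers: with these values `SK_RES_MEMORY & SK_RES_IRQ` is non-zero, whereas the translated flags never overlap.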
302926 |
16-Jul-2016 |
markj |
MFC r301877: Add a missing error check for a malloc() call in idr_get(). |
302271 |
29-Jun-2016 |
hselasky |
MFC r301544: Fallback to arc4rand() in the LinuxKPI when read_random() returns zero. This can happen for virtual machines.
Sponsored by: Mellanox Technologies |
301264 |
03-Jun-2016 |
hselasky |
MFC r294832: Implement ether_addr_equal(), ether_addr_equal_64bits() and random_ether_addr() for the LinuxKPI.
Sponsored by: Mellanox Technologies |
297658 |
07-Apr-2016 |
hselasky |
MFC r294520: LinuxKPI atomic fixes:
- Fix implementation of atomic_add_unless(). The atomic_cmpset_int() function returns a boolean and not the previous value of the atomic variable.
- The atomic counters should be signed according to Linux.
- Some minor cosmetics and styling while at it.
Reviewed by: alfred @ Sponsored by: Mellanox Technologies |
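The atomic_add_unless() semantics fixed in r294520 can be sketched with C11 atomics in user space. This is not the LinuxKPI code: `atomic_compare_exchange_weak()` stands in for atomic_cmpset_int(), and the function name is hypothetical. The key point is that the compare-and-set primitive reports success as a boolean, so the loop must not treat its return value as the old counter value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add "a" to *v unless *v equals "u".
 * Returns true if the addition was performed, false otherwise.
 */
static bool
sketch_atomic_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/*
		 * The return value is only success or failure. On failure
		 * "old" is refreshed with the current value and the
		 * comparison against "u" is repeated.
		 */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return (true);
	}
	return (false);
}
```

A typical use is taking a reference only while the counter is non-zero: `sketch_atomic_add_unless(&refs, 1, 0)`.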
297656 |
07-Apr-2016 |
hselasky |
MFC r297444: Fix bugs in currently unused bit searching loop.
Sponsored by: Mellanox Technologies |
294636 |
23-Jan-2016 |
jhb |
MFC 294366: Initialize vm_page_prot to VM_MEMATTR_DEFAULT instead of 0.
If a driver's Linux mmap callback passed vm_page_prot through unchanged, then linux_dev_mmap_single() would try to apply whatever VM_MEMATTR_xxx value 0 is to the mapping. On x86, VM_MEMATTR_DEFAULT is the PAT value for write-back (WB) which is 6, while 0 maps to the PAT value for uncacheable (UC). Thus, any mmap request that did not explicitly set page_prot tried to map memory as UC, triggering the warning in sg_pager_getpages().
Sponsored by: Chelsio Communications |
293736 |
12-Jan-2016 |
hselasky |
MFC r292989: Handle file descriptors that are closed before being initialized. An early fdclose() call can cause fget_unlocked() to fail. |
293151 |
04-Jan-2016 |
hselasky |
MFC r289563,r291481,r292537,r292538,r292542,r292543,r292544 and r292834:
Update the LinuxKPI:
- Add more functions and types.
- Implement ACCESS_ONCE(), WRITE_ONCE() and READ_ONCE().
- Implement sleepable RCU mechanism using shared exclusive locks.
- Minor workqueue cleanup:
  - Make some functions global instead of inline to ease debugging.
  - Fix some minor style issues.
  - In the zero delay case in queue_delayed_work() use the return value from taskqueue_enqueue() instead of reading "ta_pending" unlocked and also ensure the callout is stopped before proceeding.
- Implement drain_workqueue() function.
- Reduce memory consumption when allocating kobject strings in the LinuxKPI. Compute string length before allocating memory instead of using fixed size allocations. Make kobject_set_name_vargs() global instead of inline to save some bytes when compiling.
Sponsored by: Mellanox Technologies |
292907 |
30-Dec-2015 |
ngie |
MFC r270212,r270332:
This helps reduce the diff in pci(4) between head and stable/10 to help pave the way for bringing in IOV/nv(9) more cleanly
Differential Revision: https://reviews.freebsd.org/D4728 Relnotes: yes Reviewed by: hselasky (ofed piece), royger (overall change) Sponsored by: EMC / Isilon Storage Division
r270212 (by royger):
pci: make MSI(-X) enable and disable methods of the PCI bus
Make the functions pci_disable_msi, pci_enable_msi and pci_enable_msix methods of the newbus PCI bus. This code should not include any functional change.
Sponsored by: Citrix Systems R&D Reviewed by: imp, jhb Differential Revision: https://reviews.freebsd.org/D354
dev/pci/pci.c: - Convert the mentioned functions to newbus methods. - Fix the callers of the converted functions.
sys/dev/pci/pci_private.h: dev/pci/pci_if.m: - Declare the new methods.
dev/pci/pcivar.h: - Add helpers to call the newbus methods.
ofed/include/linux/pci.h: - Add define to prevent the ofed version of pci_enable_msix from clashing with the FreeBSD native version.
r270332 (by royger):
pci: add a new pci_child_added newbus method.
This is needed so when running under Xen the calls to pci_child_added can be intercepted and a custom Xen method can be used to register those devices with Xen. This should not include any functional change, since the Xen implementation will be added in a following patch and the native implementation is a noop.
Sponsored by: Citrix Systems R&D Reviewed by: jhb
dev/pci/pci.c: dev/pci/pci_if.m: dev/pci/pci_private.h: dev/pci/pcivar.h: - Add the pci_child_added newbus method. |
292192 |
14-Dec-2015 |
hselasky |
MFC r290003: Add support for binding IRQs to CPUs in the LinuxKPI. The new function added is for BSD only and does not exist in Linux.
Sponsored by: Mellanox Technologies |
292107 |
11-Dec-2015 |
hselasky |
MFC r290710, r291694, r291699 and r291793:
- Fix print formatting compile warnings for Sparc64 and PowerPC platforms.
- Updated the mlx4 and mlxen drivers to the latest version, v2.1.6:
  - Added support for dumping the SFP EEPROM content to dmesg.
  - Fixed handling of network interface capability IOCTLs.
  - Fixed race when loading and unloading the mlxen driver by applying appropriate locking.
  - Removed two unused C-files.
- Convert the mlxen driver to use the BUSDMA(9) APIs instead of vtophys() when loading mbufs for transmission and reception. While at it all pointer arithmetic and cast qualifier issues were fixed, mostly related to transmission and reception.
- Fix i386 build WITH_OFED=YES. Remove some redundant KASSERTs.
Sponsored by: Mellanox Technologies Differential Revision: https://reviews.freebsd.org/D4283 Differential Revision: https://reviews.freebsd.org/D4284 |
292105 |
11-Dec-2015 |
hselasky |
MFC r291693: Add some structures and defines which will be used when decoding small form factor, SFF, standards compliant ethernet EEPROMs.
Obtained from: Linux Sponsored by: Mellanox Technologies |
292103 |
11-Dec-2015 |
hselasky |
MFC r291690: Remove incorrect defines. The proper version of these macros is defined in linux/etherdevice.h.
Sponsored by: Mellanox Technologies |
287637 |
11-Sep-2015 |
jhb |
MFC 287440: Currently the Linux character device mmap handling only supports mmap operations that map a single page that has an associated vm_page_t. This does not permit mapping larger regions (such as a PCI memory BAR) and it does not permit mapping addresses beyond the top of RAM (such as a 64-bit BAR located above the top of RAM).
Instead of using a single OBJT_DEVICE object and passing the physaddr via the offset as a hack, create a new sglist and OBJT_SG object for each mmap request. The requested memory attribute is applied to the object thus affecting all pages mapped by the request.
Sponsored by: Chelsio |
287229 |
28-Aug-2015 |
markj |
MFC r286418: ipv4_is_zeronet() and ipv4_is_loopback() expect an address in network order, but IN_ZERONET and IN_LOOPBACK expect it in host order. |
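The byte-order mismatch fixed in r286418 can be sketched as follows. The host-order checks are written out by hand here rather than using the system IN_LOOPBACK and IN_ZERONET macros, and the function names are hypothetical; the essential step is the ntohl() conversion before the host-order test.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order test: the top octet of a loopback address is 127. */
static bool
sketch_host_is_loopback(uint32_t haddr)
{
	return ((haddr & 0xff000000u) == 0x7f000000u);
}

/* Linux-style helper: takes a network-order address, so convert first. */
static bool
sketch_ipv4_is_loopback(uint32_t naddr)
{
	return (sketch_host_is_loopback(ntohl(naddr)));
}

/* Linux-style helper: 0.0.0.0/8 check on a network-order address. */
static bool
sketch_ipv4_is_zeronet(uint32_t naddr)
{
	return ((ntohl(naddr) & 0xff000000u) == 0);
}
```

Without the conversion, a little-endian host would examine the last octet of the address instead of the first.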
286841 |
17-Aug-2015 |
glebius |
Merge r283612: Add SIOCGI2C ioctl support to the driver. Would work only on ConnectX-3 with fresh firmware. The low level code is based on code provided by Mellanox.
Thanks to Mellanox and their distributor Must (http://mustcompany.ru) for providing hardware.
In collaboration with: Andre Melkoumian <andre mellanox.com> Reviewed by: hselasky Sponsored by: Netflix Sponsored by: Nginx, Inc. |
285410 |
11-Jul-2015 |
hselasky |
MFC r285088: Fix broken implementation of "kvasprintf()" function by adding missing kmalloc() call. Make function global instead of static inline to fix compiler warnings about passing variable argument lists to inline functions.
Sponsored by: Mellanox Technologies Approved by: re, gjb |
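The shape of a working kvasprintf(), as described in r285088, can be sketched in user space. Here malloc() stands in for kmalloc() and the function names are hypothetical. The length is measured first, then exactly that much memory is allocated before formatting. The variadic wrapper is a separate, non-inline function.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer; the caller frees the result. */
static char *
sketch_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	char *p;
	int len;

	/* First pass on a copy of the argument list: measure only. */
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);
	va_end(aq);
	if (len < 0)
		return (NULL);

	/* The allocation step that must not be forgotten. */
	p = malloc((size_t)len + 1);
	if (p != NULL)
		vsnprintf(p, (size_t)len + 1, fmt, ap);
	return (p);
}

/* Variadic front end that forwards to the va_list version. */
static char *
sketch_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *p;

	va_start(ap, fmt);
	p = sketch_vasprintf(fmt, ap);
	va_end(ap);
	return (p);
}
```

Calling `sketch_asprintf("%s-%d", "mlx", 4)` returns a heap string holding "mlx-4", which the caller releases with free().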
284530 |
17-Jun-2015 |
np |
MFC r277229:
Use parentheses instead of close proximity to ensure layer + 1 is evaluated before the rest of the expression. |
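The precedence issue behind r277229 can be shown with a generic example. The constant and function names are hypothetical and the expressions are not the ones from the changed file; the sketch only demonstrates that spacing does not group operands, while parentheses do.

```c
#define SKETCH_BITS	5	/* hypothetical number of bits per layer */

/* Spacing suggests (layer + 1) * bits, but this is layer + (1 * bits). */
static int
sketch_shift_by_spacing(int layer)
{
	return (layer+1 * SKETCH_BITS);
}

/* Parentheses force layer + 1 to be evaluated first. */
static int
sketch_shift_parenthesized(int layer)
{
	return ((layer + 1) * SKETCH_BITS);
}
```

For layer 2 the first function returns 7 and the second returns 15.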
283675 |
29-May-2015 |
markj |
MFC r282331: Don't drop the idr lock before verifying that the newly-inserted element is present in the tree.
MFC r282741: find_next_bit() and find_next_zero_bit(): if the caller-specified offset lies within the last block of the bit set and no bits are set beyond the offset, terminate the search immediately instead of continuing as though there are further blocks in the set and subsequently returning an incorrect result.
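The termination rule described for r282741 can be sketched as a word-by-word search. This is an independent illustration with hypothetical names, not the LinuxKPI implementation; it returns "size" when no set bit is found at or after the offset.

```c
#include <limits.h>

#define SK_BITS_PER_LONG	((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Return the index of the next set bit at or after "offset", else "size". */
static unsigned long
sketch_find_next_bit(const unsigned long *addr, unsigned long size,
    unsigned long offset)
{
	unsigned long idx, word, bit;

	if (offset >= size)
		return (size);

	idx = offset / SK_BITS_PER_LONG;
	/* Ignore bits below the offset in the first word examined. */
	word = addr[idx] & (~0UL << (offset % SK_BITS_PER_LONG));

	for (;;) {
		if (word != 0) {
			bit = idx * SK_BITS_PER_LONG;
			while ((word & 1UL) == 0) {
				word >>= 1;
				bit++;
			}
			/* A hit in the unused tail of the last word does not count. */
			return (bit < size ? bit : size);
		}
		idx++;
		/* Last block exhausted: stop instead of reading further words. */
		if (idx * SK_BITS_PER_LONG >= size)
			return (size);
		word = addr[idx];
	}
}
```

When the offset lies in the last word and nothing is set beyond it, the loop exits at the first bounds check and returns "size".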
MFC r282743: Ensure that msecs_to_jiffies(0) == 0. |
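One way to satisfy the msecs_to_jiffies(0) == 0 requirement from r282743 is a round-up conversion in which zero naturally stays zero. The sketch below is an assumption about how such a conversion can look, not the LinuxKPI code; the tick rate is passed in as the hypothetical parameter "hz" and overflow for very large inputs is not handled.

```c
/*
 * Convert milliseconds to ticks, rounding up so that a short non-zero
 * delay lasts at least one tick while a zero delay stays zero.
 * Note: msec * hz may overflow for very large inputs in this sketch.
 */
static unsigned long
sketch_msecs_to_jiffies(unsigned long msec, unsigned long hz)
{
	return ((msec * hz + 999) / 1000);
}
```

At 100 ticks per second, 1 ms and 10 ms both map to one tick, 11 ms maps to two, and 0 ms maps to zero.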
282513 |
05-May-2015 |
hselasky |
MFC r277396, r278681, r278865, r278924, r279205, r280208, r280210, r280764 and r280768:
Update the Linux compatibility layer:
- Add more functions.
- Add some missing includes which are needed when the header files are not included in a particular order.
- The kasprintf() function cannot be inlined due to using a variable number of arguments. Move it to a C-file.
- Fix problems with 32-bit ticks wraparound and unsigned long conversion. Jiffies or ticks in FreeBSD have integer type and are not long.
- Add missing "order_base_2()" macro.
- Fix BUILD_BUG_ON() macro.
- Declare a missing symbol which is needed when compiling without -O2.
- Clean up header file inclusions in the linux/completion.h, linux/in.h and linux/fs.h header files.
Sponsored by: Mellanox Technologies |
280540 |
25-Mar-2015 |
hselasky |
MFC r280211: Add missing void pointer argument to SYSINIT() functions.
Sponsored by: Mellanox Technologies |
279737 |
07-Mar-2015 |
hselasky |
MFC r279587: Define PTR_ALIGN() macro which will be needed by upcoming Mellanox driver releases.
Sponsored by: Mellanox Technologies |
279732 |
07-Mar-2015 |
hselasky |
MFC r278866: Define standard formatting strings to print GIDs in a separate header file.
Sponsored by: Mellanox Technologies |
279731 |
07-Mar-2015 |
hselasky |
MFC r279584: Updates for the Mellanox ethernet driver
> List of fixes:
  * use correct format for GID printouts
  * double array indexing
  * spelling in printouts
  * void pointer arithmetic
  * allow more receive rings
  * correct maximum number of transmit rings
  * use "const" instead of "static" for constants
  * check for invalid VLAN tags
  * check for lack of IRQ resources
> Added more hardware specific defines
> Added more verbose printouts of firmware status codes
Sponsored by: Mellanox Technologies |
277139 |
13-Jan-2015 |
hselasky |
MFC r276749: Fixes and updates for the Linux compatibility layer:
- Remove unsupported "bus" field from "struct pci_dev".
- Fix logic inside "pci_enable_msix()" when the number of allocated interrupts is less than the number of available interrupts.
- Update header files included from "list.h".
- Ensure that "idr_destroy()" removes all entries before destroying the IDR root node(s).
- Set the "device->release" function so that we don't leak memory at device destruction.
- Use FreeBSD's "log()" function for certain debug printouts.
- Put parentheses around arguments inside the min, max, min_t and max_t macros.
- Make sure we don't leak file descriptors by dropping the extra file reference counts done by the FreeBSD kernel when calling falloc() and fget_unlocked().
MFC after: 1 week Sponsored by: Mellanox Technologies |
277137 |
13-Jan-2015 |
hselasky |
MFC r276879: Don't mask the IP-address when doing multicast IP over infiniband.
PR: 196631 Sponsored by: Mellanox Technologies |
276744 |
06-Jan-2015 |
rodrigc |
Merge r275599: Use CURVNET macros inside inet_get_local_port_range() function. Without this fix, a kernel with VIMAGE + Infiniband will panic on bootup.
Certain necessary #include statements require LIST_HEAD. Add these includes to ofed/include/linux/list.h, because LIST_HEAD is specifically overridden in this file.
PR: 191468 Differential Revision: D1279 Reviewed by: hselasky |
275724 |
12-Dec-2014 |
hselasky |
MFC r275636: Move OFED init a bit earlier so that PXE boot works.
Sponsored by: Mellanox Technologies |
273379 |
21-Oct-2014 |
hselasky |
MFC r272683: - Fix compile warning when compiling with GCC. - Add missed chunk in previous driver code MFC.
Sponsored by: Mellanox Technologies |
273246 |
18-Oct-2014 |
hselasky |
MFC r273135: Update the OFED Linux compatibility layer and Mellanox hardware driver(s):
- Properly name an inclusion guard
- Fix compile warnings regarding unsigned enums
- Add two new sysctl nodes
- Remove all empty linux header files
- Make an error printout more verbose
- Use "mod_delayed_work()" instead of cancelling and starting a timeout.
- Implement more Linux scatterlist functions.
Sponsored by: Mellanox Technologies |
272407 |
02-Oct-2014 |
hselasky |
MFC r272027:
Hardware driver update from Mellanox Technologies, including:
- improved performance
- better stability
- new features
- bugfixes
Supported HCAs:
- ConnectX-2
- ConnectX-3
- ConnectX-3 Pro
NOTE:
- TSO feature needs r271946, which is not yet merged.
Sponsored by: Mellanox Technologies Approved by: re, glebius |
271127 |
04-Sep-2014 |
hselasky |
MFC r270710 and r270821:
- Update the OFED Linux Emulation layer as a preparation for a hardware driver update from Mellanox Technologies.
- Remove empty files from the OFED Linux Emulation layer.
- Fix compile warnings related to printf() and the "%lld" and "%llx" format specifiers.
- Add some missing 2-clause BSD copyrights.
- Add "Mellanox Technologies, Ltd." to list of copyright holders.
- Add some new compatibility files.
- Fix order of uninit in the mlx4ib module to avoid crash at unload using the new module_exit_order() function.
Sponsored by: Mellanox Technologies |
270166 |
19-Aug-2014 |
hselasky |
MFC r269859: Fix for memory leak.
Sponsored by: Mellanox Technologies |
269862 |
12-Aug-2014 |
hselasky |
MFC r268316: Fix OFED startup order: All SYSINIT()'s and modules should be loaded prior to starting "/sbin/init" which will run all the "/etc/rc.d/xxx" scripts. Otherwise there can be a race configuring the interfaces via "/etc/rc.conf".
Sponsored by: Mellanox Technologies |
267517 |
15-Jun-2014 |
hselasky |
MFC r267395:
- Fix out of range shifting bug in bitops.h.
- Make code a bit easier to read by adding parentheses. |
257867 |
08-Nov-2013 |
alfred |
MFC: r257862, r257863, r257864
r257862:
Use explicit long cast to avoid overflow in bitops.
This was causing problems with the buddy allocator inside of ofed.
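The kind of overflow addressed by r257862 can be illustrated with a bit-mask helper. This is a sketch with hypothetical names, not the code from the OFED bitops header. The constant being shifted has type unsigned long, so every bit position within a long is reachable; a plain `1` has type int and cannot be shifted that far.

```c
#include <limits.h>

#define SK_BITS_PER_LONG	((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* Mask for bit "nr" within its word; 1UL keeps the shift in range. */
static unsigned long
sketch_bit_mask(unsigned int nr)
{
	return (1UL << (nr % SK_BITS_PER_LONG));
}

/* Index of the word that holds bit "nr". */
static unsigned long
sketch_bit_word(unsigned int nr)
{
	return (nr / SK_BITS_PER_LONG);
}
```

The highest bit of a word, `sketch_bit_mask(SK_BITS_PER_LONG - 1)`, is well defined here, while `1 << 63` on an int is not.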
r257863:
Fix for bad performance when mtu is increased.
Update the auto moderation behavior in the mlxen driver to match the new LINUX OFED code.
r257864:
Do not use a sleep lock when protecting the driver flags.
This was causing a locking issue with lagg.
Approved by: re |
256686 |
17-Oct-2013 |
alfred |
Fix __free_pages() in the linux shim.
__free_pages() is actually supposed to take a "struct page *", not an address.
MFC: 256546
Approved by: re |
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
256116 |
07-Oct-2013 |
dim |
Give an unnamed union in sys/ofed/include/rdma/ib_verbs.h a name, to silence a gcc warning.
Approved by: re (gjb) MFC after: 3 days
|
255972 |
01-Oct-2013 |
alfred |
Enable ib_dev.mmap function
Removed the ifdef linux from this function. Added stub function for contiguous pages to avoid compilation errors.
Submitted by: Orit Moskovich (oritm mellanox.com) Approved by: re
|
255968 |
01-Oct-2013 |
alfred |
Fix mis-merge of upstream fix.
We would accidentally make the string one byte too short.
Submitted by: Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255932 |
29-Sep-2013 |
alfred |
Update OFED to Linux 3.7 and update Mellanox drivers.
Update the OFED Infiniband core to the version supplied in Linux version 3.7.
The update to OFED consists almost entirely of additional defines and functions; the exceptions are new parameters added to ib_register_device() and to the reg_user_mr callback.
In addition the ibcore (Infiniband core) and ipoib (IP over Infiniband) have both been made into completely loadable modules to facilitate testing of the OFED stack in FreeBSD.
Finally the Mellanox Infiniband drivers are now updated to the latest version shipping with Linux 3.7.
Submitted by: Mellanox FreeBSD driver team: Oded Shanoon (odeds mellanox.com), Meny Yossefi (menyy mellanox.com), Orit Moskovich (oritm mellanox.com)
Approved by: re
|
255240 |
05-Sep-2013 |
pjd |
Handle cases where capability rights are not provided.
Reported by: kib
|
254734 |
23-Aug-2013 |
np |
Fix implementation of sock_getname.
MFC after: 1 week
|
254356 |
15-Aug-2013 |
glebius |
Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented.
Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix
|
254122 |
09-Aug-2013 |
jeff |
- Reserve a special AF for SDP. The one we were incorrectly using before was taken by another AF.
Sponsored by: EMC / Isilon Storage Division
|
254121 |
09-Aug-2013 |
jeff |
- Correctly handle various edge cases in sysfs emulation.
Sponsored by: EMC / Isilon Storage Division
|
254120 |
09-Aug-2013 |
jeff |
- Use the correct type in the linux bitops emulation.
Submitted by: Maxim Ignatenko <gelraen.ua@gmail.com>
|
254065 |
07-Aug-2013 |
kib |
Split the pagequeues per NUMA domain, and split the pagedaemon process into threads, each processing the queue in a single domain. The structure of the pagedaemons and queues is kept intact; most of the changes come from the need for code to find the owning page queue for a given page, calculated from the segment containing the page.
The tie between NUMA domain and pagedaemon thread/pagequeue split is rather arbitrary; the multithreaded daemon could be allowed on single-domain machines, or one domain might be split into several page domains to further increase concurrency.
Right now, each pagedaemon thread tries to reach the global target, precalculated at the start of the pass. This is not optimal, since it could cause excessive page deactivation and freeing. The code should be changed to re-check the global page deficit state in the loop after some number of iterations.
The pagedaemons reach a quorum before starting the OOM, since one thread's inability to meet the target is normal for split queues. Only when all pagedaemons fail to produce enough reusable pages is OOM started, by a single selected thread.
Launder is modified to take into account the segments layout with regard to the region for which cleaning is performed.
Based on the preliminary patch by jeff, sponsored by EMC / Isilon Storage Division.
Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation
|
254025 |
07-Aug-2013 |
jeff |
Replace kernel virtual address space allocation with vmem. This provides transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
|
253604 |
24-Jul-2013 |
avg |
rename scheduler->swapper and SI_SUB_RUN_SCHEDULER->SI_SUB_LAST
Also directly call swapper() at the end of mi_startup instead of relying on swapper being the last thing in sysinits order.
Rationale:
- "RUN_SCHEDULER" was misleading, scheduling already takes place at that stage
- "scheduler" was misleading, the function swaps in the swapped out processes
- another SYSINIT(SI_SUB_RUN_SCHEDULER, SI_ORDER_ANY) could never be invoked depending on its relative order with scheduler; this was not obvious and the bug actually used to exist
Reviewed by: kib (earlier version) MFC after: 14 days
|
253449 |
18-Jul-2013 |
jhb |
Rework the previous fix for the IB vs Ethernet sysctl handler to be more generic and apply to all sysfs attributes:
- Use sysctl_handle_string() instead of reimplementing it.
- Remove trailing newline from the current value before passing it to userland and append a newline to the new string value before passing it to the attribute's store function.
- Don't leak the temporary buffer if the first error check triggers.
- Revert earlier change to mlx4 port mode handler.
PR: kern/174213 Submitted by: Garrett Cooper Reviewed by: Shakar Klein @ Mellanox MFC after: 1 week
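The newline handling described in this entry can be sketched in userland C. This is only an illustration under stated assumptions, not the LinuxKPI code: demo_strip_newline() and demo_append_newline() are hypothetical helper names, and the buffers here are ordinary caller-owned strings rather than the handler's temporary kernel buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: drop one trailing newline, as is done before the
 * attribute's current value is handed to userland.
 */
static void
demo_strip_newline(char *buf)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
}

/*
 * Hypothetical helper: append a newline before the new value is passed to
 * the attribute's store function. Returns 0 on success, or -1 if the
 * buffer of `size` bytes cannot hold the extra character.
 */
static int
demo_append_newline(char *buf, size_t size)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len + 2 > size)
		return (-1);
	buf[len] = '\n';
	buf[len + 1] = '\0';
	return (0);
}
```

With these helpers, a value read as "ib\n" would reach userland as "ib", and a user-supplied "eth" would reach the store function as "eth\n".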
|
253048 |
08-Jul-2013 |
jhb |
Allow mlx4 devices to switch from Ethernet to Infiniband (and vice versa):
- Fix sysctl wrapper for sysfs attributes to properly handle new string values similar to sysctl_handle_string() (only copyin the user's supplied length and nul-terminate the string).
- Don't check for a trailing newline when evaluating the desired operating mode of a mlx4 device.
PR: kern/179999 Submitted by: Shahar Klein <shahark@mellanox.com> MFC after: 1 week
|
251617 |
11-Jun-2013 |
jhb |
Store a reference to the vnode associated with a file descriptor in the linux_file structure and use it instead of directly accessing td_fpop when destroying the linux_file structure. The td_fpop pointer is not valid when a cdevpriv destructor is run, and the type-specific close method has already been called, so f_vnode may not be valid (and the vnode might have been recycled without our own reference).
Tested by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> MFC after: 1 week
|
250374 |
08-May-2013 |
delphij |
According to the documentation, on Linux, cancel_delayed_work() does not drain (flush_workqueue() in Linux terms) but instead returns true if the work was removed before it is run, or false otherwise.
Simulate this by removing the taskqueue_drain() and returning a value derived from taskqueue_cancel()'s return value.
This would solve a witness warning caused by calling taskqueue_drain() with a non-sleepable lock held, like:
taskqueue_drain with the following non-sleepable locks held:
exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @ /usr/src/sys/netinet/in.c:1484
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690
kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740
witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800
taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840
set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860
netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870
arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930
This does not affect kernels without OFED compiled in.
Reported by: Garrett Cooper <yaneurabeya gmail com> (who also tested an earlier version of this patch, but bugs are mine) MFC after: 2 weeks
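The return-value semantics described in this entry can be modelled with a small userland mock. It is a sketch only: struct demo_delayed_work and the demo_* functions are hypothetical stand-ins, no taskqueue is involved, and the single `pending` flag replaces whatever state the real implementation tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued work item; not the LinuxKPI type. */
struct demo_delayed_work {
	bool pending;		/* queued but not yet run */
};

/* Mark the item as queued. */
static void
demo_queue(struct demo_delayed_work *w)
{
	assert(w != NULL);
	w->pending = true;
}

/* Model the work handler having run: the item is no longer pending. */
static void
demo_run(struct demo_delayed_work *w)
{
	assert(w != NULL);
	w->pending = false;
}

/*
 * Cancel without draining: return true only if the item was removed
 * before it ran, false otherwise. Nothing here waits or sleeps.
 */
static bool
demo_cancel_delayed_work(struct demo_delayed_work *w)
{
	assert(w != NULL);
	if (!w->pending)
		return (false);
	w->pending = false;
	return (true);
}
```

Because the cancel path never waits for a running handler, a model like this has no step that could sleep while the caller holds a non-sleepable lock.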
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical, but a few notes are reported:
* The KPI changes as follows:
  - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
  - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
  - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
  - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details)
  - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
247675 |
02-Mar-2013 |
mav |
Add protective parentheses for macro argument, missed in r247671.
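Why a macro argument needs protective parentheses can be shown with a generic sketch. The macros below are hypothetical and are not the ones touched by this revision; they only demonstrate how operator precedence splits an unparenthesized argument.

```c
#include <assert.h>

/* Hypothetical macros; the names and the arithmetic are illustrative only. */
#define DEMO_DOUBLE_UNSAFE(x)	(x * 2)
#define DEMO_DOUBLE(x)		((x) * 2)

/* Expands to (1 + 2 * 2): precedence splits the argument. */
static_assert(DEMO_DOUBLE_UNSAFE(1 + 2) == 5, "unparenthesized argument");

/* Expands to ((1 + 2) * 2): the parentheses keep the argument together. */
static_assert(DEMO_DOUBLE(1 + 2) == 6, "parenthesized argument");
```

Both checks are evaluated at compile time, so the difference between the two expansions is visible without running anything.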
|
247671 |
02-Mar-2013 |
mav |
MFcalloutng: Give OFED Linux wrapper own "expires" field instead of abusing callout's c_time, which will change its type and units with calloutng commit.
|
247602 |
02-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall.
- If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavily modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convenient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \
	(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
	 CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
	(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
	 CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
	 CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
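The revised CAP_READ/CAP_SEEK relationship described in this entry can be illustrated with plain bitmasks. This is a sketch under stated assumptions: the DEMO_* bit values are invented for the example (the real constants come from the kernel headers), and demo_cap_check() is a hypothetical helper, not a Capsicum interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real constants come from the kernel headers. */
#define DEMO_CAP_READ	UINT64_C(0x1)
#define DEMO_CAP_WRITE	UINT64_C(0x2)
#define DEMO_CAP_SEEK	UINT64_C(0x4)

/* Composite rights, mirroring the shape of the defines in this entry. */
#define DEMO_CAP_PREAD	(DEMO_CAP_SEEK | DEMO_CAP_READ)
#define DEMO_CAP_PWRITE	(DEMO_CAP_SEEK | DEMO_CAP_WRITE)

/* A positional read needs the seek right in addition to the read right. */
static_assert((DEMO_CAP_PREAD & DEMO_CAP_SEEK) != 0, "pread implies seek");

/* Return true if every right in `needed` is present in `held`. */
static bool
demo_cap_check(uint64_t held, uint64_t needed)
{
	return ((held & needed) == needed);
}
```

Under the new behaviour, a descriptor holding only the read right passes the check for read(2)-style access but fails the check for DEMO_CAP_PREAD until the seek right is also held.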
|
242933 |
12-Nov-2012 |
dim |
Redo r242842, now actually fixing the warnings, as follows:
- In sys/ofed/drivers/infiniband/core/cma.c, an enum struct member is interpreted as an int, so cast it to an int.
- In sys/ofed/drivers/infiniband/core/ud_header.c, initialize the packet_length variable in ib_ud_header_init(), to prevent undefined behaviour.
- In sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c, call rdma_notify() with the correct enum type and value.
- In sys/ofed/include/linux/pci.h, change the PCI_DEVICE and PCI_VDEVICE macros to use C99 struct initializers, so additional members can be overridden.
Reviewed by: delphij, Garrett Cooper <yanegomi@gmail.com> MFC after: 1 week
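The benefit of C99 designated initializers in an ID-table macro can be sketched as follows. Everything here is hypothetical: struct demo_pci_id, its field names, DEMO_PCI_DEVICE and the numeric IDs are invented for illustration and do not reproduce the header changed by this revision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ID structure; the fields are illustrative only. */
struct demo_pci_id {
	uint32_t vendor;
	uint32_t device;
	uint32_t subvendor;
	uint32_t subdevice;
	unsigned long driver_data;
};

#define DEMO_ANY_ID	UINT32_MAX

/* Members are named, so a table entry can append further initializers. */
#define DEMO_PCI_DEVICE(v, d)					\
	.vendor = (v), .device = (d),				\
	.subvendor = DEMO_ANY_ID, .subdevice = DEMO_ANY_ID

static const struct demo_pci_id demo_ids[] = {
	{ DEMO_PCI_DEVICE(0x1234, 0x0001), .driver_data = 1 },
	{ DEMO_PCI_DEVICE(0x1234, 0x0002) },	/* driver_data defaults to 0 */
	{ 0 }					/* terminator */
};

static_assert(sizeof(demo_ids) / sizeof(demo_ids[0]) == 3, "table size");
```

With a positional macro, adding `.driver_data = 1` after the macro would depend on the exact member order; naming the members removes that dependency.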
|
242841 |
09-Nov-2012 |
delphij |
Use %s when calling make_dev with a string pointer. This makes clang happy.
MFC after: 2 weeks
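The reason for passing a string through "%s" rather than as the format itself can be sketched in userland C. demo_make_name() is a hypothetical printf-style function standing in for a formatting interface such as make_dev()'s name argument; it only writes into a caller-supplied buffer and creates no device.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical printf-style naming function; it formats into `buf` and
 * returns the length vsnprintf() reports.
 */
static int
demo_make_name(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return (n);
}

/* The name is treated as data and never interpreted as a format string. */
static int
demo_name_device(char *buf, size_t size, const char *name)
{
	assert(name != NULL);
	return (demo_make_name(buf, size, "%s", name));
}
```

If the name happened to contain a conversion such as "%d", passing it directly as the format would make the function read a variadic argument that was never supplied; with "%s" the characters are copied verbatim.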
|
241697 |
18-Oct-2012 |
jhb |
Take advantage of if_baudrate_pf and calculate an effective baud rate on all platforms (not just amd64) to compute an equivalent IB rate.
|
240680 |
18-Sep-2012 |
gavin |
Align the PCI Express #defines with the style used for the PCI-X #defines. This also has the advantage that it makes the names more compact, and also allows us to correct the non-uniform naming of the PCIM_LINK_* defines, making them all consistent amongst themselves.
This is a mostly mechanical rename:
s/PCIR_EXPRESS_/PCIER_/g
s/PCIM_EXP_/PCIEM_/g
s/PCIM_LINK_/PCIEM_LINK_/g
When this is MFC'd, #defines will be added for the old names to assist out-of-tree drivers.
Discussed with: jhb MFC after: 1 week
|
239303 |
15-Aug-2012 |
hselasky |
Streamline use of cdevpriv and correct some corner cases.
1) It is not useful to call "devfs_clear_cdevpriv()" from "d_close" callbacks, because for example read, write, ioctl and so on might be sleeping at the time "d_close" is called, and the freed private data can then still be accessed. Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will be freed twice, if open fails, first by code in the open routine, secondly by the cdevpriv destructor. Move registration of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback registered cdevpriv data and the "d_open" callback function returned an error. Fix this.
Discussed with: phk MFC after: 2 weeks
|
239065 |
05-Aug-2012 |
kib |
After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull in vm_param.h was removed. The other big dependency of vm_page.h on vm_param.h is the PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages.
Stop including vm_param.h into vm_page.h. Include vm_param.h explicitly for the kernel code which needs it.
Suggested and reviewed by: alc MFC after: 2 weeks
|
237563 |
25-Jun-2012 |
np |
Fix clang warning when compiling iw_cxgb.
Reported by: rene, dim
|
237263 |
19-Jun-2012 |
np |
- Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs. These are available as t3_tom and t4_tom modules that augment cxgb(4) and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn Sponsored by: Chelsio communications. MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
|
234183 |
12-Apr-2012 |
jhb |
Add OFED and the associated options and drivers to x86 LINT builds:
- Mark 'sdp' as requiring 'inet'.
- Always include "opt_inet.h" and "opt_inet6.h" and modify the IB driver Makefiles to honor WITH/WITHOUT_INET/INET6/_SUPPORT options to determine what should be enabled during a module build.
- Fix the mlxen(4) driver and the core IB code to compile if INET is disabled (including when both INET and INET6 are disabled).
Reviewed by: bz MFC after: 2 weeks
|
233547 |
27-Mar-2012 |
jhb |
Use VM_MEMATTR_UNCACHEABLE instead of VM_MEMATTR_UNCACHED for UC mappings. VM_MEMATTR_UNCACHED is actually the x86-specific UC- mode (where a WC MTRR can override the PAT setting).
|
228469 |
13-Dec-2011 |
ed |
Replace __signed by signed.
The signed keyword is an integral part of the C syntax. There's no need to use __signed.
|
228443 |
12-Dec-2011 |
mdf |
Do not define bool/true/false if the symbols already exist.
MFC after: 2 weeks Sponsored by: Isilon Systems, LLC
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
224914 |
16-Aug-2011 |
kib |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores.
Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
222813 |
07-Jun-2011 |
attilio |
Retire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now be arbitrarily bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself:
- More functions to handle cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation).
- pc_cpumask and pc_other_cpus are targeted for removal soon. With the move from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu data is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word.
- Please note that the size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernel members.
- KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additionally note that MAXCPU is not bumped in this patch, but private testing has been done with up to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch:
- sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revisions of the patches and really helped in improving stability of this work.
- marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr.
- jeff and jhb discussed the basic approach followed.
- kib and marcel made targeted review on some specific part of the patch.
- marius, art, nwhitehorn and andreast reviewed MD specific part of the patch.
- marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch.
- Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated to varying degrees:
- Yahoo! for having offered the machines used for testing on big count of CPUs.
- The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental.
- Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
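The idea of a CPU set that is not capped at one machine word, together with an ffs(3)-style lookup like the one this entry attributes to cpusetobj_ffs(), can be sketched as follows. The layout and names are hypothetical: struct demo_cpuset is not the kernel's cpuset_t, and DEMO_MAXCPU is an arbitrary limit chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAXCPU	128		/* hypothetical CPU limit */
#define DEMO_WORD_BITS	64
#define DEMO_NWORDS	(DEMO_MAXCPU / DEMO_WORD_BITS)

/* Hypothetical multi-word CPU set; not the kernel's cpuset_t layout. */
struct demo_cpuset {
	uint64_t bits[DEMO_NWORDS];
};

/* Mark CPU `cpu` as a member of the set. */
static void
demo_cpuset_set(struct demo_cpuset *set, unsigned int cpu)
{
	assert(set != NULL && cpu < DEMO_MAXCPU);
	set->bits[cpu / DEMO_WORD_BITS] |=
	    UINT64_C(1) << (cpu % DEMO_WORD_BITS);
}

/*
 * Return the 1-based index of the lowest set bit, or 0 for an empty set,
 * following the ffs(3) convention.
 */
static int
demo_cpuset_ffs(const struct demo_cpuset *set)
{
	size_t w;
	int b;

	assert(set != NULL);
	for (w = 0; w < DEMO_NWORDS; w++) {
		if (set->bits[w] == 0)
			continue;
		for (b = 0; b < DEMO_WORD_BITS; b++)
			if (set->bits[w] & (UINT64_C(1) << b))
				return ((int)(w * DEMO_WORD_BITS) + b + 1);
	}
	return (0);
}
```

A single 32-bit mask cannot represent CPU 100 at all, while the array-of-words form handles it by indexing into the second word.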
|
221055 |
26-Apr-2011 |
jeff |
- Catch up to falloc() changes.
- PHOLD() before using a task structure on the stack.
- Fix a LOR between the sleepq lock and thread lock in _intr_drain().
|
220016 |
26-Mar-2011 |
jeff |
- Implement wake-on-lan support in mlxen.
|
219902 |
23-Mar-2011 |
jhb |
Do a sweep of the tree replacing calls to pci_find_extcap() with calls to pci_find_cap() instead.
|
219846 |
21-Mar-2011 |
kib |
Allow the ofed modules to be compiled on i386.
Reviewed by: jeff
|
219820 |
21-Mar-2011 |
jeff |
- Merge in OFED 1.5.3 from projects/ofed/head
|