[LU-10301] kernel update [RHEL7.4 3.10.0-693.11.1.el7] Created: 30/Nov/17  Updated: 01/Feb/18  Resolved: 22/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3

Type: Bug Priority: Minor
Reporter: Bob Glossman (Inactive) Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10142 kernel update [RHEL7.4 3.10.0-693.5.2... Resolved
is related to LU-10455 kernel update [RHEL7.4 3.10.0-693.11.... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Security Fix(es):

It was found that the timer functionality in the Linux kernel ALSA subsystem is prone to a race condition between read and ioctl system call handlers, resulting in an uninitialized memory disclosure to user space. A local user could use this flaw to read information belonging to other users. (CVE-2017-1000380, Moderate)

This update also fixes the following bugs:

Previously, the page_cgroup_init() function incorrectly assumed a section started in another NUMA node and therefore did not allocate page_cgroup memory for that section. Consequently, when booting a 14G AMD system with an H330 storage controller, the system crashed. This update ensures page_cgroup_init() works as expected, and the system now boots successfully. BZ#1491970

Previously, support buffer lists for the Inter User Communication Vehicle (IUCV) transport missed a part of the information required to copy Socket Buffer (SKB) data correctly. Consequently, the IUCV transport failed, which led to a kernel panic on z/VM systems. This update fixes the AF_IUCV address family to copy SKB data correctly, and the kernel panic no longer occurs on z/VM systems due to this behavior. BZ#1494354

Previously, the default timeout and retry settings in the VMBus driver were insufficient in some cases, for example when a Hyper-V host was under a significant load. Consequently, in Windows Server 2016, Hyper-V Server 2016, and Windows Azure Platform, when running a Red Hat Enterprise Linux Guest on the Hyper-V hypervisor, the guest failed to boot or booted with certain Hyper-V devices missing. This update alters the timeout and retry settings in VMBus, and Red Hat Enterprise Linux guests now boot as expected under the described conditions. BZ#1495763

The switch to rhashtables in gfs2 in the Linux kernel broke the glock dumps in the /sys/kernel/debug/gfs2//glocks file for dumps bigger than a single buffer. Consequently, glock and gfs2 problems could not be diagnosed and analyzed correctly due to incomplete glock dumps. This update fixes glock dumps to be complete again under the described circumstances. BZ#1497078

Previously, the netdev_wait_allrefs() function could generate the NETDEV_UNREGISTER event twice, which caused the in6_dev_put() function to be called too many times. Consequently, a kernel panic occurred. This update handles only the first NETDEV_UNREGISTER event, and the kernel no longer panics due to this behavior. BZ#1497121
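
The double-release pattern described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the classes `RefCounted` and `UnregisterHandler` are hypothetical names invented for illustration and are not the kernel's data structures or the actual fix. It shows why acting on only the first unregister event keeps the reference count balanced.

```python
class RefCounted:
    """Toy reference-counted object (hypothetical; not the kernel's inet6_dev)."""

    def __init__(self):
        self.refs = 1

    def put(self):
        # Dropping a reference that is not held is the bug class described above.
        if self.refs == 0:
            raise RuntimeError("reference count underflow")
        self.refs -= 1


class UnregisterHandler:
    """Acts on the first unregister event only and ignores repeats."""

    def __init__(self, dev):
        self.dev = dev
        self.handled = False

    def on_unregister(self):
        if self.handled:
            return False  # repeated event: do nothing
        self.handled = True
        self.dev.put()
        return True


dev = RefCounted()
handler = UnregisterHandler(dev)
first = handler.on_unregister()   # releases the reference
second = handler.on_unregister()  # ignored, so no second put()
```

Without the `handled` guard, the second event would call `put()` again and underflow the count.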

Previously, the lstopo tool used the Advanced Programmable Interrupt Controller ID (APIC ID) identifiers to calculate shared_cpu_map. Consequently, on AMD systems with multiple CPU cores, lstopo displayed incorrect topology of L3 caches, because APIC IDs are not guaranteed to be contiguous for CPU cores across different L3 caches. Incorrect L3 cache information could cause an incorrect L3 schedule domain, which has a performance impact under certain workloads. With this update, the underlying source code has been fixed to use cpu_llc_shared_mask of each CPU to derive the L3 shared_cpu_map. As a result, lstopo now shows the core topology information on L3 caches correctly. BZ#1497238
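
To see why contiguity assumptions about APIC IDs can misgroup cores, consider the following toy model. The CPU numbering, APIC ID values, and the helper `group_cpus` are all made up for illustration; the "guessed" grouping is an assumed stand-in for ID-based arithmetic, not the kernel's actual former computation.

```python
def group_cpus(key_by_cpu):
    """Group CPU numbers that share the same key; return sorted groups."""
    groups = {}
    for cpu, key in key_by_cpu.items():
        groups.setdefault(key, set()).add(cpu)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical system: six CPUs, two L3 caches with three cores each.
# The APIC IDs jump from 2 to 8 between the two caches.
apic_id = {0: 0, 1: 1, 2: 2, 3: 8, 4: 9, 5: 10}

# Explicit per-CPU last-level-cache membership, which plays the role that
# cpu_llc_shared_mask plays in the description above.
llc_id = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

CORES_PER_L3 = 3

# Guess based on the assumption that IDs under one L3 are contiguous.
guessed = group_cpus({cpu: a // CORES_PER_L3 for cpu, a in apic_id.items()})

# Grouping taken from explicit membership information.
actual = group_cpus(llc_id)
```

Here `guessed` splits the second cache into two groups, while `actual` reports two groups of three CPUs.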

On AMD systems with multiple cores, the lstopo tool displayed incorrect topology information due to an incorrect value of the "cpu_core_id" parameter in the /proc/cpuinfo file. This update fixes the computation of "cpu_core_id", and the core topology on AMD systems with multiple cores is now displayed correctly. BZ#1497603

If a Mellanox firmware command took a long time to complete, the mlx5 driver command got a timeout and the command slot was freed to be available for a new firmware command. Consequently, receiving a new firmware command on the still busy slot caused a kernel panic. This update fixes mlx5 to avoid using pending command interface slots, and the kernel panic no longer occurs under the described circumstances. BZ#1497604
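
The idea of not reusing a slot that is still pending can be sketched as a small slot pool. This is a hypothetical model, not the mlx5 driver's code: `CommandSlots` and its methods are names invented for illustration. A timed-out slot stays marked as pending until the firmware completion actually arrives.

```python
class CommandSlots:
    """Toy command-slot pool (hypothetical; not the mlx5 command interface)."""

    def __init__(self, count):
        self.free = set(range(count))
        self.pending = set()

    def alloc(self):
        """Return a free slot number, or None if every slot is busy."""
        if not self.free:
            return None
        slot = min(self.free)
        self.free.remove(slot)
        self.pending.add(slot)
        return slot

    def timeout(self, slot):
        """The caller stops waiting, but the slot is NOT returned to the pool."""
        # Intentionally empty: the firmware may still write to this slot.

    def complete(self, slot):
        """The firmware finished with the slot, so it is safe to reuse."""
        self.pending.discard(slot)
        self.free.add(slot)


slots = CommandSlots(1)
busy = slots.alloc()
slots.timeout(busy)
during_timeout = slots.alloc()  # None: the only slot is still pending
slots.complete(busy)
after_completion = slots.alloc()
```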

Previously, there was an off-by-one counting error in the loop termination conditions for the xfs_find_get_desired_pgoff() function. Consequently, the generic/436 xfstests test for seeking holes and data failed, and data was detected where a hole was expected. With this update, the underlying source code has been fixed, and generic/436 passes as expected, showing the correct execution of holes and data. BZ#1498736

When configuring a Red Hat Enterprise Linux guest VM on a Hyper-V Server with Single Root I/O Virtualization (SR-IOV) enabled or a guest on a Windows Azure Platform host with Accelerated Networking enabled, a separate VF interface was provisioned inside the guest. According to the Hyper-V implementation, both netvsc and VF interfaces need to be enabled at the same time, so they needed to be put in a bonding interface. Consequently, multiple race conditions occurred if the VF interface was not available immediately at boot. This update adds the VF interface as a slave to the netvsc interface, which eliminates the need for creating the special bonding interface, and it also prevents possible race conditions. As a result, the required network configuration is now simpler and more robust. BZ#1500321

Previously, there was an off-by-one counting error in the loop termination conditions for the ext4_find_unwritten_pgoff() function. Consequently, the generic/436 xfstests test for seeking holes and data failed, and data was detected where a hole was expected. This update fixes the ext4_find_unwritten_pgoff() loop termination conditions to properly detect non-contiguous page indices and handle cases where fewer pages are expected. As a result, generic/436 on the ext4 file system now passes as expected, showing the correct execution of holes and data. BZ#1501387
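
The kind of scan described above, which must notice non-contiguous page indices and cope with fewer pages than expected, can be modelled in a few lines. This is a sketch only: `find_hole`, its parameters, and the 4096-byte page size are assumptions for illustration and do not reproduce ext4_find_unwritten_pgoff().

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


def find_hole(cached_pages, start_page, end_page):
    """Return the byte offset of the first hole in [start_page, end_page].

    cached_pages is a set of page indices that hold data. Returns None when
    every page in the inclusive range holds data.
    """
    expected = start_page
    for index in sorted(cached_pages):
        if index < start_page:
            continue
        if index > end_page:
            break
        if index != expected:
            # Gap between consecutive page indices: the hole starts here.
            return expected * PAGE_SIZE
        expected = index + 1
    # Fewer pages than the range covers: the remainder is a hole. Using
    # "<" instead of "<=" here would miss a hole on the last page.
    if expected <= end_page:
        return expected * PAGE_SIZE
    return None


gap_in_middle = find_hole({0, 1, 3}, 0, 3)
hole_on_last_page = find_hole({0, 1, 2}, 0, 3)
fully_populated = find_hole({0, 1, 2, 3}, 0, 3)
```

The inclusive comparison at the end is where an off-by-one slip in a termination condition would report data where a hole exists.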

If a Common Internet File System (CIFS) client received the STATUS_NETWORK_SESSION_EXPIRED error from a server, the client did not reconnect the current SMB session. Consequently, all further client requests failed, and the only way to recover was by manually unmounting and re-mounting the share or rebooting the client. This update adds logic to the demultiplex thread to identify expired sessions and reconnect them. As a result, the CIFS client now reconnects automatically if a STATUS_NETWORK_SESSION_EXPIRED error occurs, and further client requests no longer fail under the described circumstances. BZ#1501526
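
The reconnect-on-expiry behaviour can be sketched with a toy client and server. Everything here is hypothetical: `ToyServer`, `ToyClient`, and the session numbering are invented for illustration and do not reflect the CIFS client's demultiplex thread or the SMB wire protocol. Only the status name STATUS_NETWORK_SESSION_EXPIRED comes from the description above.

```python
STATUS_OK = "STATUS_SUCCESS"
STATUS_EXPIRED = "STATUS_NETWORK_SESSION_EXPIRED"


class ToyServer:
    """Hypothetical server that accepts only the most recent session."""

    def __init__(self):
        self.valid_session = 1

    def expire(self):
        self.valid_session += 1  # invalidates every older session id

    def login(self):
        return self.valid_session

    def request(self, session_id, payload):
        if session_id != self.valid_session:
            return STATUS_EXPIRED, None
        return STATUS_OK, payload.upper()


class ToyClient:
    """Hypothetical client that re-establishes an expired session once."""

    def __init__(self, server):
        self.server = server
        self.session_id = server.login()
        self.reconnects = 0

    def send(self, payload):
        status, reply = self.server.request(self.session_id, payload)
        if status == STATUS_EXPIRED:
            # Reconnect and retry instead of failing all later requests.
            self.session_id = self.server.login()
            self.reconnects += 1
            status, reply = self.server.request(self.session_id, payload)
        return status, reply


server = ToyServer()
client = ToyClient(server)
before = client.send("a")
server.expire()
after = client.send("b")
```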

Previously, there was an off-by-one counting error in the minimum number of pages for the xfs_find_get_desired_pgoff() function. Consequently, the xfstests sanity check tests for seeking holes and data failed, and neither data nor holes were found. With this update, the off-by-one counting error in xfs_find_get_desired_pgoff() has been fixed, and the xfstests sanity check tests now pass as expected, showing the correct execution of holes and data. BZ#1502731

Previously, the Test Unit Ready (TUR) command, which is used to determine if a device is ready to transfer data, failed for Nonvolatile Memory Express (NVME) devices. This update fixes the nvme driver, and TUR for NVME devices now works as expected. BZ#1502733

Previously, there was a null pointer dereference in the release_lock_stateid() function. Consequently, a kernel panic occurred. This update fixes the null pointer dereference, and the kernel no longer panics due to this behavior. BZ#1505160

Previously, running node migration for multiple processes sharing hugepage-mapped memory areas could trigger a kernel panic due to a race condition within the hugepage migration algorithm. This update fixes memory management in the Linux kernel, and the kernel panic no longer occurs under the described circumstances. BZ#1505164

On IBM Power systems, the perf utility misinterpreted software events as hardware events and attempted to access structures that were only initialized for hardware events. Consequently, perf generated incorrect output, and, under certain circumstances, a kernel panic occurred. This update fixes perf to properly identify software and hardware events, thus generating the correct output. As a result, the kernel no longer panics due to this behavior. BZ#1506143



 Comments   
Comment by Gerrit Updater [ 06/Dec/17 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/30401
Subject: LU-10301 kernel: kernel update RHEL7.4 [3.10.0-693.11.1.el7]
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4e4bd2af4f9947557e087447fdd67ce6c2ab0809

Comment by Gerrit Updater [ 07/Dec/17 ]

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/30441
Subject: LU-10301 kernel: kernel update RHEL7.4 [3.10.0-693.11.1.el7]
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: dd3736d77def4b4b35adb18cbbfcfbe300dbb952

Comment by Gerrit Updater [ 22/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30401/
Subject: LU-10301 kernel: kernel update RHEL7.4 [3.10.0-693.11.1.el7]
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dec59e5e3c56317266e1e2ec95cf90b17a7ab339

Comment by Peter Jones [ 22/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 04/Jan/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30441/
Subject: LU-10301 kernel: kernel update RHEL7.4 [3.10.0-693.11.1.el7]
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: b9b97cc84d7462913ecaefd24ff91ddfcbdda3aa

Generated at Sat Feb 10 02:33:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.