Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version/s: Lustre 2.8.0
- Labels: None
Environment:
CentOS 7.2 (Kernel: Various)
lustre-master Build 3424 & 3423

Hardware:
10x Lustre servers (Intel Wildcat Pass, E5 v3, 128GB RAM)
Single LNET network - o2ib0 (Omni-Path), IFS 10.1.1.0.9
All Lustre block devices are NVMe, either DC P3700 or DC P3600 (excluding the MGT, which is on a standard SSD).
_____
kernel-3.10.0-327.22.2.el7_lustre.x86_64
kernel-debuginfo-3.10.0-327.22.2.el7_lustre.x86_64
kernel-debuginfo-common-x86_64-3.10.0-327.22.2.el7_lustre.x86_64
kernel-devel-3.10.0-327.22.2.el7_lustre.x86_64
kernel-headers-3.10.0-327.22.2.el7_lustre.x86_64
kernel-tools-3.10.0-327.22.2.el7_lustre.x86_64
kernel-tools-debuginfo-3.10.0-327.22.2.el7_lustre.x86_64
kernel-tools-libs-3.10.0-327.22.2.el7_lustre.x86_64
kernel-tools-libs-devel-3.10.0-327.22.2.el7_lustre.x86_64
kmod-lustre-2.8.56_26_g6fad3ab-1.el7.x86_64
kmod-lustre-osd-ldiskfs-2.8.56_26_g6fad3ab-1.el7.x86_64
kmod-lustre-osd-zfs-2.8.56_26_g6fad3ab-1.el7.x86_64
kmod-lustre-tests-2.8.56_26_g6fad3ab-1.el7.x86_64
kmod-spl-3.10.0-327.22.2.el7_lustre.x86_64-0.6.5.7-1.el7.x86_64
kmod-spl-devel-3.10.0-327.22.2.el7_lustre.x86_64-0.6.5.7-1.el7.x86_64
kmod-zfs-3.10.0-327.22.2.el7_lustre.x86_64-0.6.5.7-1.el7.x86_64
kmod-zfs-devel-3.10.0-327.22.2.el7_lustre.x86_64-0.6.5.7-1.el7.x86_64
lustre-2.8.56_26_g6fad3ab-1.el7.x86_64
lustre-debuginfo-2.8.56_26_g6fad3ab-1.el7.x86_64
lustre-iokit-2.8.56_26_g6fad3ab-1.el7.x86_64
lustre-osd-ldiskfs-mount-2.8.56_26_g6fad3ab-1.el7.x86_64
lustre-osd-zfs-mount-2.8.56_26_g6fad3ab-1.el7.x86_64
lustre-tests-2.8.56_26_g6fad3ab-1.el7.x86_64
perf-3.10.0-327.22.2.el7_lustre.x86_64
perf-debuginfo-3.10.0-327.22.2.el7_lustre.x86_64
python-perf-3.10.0-327.22.2.el7_lustre.x86_64
python-perf-debuginfo-3.10.0-327.22.2.el7_lustre.x86_64
Description
While testing Lustre DNE2, I noticed issues with the latest master builds. When mounting storage targets on servers other than the one hosting the MGT, I get a kernel panic with the output below. I have validated (to the best of my ability) that this is not a network issue; I have also tried an FE build, which works, and another master build (3419), which also works:
[root@zlfs2-oss1 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/nvme0n1
arg[5] = /mnt/MDT0000
source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/nvme0n1 at /mnt/MDT0000, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-MDT0000,device=/dev/nvme0n1
mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'

Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
 kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
 kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG

Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
 kernel:Kernel panic - not syncing: LBUG
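A side note on the "cannot parse scheduler options" warning: it suggests mount.lustre could not make sense of the sysfs scheduler file for the NVMe device. As an illustration of the sysfs format only (not Lustre's actual parser), the active scheduler is normally shown in brackets, while blk-mq NVMe devices on this kernel expose a bare "none":

```shell
# Illustration of the /sys/block/<dev>/queue/scheduler format. The sample
# strings below are hypothetical stand-ins for the real sysfs file contents.
sata_sched="noop deadline [cfq]"   # active scheduler is bracketed
nvme_sched="none"                  # blk-mq NVMe: no bracketed entry

# Extract the bracketed (active) scheduler, if any.
active_sched() { printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'; }

active_sched "$sata_sched"   # prints: cfq
active_sched "$nvme_sched"   # prints nothing - a bracket-expecting parser fails here
```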
Attached is some debugging / more info.
Builds Tried:
master b3424 - issues
master b3423 - issues
master b3420 - issues
master b3419 - works
fe 2.8 b18 - works
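The build list above brackets the regression between master b3419 (last known good) and b3420 (first known bad). Given access to the lustre git tree, the offending commit could be narrowed with git bisect; the sketch below only demonstrates the workflow on a throwaway stand-in repository (the repo, file names, and "bad" condition are all hypothetical):

```shell
# Hypothetical git-bisect workflow on a stand-in repo: "commit 4" plays the
# role of the patch that introduced the LBUG between b3419 and b3420.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
  echo "$i" > build.txt
  git add build.txt
  git commit -qm "commit $i"
done
# bad = tip (b3420-era), good = oldest commit (b3419-era)
git bisect start HEAD HEAD~4 >/dev/null
# Automated check: in this stand-in, builds before commit 4 "work" (exit 0).
git bisect run sh -c 'test "$(cat build.txt)" -lt 4' >/dev/null
git bisect log | tail -1   # reports commit 4 as the first bad commit
```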