[LU-8346] conf-sanity test_93: test failed to respond and timed out Created: 29/Jun/16  Updated: 03/Nov/20  Resolved: 10/Jul/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-11814 conf-sanity test_93 osd_handler.c:713... Resolved
duplicates LU-12300 conf-sanity test 93: osd_handler.c:77... Resolved
Related
is related to LU-11089 Performance improvements for lu_objec... Resolved
is related to LU-13313 conf-sanity test_93: Crashed while pa... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for bfaccini <bruno.faccini@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/59eb46d2-3d9f-11e6-a0ce-5254006e85c2.

The sub-test test_93 failed with the following error:

test failed to respond and timed out

The test log indicates that onyx-32vm3 stops responding correctly after the parallel mount of mds2 and mds4:

== conf-sanity test 93: register mulitple MDT at the same time ======================================= 15:20:48 (1467152448)
Stopping clients: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2 /mnt/lustre (opts:)
CMD: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
if [ \$running -ne 0 ] ; then
echo Stopping client \$(hostname) /mnt/lustre opts:;
lsof /mnt/lustre || need_kill=no;
if [ x != x -a x\$need_kill != xno ]; then
    pids=\$(lsof -t /mnt/lustre | sort -u);
    if [ -n \"\$pids\" ]; then
             kill -9 \$pids;
    fi
fi;
while umount  /mnt/lustre 2>&1 | grep -q busy; do
    echo /mnt/lustre is still busy, wait one second && sleep 1;
done;
fi
Stopping clients: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2 /mnt/lustre2 (opts:)
CMD: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2 running=\$(grep -c /mnt/lustre2' ' /proc/mounts);
if [ \$running -ne 0 ] ; then
echo Stopping client \$(hostname) /mnt/lustre2 opts:;
lsof /mnt/lustre2 || need_kill=no;
if [ x != x -a x\$need_kill != xno ]; then
    pids=\$(lsof -t /mnt/lustre2 | sort -u);
    if [ -n \"\$pids\" ]; then
             kill -9 \$pids;
    fi
fi;
while umount  /mnt/lustre2 2>&1 | grep -q busy; do
    echo /mnt/lustre2 is still busy, wait one second && sleep 1;
done;
fi
CMD: onyx-32vm7 grep -c /mnt/lustre-mds1' ' /proc/mounts
CMD: onyx-32vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm3 grep -c /mnt/lustre-mds2' ' /proc/mounts
CMD: onyx-32vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm7 grep -c /mnt/lustre-mds3' ' /proc/mounts
CMD: onyx-32vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm3 grep -c /mnt/lustre-mds4' ' /proc/mounts
CMD: onyx-32vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost1' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost2' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost3' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost4' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost5' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost6' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost7' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 grep -c /mnt/lustre-ost8' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm2,onyx-32vm3,onyx-32vm7,onyx-32vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_hostid 
Loading modules from /usr/lib64/lustre
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
debug=-1
subsystem_debug=all -lnet -lnd -pinger
Formatting mgs, mds, osts
Format mds1: /dev/lvm-Role_MDS/P1
CMD: onyx-32vm7 grep -c /mnt/lustre-mds1' ' /proc/mounts
CMD: onyx-32vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm7 mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_MDS/P1

   Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity

device size = 2048MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_MDS/P1
	target name   lustre:MDT0000
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F /dev/lvm-Role_MDS/P1 50000
Writing CONFIGS/mountdata
Format mds2: /dev/lvm-Role_MDS/P2
CMD: onyx-32vm3 grep -c /mnt/lustre-mds2' ' /proc/mounts
CMD: onyx-32vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm3 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --mdt --index=1 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_MDS/P2

   Permanent disk data:
Target:     lustre:MDT0001
Index:      1
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity

device size = 2048MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_MDS/P2
	target name   lustre:MDT0001
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0001  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F /dev/lvm-Role_MDS/P2 50000
Writing CONFIGS/mountdata
Format mds3: /dev/lvm-Role_MDS/P3
CMD: onyx-32vm7 grep -c /mnt/lustre-mds3' ' /proc/mounts
CMD: onyx-32vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm7 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --mdt --index=2 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_MDS/P3

   Permanent disk data:
Target:     lustre:MDT0002
Index:      2
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity

device size = 2048MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_MDS/P3
	target name   lustre:MDT0002
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0002  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F /dev/lvm-Role_MDS/P3 50000
Writing CONFIGS/mountdata
Format mds4: /dev/lvm-Role_MDS/P4
CMD: onyx-32vm3 grep -c /mnt/lustre-mds4' ' /proc/mounts
CMD: onyx-32vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm3 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --mdt --index=3 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_MDS/P4

   Permanent disk data:
Target:     lustre:MDT0003
Index:      3
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity

device size = 2048MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_MDS/P4
	target name   lustre:MDT0003
	4k blocks     50000
	options        -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0003  -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_itable_init,lazy_journal_init -F /dev/lvm-Role_MDS/P4 50000
Writing CONFIGS/mountdata
Format ost1: /dev/lvm-Role_OSS/P1
CMD: onyx-32vm8 grep -c /mnt/lustre-ost1' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=0 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P1

   Permanent disk data:
Target:     lustre:OST0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P1
	target name   lustre:OST0000
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0000  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P1 50000
Writing CONFIGS/mountdata
Format ost2: /dev/lvm-Role_OSS/P2
CMD: onyx-32vm8 grep -c /mnt/lustre-ost2' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=1 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P2

   Permanent disk data:
Target:     lustre:OST0001
Index:      1
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P2
	target name   lustre:OST0001
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0001  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P2 50000
Writing CONFIGS/mountdata
Format ost3: /dev/lvm-Role_OSS/P3
CMD: onyx-32vm8 grep -c /mnt/lustre-ost3' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=2 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P3

   Permanent disk data:
Target:     lustre:OST0002
Index:      2
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P3
	target name   lustre:OST0002
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0002  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P3 50000
Writing CONFIGS/mountdata
Format ost4: /dev/lvm-Role_OSS/P4
CMD: onyx-32vm8 grep -c /mnt/lustre-ost4' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=3 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P4

   Permanent disk data:
Target:     lustre:OST0003
Index:      3
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P4
	target name   lustre:OST0003
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0003  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P4 50000
Writing CONFIGS/mountdata
Format ost5: /dev/lvm-Role_OSS/P5
CMD: onyx-32vm8 grep -c /mnt/lustre-ost5' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=4 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P5

   Permanent disk data:
Target:     lustre:OST0004
Index:      4
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P5
	target name   lustre:OST0004
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0004  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P5 50000
Writing CONFIGS/mountdata
Format ost6: /dev/lvm-Role_OSS/P6
CMD: onyx-32vm8 grep -c /mnt/lustre-ost6' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=5 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P6

   Permanent disk data:
Target:     lustre:OST0005
Index:      5
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P6
	target name   lustre:OST0005
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0005  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P6 50000
Writing CONFIGS/mountdata
Format ost7: /dev/lvm-Role_OSS/P7
CMD: onyx-32vm8 grep -c /mnt/lustre-ost7' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=6 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P7

   Permanent disk data:
Target:     lustre:OST0006
Index:      6
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P7
	target name   lustre:OST0006
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0006  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P7 50000
Writing CONFIGS/mountdata
Format ost8: /dev/lvm-Role_OSS/P8
CMD: onyx-32vm8 grep -c /mnt/lustre-ost8' ' /proc/mounts
CMD: onyx-32vm8 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
CMD: onyx-32vm8 mkfs.lustre --mgsnode=onyx-32vm7@tcp --fsname=lustre --ost --index=7 --param=sys.timeout=20 --backfstype=ldiskfs --device-size=200000 --mkfsoptions=\"-E lazy_itable_init\" --reformat /dev/lvm-Role_OSS/P8

   Permanent disk data:
Target:     lustre:OST0007
Index:      7
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.2.4.117@tcp sys.timeout=20

device size = 9912MB
formatting backing filesystem ldiskfs on /dev/lvm-Role_OSS/P8
	target name   lustre:OST0007
	4k blocks     50000
	options        -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:OST0007  -I 256 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E lazy_itable_init,resize="4290772992",lazy_journal_init -F /dev/lvm-Role_OSS/P8 50000
Writing CONFIGS/mountdata
start mds service on onyx-32vm7
CMD: onyx-32vm7 mkdir -p /mnt/lustre-mds1
CMD: onyx-32vm7 test -b /dev/lvm-Role_MDS/P1
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P1
Starting mds1:   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: onyx-32vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: onyx-32vm7 /usr/sbin/lctl get_param -n health_check
CMD: onyx-32vm7 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \"all -lnet -lnd -pinger\" 4 
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm7 sync; sync; sync
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Started lustre-MDT0000
start ost1 service on onyx-32vm8
CMD: onyx-32vm8 mkdir -p /mnt/lustre-ost1
CMD: onyx-32vm8 test -b /dev/lvm-Role_OSS/P1
CMD: onyx-32vm8 e2label /dev/lvm-Role_OSS/P1
Starting ost1:   /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
CMD: onyx-32vm8 mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/lustre-ost1
CMD: onyx-32vm8 /usr/sbin/lctl get_param -n health_check
CMD: onyx-32vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \"all -lnet -lnd -pinger\" 4 
CMD: onyx-32vm8 e2label /dev/lvm-Role_OSS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm8 e2label /dev/lvm-Role_OSS/P1 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm8 sync; sync; sync
CMD: onyx-32vm8 e2label /dev/lvm-Role_OSS/P1 2>/dev/null
Started lustre-OST0000
CMD: onyx-32vm7 /usr/sbin/lctl set_param fail_val = 10 fail_loc=0x8000090e
onyx-32vm7: error: set_param: setting /proc/sys/lnet/fail_val==: Invalid argument
onyx-32vm7: error: set_param: param_path '10': No such file or directory
mount lustre on /mnt/lustre.....
Starting client: onyx-32vm1.onyx.hpdd.intel.com:  -o user_xattr,flock onyx-32vm7@tcp:/lustre /mnt/lustre
CMD: onyx-32vm1.onyx.hpdd.intel.com mkdir -p /mnt/lustre
start mds service on onyx-32vm7
start mds service on onyx-32vm3
start mds service on onyx-32vm3
CMD: onyx-32vm1.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-32vm7@tcp:/lustre /mnt/lustre
CMD: onyx-32vm7 mkdir -p /mnt/lustre-mds3
CMD: onyx-32vm3 mkdir -p /mnt/lustre-mds2
CMD: onyx-32vm3 mkdir -p /mnt/lustre-mds4
CMD: onyx-32vm3 test -b /dev/lvm-Role_MDS/P2
CMD: onyx-32vm3 test -b /dev/lvm-Role_MDS/P4
CMD: onyx-32vm7 test -b /dev/lvm-Role_MDS/P3
CMD: onyx-32vm3 e2label /dev/lvm-Role_MDS/P4
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P3
CMD: onyx-32vm3 e2label /dev/lvm-Role_MDS/P2
Starting mds3:   /dev/lvm-Role_MDS/P3 /mnt/lustre-mds3
CMD: onyx-32vm7 mkdir -p /mnt/lustre-mds3; mount -t lustre   		                   /dev/lvm-Role_MDS/P3 /mnt/lustre-mds3
Starting mds4:   /dev/lvm-Role_MDS/P4 /mnt/lustre-mds4
CMD: onyx-32vm3 mkdir -p /mnt/lustre-mds4; mount -t lustre   		                   /dev/lvm-Role_MDS/P4 /mnt/lustre-mds4
Starting mds2:   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds2
CMD: onyx-32vm3 mkdir -p /mnt/lustre-mds2; mount -t lustre   		                   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds2
CMD: onyx-32vm7 /usr/sbin/lctl get_param -n health_check
CMD: onyx-32vm7 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \"all -lnet -lnd -pinger\" 4 
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P3 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P3 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: onyx-32vm7 sync; sync; sync
CMD: onyx-32vm7 e2label /dev/lvm-Role_MDS/P3 2>/dev/null
Started lustre-MDT0002
CMD: onyx-32vm7 lctl list_param osc.lustre-OST*-osc             > /dev/null 2>&1
CMD: onyx-32vm7 lctl get_param -n at_min
CMD: onyx-32vm7 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh wait_import_state FULL osc.lustre-OST0000-osc-MDT0000.ost_server_uuid 40 
onyx-32vm7: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
CMD: onyx-32vm3 /usr/sbin/lctl lustre_build_version
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
/usr/lib64/lustre/tests/test-framework.sh: line 382: ( << 16) | ( << 8) | : syntax error: operand expected (error token is "<< 16) | ( << 8) | ")
/usr/lib64/lustre/tests/test-framework.sh: line 5818: [: -le: unary operator expected
CMD: onyx-32vm3 /usr/sbin/lctl lustre_build_version
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
/usr/lib64/lustre/tests/test-framework.sh: line 382: ( << 16) | ( << 8) | : syntax error: operand expected (error token is "<< 16) | ( << 8) | ")
/usr/lib64/lustre/tests/test-framework.sh: line 5803: [: -gt: unary operator expected
CMD: onyx-32vm3 lctl get_param -n at_min
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
CMD: onyx-32vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh wait_import_state FULL osc.lustre-OST0000-osc-MDT0001.ost_server_uuid 40 
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
 conf-sanity test_93: @@@@@@ FAIL: import is not in FULL state 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4785:error()
  = /usr/lib64/lustre/tests/test-framework.sh:5976:_wait_osc_import_state()
  = /usr/lib64/lustre/tests/test-framework.sh:5991:wait_osc_import_state()
  = /usr/lib64/lustre/tests/conf-sanity.sh:6467:test_93()
  = /usr/lib64/lustre/tests/test-framework.sh:5049:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5088:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4935:run_test()
  = /usr/lib64/lustre/tests/conf-sanity.sh:6473:main()
Dumping lctl log to /logdir/test_logs/2016-06-28/lustre-reviews-el7-x86_64--review-dne-part-1--1_6_1__40104__-69939819083780-054108/conf-sanity.test_93.*.1467152498.log
CMD: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2,onyx-32vm3,onyx-32vm7,onyx-32vm8 /usr/sbin/lctl dk > /logdir/test_logs/2016-06-28/lustre-reviews-el7-x86_64--review-dne-part-1--1_6_1__40104__-69939819083780-054108/conf-sanity.test_93.debug_log.\$(hostname -s).1467152498.log;
         dmesg > /logdir/test_logs/2016-06-28/lustre-reviews-el7-x86_64--review-dne-part-1--1_6_1__40104__-69939819083780-054108/conf-sanity.test_93.dmesg.\$(hostname -s).1467152498.log
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
Resetting fail_loc on all nodes...CMD: onyx-32vm1.onyx.hpdd.intel.com,onyx-32vm2,onyx-32vm3,onyx-32vm7,onyx-32vm8 lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null || true
pdsh@onyx-32vm1: onyx-32vm3: mcmd: connect failed: Connection refused
done.

This occurs during conf-sanity test_93:

test_93() {
        [ $MDSCOUNT -lt 3 ] && skip "needs >= 3 MDTs" && return

        reformat
        #start mgs or mgs/mdt0
        if ! combined_mgs_mds ; then
                start_mgs
                start_mdt 1
        else
                start_mdt 1
        fi

        start_ost || error "OST0 start fail"

        #define OBD_FAIL_MGS_WRITE_TARGET_DELAY  0x90e
        do_facet mgs "$LCTL set_param fail_val = 10 fail_loc=0x8000090e"
        for num in $(seq 2 $MDSCOUNT); do
                start_mdt $num &    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
        done

        mount_client $MOUNT || error "mount client fails"
        wait_osc_import_state mds ost FULL
        wait_osc_import_state client ost FULL
        check_mount || error "check_mount failed"

        cleanup || error "cleanup failed with $?"
}
run_test 93 "register mulitple MDT at the same time"
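Incidentally, the `set_param` errors logged right after OST startup (`setting /proc/sys/lnet/fail_val==: Invalid argument` and `param_path '10': No such file or directory`) are caused by the stray spaces in `fail_val = 10` on the `do_facet mgs` line: the shell word-splits the string, so lctl receives `fail_val`, `=`, and `10` as three separate parameters. A small stand-alone sketch of that word-splitting (illustrative only; `show_args` is a hypothetical stand-in, not lctl):

```shell
#!/bin/sh
# Print each argument the way a command such as lctl would receive it.
show_args() {
    for a in "$@"; do
        printf '[%s] ' "$a"
    done
    printf '\n'
}

# As written in test_93: "fail_val = 10" splits into three arguments,
# none of which is a valid name=value pair.
show_args fail_val = 10 fail_loc=0x8000090e

# Without the spaces, each word is a well-formed name=value parameter.
show_args fail_val=10 fail_loc=0x8000090e
```

The intended form would therefore be `$LCTL set_param fail_val=10 fail_loc=0x8000090e`.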

The failure is caused by the following crash/LBUG found in the console log of onyx-32vm3 (the MDS):

15:21:36:[29395.748697] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4
15:21:36:[29395.753612] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2
15:21:36:[29396.019926] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P4
15:21:36:[29396.024718] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P2
15:21:36:[29396.306479] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2
15:21:36:[29396.311613] Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4
15:21:36:[29396.577947] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds2; mount -t lustre   		                   /dev/lvm-Role_MDS/P2 /mnt/lustre-mds2
15:21:36:[29396.594860] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds4; mount -t lustre   		                   /dev/lvm-Role_MDS/P4 /mnt/lustre-mds4
15:21:36:[29396.743622] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
15:21:36:[29396.750879] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: errors=remount-ro
15:21:36:[29396.909772] LustreError: 26347:0:(osd_handler.c:6468:osd_device_init0()) ASSERTION( info ) failed: 
15:21:36:[29396.912150] LustreError: 26347:0:(osd_handler.c:6468:osd_device_init0()) LBUG
15:21:36:[29396.915016] Pid: 26347, comm: mount.lustre
15:21:36:[29396.919614] 
15:21:36:[29396.919614] Call Trace:
15:21:36:[29396.922958]  [<ffffffffa05e67d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
15:21:36:[29396.925401]  [<ffffffffa05e6d75>] lbug_with_loc+0x45/0xc0 [libcfs]
15:21:36:[29396.927434]  [<ffffffffa0c24ccf>] osd_device_alloc+0x70f/0x880 [osd_ldiskfs]
15:21:36:[29396.929611]  [<ffffffffa07cd104>] obd_setup+0x114/0x2a0 [obdclass]
15:21:36:[29396.931618]  [<ffffffffa07cfb54>] class_setup+0x2f4/0x8d0 [obdclass]
15:21:36:[29396.933586]  [<ffffffffa07d3ee7>] class_process_config+0x1de7/0x2f70 [obdclass]
15:21:36:[29396.935800]  [<ffffffffa05f1957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
15:21:36:[29396.937938] LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
15:21:36:[29396.937939]  [<ffffffffa07dcb69>] do_lcfg+0x159/0x5d0 [obdclass]
15:21:36:[29396.937954]  [<ffffffffa07dd928>] lustre_start_simple+0x88/0x210 [obdclass]
15:21:36:[29396.937972]  [<ffffffffa0808ac4>] server_fill_super+0xf24/0x184c [obdclass]
15:21:36:[29396.937977]  [<ffffffffa05f1957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
15:21:36:[29396.937991]  [<ffffffffa07e09e8>] lustre_fill_super+0x328/0x950 [obdclass]
15:21:36:[29396.938013]  [<ffffffffa07e06c0>] ? lustre_fill_super+0x0/0x950 [obdclass]
15:21:36:[29396.938019]  [<ffffffff811e1f2d>] mount_nodev+0x4d/0xb0
15:21:36:[29396.938033]  [<ffffffffa07d8918>] lustre_mount+0x38/0x60 [obdclass]
15:21:36:[29396.938034]  [<ffffffff811e28d9>] mount_fs+0x39/0x1b0
15:21:36:[29396.938038]  [<ffffffff811fe1af>] vfs_kern_mount+0x5f/0xf0
15:21:36:[29396.938039]  [<ffffffff812006fe>] do_mount+0x24e/0xa40
15:21:36:[29396.938043]  [<ffffffff8116e15e>] ? __get_free_pages+0xe/0x50
15:21:36:[29396.938044]  [<ffffffff81200f86>] SyS_mount+0x96/0xf0
15:21:36:[29396.938048]  [<ffffffff816463c9>] system_call_fastpath+0x16/0x1b
15:21:36:[29396.938048] 
15:21:36:[29396.968797] Kernel panic - not syncing: LBUG
15:21:36:[29396.969781] CPU: 0 PID: 26347 Comm: mount.lustre Tainted: G           OE  ------------   3.10.0-327.18.2.el7_lustre.x86_64 #1
15:21:36:[29396.969781] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
15:21:36:[29396.969781]  ffffffffa0603def 0000000048fb9a4f ffff880039073950 ffffffff81635c14
15:21:36:[29396.969781]  ffff8800390739d0 ffffffff8162f48a ffffffff00000008 ffff8800390739e0
15:21:36:[29396.969781]  ffff880039073980 0000000048fb9a4f ffffffffa0c511a0 0000000000000246
15:21:36:[29396.969781] Call Trace:
15:21:36:[29396.969781]  [<ffffffff81635c14>] dump_stack+0x19/0x1b
15:21:36:[29396.969781]  [<ffffffff8162f48a>] panic+0xd8/0x1e7
15:21:36:[29396.969781]  [<ffffffffa05e6ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
15:21:36:[29396.969781]  [<ffffffffa0c24ccf>] osd_device_alloc+0x70f/0x880 [osd_ldiskfs]
15:21:36:[29396.969781]  [<ffffffffa07cd104>] obd_setup+0x114/0x2a0 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa07cfb54>] class_setup+0x2f4/0x8d0 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa07d3ee7>] class_process_config+0x1de7/0x2f70 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa05f1957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
15:21:36:[29396.969781]  [<ffffffffa07dcb69>] do_lcfg+0x159/0x5d0 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa07dd928>] lustre_start_simple+0x88/0x210 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa0808ac4>] server_fill_super+0xf24/0x184c [obdclass]
15:21:36:[29396.969781]  [<ffffffffa05f1957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
15:21:36:[29396.969781]  [<ffffffffa07e09e8>] lustre_fill_super+0x328/0x950 [obdclass]
15:21:36:[29396.969781]  [<ffffffffa07e06c0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
15:21:36:[29396.969781]  [<ffffffff811e1f2d>] mount_nodev+0x4d/0xb0
15:21:36:[29396.969781]  [<ffffffffa07d8918>] lustre_mount+0x38/0x60 [obdclass]
15:21:36:[29396.969781]  [<ffffffff811e28d9>] mount_fs+0x39/0x1b0
15:21:36:[29396.969781]  [<ffffffff811fe1af>] vfs_kern_mount+0x5f/0xf0
15:21:36:[29396.969781]  [<ffffffff812006fe>] do_mount+0x24e/0xa40
15:21:36:[29396.969781]  [<ffffffff8116e15e>] ? __get_free_pages+0xe/0x50
15:21:36:[29396.969781]  [<ffffffff81200f86>] SyS_mount+0x96/0xf0
15:21:36:[29396.969781]  [<ffffffff816463c9>] system_call_fastpath+0x16/0x1b

Info required for matching: conf-sanity 93



 Comments   
Comment by Bruno Faccini (Inactive) [ 29/Jun/16 ]

Looks like a new race scenario during parallel mount, but at the feature/service layer (i.e., not at the target layer like LU-5299/LU-5573/LU-6553).
Will have a look at the crash dump to see how this happens and how it can be fixed.

Comment by Jian Yu [ 26/Jul/16 ]

More failure instances on master branch:
https://testing.hpdd.intel.com/test_sets/bdb7e32e-533d-11e6-b2ba-5254006e85c2
https://testing.hpdd.intel.com/test_sets/b7749b9e-52dc-11e6-8968-5254006e85c2

This is affecting patch review testing on master branch.

Comment by nasf (Inactive) [ 12/Dec/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/1ba56f60-bfa4-11e6-bedd-5254006e85c2

Comment by nasf (Inactive) [ 15/Mar/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/171abdb8-090c-11e7-9053-5254006e85c2

Comment by Gerrit Updater [ 21/Mar/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/26099
Subject: LU-8346 obdclass: guarantee all keys filled
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 12223b3133d3651dee29dfd940ca3c4f0e256cfd

Comment by Jian Yu [ 05/May/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/a7d8dad6-30a5-11e7-8847-5254006e85c2

Comment by Gerrit Updater [ 12/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26099/
Subject: LU-8346 obdclass: guarantee all keys filled
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e58f8d609a81576eaf5bc9d0fa53bef274a01bfc

Comment by Peter Jones [ 12/May/17 ]

Landed for 2.10

Comment by Bruno Faccini (Inactive) [ 01/Jun/17 ]

Well, too bad but looks like I have triggered a new occurrence (https://testing.hpdd.intel.com/test_sets/6cf13ba8-46a7-11e7-bc6c-5254006e85c2), even with https://review.whamcloud.com/26099 applied.

Comment by Peter Jones [ 05/Jun/17 ]

It seems that the attempt to fix this issue has not been successful, as the frequency of occurrence is similar to before the fix. The frequency is not high enough that we need to address this for 2.10 at this late stage.

Comment by Gerrit Updater [ 05/Jun/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/27426
Subject: LU-8346 obd: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 40d89c405dd84bf4f36553ef72bedf49c18da956

Comment by Gerrit Updater [ 06/Jun/17 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/27448
Subject: LU-8346 obdclass: protect key_set_version
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 725fcbcaa1da4ffa554764a19c467394e5f71024

Comment by Gerrit Updater [ 11/Jul/17 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/27994
Subject: LU-8346 obdclass: Set lc_version
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b432e08231b73c7b6c3a5e6fb5ab03a8de1e1778

Comment by Gerrit Updater [ 29/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27994/
Subject: LU-8346 obdclass: Set lc_version
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 96f3fb788c230872e6d31185367a55ec3c4fedbc

Comment by Gerrit Updater [ 07/Aug/17 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/28405
Subject: LU-8346 obdclass: Set lc_version
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 6225c3de5efeab340bba61895682923193c75821

Comment by Gerrit Updater [ 21/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28405/
Subject: LU-8346 obdclass: Set lc_version
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 6aabd4a2760f1d42a788f6ad8712abdece7d1159

Comment by Peter Jones [ 21/Aug/17 ]

Hongchao

Does https://review.whamcloud.com/#/c/27448/ still need to land or can it be abandoned?

Peter

Comment by Hongchao Zhang [ 22/Aug/17 ]

Hi Peter,

I think https://review.whamcloud.com/#/c/27448/ is still needed.

Comment by Andreas Dilger [ 26/Oct/17 ]

Hit this on master: https://testing.hpdd.intel.com/sub_tests/c3aecd26-b96a-11e7-8afb-52540065bddc

Comment by Jian Yu [ 21/Nov/17 ]

More failure instances on master branch:
https://testing.hpdd.intel.com/test_sets/25246a58-ce73-11e7-a066-52540065bddc
https://testing.hpdd.intel.com/test_sets/b7d3e026-cbcc-11e7-8027-52540065bddc
https://testing.hpdd.intel.com/test_sets/be17cd80-cb13-11e7-9840-52540065bddc

Comment by Gerrit Updater [ 25/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27448/
Subject: LU-8346 obdclass: protect key_set_version
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4538de675cc1ee05408fa912e71c65d9109d7027

Comment by Peter Jones [ 25/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 25/Jan/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31017
Subject: LU-8346 obdclass: protect key_set_version
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 69c783d5c9934f9e5e0f59dee0ab9445bd2e6e3e

Comment by Hongchao Zhang [ 30/Jan/18 ]

https://testing.hpdd.intel.com/test_sessions/80fd399a-e6ad-40b4-8624-6ea2b73c1fd6

Comment by Gerrit Updater [ 30/Jan/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31084
Subject: LU-8346 obdclass: debug patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3d738f569ac354563116f91f6c32f611ae6b1b54

Comment by Gerrit Updater [ 09/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31017/
Subject: LU-8346 obdclass: protect key_set_version
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 823e1549f109412db4a8cb31e648819660c5f7b8

Comment by Hongchao Zhang [ 12/Apr/18 ]

This issue can be reproduced by running "rmmod -w osd-ldiskfs" while mounting the MDT or OST: the osd-ldiskfs module
is marked "MODULE_STATE_GOING" and is therefore skipped in "keys_fill", called by "lu_env_refill".
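
The skip described above can be sketched in shell (illustrative model only; the names `module_state`, `slot`, and this `keys_fill` are stand-ins for the kernel structures, not the real code):

```shell
#!/usr/bin/env bash
# Model of the race: keys_fill skips any key whose owning module is
# marked GOING, leaving an empty slot that the later ASSERTION(info)
# in osd_device_init0() trips over.
declare -A module_state=([osd_ldiskfs]=GOING [obdclass]=LIVE)
declare -A slot

keys_fill() {
    local key
    for key in "${!module_state[@]}"; do
        if [ "${module_state[$key]}" = GOING ]; then
            continue   # skipped: module is unloading, slot stays empty
        fi
        slot[$key]=filled
    done
}

keys_fill
for key in osd_ldiskfs obdclass; do
    echo "$key: ${slot[$key]:-EMPTY}"
done
```

In the kernel, the empty slot corresponds to `info` being NULL, which is exactly what the LBUG in the stack trace above asserts against.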

Comment by Gerrit Updater [ 12/Apr/18 ]

Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: https://review.whamcloud.com/31971
Subject: LU-8346 osd-ldiskfs: don't assert if module is going
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b6bbcddb2dec31dc6019fc8177c35424a957ed22

Comment by Gerrit Updater [ 01/Feb/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34155
Subject: LU-8346 tests: remove spaces around fail_val
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8d2909dab2e8af0d3301db14dec175a498d5f63b

Comment by James Nunez (Inactive) [ 01/Feb/19 ]

The patch https://review.whamcloud.com/34155 fixes the problem with setting fail_val in conf-sanity test 93, where the spaces around "=" prevented the value from being set:
do_facet mgs "$LCTL set_param fail_val = 10 fail_loc=0x8000090e"
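
A minimal shell illustration (not from the test suite) of why the spaces matter: the shell word-splits `fail_val = 10` into three separate arguments, so the parameter name and its value never arrive as one `key=value` token.

```shell
# Demonstrate the word-splitting that broke the command above:
# with spaces, "fail_val = 10" becomes three arguments.
set -- fail_val = 10
echo "with spaces:    $# arguments"      # prints 3
set -- fail_val=10
echo "without spaces: $# arguments"      # prints 1
```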

Comment by Gerrit Updater [ 11/Feb/19 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34226
Subject: LU-8346 tests: remove spaces around fail_val
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: dbb3a02fef7332b126668bb5b2d3066d77243f90

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34155/
Subject: LU-8346 tests: remove spaces around fail_val
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 59cb4a5c39e2c85a89be2863a73899c02c9a89c3

Comment by Gerrit Updater [ 19/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34226/
Subject: LU-8346 tests: remove spaces around fail_val
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 430b20be17645989a51fb586824f7637535ff24e

Comment by James A Simmons [ 25/Mar/20 ]

Is this work done?

Comment by Andreas Dilger [ 04/Apr/20 ]

I searched back to the start of the year, and there were two timeouts for this test in the past 3 months, so this isn't really a high priority to fix:
2020-03-02 https://testing.whamcloud.com/test_sets/ddf5643a-c7b9-4b0b-9b86-2adfe74817d3
2020-02-24 https://testing.whamcloud.com/test_sets/b0e66f0e-c3d4-49e9-9024-9e7910dd3d12

Comment by Antoine Percher [ 27/Oct/20 ]

It seems that patch 26099 has a bad effect on parallel mounts on the Lustre client.

On a client node with Lustre 2.12.5, after mounting the same filesystem twice
in parallel and then unmounting both filesystems, it is impossible to remove the
lustre module from the kernel.

fstab:
<serv1@ib1>:<serv2@ib1>:/fs1 /mnt/fs1 lustre defaults,_netdev,noauto,x-systemd.requires=lnet.service,flock,user_xattr,nosuid 0 0
<serv1@ib1>:<serv2@ib1>:/fs1/home /mnt/home lustre defaults,_netdev,noauto,x-systemd.requires=lnet.service,flock,user_xattr,nosuid 0 0

systemctl start lnet
modprobe lustre
mount /mnt/home & mount /mnt/fs1
umount /mnt/home
umount /mnt/fs1
rmmod lustre    <- hangs

The rmmod stack in the kernel is:

#0 __schedule
#1 schedule
#2 lu_context_key_degister [obdclass]
#3 lu_context_key_degister_many [obdclass]
#4 vvp_global_fini [lustre]
#5 lustre_exit [lustre]
#6 __x64_sys_delete_module
#7 do_syscall
#8 entry_SYSCALL_64_after_hwframe
crash> p vvp_thread_key.lct_used.counter
$1 = 105
crash> p vvp_session_key.lct_used.counter
$2 = 51
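
A toy shell model (illustrative only; the loop body is hypothetical) of the hang above: key deregistration cannot return until the key's `lct_used` counter reaches zero, so the references leaked by the parallel mounts (105 and 51 in the crash output) block rmmod indefinitely.

```shell
# Stand-in for the leaked reference count; in the real bug this is 105.
lct_used=2

# Model of lu_context_key_degister: wait for lct_used to drop to zero.
wait_for_key_users() {
    while [ "$lct_used" -gt 0 ]; do
        # In the broken case nothing ever decrements the counter and
        # rmmod sits in schedule() forever; here we decrement so the
        # demo terminates.
        lct_used=$((lct_used - 1))
    done
    echo "degister complete (lct_used=0)"
}

wait_for_key_users
```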

Comment by Etienne Aujames [ 03/Nov/20 ]

@Antoine Percher I have created LU-14110 to track your issue. It also affects the master branch (with fewer occurrences).

Generated at Sat Feb 10 02:16:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.