Details
Bug
Resolution: Fixed
Major
Lustre 2.12.0, Lustre 2.10.5
3
Description
Looking at the logs at https://testing.hpdd.intel.com/test_sets/6f53a458-3c92-11e8-8f8a-52540065bddc, we see that conf-sanity test_32a and test_32d fail. The client test_log shows the following after the framework tries to rmmod the ZFS module 19 times and gives up:
trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
trevis-49vm4: rmmod: ERROR: Module zfs is in use
CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_mem_leak
trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing check_mem_leak
Unloading modules on trevis-49vm4: Attempt 19
CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh /usr/sbin/lustre_rmmod zfs
trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
trevis-49vm4: rmmod: ERROR: Module zfs is in use
CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_mem_leak
trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing check_mem_leak
Unloading modules on trevis-49vm4: Given up
conf-sanity test_32a: @@@@@@ FAIL: Reloading modules
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5726:error_noexit()
= /usr/lib64/lustre/tests/conf-sanity.sh:2292:t32_test()
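The "Module zfs is in use" error does not say what is keeping the module pinned. Below is a minimal Python sketch for checking that on the node, assuming the standard Linux sysfs layout. In that layout, /sys/module/<name>/refcnt holds the module's reference count and /sys/module/<name>/holders/ lists the modules that depend on it. module_usage is a hypothetical helper, not part of the Lustre test framework:

```python
from pathlib import Path
from typing import List, Optional, Tuple


def module_usage(name: str, sysfs_root: str = "/sys/module") -> Tuple[Optional[int], List[str]]:
    """Return (refcnt, holders) for a loaded kernel module (hypothetical helper).

    refcnt is None when the kernel does not expose it (e.g. module unloading
    is disabled); holders lists modules that depend on `name`.
    Raises FileNotFoundError if the module is not loaded.
    """
    mod_dir = Path(sysfs_root) / name
    if not mod_dir.is_dir():
        raise FileNotFoundError(f"module {name!r} is not loaded")

    # Reference count: how many users currently hold the module.
    refcnt_file = mod_dir / "refcnt"
    refcnt = int(refcnt_file.read_text().strip()) if refcnt_file.exists() else None

    # Holders: one entry per module that depends on this one.
    holders_dir = mod_dir / "holders"
    holders = sorted(p.name for p in holders_dir.iterdir()) if holders_dir.is_dir() else []
    return refcnt, holders


if __name__ == "__main__":
    try:
        count, users = module_usage("zfs")
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"zfs refcnt={count} holders={users or 'none'}")
```

Running this between unload attempts shows whether a dependent module or some other reference is holding zfs. A nonzero refcnt with an empty holders list suggests the reference comes from something other than a dependent module. That is a hint about where to look, not a diagnosis of this failure.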
Looking at the console log on vm4 (the MDS), we see some errors before the attempt to unload the ZFS module:
[ 7085.963959] Lustre: DEBUG MARKER: mount -t lustre -onomgs -omgsnode=10.9.6.58@tcp t32fs-ost1/ost1 /tmp/t32/mnt/ost
[ 7086.669656] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid
[ 7087.000523] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
[ 7087.335563] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.6.58@tcp
[ 7087.663836] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
[ 7087.993803] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.6.58@tcp
[ 7088.322614] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
[ 7093.175575] LustreError: 9067:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.9.6.58@tcp
[ 7093.177860] Lustre: 9067:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.9.6.58@tcp: error processing recovery log t32fs-mdtir: rc = -2
[ 7093.181698] LustreError: 9067:0:(mgc_request.c:2132:mgc_process_log()) MGC10.9.6.58@tcp: recover log t32fs-mdtir failed, not fatal: rc= -2
[ 7093.187057] Lustre: 10864:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000-osc-MDT0000 (on exclusion list)
[ 7093.191117] LustreError: 10864:0:(obd_config.c:1501:class_process_proc_param()) t32fs-OST0000-osc-MDT0000: unknown config parameter 'osc.max_dirty_mb=15'
[ 7094.656698] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
[ 7094.993137] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
[ 7095.327475] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/mdt
[ 7095.500065] Lustre: Failing over t32fs-MDT0000
[ 7095.809239] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/ost
[ 7102.179414] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
[ 7102.796961] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-49vm4.trevis.hpdd.intel.com: executing \/usr\/sbin\/lustre_rmmod zfs
[ 7102.801643] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-49vm4.trevis.hpdd.intel.com: executing \/usr\/sbin\/lustre_rmmod zfs
[ 7102.998133] Lustre: DEBUG MARKER: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
[ 7103.010598] Lustre: DEBUG MARKER: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
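To find the errors that precede the unload in a long console log, a small filter can collect error lines and stop at the first DEBUG MARKER that mentions lustre_rmmod. This is a hypothetical Python helper (errors_before_unload is not an existing tool). It assumes the raw console log, with one kernel message per line. It also collects only lines tagged LustreError:, so a message logged at the Lustre: level, such as the mgc_process_recover_nodemap_log() one, is not included:

```python
import re
from typing import Iterable, List

# Kernel timestamp prefix such as "[ 7093.175575] ".
_TS = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def errors_before_unload(lines: Iterable[str], marker: str = "lustre_rmmod") -> List[str]:
    """Return LustreError lines seen before the first DEBUG MARKER containing `marker`."""
    found = []
    for line in lines:
        body = _TS.sub("", line, count=1)
        # The test framework logs a DEBUG MARKER when it starts unloading modules.
        if body.startswith("Lustre: DEBUG MARKER:") and marker in body:
            break
        if body.startswith("LustreError:"):
            found.append(line.rstrip("\n"))
    return found
```

For example, errors_before_unload(open("console.log")) would return the LustreError lines logged before the unload began. The file name here is only illustrative.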
It looks like these tests started failing during testing for LU-8066 on 2018-04-10 01:04:07 UTC.
Logs for these test failures are at:
https://testing.hpdd.intel.com/test_sets/6f53a458-3c92-11e8-8f8a-52540065bddc
https://testing.hpdd.intel.com/test_sets/32358bce-3cb3-11e8-960d-52540065bddc
https://testing.hpdd.intel.com/test_sets/827118f8-3c8e-11e8-8f8a-52540065bddc