Lustre / LU-10898

conf-sanity test 32a and 32d fail with ‘rmmod: ERROR: Module zfs is in use’

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.12.0, Lustre 2.10.5
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.5
    • Severity: 3

      Description

      The logs at https://testing.hpdd.intel.com/test_sets/6f53a458-3c92-11e8-8f8a-52540065bddc show conf-sanity test_32a and test_32d failing. After 19 attempts to rmmod the ZFS module, the client test_log contains the following:

      trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
      trevis-49vm4: rmmod: ERROR: Module zfs is in use
      CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_mem_leak
      trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing check_mem_leak
      Unloading modules on trevis-49vm4: Attempt 19
      CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh /usr/sbin/lustre_rmmod zfs
      trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
      trevis-49vm4: rmmod: ERROR: Module zfs is in use
      CMD: trevis-49vm4 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_mem_leak
      trevis-49vm4: trevis-49vm4.trevis.hpdd.intel.com: executing check_mem_leak
      Unloading modules on trevis-49vm4: Given up
       conf-sanity test_32a: @@@@@@ FAIL: Reloading modules
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5726:error_noexit()
        = /usr/lib64/lustre/tests/conf-sanity.sh:2292:t32_test()
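
When rmmod reports that a module is in use, the kernel's reference count and holder list for that module are usually the first things worth checking on the affected node. The following is a minimal sketch of such a check, not part of the Lustre test framework. It reads the standard sysfs entries `/sys/module/<name>/refcnt` and `/sys/module/<name>/holders/`; the function name `module_usage` and the `SYSFS_MODULE_ROOT` override are hypothetical and exist only for illustration and testing.

```shell
#!/bin/bash
# Sketch: report why a kernel module may refuse to unload.
# SYSFS_MODULE_ROOT is a hypothetical override (for testing only);
# on a real node the default /sys/module is used.

module_usage() {
    local mod=$1
    local root=${SYSFS_MODULE_ROOT:-/sys/module}
    local dir="$root/$mod"

    if [ ! -d "$dir" ]; then
        echo "$mod: not loaded"
        return 0
    fi

    # refcnt is the kernel's reference count for the module; rmmod
    # refuses to unload the module while it is non-zero.
    local refcnt
    refcnt=$(cat "$dir/refcnt" 2>/dev/null || echo "unknown")

    # holders/ lists other modules that depend on this one. It can be
    # empty even when refcnt > 0, for example when the reference is
    # held by something other than a module.
    local holders=""
    if [ -d "$dir/holders" ]; then
        holders=$(ls "$dir/holders" | tr '\n' ' ')
    fi

    echo "$mod: refcnt=$refcnt holders=${holders% }"
}
```

Running `module_usage zfs` on trevis-49vm4 between unload attempts would show whether another module still holds zfs or whether the reference comes from elsewhere. These logs do not say which of the two applies here.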
      

       

      The console log on vm4, the MDS, shows some errors before the attempt to unload the ZFS module:

      [ 7085.963959] Lustre: DEBUG MARKER: mount -t lustre -onomgs -omgsnode=10.9.6.58@tcp t32fs-ost1/ost1 /tmp/t32/mnt/ost
      [ 7086.669656] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n obdfilter.t32fs-OST0000.uuid
      [ 7087.000523] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.osc.max_dirty_mb=15
      [ 7087.335563] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-OST0000.failover.node=10.9.6.58@tcp
      [ 7087.663836] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight=9
      [ 7087.993803] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.failover.node=10.9.6.58@tcp
      [ 7088.322614] Lustre: DEBUG MARKER: /usr/sbin/lctl pool_new t32fs.interop
      [ 7093.175575] LustreError: 9067:0:(mgc_request.c:1576:mgc_apply_recover_logs()) mgc: cannot find uuid by nid 10.9.6.58@tcp
      [ 7093.177860] Lustre: 9067:0:(mgc_request.c:1802:mgc_process_recover_nodemap_log()) MGC10.9.6.58@tcp: error processing recovery log t32fs-mdtir: rc = -2
      [ 7093.181698] LustreError: 9067:0:(mgc_request.c:2132:mgc_process_log()) MGC10.9.6.58@tcp: recover log t32fs-mdtir failed, not fatal: rc= -2
      [ 7093.187057] Lustre: 10864:0:(obd_mount.c:972:lustre_check_exclusion()) Excluding t32fs-OST0000-osc-MDT0000 (on exclusion list)
      [ 7093.191117] LustreError: 10864:0:(obd_config.c:1501:class_process_proc_param()) t32fs-OST0000-osc-MDT0000: unknown config parameter 'osc.max_dirty_mb=15'
      [ 7094.656698] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.lov.stripesize=4M
      [ 7094.993137] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdd.atime_diff=70
      [ 7095.327475] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/mdt
      [ 7095.500065] Lustre: Failing over t32fs-MDT0000
      [ 7095.809239] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/ost
      [ 7102.179414] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
      [ 7102.796961] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-49vm4.trevis.hpdd.intel.com: executing \/usr\/sbin\/lustre_rmmod zfs
      [ 7102.801643] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-49vm4.trevis.hpdd.intel.com: executing \/usr\/sbin\/lustre_rmmod zfs
      [ 7102.998133] Lustre: DEBUG MARKER: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
      [ 7103.010598] Lustre: DEBUG MARKER: trevis-49vm4.trevis.hpdd.intel.com: executing /usr/sbin/lustre_rmmod zfs
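
Most lines in a console log like the one above are DEBUG MARKER entries written by the test framework, which makes the actual kernel messages easy to miss. As a rough sketch, a filter along the following lines drops the markers and keeps the remaining Lustre and LustreError messages. The function name `console_errors` is hypothetical, and the patterns are assumptions based only on the line formats visible in this excerpt.

```shell
#!/bin/bash
# Sketch: print Lustre console messages that are not test-framework
# markers. Reads the named files, or stdin when no file is given.

console_errors() {
    # First drop the "DEBUG MARKER" lines, then keep LustreError
    # lines and other "Lustre: " messages.
    grep -v 'Lustre: DEBUG MARKER:' "$@" | grep -E 'LustreError:|Lustre: '
}
```

Applied to the excerpt above, this would leave the mgc_request.c, obd_mount.c and obd_config.c messages and the "Failing over t32fs-MDT0000" line, which are the entries that precede the failed unload.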
      

       

      These tests appear to have started failing during testing for LU-8066 on 2018-04-10 01:04:07 UTC.

       

      Logs for these test failures are at:

      https://testing.hpdd.intel.com/test_sets/6f53a458-3c92-11e8-8f8a-52540065bddc

      https://testing.hpdd.intel.com/test_sets/32358bce-3cb3-11e8-960d-52540065bddc

      https://testing.hpdd.intel.com/test_sets/827118f8-3c8e-11e8-8f8a-52540065bddc

       

              People

              • Assignee: Nathaniel Clark (utopiabound)
              • Reporter: James Nunez (jamesanunez)