Lustre / LU-7776

lustre-single lnet-selftest test failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Labels: None
    • Environment: Solo setup
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      The lnet-selftest test fails during test setup.

      stdout.log
        1 UP mgs MGS MGS 5
        2 UP mgc MGC192.168.108.18@tcp c0ab2420-8f51-ad18-f779-591cad596879 5
        3 UP mds MDS MDS_uuid 3
        4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
        5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 11
        6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
        7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
        8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
        9 UP osd-ldiskfs lustre-OST0000-osd lustre-OST0000-osd_UUID 5
       10 UP ost OSS OSS_uuid 3
       11 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 7
       12 UP lwp lustre-MDT0000-lwp-OST0000 lustre-MDT0000-lwp-OST0000_UUID 5
       13 UP osd-ldiskfs lustre-OST0001-osd lustre-OST0001-osd_UUID 5
       14 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 7
       15 UP lwp lustre-MDT0000-lwp-OST0001 lustre-MDT0000-lwp-OST0001_UUID 5
       21 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
       22 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
      Modules still loaded: 
      lustre/osp/osp.o lustre/lod/lod.o lustre/ost/ost.o lustre/mdt/mdt.o lustre/mdd/mdd.o lustre/mgs/mgs.o ldiskfs/ldiskfs.o lustre/quota/lquota.o lustre/lfsck/lfsck.o lustre/mgc/mgc.o lustre/fid/fid.o lustre/fld/fld.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
      
      


        Activity


          Patch has landed to master for 2.9.0.

          jgmitter Joseph Gmitter (Inactive) added the comment above.

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19308/
          Subject: LU-7776 tests: lnet-selftest.sh local_mode failure
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 84030bf26c1763edf9ac17a8cd2765e9163294bf

          gerrit Gerrit Updater added the comment above.

          @Andreas Dilger: Hi Andreas, I have uploaded the discussed patch and the test run results were fine. Could you and the others please review it?

          abrarahmed Abrar-ahmed (Inactive) added the comment above.

          Abrarahmed Momin (kais_abrar@yahoo.co.in) uploaded a new patch: http://review.whamcloud.com/19308
          Subject: LU-7776 tests: lnet-selftest.sh local_mode failure
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 4efad14f8c85d5346aed97a047d9e5681c1792e5

          gerrit Gerrit Updater added the comment above.

          Looks reasonable. If this patch works for you, then you can submit it so that it can be tested.

          adilger Andreas Dilger added the comment above.

          @Andreas Dilger

          An alternative solution that keeps the debug patch functionality would be to avoid calling cleanupall on local_mode setups, something like the following:

          -	local_mode && CLIENTONLY=yes
          +	if local_mode; then
          +		CLIENTONLY=yes
          +		stopall
          +	else
          +		LOAD_MODULES_REMOTE=true
          +		cleanupall
          +	fi
          

          Let me know which solution works for you, or if you want to suggest alternatives. I can upload a patch for it.
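The branching in the proposed hunk can be exercised in isolation with stub functions. This is a minimal sketch under loud assumptions: local_mode, stopall and cleanupall here are hypothetical stubs that only echo their names, LOCAL_MODE is an invented knob for picking the branch, and setup_for_selftest is an invented wrapper. The real helpers in the Lustre test framework do actual cleanup work and decide local mode differently.

```shell
#!/bin/bash
# Hypothetical stubs; the real helpers perform actual cleanup.
local_mode() { [ "$LOCAL_MODE" = yes ]; }   # invented knob, not a real Lustre setting
stopall()    { echo "stopall"; }
cleanupall() { echo "cleanupall"; }

# Invented wrapper around the branch proposed in the hunk above.
setup_for_selftest() {
	if local_mode; then
		# Solo setup: stop Lustre but do not try to unload modules.
		CLIENTONLY=yes
		stopall
	else
		# Multi-node setup: keep the debug patch behaviour.
		LOAD_MODULES_REMOTE=true
		cleanupall
	fi
}

LOCAL_MODE=yes
setup_for_selftest   # prints "stopall"
LOCAL_MODE=no
setup_for_selftest   # prints "cleanupall"
```

Note that the local branch still sets CLIENTONLY=yes before calling stopall, so on a solo setup only the clients are stopped: the real stopall() returns at its `[ "$CLIENTONLY" ] && return` check, and no module unload is attempted.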

          abrarahmed Abrar-ahmed (Inactive) added the comment above.

          @Andreas Dilger

          Here is my understanding of the debug patch submitted as commit a8ba5c645f91faf86a84c99dd2cc049bc54e12b1.
          The debug patch replaced stopall with cleanupall. In addition to unmounting the clients and stopping the servers, cleanupall also unloads the modules, which I believe was the intended purpose of the debug patch. Please correct me if my understanding is wrong.
          The relevant section of the debug patch is quoted below:

          -    local_mode && CLIENTONLY=yes
          -    stopall
          -    RESTORE_MOUNT=yes
          +	local_mode && CLIENTONLY=yes
          +	RESTORE_MOUNT=yes
          +	LOAD_MODULES_REMOTE=true
          +	cleanupall
          

          So changing cleanupall back to stopall would functionally revert the debug patch. Would this not cause your test setup to fail again?

          abrarahmed Abrar-ahmed (Inactive) added the comment above.

          I don't think that reverting the patch is a good idea, since I believe this will cause lnet-selftest to begin failing again in our test configuration.

          Instead, I think it should be enough to change the "cleanupall" to "stopall" so that it doesn't try to unload the modules, which isn't necessary. The goal of the LU-4181 patch was to stop the clients so that they would not interfere with the testing, or become disconnected when lnet-selftest was saturating the network.

          adilger Andreas Dilger added the comment above.

          abrarahmed Abrar-ahmed (Inactive) added a comment (edited):

          The lnet-selftest.sh script fails while executing cleanupall() during test setup, because cleanupall() tries to remove modules that are still in use. This happens on a solo setup, where local_mode returns true and CLIENTONLY is therefore set. cleanupall() internally calls stopall(), which checks CLIENTONLY and, if it is set, returns early without stopping the MGS, MDS and OST. cleanupall() then fails at a later stage when it tries to remove the modules that those servers still hold.

          stopall() {
          	...
          	[ "$CLIENTONLY" ] && return
          	...
          }

          The change history shows that this regression was introduced by a debug patch, http://review.whamcloud.com/12469 (LU-4181 tests: cleanup lustre before starting lnet-selftest.sh).
          Since the discussion on LU-4181 indicates that the change was made for debugging purposes and that removing the modules was not necessary, I propose reverting this change to resolve the bug.
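The failure sequence described in this comment can be modelled with a small, self-contained shell sketch. It only borrows the names stopall, cleanupall and CLIENTONLY from the discussion; the function bodies are hypothetical simplifications, not the real test-framework.sh code. SERVERS_RUNNING is an invented flag standing in for "server targets are still mounted", and unload_modules here is a simplified stand-in for the real module-removal step.

```shell
#!/bin/bash
# Hypothetical, simplified model of the failure; NOT the real
# test-framework.sh implementation.
SERVERS_RUNNING=true   # invented flag: MGS/MDS/OST still mounted

stopall() {
	echo "unmounting clients"
	# Same early return as the stopall() excerpt in this comment:
	# with CLIENTONLY set, the server targets are never stopped.
	[ "$CLIENTONLY" ] && return
	echo "stopping mgs, mds, ost"
	SERVERS_RUNNING=false
}

unload_modules() {
	# Simplified stand-in: modules cannot be removed while the
	# server targets still hold references to them.
	if [ "$SERVERS_RUNNING" = true ]; then
		echo "Modules still loaded"
		return 1
	fi
	echo "modules unloaded"
}

cleanupall() {
	stopall
	unload_modules
}

# Solo setup: local_mode is true, so CLIENTONLY gets set.
CLIENTONLY=yes
cleanupall || echo "cleanupall failed"
```

With CLIENTONLY=yes this prints `unmounting clients`, `Modules still loaded` and `cleanupall failed`. With CLIENTONLY unset, stopall runs to completion and the unload succeeds. Calling stopall on its own, as suggested in the discussion, never reaches the unload step, which is why it avoids this failure.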


          People

            Assignee: wc-triage WC Triage
            Reporter: abrarahmed Abrar-ahmed (Inactive)
            Votes: 0
            Watchers: 6
