
lustre-single lnet-selftest test failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Labels: None
    • Environment: Solo setup
    • Severity: 3

    Description

      lnet-selftest test fails in test setup

      stdout.log
        1 UP mgs MGS MGS 5
        2 UP mgc MGC192.168.108.18@tcp c0ab2420-8f51-ad18-f779-591cad596879 5
        3 UP mds MDS MDS_uuid 3
        4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
        5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 11
        6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 4
        7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 4
        8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 5
        9 UP osd-ldiskfs lustre-OST0000-osd lustre-OST0000-osd_UUID 5
       10 UP ost OSS OSS_uuid 3
       11 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 7
       12 UP lwp lustre-MDT0000-lwp-OST0000 lustre-MDT0000-lwp-OST0000_UUID 5
       13 UP osd-ldiskfs lustre-OST0001-osd lustre-OST0001-osd_UUID 5
       14 UP obdfilter lustre-OST0001 lustre-OST0001_UUID 7
       15 UP lwp lustre-MDT0000-lwp-OST0001 lustre-MDT0000-lwp-OST0001_UUID 5
       21 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
       22 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
      Modules still loaded: 
      lustre/osp/osp.o lustre/lod/lod.o lustre/ost/ost.o lustre/mdt/mdt.o lustre/mdd/mdd.o lustre/mgs/mgs.o ldiskfs/ldiskfs.o lustre/quota/lquota.o lustre/lfsck/lfsck.o lustre/mgc/mgc.o lustre/fid/fid.o lustre/fld/fld.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
      
      

        Activity

          [LU-7776] lustre-single lnet-selftest test failed
          jgmitter Joseph Gmitter (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          patch has landed to master for 2.9.0

          jgmitter Joseph Gmitter (Inactive) added the comment above.

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19308/
          Subject: LU-7776 tests: lnet-selftest.sh local_mode failure
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 84030bf26c1763edf9ac17a8cd2765e9163294bf

          gerrit Gerrit Updater added the comment above.

          @Andreas Dilger: Hi Andreas, I have uploaded the discussed patch, and the test run results were fine. Could you and others kindly review the patch?

          abrarahmed Abrar-ahmed (Inactive) added the comment above.

          Abrarahmed Momin (kais_abrar@yahoo.co.in) uploaded a new patch: http://review.whamcloud.com/19308
          Subject: LU-7776 tests: lnet-selftest.sh local_mode failure
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 4efad14f8c85d5346aed97a047d9e5681c1792e5

          gerrit Gerrit Updater added the comment above.

          Looks reasonable, and if this patch works for you then you can submit it and it can be tested.

          adilger Andreas Dilger added the comment above.

          @Andreas Dilger

          An alternative solution that keeps the debug patch functionality would be to avoid calling cleanupall on local_mode setups, something like the following:

          -	local_mode && CLIENTONLY=yes
          +	if local_mode; then
          +		CLIENTONLY=yes
          +		stopall
          +	else
          +		LOAD_MODULES_REMOTE=true
          +		cleanupall
          +	fi
          

          Let me know which solution works for you or if you want to suggest alternatives. I can upload a patch for the same.

          abrarahmed Abrar-ahmed (Inactive) added the comment above.
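
The branching proposed in the diff above can be tried outside the test suite with stub functions. This is only a minimal sketch: local_mode, stopall, and cleanupall here are hypothetical stand-ins that print a message (the real helpers belong to the Lustre test scripts and do far more), and LOCAL_MODE_FLAG and selftest_cleanup are illustrative names that do not appear in lnet-selftest.sh.

```shell
#!/bin/bash
# Stubs (assumptions): the real helpers come from the Lustre test scripts.
local_mode() { [ "${LOCAL_MODE_FLAG:-no}" = "yes" ]; }
stopall()    { echo "stopall: unmount clients, stop servers"; }
cleanupall() { echo "cleanupall: stopall plus module unload"; }

# Hypothetical wrapper around the branch proposed in the diff.
selftest_cleanup() {
    if local_mode; then
        # Single-node setup: stop services but leave modules loaded.
        CLIENTONLY=yes
        stopall
    else
        # Multi-node setup: keep the behaviour of the debug patch.
        LOAD_MODULES_REMOTE=true
        cleanupall
    fi
}
```

Calling `LOCAL_MODE_FLAG=yes selftest_cleanup` takes the stopall branch, while any other value of the flag takes the cleanupall branch.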

          @Andreas Dilger

          Here is my understanding of the debug patch submitted via commit <a8ba5c645f91faf86a84c99dd2cc049bc54e12b1>.
          The debug patch replaced stopall with cleanupall. In addition to unmounting clients and stopping servers, cleanupall also unloads modules, which I believe was the intended purpose of the debug patch. Please correct me if my understanding is wrong.
          The relevant section of the debug patch is quoted below:

          -    local_mode && CLIENTONLY=yes
          -    stopall
          -    RESTORE_MOUNT=yes
          +	local_mode && CLIENTONLY=yes
          +	RESTORE_MOUNT=yes
          +	LOAD_MODULES_REMOTE=true
          +	cleanupall
          

          So changing cleanupall to stopall would functionally revert the debug patch. Would this not cause your test setup to fail again?

          abrarahmed Abrar-ahmed (Inactive) added the comment above.
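
The relationship described in this comment can be pictured as cleanupall being stopall followed by a module unload. The following is a simplified sketch under that assumption only: the function bodies are stubs that record which step ran, and record, unload_modules_stub, and CALLS are illustrative names, not the actual test framework code.

```shell
#!/bin/bash
# CALLS collects the names of the steps in the order they ran.
CALLS=""
record() { CALLS="${CALLS:+$CALLS }$1"; }

# Stub: per the comment, stopall unmounts clients and stops servers.
stopall() {
    record umount_clients
    record stop_servers
}

# Stub standing in for the module unload step (hypothetical name).
unload_modules_stub() { record unload_modules; }

# Stub: cleanupall does everything stopall does, then unloads modules.
cleanupall() {
    stopall
    unload_modules_stub
}
```

In this model, swapping cleanupall for stopall drops only the final recorded step, the module unload.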

          I don't think that reverting the patch is a good idea, since I believe this will cause lnet-selftest to begin failing again in our test configuration.

          Instead, I think it should be enough to change the "cleanupall" to "stopall" so that it doesn't try to unload the modules, which isn't necessary. The goal of the LU-4181 patch was to stop the clients so that they would not interfere with the testing, or become disconnected when lnet-selftest was saturating the network.

          adilger Andreas Dilger added the comment above.
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.9.0 [ 11891 ]

          People

            Assignee: wc-triage WC Triage
            Reporter: abrarahmed Abrar-ahmed (Inactive)