
parallel-scale-stress-hw_parallel_grouplock test stuck on subtest 12, timeout 2hours, normally takes < 400sec

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      stdout.log
      17:34:15: All tests passed!
      parallel_grouplock subtests -t 10 PASS
      /usr/lib64/lustre/tests/parallel_grouplock -g -v -d /mnt/fs1/d0.parallel_grouplock -t 11
      chmod 0777 /mnt/fs1
      drwxrwxrwx 8 root root 270336 Aug 12 17:31 /mnt/fs1
      su mpiuser sh -c "/usr/bin/mpirun -np 32 /usr/lib64/lustre/tests/parallel_grouplock -g -v -d /mnt/fs1/d0.parallel_grouplock -t 11 "
      /usr/lib64/lustre/tests/parallel_grouplock is running with 32 task(es) in DEBUG mode
      17:34:16: Running test #/usr/lib64/lustre/tests/parallel_grouplock(iter 0)
      17:34:16: Beginning subtest 11
      17:34:56: Finished subtest 11 (39.317 sec)
      17:34:56: All tests passed!
      parallel_grouplock subtests -t 11 PASS
      /usr/lib64/lustre/tests/parallel_grouplock -g -v -d /mnt/fs1/d0.parallel_grouplock -t 12
      su mpiuser sh -c "/usr/bin/mpirun -np 32 /usr/lib64/lustre/tests/parallel_grouplock -g -v -d /mnt/fs1/d0.parallel_grouplock -t 12 "
      17:34:57: Running test #/usr/lib64/lustre/tests/parallel_grouplock(iter 0)
      17:34:57: Beginning subtest 12
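The log above stops right after "Beginning subtest 12", with no matching "Finished" line. As a rough illustration only, a small script along these lines could flag that condition in a captured stdout.log. The regular expressions are modeled on the lines quoted above, and the function name `find_unfinished_subtests` is hypothetical; neither is part of the Lustre test suite.

```python
import re

# Patterns modeled on the parallel_grouplock log lines quoted in this ticket.
BEGIN_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}): Beginning subtest (\d+)\s*$")
FINISH_RE = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2}): Finished subtest (\d+) \(([\d.]+) sec\)\s*$"
)


def find_unfinished_subtests(log_text):
    """Return sorted (subtest, start_time) pairs that began but never finished."""
    started = {}  # subtest number -> timestamp of its "Beginning" line
    for line in log_text.splitlines():
        m = BEGIN_RE.match(line)
        if m:
            started[int(m.group(2))] = m.group(1)
            continue
        m = FINISH_RE.match(line)
        if m:
            # A "Finished" line closes the matching "Beginning" line.
            started.pop(int(m.group(2)), None)
    return sorted(started.items())


# Excerpt copied from the stdout.log in the description.
SAMPLE = """\
17:34:16: Beginning subtest 11
17:34:56: Finished subtest 11 (39.317 sec)
17:34:56: All tests passed!
17:34:57: Beginning subtest 12
"""

if __name__ == "__main__":
    for subtest, when in find_unfinished_subtests(SAMPLE):
        print(f"subtest {subtest} started at {when} and never finished")
```

Run against the excerpt, this reports subtest 12 as started at 17:34:57 and never finished, which matches the hang described in the title.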

      stderr.log

          Activity

            [LU-9511] parallel-scale-stress-hw_parallel_grouplock test stuck on subtest 12, timeout 2hours, normally takes < 400sec
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27127/
            Subject: LU-9511 utils: fix parallel_grouplock test timeout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4a33e1edd48c5db0e0c54c1226787c6575301bee

            adilger Andreas Dilger added a comment -

            Both this patch and the change from LU-9425 are changing the userspace code to avoid the problem, but that means it is still possible for real applications using group lock to become deadlocked. That shouldn't be allowed.

            adilger Andreas Dilger added a comment -

            It looks like this is the same as LU-9429.

            gerrit Gerrit Updater added a comment -

            jadhav.vikram (jadhav.vikram@seagate.com) uploaded a new patch: https://review.whamcloud.com/27127
            Subject: LU-9511 utils: fix parallel_grouplock test timeout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 80c6a9089cdccdd6e3096d5bfc7cdd717cf122ea

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jadhav.vikram VIKRAM BABASO JADHAV (Inactive)