  1. Lustre
  2. LU-11915

conf-sanity test 115 is skipped or hangs

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.13.0
    • Fix Version/s: Lustre 2.14.0
    • Labels:
    • Environment:
      DNE
    • Severity:
      3

      Description

      conf-sanity test_115 is only run for ldiskfs MDS file systems and is skipped for ZFS. Looking back over the past couple of weeks, this test has either been skipped or, in the past few days, has hung.

      For some reason, the test is skipped when the formatting of MDS1 fails:

              local mds_opts="$(mkfs_opts mds1 ${mdsdev}) --device-size=$IMAGESIZE   \
                      --mkfsoptions='-O lazy_itable_init,ea_inode,^resize_inode,meta_bg \
                      -i 1024'"
              add mds1 $mds_opts --mgs --reformat $mdsdev ||
                      { skip_env "format large MDT failed"; return 0; }
      

      Shouldn’t this be an error?
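
To make the question concrete, here is a minimal sketch contrasting the two outcomes. The `skip_env` and `error` functions below are simplified stand-ins written for this example, not the real test-framework.sh helpers, and `format_mdt` is a hypothetical placeholder for the `add mds1 ... --reformat` call.

```shell
#!/bin/bash
# Simplified stand-ins for the test-framework helpers (assumptions, for illustration only).
skip_env() { echo "SKIP: $*"; }
error()    { echo "FAIL: $*"; return 1; }

# Hypothetical placeholder for 'add mds1 ... --reformat'.
# $1 is the exit status to simulate (0 = format succeeded).
format_mdt() { return "$1"; }

# Behaviour quoted above: a format failure is reported as a skip and the test returns 0.
test_115_skip() {
	format_mdt "$1" || { skip_env "format large MDT failed"; return 0; }
	echo "PASS"
}

# Alternative raised in this ticket: a format failure is reported as a test failure.
test_115_error() {
	format_mdt "$1" || { error "format large MDT failed"; return 1; }
	echo "PASS"
}
```

With the first form, a test session in which the MDT can never be formatted still looks clean, because the test body is silently never exercised. With the second form, the same condition shows up as a failure.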

      Since January 30, 2019, conf-sanity test 115 has been hanging, but only in review-dne-part-3 test sessions. Looking at the logs from a recent hang, https://testing.whamcloud.com/test_sets/d49db868-2610-11e9-8486-52540065bddc , the last thing seen in the client test_log is:

      == conf-sanity test 115: Access large xattr with inodes number over 2TB ============================== 09:51:24 (1549014684)
      Stopping clients: onyx-34vm6.onyx.whamcloud.com,onyx-34vm7 /mnt/lustre (opts:)
      CMD: onyx-34vm6.onyx.whamcloud.com,onyx-34vm7 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
      if [ \$running -ne 0 ] ; then
      echo Stopping client \$(hostname) /mnt/lustre opts:;
      lsof /mnt/lustre || need_kill=no;
      if [ x != x -a x\$need_kill != xno ]; then
          pids=\$(lsof -t /mnt/lustre | sort -u);
          if [ -n \"\$pids\" ]; then
                   kill -9 \$pids;
          fi
      fi;
      while umount  /mnt/lustre 2>&1 | grep -q busy; do
          echo /mnt/lustre is still busy, wait one second && sleep 1;
      done;
      fi
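
The unmount loop in this log retries once per second with no upper bound, so a mount point that stays busy would keep it spinning indefinitely. The logs do not establish that this is where test 115 hangs. Purely as an illustration, a bounded version of the same wait could look like the sketch below. `wait_umount`, `max_tries`, and the injectable `umount_cmd` argument are hypothetical names for this example, not part of test-framework.sh.

```shell
#!/bin/bash
# Hypothetical helper: retry an unmount while it reports "busy",
# but give up after max_tries attempts instead of looping forever.
#   $1 = mount point
#   $2 = maximum number of attempts (default 30)
#   $3 = unmount command to run (default: umount); injectable for testing
wait_umount() {
	local mnt=$1 max_tries=${2:-30} umount_cmd=${3:-umount}
	local tries=0

	while $umount_cmd "$mnt" 2>&1 | grep -q busy; do
		tries=$((tries + 1))
		if [ "$tries" -ge "$max_tries" ]; then
			echo "$mnt still busy after $tries attempts" >&2
			return 1
		fi
		echo "$mnt is still busy, wait one second"
		sleep 1
	done
	return 0
}
```

A caller could then treat a non-zero return as a test error and collect debugging data (for example the `lsof` output) instead of waiting for the test session to time out.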
      

      The console logs contain little information about this test: no errors, LBUGs, etc.

      Two new tests were added to conf-sanity that run right before test 115: conf-sanity tests 110 and 111, added by https://review.whamcloud.com/22009 . Maybe there is some residual effect from these tests running in a DNE environment.

      Logs for other hangs are at
      https://testing.whamcloud.com/test_sets/d49db868-2610-11e9-8486-52540065bddc
      https://testing.whamcloud.com/test_sets/8d6ec5d2-25ab-11e9-a318-52540065bddc

      Logs for skipping this test are at
      https://testing.whamcloud.com/test_sets/bb4dbd90-25a7-11e9-b97f-52540065bddc
      https://testing.whamcloud.com/test_sets/1adf3cf6-2590-11e9-b54c-52540065bddc

              People

              • Assignee:
                artem_blagodarenko Artem Blagodarenko
                Reporter:
                jamesanunez James Nunez
              • Votes:
                0
                Watchers:
                5
