Lustre / LU-12260

conf-sanity test 57a times out in client unmount

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1
    • Environment: PPC clients
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      conf-sanity test_57a times out in client unmount. The timeouts have been seen only on PPC clients.

      The suite_log for a recent failure, https://testing.whamcloud.com/test_sets/b1449c50-668f-11e9-8bb1-52540065bddc, shows mount.lustre failing to mount an OST with "Cannot assign requested address". The clients then try to unmount /mnt/lustre and hang:

      CMD: trevis-55vm11 e2label /dev/mapper/ost1_flakey
      Starting ost1:   /dev/mapper/ost1_flakey /mnt/lustre-ost1
      CMD: trevis-55vm11 mkdir -p /mnt/lustre-ost1; mount -t lustre   /dev/mapper/ost1_flakey /mnt/lustre-ost1
      trevis-55vm11: mount.lustre: mount /dev/mapper/ost1_flakey at /mnt/lustre-ost1 failed: Cannot assign requested address
      Start of /dev/mapper/ost1_flakey on ost1 failed 99
      Stopping clients: trevis-77vm1.trevis.whamcloud.com,trevis-77vm2 /mnt/lustre (opts:-f)
      CMD: trevis-77vm1.trevis.whamcloud.com,trevis-77vm2 running=\$(grep -c /mnt/lustre' ' /proc/mounts);
      if [ \$running -ne 0 ] ; then
      echo Stopping client \$(hostname) /mnt/lustre opts:-f;
      lsof /mnt/lustre || need_kill=no;
      if [ x-f != x -a x\$need_kill != xno ]; then
          pids=\$(lsof -t /mnt/lustre | sort -u);
          if [ -n \"\$pids\" ]; then
                   kill -9 \$pids;
          fi
      fi;
      while umount -f /mnt/lustre 2>&1 | grep -q busy; do
          echo /mnt/lustre is still busy, wait one second && sleep 1;
      done;
      fi
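
The stop-client command above reruns umount -f for as long as its output contains "busy", with no upper bound on attempts. The suite_log does not show whether the hang is inside umount itself or in this retry loop, so the following is only a debugging sketch for telling the two apart, not the test framework's code. The helper name retry_while_busy and its max_tries argument are assumptions introduced here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lustre test framework):
# rerun a command while its output contains "busy", but give up
# after max_tries attempts instead of looping forever.
retry_while_busy() {
    rwb_max_tries=$1
    shift
    rwb_tries=0
    while "$@" 2>&1 | grep -q busy; do
        rwb_tries=$((rwb_tries + 1))
        if [ "$rwb_tries" -ge "$rwb_max_tries" ]; then
            echo "still busy after $rwb_tries attempts: $*" >&2
            return 1
        fi
        sleep 1
    done
    return 0
}
```

A call such as retry_while_busy 30 umount -f /mnt/lustre returns non-zero once the attempt limit is reached. If the call never returns at all, the retry loop is not the cause and umount itself is blocked.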
      

      The OSS console log shows the OST failing to register with the MGS with rc = -99, after which the mount is aborted:

      [15207.625829] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre   /dev/mapper/ost1_flakey /mnt/lustre-ost1
      [15207.947844] LDISKFS-fs (dm-10): file extents enabled, maximum tree depth=5
      [15207.950074] LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. Opts: errors=remount-ro
      [15208.057801] LDISKFS-fs (dm-10): file extents enabled, maximum tree depth=5
      [15208.059983] LDISKFS-fs (dm-10): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
      [15208.161527] LustreError: 15f-b: lustre-OST2710: cannot register this server with the MGS: rc = -99. Is the MGS running?
      [15208.175587] LustreError: 21832:0:(obd_mount_server.c:1939:server_fill_super()) Unable to start targets: -99
      [15208.177395] LustreError: 21832:0:(obd_mount_server.c:1589:server_put_super()) no obd lustre-OST2710
      [15208.178929] LustreError: 21832:0:(obd_mount_server.c:132:server_deregister_mount()) lustre-OST2710 not registered
      [15208.245160] LustreError: 21832:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-99)
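
The error lines above carry the same return code in three spellings ("rc = -99", ": -99" and "(-99)"). A minimal shell sketch, assuming the console log is supplied on stdin, extracts the distinct codes and names the one seen here. extract_rc and rc_name are hypothetical helpers, not Lustre tools. The only mapping included is the Linux errno value 99, EADDRNOTAVAIL, which matches the "Cannot assign requested address" message printed by mount.lustre.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct negative return code found
# in console log lines on stdin. Handles the three spellings seen in
# this log: "rc = -N", ": -N" and "(-N".
extract_rc() {
    grep -oE '(rc = |: |\()-[0-9]+' | grep -oE -e '-[0-9]+' | sort -u
}

# Hypothetical helper: name the return code from this report.
# Only -99 is mapped (Linux errno 99); anything else is reported
# as unknown rather than guessed.
rc_name() {
    case "$1" in
        -99) echo "EADDRNOTAVAIL (Cannot assign requested address)" ;;
        *)   echo "unknown ($1)" ;;
    esac
}
```

For example, saving the console log to a file and running extract_rc < console.log lists the codes, each of which can then be passed to rc_name.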
       

      Logs for other failures:
      https://testing.whamcloud.com/test_sets/19acf4bc-2555-11e9-830a-52540065bddc
      https://testing.whamcloud.com/test_sets/aa4bfab6-b6e4-11e8-8c12-52540065bddc
      https://testing.whamcloud.com/test_sets/392a269c-0102-11e9-b970-52540065bddc

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)