Lustre / LU-13531

runtests test_1: Timeout occurred after 65 mins, last suite running was runtests


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • None
    • None
    • None
    • 3

    Description

      This issue was created by maloo for Chris Horn <hornc@cray.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/266b7a77-975d-473d-963d-f78a22e7209f

      test_1 failed with the following error:

      Timeout occurred after 65 mins, last suite running was runtests
      

      Maloo tagged this failure as LU-13065, but it looks different to me.

      The last lines in the suite log show us unmounting the clients, not the OSTs as in LU-13065:

      echo Stopping client \$(hostname) /mnt/lustre opts:;
      lsof /mnt/lustre || need_kill=no;
      if [ x != x -a x\$need_kill != xno ]; then
          pids=\$(lsof -t /mnt/lustre | sort -u);
          if [ -n \"\$pids\" ]; then
                   kill -9 \$pids;
          fi
      fi;
      while umount  /mnt/lustre 2>&1 | grep -q busy; do
          echo /mnt/lustre is still busy, wait one second && sleep 1;
      done;
      fi
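The logged command is escaped for remote execution (hence the \$) and its opening `if` is cut off, which makes it hard to read. The following is a minimal sketch of the same logic as a local bash function, under stated assumptions: the function name `stop_client_sketch` and the `FORCE` variable are hypothetical, and the cut-off opening test is assumed to be a check of /proc/mounts. It illustrates two things visible in the log: with an empty force option (the "opts:" with nothing after it) the kill branch is never taken, and the busy-retry loop around umount has no upper bound.

```shell
#!/bin/bash
# Hypothetical local rendition of the client-unmount step from the suite log.
# The \$ escaping is removed; stop_client_sketch and FORCE are illustrative
# names, not Lustre test-framework identifiers.

stop_client_sketch() {
    local mnt=$1
    # Empty by default, matching the log ("opts:" with nothing after it).
    local force=${FORCE:-}
    local need_kill="" pids=""

    # Assumed opening test (cut off in the log): only act on a mounted path.
    if grep -q " $mnt " /proc/mounts 2>/dev/null; then
        echo "Stopping client $(hostname) $mnt opts:$force"
        # lsof exits non-zero when it finds no open files under $mnt.
        lsof "$mnt" || need_kill=no
        # With force empty, "x$force" equals "x", so this branch is skipped;
        # that is why the logged command reads "[ x != x -a ... ]".
        if [ "x$force" != x ] && [ "x$need_kill" != xno ]; then
            pids=$(lsof -t "$mnt" | sort -u)
            if [ -n "$pids" ]; then
                # Unquoted on purpose so multiple PIDs become separate arguments.
                kill -9 $pids
            fi
        fi
        # Retry while umount reports "busy". There is no upper bound, so if the
        # mount never becomes idle this loop never exits.
        # $force is unquoted so that an empty value adds no argument.
        while umount $force "$mnt" 2>&1 | grep -q busy; do
            echo "$mnt is still busy, wait one second" && sleep 1
        done
    fi
}
```

For example, `FORCE=-f stop_client_sketch /mnt/lustre` would take the kill branch in bash, while calling the function on a path that is not mounted returns immediately without output.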
      

      The console log for client 2 (trevis-53vm2) shows hung mount.nfs tasks:

      [  720.526224] INFO: task mount.nfs:1314 blocked for more than 120 seconds.
      [  720.527710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  720.529043] mount.nfs       D ffff8e4c7bf2a0e0     0  1314   1313 0x00000080
      [  720.530429] Call Trace:
      [  720.530899]  [] schedule+0x29/0x70
      [  720.531762]  [] schedule_timeout+0x221/0x2d0
      [  720.533040]  [] ? __slab_free+0x1b0/0x290
      [  720.534069]  [] ? svc_close_list+0x88/0xa0 [sunrpc]
      [  720.535194]  [] wait_for_completion+0xfd/0x140
      [  720.536316]  [] ? wake_up_state+0x20/0x20
      [  720.537321]  [] kthread_stop+0x4a/0xf0
      [  720.538291]  [] nfs_callback_down+0x53/0xd0 [nfsv4]
      [  720.539452]  [] nfs4_free_client+0x4f/0xc0 [nfsv4]
      [  720.540600]  [] nfs_put_client+0xf2/0x140 [nfs]
      [  720.541706]  [] nfs4_init_client+0x193/0x2f0 [nfsv4]
      [  720.542908]  [] ? kmem_cache_alloc+0x35/0x1f0
      [  720.543984]  [] ? __fscache_acquire_cookie+0x66/0x180 [fscache]
      [  720.545309]  [] ? __fscache_acquire_cookie+0x66/0x180 [fscache]
      [  720.546652]  [] ? __rpc_init_priority_wait_queue+0x81/0xc0 [sunrpc]
      [  720.548038]  [] ? rpc_init_wait_queue+0x13/0x20 [sunrpc]
      [  720.549251]  [] nfs_get_client+0x353/0x470 [nfs]
      [  720.550352]  [] nfs4_set_client+0x9c/0x150 [nfsv4]
      [  720.551492]  [] nfs4_create_server+0x13e/0x3b0 [nfsv4]
      [  720.552722]  [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
      [  720.553887]  [] mount_fs+0x3e/0x1b0
      [  720.554860]  [] ? __alloc_percpu+0x15/0x20
      [  720.555900]  [] vfs_kern_mount+0x67/0x110
      [  720.556909]  [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
      [  720.558073]  [] nfs4_try_mount+0x44/0xc0 [nfsv4]
      [  720.559168]  [] ? get_nfs_version+0x27/0x90 [nfs]
      [  720.560301]  [] nfs_fs_mount+0x4cb/0xdc0 [nfs]
      [  720.561402]  [] ? nfs_clone_super+0x140/0x140 [nfs]
      [  720.562568]  [] ? param_set_portnr+0x70/0x70 [nfs]
      [  720.563682]  [] mount_fs+0x3e/0x1b0
      [  720.564627]  [] ? __alloc_percpu+0x15/0x20
      [  720.565694]  [] vfs_kern_mount+0x67/0x110
      [  720.566695]  [] do_mount+0x1ef/0xce0
      [  720.567627]  [] ? copy_mount_options+0xc0/0x170
      [  720.568725]  [] SyS_mount+0x83/0xd0
      [  720.569643]  [] system_call_fastpath+0x25/0x2a
      [  720.570704]  [] ? system_call_after_swapgs+0xae/0x146
      

      At a quick glance, I don't see anything wrong with Lustre.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      runtests test_1 - Timeout occurred after 65 mins, last suite running was runtests


            People

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)
