Lustre / LU-9701

replay-single test_53c: test failed to respond and timed out


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0
    • Labels: None
    • Environment: trevis, failover
      servers: EL7, zfs, master branch, v2.9.59_15_g107b2cb, b3603
      clients: EL7, master branch, v2.9.59_15_g107b2cb, b3603
    • Severity: 3

    Description

      https://testing.hpdd.intel.com/test_sessions/07818c64-6912-4446-814a-c3cdec28854c

      I could not find another ticket with a replay-single timeout and a hung umount on a client. This configuration also has hung kworker processes on several VMs, but the client umount hang looks like the more likely root cause.

      From the Client 3 console log (test script output, then dmesg):

      if [ $running -ne 0 ] ; then
      echo Stopping client $(hostname) /mnt/lustre2 opts:;
      lsof /mnt/lustre2 || need_kill=no;
      if [ x != x -a x$need_kill != xno ]; then
          pids=$(lsof -t /mnt/lustre2 | sort -u);
         
      [11520.078055] INFO: task umount:1234 blocked for more than 120 seconds.
      [11520.079506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [11520.081059] umount          D ffff880069527dc8     0  1234   1227 0x00000080
      [11520.083815]  ffff8800654fbae0 0000000000000086 ffff88007a732f10 ffff8800654fbfd8
      [11520.085531]  ffff8800654fbfd8 ffff8800654fbfd8 ffff88007a732f10 ffff880069527dc0
      [11520.087219]  ffff880069527dc4 ffff88007a732f10 00000000ffffffff ffff880069527dc8
      [11520.088795] Call Trace:
      [11520.090178]  [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70
      [11520.091653]  [<ffffffff8168b315>] __mutex_lock_slowpath+0xc5/0x1d0
      [11520.093349]  [<ffffffff8168a76f>] mutex_lock+0x1f/0x2f
      [11520.094874]  [<ffffffffa06bb101>] mgc_process_config+0x201/0x13e0 [mgc]
      [11520.096613]  [<ffffffffa07a1615>] obd_process_config.constprop.13+0x85/0x2d0 [obdclass]
      [11520.098405]  [<ffffffffa0658b37>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11520.100064]  [<ffffffffa078e319>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [11520.101783]  [<ffffffffa07a293f>] lustre_end_log+0x1ff/0x550 [obdclass]
      [11520.103515]  [<ffffffffa0b9968d>] ll_put_super+0x8d/0xaa0 [lustre]
      [11520.105178]  [<ffffffff81243207>] ? fsnotify_clear_marks_by_inode+0xa7/0x140
      [11520.106902]  [<ffffffff81138fbd>] ? call_rcu_sched+0x1d/0x20
      [11520.108563]  [<ffffffffa0bc40ec>] ? ll_destroy_inode+0x1c/0x20 [lustre]
      [11520.110314]  [<ffffffff8121a718>] ? destroy_inode+0x38/0x60
      [11520.111942]  [<ffffffff8121a846>] ? evict+0x106/0x170
      [11520.113553]  [<ffffffff8121a8ee>] ? dispose_list+0x3e/0x50
      [11520.115235]  [<ffffffff8121b544>] ? evict_inodes+0x114/0x140
      [11520.116820]  [<ffffffff81200da2>] generic_shutdown_super+0x72/0xf0
      [11520.118407]  [<ffffffff81201172>] kill_anon_super+0x12/0x20
      [11520.119958]  [<ffffffffa07a0cb5>] lustre_kill_super+0x45/0x50 [obdclass]
      [11520.121606]  [<ffffffff81201529>] deactivate_locked_super+0x49/0x60
      [11520.123217]  [<ffffffff81201b26>] deactivate_super+0x46/0x60
      [11520.124774]  [<ffffffff8121ef65>] mntput_no_expire+0xc5/0x120
      [11520.126317]  [<ffffffff812200a0>] SyS_umount+0xa0/0x3b0
      [11520.127822]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
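For triage across the other VMs in this session, one quick way to find every blocked task in a long console log is to pull the "name:pid" field out of each hung-task banner line. A minimal sketch (the sample input below is abbreviated from the Client 3 trace above; the `sed` expression is an assumption about the banner format, which is stable in EL7 kernels):

```shell
#!/bin/sh
# Pull the blocked task ("name:pid") out of hung-task banner lines
# ("INFO: task <name>:<pid> blocked for more than N seconds.") in a
# console log. Sample input abbreviated from the Client 3 trace above.
printf '%s\n' \
  '[11520.078055] INFO: task umount:1234 blocked for more than 120 seconds.' \
  '[11520.090178]  [<ffffffff8168d6c9>] schedule_preempt_disabled+0x29/0x70' |
sed -n 's/.*INFO: task \([^ ]*\) blocked.*/\1/p'
# prints: umount:1234
```

Running this over each VM's console log would show whether the kworker hangs report the same 120-second timeout window as the umount.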
      

    People

      Assignee: wc-triage (WC Triage)
      Reporter: jcasper (James Casper)