Lustre / LU-9122

replay-ost-single test_5 test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
    • Labels: None
    • Environment: onyx-30vm1-3/7/8, Full Group test,
      master branch, v2.9.52, b3520,
      DNE, ZFS
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sets/39bcbd8a-efe9-11e6-8c0d-5254006e85c2

      The client 1 dmesg log shows the writemany task failing with rc 108:

      [ 3197.720173] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_50: @@@@@@ IGNORE \(bz13652\): writemany returned rc 108 
      [ 3198.045985] Lustre: DEBUG MARKER: recovery-small test_50: @@@@@@ IGNORE (bz13652): writemany returned rc 108
      

      and later a writemany task is reported blocked for more than 120 seconds in unlink:

      [ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds.
      [ 3360.097638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 3360.100227] writemany       D ffff88007b13bbc0     0 29170  29166 0x00000080
      [ 3360.102784]  ffff880057be3df0 0000000000000082 ffff880051bc1f60 ffff880057be3fd8
      [ 3360.105352]  ffff880057be3fd8 ffff880057be3fd8 ffff880051bc1f60 ffff88007b13bbb8
      [ 3360.107917]  ffff88007b13bbbc ffff880051bc1f60 00000000ffffffff ffff88007b13bbc0
      [ 3360.110428] Call Trace:
      [ 3360.112480]  [<ffffffff8168c989>] schedule_preempt_disabled+0x29/0x70
      [ 3360.115023]  [<ffffffff8168a5e5>] __mutex_lock_slowpath+0xc5/0x1c0
      [ 3360.117307]  [<ffffffff81689a4f>] mutex_lock+0x1f/0x2f
      [ 3360.119528]  [<ffffffff8120f40b>] do_unlinkat+0x13b/0x2b0
      [ 3360.121668]  [<ffffffff8120031e>] ? ____fput+0xe/0x10
      [ 3360.123853]  [<ffffffff810acdec>] ? task_work_run+0xac/0xe0
      [ 3360.125937]  [<ffffffff8102ab22>] ? do_notify_resume+0x92/0xb0
      [ 3360.128057]  [<ffffffff81210486>] SyS_unlink+0x16/0x20
      [ 3360.130052]  [<ffffffff816967c9>] system_call_fastpath+0x16/0x1b
      [ 3374.831443] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 25 sec
      [ 3375.133682] Lustre: DEBUG MARKER: test_51: failover in 25 sec
      [ 3423.727627] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 30 sec
      [ 3424.041587] Lustre: DEBUG MARKER: test_51: failover in 30 sec
      [ 3477.622211] LustreError: 11-0: MGC10.2.4.99@tcp: operation obd_ping to node 10.2.4.99@tcp failed: rc = -107
      [ 3477.624949] LustreError: Skipped 9 previous similar messages
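
      When triaging logs like the ones above, it can help to pull the hung-task reports out of a long dmesg capture automatically. The following is only a minimal sketch, not part of the test framework: the function name find_hung_tasks and the sample text are made up for illustration, and the regular expression assumes the "INFO: task NAME:PID blocked for more than N seconds" wording seen in this log and a task name without spaces.

```python
import re

# Matches the kernel hung-task report line as it appears in this ticket, e.g.
# "[ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds."
# Assumption: the task name contains no whitespace.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (timestamp, command, pid, seconds) for each hung-task report."""
    hits = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hits.append(
                (float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return hits


# Sample input copied from the log excerpt in this ticket.
sample = """\
[ 3198.045985] Lustre: DEBUG MARKER: recovery-small test_50: @@@@@@ IGNORE (bz13652): writemany returned rc 108
[ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds.
[ 3360.110428] Call Trace:
"""

for ts, comm, pid, secs in find_hung_tasks(sample):
    print(f"{ts:.6f}: {comm} (pid {pid}) blocked > {secs}s")
```

      Comparing the reported timestamps against the surrounding DEBUG MARKER lines shows which test was running when the task stalled.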
      

    People

      Assignee: WC Triage (wc-triage)
      Reporter: James Casper (jcasper)