Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9608

recovery-small test_57: test failed to respond and timed out

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.10.0, Lustre 2.11.0
    • None
    • onyx, failover
        clients: EL7, master branch, v2.9.58, b3591
        servers: EL7, ldiskfs, master branch, v2.9.58, b3591
    • 3
    • 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/31124cb2-cc25-46ca-89d1-f0f57d60054c

      Looked through all logs of the TIMEOUT subtest (57) and the PASS subtest that preceeded it (56). The only
      indication of an issue I could find was in the client stack_trace log (3 instances of "tee D", and 1 of "lsof D"):

      21:47:04:[188400.664338] tee             D ffffffff8168a1e0     0  8273  10981 0x00000080
      21:47:04:[188400.666145]  ffff880012c6f9b0 0000000000000086 ffff88000303ce70 ffff880012c6ffd8
      21:47:04:[188400.668007]  ffff880012c6ffd8 ffff880012c6ffd8 ffff88000303ce70 ffff88007fc16c40
      21:47:04:[188400.669475]  0000000000000000 7fffffffffffffff ffff88007ff607a0 ffffffff8168a1e0
      21:47:04:[188400.671536] Call Trace:
      21:47:04:[188400.672816]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
      21:47:04:[188400.674403]  [<ffffffff8168c169>] schedule+0x29/0x70
      21:47:04:[188400.675681]  [<ffffffff81689bc9>] schedule_timeout+0x239/0x2c0
      21:47:04:[188400.677296]  [<ffffffff811dbad9>] ? ___slab_alloc+0x209/0x4f0
      21:47:04:[188400.678583]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
      21:47:04:[188400.680132]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
      21:47:04:[188400.681545]  [<ffffffff8168b70e>] io_schedule_timeout+0xae/0x130
      21:47:04:[188400.683008]  [<ffffffff8168b7a8>] io_schedule+0x18/0x20
      21:47:04:[188400.684242]  [<ffffffff8168a1f1>] bit_wait_io+0x11/0x50
      21:47:04:[188400.685688]  [<ffffffff81689d15>] __wait_on_bit+0x65/0x90
      21:47:04:[188400.687050]  [<ffffffff8168a1e0>] ? bit_wait+0x50/0x50
      21:47:04:[188400.688347]  [<ffffffff81689dc1>] out_of_line_wait_on_bit+0x81/0xb0
      21:47:04:[188400.689715]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
      21:47:04:[188400.691008]  [<ffffffffa05af9f3>] nfs_wait_on_request+0x33/0x40 [nfs]
      21:47:04:[188400.692582]  [<ffffffffa05b4931>] nfs_updatepage+0x151/0x8d0 [nfs]
      21:47:04:[188400.694007]  [<ffffffffa05a4131>] nfs_write_end+0x121/0x350 [nfs]
      21:47:04:[188400.695366]  [<ffffffff81181b99>] generic_file_buffered_write+0x189/0x2a0
      21:47:04:[188400.696806]  [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20
      21:47:04:[188400.698188]  [<ffffffff810c54d2>] ? default_wake_function+0x12/0x20
      21:47:04:[188400.699529]  [<ffffffff81183112>] __generic_file_aio_write+0x1e2/0x400
      21:47:04:[188400.701007]  [<ffffffff81183389>] generic_file_aio_write+0x59/0xa0
      21:47:04:[188400.702458]  [<ffffffffa05a311b>] nfs_file_write+0xbb/0x1e0 [nfs]
      21:47:04:[188400.703874]  [<ffffffff811fdf3d>] do_sync_write+0x8d/0xd0
      21:47:04:[188400.705007]  [<ffffffff811fe7ad>] vfs_write+0xbd/0x1e0
      21:47:04:[188400.706507]  [<ffffffff811fe677>] ? vfs_read+0xf7/0x170
      21:47:04:[188400.707858]  [<ffffffff811ff2cf>] SyS_write+0x7f/0xe0
      21:47:04:[188400.709098]  [<ffffffff81697189>] system_call_fastpath+0x16/0x1b
      

      and

      21:47:04:[188400.710582] INFO: task lsof:32228 blocked for more than 120 seconds.
      21:47:04:[188400.712009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      21:47:04:[188400.713397] lsof            D ffff88001a713bc8     0 32228  32226 0x00000080
      21:47:04:[188400.715028]  ffff88004a9bbd90 0000000000000086 ffff880006e0de20 ffff88004a9bbfd8
      21:47:04:[188400.716564]  ffff88004a9bbfd8 ffff88004a9bbfd8 ffff880006e0de20 ffff88001a713bc0
      21:47:04:[188400.718334]  ffff88001a713bc4 ffff880006e0de20 00000000ffffffff ffff88001a713bc8
      21:47:04:[188400.719925] Call Trace:
      21:47:04:[188400.721209]  [<ffffffff8168d259>] schedule_preempt_disabled+0x29/0x70
      21:47:04:[188400.722865]  [<ffffffff8168aeb5>] __mutex_lock_slowpath+0xc5/0x1c0
      21:47:04:[188400.724339]  [<ffffffff8168a31f>] mutex_lock+0x1f/0x2f
      21:47:04:[188400.725730]  [<ffffffffa05a779b>] nfs_getattr+0xfb/0x230 [nfs]
      21:47:04:[188400.727107]  [<ffffffff81203726>] vfs_getattr+0x46/0x80
      21:47:04:[188400.728533]  [<ffffffff81203855>] vfs_fstatat+0x75/0xc0
      21:47:04:[188400.729630]  [<ffffffff81203dae>] SYSC_newstat+0x2e/0x60
      21:47:04:[188400.731011]  [<ffffffff811fe677>] ? vfs_read+0xf7/0x170
      21:47:04:[188400.732374]  [<ffffffff8111f426>] ? __audit_syscall_exit+0x1e6/0x280
      21:47:04:[188400.733819]  [<ffffffff8120408e>] SyS_newstat+0xe/0x10
      21:47:04:[188400.735144]  [<ffffffff81697189>] system_call_fastpath+0x16/0x1b
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              jcasper James Casper
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: