Lustre / LU-9877

conf-sanity test_84: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/575246c6-7fd1-11e7-8823-5254006e85c2.

      The sub-test test_84 failed with the following error:

      test failed to respond and timed out
      

      Several stack traces like the following appear in the console log of the client:

      23:10:43:[11160.267697] INFO: task tee:19197 blocked for more than 120 seconds.
      23:10:43:[11160.268385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      23:10:43:[11160.269165] tee             D ffffffff8168a850     0 19197   1556 0x00000080
      23:10:43:[11160.270216]  ffff8800798ef9b0 0000000000000082 ffff88007a966dd0 ffff8800798effd8
      23:10:43:[11160.271166]  ffff8800798effd8 ffff8800798effd8 ffff88007a966dd0 ffff88007fc16c40
      23:10:43:[11160.272094]  0000000000000000 7fffffffffffffff ffff88007ff5d728 ffffffff8168a850
      23:10:43:[11160.273036] Call Trace:
      23:10:43:[11160.273305]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
      23:10:43:[11160.273842]  [<ffffffff8168c7f9>] schedule+0x29/0x70
      23:10:43:[11160.274719]  [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
      23:10:43:[11160.275284]  [<ffffffff810d1f43>] ? find_busiest_group+0x143/0x920
      23:10:43:[11160.275911]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
      23:10:43:[11160.276937]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
      23:10:43:[11160.277441]  [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
      23:10:43:[11160.278064]  [<ffffffff8168be38>] io_schedule+0x18/0x20
      23:10:43:[11160.278925]  [<ffffffff8168a861>] bit_wait_io+0x11/0x50
      23:10:43:[11160.279590]  [<ffffffff8168a385>] __wait_on_bit+0x65/0x90
      23:10:43:[11160.280146]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
      23:10:43:[11160.280902]  [<ffffffff8168a431>] out_of_line_wait_on_bit+0x81/0xb0
      23:10:43:[11160.281646]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
      23:10:43:[11160.282264]  [<ffffffffa04e3a53>] nfs_wait_on_request+0x33/0x40 [nfs]
      23:10:43:[11160.283196]  [<ffffffffa04e8991>] nfs_updatepage+0x151/0x8d0 [nfs]
      23:10:43:[11160.283988]  [<ffffffffa04d8171>] nfs_write_end+0x121/0x350 [nfs]
      23:10:43:[11160.284764]  [<ffffffff81181c29>] generic_file_buffered_write+0x189/0x2a0
      23:10:43:[11160.285545]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
      23:10:43:[11160.286181]  [<ffffffff810c54f2>] ? default_wake_function+0x12/0x20
      23:10:43:[11160.287045]  [<ffffffff811831a2>] __generic_file_aio_write+0x1e2/0x400
      23:10:43:[11160.287877]  [<ffffffff81183419>] generic_file_aio_write+0x59/0xa0
      23:10:43:[11160.288620]  [<ffffffffa04d715b>] nfs_file_write+0xbb/0x1e0 [nfs]
      23:10:43:[11160.289237]  [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0
      23:10:43:[11160.290039]  [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
      23:10:43:[11160.290708]  [<ffffffff811fe8c7>] ? vfs_read+0xf7/0x170
      23:10:43:[11160.291217]  [<ffffffff811ff51f>] SyS_write+0x7f/0xe0
      23:10:43:[11160.291743]  [<ffffffff81697809>] system_call_fastpath+0x16/0x1b
      

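      This suggests the failure is in access to NFS, not Lustre: the blocked task is tee, and it is waiting in nfs_wait_on_request under nfs_file_write, with no Lustre frames in the trace.
      If so, this may be entirely a DCO problem rather than an LU one.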
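
      The reasoning above (the blocked task's frames all sit in the nfs module) can be checked mechanically across a whole console log. The following is only a minimal sketch, not part of the Lustre test framework: the function name modules_in_hung_traces, the regular expressions, and the SAMPLE excerpt are illustrative assumptions, and the frame format is assumed to match the trace quoted above.

```python
import re
from collections import Counter

# Hypothetical helper: the patterns below assume the console-log format
# quoted in this ticket and are not taken from any Lustre tooling.

# Matches the line that opens a hung-task report, e.g.
#   INFO: task tee:19197 blocked for more than 120 seconds.
BLOCKED_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than"
)

# Matches one stack frame, e.g.
#   [<ffffffffa04e3a53>] nfs_wait_on_request+0x33/0x40 [nfs]
# The trailing "[module]" is absent for core-kernel symbols.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def modules_in_hung_traces(log_text):
    """Map each (task, pid) hung-task report to a Counter of module names."""
    result = {}
    current = None
    for line in log_text.splitlines():
        blocked = BLOCKED_RE.search(line)
        if blocked:
            current = (blocked.group("name"), int(blocked.group("pid")))
            result[current] = Counter()
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            # Frames without a [module] suffix are counted as core kernel.
            result[current][frame.group("mod") or "kernel"] += 1
    return result


# A few lines copied from the trace in this ticket.
SAMPLE = """\
23:10:43:[11160.267697] INFO: task tee:19197 blocked for more than 120 seconds.
23:10:43:[11160.273842]  [<ffffffff8168c7f9>] schedule+0x29/0x70
23:10:43:[11160.282264]  [<ffffffffa04e3a53>] nfs_wait_on_request+0x33/0x40 [nfs]
23:10:43:[11160.283196]  [<ffffffffa04e8991>] nfs_updatepage+0x151/0x8d0 [nfs]
23:10:43:[11160.288620]  [<ffffffffa04d715b>] nfs_file_write+0xbb/0x1e0 [nfs]
23:10:43:[11160.290039]  [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
"""

for task, modules in modules_in_hung_traces(SAMPLE).items():
    print(task, dict(modules))
```

      A trace whose only non-kernel frames come from nfs, as here, points away from Lustre; a real triage script would presumably also look for Lustre module names in the same way.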

      Info required for matching: conf-sanity 84

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)