Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-6064

obdfilter-survey test_1c: test failed to respond and timed out

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.5.4
    • 3
    • 16883

    Description

      maloo: https://testing.hpdd.intel.com/test_sets/468b0a6e-86c6-11e4-9946-5254006e85c2

      While investigating the log, I see lots of journal threads and lctl thread is waiting in sync_buffer:

      Dec 18 14:50:09 shadow-45vm4 kernel: jbd2/dm-2-8   D 0000000000000001     0  5635      2 0x00000080
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59c20 0000000000000046 ffff8800ffffffff 00001b90f303b9b0
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59b90 ffff88007d104ad0 000000000015a824 ffffffffab6d3bf2
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007d7f5058 ffff88007ab59fd8 000000000000fbc8 ffff88007d7f5058
      Dec 18 14:50:09 shadow-45vm4 kernel: Call Trace:
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81529ea3>] io_schedule+0x73/0xc0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811c0000>] sync_buffer+0x40/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152a96f>] __wait_on_bit+0x5f/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152aa18>] out_of_line_wait_on_bit+0x78/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffb6>] __wait_on_buffer+0x26/0x30
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e0206>] jbd2_journal_commit_transaction+0xa86/0x15a0 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5c18>] kjournald2+0xb8/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5b60>] ? kjournald2+0x0/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      
      Dec 18 14:50:09 shadow-45vm4 kernel: jbd2/dm-2-8   D 0000000000000001     0  5635      2 0x00000080
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59c20 0000000000000046 ffff8800ffffffff 00001b90f303b9b0
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59b90 ffff88007d104ad0 000000000015a824 ffffffffab6d3bf2
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007d7f5058 ffff88007ab59fd8 000000000000fbc8 ffff88007d7f5058
      Dec 18 14:50:09 shadow-45vm4 kernel: Call Trace:
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81529ea3>] io_schedule+0x73/0xc0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811c0000>] sync_buffer+0x40/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152a96f>] __wait_on_bit+0x5f/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152aa18>] out_of_line_wait_on_bit+0x78/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffb6>] __wait_on_buffer+0x26/0x30
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e0206>] jbd2_journal_commit_transaction+0xa86/0x15a0 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5c18>] kjournald2+0xb8/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5b60>] ? kjournald2+0x0/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      I think that could probabaly result some other lctl threads waiting in do_get_write_access():

      Dec 18 14:50:09 shadow-45vm4 kernel: jbd2/dm-2-8   D 0000000000000001     0  5635      2 0x00000080
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59c20 0000000000000046 ffff8800ffffffff 00001b90f303b9b0
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007ab59b90 ffff88007d104ad0 000000000015a824 ffffffffab6d3bf2
      Dec 18 14:50:09 shadow-45vm4 kernel: ffff88007d7f5058 ffff88007ab59fd8 000000000000fbc8 ffff88007d7f5058
      Dec 18 14:50:09 shadow-45vm4 kernel: Call Trace:
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810a6d31>] ? ktime_get_ts+0xb1/0xf0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81529ea3>] io_schedule+0x73/0xc0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811c0000>] sync_buffer+0x40/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152a96f>] __wait_on_bit+0x5f/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffc0>] ? sync_buffer+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8152aa18>] out_of_line_wait_on_bit+0x78/0x90
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109b020>] ? wake_bit_function+0x0/0x50
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff811bffb6>] __wait_on_buffer+0x26/0x30
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e0206>] jbd2_journal_commit_transaction+0xa86/0x15a0 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff81084a1b>] ? try_to_del_timer_sync+0x7b/0xe0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5c18>] kjournald2+0xb8/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffffa03e5b60>] ? kjournald2+0x0/0x220 [jbd2]
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109abf6>] kthread+0x96/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8109ab60>] ? kthread+0x0/0xa0
      Dec 18 14:50:09 shadow-45vm4 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
      

      and some others waiting in ldiskfs_ext_new_extent_cb():

      Dec 18 14:50:11 shadow-45vm4 kernel: lctl          D 0000000000000001     0 13435  13389 0x00000080
      Dec 18 14:50:11 shadow-45vm4 kernel: ffff88000e9116a0 0000000000000086 0000000000000000 ffff88000e911618
      Dec 18 14:50:11 shadow-45vm4 kernel: ffffea0000838e18 ffff88000e9116a8 ffffffff8112bcb3 ffffea0000838e40
      Dec 18 14:50:11 shadow-45vm4 kernel: ffff88006335fab8 ffff88000e911fd8 000000000000fbc8 ffff88006335fab8
      Dec 18 14:50:11 shadow-45vm4 kernel: Call Trace:
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8112bcb3>] ? __rmqueue+0xc3/0x490
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8152bd85>] rwsem_down_failed_common+0x95/0x1d0
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8152bf16>] rwsem_down_read_failed+0x26/0x30
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8128fbd4>] call_rwsem_down_read_failed+0x14/0x30
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d06480>] ? ldiskfs_ext_new_extent_cb+0x0/0x67c [fsfilt_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8152b414>] ? down_read+0x24/0x30
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0405bd3>] ldiskfs_ext_walk_space+0x73/0x310 [ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d06480>] ? ldiskfs_ext_new_extent_cb+0x0/0x67c [fsfilt_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff81299028>] ? __percpu_counter_sum+0x58/0x80
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d061cc>] fsfilt_map_nblocks+0xcc/0xf0 [fsfilt_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d062f0>] fsfilt_ldiskfs_map_ext_inode_pages+0x100/0x200 [fsfilt_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d06475>] fsfilt_ldiskfs_map_inode_pages+0x85/0x90 [fsfilt_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0437b84>] ? ldiskfs_dquot_initialize+0x94/0xd0 [ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0c9d492>] osd_write_commit+0x302/0x620 [osd_ldiskfs]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d92f64>] ofd_commitrw_write+0x684/0x11b0 [ofd]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d98e6d>] ? ofd_fmd_find_nolock+0xad/0x100 [ofd]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0d95ccd>] ofd_commitrw+0x5cd/0xbb0 [ofd]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa0241605>] echo_client_iocontrol+0x23e5/0x39d0 [obdecho]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8128d776>] ? vsnprintf+0x336/0x5e0
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8111fc67>] ? unlock_page+0x27/0x30
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff810aec20>] ? get_futex_key+0x180/0x2b0
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8116febc>] ? __kmalloc+0x20c/0x220
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa05b4b44>] class_handle_ioctl+0x12b4/0x1e40 [obdclass]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffffa05a22ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8119e992>] vfs_ioctl+0x22/0xa0
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8119eb0c>] ? do_vfs_ioctl+0x5c/0x580
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8119eb34>] do_vfs_ioctl+0x84/0x580
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff810b196b>] ? sys_futex+0x7b/0x170
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8119f0b1>] sys_ioctl+0x81/0xa0
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff810e204e>] ? __audit_syscall_exit+0x25e/0x290
      Dec 18 14:50:11 shadow-45vm4 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      Info required for matching: obdfilter-survey test_1c

      Attachments

        Issue Links

          Activity

            People

              bzzz Alex Zhuravlev
              niu Niu Yawei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: