  Lustre / LU-1505

"disconnect stale client" hangs OSS


Details

    • Improvement
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.1.0, Lustre 2.1.1
    • None
    • 4046

    Description

      After recovery, "LustreError: 5866:0:(genops.c:1272:class_disconnect_stale_exports()) nbp6-OST004a: disconnect stale client 3e1a1726-5e53-1be8-2c0f-22adae5b4093@<unknown>" messages hang the system while this is printed for every OST and every client.
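To gauge how large the message flood is, the console log can be tallied per target. The following is a minimal sketch, not part of Lustre: the function name `count_stale_disconnects` is hypothetical, and the regular expression is modelled only on the single message quoted above, so other Lustre versions may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this report (assumption:
# other Lustre versions may format the line differently).
STALE_RE = re.compile(
    r"class_disconnect_stale_exports\(\)\)\s+(?P<target>\S+):\s+"
    r"disconnect stale client\s+(?P<uuid>[0-9a-f-]+)@(?P<nid>\S+)"
)


def count_stale_disconnects(lines):
    """Return a Counter mapping target name to number of stale-client messages."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[match.group("target")] += 1
    return counts
```

Passing an open log file (any iterable of lines) and printing `most_common()` on the result shows which OSTs logged the most disconnects.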

      We also see the following hung-task warnings:
      INFO: task ll_log_process:6350 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ll_log_proces D 0000000000000006 0 6350 2 0x00000080
      ffff8808f59b1c10 0000000000000046 ffff8808f59b1ba0 ffffffff81060839
      ffff8808f59b1b80 0000000000000282 0000000000800500 0000000000000000
      ffff880c13fc5b38 ffff8808f59b1fd8 000000000000f5a0 ffff880c13fc5b38
      Call Trace:
      [<ffffffff81060839>] ? wake_up_new_task+0xd9/0x130
      [<ffffffff81522215>] schedule_timeout+0x215/0x2e0
      [<ffffffffa05ecd10>] ? llog_process_thread+0x0/0xe70 [obdclass]
      [<ffffffff8100c0e2>] ? kernel_thread+0x82/0xe0
      [<ffffffffa05ecd10>] ? llog_process_thread+0x0/0xe70 [obdclass]
      [<ffffffff81521e93>] wait_for_common+0x123/0x180
      [<ffffffff8105fff0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0538c9a>] ? cfs_create_thread+0x7a/0xa0 [libcfs]
      [<ffffffffa05f1280>] ? llog_cat_process_cb+0x0/0x400 [obdclass]
      [<ffffffff81521fad>] wait_for_completion+0x1d/0x20
      [<ffffffffa05ebb33>] llog_process_flags+0xf3/0x660 [obdclass]
      [<ffffffffa0742b57>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
      [<ffffffffa05eeef8>] llog_cat_process_flags+0x188/0x2d0 [obdclass]
      [<ffffffffa05edfef>] ? llog_init_handle+0x17f/0xa70 [obdclass]
      [<ffffffffa0aab6f0>] ? filter_recov_log_mds_ost_cb+0x0/0xb70 [obdfilter]
      [<ffffffffa0aab6f0>] ? filter_recov_log_mds_ost_cb+0x0/0xb70 [obdfilter]
      [<ffffffffa05ef056>] llog_cat_process+0x16/0x20 [obdclass]
      [<ffffffffa05f02a1>] llog_cat_process_thread+0x121/0x850 [obdclass]
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffff8100c14a>] child_rip+0xa/0x20
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffff8100c140>] ? child_rip+0x0/0x20
      INFO: task ll_log_process:6354 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      ll_log_proces D 0000000000000006 0 6354 2 0x00000080
      ffff8808f5ac1c10 0000000000000046 ffff8808f5ac1bc0 ffffffff810097dc
      ffff880bc2006b38 0000000000000000 0000000000ac1bd0 ffff880028393b40
      ffff880bc2a8db38 ffff8808f5ac1fd8 000000000000f5a0 ffff880bc2a8db38
      Call Trace:
      [<ffffffff810097dc>] ? __switch_to+0x1ac/0x320
      [<ffffffff815213ae>] ? thread_return+0x4e/0x760
      [<ffffffff81522215>] schedule_timeout+0x215/0x2e0
      [<ffffffffa05ecd10>] ? llog_process_thread+0x0/0xe70 [obdclass]
      [<ffffffff81521e93>] wait_for_common+0x123/0x180
      [<ffffffff8105fff0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0538c9a>] ? cfs_create_thread+0x7a/0xa0 [libcfs]
      [<ffffffffa05f1280>] ? llog_cat_process_cb+0x0/0x400 [obdclass]
      [<ffffffff81521fad>] wait_for_completion+0x1d/0x20
      [<ffffffffa05ebb33>] llog_process_flags+0xf3/0x660 [obdclass]
      [<ffffffffa0742b57>] ? llog_client_read_header+0x187/0x640 [ptlrpc]
      [<ffffffffa05eeef8>] llog_cat_process_flags+0x188/0x2d0 [obdclass]
      [<ffffffffa05edfef>] ? llog_init_handle+0x17f/0xa70 [obdclass]
      [<ffffffffa0aab6f0>] ? filter_recov_log_mds_ost_cb+0x0/0xb70 [obdfilter]
      [<ffffffffa0aab6f0>] ? filter_recov_log_mds_ost_cb+0x0/0xb70 [obdfilter]
      [<ffffffffa05ef056>] llog_cat_process+0x16/0x20 [obdclass]
      [<ffffffffa05f02a1>] llog_cat_process_thread+0x121/0x850 [obdclass]
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffff8100c14a>] child_rip+0xa/0x20
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffffa05f0180>] ? llog_cat_process_thread+0x0/0x850 [obdclass]
      [<ffffffff8100c140>] ? child_rip+0x0/0x20
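When many of these warnings appear, it can help to list which tasks were reported as blocked. This is a rough sketch that assumes the warning header has exactly the form shown in the traces above ("INFO: task NAME:PID blocked for more than N seconds."); the helper name `blocked_tasks` is hypothetical.

```python
import re

# Matches the hung-task header line as it appears in the traces above.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than "
    r"(?P<secs>\d+) seconds"
)


def blocked_tasks(lines):
    """Return a list of (task name, pid, threshold seconds), one per warning."""
    found = []
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            found.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return found
```

Applied to the two warnings in this report, the helper would return one entry for PID 6350 and one for PID 6354, both for `ll_log_process`.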

          People

            pjones Peter Jones
            mhanafi Mahmoud Hanafi