Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0
    • Environment: Soak test cluster, tip of lustre-master lustre: 2.8.59_79_gb8811a0
    • Severity: 3

    Description

      After a previous failover, MDT0003 is failed back to lola-11:
      Nov 8 22:10:14 lola-10 kernel: Lustre: server umount soaked-MDT0003 complete

      MDS failover fault induced, lola-11 fails MDT0003 to lola-10:

      2016-11-08 22:24:25,959:fsmgmt.fsmgmt:INFO     Failing over soaked-MDT0003 ...
      2016-11-08 22:24:25,959:fsmgmt.fsmgmt:INFO     Mounting soaked-MDT0003 on lola-10 ...
      

      Recovery starts on MDS:

      Nov 8 22:25:29 lola-10 kernel: Lustre: soaked-MDT0003: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450

      Task blocked on lola-10:

      Nov  8 22:29:17 lola-10 kernel: INFO: task mdt_out01_007:6305 blocked for more than 120 seconds.
      Nov  8 22:29:17 lola-10 kernel:      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
      Nov  8 22:29:17 lola-10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Nov  8 22:29:17 lola-10 kernel: mdt_out01_007 D 0000000000000018     0  6305      2 0x00000080
      Nov  8 22:29:17 lola-10 kernel: ffff8808104b7b40 0000000000000046 0000000000000000 ffff8808104b7d00
      Nov  8 22:29:17 lola-10 kernel: ffff8804337ea800 ffff880824300118 000014206fe1f7c1 ffff880824300118
      Nov  8 22:29:17 lola-10 kernel: ffff8808104b7b40 00000001014d3a26 ffff880834add068 ffff8808104b7fd8
      Nov  8 22:29:17 lola-10 kernel: Call Trace:
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa07bca3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa09dd442>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa09b2af0>] ? lustre_swab_object_update_reply+0x0/0xc0 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa07bd7cc>] dt_locate_at+0x1c/0xa0 [obdclass]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa0a24147>] out_handle+0x1067/0x18d0 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa067ac8a>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff8153afce>] ? mutex_lock+0x1e/0x50
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa0a142ca>] ? req_can_reconstruct+0x6a/0x120 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa0a1b4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa09c7bc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
      Nov  8 22:29:17 lola-10 kernel: [<ffffffffa09c6e90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
      Nov  8 22:29:17 lola-10 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
      Nov  8 22:29:18 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      Second blocked task:

      Nov  8 22:29:32 lola-10 kernel: Call Trace:
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa07bca3d>] lu_object_find_at+0x3d/0xe0 [obdclass]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa09dd442>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa09b2af0>] ? lustre_swab_object_update_reply+0x0/0xc0 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa07bd7cc>] dt_locate_at+0x1c/0xa0 [obdclass]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa0a24147>] out_handle+0x1067/0x18d0 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa067ac8a>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff8153afce>] ? mutex_lock+0x1e/0x50
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa0a142ca>] ? req_can_reconstruct+0x6a/0x120 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa0a1b4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa09c7bc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
      Nov  8 22:29:32 lola-10 kernel: [<ffffffffa09c6e90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
      Nov  8 22:29:32 lola-10 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      Recovery timer completes, but recovery does not:

      mdt.soaked-MDT0003.recovery_status=
      status: RECOVERING
      recovery_start: 1478672734
      time_remaining: 0
      connected_clients: 21/21
      req_replay_clients: 4
      lock_repay_clients: 5
      completed_clients: 16
      evicted_clients: 0
      replayed_requests: 7
      queued_requests: 4
      next_transno: 77309522798
      

      System finally notices:

      Nov  9 08:22:35 lola-10 kernel: Lustre: soaked-MDT0003: Recovery already passed deadline 588:42, It is most likely due to DNE recovery is failed or stuck, please wait a few more minutes or abort the recovery.
      

      Attached lustre-log dumps from lola-10

    Attachments

    Issue Links

    Activity

            [LU-8815] MDS fails to complete recovery

            yong.fan nasf (Inactive) added a comment - Any update with the patch applied? Thanks!

            cliffw Cliff White (Inactive) added a comment - We are moving soak hardware this week, will add that to the list.

            yong.fan nasf (Inactive) added a comment - Cliff, do you have any chance to reproduce the issue with the patch https://review.whamcloud.com/#/c/25260/ applied? Thanks!

            yong.fan nasf (Inactive) added a comment - +1 on master: https://testing.hpdd.intel.com/test_sets/b54d8c9c-eee8-11e6-9756-5254006e85c2

            yong.fan nasf (Inactive) added a comment - Cliff, would you please verify the debug patch https://review.whamcloud.com/#/c/25260/? That can tell us which FID caused the blocking. Thanks!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/25260
            Subject: LU-8815 obdclass: debug information for lu_object_find_at
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e8c1940e2e7b0afa0b54d139c70359425bf92d40

            yong.fan nasf (Inactive) added a comment -

            According to the bug description, it seems that one MDT was trying to recover/replay some cross-MDT RPC, which triggered an OUT_ATTR_GET RPC. Unfortunately, that OUT RPC was blocked on the remote MDT inside lu_object_find_at(). Usually, lu_object_find_at() blocks if the target object is in RAM but marked as dying. If someone holds a reference on the dying object, then others will be blocked until all the references have been released. So we need to find out what the target object is, and who holds the reference on such a dying object.

            Do we have any clue about the first question: what is the target object?
            Di, is it possibly an update_log file on the remote MDT?

            As for the stack trace for OI scrub in the above comment, that seems unrelated, because OI scrub operates on the inode directly and does NOT reference the upper-layer object.
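
            To make the blocking pattern described above concrete, here is a minimal userspace sketch (NOT Lustre code; the struct and the obj_find()/obj_put() names are hypothetical): a lookup that finds a cached object marked as dying has to wait until the last reference on it is dropped, so a holder that never releases produces exactly the kind of "task blocked for more than 120 seconds" threads shown in the description.

            /*
             * Userspace model of the wait described above.
             * Build with: gcc -pthread model.c
             */
            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>
            #include <unistd.h>

            struct obj {
                int             refcount;   /* references currently held       */
                bool            dying;      /* object is being torn down       */
                pthread_mutex_t lock;
                pthread_cond_t  release;    /* signalled when a ref is dropped */
            };

            static struct obj cached = {
                .refcount = 1,
                .dying    = true,
                .lock     = PTHREAD_MUTEX_INITIALIZER,
                .release  = PTHREAD_COND_INITIALIZER,
            };

            /* Analogue of the waiter in lu_object_find_at(): it cannot make
             * progress while the cached object is dying and still referenced. */
            static void obj_find(struct obj *o)
            {
                pthread_mutex_lock(&o->lock);
                while (o->dying && o->refcount > 0) {
                    printf("finder: object dying with %d ref(s), waiting\n",
                           o->refcount);
                    pthread_cond_wait(&o->release, &o->lock);
                }
                printf("finder: last reference dropped, lookup can proceed\n");
                pthread_mutex_unlock(&o->lock);
            }

            /* Analogue of the reference holder: until it drops its reference,
             * every finder above stays blocked. */
            static void obj_put(struct obj *o)
            {
                pthread_mutex_lock(&o->lock);
                o->refcount--;
                pthread_cond_broadcast(&o->release);
                pthread_mutex_unlock(&o->lock);
            }

            static void *holder(void *arg)
            {
                (void)arg;
                sleep(2);            /* simulate a holder that is slow (or stuck) */
                obj_put(&cached);
                return NULL;
            }

            int main(void)
            {
                pthread_t t;

                pthread_create(&t, NULL, holder, NULL);
                obj_find(&cached);   /* blocks until the holder releases */
                pthread_join(t, NULL);
                return 0;
            }

            In the failure reported here the holder apparently never releases, which is why the OUT threads stay in D state and recovery never finishes.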

            cliffw Cliff White (Inactive) added a comment -

            Currently, the system cannot recover.

            Jan 20 08:59:41 lola-8 kernel: Pid: 6236, comm: OI_scrub Tainted: P           --L------------    2.6.32-573.26.1.el6_lustre.x86_64 #1 Intel Corporation S2600GZ ........../S2600GZ
            Jan 20 08:59:41 lola-8 kernel: RIP: 0010:[<ffffffffa10d6e98>]  [<ffffffffa10d6e98>] osd_inode_iteration+0x498/0xd80 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: RSP: 0018:ffff8804173d7d10  EFLAGS: 00000202
            Jan 20 08:59:41 lola-8 kernel: RAX: 0000000000000004 RBX: ffff8804173d7e00 RCX: ffff8804173d7dc0
            Jan 20 08:59:41 lola-8 kernel: RDX: ffff8804173d7da0 RSI: ffff8808296fc000 RDI: ffff88082d394000
            Jan 20 08:59:41 lola-8 kernel: RBP: ffffffff8100bc0e R08: 0000000000000001 R09: 0000000000000004
            Jan 20 08:59:41 lola-8 kernel: R10: ffff880417371b80 R11: ffff880417371b90 R12: ffff8804173d7e00
            Jan 20 08:59:41 lola-8 kernel: R13: ffff8808296fd000 R14: ffff8808296fd0e0 R15: 0000000000000001
            Jan 20 08:59:41 lola-8 kernel: FS:  0000000000000000(0000) GS:ffff880038680000(0000) knlGS:0000000000000000
            Jan 20 08:59:41 lola-8 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
            Jan 20 08:59:41 lola-8 kernel: CR2: 00000039036acd90 CR3: 0000000001a8d000 CR4: 00000000000407e0
            Jan 20 08:59:41 lola-8 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            Jan 20 08:59:41 lola-8 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            Jan 20 08:59:41 lola-8 kernel: Process OI_scrub (pid: 6236, threadinfo ffff8804173d4000, task ffff880435f34040)
            Jan 20 08:59:41 lola-8 kernel: Stack:
            Jan 20 08:59:41 lola-8 kernel: 0000000000015a00 ffff8808331d4c00 ffff8808000005a6 ffff8804173d7e10
            Jan 20 08:59:41 lola-8 kernel: <d> ffff8804173d7dd0 ffffffff81539b0e ffff880435f34040 ffff880435f34040
            Jan 20 08:59:41 lola-8 kernel: <d> ffffffffa10d1e20 ffffffffa10d3be0 ffff88082d394000 ffff8808345e0000
            Jan 20 08:59:41 lola-8 kernel: Call Trace:
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d1e20>] ? osd_scrub_exec+0x0/0x1dc0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d3be0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d8905>] ? osd_scrub_main+0x885/0xec0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d8080>] ? osd_scrub_main+0x0/0xec0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff810a138e>] ? kthread+0x9e/0xc0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
            Jan 20 08:59:41 lola-8 kernel: Code: 00 02 05 00 00 48 c7 05 a3 59 02 00 00 00 00 00 c7 05 91 59 02 00 01 00 00 00 e8 d4 ad 7b ff e9 d2 fb ff ff 0f 1f 80 00 00 00 00 <83> f8 05 0f 84 67 03 00 00 83 f8 06 0f 84 a6 04 00 00 41 89 c1 
            Jan 20 08:59:41 lola-8 kernel: Call Trace:
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d1e20>] ? osd_scrub_exec+0x0/0x1dc0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d3be0>] ? osd_scrub_next+0x0/0x4b0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d8905>] ? osd_scrub_main+0x885/0xec0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff81067650>] ? default_wake_function+0x0/0x20
            Jan 20 08:59:41 lola-8 kernel: [<ffffffffa10d8080>] ? osd_scrub_main+0x0/0xec0 [osd_ldiskfs]
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff810a138e>] ? kthread+0x9e/0xc0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
            Jan 20 08:59:41 lola-8 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
            
            di.wang Di Wang added a comment - Any more information here? console or stack trace. Thanks.

            cliffw Cliff White (Inactive) added a comment - Hit this again testing 2.9.0 GA

            cliffw Cliff White (Inactive) added a comment -

            The abort_recovery command never completes:

            LustreError: 72214:0:(mdt_handler.c:5942:mdt_iocontrol()) soaked-MDT0003: Aborting recovery for device
            LustreError: 72214:0:(ldlm_lib.c:2596:target_stop_recovery_thread()) soaked-MDT0003: Aborting recovery
            LustreError: 6078:0:(ldlm_lib.c:2816:target_queue_recovery_request()) @@@ dropping resent queued req  req@ffff8803f78f03c0 x1550360654345360/t0(77309522798) o36->3d670dc0-f7e5-756a-f555-c09f70a29db2@192.168.1.117@o2ib100:-1/-1 lens 768/0 e 0 to 0 dl 1478715043 ref 2 fl Interpret:/6/ffffffff rc 0/-1
            LustreError: 6078:0:(ldlm_lib.c:2816:target_queue_recovery_request()) Skipped 169 previous similar messages
            
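            One generic way such an abort can hang (this is only a userspace illustration of the symptom, NOT Lustre's actual target_stop_recovery_thread() logic, and all names below are hypothetical): the aborting side sets a stop flag and waits for the worker to exit, but the worker is parked in a wait whose predicate never re-checks the flag, so the abort never returns.

            /* Build with: gcc -pthread abort_model.c */
            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
            static bool stop_requested = false;

            /* Stand-in for the stuck worker: it waits on a condition whose
             * predicate never re-checks stop_requested, so waking it up does
             * not make it exit. */
            static void *worker(void *arg)
            {
                (void)arg;
                pthread_mutex_lock(&lock);
                while (true)        /* bug modelled: should also test stop_requested */
                    pthread_cond_wait(&cond, &lock);
                pthread_mutex_unlock(&lock);
                return NULL;
            }

            int main(void)
            {
                pthread_t t;

                pthread_create(&t, NULL, worker, NULL);
                sleep(1);

                /* Stand-in for the abort path: set the flag, wake the worker,
                 * then wait for it to stop. */
                pthread_mutex_lock(&lock);
                stop_requested = true;
                pthread_cond_broadcast(&cond);
                pthread_mutex_unlock(&lock);

                for (int i = 0; i < 3; i++) {
                    printf("abort: still waiting for worker (stop_requested=%d)\n",
                           stop_requested);
                    sleep(1);
                }
                /* The real system keeps waiting forever; we give up here only
                 * so the demo terminates. */
                printf("abort: worker never stopped\n");
                return 1;
            }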

    People

      yong.fan nasf (Inactive)
      cliffw Cliff White (Inactive)
      Votes: 0
      Watchers: 9
