Lustre - LU-8806

LFSCK hangs on MDT - osp_precreate_cleanup_orphans - cannot cleanup orphans: rc = -107

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.3
    • Environment: Soak cluster, tip of Lustre-master lustre: 2.8.59_79_gb8811a0
    • Severity: 3

    Description

      After a failover, lfsck is started.
      2016-11-04 15:18:59,065:fsmgmt.fsmgmt:INFO lfsck started on lola-8
      lfsck never completes.
      2016-11-06 01:08:25,976:fsmgmt.fsmgmt:INFO lfsck still in progress for soaked-MDT0000 after 122865s
      Examining the logs, we see that the connection to OST0002 is going up and down, and orphans cannot be cleaned up.

      Nov  4 15:18:32 lola-8 kernel: LustreError: 11-0: soaked-OST0017-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:36 lola-8 kernel: LustreError: 11-0: soaked-OST0005-osc-MDT0000: operation ost_statfs to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:46 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
      Nov  4 15:18:57 lola-8 kernel: LustreError: 11-0: soaked-OST000b-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:57 lola-8 kernel: LustreError: Skipped 1 previous similar message
      Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297366/real 1478297366]  req@ffff8807e61689c0 x1550005401109488/t0(0) o6->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:28/4 lens 664/432 e 23 to 1 dl 1478297967 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 9 previous similar messages
      Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
      Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 18 previous similar messages
      Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
      Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297967/real 1478297967]  req@ffff8803e013d380 x1550005401094688/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478298568 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
      Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
      Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
      Nov  4 15:31:22 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
      Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478298568/real 1478298568]  req@ffff8803c0c980c0 x1550005401096032/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478299169 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      Nov  4 15:39:29 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      

      During this time, lola-4 was experiencing difficulties communicating with the MDS; these had begun prior to the start of the lfsck.

      Nov  4 14:38:26 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
      Nov  4 14:38:26 lola-4 kernel: Lustre: Skipped 12 previous similar messages
      Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
      Nov  4 14:39:17 lola-4 kernel:  req@ffff88052335ec50 x1550005401085472/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478295562 ref 2 fl New:/2/ffffffff rc 0/-1
      Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 95 previous similar messages
      Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
      Nov  4 14:39:23 lola-4 kernel: Lustre: Skipped 13 previous similar messages
      Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10)
      Nov  4 14:49:16 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
      Nov  4 14:49:16 lola-4 kernel: Lustre: Skipped 12 previous similar messages
      Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
      Nov  4 14:49:18 lola-4 kernel:  req@ffff88050f9c2450 x1550005401098800/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478296163 ref 2 fl New:/2/ffffffff rc 0/-1
      Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 96 previous similar messages
      Nov  4 14:49:24 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
      

      lola-4 continued dumping stacks; finally, the cluster was restarted.
      Prior to the restart, lctl debug logs were dumped from lola-4 and lola-8; these are attached along with the message logs.
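
      For reference, the rc values in these messages are negative Linux errno codes. A minimal standalone C snippet decodes the ones that recur in this ticket:

      /* Decode the recurring return codes from the logs (negative errno values). */
      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
              const int codes[] = { 107, 110, 11, 4 };   /* rc = -107, -110, -11, -4 */

              for (size_t i = 0; i < sizeof(codes) / sizeof(codes[0]); i++)
                      printf("rc = -%d -> %s\n", codes[i], strerror(codes[i]));
              return 0;
      }

      On Linux this prints ENOTCONN ("Transport endpoint is not connected"), ETIMEDOUT, EAGAIN and EINTR respectively, which matches the pattern above: every rc = -107 coincides with a lost connection to the OST.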

      Attachments

        1. lola-4.syslog.txt
          1.56 MB
        2. lola-8.lustre-log.txt.gz
          889 kB
        3. lola-8.syslog.txt
          3.46 MB
        4. oss.lola-4.lustre-log.txt.gz
          2.63 MB

        Issue Links

          Activity

            [LU-8806] LFSCK hangs on MDT - osp_precreate_cleanup_orphans - cannot cleanup orphans: rc = -107

            Peter,

            I see no updates for this patch for a year. If you are talking about Alex's patch, it is just a part of this solution, and I will add myself to its watchers.
            As for an alternative solution, I don't see a good one for now. This is just a notification that 2.7, 2.8 and 2.9 are affected, not 2.10/2.11.

            shadow Alexey Lyashkov added a comment -
            pjones Peter Jones added a comment -

            Alexey

            This issue is presently targeted for the 2.11 release and a draft patch of one possible patch exists already. Feel free to provide an alternative approach or feedback on any testing you conduct on the existing prototype.

            Peter


            We hit this issue with 2.7 (IEEL3) code.
            The OST deadlocked during an lfsck operation.

            Stack : 
            __schedule
            schedule
            rwsem_down_write_failed
            call_rwsem_down_write_failed
            osd_object_write_lock
            out_tx_attr_set_exec
            out_tx_end
            out_handle
            tgt_request_handle
            ptlrpc_server_handle_request
            ptlrpc_main
            kthread
            ret_from_fork
            Progs:  2213 "ll_ost_out00_00
            

            Any plans to fix it?

            shadow Alexey Lyashkov added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23689
            Subject: LU-8806 target: lock ostobj before transaction start
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 28bf0565e025ca20c72fb8d813274c142ed273fe

            gerrit Gerrit Updater added a comment -
            yong.fan nasf (Inactive) added a comment - - edited

            As I remember, there was some discussion about the difference between MDT and OST operations with respect to transaction and lock ordering, but I cannot remember the details. Currently, the MDT side starts the transaction before locking the object, while the OST side uses the reverse order. Because the OUT handler is shared by both sides, there will always be some potential for deadlock. Is there any special reason for that? Can we unify them?


            Looks like a lock/tx ordering issue.

            Here we take the object's lock before starting the TX:

            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038c81d>] dmu_tx_wait+0x21d/0x400 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa03d1ef2>] ? txg_kick+0x92/0xd0 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038cbf1>] dmu_tx_assign+0xa1/0x570 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa0f06d7d>] osd_trans_start+0xed/0x430 [osd_zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa102156c>] ofd_trans_start+0x7c/0x100 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa1023980>] ofd_object_destroy+0x3d0/0x8e0 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa101d2cd>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098bab0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098db10>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa10176a2>] ofd_destroy_hdl+0x2f2/0xb60 [ofd]

            and here we take it with the TX already started:

            Nov 4 09:32:15 lola-4 kernel: Pid: 33274, comm: ll_ost_out01_00
            Nov 4 09:32:15 lola-4 kernel:
            Nov 4 09:32:15 lola-4 kernel: Call Trace:
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa03a22ce>] ? dsl_dir_tempreserve_impl+0x2ae/0x330 [zfs]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cc55>] rwsem_down_failed_common+0x95/0x1d0
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cdb3>] rwsem_down_write_failed+0x23/0x30
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8129e923>] call_rwsem_down_write_failed+0x13/0x20
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153c2b2>] ? down_write+0x32/0x40
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0f0f021>] osd_object_write_lock+0x61/0x70 [osd_zfs]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2bbf9>] out_tx_attr_set_exec+0x69/0x480 [ptlrpc]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2296a>] out_tx_end+0xda/0x5d0 [ptlrpc]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a282b9>] out_handle+0x11d9/0x18d0 [ptlrpc]

            bzzz Alex Zhuravlev added a comment -
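
            To make the ordering concrete, the following is a minimal standalone pthread sketch, not Lustre code: obj_lock stands in for the per-object write lock (osd_object_write_lock) and tx_lock for admission to a transaction (osd_trans_start/dmu_tx_assign). As written, both threads lock the object before the "transaction" and the program always finishes; compiling with -DREVERSE_ORDER makes the second thread take the two in the opposite order, like the OUT stack above, and the threads can then deadlock exactly as the ll_ost/ll_ost_out threads did. Build with cc -pthread.

            /* Minimal stand-alone illustration of the lock/tx ordering issue (not Lustre code). */
            #include <pthread.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;   /* ~ osd_object_write_lock */
            static pthread_mutex_t tx_lock  = PTHREAD_MUTEX_INITIALIZER;   /* ~ transaction slot */

            /* ofd-style path from the first stack: object lock first, then start the TX. */
            static void *ofd_style_destroy(void *arg)
            {
                    (void)arg;
                    pthread_mutex_lock(&obj_lock);
                    usleep(1000);                           /* widen the race window */
                    pthread_mutex_lock(&tx_lock);
                    puts("ofd-style thread done");
                    pthread_mutex_unlock(&tx_lock);
                    pthread_mutex_unlock(&obj_lock);
                    return NULL;
            }

            /* OUT-style path from the second stack: with REVERSE_ORDER the TX is taken
             * first and the object lock second (out_tx_end -> out_tx_attr_set_exec ->
             * osd_object_write_lock), which allows the classic AB-BA deadlock. */
            static void *out_style_attr_set(void *arg)
            {
                    (void)arg;
            #ifdef REVERSE_ORDER
                    pthread_mutex_lock(&tx_lock);
                    usleep(1000);
                    pthread_mutex_lock(&obj_lock);
            #else
                    pthread_mutex_lock(&obj_lock);          /* same order as above: no deadlock */
                    pthread_mutex_lock(&tx_lock);
            #endif
                    puts("out-style thread done");
                    pthread_mutex_unlock(&tx_lock);
                    pthread_mutex_unlock(&obj_lock);
                    return NULL;
            }

            int main(void)
            {
                    pthread_t a, b;

                    pthread_create(&a, NULL, ofd_style_destroy, NULL);
                    pthread_create(&b, NULL, out_style_attr_set, NULL);
                    pthread_join(a, NULL);
                    pthread_join(b, NULL);
                    puts("both threads finished");
                    return 0;
            }

            Making both paths take the object lock before starting the transaction (the default above) is the direction the "lock ostobj before transaction start" patch subject points to; the alternative would be to make the OST path match the MDT convention of starting the transaction first.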

            There are no detailed Lustre debug logs for the txg_quiesce_thread hang. Simply comparing the stack traces, there are some similar tickets in Jira, such as LU-5165.

            Adding Alex for more suggestions.

            yong.fan nasf (Inactive) added a comment -

            The logs on lola-4 show that, at the beginning, the txg_quiesce_thread was blocked for some unknown reason.

            Nov  4 09:31:32 lola-4 kernel: INFO: task txg_quiesce:12998 blocked for more than 120 seconds.
            Nov  4 09:31:32 lola-4 kernel:      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
            Nov  4 09:31:32 lola-4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            Nov  4 09:31:32 lola-4 kernel: txg_quiesce   D 000000000000000c     0 12998      2 0x00000080
            Nov  4 09:31:32 lola-4 kernel: ffff88040ae23d20 0000000000000046 0000000000000000 ffffffff810674be
            Nov  4 09:31:32 lola-4 kernel: 0000465576d5e718 ffff880400000000 0000465576dd3cd4 000000010497fc9b
            Nov  4 09:31:32 lola-4 kernel: ffff88040e8725f8 000000010497fc9d ffff88040e8725f8 ffff88040ae23fd8
            Nov  4 09:31:32 lola-4 kernel: Call Trace:
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff810674be>] ? try_to_wake_up+0x24e/0x3e0
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0314e5d>] cv_wait_common+0x11d/0x130 [spl]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff812a0d8c>] ? __bitmap_weight+0x8c/0xb0
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0314ec5>] __cv_wait+0x15/0x20 [spl]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d29fb>] txg_quiesce_thread+0x32b/0x470 [zfs]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d26d0>] ? txg_quiesce_thread+0x0/0x470 [zfs]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d26d0>] ? txg_quiesce_thread+0x0/0x470 [zfs]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0310238>] thread_generic_wrapper+0x68/0x80 [spl]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03101d0>] ? thread_generic_wrapper+0x0/0x80 [spl]
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
            Nov  4 09:31:32 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
            

            The txg_quiesce_thread() is used for handling ZFS transactions. This hang caused many subsequent transactions to be blocked. For example:

            Nov  4 09:31:33 lola-4 kernel: INFO: task ll_ost_io01_002:13089 blocked for more than 120 seconds.
            Nov  4 09:31:33 lola-4 kernel:      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
            Nov  4 09:31:33 lola-4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            Nov  4 09:31:33 lola-4 kernel: ll_ost_io01_0 D 0000000000000008     0 13089      2 0x00000080
            Nov  4 09:31:33 lola-4 kernel: ffff880409027790 0000000000000046 0000000000000000 0000000000000000
            Nov  4 09:31:33 lola-4 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
            Nov  4 09:31:33 lola-4 kernel: 0000000000000000 0000000000000000 ffff8804090025f8 ffff880409027fd8
            Nov  4 09:31:33 lola-4 kernel: Call Trace:
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a1a6e>] ? prepare_to_wait_exclusive+0x4e/0x80
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0314e5d>] cv_wait_common+0x11d/0x130 [spl]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0314ec5>] __cv_wait+0x15/0x20 [spl]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa038c81d>] dmu_tx_wait+0x21d/0x400 [zfs]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa03d1ef2>] ? txg_kick+0x92/0xd0 [zfs]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa038cbf1>] dmu_tx_assign+0xa1/0x570 [zfs]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0f06d7d>] osd_trans_start+0xed/0x430 [osd_zfs]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa10280d5>] ofd_write_attr_set+0x2c5/0x9e0 [ofd]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102933f>] ofd_commitrw_write+0x23f/0x10a0 [ofd]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102df1d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102a75f>] ofd_commitrw+0x5bf/0xb10 [ofd]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff81149461>] ? kzfree+0x31/0x40
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa079d5c1>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a17ee4>] obd_commitrw+0x114/0x380 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a20c80>] tgt_brw_write+0xc70/0x1530 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff810674be>] ? try_to_wake_up+0x24e/0x3e0
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0974eb0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a1f4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa09cbbc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
            Nov  4 09:31:33 lola-4 kernel: [<ffffffffa09cae90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
            Nov  4 09:31:33 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
            

            These hung OST RPC service threads caused the related MDT/client RPCs to time out, which then triggered reconnections from the MDT/client.

            Nov  4 09:48:54 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
            Nov  4 09:48:54 lola-4 kernel: Lustre: Skipped 1 previous similar message
            Nov  4 09:48:54 lola-4 kernel: Lustre: soaked-OST0002: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10)
            Nov  4 09:48:54 lola-4 kernel: Lustre: Skipped 12 previous similar messages
            Nov  4 09:48:58 lola-4 kernel: Lustre: soaked-OST0002: Client e093a05c-49f1-1f4e-3cd0-0381aa4e6173 (at 192.168.1.130@o2ib100) reconnecting
            Nov  4 09:48:58 lola-4 kernel: Lustre: Skipped 2 previous similar messages
            Nov  4 09:48:58 lola-4 kernel: Lustre: soaked-OST0008: deleting orphan objects from 0x0:27891162 to 0x0:27891185
            

            For the MDT, after it reconnected, it sent an orphan cleanup RPC to the OST to sync the last_id. Unfortunately, the related RPC service thread was blocked on the mutex ofd_seq::os_create_lock, which was held by another RPC service thread that was itself blocked by the transaction trouble. So there was a chain of blocking, and the blocked orphan cleanup RPCs timed out and repeatedly triggered more and more reconnections.

            Nov  4 10:24:05 lola-4 kernel: Pid: 21738, comm: ll_ost01_103
            Nov  4 10:24:05 lola-4 kernel: 
            Nov  4 10:24:05 lola-4 kernel: Call Trace:
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff8153b4b6>] __mutex_lock_slowpath+0x96/0x210
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa102086b>] ? ofd_seq_load+0xbb/0x9c0 [ofd]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff8153afdb>] mutex_lock+0x2b/0x50
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa1015a28>] ofd_create_hdl+0xba8/0x2530 [ofd]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09baf9b>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09bb0d6>] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09bb221>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa0a1f4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09cbbc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
            Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09cae90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
            Nov  4 10:24:05 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
            

            So the root question is why the txg_quiesce_thread was blocked.

            yong.fan nasf (Inactive) added a comment -
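
            The blocking chain described above can be sketched outside of Lustre as well; in this minimal standalone pthread sketch, seq_lock stands in for ofd_seq::os_create_lock and the condition variable for the txg making room for new transactions (dmu_tx_wait). Here the "txg" is eventually released so the program finishes; in the incident it never was, so the MDT's orphan cleanup RPC could only time out and reconnect.

            /* Stand-alone sketch of the blocking chain (not Lustre code). */
            #include <pthread.h>
            #include <stdbool.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t seq_lock = PTHREAD_MUTEX_INITIALIZER;   /* ~ ofd_seq::os_create_lock */
            static pthread_mutex_t tx_mtx   = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  tx_cv    = PTHREAD_COND_INITIALIZER;    /* ~ dmu_tx_wait/txg progress */
            static bool tx_ready;

            /* A service thread that already took the sequence lock and is now stuck
             * waiting for a transaction, like the threads blocked in dmu_tx_wait(). */
            static void *holder_stuck_in_tx(void *arg)
            {
                    (void)arg;
                    pthread_mutex_lock(&seq_lock);          /* held across the whole wait */
                    pthread_mutex_lock(&tx_mtx);
                    while (!tx_ready)
                            pthread_cond_wait(&tx_cv, &tx_mtx);
                    pthread_mutex_unlock(&tx_mtx);
                    pthread_mutex_unlock(&seq_lock);
                    puts("holder: transaction finally assigned, lock released");
                    return NULL;
            }

            /* The orphan cleanup request (ofd_create_hdl above) just queues on the mutex. */
            static void *orphan_cleanup(void *arg)
            {
                    (void)arg;
                    sleep(1);                               /* let the other thread win the lock */
                    puts("orphan cleanup: waiting on seq lock ...");
                    pthread_mutex_lock(&seq_lock);
                    puts("orphan cleanup: got seq lock, last_id synced");
                    pthread_mutex_unlock(&seq_lock);
                    return NULL;
            }

            int main(void)
            {
                    pthread_t t1, t2;

                    pthread_create(&t1, NULL, holder_stuck_in_tx, NULL);
                    pthread_create(&t2, NULL, orphan_cleanup, NULL);
                    sleep(3);                               /* simulate the txg finally quiescing */
                    pthread_mutex_lock(&tx_mtx);
                    tx_ready = true;
                    pthread_cond_signal(&tx_cv);
                    pthread_mutex_unlock(&tx_mtx);
                    pthread_join(t1, NULL);
                    pthread_join(t2, NULL);
                    return 0;
            }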

            Here are the LFSCK logs from lola-8:

            69088:00100000:10000000:31.0:1478532776.517351:0:18252:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 0
            69089:00100000:10000000:16.0:1478532776.518230:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 6 for lfsck_layout: event = 6, rc = -4
            69090:00100000:10000000:16.0:1478532776.518242:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 15 for lfsck_layout: event = 6, rc = -4
            69091:00100000:10000000:16.0:1478532776.518248:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 8 for lfsck_layout: event = 6, rc = -4
            69092:00100000:10000000:16.0:1478532776.518253:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 10 for lfsck_layout: event = 6, rc = -4
            69093:00100000:10000000:16.0:1478532776.518258:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 4 for lfsck_layout: event = 6, rc = -4
            69094:00100000:10000000:16.0:1478532776.518262:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 9 for lfsck_layout: event = 6, rc = -4
            69095:00100000:10000000:16.0:1478532776.518265:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 13 for lfsck_layout: event = 6, rc = -4
            69096:00100000:10000000:16.0:1478532776.518268:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST f for lfsck_layout: event = 6, rc = -4
            69097:00100000:10000000:16.0:1478532776.518271:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST d for lfsck_layout: event = 6, rc = -4
            69098:00100000:10000000:16.0:1478532776.518274:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 0 for lfsck_layout: event = 6, rc = -4
            69099:00100000:10000000:16.0:1478532776.518280:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST e for lfsck_layout: event = 6, rc = -4
            69100:00100000:10000000:16.0:1478532776.518283:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 16 for lfsck_layout: event = 6, rc = -4
            69101:00100000:10000000:16.0:1478532776.518287:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST c for lfsck_layout: event = 6, rc = -4
            69102:00100000:10000000:16.0:1478532776.518290:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 1 for lfsck_layout: event = 6, rc = -4
            69103:00100000:10000000:16.0:1478532776.518293:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 11 for lfsck_layout: event = 6, rc = -4
            69104:00100000:10000000:16.0:1478532776.518296:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 3 for lfsck_layout: event = 6, rc = -4
            69105:00100000:10000000:16.0:1478532776.518299:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 14 for lfsck_layout: event = 6, rc = -4
            69106:00100000:10000000:16.0:1478532776.518302:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 2 for lfsck_layout: event = 6, rc = -4
            69107:00100000:10000000:16.0:1478532776.518305:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST b for lfsck_layout: event = 6, rc = -4
            69108:00100000:10000000:16.0:1478532776.518309:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 7 for lfsck_layout: event = 6, rc = -4
            69109:00100000:10000000:16.0:1478532776.518312:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 5 for lfsck_layout: event = 6, rc = -4
            69110:00100000:10000000:16.0:1478532776.518315:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 17 for lfsck_layout: event = 6, rc = -4
            69111:00100000:10000000:16.0:1478532776.518318:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST a for lfsck_layout: event = 6, rc = -4
            69112:00100000:10000000:16.0:1478532776.518322:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 12 for lfsck_layout: event = 6, rc = -4
            ...
            69151:00100000:10000000:16.0:1478532876.526908:0:18254:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = -110
            69152:00100000:10000000:16.0:1478532876.526916:0:18254:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 0
            69153:00100000:10000000:31.0:1478532876.526967:0:18252:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_layout post, rc = 0
            69154:00100000:10000000:31.0:1478532876.527030:0:18252:0:(lfsck_layout.c:4680:lfsck_layout_master_post()) soaked-MDT0000-osd: layout LFSCK master post done: rc = 0
            69155:00100000:10000000:31.0:1478532876.527035:0:18252:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 0
            69156:00100000:10000000:2.0:1478532876.527095:0:18256:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
            69193:00100000:10000000:2.0:1478532976.535859:0:18256:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = -110
            69194:00100000:10000000:2.0:1478532976.535867:0:18256:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 0
            69195:00100000:10000000:31.0:1478532976.535918:0:18252:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 0
            69196:00100000:10000000:31.0:1478532976.535987:0:18252:0:(lfsck_namespace.c:4303:lfsck_namespace_post()) soaked-MDT0000-osd: namespace LFSCK post done: rc = 0
            

            That means both the layout LFSCK assistant thread and the namespace LFSCK assistant thread eventually exited, although there were a lot of failures when communicating with remote peers. So what Cliff saw about the LFSCK on the MDT occurred during the connection issues with lola-4. MDT0000 needs to talk with the OSTs on lola-4 for the layout LFSCK; since there was connection trouble with lola-4, the LFSCK will retry until it recovers, is manually stopped, or exits. So the LFSCK behaviour is as expected. What we need to make clear is why the connection between lola-8 and lola-4 was repeatedly going down and up.

            yong.fan nasf (Inactive) added a comment -
            pjones Peter Jones added a comment -

            Fan Yong

            Could you please advise on this issue?

            Thanks

            Peter


            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: cliffw Cliff White (Inactive)
              Votes: 1
              Watchers: 11