
LFSCK hangs on MDT - osp_precreate_cleanup_orphans - cannot cleanup orphans: rc = -107

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.3
    • Environment: Soak cluster, tip of lustre-master, lustre: 2.8.59_79_gb8811a0
    • Severity: 3

    Description

      After a failover, lfsck is started.
      2016-11-04 15:18:59,065:fsmgmt.fsmgmt:INFO lfsck started on lola-8
      lfsck never completes.
      2016-11-06 01:08:25,976:fsmgmt.fsmgmt:INFO lfsck still in progress for soaked-MDT0000 after 122865s
      Examining the logs, we see that the connection to OST0002 keeps going up and down, so the orphans cannot be cleaned up (rc = -107 is -ENOTCONN); a simplified sketch of this retry behaviour follows the log excerpt below.

      Nov  4 15:18:32 lola-8 kernel: LustreError: 11-0: soaked-OST0017-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:36 lola-8 kernel: LustreError: 11-0: soaked-OST0005-osc-MDT0000: operation ost_statfs to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:46 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
      Nov  4 15:18:57 lola-8 kernel: LustreError: 11-0: soaked-OST000b-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
      Nov  4 15:18:57 lola-8 kernel: LustreError: Skipped 1 previous similar message
      Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297366/real 1478297366]  req@ffff8807e61689c0 x1550005401109488/t0(0) o6->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:28/4 lens 664/432 e 23 to 1 dl 1478297967 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 9 previous similar messages
      Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
      Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 18 previous similar messages
      Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
      Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297967/real 1478297967]  req@ffff8803e013d380 x1550005401094688/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478298568 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
      Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
      Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
      Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
      Nov  4 15:31:22 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
      Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478298568/real 1478298568]  req@ffff8803c0c980c0 x1550005401096032/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478299169 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
      Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      Nov  4 15:39:29 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
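
      For illustration only, here is a minimal userspace sketch assuming a much-simplified model of the precreate thread; send_orphan_cleanup_rpc() and wait_for_reconnect() are hypothetical stand-ins, not the actual osp_precreate.c code. It shows why the message repeats: the orphan-cleanup RPC keeps failing with -ENOTCONN (-107) while the OST connection bounces, so the thread logs the error and tries again after the import reconnects.

      /* Simplified model of the orphan-cleanup retry loop; all helpers are stand-ins. */
      #include <errno.h>
      #include <stdio.h>
      #include <unistd.h>

      static int send_orphan_cleanup_rpc(void)
      {
          /* Stand-in: pretend the OST connection is still down. */
          return -ENOTCONN;                       /* -107 on Linux */
      }

      static void wait_for_reconnect(void)
      {
          sleep(1);                               /* stand-in for waiting on import recovery */
      }

      int main(void)
      {
          for (int attempt = 0; attempt < 3; attempt++) {
              int rc = send_orphan_cleanup_rpc();

              if (rc == 0)
                  return 0;                       /* orphans cleaned up, precreation can resume */

              fprintf(stderr, "cannot cleanup orphans: rc = %d\n", rc);
              if (rc == -ENOTCONN)
                  wait_for_reconnect();           /* retry once the connection is back */
          }
          return 1;                               /* still not connected after the retries */
      }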
      

      During this time, and prior to the start of the lfsck, lola-4 was experiencing difficulties communicating with the MDS.

      Nov  4 14:38:26 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
      Nov  4 14:38:26 lola-4 kernel: Lustre: Skipped 12 previous similar messages
      Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
      Nov  4 14:39:17 lola-4 kernel:  req@ffff88052335ec50 x1550005401085472/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478295562 ref 2 fl New:/2/ffffffff rc 0/-1
      Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 95 previous similar messages
      Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
      Nov  4 14:39:23 lola-4 kernel: Lustre: Skipped 13 previous similar messages
      Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10)
      Nov  4 14:49:16 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
      Nov  4 14:49:16 lola-4 kernel: Lustre: Skipped 12 previous similar messages
      Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
      Nov  4 14:49:18 lola-4 kernel:  req@ffff88050f9c2450 x1550005401098800/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478296163 ref 2 fl New:/2/ffffffff rc 0/-1
      Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 96 previous similar messages
      Nov  4 14:49:24 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
      

      lola-4 continued dumping stacks; we finally restarted the cluster.
      Prior to the restart, we dumped the lctl debug logs from lola-4 and lola-8; these are attached along with the message logs.

      Attachments

        1. lola-4.syslog.txt
          1.56 MB
        2. lola-8.lustre-log.txt.gz
          889 kB
        3. lola-8.syslog.txt
          3.46 MB
        4. oss.lola-4.lustre-log.txt.gz
          2.63 MB

        Issue Links

          Activity

            [LU-8806] LFSCK hangs on MDT - osp_precreate_cleanup_orphans - cannot cleanup orphans: rc = -107

            adilger Andreas Dilger added a comment -

            Closing this as a duplicate of LU-10048, which landed a patch for 2.12.

            adilger Andreas Dilger added a comment -

            Alexey, the https://review.whamcloud.com/23689 patch points at https://review.whamcloud.com/27488, and 23689 has been abandoned. If you are having problems with this issue, then we welcome you to review and test that patch, as you said you would do a few months ago, or to submit a patch of your own. That is especially useful if this code is continuously broken for you, since it means you have a good reproducer and can verify whether the fix works. I would also welcome an alternative patch based on the old ext3 delete thread, which pushes the extents/blocks to be truncated off to a temporary inode during the main transaction and then does the actual truncate asynchronously. That would speed up all ext4 truncate/unlink operations and not require changes to Lustre.
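
            As a purely illustrative userspace model of that "delete thread" idea (all types and helpers below are hypothetical stand-ins, not ext4/ldiskfs or Lustre code): the foreground unlink only moves ownership of the victim's blocks to a temporary inode inside the main transaction, and a background thread performs the expensive free afterwards.

            #include <pthread.h>
            #include <stdio.h>
            #include <stdlib.h>

            struct blocks { size_t nblocks; };      /* stand-in for a file's block/extent list */
            struct inode  { struct blocks *data; }; /* toy inode: just owns a block list */

            static struct inode temp_inode;         /* the temporary "delete" inode */
            static pthread_mutex_t temp_lock = PTHREAD_MUTEX_INITIALIZER;
            static pthread_cond_t  temp_cv   = PTHREAD_COND_INITIALIZER;

            /* Fast path, conceptually inside the main transaction: only move ownership. */
            static void unlink_fast(struct inode *victim)
            {
                pthread_mutex_lock(&temp_lock);
                temp_inode.data = victim->data;
                victim->data = NULL;
                pthread_cond_signal(&temp_cv);
                pthread_mutex_unlock(&temp_lock);
            }

            /* Background "delete thread": does the slow truncate asynchronously. */
            static void *delete_thread(void *arg)
            {
                (void)arg;
                pthread_mutex_lock(&temp_lock);
                while (temp_inode.data == NULL)
                    pthread_cond_wait(&temp_cv, &temp_lock);
                struct blocks *b = temp_inode.data;
                temp_inode.data = NULL;
                pthread_mutex_unlock(&temp_lock);

                printf("background: freeing %zu blocks\n", b->nblocks);
                free(b);                            /* the expensive part happens here */
                return NULL;
            }

            int main(void)
            {
                pthread_t worker;
                struct blocks *b = malloc(sizeof(*b));
                struct inode victim = { .data = b };

                b->nblocks = 1 << 20;
                pthread_create(&worker, NULL, delete_thread, NULL);
                unlink_fast(&victim);               /* foreground returns immediately */
                pthread_join(worker, NULL);
                return 0;
            }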

            shadow Alexey Lyashkov added a comment -

            Peter,

            You said this is targeted for 2.11, but I don't see any progress in the ticket, and lfsck is continuously broken.

            shadow Alexey Lyashkov added a comment -

            Peter,

            I have seen no updates to this patch for a year. If you are talking about Alex's patch, it is only part of the solution, and I will add myself as a watcher on it.
            As for an alternative solution, I don't see a good one for now. This is just a notification that 2.7, 2.8, and 2.9 are affected, not only 2.10/2.11.
            pjones Peter Jones added a comment -

            Alexey

            This issue is presently targeted for the 2.11 release, and a draft of one possible patch already exists. Feel free to provide an alternative approach or feedback on any testing you conduct on the existing prototype.

            Peter


            shadow Alexey Lyashkov added a comment -

            We hit this issue with the 2.7 (IEEL3) code. An OST deadlocked during an lfsck operation.

            Stack:
            __schedule
            schedule
            rwsem_down_write_failed
            call_rwsem_down_write_failed
            osd_object_write_lock
            out_tx_attr_set_exec
            out_tx_end
            out_handle
            tgt_request_handle
            ptlrpc_server_handle_request
            ptlrpc_main
            kthread
            ret_from_fork
            Progs:  2213 "ll_ost_out00_00
            

            Any plans to fix it?


            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23689
            Subject: LU-8806 target: lock ostobj before transaction start
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 28bf0565e025ca20c72fb8d813274c142ed273fe
            yong.fan nasf (Inactive) added a comment - - edited

            As I remember, there was some discussion about the difference between MDT and OST operations with respect to transaction and lock ordering, but I cannot remember the details. Currently, the MDT-side modification starts the transaction before locking the object, while the OST side uses the reverse order. Because the OUT handler is shared by both sides, there is always some potential for deadlock. Is there any special reason for that? Can we unify them?


            bzzz Alex Zhuravlev added a comment -

            Looks like a lock/TX ordering issue.

            Here we take the object's lock before the TX:

            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038c81d>] dmu_tx_wait+0x21d/0x400 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa03d1ef2>] ? txg_kick+0x92/0xd0 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038cbf1>] dmu_tx_assign+0xa1/0x570 [zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa0f06d7d>] osd_trans_start+0xed/0x430 [osd_zfs]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa102156c>] ofd_trans_start+0x7c/0x100 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa1023980>] ofd_object_destroy+0x3d0/0x8e0 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa101d2cd>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098bab0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098db10>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
            Nov 4 09:31:32 lola-4 kernel: [<ffffffffa10176a2>] ofd_destroy_hdl+0x2f2/0xb60 [ofd]

            and here we do it with the TX already started:

            Nov 4 09:32:15 lola-4 kernel: Pid: 33274, comm: ll_ost_out01_00
            Nov 4 09:32:15 lola-4 kernel:
            Nov 4 09:32:15 lola-4 kernel: Call Trace:
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa03a22ce>] ? dsl_dir_tempreserve_impl+0x2ae/0x330 [zfs]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cc55>] rwsem_down_failed_common+0x95/0x1d0
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cdb3>] rwsem_down_write_failed+0x23/0x30
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8129e923>] call_rwsem_down_write_failed+0x13/0x20
            Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153c2b2>] ? down_write+0x32/0x40
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0f0f021>] osd_object_write_lock+0x61/0x70 [osd_zfs]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2bbf9>] out_tx_attr_set_exec+0x69/0x480 [ptlrpc]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2296a>] out_tx_end+0xda/0x5d0 [ptlrpc]
            Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a282b9>] out_handle+0x11d9/0x18d0 [ptlrpc]

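
            To make the inversion concrete, here is a self-contained sketch using plain pthread mutexes as stand-ins (an assumption: "tx_slot" models the transaction machinery and "obj_lock" the osd object write lock; in reality the dependency goes through the ZFS txg code rather than a single mutex). One thread takes the object lock and then waits for the transaction, as in the ofd_destroy path above, while the other holds the transaction and then waits for the object lock, as in the OUT attr_set path; the program hangs by design.

            #include <pthread.h>
            #include <stdio.h>
            #include <unistd.h>

            static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER; /* object write lock */
            static pthread_mutex_t tx_slot  = PTHREAD_MUTEX_INITIALIZER; /* transaction capacity */

            static void *destroy_path(void *arg)    /* lock object, then start TX */
            {
                (void)arg;
                pthread_mutex_lock(&obj_lock);
                sleep(1);                           /* let the other thread grab tx_slot */
                printf("destroy: holding obj_lock, waiting for tx_slot...\n");
                pthread_mutex_lock(&tx_slot);       /* blocks forever */
                pthread_mutex_unlock(&tx_slot);
                pthread_mutex_unlock(&obj_lock);
                return NULL;
            }

            static void *attr_set_path(void *arg)   /* start TX, then lock object */
            {
                (void)arg;
                pthread_mutex_lock(&tx_slot);
                sleep(1);                           /* let the other thread grab obj_lock */
                printf("attr_set: holding tx_slot, waiting for obj_lock...\n");
                pthread_mutex_lock(&obj_lock);      /* blocks forever */
                pthread_mutex_unlock(&obj_lock);
                pthread_mutex_unlock(&tx_slot);
                return NULL;
            }

            int main(void)
            {
                pthread_t a, b;

                pthread_create(&a, NULL, destroy_path, NULL);
                pthread_create(&b, NULL, attr_set_path, NULL);
                pthread_join(a, NULL);              /* never returns: classic ABBA deadlock */
                pthread_join(b, NULL);
                return 0;
            }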

            yong.fan nasf (Inactive) added a comment -

            There are no detailed Lustre debug logs for the txg_quiesce_thread hang. Simply comparing the stack traces, there are some similar tickets in Jira, such as LU-5165.

            Adding Alex for more suggestions.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: cliffw Cliff White (Inactive)
              Votes: 1
              Watchers: 11

              Dates

                Created:
                Updated:
                Resolved: