[LU-8806] LFSCK hangs on MDT - osp_precreate_cleanup_orphans - cannot cleanup orphans: rc = -107 Created: 07/Nov/16  Updated: 22/Aug/18  Resolved: 22/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Cliff White (Inactive) Assignee: Alex Zhuravlev
Resolution: Duplicate Votes: 1
Labels: soak
Environment:

Soak cluster, tip of Lustre-master lustre: 2.8.59_79_gb8811a0


Attachments: Text File lola-4.syslog.txt     File lola-8.lustre-log.txt.gz     Text File lola-8.syslog.txt     File oss.lola-4.lustre-log.txt.gz    
Issue Links:
Related
is related to LU-5165 Test failure on test suite recovery-s... Open
is related to LU-5994 DT transaction start and object lock ... Resolved
is related to LU-10048 osd-ldiskfs to truncate outside of ma... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After a failover, lfsck is started.
2016-11-04 15:18:59,065:fsmgmt.fsmgmt:INFO lfsck started on lola-8
lfsck never completes.
2016-11-06 01:08:25,976:fsmgmt.fsmgmt:INFO lfsck still in progress for soaked-MDT0000 after 122865s
Examining the logs, we see that the connection to OST0002 is going up and down, so orphans cannot be cleaned up.

Nov  4 15:18:32 lola-8 kernel: LustreError: 11-0: soaked-OST0017-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
Nov  4 15:18:36 lola-8 kernel: LustreError: 11-0: soaked-OST0005-osc-MDT0000: operation ost_statfs to node 192.168.1.106@o2ib10 failed: rc = -107
Nov  4 15:18:46 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
Nov  4 15:18:57 lola-8 kernel: LustreError: 11-0: soaked-OST000b-osc-MDT0000: operation obd_ping to node 192.168.1.106@o2ib10 failed: rc = -107
Nov  4 15:18:57 lola-8 kernel: LustreError: Skipped 1 previous similar message
Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297366/real 1478297366]  req@ffff8807e61689c0 x1550005401109488/t0(0) o6->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:28/4 lens 664/432 e 23 to 1 dl 1478297967 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 9 previous similar messages
Nov  4 15:19:27 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
Nov  4 15:19:27 lola-8 kernel: Lustre: Skipped 18 previous similar messages
Nov  4 15:19:27 lola-8 kernel: Lustre: 5851:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478297967/real 1478297967]  req@ffff8803e013d380 x1550005401094688/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478298568 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
Nov  4 15:29:28 lola-8 kernel: Lustre: 5830:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete
Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
Nov  4 15:29:28 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
Nov  4 15:29:28 lola-8 kernel: Lustre: Skipped 1 previous similar message
Nov  4 15:31:22 lola-8 kernel: LustreError: 6010:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) soaked-OST0002-osc-MDT0000: cannot cleanup orphans: rc = -107
Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478298568/real 1478298568]  req@ffff8803c0c980c0 x1550005401096032/t0(0) o1000->soaked-OST0002-osc-MDT0000@192.168.1.104@o2ib10:24/4 lens 520/4320 e 23 to 1 dl 1478299169 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
Nov  4 15:39:29 lola-8 kernel: Lustre: 5840:0:(client.c:2111:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Nov  4 15:39:29 lola-8 kernel: Lustre: soaked-OST0002-osc-MDT0000: Connection to soaked-OST0002 (at 192.168.1.104@o2ib10) was lost; in progress operations using this service will wait for recovery to complete

During this time, and prior to the start of the lfsck, lola-4 was experiencing difficulties communicating with the MDS.

Nov  4 14:38:26 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
Nov  4 14:38:26 lola-4 kernel: Lustre: Skipped 12 previous similar messages
Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Nov  4 14:39:17 lola-4 kernel:  req@ffff88052335ec50 x1550005401085472/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478295562 ref 2 fl New:/2/ffffffff rc 0/-1
Nov  4 14:39:17 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 95 previous similar messages
Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
Nov  4 14:39:23 lola-4 kernel: Lustre: Skipped 13 previous similar messages
Nov  4 14:39:23 lola-4 kernel: Lustre: soaked-OST0002: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10)
Nov  4 14:49:16 lola-4 kernel: Lustre: soaked-OST0002: Export ffff8802e106a800 already connecting from 192.168.1.113@o2ib100
Nov  4 14:49:16 lola-4 kernel: Lustre: Skipped 12 previous similar messages
Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
Nov  4 14:49:18 lola-4 kernel:  req@ffff88050f9c2450 x1550005401098800/t0(0) o1000->soaked-MDT0000-mdtlov_UUID@192.168.1.108@o2ib10:-1/-1 lens 520/0 e 0 to 0 dl 1478296163 ref 2 fl New:/2/ffffffff rc 0/-1
Nov  4 14:49:18 lola-4 kernel: Lustre: 33277:0:(service.c:1331:ptlrpc_at_send_early_reply()) Skipped 96 previous similar messages
Nov  4 14:49:24 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting

lola-4 continued dumping stacks, so we finally restarted the cluster.
Prior to the restart, we dumped lctl logs from lola-4 and lola-8; those are attached along with the syslog message logs.



 Comments   
Comment by Peter Jones [ 08/Nov/16 ]

Fan Yong

Could you please advise on this issue?

Thanks

Peter

Comment by nasf (Inactive) [ 08/Nov/16 ]

Here are the LFSCK logs on lola-8:

69088:00100000:10000000:31.0:1478532776.517351:0:18252:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_layout post, rc = 0
69089:00100000:10000000:16.0:1478532776.518230:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 6 for lfsck_layout: event = 6, rc = -4
69090:00100000:10000000:16.0:1478532776.518242:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 15 for lfsck_layout: event = 6, rc = -4
69091:00100000:10000000:16.0:1478532776.518248:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 8 for lfsck_layout: event = 6, rc = -4
69092:00100000:10000000:16.0:1478532776.518253:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 10 for lfsck_layout: event = 6, rc = -4
69093:00100000:10000000:16.0:1478532776.518258:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 4 for lfsck_layout: event = 6, rc = -4
69094:00100000:10000000:16.0:1478532776.518262:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 9 for lfsck_layout: event = 6, rc = -4
69095:00100000:10000000:16.0:1478532776.518265:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 13 for lfsck_layout: event = 6, rc = -4
69096:00100000:10000000:16.0:1478532776.518268:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST f for lfsck_layout: event = 6, rc = -4
69097:00100000:10000000:16.0:1478532776.518271:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST d for lfsck_layout: event = 6, rc = -4
69098:00100000:10000000:16.0:1478532776.518274:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 0 for lfsck_layout: event = 6, rc = -4
69099:00100000:10000000:16.0:1478532776.518280:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST e for lfsck_layout: event = 6, rc = -4
69100:00100000:10000000:16.0:1478532776.518283:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 16 for lfsck_layout: event = 6, rc = -4
69101:00100000:10000000:16.0:1478532776.518287:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST c for lfsck_layout: event = 6, rc = -4
69102:00100000:10000000:16.0:1478532776.518290:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 1 for lfsck_layout: event = 6, rc = -4
69103:00100000:10000000:16.0:1478532776.518293:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 11 for lfsck_layout: event = 6, rc = -4
69104:00100000:10000000:16.0:1478532776.518296:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 3 for lfsck_layout: event = 6, rc = -4
69105:00100000:10000000:16.0:1478532776.518299:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 14 for lfsck_layout: event = 6, rc = -4
69106:00100000:10000000:16.0:1478532776.518302:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 2 for lfsck_layout: event = 6, rc = -4
69107:00100000:10000000:16.0:1478532776.518305:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST b for lfsck_layout: event = 6, rc = -4
69108:00100000:10000000:16.0:1478532776.518309:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 7 for lfsck_layout: event = 6, rc = -4
69109:00100000:10000000:16.0:1478532776.518312:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 5 for lfsck_layout: event = 6, rc = -4
69110:00100000:10000000:16.0:1478532776.518315:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 17 for lfsck_layout: event = 6, rc = -4
69111:00100000:10000000:16.0:1478532776.518318:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST a for lfsck_layout: event = 6, rc = -4
69112:00100000:10000000:16.0:1478532776.518322:0:18254:0:(lfsck_lib.c:2127:lfsck_async_interpret_common()) soaked-MDT0000-osd: fail to notify OST 12 for lfsck_layout: event = 6, rc = -4
...
69151:00100000:10000000:16.0:1478532876.526908:0:18254:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = -110
69152:00100000:10000000:16.0:1478532876.526916:0:18254:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_layout LFSCK assistant thread exit: rc = 0
69153:00100000:10000000:31.0:1478532876.526967:0:18252:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_layout post, rc = 0
69154:00100000:10000000:31.0:1478532876.527030:0:18252:0:(lfsck_layout.c:4680:lfsck_layout_master_post()) soaked-MDT0000-osd: layout LFSCK master post done: rc = 0
69155:00100000:10000000:31.0:1478532876.527035:0:18252:0:(lfsck_lib.c:2573:lfsck_post_generic()) soaked-MDT0000-osd: waiting for assistant to do lfsck_namespace post, rc = 0
69156:00100000:10000000:2.0:1478532876.527095:0:18256:0:(lfsck_engine.c:1805:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant sync before exit
69193:00100000:10000000:2.0:1478532976.535859:0:18256:0:(lfsck_engine.c:1811:lfsck_assistant_engine()) soaked-MDT0000-osd: LFSCK assistant synced before exit: rc = -110
69194:00100000:10000000:2.0:1478532976.535867:0:18256:0:(lfsck_engine.c:1838:lfsck_assistant_engine()) soaked-MDT0000-osd: lfsck_namespace LFSCK assistant thread exit: rc = 0
69195:00100000:10000000:31.0:1478532976.535918:0:18252:0:(lfsck_lib.c:2585:lfsck_post_generic()) soaked-MDT0000-osd: the assistant has done lfsck_namespace post, rc = 0
69196:00100000:10000000:31.0:1478532976.535987:0:18252:0:(lfsck_namespace.c:4303:lfsck_namespace_post()) soaked-MDT0000-osd: namespace LFSCK post done: rc = 0

That means both the layout LFSCK assistant thread and the namespace LFSCK assistant thread did exit in the end, although there were many failures when communicating with remote peers. So what Cliff saw on the MDT was the LFSCK running while the connection issues with lola-4 were ongoing. MDT0000 needs to talk with the OSTs on lola-4 for the layout LFSCK; since there was connection trouble with lola-4, the LFSCK will retry until the connection recovers or it is manually stopped. So the LFSCK behaviour is as expected. What we need to make clear is why the connection between lola-8 and lola-4 was repeatedly going down and up.

Comment by nasf (Inactive) [ 08/Nov/16 ]

The logs on lola-4 show that, at the beginning, the txg_quiesce thread was blocked for some unknown reason.

Nov  4 09:31:32 lola-4 kernel: INFO: task txg_quiesce:12998 blocked for more than 120 seconds.
Nov  4 09:31:32 lola-4 kernel:      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
Nov  4 09:31:32 lola-4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  4 09:31:32 lola-4 kernel: txg_quiesce   D 000000000000000c     0 12998      2 0x00000080
Nov  4 09:31:32 lola-4 kernel: ffff88040ae23d20 0000000000000046 0000000000000000 ffffffff810674be
Nov  4 09:31:32 lola-4 kernel: 0000465576d5e718 ffff880400000000 0000465576dd3cd4 000000010497fc9b
Nov  4 09:31:32 lola-4 kernel: ffff88040e8725f8 000000010497fc9d ffff88040e8725f8 ffff88040ae23fd8
Nov  4 09:31:32 lola-4 kernel: Call Trace:
Nov  4 09:31:32 lola-4 kernel: [<ffffffff810674be>] ? try_to_wake_up+0x24e/0x3e0
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0314e5d>] cv_wait_common+0x11d/0x130 [spl]
Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
Nov  4 09:31:32 lola-4 kernel: [<ffffffff812a0d8c>] ? __bitmap_weight+0x8c/0xb0
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0314ec5>] __cv_wait+0x15/0x20 [spl]
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d29fb>] txg_quiesce_thread+0x32b/0x470 [zfs]
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d26d0>] ? txg_quiesce_thread+0x0/0x470 [zfs]
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03d26d0>] ? txg_quiesce_thread+0x0/0x470 [zfs]
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa0310238>] thread_generic_wrapper+0x68/0x80 [spl]
Nov  4 09:31:32 lola-4 kernel: [<ffffffffa03101d0>] ? thread_generic_wrapper+0x0/0x80 [spl]
Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
Nov  4 09:31:32 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov  4 09:31:32 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
Nov  4 09:31:32 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

The txg_quiesce_thread() handles quiescing of ZFS transaction groups. Its hang caused a lot of subsequent transactions to block. For example:

Nov  4 09:31:33 lola-4 kernel: INFO: task ll_ost_io01_002:13089 blocked for more than 120 seconds.
Nov  4 09:31:33 lola-4 kernel:      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
Nov  4 09:31:33 lola-4 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  4 09:31:33 lola-4 kernel: ll_ost_io01_0 D 0000000000000008     0 13089      2 0x00000080
Nov  4 09:31:33 lola-4 kernel: ffff880409027790 0000000000000046 0000000000000000 0000000000000000
Nov  4 09:31:33 lola-4 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Nov  4 09:31:33 lola-4 kernel: 0000000000000000 0000000000000000 ffff8804090025f8 ffff880409027fd8
Nov  4 09:31:33 lola-4 kernel: Call Trace:
Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a1a6e>] ? prepare_to_wait_exclusive+0x4e/0x80
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0314e5d>] cv_wait_common+0x11d/0x130 [spl]
Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0314ec5>] __cv_wait+0x15/0x20 [spl]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa038c81d>] dmu_tx_wait+0x21d/0x400 [zfs]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa03d1ef2>] ? txg_kick+0x92/0xd0 [zfs]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa038cbf1>] dmu_tx_assign+0xa1/0x570 [zfs]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0f06d7d>] osd_trans_start+0xed/0x430 [osd_zfs]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa10280d5>] ofd_write_attr_set+0x2c5/0x9e0 [ofd]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102933f>] ofd_commitrw_write+0x23f/0x10a0 [ofd]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102df1d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa102a75f>] ofd_commitrw+0x5bf/0xb10 [ofd]
Nov  4 09:31:33 lola-4 kernel: [<ffffffff81149461>] ? kzfree+0x31/0x40
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa079d5c1>] ? lprocfs_counter_add+0x151/0x1c0 [obdclass]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a17ee4>] obd_commitrw+0x114/0x380 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a20c80>] tgt_brw_write+0xc70/0x1530 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
Nov  4 09:31:33 lola-4 kernel: [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
Nov  4 09:31:33 lola-4 kernel: [<ffffffff810674be>] ? try_to_wake_up+0x24e/0x3e0
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0974eb0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa0a1f4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa09cbbc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
Nov  4 09:31:33 lola-4 kernel: [<ffffffffa09cae90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
Nov  4 09:31:33 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov  4 09:31:33 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
Nov  4 09:31:33 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

These hung OST RPC service threads caused the related MDT/client RPCs to time out, which then triggered reconnections from the MDT/clients.

Nov  4 09:48:54 lola-4 kernel: Lustre: soaked-OST0002: Client soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10) reconnecting
Nov  4 09:48:54 lola-4 kernel: Lustre: Skipped 1 previous similar message
Nov  4 09:48:54 lola-4 kernel: Lustre: soaked-OST0002: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib10)
Nov  4 09:48:54 lola-4 kernel: Lustre: Skipped 12 previous similar messages
Nov  4 09:48:58 lola-4 kernel: Lustre: soaked-OST0002: Client e093a05c-49f1-1f4e-3cd0-0381aa4e6173 (at 192.168.1.130@o2ib100) reconnecting
Nov  4 09:48:58 lola-4 kernel: Lustre: Skipped 2 previous similar messages
Nov  4 09:48:58 lola-4 kernel: Lustre: soaked-OST0008: deleting orphan objects from 0x0:27891162 to 0x0:27891185

As for the MDT, after reconnecting it sent an orphan cleanup RPC to the OST to sync the last_id. Unfortunately, the RPC service thread handling it was blocked on the mutex ofd_seq::os_create_lock, which was held by another RPC service thread that was itself blocked by the transaction trouble. So there was a chain of blocking, and the blocked orphan cleanup RPCs timed out and triggered more and more reconnections, repeatedly.

Nov  4 10:24:05 lola-4 kernel: Pid: 21738, comm: ll_ost01_103
Nov  4 10:24:05 lola-4 kernel: 
Nov  4 10:24:05 lola-4 kernel: Call Trace:
Nov  4 10:24:05 lola-4 kernel: [<ffffffff8153b4b6>] __mutex_lock_slowpath+0x96/0x210
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa102086b>] ? ofd_seq_load+0xbb/0x9c0 [ofd]
Nov  4 10:24:05 lola-4 kernel: [<ffffffff8153afdb>] mutex_lock+0x2b/0x50
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa1015a28>] ofd_create_hdl+0xba8/0x2530 [ofd]
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09baf9b>] ? lustre_pack_reply_v2+0x1eb/0x280 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09bb0d6>] ? lustre_pack_reply_flags+0xa6/0x1e0 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09bb221>] ? lustre_pack_reply+0x11/0x20 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa0a1f4bc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09cbbc1>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
Nov  4 10:24:05 lola-4 kernel: [<ffffffffa09cae90>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
Nov  4 10:24:05 lola-4 kernel: [<ffffffff810a138e>] kthread+0x9e/0xc0
Nov  4 10:24:05 lola-4 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Nov  4 10:24:05 lola-4 kernel: [<ffffffff810a12f0>] ? kthread+0x0/0xc0
Nov  4 10:24:05 lola-4 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

So the root question is why the txg_quiesce_thread was blocked.
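
To illustrate the blocking chain described above, here is a minimal userspace sketch (hypothetical names, not the actual OFD/ZFS code): one thread holds a mutex standing in for ofd_seq::os_create_lock while it waits on a transaction that never gets assigned, so a second thread, standing in for the orphan-cleanup RPC handler, stalls behind that mutex and its RPC eventually times out.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-ins for ofd_seq::os_create_lock and the stuck transaction assignment. */
static pthread_mutex_t create_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t txg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t txg_cv = PTHREAD_COND_INITIALIZER;
static int txg_open;    /* stays 0: the "txg_quiesce" side is hung */

/* RPC thread A: takes the create lock, then waits for a transaction slot. */
static void *writer_rpc(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&create_lock);          /* like taking os_create_lock */
    pthread_mutex_lock(&txg_lock);
    while (!txg_open)                          /* like dmu_tx_assign()/dmu_tx_wait() */
        pthread_cond_wait(&txg_cv, &txg_lock);
    pthread_mutex_unlock(&txg_lock);
    pthread_mutex_unlock(&create_lock);
    return NULL;
}

/* RPC thread B: the orphan-cleanup handler, which only needs the create lock. */
static void *orphan_cleanup_rpc(void *arg)
{
    (void)arg;
    printf("orphan cleanup: waiting for the create lock...\n");
    pthread_mutex_lock(&create_lock);          /* blocks behind thread A indefinitely */
    printf("orphan cleanup: done\n");          /* never reached while the tx is stuck */
    pthread_mutex_unlock(&create_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, writer_rpc, NULL);
    sleep(1);                                  /* let thread A grab the lock first */
    pthread_create(&b, NULL, orphan_cleanup_rpc, NULL);
    sleep(3);                                  /* B is still blocked: the RPC would time out */
    printf("client side: orphan cleanup RPC timed out, reconnecting\n");
    return 0;
}

Build with gcc -pthread; the second thread never makes progress until the first releases the lock, which mirrors how the blocked orphan-cleanup RPCs on the OST kept timing out and forcing the MDT to reconnect.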

Comment by nasf (Inactive) [ 08/Nov/16 ]

There are no detailed Lustre debug logs covering the txg_quiesce_thread hang. Simply comparing the stack traces, there are some similar tickets in Jira, such as LU-5165.

Adding Alex for more suggestions.

Comment by Alex Zhuravlev [ 08/Nov/16 ]

looks like a lock/tx ordering issue.

here we take the object's lock before TX

Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038c81d>] dmu_tx_wait+0x21d/0x400 [zfs]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa03d1ef2>] ? txg_kick+0x92/0xd0 [zfs]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa038cbf1>] dmu_tx_assign+0xa1/0x570 [zfs]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa0f06d7d>] osd_trans_start+0xed/0x430 [osd_zfs]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa102156c>] ofd_trans_start+0x7c/0x100 [ofd]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa1023980>] ofd_object_destroy+0x3d0/0x8e0 [ofd]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa101d2cd>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098bab0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa098db10>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
Nov 4 09:31:32 lola-4 kernel: [<ffffffffa10176a2>] ofd_destroy_hdl+0x2f2/0xb60 [ofd]

and here we take the object's lock with the TX already started:

Nov 4 09:32:15 lola-4 kernel: Pid: 33274, comm: ll_ost_out01_00
Nov 4 09:32:15 lola-4 kernel:
Nov 4 09:32:15 lola-4 kernel: Call Trace:
Nov 4 09:32:15 lola-4 kernel: [<ffffffffa03a22ce>] ? dsl_dir_tempreserve_impl+0x2ae/0x330 [zfs]
Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cc55>] rwsem_down_failed_common+0x95/0x1d0
Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153cdb3>] rwsem_down_write_failed+0x23/0x30
Nov 4 09:32:15 lola-4 kernel: [<ffffffff8129e923>] call_rwsem_down_write_failed+0x13/0x20
Nov 4 09:32:15 lola-4 kernel: [<ffffffff8153c2b2>] ? down_write+0x32/0x40
Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0f0f021>] osd_object_write_lock+0x61/0x70 [osd_zfs]
Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2bbf9>] out_tx_attr_set_exec+0x69/0x480 [ptlrpc]
Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a2296a>] out_tx_end+0xda/0x5d0 [ptlrpc]
Nov 4 09:32:15 lola-4 kernel: [<ffffffffa0a282b9>] out_handle+0x11d9/0x18d0 [ptlrpc]

Comment by nasf (Inactive) [ 09/Nov/16 ]

As I remember, there was some discussion about the difference between MDT and OST operations regarding transaction and lock ordering, but I cannot remember the details. Currently, the MDT side starts the transaction before locking the object, while the OST side uses the reverse order. Because the OUT handler is shared by both sides, there will always be some potential for deadlock. Is there any special reason for that? Can we unify them?
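
As a minimal illustration of the hazard, here is a userspace sketch (hypothetical names, not the actual OSD/OUT code) assuming that a bounded number of transaction slots stands in for the open txg: one path takes the object lock first and then reserves a transaction, the other reserves a transaction first and then takes the object lock, and with a single slot the two block each other.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

/* One transaction "slot" stands in for the open txg; the mutex for the object lock. */
static sem_t tx_slots;
static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;

/* OST-style path: object lock first, transaction second. */
static void *lock_then_tx(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&object_lock);      /* lock the object */
    sleep(1);
    printf("lock_then_tx: waiting for a tx slot while holding the object lock\n");
    sem_wait(&tx_slots);                   /* blocks: the slot is held by the other path */
    sem_post(&tx_slots);
    pthread_mutex_unlock(&object_lock);
    return NULL;
}

/* OUT-style path: transaction first, object lock second. */
static void *tx_then_lock(void *arg)
{
    (void)arg;
    sem_wait(&tx_slots);                   /* start the transaction */
    sleep(1);
    printf("tx_then_lock: waiting for the object lock while holding the tx slot\n");
    pthread_mutex_lock(&object_lock);      /* blocks: the object lock is held by the other path */
    pthread_mutex_unlock(&object_lock);
    sem_post(&tx_slots);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    sem_init(&tx_slots, 0, 1);             /* a single slot: the transaction cannot proceed */
    pthread_create(&t1, NULL, lock_then_tx, NULL);
    pthread_create(&t2, NULL, tx_then_lock, NULL);
    sleep(3);
    printf("both threads are stuck: ABBA between the object lock and the transaction\n");
    return 0;
}

Making both paths take the object lock and start the transaction in the same order removes the cycle.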

Comment by Gerrit Updater [ 10/Nov/16 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/23689
Subject: LU-8806 target: lock ostobj before transaction start
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 28bf0565e025ca20c72fb8d813274c142ed273fe

Comment by Alexey Lyashkov [ 18/Dec/17 ]

We hit this issue with 2.7 (IEEL3) code.
An OST deadlocked during an lfsck operation.

Stack : 
__schedule
schedule
rwsem_down_write_failed
call_rwsem_down_write_failed
osd_object_write_lock
out_tx_attr_set_exec
out_tx_end
out_handle
tgt_request_handle
ptlrpc_server_handle_request
ptlrpc_main
kthread
ret_from_fork
Progs:  2213 "ll_ost_out00_00

Any plans to fix it?

Comment by Peter Jones [ 18/Dec/17 ]

Alexey

This issue is presently targeted for the 2.11 release, and a draft of one possible patch already exists. Feel free to provide an alternative approach, or feedback on any testing you conduct on the existing prototype.

Peter

Comment by Alexey Lyashkov [ 18/Dec/17 ]

Peter,

I see no updates to this patch for a year. If you are talking about Alex's patch, it is only a part of the solution, and I will add myself as a watcher to it.
As for an alternative solution, I don't see a good one for now. This is just a notification that 2.7, 2.8 and 2.9 are affected as well, not only 2.10/2.11.

Comment by Alexey Lyashkov [ 27/Feb/18 ]

Peter,

You said this is targeted for 2.11, but I don't see progress in the ticket and lfsck remains continuously broken.

Comment by Andreas Dilger [ 05/Mar/18 ]

Alexey, the https://review.whamcloud.com/23689 patch points at https://review.whamcloud.com/27488 and 23689 has been abandoned. If you are having problems with this issue, then we welcome you to review and test that patch as you said you would do a few months ago, or submit a patch of your own. That is especially useful if this code is broken continuously for you, since it means you have a good reproducer and can verify if the fix works. I would also welcome an alternative patch based on the old ext3 delete thread, that pushes the extents/blocks to be truncated off to another temporary inode during the main transaction, and then does the actual truncate asynchronously. That would speed up all ext4 truncate/unlink operations, and not require changes to Lustre.

Comment by Andreas Dilger [ 22/Aug/18 ]

Closing this as a duplicate of LU-10048, which landed a patch for 2.12.
