[LU-14764] Single OST quotas out of sync (with hung I/O) and acquire quota failed:-115 Created: 15/Jun/21  Updated: 23/Nov/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Stephane Thiell Assignee: Hongchao Zhang
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS 7.9, ldiskfs


Attachments: Text File oak-MDT0000-oak-io6-s2-export.txt     File oak-OST0135_quota_issue.log.gz     Text File oak-io6-s2-LU-14764-sysrq-t.log     File oak-md1-s2-dk+quota+trace.log.gz    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We're enforcing group quota limits on Oak. They usually work great. After an MDS crash with 2.12.6 (we hit unrelated bug LU-14705) and a restart of the MDTs, one OST (oak-OST0135) did not come back in sync with the quota master, and it is still out of sync. A few groups are out of quota on this OST, and this is generating hung I/O, at least for new files that are striped on this OST. As a temporary workaround, I have set max_create_count=0 on all MDTs for this OST (and this worked).
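(For reference, a minimal sketch of how that workaround can be applied and later reverted with lctl, assuming the standard osp max_create_count tunable; the oak-OST0135-osc-MDT* device naming matches what appears later in this ticket, and 20000 is assumed to be the default value to restore:)

# on each MDS, stop allocating new objects on OST0135 for all local MDTs
lctl set_param osp.oak-OST0135-osc-MDT*.max_create_count=0
lctl get_param osp.oak-OST0135-osc-MDT*.max_create_count
# once the OST is back in sync with the quota master, re-enable object creation
lctl set_param osp.oak-OST0135-osc-MDT*.max_create_count=20000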

Example of a group out of quotas on OST0135 (index 309):

# lfs quota -v -g oak_cblish /oak
[...]
oak-OST0134_UUID
                243010468       - 246011728       -       -       -       -       -
oak-OST0135_UUID
                256548944*      - 256542812       -       -       -       -       -
oak-OST0136_UUID
                85301696       - 86780916       -       -       -       -       -
oak-OST0137_UUID
                156684012       - 160709804       -       -       -       -       -
oak-OST0138_UUID
                35652032       - 36721800       -       -       -       -       -
oak-OST0139_UUID
                8830024       - 12589328       -       -       -       -       -
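(As a hedged cross-check, the slave's local accounting and the quota master's global group index can be compared directly; the parameter paths below are assumptions based on the usual 2.12 layout and may differ slightly on a given release:)

# on the OSS hosting OST0135: per-group disk accounting as seen by the quota slave
lctl get_param osd-ldiskfs.oak-OST0135.quota_slave.acct_group
# on the MDS hosting MDT0000: the quota master's global group index (limits/granted per ID)
lctl get_param qmt.oak-QMT0000.dt-0x0.glb-grp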

Example of hung I/O on clients:

[2021-06-14T15:31:23-07:00] [3776399.960372] tar             S ffff8b55ed7cb180     0 13047   4645 0x00000000
[2021-06-14T15:31:23-07:00] [3776399.968457] Call Trace:
[2021-06-14T15:31:23-07:00] [3776399.971387]  [<ffffffffb90b042b>] ? recalc_sigpending+0x1b/0x70
[2021-06-14T15:31:23-07:00] [3776399.978194]  [<ffffffffb9788eb9>] schedule+0x29/0x70
[2021-06-14T15:31:23-07:00] [3776399.983972]  [<ffffffffc0e935b5>] cl_sync_io_wait+0x2b5/0x3d0 [obdclass]
[2021-06-14T15:31:23-07:00] [3776399.991654]  [<ffffffffb90dadf0>] ? wake_up_state+0x20/0x20
[2021-06-14T15:31:23-07:00] [3776399.998377]  [<ffffffffc0e93d78>] cl_io_submit_sync+0x178/0x270 [obdclass]
[2021-06-14T15:31:23-07:00] [3776400.006298]  [<ffffffffc125d1d6>] vvp_io_commit_sync+0x106/0x340 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.014084]  [<ffffffffc125e5b6>] vvp_io_write_commit+0x4c6/0x600 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.021971]  [<ffffffffc125ed25>] vvp_io_write_start+0x635/0xa70 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.029760]  [<ffffffffc0e90225>] ? cl_lock_enqueue+0x65/0x120 [obdclass]
[2021-06-14T15:31:23-07:00] [3776400.037550]  [<ffffffffc0e92788>] cl_io_start+0x68/0x130 [obdclass]
[2021-06-14T15:31:23-07:00] [3776400.044750]  [<ffffffffc0e949fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[2021-06-14T15:31:23-07:00] [3776400.051848]  [<ffffffffc121407b>] ll_file_io_generic+0x63b/0xc90 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.059631]  [<ffffffffc1214b69>] ll_file_aio_write+0x289/0x660 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.067317]  [<ffffffffc1215040>] ll_file_write+0x100/0x1c0 [lustre]
[2021-06-14T15:31:23-07:00] [3776400.074609]  [<ffffffffb924de00>] vfs_write+0xc0/0x1f0
[2021-06-14T15:31:23-07:00] [3776400.080541]  [<ffffffffb924ebdf>] SyS_write+0x7f/0xf0
[2021-06-14T15:31:23-07:00] [3776400.086380]  [<ffffffffb9795f92>] system_call_fastpath+0x25/0x2a

 
I tried to trigger a force_reint for this OST by doing:

echo 1 > /proc/fs/lustre/osd-ldiskfs/oak-OST0135/quota_slave/force_reint 

but that didn't seem to help. The status of the quota slave on this OST is now:

[root@oak-io6-s2 ~]# cat /proc/fs/lustre/osd-ldiskfs/oak-OST0135/quota_slave/info
target name:    oak-OST0135
pool ID:        0
type:           dt
quota enabled:  g
conn to master: setup
space acct:     ugp
user uptodate:  glb[0],slv[0],reint[0]
group uptodate: glb[0],slv[0],reint[1]
project uptodate: glb[0],slv[0],reint[0]

but should likely be:

group uptodate: glb[1],slv[1],reint[0]

like other healthy OSTs.
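(For comparison, the same state can be dumped for every OST on a server in one go; a healthy group line should read glb[1],slv[1],reint[0]:)

lctl get_param osd-ldiskfs.*.quota_slave.info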

I've taken some debug logs with +trace and +quota from the OSS in question, attached as oak-OST0135_quota_issue.log.gz.
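(For reference, a minimal sketch of how such a capture is typically done with lctl, assuming the standard debug interface:)

lctl set_param debug="+quota +trace"
lctl clear                      # drop old entries from the debug buffer
# ... reproduce the stuck write / lfs quota queries ...
lctl dk /tmp/oak-OST0135_quota_issue.log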

We can see some -115 EINPROGRESS errors in the logs:

[root@oak-io6-s2 ~]# grep qsd:oak-OST0135 oak-OST0135_quota_issue.log  | head
00040000:04000000:46.0:1623797919.334289:0:107324:0:(qsd_handler.c:732:qsd_op_begin0()) $$$ op_begin space:12 qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:0 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334291:0:107324:0:(qsd_handler.c:636:qsd_acquire()) $$$ acquiring:12 count=0 qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:12 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334312:0:107324:0:(qsd_entry.c:247:qsd_refresh_usage()) $$$ disk usage: 119769696 qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:12 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334316:0:107324:0:(qsd_handler.c:121:qsd_ready()) $$$ not up-to-date, dropping request and kicking off reintegration qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:12 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334320:0:107324:0:(qsd_handler.c:751:qsd_op_begin0()) $$$ acquire quota failed:-115 qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:12 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334330:0:107324:0:(qsd_entry.c:247:qsd_refresh_usage()) $$$ disk usage: 119769696 qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:0 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334332:0:107324:0:(qsd_handler.c:222:qsd_calc_adjust()) $$$ overrun, reporting usage qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:0 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334334:0:107324:0:(qsd_handler.c:121:qsd_ready()) $$$ not up-to-date, dropping request and kicking off reintegration qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:0 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:46.0:1623797919.334336:0:107324:0:(qsd_handler.c:929:qsd_adjust()) $$$ delaying adjustment since qsd isn't ready qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:0 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
00040000:04000000:18.0:1623797930.024599:0:271450:0:(qsd_entry.c:247:qsd_refresh_usage()) $$$ disk usage: 232050092 qsd:oak-OST0135 qtype:grp id:4722 enforced:1 granted: 268435456 pending:0 waiting:0 req:0 usage: 232050092 qunit:0 qtune:0 edquot:0 default:no

Please let me know if you have any idea of what could be wrong, how to troubleshoot further, and/or whether there is another way to force a resync. Otherwise, I'm going to restart this OST at a scheduled maintenance. Thanks!



 Comments   
Comment by Peter Jones [ 16/Jun/21 ]

Hongchao

Could you please advise

Thanks

Peter

Comment by Hongchao Zhang [ 17/Jun/21 ]

Hi,
Could you please collect the debug log at MDT0000? The QMT device resides on MDT0000, which handles the quota requests
from the MDTs & OSTs, and the error -115 was returned by the QMT.
Thanks!

Comment by Stephane Thiell [ 18/Jun/21 ]

Hi Hongchao,

Thanks! OK. I've tried, but I don't think I was able to capture the error. I'm not sure why, but I don't see any error -115 on the QMT, nor even a mention of OST0135 itself. I used the +quota and +trace debug options on MDT0000, created a file on OST index 309 (OST0135), and started a dd to write data to it. I also ran lfs quota -v during that time. I can see many other OSTs being queried, like:

lookup slave index file for oak-MDT0000-lwp-OST006b_UUID

but I can't find OST0135...

Just in case, I'm attaching the logs as oak-md1-s2-dk+quota+trace.log.gz.

Comment by Stephane Thiell [ 18/Jun/21 ]

Hongchao,
Are you sure the -115 comes from the QMT? For example, in quota/qsd_handler.c, the function qsd_ready() returns -115 (-EINPROGRESS) when reintegration is still in progress:

 111         /* In most case, reintegration must have been triggered (when enable
 112          * quota or on OST start), however, in rare race condition (enabling
 113          * quota when starting OSTs), we might miss triggering reintegration
 114          * for some qqi.
 115          *
 116          * If the previous reintegration failed for some reason, we'll
 117          * re-trigger it here as well. */
 118         if (!qqi->qqi_glb_uptodate || !qqi->qqi_slv_uptodate) {
 119                 read_unlock(&qsd->qsd_lock);
 120                 LQUOTA_DEBUG(lqe, "not up-to-date, dropping request and "
 121                              "kicking off reintegration");
 122                 qsd_start_reint_thread(qqi);
 123                 RETURN(-EINPROGRESS);
 124         }

which generated:

00040000:04000000:46.0:1623797919.334316:0:107324:0:(qsd_handler.c:121:qsd_ready()) $$$ not up-to-date, dropping request and kicking off reintegration qsd:oak-OST0135 qtype:grp id:8663 enforced:1 granted: 119761748 pending:0 waiting:12 req:0 usage: 119769696 qunit:0 qtune:0 edquot:0 default:no
Comment by Hongchao Zhang [ 22/Jun/21 ]

Hi Stephane,

Sorry!
Yes, the error -EINPROGRESS (-115) can be generated in the QSD because the QSD is not in sync with the QMT.

At the OSS hosting OST0135, is there a process named "qsd_reint_2.oak-OST0135"? This should be the kernel thread
that re-integrates with the QMT; "cat /proc/<PID>/stack" should dump its stack and show where it is stuck.
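(A minimal sketch of how to locate those kthreads and dump their stacks; the comm name is truncated to 15 characters, so matching on the qsd_reint prefix is assumed to be sufficient:)

for pid in $(pgrep qsd_reint); do
    echo "== $(cat /proc/$pid/comm) (pid $pid)"
    cat /proc/$pid/stack
done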
Thanks!

Comment by Stephane Thiell [ 22/Jun/21 ]

Thanks Hongchao! I've attached a dump of all kernel tasks of the OSS as oak-io6-s2-LU-14764-sysrq-t.log

I can see this task:

[2021-06-22T08:56:09-07:00] [1031942.051787] qsd_reint_1.oak S ffff8b6f1a37e300     0 88577      2 0x00000080
[2021-06-22T08:56:09-07:00] [1031942.059042] Call Trace:
[2021-06-22T08:56:09-07:00] [1031942.061663]  [<ffffffffab187229>] schedule+0x29/0x70
[2021-06-22T08:56:09-07:00] [1031942.066788]  [<ffffffffab184c58>] schedule_timeout+0x168/0x2d0
[2021-06-22T08:56:09-07:00] [1031942.072782]  [<ffffffffaaaad6e0>] ? __internal_add_timer+0x130/0x130
[2021-06-22T08:56:09-07:00] [1031942.079314]  [<ffffffffc10f9600>] ? ptlrpc_init_rq_pool+0x110/0x110 [ptlrpc]
[2021-06-22T08:56:09-07:00] [1031942.086542]  [<ffffffffc1103350>] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
[2021-06-22T08:56:09-07:00] [1031942.093228]  [<ffffffffaaadaf40>] ? wake_up_state+0x20/0x20
[2021-06-22T08:56:09-07:00] [1031942.098983]  [<ffffffffc11036e3>] ptlrpc_queue_wait+0x83/0x230 [ptlrpc]
[2021-06-22T08:56:09-07:00] [1031942.105764]  [<ffffffffc154d005>] qsd_fetch_index+0x175/0x490 [lquota]
[2021-06-22T08:56:09-07:00] [1031942.112451]  [<ffffffffc15517b8>] qsd_reint_index+0x5c8/0x15d0 [lquota]
[2021-06-22T08:56:09-07:00] [1031942.119228]  [<ffffffffc1553bb2>] qsd_reint_main+0x902/0xe90 [lquota]
[2021-06-22T08:56:09-07:00] [1031942.125834]  [<ffffffffc15532b0>] ? qsd_reconciliation+0xaf0/0xaf0 [lquota]
[2021-06-22T08:56:09-07:00] [1031942.132955]  [<ffffffffaaac5c21>] kthread+0xd1/0xe0
[2021-06-22T08:56:09-07:00] [1031942.137996]  [<ffffffffaaac5b50>] ? insert_kthread_work+0x40/0x40
[2021-06-22T08:56:09-07:00] [1031942.144245]  [<ffffffffab194ddd>] ret_from_fork_nospec_begin+0x7/0x21
[2021-06-22T08:56:09-07:00] [1031942.150845]  [<ffffffffaaac5b50>] ? insert_kthread_work+0x40/0x40
Comment by Hongchao Zhang [ 24/Jun/21 ]

It seems the reintegration thread was stuck:

00000100:00000001:8.0:1623797928.588752:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:8.0:1623797928.588753:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797928.588755:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:8.0:1623797928.588755:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797928.588756:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797928.588757:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797928.588758:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797928.588759:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797928.588759:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797928.588760:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797928.588760:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797928.588760:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797928.588761:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797928.588761:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:8.0:1623797929.588669:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:8.0:1623797929.588670:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797929.588671:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:8.0:1623797929.588671:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797929.588672:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797929.588673:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797929.588673:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797929.588674:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797929.588675:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797929.588675:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797929.588676:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797929.588676:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797929.588676:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797929.588676:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:8.0:1623797930.588710:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:8.0:1623797930.588721:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797930.588723:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:8.0:1623797930.588723:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797930.588724:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797930.588725:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797930.588725:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797930.588726:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797930.588726:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797930.588727:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797930.588727:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797930.588727:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797930.588727:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797930.588728:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:8.0:1623797931.588670:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:8.0:1623797931.588671:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797931.588673:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:8.0:1623797931.588673:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797931.588674:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797931.588675:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797931.588676:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797931.588676:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797931.588677:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797931.588677:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797931.588677:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797931.588678:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797931.588678:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797931.588678:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:8.0:1623797932.588711:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:8.0:1623797932.588712:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797932.588714:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:8.0:1623797932.588715:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797932.588715:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797932.588717:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797932.588717:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797932.588719:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797932.588720:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:8.0:1623797932.588720:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:8.0:1623797932.588721:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797932.588721:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:8.0:1623797932.588722:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:8.0:1623797932.588722:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:10.0:1623797933.588682:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:10.0:1623797933.588682:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797933.588683:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:10.0:1623797933.588684:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:10.0:1623797933.588684:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:10.0:1623797933.588685:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797933.588686:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:10.0:1623797933.588687:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797933.588688:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:10.0:1623797933.588688:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:10.0:1623797933.588688:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797933.588689:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:10.0:1623797933.588689:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797933.588689:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

00000100:00000001:10.0:1623797934.588671:0:88577:0:(client.c:2206:ptlrpc_expired_set()) Process entered
00000100:00000001:10.0:1623797934.588672:0:88577:0:(client.c:2241:ptlrpc_expired_set()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797934.588674:0:88577:0:(client.c:2294:ptlrpc_set_next_timeout()) Process entered
00000100:00000001:10.0:1623797934.588674:0:88577:0:(client.c:2330:ptlrpc_set_next_timeout()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:10.0:1623797934.588675:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:10.0:1623797934.588675:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797934.588676:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:10.0:1623797934.588676:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797934.588677:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:10.0:1623797934.588677:0:88577:0:(client.c:1703:ptlrpc_check_set()) Process entered
00000100:00000001:10.0:1623797934.588677:0:88577:0:(client.c:2624:ptlrpc_unregister_reply()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797934.588678:0:88577:0:(client.c:1172:ptlrpc_import_delay_req()) Process entered
00000100:00000001:10.0:1623797934.588678:0:88577:0:(client.c:1227:ptlrpc_import_delay_req()) Process leaving (rc=1 : 1 : 1)
00000100:00000001:10.0:1623797934.588679:0:88577:0:(client.c:2117:ptlrpc_check_set()) Process leaving (rc=0 : 0 : 0)

It is likely caused by the failed reconnection between OST0135 and MDT0000 (over the LWP connection):

int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
{
        ...
        if (req->rq_phase == RQ_PHASE_RPC) {
                ...
                spin_lock(&imp->imp_lock);
                if (ptlrpc_import_delay_req(imp, req, &status)) {   <--- here, ptlrpc_import_delay_req return 1
                        /* put on delay list - only if we wait
                         * recovery finished - before send */
                        list_del_init(&req->rq_list);
                        list_add_tail(&req->rq_list,
                                      &imp->imp_delayed_list);
                        spin_unlock(&imp->imp_lock);
                        continue;
                }
                ...
        }
        ...

Could you please get the output of "cat /proc/fs/lustre/mdt/oak-MDT0000/exports/xxx/export" at MDT0000?
Thanks!

Comment by Stephane Thiell [ 24/Jun/21 ]

Hi Hongchao,
Ah, that's interesting indeed! I can't find OST0135 in /proc/fs/lustre/mdt/oak-MDT0000/exports/10.0.2.104@o2ib5/export (the NID of the OSS for this OST is 10.0.2.104@o2ib5). Attaching the output as oak-MDT0000-oak-io6-s2-export.txt.

This filesystem has 6 MDTs, index 0 to 5. We can see that for the other MDTs, the OST is connected:

# clush -w @mds 'grep -H OST0135 /proc/fs/lustre/mdt/oak-MDT*/exports/10.0.2.104@o2ib5/export'
oak-md1-s2: /proc/fs/lustre/mdt/oak-MDT0003/exports/10.0.2.104@o2ib5/export:oak-MDT0003-lwp-OST0135_UUID:
oak-md1-s1: /proc/fs/lustre/mdt/oak-MDT0001/exports/10.0.2.104@o2ib5/export:oak-MDT0001-lwp-OST0135_UUID:
oak-md1-s1: /proc/fs/lustre/mdt/oak-MDT0002/exports/10.0.2.104@o2ib5/export:oak-MDT0002-lwp-OST0135_UUID:
oak-md2-s2: /proc/fs/lustre/mdt/oak-MDT0005/exports/10.0.2.104@o2ib5/export:oak-MDT0005-lwp-OST0135_UUID:
oak-md2-s1: /proc/fs/lustre/mdt/oak-MDT0004/exports/10.0.2.104@o2ib5/export:oak-MDT0004-lwp-OST0135_UUID:

For example, here is the export with MDT0001:

oak-MDT0001-lwp-OST0135_UUID:
    name: oak-MDT0001
    client: 10.0.2.104@o2ib5
    connect_flags: [ version, adaptive_timeouts, lru_resize, fid_is_enabled, full20, lvb_type, lightweight_conn, lfsck, bulk_mbits ]
    connect_data:
       flags: 0x2041401043000020
       instance: 47
       target_version: 2.12.6.0
    export_flags: [  ]

Is there anything more we could do to help troubleshoot this without having to restart the OST? For example, is there a way to reset the LWP connection between MDT0000 and OST0135 somehow? Thanks again for taking the time to look at this!

Comment by Stephane Thiell [ 24/Jun/21 ]

Also, the OSP device between MDT0000 and OST0135 seems initialized, unlike the LWP:

[root@oak-md1-s2 ~]# cat /sys/fs/lustre/osp/oak-OST0135-osc-MDT0000/active 
1
[root@oak-md1-s2 ~]# cat /sys/fs/lustre/osp/oak-OST0135-osc-MDT0000/uuid 
oak-MDT0000-mdtlov_UUID
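(On the OSS side, one hedged way to check the corresponding LWP device is to list the local obd devices; the oak-MDT0000-lwp-OST0135 device would be expected to show as UP like its peers:)

[root@oak-io6-s2 ~]# lctl dl | grep -i lwp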

It seems like this could be a consequence of the config llog problem we have been having, which I reported in LU-14695.

Comment by Hongchao Zhang [ 25/Jun/21 ]

I have looked at the code related to LWP, but I can't find a way to force a reconnection yet, sorry!
Does the dmesg log at OST0135 have any messages related to the reconnection? Normally, there will be messages if the connection is lost.
Thanks!

Comment by Sergey Cheremencev [ 23/Nov/22 ]

If you had pool quotas here, this looks similar to LU-16339.
