[LU-13599] LustreError: 30166:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed Created: 25/May/20  Updated: 04/Oct/22  Resolved: 09/Dec/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: Lustre 2.14.0, Lustre 2.12.6

Type: Bug Priority: Minor
Reporter: Stephane Thiell Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File fir-md1-s1-vmcore-dmesg_070720.txt     Text File fir-md1-s4_hang_LU-13599.log    
Issue Links:
Duplicate
is duplicated by LU-13816 LustreError: 18408:0:(mdt_handler.c:8... Resolved
Related
is related to LU-16206 PCC crashes MDS: mdt_big_xattr_get())... Open
is related to LU-13615 mdt_big_xattr_get()) ASSERTION( info-... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We hit the following crash on one of the MDSes of Fir last night, running Lustre 2.12.4. The same problem occurred after re-mounting the MDT and running recovery. I had to kill the robinhood client that was running a purge as well as an MDT-to-MDT migration (a single lfs migrate -m 0); after that I was able to remount this MDT. It looks a bit like LU-5185. Unfortunately, we lost the vmcore this time, but if it happens again I'll let you know and attach it.

[1579975.369592] Lustre: fir-MDT0003: haven't heard from client 6fb18a53-0376-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9150778e2400, cur 1590281140 expire 1590280990 last 1590280913
[1580039.870825] Lustre: fir-MDT0003: Connection restored to b7b3778b-82b4-4 (at 10.50.6.54@o2ib2)
[1600137.924189] Lustre: fir-MDT0003: haven't heard from client ed7f1c7c-f5de-4 (at 10.50.4.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff913f6360b800, cur 1590301302 expire 1590301152 last 1590301075
[1619876.303011] Lustre: fir-MDT0003: Connection restored to 796a800c-02e4-4 (at 10.49.20.10@o2ib1)
[1639768.911234] Lustre: fir-MDT0003: Connection restored to 83415c02-51ff-4 (at 10.49.20.5@o2ib1)
[1639821.000702] Lustre: fir-MDT0003: haven't heard from client 83415c02-51ff-4 (at 10.49.20.5@o2ib1) in 227 seconds. I think it's dead, and I am evicting it. exp ffff913f32f56800, cur 1590340984 expire 1590340834 last 1590340757
[1647672.215034] Lustre: fir-MDT0003: haven't heard from client 19e3d49f-43e4-4 (at 10.50.9.37@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff913f6240bc00, cur 1590348835 expire 1590348685 last 1590348608
[1647717.069200] Lustre: fir-MDT0003: Connection restored to a4c7b337-bfab-4 (at 10.50.9.37@o2ib2)
[1667613.717650] Lustre: fir-MDT0003: haven't heard from client 20e68a82-bbdb-4 (at 10.50.6.54@o2ib2) in 227 seconds. I think it's dead, and I am evicting it. exp ffff914d1725c400, cur 1590368776 expire 1590368626 last 1590368549
[1667717.713398] Lustre: fir-MDT0003: Connection restored to b7b3778b-82b4-4 (at 10.50.6.54@o2ib2)
[1692403.249073] LustreError: 30166:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed: 
[1692403.258985] LustreError: 30166:0:(service.c:189:ptlrpc_save_lock()) LBUG
[1692403.265867] Pid: 30166, comm: mdt00_002 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
[1692403.276224] Call Trace:
[1692403.278866]  [<ffffffffc0aff7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[1692403.285611]  [<ffffffffc0aff87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[1692403.292002]  [<ffffffffc0fb8851>] ptlrpc_save_lock+0xc1/0xd0 [ptlrpc]
[1692403.298695]  [<ffffffffc14b4bab>] mdt_save_lock+0x20b/0x360 [mdt]
[1692403.305003]  [<ffffffffc14b4d5c>] mdt_object_unlock+0x5c/0x3c0 [mdt]
[1692403.311572]  [<ffffffffc14b82e7>] mdt_object_unlock_put+0x17/0x120 [mdt]
[1692403.318479]  [<ffffffffc150c4fc>] mdt_unlock_list+0x54/0x174 [mdt]
[1692403.324876]  [<ffffffffc14d3fd3>] mdt_reint_migrate+0xa03/0x1310 [mdt]
[1692403.331619]  [<ffffffffc14d4963>] mdt_reint_rec+0x83/0x210 [mdt]
[1692403.337841]  [<ffffffffc14b1273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[1692403.344586]  [<ffffffffc14bc6e7>] mdt_reint+0x67/0x140 [mdt]
[1692403.350463]  [<ffffffffc101c64a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[1692403.357586]  [<ffffffffc0fbf43b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[1692403.365483]  [<ffffffffc0fc2da4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[1692403.371991]  [<ffffffffb98c2e81>] kthread+0xd1/0xe0
[1692403.377079]  [<ffffffffb9f77c24>] ret_from_fork_nospec_begin+0xe/0x21
[1692403.383725]  [<ffffffffffffffff>] 0xffffffffffffffff
[1692403.388918] Kernel panic - not syncing: LBUG
[1692403.393363] CPU: 44 PID: 30166 Comm: mdt00_002 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1
[1692403.406214] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.10.6 08/15/2019
[1692403.414040] Call Trace:
[1692403.416670]  [<ffffffffb9f65147>] dump_stack+0x19/0x1b
[1692403.421981]  [<ffffffffb9f5e850>] panic+0xe8/0x21f
[1692403.426954]  [<ffffffffc0aff8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[1692403.433351]  [<ffffffffc0fb8851>] ptlrpc_save_lock+0xc1/0xd0 [ptlrpc]
[1692403.439978]  [<ffffffffc14b4bab>] mdt_save_lock+0x20b/0x360 [mdt]
[1692403.446259]  [<ffffffffc14b4d5c>] mdt_object_unlock+0x5c/0x3c0 [mdt]
[1692403.452796]  [<ffffffffc14b82e7>] mdt_object_unlock_put+0x17/0x120 [mdt]
[1692403.459687]  [<ffffffffc150c4fc>] mdt_unlock_list+0x54/0x174 [mdt]
[1692403.466057]  [<ffffffffc14d3fd3>] mdt_reint_migrate+0xa03/0x1310 [mdt]
[1692403.472794]  [<ffffffffc0d3cfa9>] ? check_unlink_entry+0x19/0xd0 [obdclass]
[1692403.479942]  [<ffffffffc14d4963>] mdt_reint_rec+0x83/0x210 [mdt]
[1692403.486134]  [<ffffffffc14b1273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[1692403.492843]  [<ffffffffc14bc6e7>] mdt_reint+0x67/0x140 [mdt]
[1692403.498721]  [<ffffffffc101c64a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[1692403.505806]  [<ffffffffc0ff40b1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[1692403.513553]  [<ffffffffc0affbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[1692403.520811]  [<ffffffffc0fbf43b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[1692403.528673]  [<ffffffffc0fbb565>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[1692403.535632]  [<ffffffffb98cfeb4>] ? __wake_up+0x44/0x50
[1692403.541065]  [<ffffffffc0fc2da4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[1692403.547537]  [<ffffffffc0fc2270>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[1692403.555106]  [<ffffffffb98c2e81>] kthread+0xd1/0xe0
[1692403.560158]  [<ffffffffb98c2db0>] ? insert_kthread_work+0x40/0x40
[1692403.566424]  [<ffffffffb9f77c24>] ret_from_fork_nospec_begin+0xe/0x21
[1692403.573037]  [<ffffffffb98c2db0>] ? insert_kthread_work+0x40/0x40
[root@fir-md1-s4 127.0.0.1-2020-05-25-00:59:34]# 
Message from syslogd@fir-md1-s4 at May 25 09:46:42 ...
 kernel:LustreError: 29280:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed: 

Message from syslogd@fir-md1-s4 at May 25 09:46:42 ...
 kernel:LustreError: 29280:0:(service.c:189:ptlrpc_save_lock()) LBUG
May 25 09:55:43 fir-md1-s1 kernel: Lustre: fir-MDT0003: Recovery over after 2:45, of 1302 clients 1301 recovered and 1 was evicted.


 Comments   
Comment by Peter Jones [ 26/May/20 ]

Mike

Any ideas here?

Peter

Comment by Mikhail Pershin [ 26/May/20 ]

mdt_reint_migrate() can take quite a lot of LDLM locks, while ptlrpc_reply_state has only 8 slots for saved locks. There are checks meant to stay within that limit and avoid saving more locks than allowed, but it seems there is a flaw in those checks. This part of the code should be reviewed.
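
For context, here is a minimal userspace sketch of the bookkeeping that the assertion guards, based on the description above. RS_MAX_LOCKS and the rs_nlocks/rs_locks names mirror what the LBUG message and comment refer to; this is an illustration, not the upstream ptlrpc code:

#include <assert.h>

#define RS_MAX_LOCKS 8   /* the "8" in ASSERTION( rs->rs_nlocks < 8 ) */

struct reply_state_sketch {
        int           rs_nlocks;               /* locks saved so far for this reply */
        unsigned long rs_locks[RS_MAX_LOCKS];  /* handles held until the client ACKs the reply */
};

/* Rough equivalent of ptlrpc_save_lock(): every lock that must be kept until
 * the reply is acknowledged consumes one of the 8 slots.  If a single request
 * (here, the migrate RPC unlocking its object list) tries to park a 9th lock,
 * the assertion fires and the MDS LBUGs. */
static void save_lock_sketch(struct reply_state_sketch *rs, unsigned long handle)
{
        assert(rs->rs_nlocks < RS_MAX_LOCKS);
        rs->rs_locks[rs->rs_nlocks++] = handle;
}

Judging by the subject of the fix posted later in this ticket ("fix logic of skipping local locks in reply_state"), the change is about which locks are routed through this path at all, so that locks which do not need to wait for the client's reply ACK no longer consume slots.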

Comment by Stephane Thiell [ 14/Jun/20 ]

FYI, we had another MDS crash today with the same LBUG on our MDT0000 with Lustre 2.12.5 (Fir) while two "lfs migrate -m 3" were running.

Comment by Mikhail Pershin [ 14/Jun/20 ]

Stephane, could you report the layout of the migrated directory and its default LOV layout as well?

Comment by Stephane Thiell [ 15/Jun/20 ]

Hi Mike,

One directory being migrated was /scratch/groups/jidoyaga, and the crash happened just after this output from lfs migrate:

migrate /scratch/groups/./jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0/inputs/275296689 to MDT3 stripe count 0

Looks like this directory is still on MDT0, and the LOV layout seems to be our default one:

[root@sh02-01n60 ~]# lfs getdirstripe /scratch/groups/jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0/inputs/275296689
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@sh02-01n60 ~]# lfs getstripe -d /scratch/groups/jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0/inputs/275296689
  lcm_layout_gen:    0
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 0
    lcme_extent.e_end:   134217728
      stripe_count:  1       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 134217728
    lcme_extent.e_end:   137438953472
      stripe_count:  2       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 137438953472
    lcme_extent.e_end:   EOF
      stripe_count:  4       stripe_size:   4194304       pattern:       raid0       stripe_offset: -1
 

 
When I try to access /scratch/groups/jidoyaga now, it hangs. I can see this backtrace on MDT0003:

[970717.500738] Lustre: DEBUG MARKER: Sun Jun 14 20:17:42 2020
 [970890.723334] LNet: Service thread pid 48047 was inactive for 200.23s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
 [970890.740358] Pid: 48047, comm: mdt_rdpg03_008 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
 [970890.751051] Call Trace:
 [970890.753601] [<ffffffffbb588c18>] call_rwsem_down_read_failed+0x18/0x30
 [970890.760341] [<ffffffffc0f5f0bf>] llog_cat_declare_add_rec+0x4f/0x260 [obdclass]
 [970890.767880] [<ffffffffc0f56318>] llog_declare_add+0x78/0x1a0 [obdclass]
 [970890.774709] [<ffffffffc12ab8be>] top_trans_start+0x17e/0x940 [ptlrpc]
 [970890.781416] [<ffffffffc188a494>] lod_trans_start+0x34/0x40 [lod]
 [970890.787632] [<ffffffffc193f6ba>] mdd_trans_start+0x1a/0x20 [mdd]
 [970890.793870] [<ffffffffc1932c29>] mdd_attr_set+0x649/0xda0 [mdd]
 [970890.800016] [<ffffffffc17bcf5b>] mdt_mfd_close+0x6cb/0x870 [mdt]
 [970890.806257] [<ffffffffc17c25b1>] mdt_close_internal+0x121/0x220 [mdt]
 [970890.812915] [<ffffffffc17c28d0>] mdt_close+0x220/0x780 [mdt]
 [970890.818797] [<ffffffffc129a66a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
 [970890.825829] [<ffffffffc123d44b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
 [970890.833642] [<ffffffffc1240db4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
 [970890.840058] [<ffffffffbb2c2e81>] kthread+0xd1/0xe0
 [970890.845058] [<ffffffffbb977c24>] ret_from_fork_nospec_begin+0xe/0x21
 [970890.851628] [<ffffffffffffffff>] 0xffffffffffffffff
 [970890.856735] LustreError: dumping log to /tmp/lustre-log.1592191235.48047

lfs getdirstripe /scratch/groups/jidoyaga is also hanging.

I'm attaching a dump of the tasks on fir-md1-s4 (MDT0003 @ 10.0.10.54@o2ib7) while the lfs getdirstripe command is hanging on a client (10.50.0.1@o2ib2) as fir-md1-s4_hang_LU-13599.log. Thanks!

Comment by Stephane Thiell [ 15/Jun/20 ]

Also, if I start from the directory of the last migrate message, and test each parent dir, I get:

[root@sh02-01n60 ~]# lfs getdirstripe /scratch/groups/jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0/inputs/275296689
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@sh02-01n60 ~]# lfs getdirstripe /scratch/groups/jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0/inputs
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@sh02-01n60 ~]# lfs getdirstripe /scratch/groups/jidoyaga/rleylek/atac-seq-pipeline/cromwell-executions/atac/49a5f27e-b6bd-46a8-beae-5ec5c01c1e7e/call-overlap_pr/shard-0
<hang>
Comment by Stephane Thiell [ 15/Jun/20 ]

Mike,
A restart of MDT0003 fixed the hang issue (phew!). We had to force the restart, as umount was blocked with messages like this:

[1015186.802058] LustreError: 0-0: Forced cleanup waiting for fir-MDT0000-osp-MDT0003 namespace with 1 resources in use, (rc=-110)

But then the recovery was fast and the MDT is working again.

[root@sh02-01n60 ~]# lfs getdirstripe /scratch/groups/jidoyaga
lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
mdtidx		 FID[seq:oid:ver]
     3		 [0x280040561:0x1220:0x0]		
     0		 [0x200000406:0xf1:0x0]	
Comment by Mikhail Pershin [ 26/Jun/20 ]

Stephane, I've found a possible source of the problem and am making a fix right now.

Comment by Stephane Thiell [ 26/Jun/20 ]

Great, thanks for the update Mike!

Comment by Gerrit Updater [ 26/Jun/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39191
Subject: LU-13599 mdt: fix logic of skipping local locks in reply_state
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: adda59c2b113480822b2a7c8fedb38e0fe2745c7

Comment by Gerrit Updater [ 26/Jun/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39194
Subject: LU-13599 mdt: fix logic of skipping local locks in reply_state
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c168bae054d35fa5dde1cb5274c8cb394df7dda8

Comment by Stephane Thiell [ 07/Jul/20 ]

Mike, tomorrow (Wednesday) we have a maintenance window on our systems (including Fir), so I think I'm going to try your patch on top of 2.12.5 on the servers, as it looks ready (unless you tell me otherwise!). Thanks!

Comment by Mikhail Pershin [ 07/Jul/20 ]

Stephane, yes, I think the patch is ready to try.

Comment by Stephane Thiell [ 08/Jul/20 ]

Hi Mike, we tried the patch, but I think something is wrong with it. With 2.12.5 + patch, we hit the following crash twice: once just after the recovery (see fir-md1-s1-vmcore-dmesg_070720.txt, crash dump available upon request), and once a bit later, after the recovery was done, when we restarted a single lfs migrate -m 3 (from MDT0, the one that crashed, to MDT3, which did not crash):

Jul 07 18:54:12 fir-md1-s1 kernel: Pid: 23205, comm: mdt00_105 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
Message from syslogd@fir-md1-s1 at Jul  7 18:54:12 ...
 kernel:LustreError: 23205:0:(mdt_handler.c:892:mdt_big_xattr_get()) ASSERTION( info->mti_big_lmm_used == 0 ) failed: 
Message from syslogd@fir-md1-s1 at Jul  7 18:54:12 ...
 kernel:LustreError: 23205:0:(mdt_handler.c:892:mdt_big_xattr_get()) LBUG
Jul 07 18:54:12 fir-md1-s1 kernel: Call Trace:
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc0d6d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc0d6d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc180dcc0>] mdt_big_xattr_get+0x640/0x810 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc180e0c7>] mdt_stripe_get+0x237/0x400 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc18306b6>] mdt_reint_migrate+0x1056/0x1350 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc1830a33>] mdt_reint_rec+0x83/0x210 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc180d273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc18186e7>] mdt_reint+0x67/0x140 [mdt]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc12e566a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc128844b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffc128bdb4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffff872c2e81>] kthread+0xd1/0xe0
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffff87977c24>] ret_from_fork_nospec_begin+0xe/0x21
Jul 07 18:54:12 fir-md1-s1 kernel:  [<ffffffffffffffff>] 0xffffffffffffffff
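
For context, a minimal sketch of the per-request scratch-buffer pattern behind this second assertion, assuming the semantics implied by the field names (mti_big_lmm being a shared buffer for large layout xattrs and mti_big_lmm_used marking it as holding live data); this is illustrative only, not the actual mdt code:

#include <assert.h>

struct thread_info_sketch {
        void *mti_big_lmm;       /* shared per-request buffer for large layout xattrs */
        int   mti_big_lmm_used;  /* non-zero while the buffer holds data still in use */
};

/* Rough equivalent of the guard in mdt_big_xattr_get(): the buffer may only be
 * refilled once its previous contents have been consumed.  The migrate path
 * apparently requested a second large xattr while the first was still parked
 * in the buffer, which is what tripped the assertion. */
static void *claim_big_buffer_sketch(struct thread_info_sketch *info)
{
        assert(info->mti_big_lmm_used == 0);
        info->mti_big_lmm_used = 1;   /* caller must clear this after use */
        return info->mti_big_lmm;
}
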
Comment by Stephane Thiell [ 08/Jul/20 ]

We have not applied the patch during our system maintenance today, due to the new assertion above, ASSERTION( info->mti_big_lmm_used == 0 ), which seems more frequent than ASSERTION( rs->rs_nlocks < 8 ). We'll stay with plain 2.12.5 until we understand more. Thanks!

Comment by Mikhail Pershin [ 08/Jul/20 ]

Stephane, this assertion was seen on master and has been fixed there; I will prepare a patch for 2.12.

Comment by Stephane Thiell [ 08/Jul/20 ]

Great, thanks Mike!

Comment by Stephane Thiell [ 22/Jul/20 ]

Hi Mike,
Do you have a separate ticket for the ASSERTION( info->mti_big_lmm_used == 0 ) problem? Just wanted to check whether you have had time to backport the patch. We would like to try both patches to avoid MDS crashes during MDT migration, which we're doing quite a lot (we got another crash last night). Thanks!
Stephane

Comment by Gerrit Updater [ 28/Jul/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39521
Subject: LU-13599 mdt: fix mti_big_lmm buffer usage
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: f2cac96782dc729cfe93272db51d670284b6d7aa

Comment by Mikhail Pershin [ 28/Jul/20 ]

Hello Stephane, this fix had no separate ticket on the master branch; it was made as side work in LU-11025. I've extracted the related changes from it for 2.12. Please check if it helps.

Comment by Stephane Thiell [ 28/Jul/20 ]

Thanks Mike!
I've applied the two patches on one of our MDSes (on top of 2.12.5), which is running now while some MDT-to-MDT migrations are going on. Will keep you posted!

$ git log --oneline | head -3
8ac362a LU-13599 mdt: fix mti_big_lmm buffer usage
1324114 LU-13599 mdt: fix logic of skipping local locks in reply_state
78d712a New release 2.12.5
Comment by Stephane Thiell [ 29/Jul/20 ]

Hi Mike,

I'm glad to report that all 4 MDSes on Fir now have the two patches and so far have been running without any issue, even with multiple parallel lfs migrate -m commands running on a client. I'll let you know if I see any issue, but it's very promising!

Thanks so much!

Comment by Olaf Faaland [ 30/Jul/20 ]

Hi Peter or Mike,
Can you talk the appropriate folks into reviewing the patches? Thanks

Comment by Peter Jones [ 31/Jul/20 ]

ofaaland I think that we're all set now - just need things to land

Comment by Gerrit Updater [ 07/Aug/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39191/
Subject: LU-13599 mdt: fix logic of skipping local locks in reply_state
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: dec36101852a8d300a6fdcc28c8d723989544aaa

Comment by Stephane Thiell [ 20/Aug/20 ]

Just checking regarding https://review.whamcloud.com/#/c/39521/

This patch is critical to avoid MDS crashes and has not landed in b2_12 yet.

Thanks!

Comment by Gerrit Updater [ 01/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39521/
Subject: LU-13599 mdt: fix mti_big_lmm buffer usage
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: b09b533b6f443c359e671e7b65208355d5c201dd

Comment by Stephane Thiell [ 01/Sep/20 ]

Thanks y'all!  

Comment by Gerrit Updater [ 08/Sep/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39194/
Subject: LU-13599 mdt: add test for rs_lock limit exceeding
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3448cdc16e361d2504f2f5b0982c92d7a0de933d

Comment by Peter Jones [ 09/Dec/20 ]

Landed for 2.12.6. Fixed on master as part of a larger change (LU-11025)
