[LU-17365] steady LOD update llog connection Created: 14/Dec/23  Updated: 13/Jan/24  Resolved: 10/Jan/24

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Improvement Priority: Major
Reporter: Mikhail Pershin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16159 remove update logs after recovery abort Reopened
is related to LU-15934 client refused mount with -EAGAIN bec... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This continues the LU-15934 work to further improve the stability and revival of MDT-MDT update llog connectivity in stressful environments. The following improvements are currently in progress:

  • better filtering of possible llog errors, so that the update llog is recreated if it is damaged and the reconnect is retried if the errors come from the network
  • handle a possible -78 in the same way as -ENOENT: create a new llog
  • in lod_obd_get_info(), try to recreate the llog context if it is missing after a failed or aborted MDT recovery
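
The error filtering described in the list above can be sketched roughly as follows. This is only an illustration, not the patch itself: the enum, the function name llog_classify_error() and the choice of -ENOTCONN, -ETIMEDOUT and -EAGAIN as "network" errors are assumptions made for the example. Only -ENOENT and the literal -78 come from this ticket.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical action codes for this sketch; not Lustre identifiers. */
enum llog_err_action {
	LLOG_ACT_NONE,		/* rc == 0, nothing to do */
	LLOG_ACT_RECREATE,	/* llog missing or unusable: create a new one */
	LLOG_ACT_RETRY,		/* network-side error: retry the reconnect */
	LLOG_ACT_FAIL,		/* anything else: return the error to the caller */
};

/*
 * Map a kernel-style return code (0 or negative errno) to an action.
 * The network error set below is an assumption made for illustration.
 */
enum llog_err_action llog_classify_error(int rc)
{
	assert(rc <= 0);	/* kernel-style codes are never positive */

	switch (rc) {
	case 0:
		return LLOG_ACT_NONE;
	case -ENOENT:
	case -78:		/* numeric value taken verbatim from the ticket */
		return LLOG_ACT_RECREATE;
	case -ENOTCONN:
	case -ETIMEDOUT:
	case -EAGAIN:
		return LLOG_ACT_RETRY;
	default:
		return LLOG_ACT_FAIL;
	}
}
```

Keeping the classification in one place means the caller only has to act on three outcomes (recreate, retry, fail) rather than test individual error codes at each call site.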

Note that lod_process_config() already has code to restore llogs when a target is set to active:

 rc = lod_sub_prep_llog(env, lod,
                        sub_tgt->ltd_tgt,
                        sub_tgt->ltd_index);
 sub_tgt->ltd_active = !rc;

I am going to do the same in lod_obd_get_info(). These measures could greatly improve situations in which the MDT-MDT connection is lost and MDTs would otherwise need to be restarted.
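
As a rough sketch of that idea, the check could look like the code below. Everything here is a mock written for illustration: struct mock_tgt, mock_prep_llog() and mock_get_info_check() are hypothetical stand-ins, and the real lod_obd_get_info() and lod_sub_prep_llog() take different arguments and operate on the real Lustre structures. The only part taken from the ticket is the pattern of preparing the llog and setting ltd_active to !rc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock target for illustration only; the real Lustre structures differ. */
struct mock_tgt {
	int	 ltd_index;
	bool	 ltd_active;
	void	*llog_ctxt;	/* NULL when the llog context is missing */
	int	 prep_rc;	/* simulated result of preparing the llog */
};

/* Hypothetical stand-in for lod_sub_prep_llog(): 0 on success. */
int mock_prep_llog(struct mock_tgt *tgt)
{
	static int dummy_ctxt;

	if (tgt->prep_rc == 0)
		tgt->llog_ctxt = &dummy_ctxt;
	return tgt->prep_rc;
}

/*
 * If the llog context is missing (for example after a failed or aborted
 * recovery), try to recreate it and update the active flag the same way
 * the quoted lod_process_config() snippet does.
 */
int mock_get_info_check(struct mock_tgt *tgt)
{
	int rc = 0;

	assert(tgt != NULL);
	if (tgt->llog_ctxt == NULL) {
		rc = mock_prep_llog(tgt);
		tgt->ltd_active = !rc;
	}
	return rc;
}
```

In this sketch a target whose context already exists is left untouched, so the extra check costs nothing on the normal path.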

 



 Comments   
Comment by Mikhail Pershin [ 19/Dec/23 ]

"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53510
Subject: LU-15934 lod: handle llog errors gracefuly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9d7ee71e7ada82d7d75f4d6389114e63cece4f1e

 

Comment by Gerrit Updater [ 10/Jan/24 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53510/
Subject: LU-17365 lod: handle llog errors gracefuly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e81805244476f1d3ffb5a2ecb0a85f54b936ce51

Comment by Peter Jones [ 10/Jan/24 ]

Merged for 2.16

Generated at Sat Feb 10 03:34:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.