Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • 3

    Description

      Continue the LU-15934 work to further improve the stability and recovery of MDT-MDT update llog connectivity in stressful environments. The following improvements are currently in progress:

      • better filtering of possible llog errors: either recreate the update llog if it is damaged, or retry the reconnect if the errors come from the network
      • handle a possible -78 error in the same way as -ENOENT: create a new llog
      • in lod_obd_get_info(), try to recreate the llog context if it is missing after a failed or aborted MDT recovery
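      The error filtering described in the list above could be sketched roughly as follows. This is only an illustration, not the code from the patch: the enum, the helper name llog_classify_error(), and the choice of which errno values count as "damaged" or "network" errors are all assumptions. Only the -ENOENT and -78 handling comes from the description (on Linux, errno 78 is EREMCHG).

```c
#include <errno.h>

/* Hypothetical action names; they are not taken from the Lustre source. */
enum llog_err_action {
	LLOG_ACT_NONE,     /* rc == 0: the llog is usable as is */
	LLOG_ACT_RECREATE, /* llog is missing or damaged: create a new one */
	LLOG_ACT_RETRY,    /* transient network error: retry the reconnect */
	LLOG_ACT_FAIL,     /* anything else: propagate the error */
};

/* The issue names the raw value -78; on Linux this is -EREMCHG. */
#define LLOG_ERR_78 78

/* Map an llog open/processing return code to a recovery action. */
static enum llog_err_action llog_classify_error(int rc)
{
	switch (rc) {
	case 0:
		return LLOG_ACT_NONE;
	case -ENOENT:       /* from the issue: create a new llog */
	case -LLOG_ERR_78:  /* from the issue: treated like -ENOENT */
	case -EIO:          /* assumption: stands in for a damaged llog */
		return LLOG_ACT_RECREATE;
	case -ENOTCONN:     /* assumption: example network errors */
	case -ETIMEDOUT:
	case -EAGAIN:
		return LLOG_ACT_RETRY;
	default:
		return LLOG_ACT_FAIL;
	}
}
```

      Keeping the classification in one helper means the callers only need to act on three outcomes (recreate, retry, fail) instead of checking individual error codes.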

      Note that lod_process_config() already has code to restore llogs when a target is set to active:

       rc = lod_sub_prep_llog(env, lod,
                              sub_tgt->ltd_tgt,
                              sub_tgt->ltd_index);
       sub_tgt->ltd_active = !rc;

      I am going to do the same in lod_obd_get_info(). These measures could greatly improve situations in which the MDT-MDT connection is lost and the MDTs would otherwise need to be restarted.
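      A minimal, self-contained sketch of that idea is shown below. It uses mock types rather than the real Lustre structures: everything prefixed with mock_ and the prep_rc test knob are hypothetical, and the real lod_sub_prep_llog() takes different arguments. It only shows the pattern: if the llog context is missing, try to set it up again and mark the target active only on success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins; the real Lustre structures and signatures differ. */
struct mock_llog_ctxt {
	int unused;
};

struct mock_sub_tgt {
	unsigned int ltd_index;
	bool ltd_active;
	struct mock_llog_ctxt *llog_ctxt; /* NULL after failed/aborted recovery */
	int prep_rc;                      /* test knob: result of the mock prep */
};

static struct mock_llog_ctxt mock_ctxt_storage;

/* Stand-in for lod_sub_prep_llog(): returns 0 or a negative errno. */
static int mock_sub_prep_llog(struct mock_sub_tgt *tgt)
{
	if (tgt->prep_rc != 0)
		return tgt->prep_rc;
	tgt->llog_ctxt = &mock_ctxt_storage;
	return 0;
}

/* Sketch of the check that a get_info-style handler could perform. */
static int mock_get_info_check_llog(struct mock_sub_tgt *tgt)
{
	int rc;

	/* Context is present: nothing to repair, leave ltd_active alone. */
	if (tgt->llog_ctxt != NULL)
		return 0;

	/* Same pattern as in lod_process_config(): active only if prep worked. */
	rc = mock_sub_prep_llog(tgt);
	tgt->ltd_active = !rc;
	return rc;
}
```

      With this shape, a target whose context cannot be restored stays inactive and the error is returned to the caller, so a later call can try again.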

       


          Activity

            [LU-17365] steady LOD update llog connection
            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53510/
            Subject: LU-17365 lod: handle llog errors gracefuly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e81805244476f1d3ffb5a2ecb0a85f54b936ce51

            tappro Mikhail Pershin added a comment - - edited

            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53510
            Subject: LU-15934 lod: handle llog errors gracefuly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9d7ee71e7ada82d7d75f4d6389114e63cece4f1e

             


            People

              tappro Mikhail Pershin
              tappro Mikhail Pershin