
allow starting with only MGS config log if local llog write fails

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.15.0

    Description

      It should be possible to mount an MDT or OST filesystem using only the MGS config llog when the local OSD filesystem is full. Currently the mount fails with -28 (-ENOSPC) when it cannot write a local copy of the config llog. This makes it impossible to mount the MDT/OST with Lustre and free up space in a consistent manner. The only alternative is to mount the target as type ldiskfs and clean up by hand, which requires expert knowledge of the filesystem structure.

      mds02 kernel: LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      mds02 kernel: LustreError: 5826:0:(osd_io.c:2172:osd_ldiskfs_write_record()) dm-6: error reading offset 0 (block 0, size 8192, offs 0), credits 29/29: rc = -28
      mds02 kernel: LustreError: 5826:0:(llog.c:1419:llog_backup()) MGC10.10.1.17@o2ib: failed to backup log lfs1-MDT0001: rc = -28
      mds02 kernel: LustreError: 5826:0:(mgc_request.c:1883:mgc_llog_local_copy()) MGC10.1.1.17@o2ib: failed to copy remote log lfs1-MDT0001: rc = -28
      mds02 kernel: LustreError: 5989:0:(osp_sync.c:1524:osp_sync_init()) lfs1-OST0001-osc-MDT0001: can't initialize llog: rc = -28
      mds02 kernel: LustreError: 5989:0:(obd_config.c:559:class_setup()) setup lfs1-OST0001-osc-MDT0001 failed (-28)
      mds02 kernel: LustreError: 5989:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.10.1.17@o2ib: cfg command failed: rc = -28
      mds02 kernel: Lustre:    cmd=cf003 0:lfs1-OST0001-osc-MDT0001  1:lfs1-OST0001_UUID  2:10.10.1.19@o2ib  \x0a
      mds02 kernel: LustreError: 15c-8: MGC10.10.1.17@o2ib: The configuration from log 'lfs1-MDT0001' failed (-28). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
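The behaviour requested above can be sketched in plain userspace C. This is only an illustration of the decision, not the code from the landed patch: `choose_config_source`, `enum cfg_source`, and the `backup_*` callbacks are hypothetical names, and tolerating only -ENOSPC (rather than every local write failure) is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: which copy of the config llog the mount will process. */
enum cfg_source { CFG_LOCAL_COPY, CFG_MGS_ONLY };

/* Hypothetical callback that writes the local backup; returns 0 or -errno. */
typedef int (*backup_fn)(const char *logname);

/*
 * Decide which config log to use at mount time.
 * Returns 0 and sets *src on success, or a negative errno on failure.
 */
int choose_config_source(const char *logname, backup_fn backup,
			 enum cfg_source *src)
{
	int rc;

	assert(logname != NULL && backup != NULL && src != NULL);

	rc = backup(logname);
	if (rc == 0) {
		/* Normal case: the local copy was refreshed from the MGS. */
		*src = CFG_LOCAL_COPY;
		return 0;
	}
	if (rc == -ENOSPC) {
		/* Assumption: a full target is not fatal, use the MGS log only. */
		*src = CFG_MGS_ONLY;
		return 0;
	}
	/* Any other error is still reported to the caller. */
	return rc;
}

/* Stand-ins for a real backup routine, used only to exercise the sketch. */
int backup_ok(const char *logname) { (void)logname; return 0; }
int backup_full(const char *logname) { (void)logname; return -ENOSPC; }
int backup_io_error(const char *logname) { (void)logname; return -EIO; }
```

With `backup_full` the function returns 0 and selects `CFG_MGS_ONLY`, so the mount would continue instead of aborting with rc = -28 as in the log above. Other errors, such as -EIO, are still passed back unchanged.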
      

          Activity

            [LU-14714] allow starting with only MGS config log if local llog write fails

            adilger Andreas Dilger added a comment -

            It looks like this new test_151a is timing out in Janitor testing, for example:
            https://testing.whamcloud.com/gerrit-janitor/45159/testresults/conf-sanity3-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/

            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55283/
            Subject: LU-14714 mgc: server to mount without local config
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 96d8987a36b9a679774545dc91633345759fba19

            gerrit Gerrit Updater added a comment -

            "Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55283
            Subject: LU-14714 mgc: server to mount without local config
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 74f0776243cce63b8149d4a382e4219684c90867

            adilger Andreas Dilger added a comment -

            Hello Peggy, long time no see...

            I'm not aware of anyone on our side working on this area. It definitely makes sense to allow this to work. We've also discussed at times having one or more "emergency object(s)" that can be deleted if there is no space in the filesystem at startup (also important for ZFS) and then recreated once there is some space.

            peggy Peggy Gazzola added a comment -

            This problem was recently hit on one of our in-house test systems for an OST that had free space available, but no free inodes. The OST was recovered by removing some unused precreated objects.

            Just wondering whether there's any pending activity on this issue.
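The case described above, free blocks but no free inodes, also shows up as -ENOSPC, so it can help to check both counters before deciding what to clean up. The following is a rough userspace sketch using POSIX `statvfs()`. The names `classify_space`, `check_mount_point`, and `enum space_state` are made up for illustration and are not part of Lustre. It assumes the target is mounted somewhere that `statvfs()` can inspect.

```c
#include <errno.h>
#include <sys/statvfs.h>

/* Hypothetical classification of why writes might fail with ENOSPC. */
enum space_state { SPACE_OK, SPACE_NO_BLOCKS, SPACE_NO_INODES };

/*
 * Classify raw counters: blocks available, total inodes, inodes available.
 * A total inode count of 0 is treated as "no inode limit reported".
 */
enum space_state classify_space(unsigned long long bavail,
				unsigned long long files,
				unsigned long long favail)
{
	if (bavail == 0)
		return SPACE_NO_BLOCKS;
	if (files != 0 && favail == 0)
		return SPACE_NO_INODES;
	return SPACE_OK;
}

/* Query a mounted filesystem; returns 0 and sets *state, or -errno. */
int check_mount_point(const char *path, enum space_state *state)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return -errno;

	*state = classify_space((unsigned long long)st.f_bavail,
				(unsigned long long)st.f_files,
				(unsigned long long)st.f_favail);
	return 0;
}
```

A result of `SPACE_NO_INODES` would correspond to the OST above, where removing unused precreated objects freed inodes even though block space was never short.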

            People

              tappro Mikhail Pershin
              adilger Andreas Dilger
              Votes: 2
              Watchers: 8
