Support "remount-ro" option of the ldiskfs backend

Details

    • Bug
    • Resolution: Fixed
    • Major

    Description

      When ldiskfs hits a critical error, it remounts the filesystem in read-only mode. The OST or MDT should continue to provide read-only service in that situation, so that users can back up important files before further damage occurs.

      This feature requires Lustre to handle -EROFS gracefully, so there could be many code changes in the following code paths:

      • Server start (remount-ro could happen while mounting ldiskfs)
      • Connection handler (it currently requires updating the client data)
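
As a rough illustration of what "handling -EROFS gracefully" could mean in the connection handler, here is a minimal C sketch. Every name in it (`demo_target`, `demo_update_client_data`, `demo_handle_connect`) is hypothetical and is not taken from the Lustre source; it only assumes that the per-client data update fails with -EROFS once the backend is read-only, and that the connect path may tolerate exactly that error.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a server target; not a real Lustre structure. */
struct demo_target {
	bool backend_read_only;	/* set once the backend was remounted read-only */
};

/* Hypothetical: pretend to persist per-client data for a new connection. */
static int demo_update_client_data(const struct demo_target *tgt)
{
	return tgt->backend_read_only ? -EROFS : 0;
}

/*
 * Hypothetical connect handler. A client-data update that fails only
 * because the backend is read-only is tolerated, so the client can still
 * connect and read files. Any other error still fails the connection.
 */
static int demo_handle_connect(const struct demo_target *tgt,
			       bool *client_recorded)
{
	int rc = demo_update_client_data(tgt);

	if (rc == -EROFS) {
		*client_recorded = false;	/* nothing was written for this client */
		return 0;
	}

	*client_recorded = (rc == 0);
	return rc;
}
```

The sketch keeps the "tolerate -EROFS" decision in one place, so other error codes keep their existing meaning.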

          Activity

            [LU-5553] Support "remount-ro" option of the ldiskfs backend
            yong.fan nasf (Inactive) added a comment - The patch for master: https://review.whamcloud.com/24267

            adilger Andreas Dilger added a comment - In this case, LFSCK should be run in a read-only mode (--dryrun) so that it doesn't try to fix anything.

            yong.fan nasf (Inactive) added a comment -

            "LFSCK/OI Scrub should exit after initial scrub if the device is mounted read-only."

            Hm... we can make the OI scrub exit after the initial OI scrub if the server is mounted with "-o ro". But if the admin wants to start LFSCK manually just as a routine consistency check, and there are quite probably no inconsistencies, should we allow LFSCK to run? As I understand it, we should allow that use case.
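
One possible reading of the policy discussed in this thread can be written down as a small decision function. This is only a sketch: the enum, the function name, and the two inputs are hypothetical, and the "admin-started run becomes a dry run" branch is an assumption that combines the --dryrun suggestion with the question raised above, not something the ticket confirms.

```c
#include <stdbool.h>

/* Hypothetical outcomes; these are not Lustre identifiers. */
enum demo_scrub_action {
	DEMO_SCRUB_RUN_FULL,		/* normal run, repairs allowed */
	DEMO_SCRUB_RUN_DRYRUN,		/* check only, never repair */
	DEMO_SCRUB_STOP_AFTER_INITIAL,	/* exit once the initial OI scrub is done */
};

/*
 * Hypothetical policy:
 *  - writable device: run normally;
 *  - read-only device, started manually by the admin: allow a check-only run;
 *  - read-only device, started automatically: stop after the initial scrub.
 */
static enum demo_scrub_action
demo_scrub_policy(bool mounted_read_only, bool started_by_admin)
{
	if (!mounted_read_only)
		return DEMO_SCRUB_RUN_FULL;

	if (started_by_admin)
		return DEMO_SCRUB_RUN_DRYRUN;

	return DEMO_SCRUB_STOP_AFTER_INITIAL;
}
```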
            yong.fan nasf (Inactive) added a comment - edited

            "It probably makes sense that clients mounting the filesystem with "-o ro" should not create an entry in the last_rcvd file."

            In fact, for the server device, if the admin removed the last_rcvd file directly while the device was mounted as ldiskfs, or the fsname renaming tool removed the last_rcvd file, then mounting the device as "Lustre" with "-o ro" will still try to re-create the last_rcvd file without considering "-o ro". That will cause a mount failure. So should we also prevent the last_rcvd file from being re-created if the server is mounted with "-o ro"?

            In fact, mounting the server device read-only is not a requirement specific to snapshots. We should not assume that there will be no llog modifications when the MDT is mounted read-only, and there may be last_id synchronization between the MDT and OST. All of these are server-initiated modifications; if we cannot quietly discard the related modification requests, they may cause the server mount to fail.

            So "-o rdonly_dev" is more suitable than "-o ro".
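
The two behaviours asked for in this comment, skipping last_rcvd re-creation and quietly discarding server-initiated updates on a read-only device, might look roughly like the following C sketch. The structure and function names are hypothetical and only model the decision logic; they do not reflect how "-o rdonly_dev" is actually implemented.

```c
#include <stdbool.h>

/* Hypothetical model of a server device; not a real Lustre structure. */
struct demo_device {
	bool read_only;		/* device was mounted read-only */
	bool has_last_rcvd;	/* last_rcvd file currently exists */
	unsigned int applied;	/* internal updates that were written */
	unsigned int discarded;	/* internal updates that were quietly dropped */
};

/*
 * Mount-time step: re-create a missing last_rcvd only on a writable
 * device. On a read-only device the step is skipped instead of failing,
 * so the mount can continue.
 */
static int demo_prepare_last_rcvd(struct demo_device *dev)
{
	if (!dev->has_last_rcvd && !dev->read_only)
		dev->has_last_rcvd = true;

	return 0;
}

/*
 * Server-initiated modification (for example an llog update or a last_id
 * sync). On a read-only device it is counted and dropped, and success is
 * still reported to the caller.
 */
static int demo_internal_update(struct demo_device *dev)
{
	if (dev->read_only) {
		dev->discarded++;
		return 0;
	}

	dev->applied++;
	return 0;
}
```

Returning success for a dropped update is the point of the sketch: callers that run during mount do not need to learn about the read-only state one by one.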

            adilger Andreas Dilger added a comment - LFSCK/OI Scrub should exit after initial scrub if the device is mounted read-only.

            adilger Andreas Dilger added a comment - It probably makes sense that clients mounting the filesystem with "-o ro" should not create an entry in the last_rcvd file.

            niu Niu Yawei (Inactive) added a comment -

            "It seems the easiest thing to do would be to mark all exports with OBD_CONNECT_RDONLY at connect time if the underlying filesystem is mounted read-only. If it is remounted read-only due to an error then the underlying filesystem will return -EROFS."

            The problem is that the client and server assume the connect flags assigned to an export are a subset of what the client provided, so I'm afraid that adding extra flags not provided by the client could cause trouble. In my new patch, I added a read-only flag to each osd device, and mdd/ofd operations check that flag to decide whether to return -EROFS directly.
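
The approach described in this comment can be sketched in C as follows. All identifiers are hypothetical (`demo_osd_device`, `od_read_only`, `demo_create`, `demo_getattr`, `demo_negotiate_flags`, and the `DEMO_CONNECT_*` bit values are made up for illustration); the actual patch may be structured quite differently. The sketch shows a per-device read-only flag checked by modifying operations, plus the subset property of connect flags that makes a server-side OBD_CONNECT_RDONLY risky.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define DEMO_CONNECT_RDONLY	0x1ULL
#define DEMO_CONNECT_OTHER	0x2ULL

/* Hypothetical per-device state; not the real osd device structure. */
struct demo_osd_device {
	bool od_read_only;	/* set when the backend is (re)mounted read-only */
};

/* A modifying operation checks the flag first and fails early. */
static int demo_create(const struct demo_osd_device *osd)
{
	if (osd->od_read_only)
		return -EROFS;

	return 0;	/* the real modification would happen here */
}

/* A read-only operation is unaffected by the flag. */
static int demo_getattr(const struct demo_osd_device *osd)
{
	(void)osd;
	return 0;
}

/*
 * Connect-flag negotiation as both sides assume it: the granted set is
 * the intersection of the two sets, so it is always a subset of what the
 * client asked for. A flag the client did not send is never granted.
 */
static uint64_t demo_negotiate_flags(uint64_t client_flags,
				     uint64_t server_flags)
{
	return client_flags & server_flags;
}
```

With the check in the operation itself, the negotiated flags keep their subset property and the client needs no new handling beyond the -EROFS it can already receive.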

            yong.fan nasf (Inactive) added a comment - We also need to disable (skip) the upper-layer LFSCK to avoid internal modification requests on the server side.

            People

              yong.fan nasf (Inactive)
              niu Niu Yawei (Inactive)