
Support "remount-ro" option of the ldiskfs backend

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major

Description

When ldiskfs hits a critical error it remounts the filesystem read-only. In such a situation the OST or MDT should continue to provide read-only service, so that users can back up important files before further damage occurs.

This feature requires that Lustre handle -EROFS gracefully, so there could be a lot of code changes in the following code paths:

• Server start (remount-ro can happen while mounting ldiskfs);
• Connection handler (it currently requires updating client data).
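A minimal user-space sketch of the idea behind this ticket (all structure and function names here are illustrative, not the actual Lustre implementation): a server-side request handler consults a read-only flag and fails modifying operations with -EROFS up front, while read requests continue to be served.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins for a server-side device and request kind. */
struct osd_dev { bool dt_rdonly; };       /* set when the backend remounts ro */
enum req_kind { REQ_READ, REQ_MODIFY };

/* Reject modifying requests early when the backend is read-only,
 * instead of letting them fail deep inside ldiskfs; reads still work. */
static int handle_request(const struct osd_dev *dev, enum req_kind kind)
{
        if (kind == REQ_MODIFY && dev->dt_rdonly)
                return -EROFS;
        return 0;       /* read-only service continues to function */
}
```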

Activity

yong.fan nasf (Inactive) added a comment - The patch for master: https://review.whamcloud.com/24267

adilger Andreas Dilger added a comment -

In this case, LFSCK should be run in a read-only mode (--dryrun) so that it doesn't try to fix anything.

yong.fan nasf (Inactive) added a comment -

LFSCK/OI Scrub should exit after initial scrub if the device is mounted read-only.

Hm... we can make OI scrub exit after the initial scrub if the server is mounted with "-o ro". But if the admin wants to start LFSCK manually only as a routine consistency check, when there is quite probably no inconsistency at all, should we allow LFSCK to run? In my understanding, we should allow that use case.
yong.fan nasf (Inactive) added a comment (edited) -

It probably makes sense that clients mounting the filesystem with "-o ro" should not create an entry in the last_rcvd file.

In fact, for the server device, if the admin removed the last_rcvd file directly in ldiskfs mode, or the fsname-renaming tool removed it, then when the device is mounted as "Lustre" with "-o ro" it will still try to re-create the last_rcvd file without considering "-o ro", and that will cause the mount to fail. So should we also prevent the last_rcvd file from being re-created when the server is mounted with "-o ro"?

In fact, mounting the server device read-only is not a special requirement only for snapshots. We should not assume there will be no llog modification when the MDT is mounted read-only, and there may be last_id synchronization between the MDT and OST. All of these are server-sponsored modifications; if we cannot discard the related modification requests quietly, the server mount may fail.

So "-o rdonly_dev" is more suitable than "-o ro".
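A toy illustration of how a separate Lustre-level option like the proposed "rdonly_dev" could be distinguished from the plain VFS "ro" during mount-option parsing (the struct, field names, and parsing helper are hypothetical, not Lustre code):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mount flags for this sketch. */
struct lustre_mount_opts {
        bool ro;          /* plain VFS read-only */
        bool rdonly_dev;  /* Lustre-level read-only device, as proposed */
};

/* Parse a comma-separated option string, e.g. "ro,rdonly_dev". */
static void parse_opts(const char *opts, struct lustre_mount_opts *o)
{
        o->ro = false;
        o->rdonly_dev = false;
        while (*opts) {
                size_t n = strcspn(opts, ",");  /* length of this token */
                if (n == 2 && strncmp(opts, "ro", 2) == 0)
                        o->ro = true;
                else if (n == 10 && strncmp(opts, "rdonly_dev", 10) == 0)
                        o->rdonly_dev = true;
                opts += n;
                if (*opts == ',')
                        opts++;
        }
}
```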

adilger Andreas Dilger added a comment -

LFSCK/OI Scrub should exit after initial scrub if the device is mounted read-only.

adilger Andreas Dilger added a comment -

It probably makes sense that clients mounting the filesystem with "-o ro" should not create an entry in the last_rcvd file.

niu Niu Yawei (Inactive) added a comment -

It seems the easiest thing to do would be to mark all exports with OBD_CONNECT_RDONLY at connect time if the underlying filesystem is mounted read-only. If it is remounted read-only due to an error then the underlying filesystem will return -EROFS.

The problem is that the client/server protocol assumes the connect flags assigned to an export are a subset of what the client provided; I'm afraid that adding extra flags the client did not propose could cause trouble. In my new patch I added a read-only flag to each OSD device, and the mdd/ofd operations check that flag to decide whether to return -EROFS directly.
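A minimal user-space sketch of the per-device flag approach described above (structure and function names are illustrative only, not the actual patch):

```c
#include <errno.h>
#include <stdbool.h>

/* Each OSD device carries a read-only flag; modifying mdd/ofd-style
 * operations check it up front. All names here are illustrative. */
struct osd_device { bool od_rdonly; };

/* Set at mount time, e.g. from "-o ro" or after an emergency remount. */
static void osd_set_rdonly(struct osd_device *d, bool ro)
{
        d->od_rdonly = ro;
}

/* A modifying operation returns -EROFS directly instead of
 * descending into the backend filesystem. */
static int ofd_object_write(struct osd_device *d /*, ... */)
{
        if (d->od_rdonly)
                return -EROFS;
        /* ... normal write path ... */
        return 0;
}
```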

yong.fan nasf (Inactive) added a comment -

We also need to disable (skip) the upper-layer LFSCK to avoid internal modification requests on the server side.

adilger Andreas Dilger added a comment -

There is already support for clients mounting the filesystem read-only (OBD_CONNECT_RDONLY), which is checked on the MDT and OST for each export:

static int mdt_intent_opc(long itopc, struct mdt_thread_info *info,
                          struct ldlm_lock **lockp, __u64 flags)
{
        if (flv->it_flags & MUTABOR &&
            exp_connect_flags(req->rq_export) & OBD_CONNECT_RDONLY)
                RETURN(-EROFS);

static int tgt_request_preprocess(struct tgt_session_info *tsi,
                                  struct tgt_handler *h,
                                  struct ptlrpc_request *req)
{
        if (flags & MUTABOR && tgt_conn_flags(tsi) & OBD_CONNECT_RDONLY)
                RETURN(-EROFS);

These checks cause all filesystem-modifying operations to return -EROFS from the Lustre request handlers. It seems the easiest thing to do would be to mark all exports with OBD_CONNECT_RDONLY at connect time if the underlying filesystem is mounted read-only. If it is remounted read-only due to an error then the underlying filesystem will return -EROFS.
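The MUTABOR checks gate on connect flags negotiated at connect time, and the concern raised elsewhere in this thread is that the server may only grant a subset of the flags the client proposed. A toy user-space model of that negotiation rule (the bit value and function names are illustrative, not Lustre's actual definitions):

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CONNECT_RDONLY 0x2ULL  /* illustrative bit value only */

/* The rule the objection relies on: the flags the server grants
 * must be a subset of what the client proposed. */
static uint64_t negotiate(uint64_t client_flags, uint64_t server_supported)
{
        return client_flags & server_supported;
}

static bool is_subset(uint64_t granted, uint64_t proposed)
{
        return (granted & ~proposed) == 0;
}
```

Under this model, a server that forces TOY_CONNECT_RDONLY into the granted flags when the client never proposed it would violate `is_subset()`, which is why the alternative of a per-device read-only flag was pursued.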

niu Niu Yawei (Inactive) added a comment -

Well, I was thinking that the goal of this ticket was to provide an 'ro' mount option for the ldiskfs backend, so that an administrator can mount servers read-only and back up data before running e2fsck on the damaged device. After the initial code review and the eng meeting in Santa Clara, I now think the scope of this ticket should probably be expanded: the 'ro' option isn't only for handling the emergency above, it should also support some other daily operations (like the poor-man's snapshot mentioned in the eng meeting). So I think the requirements now should be:

• Both ldiskfs and zfs should support the 'ro' mount option;
• Performance should be considered: in read-only mode we should always try to drop modifying requests early rather than wait for the backend fs to return -EROFS;
• Internal modifying components such as the OI scrub thread and the OSP sync thread should be disabled from the beginning.

Given these requirements, enforcing read-only in the Lustre layer looks like a reasonable choice (unlike my original approach of relying on the MS_RDONLY flag of ldiskfs). Andreas/Alex/Fan Yong, what do you think?
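The last requirement above — not starting internal modifying services at all on a read-only mount — can be sketched in user space as follows (all structure and function names are illustrative, not Lustre code):

```c
#include <stdbool.h>

/* Sketch: internal modifying services are simply never started
 * when the server is mounted read-only. Names are illustrative. */
struct service { bool started; };
struct server {
        bool rdonly;
        struct service oi_scrub;  /* stands in for the OI scrub thread */
        struct service osp_sync;  /* stands in for the OSP sync thread */
};

static void maybe_start(const struct server *s, struct service *svc)
{
        svc->started = !s->rdonly;  /* skip modifying services when ro */
}

static void server_start(struct server *s)
{
        maybe_start(s, &s->oi_scrub);
        maybe_start(s, &s->osp_sync);
}
```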

People

    yong.fan nasf (Inactive)
    niu Niu Yawei (Inactive)
    Votes: 0
    Watchers: 8
