LU-5553

Support "remount-ro" option of the ldiskfs backend

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major

    Description

      When ldiskfs hits critical errors it remounts the filesystem read-only. In that situation the OST or MDT should still provide read-only service, so that users can back up important files before further damage occurs.

      This feature requires that Lustre handle -EROFS gracefully, so there could be a lot of code changes in the following code paths:

      • Server start (remount-ro can happen while mounting ldiskfs);
      • Connection handler (it currently requires updating client data).
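The server-start path above can be illustrated with a rough userspace sketch: record at mount time whether the backend came up (or was remounted) read-only, so later code can honour that state. All names here are invented stand-ins for illustration, not the real Lustre or kernel structures.

```c
#include <stdbool.h>

#define SKETCH_MS_RDONLY 0x1UL  /* stand-in for the kernel MS_RDONLY bit */

/* Simplified stand-ins; not the real superblock/osd structures. */
struct sb_sketch  { unsigned long s_flags; };
struct dev_sketch { bool d_rdonly; };

/* Hypothetical server-start step: remember whether ldiskfs is mounted
 * read-only, so modifying paths can be rejected without having to
 * consult the superblock again. */
static void record_ro_state(struct dev_sketch *dev, const struct sb_sketch *sb)
{
	dev->d_rdonly = (sb->s_flags & SKETCH_MS_RDONLY) != 0;
}
```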


          Activity


            niu Niu Yawei (Inactive) added a comment:

            > It seems the easiest thing to do would be to mark all exports with OBD_CONNECT_RDONLY at connect time if the underlying filesystem is mounted read-only. If it is remounted read-only due to an error then the underlying filesystem will return -EROFS.

            The problem is that the client/server code assumes the connect flags assigned to an export are a subset of what the client provided; I'm afraid that adding extra flags not provided by the client could cause trouble. In my new patch I added a read-only flag to each osd device, and the mdd/ofd operations check that flag to decide whether to return -EROFS directly.
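The per-device flag idea can be sketched in userspace as follows; the names are invented for this example and are not those of the actual patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-in for an osd device carrying a read-only flag. */
struct osd_dev_sketch {
	bool od_rdonly;
};

enum op_kind { OP_READ, OP_WRITE };

/* Hypothetical mdd/ofd-style entry point: fail modifying operations
 * directly when the device flag is set, so the request never reaches
 * the backend filesystem. */
static int handle_op(const struct osd_dev_sketch *dev, enum op_kind kind)
{
	if (kind == OP_WRITE && dev->od_rdonly)
		return -EROFS;
	return 0;	/* would dispatch to the real handler here */
}
```

This keeps the export connect flags untouched, which is exactly the point of the reply above: the read-only decision lives in the device, not in flags the client never negotiated.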

            yong.fan nasf (Inactive) added a comment:

            We also need to disable (skip) the upper-layer LFSCK to avoid internal modification requests on the server side.

            adilger Andreas Dilger added a comment:

            There is already support for clients mounting the filesystem read-only (OBD_CONNECT_RDONLY), which is checked on the MDT and OST for each export:

            static int mdt_intent_opc(long itopc, struct mdt_thread_info *info,
                                      struct ldlm_lock **lockp, __u64 flags)
            {
                    if (flv->it_flags & MUTABOR &&
                        exp_connect_flags(req->rq_export) & OBD_CONNECT_RDONLY)
                            RETURN(-EROFS);

            static int tgt_request_preprocess(struct tgt_session_info *tsi,
                                              struct tgt_handler *h,
                                              struct ptlrpc_request *req)
            {
                    if (flags & MUTABOR && tgt_conn_flags(tsi) & OBD_CONNECT_RDONLY)
                            RETURN(-EROFS);

            These will cause all filesystem-modifying operations to return -EROFS from the Lustre request handlers. It seems the easiest thing to do would be to mark all exports with OBD_CONNECT_RDONLY at connect time if the underlying filesystem is mounted read-only. If it is remounted read-only due to an error then the underlying filesystem will return -EROFS.
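The connect-time marking suggested above could look roughly like the following userspace sketch. The flag constant and structures are invented stand-ins; the real flag is OBD_CONNECT_RDONLY on struct obd_export.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented stand-in for the real connect flag bit. */
#define SKETCH_CONNECT_RDONLY (1ULL << 0)

/* Simplified stand-in for struct obd_export. */
struct export_sketch {
	uint64_t ex_connect_flags;
};

/* Hypothetical connect-time hook: if the backing filesystem is mounted
 * read-only, force the RDONLY flag onto the export so every later
 * MUTABOR request is rejected early in the request handlers. */
static void connect_mark_rdonly(struct export_sketch *exp, bool fs_rdonly)
{
	if (fs_rdonly)
		exp->ex_connect_flags |= SKETCH_CONNECT_RDONLY;
}
```

Note this is the approach the reply in this thread pushes back on, since it adds a connect flag the client did not negotiate.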

            niu Niu Yawei (Inactive) added a comment:

            Well, I was thinking that the goal of this ticket was to provide a 'ro' mount option for the ldiskfs backend, so that an administrator can mount servers read-only and back up data before running e2fsck on the damaged device. After the initial code review and the eng meeting in Santa Clara, I now think the scope of this ticket should probably be expanded: the 'ro' option is not only for the emergency above, it should also be usable for other routine operations (like the poor-man's snapshot mentioned in the eng meeting). So I think the requirements now should be:

            • Both ldiskfs and zfs should support the 'ro' mount option;
            • Performance should be considered: in read-only mode we should always drop modifying requests early rather than wait for the backend fs to return -EROFS;
            • Internal modifying components such as the OI scrub thread and the OSP sync thread should be disabled from the beginning.

            Given these requirements, enforcing read-only in the Lustre layer looks like a reasonable choice (unlike my original approach of relying on the MS_RDONLY state of ldiskfs). Andreas/Alex/Fan Yong, what do you think?
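The third requirement, disabling internal writers from the beginning, can be sketched as a startup hook that simply skips starting modifying threads when the server is read-only. All names here are invented for illustration; they are not the real OI-scrub or OSP interfaces.

```c
#include <stdbool.h>

/* Illustrative stand-in for server state at startup. */
struct server_sketch {
	bool sv_rdonly;
	bool sv_scrub_running;     /* OI scrub thread started? */
	bool sv_osp_sync_running;  /* OSP sync thread started? */
};

/* Hypothetical startup hook: in read-only mode, skip starting any
 * thread that would generate internal modification requests. */
static void start_internal_threads(struct server_sketch *sv)
{
	if (sv->sv_rdonly)
		return;			/* nothing to start in ro mode */
	sv->sv_scrub_running = true;
	sv->sv_osp_sync_running = true;
}
```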
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/#/c/11721

            adilger Andreas Dilger added a comment:

            I think there are a couple of places where this is important:

            • accepting the "-o ro" mount option to start the filesystem as read-only;
            • on initial setup, some log files should not be modified if the filesystem is read-only;
            • on client connection, if the device is mounted read-only the export itself can get the OBD_CONNECT_RDONLY flag set so that RPCs are aborted earlier.

            People

              yong.fan nasf (Inactive)
              niu Niu Yawei (Inactive)
              Votes: 0
              Watchers: 8
