Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • None
    • None
    • 3

    Description

      Send OSP_DISCONNECT only on a healthy import. Otherwise, force a local disconnect for unhealthy imports.

        Activity

          [LU-15020] OSP_DISCONNECT blocking MDT unmount
          pjones Peter Jones added a comment - Fix on master by https://review.whamcloud.com/#/c/44753/

          gerrit Gerrit Updater added a comment

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44753/
          Subject: EX-3687 osp: do force disconnect if import is not ready
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 8203c0f7a043aad9d087018119e278e4279ca8bc

          gerrit Gerrit Updater added a comment

          "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44753
          Subject: EX-3687 osp: do force disconnect if import is not ready
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 1a5d067b340e0b62f5577a20779401427ca0adca

          tappro Mikhail Pershin added a comment (edited)

          adilger, I am not sure that this hang during MDT unmount is related to the stat() call you mentioned. That problem and its patch concern client unmount and client RPCs, but here we have a local mountpoint unmount on the server. I doubt it causes an inter-MDT stat, though some other RPC may be involved.

          As for osp_disconnect(), the simplest approach would be to call ptlrpc_disconnect_import() with obd_force set if the import is in recovery. Then there is no waiting for the import to recover and no DISCONNECT RPC. If the import is healthy, the DISCONNECT RPC is still sent, so the other MDT can clean up the related resources.

          My other question is about the situation in the description. It states that the server hangs waiting for a response to the DISCONNECT RPC, but this RPC is always sent with the rq_no_resend flag, so it should fail after a timeout rather than hang forever. So was the hang observed by customers just long, or did it really never end?

          tappro Mikhail Pershin added a comment (edited)

          While I am checking how to make the server disconnect gracefully, one possible approach with umount --force is to set the device read-only beforehand; in that case, I think, clients will be preserved on the server.

          jhammond John Hammond added a comment (edited)

          Set up a file system with MDTs spread over two VMs. Start the file system, do some cross-MDT operations, destroy one VM (no unmount, no shutdown), and then try to umount (without --force) an MDT on the other VM.

          tappro Mikhail Pershin added a comment (edited)

          Based on these reports about MDT hangs, what would be the easiest way to reproduce the issue?

          People

            tappro Mikhail Pershin
            jhammond John Hammond
            Votes: 0
            Watchers: 6
