Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Blocker
    • Affects Version/s: Lustre 2.5.3

    Description

      When trying to mount the MDT, all osp-syn threads get stuck in 'D' state.

      Debug logs are filled with these messages (rc -2 is -ENOENT):

      00000004:00080000:8.0:1463850740.016156:0:14081:0:(osp_sync.c:317:osp_sync_request_commit_cb()) commit req ffff883ebf799800, transno 0
      00000004:00080000:8.0:1463850740.016164:0:14081:0:(osp_sync.c:351:osp_sync_interpret()) reply req ffff883ebf799800/1, rc -2, transno 0
      00000100:00100000:8.0:1463850740.016176:0:14081:0:(client.c:1872:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_3:nbp2-MDT0000-mdtlov_UUID:14081:1534957896521600:10.151.26.98@o2ib:6
      00000004:00080000:9.0:1463850740.016219:0:14087:0:(osp_sync.c:317:osp_sync_request_commit_cb()) commit req ffff883ebed48800, transno 0
      00000004:00080000:9.0:1463850740.016226:0:14087:0:(osp_sync.c:351:osp_sync_interpret()) reply req ffff883ebed48800/1, rc -2, transno 0
      

      I will upload the full debug logs to the FTP site.
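
      For reference, one minimal way to confirm which threads are stuck, sketched under the assumption that the thread names match the "osp-syn" naming reported above (run on the MDS):

      # Hedged sketch: list uninterruptible-sleep ('D' state) threads
      # and filter for the osp-syn workers mentioned in this ticket.
      ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/ && $4 ~ /osp-syn/'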

      Attachments

        Issue Links

          Activity

            [LU-8177] osp-syn threads in D state

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Thank you for the update, Mahmoud.

            I think we'll keep the priority as it is for the time being (for recording purposes).

            Do you want us to keep the ticket open for a while longer? Or do you think this event is now resolved?

            Best regards,
            ~ jfc.

            mhanafi Mahmoud Hanafi added a comment -

            OK, thanks. You may lower the priority of the case. It did finish.
            green Oleg Drokin added a comment -

            Generally, since I believe your system is now mountable, it's safer to just let the sync threads run their course. It will put additional load on the system, but it should not be too bad.

            Once you apply the LU-7079 patch, it should kill those records for good the next time you reboot.
            Without the patch some of the records would still be killed, but not all of them (and more might be amassed until the next reboot), so you are looking at a similar situation the next time you remount anyway.
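            A minimal way to watch that backlog drain while the threads run is sketched below. The osp parameter names are an assumption borrowed from newer Lustre releases and may not all exist on 2.5.3, so verify them first with lctl list_param.

            # Hedged sketch: watch the OSP sync backlog drain on the MDS.
            # Parameter names are assumed from newer Lustre releases and may
            # differ on 2.5.3 -- check 'lctl list_param osp.*.*' first.
            watch -n 30 "lctl get_param osp.*.sync_changes osp.*.sync_in_flight osp.*.sync_in_progress"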

            mhanafi Mahmoud Hanafi added a comment -

            No, we don't.
            green Oleg Drokin added a comment -

            Do you use changelogs too?

            mhanafi Mahmoud Hanafi added a comment -

            I looked in /O/1/d* and there were files going back to 2015.

            Should I just delete everything in /0/1/* and remount?
            green Oleg Drokin added a comment -

            If you really need to clear the condition immediately, it's possible to unmount the MDT, mount it as ldiskfs, remove the stale llogs, unmount ldiskfs, and remount the MDT as Lustre.
            Perhaps not all of them need removing, just the really old ones (you can tell by the date).
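            A minimal sketch of that procedure follows, assuming a hypothetical MDT device /dev/sdX and mount point /mnt/mdt (substitute your own values, and review what would be deleted before removing anything):

            # Hedged sketch of the procedure above; /dev/sdX and /mnt/mdt are
            # hypothetical placeholders for this MDT's device and mount point.
            umount /mnt/mdt                      # stop the Lustre MDT
            mount -t ldiskfs /dev/sdX /mnt/mdt   # mount the backing filesystem directly
            ls -lt /mnt/mdt/O/1/d*/              # inspect llog files; stale ones are old
            # Review first, then remove only the clearly old llog files, e.g.:
            #   find /mnt/mdt/O/1/d* -type f -mtime +365 -print    # list candidates
            #   find /mnt/mdt/O/1/d* -type f -mtime +365 -delete   # delete after review
            umount /mnt/mdt
            mount -t lustre /dev/sdX /mnt/mdt    # remount the MDT as Lustre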
            green Oleg Drokin added a comment -

            Alex advises that the condition will clear on its own after all llogs are reprocessed. The duration of that is hard to tell, as it depends on the number of those llogs.
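            For a rough idea of that number, one hedged option is to count the llog files under the /O/1/d* directory mentioned earlier on this page while the MDT is mounted as ldiskfs (procedure sketched above); /mnt/mdt is a hypothetical mount point.

            # Hedged sketch: rough count of llog files still to be reprocessed.
            # Requires the MDT mounted as ldiskfs; /mnt/mdt is hypothetical.
            find /mnt/mdt/O/1/d* -type f | wc -l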

            mhanafi Mahmoud Hanafi added a comment -

            This was a remount after a power down.
            The OSTs are mounted.
            The shutdown was normal.

            I unmounted all the OSTs and then the MDT got mounted. Then I remounted the OSTs and the MDT went back to osp-syn threads in 'D' state, but at least I am able to mount it on the client.

            So do we need to apply the patch from LU-7079 and remount? Or can we somehow stop the osp-sync threads?

            bzzz Alex Zhuravlev added a comment -

            Basically, some llog cancels got lost by mistake, causing lots of I/O to rescan llogs at startup.

            bzzz Alex Zhuravlev added a comment -

            I think this can be a duplicate of LU-7079.

            People

              jfc John Fuchs-Chesney (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 9
