Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Blocker
    • Affects Version: Lustre 2.5.3
    • Severity: 1

    Description

      When trying to mount the MDT, all osp-syn threads are stuck in the 'D' state.

      The debug logs are filled with messages like these:

      00000004:00080000:8.0:1463850740.016156:0:14081:0:(osp_sync.c:317:osp_sync_request_commit_cb()) commit req ffff883ebf799800, transno 0
      00000004:00080000:8.0:1463850740.016164:0:14081:0:(osp_sync.c:351:osp_sync_interpret()) reply req ffff883ebf799800/1, rc -2, transno 0
      00000100:00100000:8.0:1463850740.016176:0:14081:0:(client.c:1872:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd_3:nbp2-MDT0000-mdtlov_UUID:14081:1534957896521600:10.151.26.98@o2ib:6
      00000004:00080000:9.0:1463850740.016219:0:14087:0:(osp_sync.c:317:osp_sync_request_commit_cb()) commit req ffff883ebed48800, transno 0
      00000004:00080000:9.0:1463850740.016226:0:14087:0:(osp_sync.c:351:osp_sync_interpret()) reply req ffff883ebed48800/1, rc -2, transno 0
      

      I will upload the full debug logs to the FTP site.
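
      For reference, a minimal sketch of how one might confirm the stuck threads and capture the Lustre debug buffer on the MDS (the PID and output path below are placeholders, not values from this ticket):

      # List threads currently in uninterruptible sleep ('D' state),
      # e.g. the osp-syn threads mentioned above.
      ps -eo pid,state,comm | awk '$2 == "D"'

      # Dump the kernel stack of one stuck thread to see where it is blocked.
      cat /proc/<pid>/stack

      # Flush the Lustre kernel debug buffer to a file for upload.
      lctl dk /tmp/lustre-debug.log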


          Activity

            [LU-8177] osp-syn threads in D state
            yujian Jian Yu added a comment - edited

            Hello Jay,

            Here is the back-ported patch for Lustre b2_5_fe branch: http://review.whamcloud.com/20392
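
            As a minimal sketch (assuming the standard Gerrit workflow; the project path and patch-set number below are assumptions, so check the change's download links), the back-ported change could be applied to a b2_5_fe tree roughly like this:

            # Hypothetical example: fetch Gerrit change 20392 and apply it to b2_5_fe.
            # The refs/changes path and patch-set number (1) are assumptions.
            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            git checkout b2_5_fe
            git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/92/20392/1
            git cherry-pick FETCH_HEAD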


            jaylan Jay Lan (Inactive) added a comment -

            We have a b2_7_fe version of the patch, but need a back-port to b2_5_fe. Thanks!

            mhanafi Mahmoud Hanafi added a comment -

            Please leave the case open for now.

            jfc John Fuchs-Chesney (Inactive) added a comment -

            Thank you for the update, Mahmoud.

            I think we'll keep the priority as it is for the time being (for recording purposes).

            Do you want us to keep the ticket open for a while longer, or do you think this event is now resolved?

            Best regards,
            ~ jfc.

            mhanafi Mahmoud Hanafi added a comment -

            OK, thanks. You may lower the priority of the case. It did finish.

            green Oleg Drokin added a comment -

            Generally, since I believe your system is now mountable, it's safer to just let the sync threads run their course. It will put additional load on the system, but it should not be too bad.

            Once you apply the LU-7079 patch, it should kill those records for good the next time you reboot.
            Without the patch, some of the records would still be killed, but not all of them (and more might accumulate before the next reboot), so you are looking at a similar situation the next time you remount anyway.
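
            As a minimal sketch (assuming the osp sync counters exported by this Lustre version; the parameter names below should be verified on the MDS), the backlog being drained by the sync threads can be watched like this:

            # Number of llog records still queued for sync on each OSP device;
            # these counts should trend toward zero as the threads catch up.
            lctl get_param osp.*.sync_changes
            lctl get_param osp.*.sync_in_flight
            lctl get_param osp.*.sync_in_progress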


            mhanafi Mahmoud Hanafi added a comment -

            No, we don't.

            green Oleg Drokin added a comment -

            Do you use changelogs too?


            mhanafi Mahmoud Hanafi added a comment -

            I looked in /O/1/d* and there were files going back to 2015.

            Should I just delete everything in /O/1/* and remount?

            green Oleg Drokin added a comment -

            If you really need to clear the condition immediately, it's possible to unmount the MDT, mount it as ldiskfs, remove the stale llogs, unmount ldiskfs, and remount the MDT as Lustre.
            Perhaps not all of them need removing, just the really old ones (you can tell by the date).
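
            A minimal sketch of that procedure (the device and mount-point names are placeholders, and only llogs that are clearly stale by date should be removed):

            # Stop the MDT and mount its backing device as ldiskfs
            # (/dev/mdt_dev and the mount points below are placeholders).
            umount /mnt/mdt
            mount -t ldiskfs /dev/mdt_dev /mnt/mdt_ldiskfs

            # Inspect the llog files; the modification times show which ones are stale.
            ls -lt /mnt/mdt_ldiskfs/O/1/d*/

            # Remove only the clearly old llog files found above (e.g. with rm),
            # then unmount ldiskfs and remount the MDT as Lustre.
            umount /mnt/mdt_ldiskfs
            mount -t lustre /dev/mdt_dev /mnt/mdt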

            green Oleg Drokin added a comment -

            Alex advises that the condition will clear on its own after all llogs are reprocessed. The duration of that is hard to tell, as it depends on the number of those llogs.


            People

              jfc John Fuchs-Chesney (Inactive)
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 9
