Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • None
    • lola
      build: master, 2.7.64-81-g6fc8da4, 6fc8da41f2ff5156639e89f379adcdbb73ac8567
    • 3

    Description

      The error occurred during an lfsck run on the soak FS using build '20160108' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160108).
      DNE is enabled.

      • lfsck started on MDS hosting mdt-0:
        [root@lola-8 ~]# date; lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A ; date
        Wed Jan 13 04:42:28 PST 2016
        Started LFSCK on the device soaked-MDT0000: scrub layout namespace
        Wed Jan 13 04:42:28 PST 2016
        

        No soak test was running

      • lfsck_namespace does not complete phase scanning-phase2
      • MDSes lola-9,11 showed an increasing number of blocked mdt_out* threads
      • Triggering a stack trace led to a kernel panic on lola-11 (2016-01-13-08:15:22)
      • All MDSes show only minimal utilization of system resources
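
A minimal sketch of how the scanning-phase2 hang could be watched for. It assumes the LFSCK status text consists of one "key: value" pair per line and includes a "status:" line. The function name lfsck_status is hypothetical, and the lctl get_param parameter path in the usage comment is an assumption derived from the device name above, not something recorded in this report.

```shell
#!/bin/sh
# Print the value of the "status:" field from LFSCK status text read on stdin.
# Assumption: one "key: value" pair per line, with a line starting "status:".
lfsck_status() {
    # Split each line on a colon followed by optional spaces; stop at the first match.
    awk -F': *' '$1 == "status" { print $2; exit }'
}

# Hypothetical usage on the MDS (parameter path is an assumption):
#   lctl get_param -n mdd.soaked-MDT0000.lfsck_namespace | lfsck_status
```

Polling this at intervals and seeing the same value, such as scanning-phase2, for hours would match the symptom described above.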

      Attached files:

      • console and messages files of lola-9,11, containing stack trace information
      • vmcore-dmesg.txt of lola-11
      • lfsck status information of all MDTs

      For the crash file location, see the next comment.

      Attachments

        1. console-lola-11.log.bz2 (90 kB)
        2. console-lola-9.log.bz2 (70 kB)
        3. lfsck-info.txt.bz2 (3 kB)
        4. lu-7662-lola-11-1452785464.17420-lustre-log (171 kB)
        5. messages-lola-11.log.bz2 (35 kB)
        6. messages-lola-9.log.bz2 (46 kB)
        7. vmcore-dmesg.txt.bz2 (33 kB)

        Issue Links

          Activity

            [LU-7662] lfsck don't complete
            yong.fan nasf (Inactive) made changes -
            Resolution New: Duplicate [ 3 ]
            Status Original: In Progress [ 3 ] New: Resolved [ 5 ]
            yong.fan nasf (Inactive) made changes -
            Link New: This issue duplicates LU-6684 [ LU-6684 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Priority Original: Critical [ 2 ] New: Blocker [ 1 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Fix Version/s New: Lustre 2.8.0 [ 11113 ]
            heckes Frank Heckes (Inactive) made changes -
            yong.fan nasf (Inactive) made changes -
            Status Original: Open [ 1 ] New: In Progress [ 3 ]
            yong.fan nasf (Inactive) made changes -
            Assignee Original: WC Triage [ wc-triage ] New: nasf [ yong.fan ]
            heckes Frank Heckes (Inactive) made changes -
            Attachment New: console-lola-9.log.bz2 [ 20106 ]
            Attachment New: console-lola-11.log.bz2 [ 20107 ]
            Attachment New: lfsck-info.txt.bz2 [ 20108 ]
            Attachment New: messages-lola-9.log.bz2 [ 20109 ]
            Attachment New: messages-lola-11.log.bz2 [ 20110 ]
            Attachment New: vmcore-dmesg.txt.bz2 [ 20111 ]
            heckes Frank Heckes (Inactive) made changes -
            Description: last symptom bullet changed from "All MDSes don't show only minimal utilization of system resources" to "All MDSes show only minimal utilization of system resources"; the remaining text is identical to the Description above.
            heckes Frank Heckes (Inactive) made changes -
            Description: last symptom bullet changed from "All MDSes don't sho" to "All MDSes don't show only minimal utilization of system resources"; the remaining text is identical to the Description above.
            heckes Frank Heckes (Inactive) created issue -

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: heckes Frank Heckes (Inactive)
              Votes: 0
              Watchers: 6
