Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.8.0
    • None
    • lola
      build: master, 2.7.64-81-g6fc8da4, 6fc8da41f2ff5156639e89f379adcdbb73ac8567
    • 3

    Description

      The error occurred during an lfsck run on the soak file system using build '20160108' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160108).
      DNE is enabled.

      • lfsck started on MDS hosting mdt-0:
        [root@lola-8 ~]# date; lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A ; date
        Wed Jan 13 04:42:28 PST 2016
        Started LFSCK on the device soaked-MDT0000: scrub layout namespace
        Wed Jan 13 04:42:28 PST 2016
        

        No soak test was running

      • lfsck_namespace doesn't complete phase scanning-phase2
      • MDSes lola-9,11 showed an increasing number of blocked mdt_out* threads
      • Triggering a stack trace led to a kernel panic on lola-11 (2016-01-13-08:15:22)
      • All MDSes show only minimal utilization of system resources
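
      The two observations above (the namespace scan stuck in scanning-phase2, and the growing number of blocked mdt_out* threads) can be checked without triggering a full stack trace dump. The following is only a minimal sketch: the helper names lfsck_status and count_blocked_mdt_out are hypothetical, and the ticket does not say how the values were actually collected. It assumes the lfsck_namespace parameter output contains a "status:" line, and that blocked threads appear in state D in ps output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "status:" field from
# lfsck_namespace output read on stdin. On an MDS the input would
# typically come from:
#   lctl get_param -n mdd.soaked-MDT0000.lfsck_namespace
lfsck_status() {
    awk '$1 == "status:" { print $2; exit }'
}

# Hypothetical helper: count mdt_out* threads in uninterruptible sleep
# (state D). Expects "STATE COMMAND" pairs on stdin, for example from:
#   ps -eo state,comm
count_blocked_mdt_out() {
    awk '$1 ~ /^D/ && $2 ~ /^mdt_out/ { n++ } END { print n + 0 }'
}
```

      On an MDS, running `lctl get_param -n mdd.soaked-MDT0000.lfsck_namespace | lfsck_status` and `ps -eo state,comm | count_blocked_mdt_out` at intervals would show whether the scan leaves scanning-phase2 and whether the blocked thread count keeps growing.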

      Attached files:

      • console, messages files of lola-9,11; containing stack trace information
      • vmcore-dmesg.txt of lola-11
      • lfsck status information of all MDTs

      For the crash file location, see the next comment.

      Attachments

        1. console-lola-11.log.bz2
          90 kB
        2. console-lola-9.log.bz2
          70 kB
        3. lfsck-info.txt.bz2
          3 kB
        4. lu-7662-lola-11-1452785464.17420-lustre-log
          171 kB
        5. messages-lola-11.log.bz2
          35 kB
        6. messages-lola-9.log.bz2
          46 kB
        7. vmcore-dmesg.txt.bz2
          33 kB

          Activity

            [LU-7662] lfsck don't complete
            simmonsja James A Simmons added a comment (edited):

            Since this ticket, which was a blocker, is a duplicate of LU-6684, shouldn't LU-6684 be marked as a blocker then?

            yong.fan nasf (Inactive) added a comment:

            It is another failure instance of LU-6684.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/#/c/18082/ has been improved to handle the lola failure more properly.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/#/c/18082/ was verified on lola today and works, but there are still things that can be improved.

            jgmitter Joseph Gmitter (Inactive) added a comment:

            In triage today, it was reported that further work on the patch is needed after experiencing more failures. nasf is actively looking at it.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/17032/ has already landed on the latest master branch. If you are working on the latest master, please apply the 2nd and 3rd patches directly.

            People

              yong.fan nasf (Inactive)
              heckes Frank Heckes (Inactive)
              Votes: 0
              Watchers: 6
