Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.8.0
    • None
    • lola
      build: master, 2.7.64-81-g6fc8da4, 6fc8da41f2ff5156639e89f379adcdbb73ac8567
    • 3

    Description

      The error occurred during an LFSCK run on the soak file system using build '20160108' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160108).
      DNE is enabled.

      • LFSCK was started on the MDS hosting MDT0000:
        [root@lola-8 ~]# date; lctl lfsck_start -M soaked-MDT0000 -s 1000 -t all -A ; date
        Wed Jan 13 04:42:28 PST 2016
        Started LFSCK on the device soaked-MDT0000: scrub layout namespace
        Wed Jan 13 04:42:28 PST 2016
        

        No soak test was running at the time.

      • lfsck_namespace does not complete phase scanning-phase2.
      • MDSes lola-9 and lola-11 showed an increasing number of blocked mdt_out* threads.
      • Triggering a stack trace led to a kernel panic on lola-11 (2016-01-13-08:15:22).
      • All MDSes showed only minimal utilization of system resources.

      Attached files:

      • console and messages files of lola-9 and lola-11, containing stack trace information
      • vmcore-dmesg.txt of lola-11
      • lfsck status information of all MDTs

      For the crash file location, see the next comment.

      Attachments

        1. console-lola-11.log.bz2
          90 kB
        2. console-lola-9.log.bz2
          70 kB
        3. lfsck-info.txt.bz2
          3 kB
        4. lu-7662-lola-11-1452785464.17420-lustre-log
          171 kB
        5. messages-lola-11.log.bz2
          35 kB
        6. messages-lola-9.log.bz2
          46 kB
        7. vmcore-dmesg.txt.bz2
          33 kB

          Activity

            [LU-7662] lfsck don't complete
            simmonsja James A Simmons added a comment (edited):

            Since this ticket, which was a blocker, is a duplicate of LU-6684, shouldn't LU-6684 be marked as a blocker?

            yong.fan nasf (Inactive) added a comment:

            It is another failure instance of LU-6684.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/#/c/18082/ has been improved to handle the lola problem more properly.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/#/c/18082/ was verified on lola today and works, but some things can still be improved.

            jgmitter Joseph Gmitter (Inactive) added a comment:

            In triage today, it was reported that further work on the patch is needed after more failures were seen. nasf is actively looking at it.

            yong.fan nasf (Inactive) added a comment:

            The patch http://review.whamcloud.com/17032/ has already landed on the latest master branch. If you are working on the latest master, please apply the 2nd and 3rd patches directly.

            heckes Frank Heckes (Inactive) added a comment:

            Hm, the first patch can't be applied:

            [soakbuilder@lhn lustre-release]$ for i in /scratch/rpms/20160126/patches/*.patch; do git am $i; done
            Applying: LU-0000 dne: dne llog fixes
            warning: lustre/tests/conf-sanity.sh has type 100755, expected 100644
            Applying: LU-6684 lfsck: stop lfsck even if some servers offline
            error: patch failed: lustre/include/lustre_net.h:605
            error: lustre/include/lustre_net.h: patch does not apply
            error: patch failed: lustre/include/obd_support.h:557
            error: lustre/include/obd_support.h: patch does not apply
            error: patch failed: lustre/lfsck/lfsck_engine.c:1577
            error: lustre/lfsck/lfsck_engine.c: patch does not apply
            error: patch failed: lustre/lfsck/lfsck_internal.h:817
            error: lustre/lfsck/lfsck_internal.h: patch does not apply
            error: patch failed: lustre/lfsck/lfsck_layout.c:3248
            error: lustre/lfsck/lfsck_layout.c: patch does not apply
            error: patch failed: lustre/lfsck/lfsck_lib.c:31
            error: lustre/lfsck/lfsck_lib.c: patch does not apply
            error: patch failed: lustre/lfsck/lfsck_namespace.c:3931
            error: lustre/lfsck/lfsck_namespace.c: patch does not apply
            error: patch failed: lustre/obdclass/obd_mount_server.c:477
            error: lustre/obdclass/obd_mount_server.c: patch does not apply
            error: patch failed: lustre/osp/osp_trans.c:454
            error: lustre/osp/osp_trans.c: patch does not apply
            error: patch failed: lustre/ptlrpc/client.c:1661
            error: lustre/ptlrpc/client.c: patch does not apply
            error: patch failed: lustre/tests/sanity-lfsck.sh:4291
            error: lustre/tests/sanity-lfsck.sh: patch does not apply
            Patch failed at 0001 LU-6684 lfsck: stop lfsck even if some servers offline
            When you have resolved this problem run "git am --resolved".
            If you would prefer to skip this patch, instead run "git am --skip".
            To restore the original branch and stop patching run "git am --abort".
            previous rebase directory /home/soakbuilder/repos/lustre-release/.git/rebase-apply still exists but mbox given.
            

            Patch details:

            [soakbuilder@lhn lustre-release]$ ls -1 /scratch/rpms/20160126/patches/
            001-LU-0000_dne_dne_llog_fixes-PatchSet39.patch
            002-LU-6684_lfsck_stop_lfsck_even_if_some_servers_offline-PatchSet6.patch
            003-LU-6684_lfsck_set_the_lfsck_notify_as_interruptable-PatchSet3.patch
            004-LU-7680_mdd_put_migrated_object_on_the_orphan_list
            

            Status of the master branch used to create the sub-branch:

            [soakbuilder@lhn lustre-release]$ git describe ; git log | head -1
            2.7.65-38-g607f691
            commit 607f6919ea67b101796630d4b55649a12ea0e859
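The loop above starts a separate "git am" for every file, so once the LU-6684 patch failed, the next iteration found the unfinished session and stopped with "previous rebase directory ... still exists but mbox given". A minimal sketch of an alternative, assuming a POSIX shell; "apply_series" is a hypothetical helper name, and this only avoids the stale-session problem. It does not resolve the actual conflicts against master reported above.

```shell
#!/bin/sh
# apply_series DIR: hypothetical helper, not part of any Lustre tooling.
apply_series() {
    dir=$1
    # A failed "git am" keeps its state in $GIT_DIR/rebase-apply; abort it
    # first so a new run does not stop with "previous rebase directory ...
    # still exists but mbox given".
    if [ -d "$(git rev-parse --git-dir)/rebase-apply" ]; then
        git am --abort || return 1
    fi
    # Hand every patch to a single "git am" run. The glob expands in lexical
    # order (001-, 002-, ...), and git am stops at the first patch that does
    # not apply, so later patches are never stacked on top of a failure.
    # -3 falls back to a three-way merge when the context has drifted.
    git am -3 "$dir"/*.patch
}
```

Usage: apply_series /scratch/rpms/20160126/patches. As listed above, the fourth file appears without a .patch suffix, so this glob would not pick it up.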
            

            People

              yong.fan nasf (Inactive)
              heckes Frank Heckes (Inactive)
              Votes: 0
              Watchers: 6
