LU-534

(mds_open.c:1323:mds_open()) ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed: -> LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 1.8.8
    • Affects Version: Lustre 1.8.6
    • None
    • Environment: RHEL5 on all affected machines, Lustre exported via NFS
    • 3
    • Bugzilla ID: 17764
    • 6577

    Description

      We hit this LBUG frequently on one of our production file systems and have now managed to reproduce it reliably on our test file system by exporting the Lustre file system via NFS from one Lustre client and running a version of racer on an NFS client inside the exported file system. After a few minutes the LBUG occurs on the MDS. We first saw this on Lustre 1.6.7.2, then on 1.8.3-ddn3.3, and can now reproduce it on the test file system after upgrading the MDS to 1.8.6-wc1, leaving the OSSes and clients at 1.8.3-ddn3.3 for now.

      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: LustreError: 6854:0:(mds_open.c:1323:mds_open()) ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed: dchild 1d2764:0e4d3640 (ffff810429fc2b70) inode ffff81042aabfc30/1910628/239941184
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: LustreError: 6854:0:(mds_open.c:1323:mds_open()) LBUG
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: Pid: 6854, comm: ll_mdt_03
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel:
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: Call Trace:
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff887aa6a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff887aabda>] lbug_with_loc+0x7a/0xd0 [libcfs]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88c1d33d>] mds_open+0x26ad/0x38eb [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff889a3461>] ksocknal_launch_packet+0x2b1/0x3a0 [ksocklnd]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff889a4f65>] ksocknal_alloc_tx+0x1f5/0x2a0 [ksocklnd]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88917491>] lustre_swab_buf+0x81/0x170 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8000d567>] dput+0x2c/0x113
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88bf40b5>] mds_reint_rec+0x365/0x550 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88c1eb3e>] mds_update_unpack+0x1fe/0x280 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88be6eca>] mds_reint+0x35a/0x420 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88be5dda>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88bf0bfc>] mds_intent_policy+0x4ac/0xc20 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff888d8270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff888d5eb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff888d27fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff888fa870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff888f7b29>] ldlm_handle_enqueue+0xbf9/0x1210 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88befb20>] mds_handle+0x40e0/0x4d10 [mds]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8008de38>] __activate_task+0x56/0x6d
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8891bd55>] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff889256d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88925e35>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88926dc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff88925e60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel:
      Jul 25 16:13:50 cs04r-sc-mds02-03 kernel: LustreError: dumping log to /tmp/lustre-log.1311606830.6854

      I'll attach the racer scripts and lustre-log.

      I'm not sure, but at least the earlier traces looked like they might be the bug linked below. I'm reporting it here because I can still reproduce it with 1.8.6-wc1: https://bugzilla.lustre.org/show_bug.cgi?id=17764

      [MDS:]cat /proc/fs/lustre/version
      lustre: 1.8.6
      kernel: patchless_client
      build: jenkins-wc1--PRISTINE-2.6.18-238.12.1.el5_lustre.g266a955
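
      For readers without the attached racer scripts, the kind of workload described above can be approximated with the minimal sketch below. It is not the attached scripts and is not guaranteed to trigger the LBUG; it only illustrates the pattern of concurrent create, read and unlink on a small set of shared file names. The names DIR, DURATION, MAX, racer_worker and run_racer are hypothetical. DIR defaults to a scratch directory so the sketch is harmless as-is; to attempt a reproduction, set DIR to a directory on the NFS mount of the exported Lustre file system and raise DURATION to several minutes.

```shell
#!/bin/bash
# Simplified racer-style workload. This is an illustrative stand-in,
# NOT the racer scripts attached to this ticket.
# DIR, DURATION and MAX are hypothetical knobs.
DIR=${DIR:-$(mktemp -d)}   # target directory (default: scratch dir)
DURATION=${DURATION:-2}    # seconds each worker keeps running
MAX=${MAX:-8}              # number of file names the workers fight over

racer_worker() {
    # $1 = operation to repeat against random file names until the deadline
    local op=$1 end=$((SECONDS + DURATION)) f
    while [ "$SECONDS" -lt "$end" ]; do
        f="$DIR/f$((RANDOM % MAX))"
        # Failures are expected (the file may vanish mid-operation),
        # so errors are silenced and ignored.
        case $op in
            create) echo data > "$f" ;;
            read)   cat "$f" > /dev/null ;;
            remove) rm -f "$f" ;;
        esac 2> /dev/null || true
    done
    return 0
}

run_racer() {
    # Start one background worker per operation and wait for all of them.
    local op
    for op in create read remove; do
        racer_worker "$op" &
    done
    wait
}

run_racer
echo "racer finished in $DIR"
```

      A run against the NFS mount might look like `DIR=/path/to/nfs/mount/racer DURATION=600 bash racer-sketch.sh`, where both the path and the script name are placeholders.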

          Activity

            pjones Peter Jones added a comment -

            OK Frederik, then let's close this ticket for now and reopen it if you find that the problem recurs and this patch does not address it.

            ferner Frederik Ferner (Inactive) added a comment -

            Peter,

            apologies for my late reply.

            I've been trying to reproduce this bug on my test system using the unpatched version of Lustre, and it seems I have lost the ability to reproduce it. I'm not sure what has changed on our side, though. I'll keep trying, and I've downloaded the RPMs with the fix so I'll have them available locally once I can reproduce it.

            Kind regards,
            Frederik

            pjones Peter Jones added a comment -

            Frederik

            Have you had a chance to test out this fix yet? If not, when do you expect to have an opportunity to do so?

            Please advise

            Peter

            pjones Peter Jones added a comment -

            Frederik

            It looks like you can now go ahead and test the fix. The RPMs can be obtained at http://build.whamcloud.com/job/lustre-reviews/4163/

            Regards

            Peter

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,server,el5,ofa #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/replay-vbr.sh
            • lustre/tests/parallel-scale.sh
            • lustre/tests/test-framework.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,server,el5,inkernel #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/test-framework.sh
            • lustre/tests/parallel-scale.sh
            • lustre/tests/replay-vbr.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » x86_64,server,el5,ofa #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/parallel-scale.sh
            • lustre/tests/replay-vbr.sh
            • lustre/tests/test-framework.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/test-framework.sh
            • lustre/tests/replay-vbr.sh
            • lustre/tests/parallel-scale.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,client,el5,ofa #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/parallel-scale.sh
            • lustre/tests/replay-vbr.sh
            • lustre/tests/test-framework.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » i686,client,el5,inkernel #166
            LU-534 mds: correct assertion (Revision 069d0b6393841bf2adbef7e834919fa52310b664)
            LU-534 test: nfsread_orphan_file test (Revision 66cd9a73abc2f075abf7ce78215a1d0cb5038a62)

            Result = SUCCESS
            Johann Lombardi : 069d0b6393841bf2adbef7e834919fa52310b664
            Files :

            • lustre/mds/mds_open.c

            Johann Lombardi : 66cd9a73abc2f075abf7ce78215a1d0cb5038a62
            Files :

            • lustre/tests/test-framework.sh
            • lustre/tests/replay-vbr.sh
            • lustre/tests/parallel-scale.sh

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: ferner Frederik Ferner (Inactive)
              Votes: 0
              Watchers: 5
