Lustre / LU-6207

conf-sanity test_83: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • None
    • 3
    • 17363

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4db5899a-ab42-11e4-b27f-5254006e85c2.

      The sub-test test_83 failed with the following error:

      test failed to respond and timed out
      

      Please provide additional information about the failure here.

      Info required for matching: conf-sanity 83

            Attachments

            Issue Links

            Activity

            [LU-6207] conf-sanity test_83: test failed to respond and timed out
            ys Yang Sheng added a comment -

            Patches landed. Closing the ticket.


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13649/
            Subject: LU-6207 osd: add osd_ost_fini in osd_obj_map_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 948794929c5724e6a78b2c470bf97bcea1a67555

            pjones Peter Jones added a comment -

            Ah yes. Thanks for pointing that out.


            Hello Peter Jones.
            I think this ticket should not be closed before the 2nd patch, http://review.whamcloud.com/13649, lands.
            I added a new test to conf-sanity in http://review.whamcloud.com/13648. But the new test (conf-sanity test_85) may cause a kernel panic in a similar place in osd, with the same symptoms.
            So please make sure that http://review.whamcloud.com/13649 is landed.
            Thank you

            pjones Peter Jones added a comment -

            Landed for 2.8


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13648/
            Subject: LU-6207 osd: add dput in osd_ost_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 51c90a076884a8cda2bcc655058be43c2c0dced2

            pjones Peter Jones added a comment -

            Yang Sheng

            Could you please take care of this patch?

            Thanks

            Peter


            Hi, Oleg

            I don't think that LU-5729 is buggy. It seems there are also several places where cleanup is not done correctly.
            We faced a similar problem after LU-5729 (MRP-2109), and I prepared 2 patches.
            http://review.whamcloud.com/13648
            After the above patch we hit the issue again, and I found another place where a dput is forgotten.
            2nd patch:
            http://review.whamcloud.com/13649

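            The missing-cleanup pattern discussed above can be sketched in plain C. This is a toy userspace model, not the actual Lustre osd code: `demo_ost_init`/`demo_ost_fini` and the `resources_held` counter are hypothetical stand-ins for `osd_ost_init`/`osd_ost_fini` and the dentry references they take. It only illustrates why an init routine's error path must call the matching fini, which is the shape of the fix in http://review.whamcloud.com/13649.

            ```c
            #include <stdio.h>

            /* Hypothetical stand-in for the kernel-side state; not a Lustre struct. */
            struct ost_map { int initialized; };

            static int resources_held;   /* models dentry/inode references taken */

            /* Sketch of an init routine that takes a reference as it goes. */
            static int demo_ost_init(struct ost_map *m)
            {
                resources_held++;        /* e.g. a dget() on an internal dentry */
                m->initialized = 1;
                return 0;
            }

            /* The matching fini releases everything init acquired. */
            static void demo_ost_fini(struct ost_map *m)
            {
                if (m->initialized) {
                    resources_held--;    /* the dput() balancing the dget() */
                    m->initialized = 0;
                }
            }

            /* Models the pattern of the patch: when a later step of map-init
             * fails, the error path must call the fini of the step that already
             * succeeded, or its references leak until unmount. */
            static int demo_obj_map_init(struct ost_map *m, int later_step_fails)
            {
                int rc = demo_ost_init(m);
                if (rc)
                    return rc;

                if (later_step_fails) {
                    demo_ost_fini(m);    /* the cleanup the fix adds */
                    return -1;
                }
                return 0;
            }

            int main(void)
            {
                struct ost_map m = { 0 };

                demo_obj_map_init(&m, 1);        /* simulate a failing setup */
                printf("held=%d\n", resources_held);
                return resources_held != 0;      /* refs left at "unmount" = bug */
            }
            ```

            Without the `demo_ost_fini()` call in the failure branch, `resources_held` would stay at 1, which is the userspace analogue of the "still in use (1)" dentry that later trips the unmount path.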

            Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/13649
            Subject: LU-6207 osd: add osd_ost_fini in osd_obj_map_init
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 27f3a9c404d7268a57e6c5153f6c07c104eb19d9

            green Oleg Drokin added a comment -

            Looking through the logs we can see that OSS crashed with some sort of a bad dentry:

            19:22:19:BUG: Dentry ffff880071931b40{i=6243,n=O} still in use (1) [unmount of ldiskfs dm-0]
            19:22:19:------------[ cut here ]------------
            19:22:19:kernel BUG at fs/dcache.c:667!
            19:22:19:invalid opcode: 0000 [#1] SMP 
            19:22:19:last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/virtio0/block/vda/queue/scheduler
            19:22:19:CPU 1 
            19:22:19:Modules linked in: lod(U) mdt(U) mdd(U) mgs(U) obdecho(U) osc(U) ptlrpc_gss(U) osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
            19:22:19:
            19:22:19:Pid: 24277, comm: mount.lustre Not tainted 2.6.32-431.29.2.el6_lustre.g2cd44ad.x86_64 #1 Red Hat KVM
            19:22:19:RIP: 0010:[<ffffffff811a4358>]  [<ffffffff811a4358>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
            19:22:19:RSP: 0018:ffff88006f76d838  EFLAGS: 00010296
            19:22:19:RAX: 000000000000005a RBX: ffff880071931b40 RCX: 0000000000000000
            19:22:19:RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
            19:22:19:RBP: ffff88006f76d878 R08: ffffffff81c06900 R09: 0000000000000000
            19:22:19:R10: 0000000000000000 R11: 2820657375206e69 R12: 0000000000000000
            19:22:19:R13: ffffffff81a843c0 R14: ffff880071a7e790 R15: ffff880071931ba0
            19:22:19:FS:  00007f61ca76c7a0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
            19:22:19:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
            19:22:19:CR2: 00007f61ca777000 CR3: 000000007ad98000 CR4: 00000000000006e0
            19:22:19:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            19:22:19:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            19:22:19:Process mount.lustre (pid: 24277, threadinfo ffff88006f76c000, task ffff88007d6b0080)
            19:22:19:Stack:
            19:22:19: ffff88006ea06270 0000000000000000 ffff88006f76d878 ffff88006ea06000
            19:22:19:<d> ffffffffa0813300 ffffffff81c06500 ffff88006ea06000 ffff880059a21bd0
            19:22:19:<d> ffff88006f76d898 ffffffff811a4396 0000000000000286 ffff88006ea06000
            19:22:19:Call Trace:
            19:22:19: [<ffffffff811a4396>] shrink_dcache_for_umount+0x36/0x60
            19:22:19: [<ffffffff8118b5df>] generic_shutdown_super+0x1f/0xe0
            19:22:19: [<ffffffff8118b6d1>] kill_block_super+0x31/0x50
            19:22:19: [<ffffffff8118bea7>] deactivate_super+0x57/0x80
            19:22:19: [<ffffffff811ab8af>] mntput_no_expire+0xbf/0x110
            19:22:19: [<ffffffffa0682b6d>] osd_umount+0x5d/0x130 [osd_ldiskfs]
            19:22:19: [<ffffffffa0685dea>] osd_device_alloc+0x5aa/0x9d0 [osd_ldiskfs]
            19:22:19: [<ffffffffa179fe2f>] obd_setup+0x1bf/0x290 [obdclass]
            19:22:19: [<ffffffffa17a0108>] class_setup+0x208/0x870 [obdclass]
            19:22:19: [<ffffffffa17a797c>] class_process_config+0x113c/0x27c0 [obdclass]
            19:22:19: [<ffffffffa08491c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            19:22:19: [<ffffffffa0843818>] ? libcfs_log_return+0x28/0x40 [libcfs]
            19:22:19: [<ffffffffa17ae312>] do_lcfg+0x622/0xac0 [obdclass]
            19:22:19: [<ffffffffa17ae844>] lustre_start_simple+0x94/0x200 [obdclass]
            19:22:19: [<ffffffffa17e2d31>] server_fill_super+0xfd1/0x1690 [obdclass]
            19:22:19: [<ffffffffa0843818>] ? libcfs_log_return+0x28/0x40 [libcfs]
            19:22:19: [<ffffffffa17b4370>] lustre_fill_super+0x560/0xa80 [obdclass]
            19:22:19: [<ffffffffa17b3e10>] ? lustre_fill_super+0x0/0xa80 [obdclass]
            19:22:19: [<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
            19:22:19: [<ffffffffa17ab3c5>] lustre_get_sb+0x25/0x30 [obdclass]
            19:22:19: [<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
            19:22:19: [<ffffffff8118bd72>] do_kern_mount+0x52/0x130
            19:22:19: [<ffffffff811ad74b>] do_mount+0x2fb/0x930
            19:22:19: [<ffffffff811ade10>] sys_mount+0x90/0xe0
            19:22:19: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            19:22:19:Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 70 02 00 00 48 89 de 48 c7 c7 80 75 7c 81 48 89 04 24 31 c0 e8 4c 4d 38 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 
            19:22:19:RIP  [<ffffffff811a4358>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
            19:22:19: RSP <ffff88006f76d838>
            

            I wonder if this dentry is fallout from an incomplete/buggy LU-5729 fix, where we do free the inode but not the dentry?
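            The "still in use (1)" check that fired above can be illustrated with a toy userspace refcount model (hypothetical names, not kernel code): a lookup that bumps a dentry's reference count must be balanced by a `dput()` before unmount, which is the pairing the http://review.whamcloud.com/13648 patch restores.

            ```c
            #include <stdio.h>

            /* Toy model of a dentry reference count. In the kernel, every
             * lookup that returns a dentry bumps d_count, and the caller must
             * dput() it; shrink_dcache_for_umount_subtree() BUGs if d_count
             * is still non-zero at unmount. */
            struct toy_dentry { int d_count; };

            static struct toy_dentry *toy_lookup(struct toy_dentry *d)
            {
                d->d_count++;            /* like the dget() inside a lookup */
                return d;
            }

            static void toy_dput(struct toy_dentry *d)
            {
                d->d_count--;
            }

            /* Models the unmount-time check: 0 = clean, >0 = would hit the
             * "Dentry ... still in use" BUG. */
            static int toy_umount_check(struct toy_dentry *d)
            {
                return d->d_count;
            }

            int main(void)
            {
                struct toy_dentry d = { 0 };

                /* Buggy path: lookup without the matching dput. */
                toy_lookup(&d);
                printf("leaky d_count at umount: %d\n", toy_umount_check(&d));

                /* Fixed path: the dput the patch adds balances the lookup. */
                toy_dput(&d);
                printf("fixed d_count at umount: %d\n", toy_umount_check(&d));
                return 0;
            }
            ```

            The count of 1 in the leaky case mirrors the "(1)" in the BUG message from the log: exactly one forgotten release.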


            People

              Assignee: Yang Sheng
              Reporter: Maloo
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: