[LU-6207] conf-sanity test_83: test failed to respond and timed out Created: 04/Feb/15  Updated: 07/Jan/16  Resolved: 09/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-5729 VFS: Busy inodes after unmount of loo... Resolved
Severity: 3
Rank (Obsolete): 17363

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/4db5899a-ab42-11e4-b27f-5254006e85c2.

The sub-test test_83 failed with the following error:

test failed to respond and timed out

Please provide additional information about the failure here.

Info required for matching: conf-sanity 83



 Comments   
Comment by Oleg Drokin [ 04/Feb/15 ]

Looking through the logs we can see that OSS crashed with some sort of a bad dentry:

19:22:19:BUG: Dentry ffff880071931b40{i=6243,n=O} still in use (1) [unmount of ldiskfs dm-0]
19:22:19:------------[ cut here ]------------
19:22:19:kernel BUG at fs/dcache.c:667!
19:22:19:invalid opcode: 0000 [#1] SMP 
19:22:19:last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/virtio0/block/vda/queue/scheduler
19:22:19:CPU 1 
19:22:19:Modules linked in: lod(U) mdt(U) mdd(U) mgs(U) obdecho(U) osc(U) ptlrpc_gss(U) osp(U) ofd(U) lfsck(U) ost(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic jbd2 nfsd exportfs nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
19:22:19:
19:22:19:Pid: 24277, comm: mount.lustre Not tainted 2.6.32-431.29.2.el6_lustre.g2cd44ad.x86_64 #1 Red Hat KVM
19:22:19:RIP: 0010:[<ffffffff811a4358>]  [<ffffffff811a4358>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
19:22:19:RSP: 0018:ffff88006f76d838  EFLAGS: 00010296
19:22:19:RAX: 000000000000005a RBX: ffff880071931b40 RCX: 0000000000000000
19:22:19:RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
19:22:19:RBP: ffff88006f76d878 R08: ffffffff81c06900 R09: 0000000000000000
19:22:19:R10: 0000000000000000 R11: 2820657375206e69 R12: 0000000000000000
19:22:19:R13: ffffffff81a843c0 R14: ffff880071a7e790 R15: ffff880071931ba0
19:22:19:FS:  00007f61ca76c7a0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
19:22:19:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
19:22:19:CR2: 00007f61ca777000 CR3: 000000007ad98000 CR4: 00000000000006e0
19:22:19:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
19:22:19:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
19:22:19:Process mount.lustre (pid: 24277, threadinfo ffff88006f76c000, task ffff88007d6b0080)
19:22:19:Stack:
19:22:19: ffff88006ea06270 0000000000000000 ffff88006f76d878 ffff88006ea06000
19:22:19:<d> ffffffffa0813300 ffffffff81c06500 ffff88006ea06000 ffff880059a21bd0
19:22:19:<d> ffff88006f76d898 ffffffff811a4396 0000000000000286 ffff88006ea06000
19:22:19:Call Trace:
19:22:19: [<ffffffff811a4396>] shrink_dcache_for_umount+0x36/0x60
19:22:19: [<ffffffff8118b5df>] generic_shutdown_super+0x1f/0xe0
19:22:19: [<ffffffff8118b6d1>] kill_block_super+0x31/0x50
19:22:19: [<ffffffff8118bea7>] deactivate_super+0x57/0x80
19:22:19: [<ffffffff811ab8af>] mntput_no_expire+0xbf/0x110
19:22:19: [<ffffffffa0682b6d>] osd_umount+0x5d/0x130 [osd_ldiskfs]
19:22:19: [<ffffffffa0685dea>] osd_device_alloc+0x5aa/0x9d0 [osd_ldiskfs]
19:22:19: [<ffffffffa179fe2f>] obd_setup+0x1bf/0x290 [obdclass]
19:22:19: [<ffffffffa17a0108>] class_setup+0x208/0x870 [obdclass]
19:22:19: [<ffffffffa17a797c>] class_process_config+0x113c/0x27c0 [obdclass]
19:22:19: [<ffffffffa08491c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
19:22:19: [<ffffffffa0843818>] ? libcfs_log_return+0x28/0x40 [libcfs]
19:22:19: [<ffffffffa17ae312>] do_lcfg+0x622/0xac0 [obdclass]
19:22:19: [<ffffffffa17ae844>] lustre_start_simple+0x94/0x200 [obdclass]
19:22:19: [<ffffffffa17e2d31>] server_fill_super+0xfd1/0x1690 [obdclass]
19:22:19: [<ffffffffa0843818>] ? libcfs_log_return+0x28/0x40 [libcfs]
19:22:19: [<ffffffffa17b4370>] lustre_fill_super+0x560/0xa80 [obdclass]
19:22:19: [<ffffffffa17b3e10>] ? lustre_fill_super+0x0/0xa80 [obdclass]
19:22:19: [<ffffffff8118c56f>] get_sb_nodev+0x5f/0xa0
19:22:19: [<ffffffffa17ab3c5>] lustre_get_sb+0x25/0x30 [obdclass]
19:22:19: [<ffffffff8118bbcb>] vfs_kern_mount+0x7b/0x1b0
19:22:19: [<ffffffff8118bd72>] do_kern_mount+0x52/0x130
19:22:19: [<ffffffff811ad74b>] do_mount+0x2fb/0x930
19:22:19: [<ffffffff811ade10>] sys_mount+0x90/0xe0
19:22:19: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
19:22:19:Code: 50 30 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 70 02 00 00 48 89 de 48 c7 c7 80 75 7c 81 48 89 04 24 31 c0 e8 4c 4d 38 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 
19:22:19:RIP  [<ffffffff811a4358>] shrink_dcache_for_umount_subtree+0x2a8/0x2b0
19:22:19: RSP <ffff88006f76d838>

I wonder if this dentry is fallout from an incomplete/buggy LU-5729 fix where we free the inode but not the dentry?
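
For reference, the BUG fires when shrink_dcache_for_umount_subtree() finds a dentry whose reference count never dropped to zero. A minimal, hypothetical sketch of the leak pattern such a crash implies; the function name and error condition are illustrative, not the actual osd_ldiskfs code:

#include <linux/dcache.h>
#include <linux/namei.h>
#include <linux/string.h>
#include <linux/err.h>

/* Hypothetical sketch: lookup_one_len() returns a referenced dentry,
 * and an error path that skips the matching dput() leaves the dentry
 * pinned until umount trips the "Dentry still in use" BUG above. */
static int sketch_pin_subdir(struct dentry *parent, const char *name)
{
	struct dentry *d;

	d = lookup_one_len(name, parent, strlen(name));
	if (IS_ERR(d))
		return PTR_ERR(d);

	if (d->d_inode == NULL) {
		/* buggy error path: returning here without dput(d)
		 * leaves the "still in use (1)" reference behind */
		return -ENOENT;
	}

	/* ... use d ... */

	dput(d);	/* the reference must always be dropped */
	return 0;
}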

Comment by Gerrit Updater [ 04/Feb/15 ]

Sergey Cheremencev (sergey_cheremencev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/13649
Subject: LU-6207 osd: add osd_ost_fini in osd_obj_map_init
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 27f3a9c404d7268a57e6c5153f6c07c104eb19d9

Comment by Sergey Cheremencev [ 04/Feb/15 ]

Hi, Oleg

I don't think the LU-5729 fix itself is buggy. It seems there are several other places where cleanup is not done correctly.
We faced a similar problem after LU-5729 (MRP-2109) and prepared two patches.
http://review.whamcloud.com/13648
After the above patch we hit the issue again, and I found another place where a dput() is missing.
The 2nd patch:
http://review.whamcloud.com/13649
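
For context, here is a minimal sketch of the cleanup pattern the two patches apply, under the assumption (taken only from the patch subjects) that osd_ost_init() pins a dentry that osd_ost_fini() releases. All the *_sketch names and bodies below are illustrative stand-ins, not the real osd code:

#include <linux/dcache.h>

/* Illustrative stand-ins for the real osd structures and helpers;
 * only the shape of the fix comes from the patch subjects. */
struct sketch_dev {
	struct dentry *sd_ost_dentry;	/* pinned by ost_init_sketch() */
};

static int ost_init_sketch(struct sketch_dev *dev);	/* takes the ref */
static void ost_fini_sketch(struct sketch_dev *dev);	/* dput()s it */
static int compat_init_sketch(struct sketch_dev *dev);	/* a later step */

static int obj_map_init_sketch(struct sketch_dev *dev)
{
	int rc;

	rc = ost_init_sketch(dev);
	if (rc)
		return rc;

	rc = compat_init_sketch(dev);
	if (rc)
		/* the fix: if a later init step fails, undo the earlier
		 * one so no dentry reference survives to umount time */
		ost_fini_sketch(dev);

	return rc;
}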

Comment by Peter Jones [ 12/Jun/15 ]

Yang Sheng

Could you please take care of this patch?

Thanks

Peter

Comment by Gerrit Updater [ 19/Jun/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13648/
Subject: LU-6207 osd: add dput in osd_ost_init
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 51c90a076884a8cda2bcc655058be43c2c0dced2

Comment by Peter Jones [ 19/Jun/15 ]

Landed for 2.8

Comment by Sergey Cheremencev [ 19/Jun/15 ]

Hello Peter Jones.
I think this ticket should not be closed until the 2nd patch, http://review.whamcloud.com/13649, has landed.
I added a new test to conf-sanity in http://review.whamcloud.com/13648, but the new test (conf-sanity test_85) may cause a kernel panic in a similar place in osd, with the same symptoms.
So please make sure that http://review.whamcloud.com/13649 lands.
Thank you

Comment by Peter Jones [ 19/Jun/15 ]

Ah yes. Thanks for pointing that out.

Comment by Gerrit Updater [ 08/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13649/
Subject: LU-6207 osd: add osd_ost_fini in osd_obj_map_init
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 948794929c5724e6a78b2c470bf97bcea1a67555

Comment by Yang Sheng [ 09/Jul/15 ]

Patches landed. Closing ticket.
