[LU-4644] Test failure sanity test_161a: list_del corruption Created: 18/Feb/14 Updated: 18/Jul/17 Resolved: 17/Apr/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 12702 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>. This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/015e73f8-9870-11e3-968c-52540035b04c. The sub-test test_161a failed with the following error:
Info required for matching: sanity 161a

MDS2 Console log: (review-dne-part-1 on master)

16:31:57:Lustre: DEBUG MARKER: == sanity test 161a: link ea sanity == 16:31:56 (1392683516)
16:31:57:------------[ cut here ]------------
16:31:57:WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
16:31:57:Hardware name: KVM
16:31:57:list_del corruption. prev->next should be ffff88005f8fdb98, but was ffff88005f8fdbe0
16:31:57:Modules linked in: osp(U) mdd(U) lod(U) lfsck(U) mdt(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
16:31:57:Pid: 10866, comm: mdt00_002 Not tainted 2.6.32-431.3.1.el6_lustre.gc762f0f.x86_64 #1
16:31:57:Call Trace:
16:31:57: [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
16:31:57: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
16:31:57: [<ffffffff812948be>] ? list_del+0x6e/0xa0
16:31:57: [<ffffffff8109b6a1>] ? remove_wait_queue+0x31/0x50
16:31:57: [<ffffffffa05fa469>] ? lu_object_find_at+0xf9/0x360 [obdclass]
16:31:57: [<ffffffffa04a3cb2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
16:31:57: [<ffffffffa05fa6e6>] ? lu_object_find+0x16/0x20 [obdclass]
16:31:57: [<ffffffffa0d80ed6>] ? mdt_object_find+0x56/0x170 [mdt]
16:31:57: [<ffffffffa0d8e0f3>] ? mdt_get_info+0xee3/0x1a20 [mdt]
16:31:57: [<ffffffffa089346c>] ? tgt_request_handle+0x23c/0xac0 [ptlrpc]
16:31:57: [<ffffffffa084266a>] ? ptlrpc_main+0xd1a/0x1980 [ptlrpc]
16:31:57: [<ffffffffa0841950>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
16:31:57: [<ffffffff8109af06>] ? kthread+0x96/0xa0
16:31:57: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
16:31:57: [<ffffffff8109ae70>] ? kthread+0x0/0xa0
16:31:57: [<ffffffff8100c200>] ? child_rip+0x0/0x20
16:31:57:---[ end trace dd02610dd223ce47 ]---
16:31:57:------------[ cut here ]------------ |
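For readers unfamiliar with the "list_del corruption" warning in the console log: it is printed by a list debugging check that verifies an entry's neighbours still point back at it before the entry is unlinked. The following is a minimal userspace sketch in C of that kind of check, not the kernel source. The name `list_del_checked` and its return value are hypothetical and chosen only for illustration, and unlike the kernel this sketch skips the unlink when a check fails.

```c
#include <assert.h>
#include <stdio.h>

/* Circular doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make head an empty list that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry immediately after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Hypothetical helper: unlink entry only if both neighbours still point
 * back at it. Returns the number of inconsistencies found (0 = list sane).
 */
static int list_del_checked(struct list_head *entry)
{
	int bad = 0;

	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		bad++;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		bad++;
	}
	if (bad == 0) {
		/* Neighbours agree, so it is safe to unlink. */
		entry->next->prev = entry->prev;
		entry->prev->next = entry->next;
		entry->next = entry;
		entry->prev = entry;
	}
	return bad;
}
```

In this sketch, removing an entry whose `prev` and `next` pointers are stale (for example, because it was already unlinked by another code path and the list has since changed) trips the first check and prints a message of the same shape as the one in the log. Whether that is what happened in this ticket is not established by the trace alone.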
| Comments |
| Comment by Alex Zhuravlev [ 18/Feb/14 ] |
|
This is due to an extra lu_object_find() somewhere in mdt_get_info(); instead, it should pick that object directly from info, as the other handlers do. Please check with Mike Pershin for the details. |
| Comment by Jodi Levi (Inactive) [ 19/Feb/14 ] |
|
Mike, |
| Comment by Mikhail Pershin [ 11/Apr/14 ] |
|
The only possible source of the problem is mdt_path_current(), which may call mdt_object_find() twice for the same object. I need some time to confirm that. |
| Comment by Andreas Dilger [ 14/Nov/14 ] |
|
Mike, can you please update this ticket? It sounds like this could be fixed by a relatively simple patch. |
| Comment by Andreas Dilger [ 25/Nov/14 ] |
|
Mike, can you please update this ticket? |
| Comment by Mikhail Pershin [ 27/Nov/14 ] |
|
It seems that mdt_path()->mdt_path_current() may call mdt_object_find() twice. I'll prepare a patch. |
| Comment by Andreas Dilger [ 17/Apr/17 ] |
|
The referenced functions no longer exist, and this problem has not recurred in the past 3 years. |
| Comment by Oscar Santos [ 17/Jul/17 ] |
|
Andreas, my understanding is that mdt_get_info, mdt_path and mdt_path_current are still present in master. Can you clarify which referenced functions no longer exist, and as of what version? |
| Comment by Andreas Dilger [ 18/Jul/17 ] |
|
Oscar, I'm not sure of the context of my comment. I may have been confused because there were typos in the function names. In any case, this test has not failed in a long time, and this code has changed significantly since the bug was filed, so I don't think it is worthwhile to keep it open. |
| Comment by Alex Zhuravlev [ 18/Jul/17 ] |
|
multiple calls to lu_object_find() shouldn't be a problem since |