[LU-5285] mdt_reconstruct_setattr() calls mdt_attr_get_complex() without checking that object exists Created: 02/Jul/14 Updated: 12/Jan/15 Resolved: 11/Jul/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mdt, mn4 | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||||||
| Rank (Obsolete): | 14748 | ||||||||||||||||||||||||
| Description |
|
Running racer on 2 nodes with MDSCOUNT=1 and http://review.whamcloud.com/#/c/5936/ (add more operations to racer) I see this frequently. [ 803.128945] Lustre: lustre-MDT0000: Client 045a894d-d2a7-e744-df1f-a695740583d8 (at 19\ 2.168.122.108@tcp) reconnecting [ 803.134578] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) ASSERTION( ((o)->lo\ _header->loh_attr & LOHA_EXISTS) != 0 ) failed: [ 803.136583] LustreError: 5113:0:(lu_object.h:852:lu_object_attr()) LBUG [ 803.137789] Pid: 5113, comm: mdt00_005 [ 803.138439] [ 803.138440] Call Trace: [ 803.139129] [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 803.140296] [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs] [ 803.141337] [<ffffffffa0c034ac>] mdt_attr_get_complex+0xaec/0xb80 [mdt] [ 803.142457] [<ffffffffa0c20425>] mdt_reconstruct_setattr+0xd5/0x410 [mdt] [ 803.143631] [<ffffffffa04737e0>] ? lu_ucred+0x20/0x30 [obdclass] [ 803.144677] [<ffffffffa0c1fde5>] mdt_reconstruct+0x45/0x120 [mdt] [ 803.145722] [<ffffffffa0bfdbfa>] mdt_reint_internal+0x6fa/0x7c0 [mdt] [ 803.146820] [<ffffffffa0bfe24b>] mdt_reint+0x6b/0x120 [mdt] [ 803.147818] [<ffffffffa06e9c45>] tgt_request_handle+0x245/0xad0 [ptlrpc] [ 803.148977] [<ffffffffa069ce31>] ptlrpc_main+0xcf1/0x1880 [ptlrpc] [ 803.150053] [<ffffffffa069c140>] ? ptlrpc_main+0x0/0x1880 [ptlrpc] [ 803.151093] [<ffffffff8109eab6>] kthread+0x96/0xa0 [ 803.151886] [<ffffffff8100c30a>] child_rip+0xa/0x20 [ 803.152714] [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40 [ 803.153677] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 803.154572] [<ffffffff8109ea20>] ? kthread+0x0/0xa0 [ 803.155403] [<ffffffff8100c300>] ? child_rip+0x0/0x20 [ 803.156261] When I hit this I see the following sequence of events:
|
| Comments |
| Comment by Andreas Dilger [ 02/Jul/14 ] |
|
Looks like this was added in http://review.whamcloud.com/3782 |
| Comment by John Hammond [ 02/Jul/14 ] |
|
From the source mdt_reconstruct_create() has exactly the same issue. I haven't gotten a crash there but it does pass an object to mdt_attr_get_complex() without checking that it exists. mdt_reconstruct_open() looks OK. Probably all uses of mdt_attr_get_complex() should be audited... |
| Comment by Alex Zhuravlev [ 03/Jul/14 ] |
|
Andreas, 2.1 doesn't check that either. why do we need to return the attributes in the setattr path? we're in the reconstruct path and given the original transaction was committed, rep-ack lock was cancelled and another client was able to destroy the object. so we can't return valid attribues. with no LDLM lock the attributes can't be used either? |
| Comment by Andreas Dilger [ 04/Jul/14 ] |
|
I'm OK with not returning the attributes for setattr if the object is missing. The setattr itself must have been processed, so it shouldn't return an error to the caller (i.e. no -ENOENT), but it is OK if the attributes on the client inode are not updated, and it could even be purged from cache if there is a mechanism to do that (e.g. return only mbo_nlink=0 and mbo_flags=OBD_MD_FLNLINK to the client). |
| Comment by Alex Zhuravlev [ 09/Jul/14 ] |
| Comment by Peter Jones [ 11/Jul/14 ] |
|
Landed for 2.6 |