[LU-4958] do not crash accessing LOV object with FID {0, 0} Created: 25/Apr/14 Updated: 16/Dec/14 Resolved: 16/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.6, Lustre 2.5.1, Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB, mn4 | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 13718 |
| Description |
|
If an orphan object with stripe_index != 0 is linked to a recreated MDS inode by the code in http://review.whamcloud.com/7810, but not all of the objects are present (e.g. some stripes of that file were lost, but an orphan with a non-zero stripe_index remained), the client will crash when the file is accessed (e.g. "ls -l"):

LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) ASSERTION( name->name[0] != 0 ) failed:
LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) LBUG
Pid: 19393, comm: ls
Call Trace:
[<ffffffffa0ef9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0ef9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa07c4f20>] ldlm_resource_get+0x700/0x900 [ptlrpc]
[<ffffffffa07bf1b9>] ldlm_lock_create+0x59/0xcc0 [ptlrpc]
[<ffffffffa07d8314>] ldlm_cli_enqueue+0xa4/0x790 [ptlrpc]
[<ffffffffa09ebd44>] osc_enqueue_base+0x1e4/0x5b0 [osc]
[<ffffffffa0a082fd>] osc_lock_enqueue+0x1ed/0x8c0 [osc]
[<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
[<ffffffffa0a5d42a>] lov_lock_enqueue+0x22a/0x850 [lov]
[<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
[<ffffffffa105d0cf>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
[<ffffffffa105dd1e>] cl_lock_request+0x7e/0x270 [obdclass]
[<ffffffffa123dba0>] cl_glimpse_lock+0x180/0x490 [lustre]
[<ffffffffa123e415>] cl_glimpse_size0+0x1a5/0x1d0 [lustre]
[<ffffffffa11eb55d>] ll_inode_revalidate_it+0x1cd/0x660 [lustre]
[<ffffffffa11eba3a>] ll_getattr_it+0x4a/0x1b0 [lustre]
[<ffffffffa11ebbd7>] ll_getattr+0x37/0x40 [lustre]
[<ffffffff81186db1>] vfs_getattr+0x51/0x80
[<ffffffff81186e40>] vfs_fstatat+0x60/0x80
[<ffffffff81186ece>] vfs_lstat+0x1e/0x20
[<ffffffff81186ef4>] sys_newlstat+0x24/0x50

I'm not sure what the right way to handle this is, since this would affect all old clients trying to access files in .lustre/lost+found, so fixing just the 2.6 client is not enough.
Either we need to backport the fix to the 2.5.2, 2.4.3, and 2.1.7 clients (not ideal, since we cannot be sure whether a given client has the fix), or use some other lmm_magic or lmm_pattern so that unpatched clients will not understand the layout. In the second case (using a different lmm_magic or lmm_pattern, maybe LOV_PATTERN_F_SPARSE?), the lfsck_layout_extend_lovea() code would need to decide, as stripes are added, whether the layout is sparse (set the flag, so old clients cannot access it) or full (clear the flag, so old clients can access it). |
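The second option amounts to recomputing a flag every time LFSCK fills in another stripe slot. A minimal C sketch of that idea, assuming simplified stand-in types: `struct demo_fid`, `struct demo_layout`, `DEMO_PATTERN_F_SPARSE`, and the helper functions are all hypothetical and only model the proposal, not the real lfsck_layout_extend_lovea() code or the on-disk LOV EA format.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_STRIPES 8

/* Hypothetical stand-in for a Lustre FID: sequence, object id, version. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical flag bit modelled on the proposed LOV_PATTERN_F_SPARSE;
 * the value is arbitrary and chosen only for this sketch. */
#define DEMO_PATTERN_F_SPARSE 0x40000000u

/* Hypothetical, simplified in-memory layout (not the on-disk LOV EA). */
struct demo_layout {
	uint32_t pattern;                          /* pattern plus flag bits */
	uint16_t stripe_count;                     /* slots currently in use */
	struct demo_fid stripes[DEMO_MAX_STRIPES]; /* {0,0} marks a hole */
};

static bool demo_fid_is_zero(const struct demo_fid *fid)
{
	return fid->f_seq == 0 && fid->f_oid == 0;
}

/* Set the sparse flag while any slot is a hole; clear it once all are filled. */
static void demo_update_sparse_flag(struct demo_layout *lo)
{
	bool sparse = false;

	for (uint16_t i = 0; i < lo->stripe_count; i++) {
		if (demo_fid_is_zero(&lo->stripes[i])) {
			sparse = true;
			break;
		}
	}

	if (sparse)
		lo->pattern |= DEMO_PATTERN_F_SPARSE;
	else
		lo->pattern &= ~DEMO_PATTERN_F_SPARSE;
}

/* Hypothetical helper: place an orphan's FID at its stripe index, growing
 * the layout with zero-FID holes if needed, then refresh the flag. */
static int demo_add_stripe(struct demo_layout *lo, uint16_t index,
			   const struct demo_fid *fid)
{
	if (index >= DEMO_MAX_STRIPES || demo_fid_is_zero(fid))
		return -EINVAL;

	/* Any slots skipped over become explicit holes. */
	for (uint16_t i = lo->stripe_count; i < index; i++)
		lo->stripes[i] = (struct demo_fid){ 0 };
	if (index >= lo->stripe_count)
		lo->stripe_count = index + 1;

	lo->stripes[index] = *fid;
	demo_update_sparse_flag(lo);
	return 0;
}
```

Rescanning all slots on every insert keeps the flag correct even when orphans are found out of stripe order, at the cost of an O(stripe_count) pass per added stripe.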
| Comments |
| Comment by Andreas Dilger [ 25/Apr/14 ] |
|
This is similar to |
| Comment by Andreas Dilger [ 26/May/14 ] |
|
This is being fixed on master along with the LFSCK changes in http://review.whamcloud.com/10042 but there will need to be a separate patch for b2_4 and b2_5 that just fixes the client code. |
| Comment by nasf (Inactive) [ 26/May/14 ] |
|
Once the patch http://review.whamcloud.com/#/c/10042/ lands on master, when an old client talks to a Lustre 2.6 server, the MDS will return a failure directly to the old client if the LOV EA contains a hole, so the old client will NOT crash. Do we still need to fix the client-side code to handle files whose LOV EA contains a hole?
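The server-side behaviour described in this comment can be sketched as a check made before a layout is returned to a client. Everything in this C sketch is hypothetical: the capability bit `DEMO_CONNECT_LAYOUT_HOLES`, the helper names, and the choice of -EIO as the error are assumptions for illustration, not what the 10042 patch actually does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connect-flag bit meaning "this client understands layouts
 * with holes"; name and value are assumptions for this sketch only. */
#define DEMO_CONNECT_LAYOUT_HOLES 0x1ULL

/* Hypothetical stand-in for a Lustre FID: sequence, object id, version. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* True if any stripe slot holds the zero FID {0,0}, i.e. a hole. */
static bool demo_layout_has_hole(const struct demo_fid *stripes,
				 unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		if (stripes[i].f_seq == 0 && stripes[i].f_oid == 0)
			return true;
	return false;
}

/* Decide whether the MDS may send this layout to a client.
 * Returns 0 if it can be sent, or a negative errno if the client would not
 * understand the hole (-EIO is an assumed choice of error). */
static int demo_check_layout_for_client(uint64_t client_flags,
					const struct demo_fid *stripes,
					unsigned int count)
{
	if (!demo_layout_has_hole(stripes, count))
		return 0;
	if (client_flags & DEMO_CONNECT_LAYOUT_HOLES)
		return 0;
	return -EIO;
}
```

A client that advertises the capability receives the sparse layout unchanged; an older one gets an error instead of a layout it would mishandle.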
| Comment by Andreas Dilger [ 27/May/14 ] |
|
Yes, this bug should still be fixed on the older clients, because there may still be a corrupt LOV EA with FID {0,0} even if it was not created by LFSCK. Clients should not crash because of data received from the network.
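On the client side, "should not crash" means treating a zero FID from the network as bad input rather than as a violated invariant. A hedged C sketch, assuming simplified types: `demo_build_res_name()` and `demo_count_lockable_stripes()` are hypothetical, the placement of the FID sequence in name[0] is an assumption modelled on the assertion in the trace above, and the sketch does not claim to reproduce the change in http://review.whamcloud.com/12740.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a Lustre FID: sequence, object id, version. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical stand-in for an LDLM resource name: four 64-bit words. */
struct demo_res_id {
	uint64_t name[4];
};

/* Build a resource name from a stripe FID. In this sketch name[0] holds
 * the FID sequence, so a zero sequence (which includes FID {0,0}) would
 * produce the name[0] == 0 that trips the assertion. Instead of asserting,
 * such a FID is rejected with an error. */
static int demo_build_res_name(const struct demo_fid *fid,
			       struct demo_res_id *res)
{
	if (fid->f_seq == 0)
		return -EIO;
	res->name[0] = fid->f_seq;
	res->name[1] = ((uint64_t)fid->f_ver << 32) | fid->f_oid;
	res->name[2] = 0;
	res->name[3] = 0;
	return 0;
}

/* Glimpse-style walk over all stripes: holes are skipped rather than
 * enqueued. Returns how many stripes would have had a lock enqueued. */
static int demo_count_lockable_stripes(const struct demo_fid *stripes,
				       unsigned int count)
{
	struct demo_res_id res;
	int n = 0;

	for (unsigned int i = 0; i < count; i++) {
		if (demo_build_res_name(&stripes[i], &res) != 0)
			continue; /* hole: nothing to lock or glimpse */
		n++;
	}
	return n;
}
```

Skipping holes lets a glimpse report a size from the surviving stripes; failing the whole operation with an error would be the more conservative alternative.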
| Comment by Andreas Dilger [ 08/Nov/14 ] |
|
See if we can reassign this bug to someone else.
| Comment by Jodi Levi (Inactive) [ 10/Nov/14 ] |
|
YangSheng, |
| Comment by Gerrit Updater [ 16/Nov/14 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/12740 Project: fs/lustre-release |
| Comment by Gerrit Updater [ 16/Nov/14 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: http://review.whamcloud.com/12742 Project: fs/lustre-release |
| Comment by Gerrit Updater [ 09/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12740/ Project: fs/lustre-release |
| Comment by Yang Sheng [ 10/Dec/14 ] |
|
Does this patch need to be ported to b2_4 and b2_1?
| Comment by Andreas Dilger [ 12/Dec/14 ] |
|
Probably no need to port this to those older branches. |
| Comment by Peter Jones [ 16/Dec/14 ] |
|
Landed for 2.5.4 and 2.7 |