LFSCK phase II technical debts
(LU-4701)
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Andreas Dilger | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Rank (Obsolete): | 12832 | ||||||||
| Description |
|
If an orphan object with stripe_index != 0 is linked to a recreated MDS inode in http://review.whamcloud.com/7810, but not all of the objects are present (e.g. some of the stripes of that file were lost, but a non-zero stripe_index orphan remained) the client will crash if the file is accessed (e.g. "ls -l"):

LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) ASSERTION( name->name[0] != 0 ) failed:
LustreError: 19393:0:(ldlm_resource.c:1077:ldlm_resource_get()) LBUG
Pid: 19393, comm: ls
Call Trace:
[<ffffffffa0ef9895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0ef9e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa07c4f20>] ldlm_resource_get+0x700/0x900 [ptlrpc]
[<ffffffffa07bf1b9>] ldlm_lock_create+0x59/0xcc0 [ptlrpc]
[<ffffffffa07d8314>] ldlm_cli_enqueue+0xa4/0x790 [ptlrpc]
[<ffffffffa09ebd44>] osc_enqueue_base+0x1e4/0x5b0 [osc]
[<ffffffffa0a082fd>] osc_lock_enqueue+0x1ed/0x8c0 [osc]
[<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
[<ffffffffa0a5d42a>] lov_lock_enqueue+0x22a/0x850 [lov]
[<ffffffffa105be7c>] cl_enqueue_try+0xfc/0x300 [obdclass]
[<ffffffffa105d0cf>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
[<ffffffffa105dd1e>] cl_lock_request+0x7e/0x270 [obdclass]
[<ffffffffa123dba0>] cl_glimpse_lock+0x180/0x490 [lustre]
[<ffffffffa123e415>] cl_glimpse_size0+0x1a5/0x1d0 [lustre]
[<ffffffffa11eb55d>] ll_inode_revalidate_it+0x1cd/0x660 [lustre]
[<ffffffffa11eba3a>] ll_getattr_it+0x4a/0x1b0 [lustre]
[<ffffffffa11ebbd7>] ll_getattr+0x37/0x40 [lustre]
[<ffffffff81186db1>] vfs_getattr+0x51/0x80
[<ffffffff81186e40>] vfs_fstatat+0x60/0x80
[<ffffffff81186ece>] vfs_lstat+0x1e/0x20
[<ffffffff81186ef4>] sys_newlstat+0x24/0x50

I'm not sure what the right way to handle this is, since this would affect all old clients trying to access files in .lustre/lost+found so fixing just the 2.6 client is not enough.
Either we need to backport the fix to 2.5.2 and 2.4.3 and 2.1.7 clients (not very good, since we aren't sure if the client has the fix), or use some other lmm_magic or lmm_pattern to ensure that unpatched clients will not understand it. In the second case (using a different lmm_magic or lmm_pattern, maybe LOV_PATTERN_F_SPARSE?) the lfsck_layout_extend_lovea() code would need to decide as stripes are added if the layout is sparse (set the flag, old clients cannot access) or if it is full (clear the flag, old clients can access). |
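The flag decision described for lfsck_layout_extend_lovea() could look roughly like the sketch below. This is only an illustration under stated assumptions: struct demo_layout, the demo_* functions, the DEMO_PATTERN_* values and the 16-slot limit are all hypothetical and are not the Lustre on-disk format or API. It only shows the idea of re-evaluating a "sparse" flag each time a stripe is added.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern bits for illustration; not the Lustre definitions. */
#define DEMO_PATTERN_RAID0	0x00000001u
#define DEMO_PATTERN_F_SPARSE	0x40000000u

/* Hypothetical, simplified layout: one slot per stripe. */
struct demo_layout {
	uint32_t pattern;	/* base pattern plus flag bits */
	size_t	 stripe_count;	/* number of slots currently in the layout */
	bool	 present[16];	/* present[i]: stripe i has an object */
};

/* Return true if any slot in [0, stripe_count) has no object. */
static bool demo_layout_has_hole(const struct demo_layout *lo)
{
	for (size_t i = 0; i < lo->stripe_count; i++)
		if (!lo->present[i])
			return true;
	return false;
}

/*
 * Record that the object for stripe idx was found. The layout is extended
 * with empty slots if idx is beyond the current stripe count, then the
 * sparse flag is set (holes remain) or cleared (layout is full).
 * Returns 0 on success, -1 if idx does not fit in this demo structure.
 */
static int demo_layout_add_stripe(struct demo_layout *lo, size_t idx)
{
	if (idx >= sizeof(lo->present) / sizeof(lo->present[0]))
		return -1;

	/* Any slot created by the extension starts out as a hole. */
	while (lo->stripe_count <= idx)
		lo->present[lo->stripe_count++] = false;
	lo->present[idx] = true;

	if (demo_layout_has_hole(lo))
		lo->pattern |= DEMO_PATTERN_F_SPARSE;
	else
		lo->pattern &= ~DEMO_PATTERN_F_SPARSE;
	return 0;
}
```

With this shape, adding stripe 2 to an empty layout leaves the flag set until stripes 0 and 1 are also found, at which point the flag is cleared and the pattern is again one that an unpatched client would understand.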
| Comments |
| Comment by Andreas Dilger [ 26/Feb/14 ] |
|
To be clear, I don't think that the LASSERT() should be removed, since object {0,0} should never be accessed. Rather, the LOV code should skip such objects entirely and return -EIO. Please include a test case that creates such a file and runs a number of different operations on it (stat, read, write, touch, chown, unlink) to make sure the different paths are covered. |
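A minimal sketch of the "skip and return -EIO" behaviour suggested above, assuming a simplified stripe array. The names struct demo_obj_id, demo_for_each_stripe() and demo_count_op() are hypothetical and are not the actual LOV code; the sketch only shows a per-stripe loop that never hands a {0,0} object to the lower layer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object identifier; not the Lustre ost_id. */
struct demo_obj_id {
	uint64_t seq;
	uint64_t id;
};

/* A {0,0} identifier marks a layout slot with no object behind it. */
static int demo_obj_id_is_hole(const struct demo_obj_id *oid)
{
	return oid->seq == 0 && oid->id == 0;
}

/*
 * Call op on every stripe that has a real object. Holes are skipped so
 * that op never sees a {0,0} identifier. Returns the first error seen:
 * -EIO for a skipped hole, or whatever non-zero value op returned.
 */
static int demo_for_each_stripe(const struct demo_obj_id *stripes, size_t count,
				int (*op)(const struct demo_obj_id *))
{
	int rc = 0;

	for (size_t i = 0; i < count; i++) {
		int rc2;

		if (demo_obj_id_is_hole(&stripes[i]))
			rc2 = -EIO;	/* skip the hole, remember the error */
		else
			rc2 = op(&stripes[i]);
		if (rc == 0)
			rc = rc2;
	}
	return rc;
}

/* Example per-stripe operation: count how many objects were visited. */
static int demo_visited;

static int demo_count_op(const struct demo_obj_id *oid)
{
	(void)oid;
	demo_visited++;
	return 0;
}
```

For a three-stripe file whose middle slot is {0,0}, the loop visits the two real objects and reports -EIO to the caller, while the assertion on a zero resource name is never reached for the empty slot.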
| Comment by nasf (Inactive) [ 22/Apr/14 ] |
|
Here is the patch: |
| Comment by nasf (Inactive) [ 22/Apr/14 ] |
|
When the layout LFSCK repairs an orphan OST-object, if the parent
Some of the LOV EA holes may not be re-filled in the end because
So we will make the client aware that the LOV EA is incomplete.
For a new client, it recognizes the pattern flag LOV_PATTERN_F_HOLE,
1) getattr/getxattr operations are permitted, such as stat/ls -l, and
2) A normal read of a file with a LOV EA hole is not permitted, to avoid
3) If the modification only changes MDS-side metadata, such as chmod,
4) unlink/rm of a file which has LOV EA holes is permitted.
5) For other modifications, if the modification will change something
For an old client, since it will not recognize the new pattern flag |
| Comment by James Nunez (Inactive) [ 22/Apr/14 ] |
|
I've encountered this client crash while trying to track down what files were causing LFSCK to fail during phase 1 of the scan. |
| Comment by Andreas Dilger [ 25/Apr/14 ] |
|
To be clear, while http://review.whamcloud.com/10042 is fixing the problem of LFSCK creating layouts with holes in them, there is still the separate bug ( |
| Comment by nasf (Inactive) [ 29/May/14 ] |
|
The master patch has landed. The patches for old clients will be done under