[LU-4363] (llite_lib.c:1683:ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed Created: 09/Dec/13 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastien Buisson (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | mn1 | ||
| Severity: | 3 |
| Rank (Obsolete): | 11946 |
| Description |
|
Hi,

At IFERC customer site, 7 compute nodes crashed with the following message in the console:

2013-11-21 00:57:45 LustreError: 92325:0:(llite_lib.c:1683:ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID [0x217294ce4:0x107f0:0x0] to the [0x217294ce4:0x107f1:0x0], inode 150634522759727089/35072332(ffff8807dcbf85f8)
2013-11-21 00:57:45 LustreError: 92325:0:(llite_lib.c:1683:ll_update_inode()) LBUG
2013-11-21 00:57:45 Pid: 92325, comm: writer_v131
2013-11-21 00:57:45
2013-11-21 00:57:45 Call Trace:
2013-11-21 00:57:45 [<ffffffffa046f7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-11-21 00:57:45 [<ffffffffa046fe07>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-11-21 00:57:45 [<ffffffffa0a91ca0>] ll_update_inode+0x4a0/0xf60 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a928ea>] ll_prep_inode+0x18a/0xae0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a7c8c3>] ll_intent_file_open+0x563/0xb80 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa6a90>] ? ll_md_blocking_ast+0x0/0x700 [lustre]
2013-11-21 00:57:45 [<ffffffff8108163e>] ? down+0x2e/0x50
2013-11-21 00:57:45 [<ffffffffa0a7cf67>] ll_lov_setstripe_ea_info+0x87/0x2b0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a831a5>] ll_lov_setstripe+0x85/0x5a0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa3e8b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a84ac6>] ll_file_ioctl+0x826/0xe00 [lustre]
2013-11-21 00:57:45 [<ffffffff81179ff2>] vfs_ioctl+0x22/0xa0
2013-11-21 00:57:45 [<ffffffff8117a4ba>] do_vfs_ioctl+0x3aa/0x580
2013-11-21 00:57:45 [<ffffffff8117a711>] sys_ioctl+0x81/0xa0
2013-11-21 00:57:45 [<ffffffff8149970e>] ? do_device_not_available+0xe/0x10
2013-11-21 00:57:45 [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
2013-11-21 00:57:45
2013-11-21 00:57:45 Kernel panic - not syncing: LBUG
2013-11-21 00:57:45 Pid: 92325, comm: writer_v131 Tainted: G W --------------- 2.6.32-279.5.2.bl6.Bull.36.x86_64 #1
2013-11-21 00:57:45 Call Trace:
2013-11-21 00:57:45 [<ffffffff81495fe3>] ? panic+0xa0/0x168
2013-11-21 00:57:45 [<ffffffffa046fe5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2013-11-21 00:57:45 [<ffffffffa0a91ca0>] ? ll_update_inode+0x4a0/0xf60 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a928ea>] ? ll_prep_inode+0x18a/0xae0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a7c8c3>] ? ll_intent_file_open+0x563/0xb80 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa6a90>] ? ll_md_blocking_ast+0x0/0x700 [lustre]
2013-11-21 00:57:45 [<ffffffff8108163e>] ? down+0x2e/0x50
2013-11-21 00:57:45 [<ffffffffa0a7cf67>] ? ll_lov_setstripe_ea_info+0x87/0x2b0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a831a5>] ? ll_lov_setstripe+0x85/0x5a0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa3e8b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a84ac6>] ? ll_file_ioctl+0x826/0xe00 [lustre]
2013-11-21 00:57:45 [<ffffffff81179ff2>] ? vfs_ioctl+0x22/0xa0
2013-11-21 00:57:45 [<ffffffff8117a4ba>] ? do_vfs_ioctl+0x3aa/0x580
2013-11-21 00:57:45 [<ffffffff8117a711>] ? sys_ioctl+0x81/0xa0
2013-11-21 00:57:45 [<ffffffff8149970e>] ? do_device_not_available+0xe/0x10
2013-11-21 00:57:45 [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b

This issue looks like [...]. I have tested with the following reproducer, given in llmount.sh:
cd /mnt/lustre
touch file1
In a single process do:
struct lov_user_md_v3 *lum;
/* Initialize lum */
fd2 = open("file2", O_RDWR|O_CREAT|O_LOV_DELAY_CREATE, 0666);
rename("file1", "file2");
ioctl(fd2, LL_IOC_LOV_SETSTRIPE, lum);
With a stock 2.1.6 I can easily reproduce the issue. And unfortunately, with patch at http://review.whamcloud.com/6775 I am still able to hit the bug. Thanks, |
| Comments |
| Comment by Lai Siyao [ 09/Dec/13 ] |
|
http://review.whamcloud.com/#/c/7476/ should be able to fix this, but this patch is for the master code, and it depends on patches that are not in 2.1. |
| Comment by Peter Jones [ 09/Dec/13 ] |
|
Lai, would this be easier to port to b2_4? Sebastien, if the answer to the above is yes, would you consider deploying a 2.4.x release at IFERC? Peter |
| Comment by Sebastien Buisson (Inactive) [ 09/Dec/13 ] |
|
Peter, the problem is that the upgrade to 2.4 at IFERC is planned for Q4 2014. |
| Comment by Peter Jones [ 10/Dec/13 ] |
|
ok Sebastien. We are looking into options that would work for b2_1 |
| Comment by Lai Siyao [ 10/Dec/13 ] |
|
Yes, Sebastien, I'm looking for a simpler way to handle only this open-by-fid case. I'm still testing and will commit the patch tomorrow. |
| Comment by Lai Siyao [ 10/Dec/13 ] |
|
Hi Sebastien, I just committed a patch http://review.whamcloud.com/#/c/8529/, you can apply it plus http://review.whamcloud.com/#/c/7476/ to make the test pass. However as is noted by John in |
| Comment by Sebastien Buisson (Inactive) [ 11/Dec/13 ] |
|
Hi, Patch http://review.whamcloud.com/8529 can be applied on Lustre 2.1.6, but http://review.whamcloud.com/7476 cannot because it is a master version (more than 20 hunks failed when trying on 2.1). Sebastien. |
| Comment by Lai Siyao [ 12/Dec/13 ] |
|
Sorry Sebastien, I posted the wrong patch; it should be http://review.whamcloud.com/6775 + http://review.whamcloud.com/#/c/8529/. You hit that assertion because the MDS_OPEN_BY_FID flag is not in the 2.1 code, so the open tends to be done by name on the MDS. When a rename happens, the new file with a different FID is therefore opened, and this triggers the assertion on FID change on the client. 2.4 has this flag, and patch http://review.whamcloud.com/#/c/8529/ backports it to 2.1, so the assertion will not be hit any more. |
| Comment by Sebastien Buisson (Inactive) [ 12/Dec/13 ] |
|
Hi, Thank you very much for the explanations! Now with http://review.whamcloud.com/6775 + http://review.whamcloud.com/8529 I am not able to hit the assertion anymore. One more question: could you re-explain the drawback you identified with this solution? It was related to setstripe returning -ENOENT, but I did not get your point. Thanks, |
| Comment by Lai Siyao [ 12/Dec/13 ] |
ioctl(fd2, LL_IOC_LOV_SETSTRIPE, lum);
As you can see, setstripe is done via ioctl on an open file handle, but in the code setstripe is implemented as an open (so it is actually a re-open). It looks like this should succeed, but the current MDS code does not allow re-opening, or creating OST objects for, an unlinked file. However, since there is no POSIX standard for the setstripe call, this can be regarded as normal behavior, but it should be documented somewhere IMO. |
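The behaviour discussed in the comments above (re-open by name hitting a different FID after the rename, versus re-open by FID failing with -ENOENT on the unlinked file) can be illustrated with a toy model. This is only a sketch in plain C and is not Lustre code: every toy_* name, the in-memory table standing in for the MDS namespace, and the TOY_FID_MISMATCH return value are hypothetical and exist purely for illustration. It assumes, as described in this ticket, that the rename leaves the old "file2" object open but unlinked, and that the MDS refuses to re-open an unlinked object.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_OBJ 8
#define TOY_FID_MISMATCH 1      /* hypothetical: stands for the client-side assertion */

/* Toy FID with only the two fields this sketch needs. */
struct toy_fid { unsigned long long seq; unsigned int oid; };

struct toy_obj {
    char name[32];              /* empty string once the object is unlinked */
    struct toy_fid fid;
};

/* Toy stand-in for the MDS namespace: a flat table of objects. */
struct toy_mds { struct toy_obj obj[TOY_MAX_OBJ]; int count; };

static bool toy_fid_eq(const struct toy_fid *a, const struct toy_fid *b)
{
    return a->seq == b->seq && a->oid == b->oid;
}

static struct toy_obj *toy_lookup_name(struct toy_mds *mds, const char *name)
{
    for (int i = 0; i < mds->count; i++)
        if (mds->obj[i].name[0] != '\0' && strcmp(mds->obj[i].name, name) == 0)
            return &mds->obj[i];
    return NULL;
}

static struct toy_obj *toy_lookup_fid(struct toy_mds *mds, const struct toy_fid *fid)
{
    for (int i = 0; i < mds->count; i++)
        if (toy_fid_eq(&mds->obj[i].fid, fid))
            return &mds->obj[i];
    return NULL;
}

/* Create a named object and hand back its (made-up) FID. */
static struct toy_fid toy_create(struct toy_mds *mds, const char *name)
{
    assert(mds->count < TOY_MAX_OBJ);
    struct toy_obj *o = &mds->obj[mds->count];
    snprintf(o->name, sizeof(o->name), "%s", name);
    o->fid.seq = 0x1ULL;
    o->fid.oid = (unsigned int)(mds->count + 1);
    mds->count++;
    return o->fid;
}

/* rename(from, to): an existing target loses its name but keeps its FID. */
static int toy_rename(struct toy_mds *mds, const char *from, const char *to)
{
    struct toy_obj *src = toy_lookup_name(mds, from);
    struct toy_obj *dst = toy_lookup_name(mds, to);
    if (src == NULL)
        return -ENOENT;
    if (dst != NULL)
        dst->name[0] = '\0';
    snprintf(src->name, sizeof(src->name), "%s", to);
    return 0;
}

/* Re-open as done for setstripe: by name, or by the FID the client cached. */
static int toy_reopen(struct toy_mds *mds, const char *name,
                      const struct toy_fid *cached, bool open_by_fid)
{
    struct toy_obj *o;
    if (open_by_fid) {
        o = toy_lookup_fid(mds, cached);
        if (o == NULL || o->name[0] == '\0')
            return -ENOENT;     /* assumed: no re-open of an unlinked object */
    } else {
        o = toy_lookup_name(mds, name);
        if (o == NULL)
            return -ENOENT;
    }
    return toy_fid_eq(&o->fid, cached) ? 0 : TOY_FID_MISMATCH;
}

/* Same sequence as the reproducer: create both files, rename, then re-open. */
static int toy_run_reproducer(bool open_by_fid)
{
    struct toy_mds mds = { .count = 0 };
    toy_create(&mds, "file1");
    struct toy_fid fd2_fid = toy_create(&mds, "file2");
    toy_rename(&mds, "file1", "file2");
    return toy_reopen(&mds, "file2", &fd2_fid, open_by_fid);
}
```

In this model, toy_run_reproducer(false) returns TOY_FID_MISMATCH, which corresponds to the assertion in ll_update_inode(), while toy_run_reproducer(true) returns -ENOENT, which corresponds to the setstripe drawback mentioned for the open-by-FID approach.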