[LU-2523] ll_update_inode() ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID | Created: 24/Dec/12 | Updated: 26/Jan/16 | Resolved: 13/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB, mn1 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 5940 |
| Description |
|
After landing http://review.whamcloud.com/#change,4478 I started to get racer crashes in lfs setstripe:

[63022.322673] LustreError: 20985:0:(llite_lib.c:1765:ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID [0x200000401:0x83b3:0x0] to the [0x200000400:0x8063:0x0], inode 144115205255757923/33554436(ffff88007ab86b08)
[63022.323770] LustreError: 20985:0:(llite_lib.c:1765:ll_update_inode()) LBUG
[63022.324067] Pid: 20985, comm: lfs
[63022.324320]
[63022.324321] Call Trace:
[63022.324880] [<ffffffffa041d915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[63022.325186] [<ffffffffa041df27>] lbug_with_loc+0x47/0xb0 [libcfs]
[63022.325481] [<ffffffffa0e1feb1>] ll_update_inode+0xe11/0xe50 [lustre]
[63022.325778] [<ffffffffa0e20072>] ll_prep_inode+0x182/0xc00 [lustre]
[63022.326071] [<ffffffffa0e0f503>] ll_intent_file_open+0x563/0x840 [lustre]
[63022.326372] [<ffffffffa0e36850>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
[63022.326667] [<ffffffffa0e0f868>] ll_lov_setstripe_ea_info+0x88/0x2d0 [lustre]
[63022.327180] [<ffffffffa0e12252>] ll_lov_setstripe+0x92/0x5a0 [lustre]
[63022.327470] [<ffffffffa0e13fae>] ll_file_ioctl+0xc9e/0x1230 [lustre]
[63022.327773] [<ffffffff81043444>] ? __do_page_fault+0x204/0x490
[63022.328041] [<ffffffff8118e112>] vfs_ioctl+0x22/0xa0
[63022.328297] [<ffffffff81188225>] ? putname+0x35/0x50
[63022.328563] [<ffffffff8118ea9e>] do_vfs_ioctl+0x3ee/0x5e0
[63022.328823] [<ffffffff8118ed11>] sys_ioctl+0x81/0xa0
[63022.329081] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[63022.329349]
[63022.338857] Kernel panic - not syncing: LBUG

crashdump is in /exports/crashdumps/192.168.10.218-2012-12-24-01:46:51/ |
| Comments |
| Comment by Andreas Dilger [ 24/Dec/12 ] |
|
Haven't looked into the details yet, but it might be that the LOV EA from userspace is used directly to set the striping on the inode, and there is a race between getting the layout and setting it, so the user data carries an old FID. Probably not too hard to fix once the details are worked out. |
| Comment by Bruno Faccini (Inactive) [ 24/Dec/12 ] |
|
I wanted to have a look, but I don't know which system hosts the crash-dump repository? |
| Comment by Oleg Drokin [ 28/Dec/12 ] |
|
Bruno: this is on my home testbox, ping me on skype for login details |
| Comment by Bruno Faccini (Inactive) [ 03/Jan/13 ] |
|
Assigning to me. Thanks Oleg, I will ping you on Skype soon. |
| Comment by Bruno Faccini (Inactive) [ 04/Jan/13 ] |
|
Ok, thanks Oleg, I am now connected to your testbox and have access to the crash-dump. But since I did not find the associated kernels, I tried using the one from the latest Maloo/Jenkins build (#11433) for this Jira, and the crash tool reported that the "vmcore" was not an SMP one. Could this be the cause of the crashes, since as far as I know non-SMP (UP) kernels intentionally omit some locking/protection code, or is it more likely a crash-tool quirk? To confirm this, or reproduce the crash, I will provision a node with the build for this JIRA and run racer on it. Just to be sure we are speaking about the same test: is it the one that installs in /usr/lib64/lustre/tests/racer*? Last, and you may not do this automatically because of the space required, but since the vmlinux/namelist file is not saved in the same place, how can we retrieve or rebuild it? |
| Comment by Bruno Faccini (Inactive) [ 04/Jan/13 ] |
|
Ok, thanks again Oleg. After you pointed me to the right vmlinux, crash is happy and no longer reports the vmcore as non-SMP; I guess it would have been too easy! So, diving into the crash-dump now ... |
| Comment by Andreas Dilger [ 17/Jan/13 ] |
|
Bruno, any update on this? |
| Comment by Bruno Faccini (Inactive) [ 18/Jan/13 ] |
|
Sorry Andreas, you are right, I am late on this. I finally got some time to concentrate and dig into this crash-dump. I first started from the upper end of the panic stack, where I found the assert condition was fortunately still visible. Working from the lower end, I found that the only user data fetched for the ioctl/setstripe looks like:

crash> p/x *(struct lov_user_md_v1 *)0xffff880042c35d78
$29 = {
lmm_magic = 0xbd10bd0,
lmm_pattern = 0x0,
lmm_object_id = 0x0,
lmm_object_seq = 0x0,
lmm_stripe_size = 0x0,
lmm_stripe_count = 0x1,
u = {
lum_stripe_offset = 0xffff,
lum_layout_gen = 0xffff
},
lmm_objects = 0xffff880042c35d98
}
So the problem seems to come from an outdated FID in the client/local inode versus the one coming from the MDT (note that lmm_magic 0x0bd10bd0 above is LOV_USER_MAGIC_V1, and lmm_object_id/lmm_object_seq are zero) ...

crash> bt
PID: 20985 TASK: ffff88002ea52240 CPU: 3 COMMAND: "lfs"
#0 [ffff880042c35920] machine_kexec at ffffffff8103201b
#1 [ffff880042c35980] crash_kexec at ffffffff810b8ba2
#2 [ffff880042c35a50] panic at ffffffff814f75eb
#3 [ffff880042c35ad0] lbug_with_loc at ffffffffa041df7b [libcfs]
#4 [ffff880042c35af0] ll_update_inode at ffffffffa0e1feb1 [lustre]
#5 [ffff880042c35b90] ll_prep_inode at ffffffffa0e20072 [lustre]
#6 [ffff880042c35c50] ll_intent_file_open at ffffffffa0e0f503 [lustre]
#7 [ffff880042c35ce0] ll_lov_setstripe_ea_info at ffffffffa0e0f868 [lustre]
#8 [ffff880042c35d60] ll_lov_setstripe at ffffffffa0e12252 [lustre]
#9 [ffff880042c35df0] ll_file_ioctl at ffffffffa0e13fae [lustre]
#10 [ffff880042c35e60] vfs_ioctl at ffffffff8118e112
#11 [ffff880042c35ea0] do_vfs_ioctl at ffffffff8118ea9e
#12 [ffff880042c35f30] sys_ioctl at ffffffff8118ed11
#13 [ffff880042c35f80] system_call_fastpath at ffffffff8100b0f2
RIP: 00007fc341ecb257 RSP: 00007fff16f8b618 RFLAGS: 00010206
RAX: 0000000000000010 RBX: ffffffff8100b0f2 RCX: 0000000000000000
RDX: 00007fff16f8b6e0 RSI: 000000004008669a RDI: 0000000000000003
RBP: 0000000000000000 R8: 00000000ffffffff R9: 0000000000000001
R10: 00007fff16f8b3a0 R11: 0000000000000206 R12: 00007fff16f8b6e0
R13: 00007fff16f8d9fa R14: 0000000000000003 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
crash> files
PID: 20985 TASK: ffff88002ea52240 CPU: 3 COMMAND: "lfs"
ROOT: /    CWD: /home/green/git/lustre-release/lustre/tests/racer
FD FILE DENTRY INODE TYPE PATH
0 ffff880026128f08 ffff8800b8092f18 ffff8800b9f04cf0 CHR /dev/null
1 ffff88008524cf08 ffff88009d96bf18 ffff88004757acf0 FIFO
2 ffff8800305f1f08 ffff8800b8092f18 ffff8800b9f04cf0 CHR /dev/null
3 ffff880052a11f08 ffff88000650af18 ffff88007ab86b08 LNK /mnt/lustre2/racer/16
crash>
So could it be that we are not safe against an underlying file/inode change? Oleg, running with the "lfs setstripe" addition in the file_create.sh script of "racer", is it possible that we hit a collision with a concurrent file_symlink.sh? I know it was a while ago, but do you remember what kind of workload (multiple clients, multiple "racer" instances, multiple directories, ...) you ran at the time of the crash? |
| Comment by Bruno Faccini (Inactive) [ 18/Jan/13 ] |
|
Having looked at the related source code and more of the current structures in the crash-dump: is it me, or would the problem/race more likely have occurred somewhere between the md_intent_lock() and ll_prep_inode()/md_get_lustre_md() calls in ll_intent_file_open()? |
| Comment by Bruno Faccini (Inactive) [ 24/Jan/13 ] |
|
I have been able to reproduce this on Toro (the affected inode is again a symlink!), still after introducing the "lfs setstripe" addition in the file_create.sh script of "racer", but again the low level of Lustre debug tracing does not help in understanding the sequence of actions leading to the crash/LBUG. I am now trying to reproduce with full debug tracing enabled and all pages (including user/process address space) dumped. And it just crashed again! Waiting for the node to reboot so I can investigate the new crash data. |
| Comment by John Hammond [ 05/Feb/13 ] |
|
Here is a simplified reproducer:
llmount.sh
cd /mnt/lustre
touch file1
In a single process do:
struct lov_user_md_v3 *lum;
/* Initialize lum */
fd2 = open("file2", O_RDWR|O_CREAT|O_LOV_DELAY_CREATE, 0666);
rename("file1", "file2");
ioctl(fd2, LL_IOC_LOV_SETSTRIPE, lum);
|
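For convenience, here is a fleshed-out version of that reproducer as a single C program. This is a sketch under the assumption that O_LOV_DELAY_CREATE, LL_IOC_LOV_SETSTRIPE, and LOV_USER_MAGIC_V3 come from <lustre/lustre_user.h> and that a Lustre client is mounted at /mnt/lustre; the stripe parameters are arbitrary.

/* Sketch of the reproducer above; error handling is minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <lustre/lustre_user.h>

int main(void)
{
        struct lov_user_md_v3 lum;
        int fd2;

        if (chdir("/mnt/lustre") < 0)
                return 1;
        close(open("file1", O_CREAT, 0666));            /* touch file1 */

        memset(&lum, 0, sizeof(lum));
        lum.lmm_magic = LOV_USER_MAGIC_V3;
        lum.lmm_stripe_count = 1;                       /* arbitrary layout */

        /* Open file2 with layout creation delayed ... */
        fd2 = open("file2", O_RDWR | O_CREAT | O_LOV_DELAY_CREATE, 0666);
        /* ... unlink fd2's inode by renaming file1 over file2 ... */
        rename("file1", "file2");
        /* ... then set the layout: the client re-opens "file2" by path,
         * now resolving to a different FID, and an unpatched client hits
         * the ll_update_inode() assertion. */
        ioctl(fd2, LL_IOC_LOV_SETSTRIPE, &lum);
        return 0;
}

The key ingredient is that fd2 still refers to the original (now unlinked) inode, while the setstripe path re-opens "file2" by name and gets the renamed file's FID.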
| Comment by John Hammond [ 05/Feb/13 ] |
|
If I remove the mdd_is_dead_obj() checks from mdd_open_sanity_check() and mdd_cd_sanity_check() then this goes away and I pass racer. Does removing these checks sound scary? For a real fix, as long as setstripe is piggybacked on ll_intent_file_open(), I think there needs to be a way of doing a real open-by-FID without the unlinked checks, and without falling back to open-by-path as is happening here. |
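For context, the server-side check being discussed lives in the MDD open path; the following is a rough paraphrase of the 2.4-era logic (abridged, not a verbatim quote — only the function names are real):

/* Paraphrase of the mdd_open_sanity_check() logic under discussion. */
static int mdd_open_sanity_check(const struct lu_env *env,
                                 struct mdd_object *obj, int flag)
{
        /* ... attribute fetch, type and permission checks ... */

        /* The check John proposes to drop: an open-but-unlinked object
         * is "dead", so open-by-FID against it fails with -ENOENT, and
         * mdt_reint_open() then falls back to open-by-path, which can
         * resolve to a *different* FID after a concurrent rename. */
        if (mdd_is_dead_obj(obj))
                RETURN(-ENOENT);

        /* ... open-flag sanity checks ... */
        return 0;
}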
| Comment by Bruno Faccini (Inactive) [ 06/Feb/13 ] |
|
Yes, I agree, open-by-path seems to be the problem; I confirmed this from the full trace I got. I will try to see how to change things in the ll_lov_setstripe_ea_info()/ll_intent_file_open() sequence. I may need help/comments! |
| Comment by John Hammond [ 06/Feb/13 ] |
|
Three comments:
|
| Comment by Bruno Faccini (Inactive) [ 07/Feb/13 ] |
|
John, since I am not really familiar with this code, I wanted to check whether something could be done on the client side too; you were ahead of me in reading the involved source code! So, I agree with you that avoiding the "dead" check in mdd_open_sanity_check() appears to be a quick fix for this particular case, if there are no other/hidden implications. Having looked at the DEAD_OBJ flag usage/tests throughout the code, it looks like there may be other candidates, such as mdd_cd_sanity_check() as you already pointed out, and the POSIX-related comment in mdd_readpage() makes me wonder how to address all cases. I know that dealing with open+unlinked files has always been tricky for filesystem software, so do you have an idea? Anyway, I will start with a patch like you proposed, to "remove the mdd_is_dead_obj() checks from mdd_open_sanity_check() and mdd_cd_sanity_check()", but I will try to add extra checks to it (like the object being already/currently referenced, ...) in order to fit the specific condition triggered here. I will add you as a reviewer to get your feeling about it. |
| Comment by John Hammond [ 08/Feb/13 ] |
|
OK, good. Using racer, I noticed that the issue reproduces much more quickly if I comment out the dd command from file_create.sh, so it may be worthwhile to test with and without that line. Also, running with the above-discussed changes to mdd_open_sanity_check() and mdd_cd_sanity_check(), I hit the following:

LustreError: 31309:0:(lod_lov.c:782:lod_load_striping()) ASSERTION( lo->ldo_stripe[i] ) failed: stripe 0 is NULL
LustreError: 31309:0:(lod_lov.c:782:lod_load_striping()) LBUG
Pid: 31309, comm: mdt00_003
crash> bt -l
PID: 31309 TASK: ffff88015023a040 CPU: 1 COMMAND: "mdt00_003"
#0 [ffff8801728dd828] machine_kexec at ffffffff81031f7b
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
#1 [ffff8801728dd888] crash_kexec at ffffffff810b8c22
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/kernel/kexec.c: 1106
#2 [ffff8801728dd958] panic at ffffffff814eae18
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/kernel/panic.c: 103
#3 [ffff8801728dd9d8] lbug_with_loc at ffffffffa0bcdeeb [libcfs]
/root/lustre-release/libcfs/libcfs/linux/linux-debug.c: 188
#4 [ffff8801728dd9f8] lod_load_striping at ffffffffa087e95f [lod]
/root/lustre-release/lustre/include/lu_object.h: 868
#5 [ffff8801728dda38] lod_declare_attr_set at ffffffffa088af8b [lod]
/root/lustre-release/lustre/lod/lod_object.c: 300
#6 [ffff8801728dda88] mdd_rename at ffffffffa0266528 [mdd]
/root/lustre-release/lustre/mdd/mdd_dir.c: 2087
#7 [ffff8801728ddba8] mdt_reint_rename at ffffffffa07b9627 [mdt]
/root/lustre-release/lustre/mdt/mdt_reint.c: 1270
#8 [ffff8801728ddcc8] mdt_reint_rec at ffffffffa07b56c1 [mdt]
/root/lustre-release/libcfs/include/libcfs/libcfs_debug.h: 211
#9 [ffff8801728ddce8] mdt_reint_internal at ffffffffa07aed23 [mdt]
/root/lustre-release/libcfs/include/libcfs/libcfs_debug.h: 211
#10 [ffff8801728ddd28] mdt_reint at ffffffffa07af054 [mdt]
/root/lustre-release/lustre/mdt/mdt_handler.c: 1818
#11 [ffff8801728ddd48] mdt_handle_common at ffffffffa079ffc8 [mdt]
/root/lustre-release/lustre/mdt/mdt_handler.c: 2981
#12 [ffff8801728ddd98] mds_regular_handle at ffffffffa07d7605 [mdt]
/root/lustre-release/lustre/mdt/mdt_mds.c: 354
#13 [ffff8801728ddda8] ptlrpc_server_handle_request at ffffffffa0fbbc7c [ptlrpc]
/root/lustre-release/lustre/include/lustre_net.h: 2771
#14 [ffff8801728ddea8] ptlrpc_main at ffffffffa0fbd1c6 [ptlrpc]
/root/lustre-release/lustre/ptlrpc/service.c: 2487
#15 [ffff8801728ddf48] kernel_thread at ffffffff8100c0ca
/usr/src/debug///////kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/kernel/entry_64.S: 1213
I can't say now if this was caused by my changes or not. |
| Comment by John Hammond [ 08/Feb/13 ] |
|
To my great relief, the lod_load_striping() LBUG is reproducible on master. I'll create a separate issue. |
| Comment by John Hammond [ 08/Feb/13 ] |
|
Please see http://review.whamcloud.com/5314. |
| Comment by John Hammond [ 08/Feb/13 ] |
|
Should ll_intent_file_open() be unconditionally setting MDS_OPEN_BY_FID in it_flags? |
| Comment by Andreas Dilger [ 08/Feb/13 ] |
|
John, Bruno, it would also be great if any patch you write included comments to explain things, so that the next person who has to look at the code has a bit of help. |
| Comment by Bruno Faccini (Inactive) [ 11/Feb/13 ] |
|
John, it seems that autotest sanityn test_30 failed with the patch. Given the nature of test_30, it is highly likely the change is involved. Maybe it is the particular case of currently exec()'ed binary files that are unlinked and then re-accessed via the /proc/<PID>/exe method. I do not have access to "bug #11110", cited as the reference in this test, so I am missing the original reason for the test, but maybe it rings a bell for you or somebody in CC? Andreas, thanks for adding Jinshan and Fan Yong to help. BTW, this is actually one of the problems I have when I need to request reviewers: I lack the knowledge of who has been involved with what, and I don't know how to find it quickly. |
| Comment by John Hammond [ 11/Feb/13 ] |
|
It looks like the original reason for this test was to require that Lustre imitate NFS's ESTALE semantics for exec. This seems like a bogus requirement. The kernel and ext4 seem to have no problems with executing an open unlinked file through /proc/PID/exe. |
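A quick way to check that claim locally (a sketch; the path and sleep duration are arbitrary):

# On ext4: run a private copy of a binary, unlink it while it is still
# executing, then re-exec it through /proc/PID/exe. The exec succeeds
# even though the on-disk name is gone.
cp /bin/sleep /tmp/unlink_exec_test
/tmp/unlink_exec_test 60 &
pid=$!
rm /tmp/unlink_exec_test
/proc/$pid/exe 1    # re-execs the deleted binary (sleeps 1 second), no ESTALE-style failure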
| Comment by Jinshan Xiong (Inactive) [ 11/Feb/13 ] |
|
I'm looking at this issue. |
| Comment by Jinshan Xiong (Inactive) [ 11/Feb/13 ] |
|
You guys are absolutely right about the root cause of the problem. However, I have a different idea about the fix. The code below may have a problem. From mdt_reint_open():

result = mdt_open_by_fid_lock(info, ldlm_rep, lhc);
if ((result != -ENOENT && !(create_flags & MDS_OPEN_CREAT)) &&
    result != -EREMOTE)
        GOTO(out, result);

Shouldn't it be:

if (!(result == -ENOENT && create_flags & MDS_OPEN_CREAT) || result == -EREMOTE)

That means: if the file does not exist and there is no intent to create it, or the file is on another MDT, it exits immediately; actually (result == -EREMOTE) is redundant here. Can you please turn your reproducer into a test case and add it to sanity.sh? |
| Comment by Bruno Faccini (Inactive) [ 12/Feb/13 ] |
|
It seems to me that the reason sanityn test_30 fails with John's patch is that we no longer return ENOENT from mdd_open_sanity_check() when mdd_is_dead_obj() is true, so we do not switch to open-by-name in mdt_reint_open() and we keep using/returning the same/original FID. This causes the FID check in mdc_finish_intent_lock() to pass, and thus ESTALE is not returned. If we want to keep behaving the same way for the test_30 case and still fix the problem in this ticket, I was wondering whether there is a way to pass the M_CHECK_STALE mode bit to the MDS/server side, so we could use it to keep returning ENOENT? |
| Comment by John Hammond [ 12/Feb/13 ] |
|
In mdt_reint_open(), if mdt_open_by_fid_lock() returns -EREMOTE then I believe we should not return early. So how can the 'result == -EREMOTE' test be redundant? Do you mean the following:

if (result == -ENOENT) {
        if (!(create_flags & MDS_OPEN_CREAT))
                GOTO(out, result);
} else if (result != -EREMOTE) {
        GOTO(out, result);
}

Or more concisely:

if (result == -ENOENT ? !(create_flags & MDS_OPEN_CREAT) : result != -EREMOTE)
        GOTO(out, result);
(I like this approach because every time I use the ternary operator, Andreas rejects my patch for insufficient parentheses.) |
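To make the disagreement concrete, here is a small standalone program tabulating the three predicates over the interesting cases (a sketch, not Lustre code; MDS_OPEN_CREAT is given a placeholder value since only the truth table matters):

/* Compares the early-exit conditions quoted above: the original code,
 * Jinshan's suggestion, and John's rewrite. A value of 1 means "exit",
 * i.e. GOTO(out, result). */
#include <errno.h>
#include <stdio.h>

#define MDS_OPEN_CREAT 0x1    /* placeholder value for illustration */

static int orig_cond(int result, int create_flags)
{
        return (result != -ENOENT && !(create_flags & MDS_OPEN_CREAT)) &&
               result != -EREMOTE;
}

static int jinshan_cond(int result, int create_flags)
{
        return !(result == -ENOENT && (create_flags & MDS_OPEN_CREAT)) ||
               result == -EREMOTE;
}

static int john_cond(int result, int create_flags)
{
        return result == -ENOENT ? !(create_flags & MDS_OPEN_CREAT)
                                 : result != -EREMOTE;
}

int main(void)
{
        int results[] = { 0, -ENOENT, -EREMOTE, -EIO };
        const char *names[] = { "0", "-ENOENT", "-EREMOTE", "-EIO" };
        int i, creat;

        printf("%-9s %-6s %-5s %-8s %-5s\n",
               "result", "CREAT", "orig", "jinshan", "john");
        for (i = 0; i < 4; i++)
                for (creat = 0; creat <= MDS_OPEN_CREAT; creat++)
                        printf("%-9s %-6d %-5d %-8d %-5d\n",
                               names[i], !!creat,
                               orig_cond(results[i], creat),
                               jinshan_cond(results[i], creat),
                               john_cond(results[i], creat));
        return 0;
}

Notably, on -EREMOTE Jinshan's condition always exits while the original and John's never do, which is exactly John's objection; the original and John's versions also differ on -ENOENT (the original never exits there, John's exits when MDS_OPEN_CREAT is absent).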
| Comment by Bruno Faccini (Inactive) [ 13/Feb/13 ] |
|
I am currently building a test version including the change you just discussed, but just so I understand, the expected outcome is to return ENOENT for this JIRA/ |
| Comment by John Hammond [ 13/Feb/13 ] |
|
Please see http://review.whamcloud.com/5417. I have tried to follow Jinshan's suggestion as closely as possible, and have not changed the behavior of mdd_open_sanity_check() or mdd_cd_sanity_check(). (Although I believe we should, doing so would be better addressed in a separate issue.) |
| Comment by John Hammond [ 13/Feb/13 ] |
|
Jinshan, please tell me what's wrong with my reasoning about -EREMOTE in mdt_reint_open(). Even for a remote child the server needs to return a lock. Moreover, if I use the condition you suggest then I can trigger an LBUG as follows:

### To avoid LU-2789, in lustre/tests/file_create.sh comment out the
### lfs setstripe lines and replace dd with touch.
# MDSCOUNT=2 MOUNT_2=yes llmount.sh
# (cd /mnt/lustre; while true; do lfs mkdir -i1 racer/$((RANDOM % 16)); done) 2>/dev/null &
# sh ./lustre/tests/racer.sh

LustreError: 10096:0:(mdt_handler.c:3784:mdt_intent_reint()) ASSERTION( lustre_handle_is_used(&lhc->mlh_reg_lh) ) failed:
LustreError: 10096:0:(mdt_handler.c:3784:mdt_intent_reint()) LBUG
Pid: 10096, comm: mdt00_005
Call Trace:
[<ffffffffa0e31895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0e31e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0a4258e>] mdt_intent_reint+0x4ae/0x4f0 [mdt]
[<ffffffffa0a3de9e>] mdt_intent_policy+0x3ae/0x750 [mdt]
[<ffffffffa05d4351>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
[<ffffffffa05fa447>] ldlm_handle_enqueue0+0x4f7/0x1080 [ptlrpc]
[<ffffffffa0a3e376>] mdt_enqueue+0x46/0x110 [mdt]
[<ffffffffa0a32fb8>] mdt_handle_common+0x628/0x1620 [mdt]
[<ffffffffa0a6a5b5>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa062c00c>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
[<ffffffffa0e325de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa0623739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81052223>] ? __wake_up+0x53/0x70
[<ffffffffa062d556>] ptlrpc_main+0xb76/0x1870 [ptlrpc]
[<ffffffffa062c9e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa062c9e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffffa062c9e0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20

This usually happens within a few seconds of starting racer, whereas with my condition I survive this modified racer. |
| Comment by Andreas Dilger [ 13/Feb/13 ] |
|
I've submitted http://review.whamcloud.com/5424 to disable the setstripe in racer/file_create.sh, unless "RACER_SETSTRIPE=true" is set in the environment. That will allow adding racer to the review test list, but also allow testing these patches as needed via "Test-Parameters: envdefinitions=RACER_SETSTRIPE=true". Please delete the exception for RACER_SETSTRIPE from file_create.sh when this patch and |
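A minimal sketch of what such a guard in racer/file_create.sh could look like (illustrative only; the actual patch is in change 5424 and may differ in variable names and stripe arguments):

# file_create.sh (sketch): only run setstripe when explicitly enabled,
# e.g. via "Test-Parameters: envdefinitions=RACER_SETSTRIPE=true".
RACER_SETSTRIPE=${RACER_SETSTRIPE:-false}

if [ "$RACER_SETSTRIPE" = "true" ]; then
        lfs setstripe -c $((RANDOM % 4)) "$DIR/$file" 2> /dev/null
fi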
| Comment by Jodi Levi (Inactive) [ 13/Mar/13 ] |
|
Change http://review.whamcloud.com/5417 landed to master. |