Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.6
-
3
-
11946
Description
Hi,
At the IFERC customer site, 7 compute nodes crashed with the following message on the console:
2013-11-21 00:57:45 LustreError: 92325:0:(llite_lib.c:1683:ll_update_inode()) ASSERTION( lu_fid_eq(&lli->lli_fid, &body->fid1) ) failed: Trying to change FID [0x217294ce4:0x107f0:0x0] to the [0x217294ce4:0x107f1:0x0], inode 150634522759727089/35072332(ffff8807dcbf85f8)
2013-11-21 00:57:45 LustreError: 92325:0:(llite_lib.c:1683:ll_update_inode()) LBUG
2013-11-21 00:57:45 Pid: 92325, comm: writer_v131
2013-11-21 00:57:45
2013-11-21 00:57:45 Call Trace:
2013-11-21 00:57:45 [<ffffffffa046f7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-11-21 00:57:45 [<ffffffffa046fe07>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-11-21 00:57:45 [<ffffffffa0a91ca0>] ll_update_inode+0x4a0/0xf60 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a928ea>] ll_prep_inode+0x18a/0xae0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a7c8c3>] ll_intent_file_open+0x563/0xb80 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa6a90>] ? ll_md_blocking_ast+0x0/0x700 [lustre]
2013-11-21 00:57:45 [<ffffffff8108163e>] ? down+0x2e/0x50
2013-11-21 00:57:45 [<ffffffffa0a7cf67>] ll_lov_setstripe_ea_info+0x87/0x2b0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a831a5>] ll_lov_setstripe+0x85/0x5a0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa3e8b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a84ac6>] ll_file_ioctl+0x826/0xe00 [lustre]
2013-11-21 00:57:45 [<ffffffff81179ff2>] vfs_ioctl+0x22/0xa0
2013-11-21 00:57:45 [<ffffffff8117a4ba>] do_vfs_ioctl+0x3aa/0x580
2013-11-21 00:57:45 [<ffffffff8117a711>] sys_ioctl+0x81/0xa0
2013-11-21 00:57:45 [<ffffffff8149970e>] ? do_device_not_available+0xe/0x10
2013-11-21 00:57:45 [<ffffffff810030f2>] system_call_fastpath+0x16/0x1b
2013-11-21 00:57:45
2013-11-21 00:57:45 Kernel panic - not syncing: LBUG
2013-11-21 00:57:45 Pid: 92325, comm: writer_v131 Tainted: G W --------------- 2.6.32-279.5.2.bl6.Bull.36.x86_64 #1
2013-11-21 00:57:45 Call Trace:
2013-11-21 00:57:45 [<ffffffff81495fe3>] ? panic+0xa0/0x168
2013-11-21 00:57:45 [<ffffffffa046fe5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2013-11-21 00:57:45 [<ffffffffa0a91ca0>] ? ll_update_inode+0x4a0/0xf60 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a928ea>] ? ll_prep_inode+0x18a/0xae0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a7c8c3>] ? ll_intent_file_open+0x563/0xb80 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa6a90>] ? ll_md_blocking_ast+0x0/0x700 [lustre]
2013-11-21 00:57:45 [<ffffffff8108163e>] ? down+0x2e/0x50
2013-11-21 00:57:45 [<ffffffffa0a7cf67>] ? ll_lov_setstripe_ea_info+0x87/0x2b0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a831a5>] ? ll_lov_setstripe+0x85/0x5a0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0aa3e8b>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
2013-11-21 00:57:45 [<ffffffffa0a84ac6>] ? ll_file_ioctl+0x826/0xe00 [lustre]
2013-11-21 00:57:45 [<ffffffff81179ff2>] ? vfs_ioctl+0x22/0xa0
2013-11-21 00:57:45 [<ffffffff8117a4ba>] ? do_vfs_ioctl+0x3aa/0x580
2013-11-21 00:57:45 [<ffffffff8117a711>] ? sys_ioctl+0x81/0xa0
2013-11-21 00:57:45 [<ffffffff8149970e>] ? do_device_not_available+0xe/0x10
2013-11-21 00:57:45 [<ffffffff810030f2>] ? system_call_fastpath+0x16/0x1b
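For readers unfamiliar with the assertion, the sketch below shows what a FID comparison of this kind amounts to, using the two FIDs printed in the console message. The type `demo_fid`, the function `demo_fid_eq`, and the variable names are hypothetical stand-ins that mirror the `[seq:oid:ver]` notation in the log; they are not the Lustre definitions, and reading the first FID as the one cached in the inode and the second as the one from the reply is an inference from the wording of the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the [seq:oid:ver] triple printed in the log. */
struct demo_fid {
    uint64_t seq;   /* sequence number */
    uint32_t oid;   /* object id within the sequence */
    uint32_t ver;   /* version */
};

/* Two FIDs are equal only if all three components match. */
static bool demo_fid_eq(const struct demo_fid *a, const struct demo_fid *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Values copied from the assertion message; roles are assumed (see lead-in). */
static const struct demo_fid cached_fid = { 0x217294ce4ULL, 0x107f0, 0x0 };
static const struct demo_fid reply_fid  = { 0x217294ce4ULL, 0x107f1, 0x0 };
```

The two FIDs share a sequence and differ by one in the object id, which would be consistent with two files created back to back in the same directory.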
This issue looks like LU-2523 and LU-3311, but the patch for b2_1 has not made any progress since July.
I have tested with the following reproducer, given in LU-2523:

llmount.sh
cd /mnt/lustre
touch file1

In a single process do:

struct lov_user_md_v3 *lum;
/* Initialize lum */
fd2 = open("file2", O_RDWR|O_CREAT|O_LOV_DELAY_CREATE, 0666);
rename("file1", "file2");
ioctl(fd2, LL_IOC_LOV_SETSTRIPE, lum);
With a stock 2.1.6 I can easily reproduce the issue. Unfortunately, with the patch at http://review.whamcloud.com/6775 applied I am still able to hit the bug.
Thanks,
Sebastien.
As you can see, setstripe is done via ioctl on an already opened file handle, but in the code setstripe is implemented as an open (so it is actually a re-open). It looks like this should succeed, but the current MDS code does not allow a re-open, or the creation of OST objects, for an unlinked file. Since there is no POSIX standard for the setstripe call, this can be regarded as normal behaviour, but it should be documented somewhere IMO.
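The POSIX half of the reproducer can be checked without Lustre. The following is a minimal sketch, assuming a typical local Linux filesystem under /tmp; the helper `rename_over_open_fd` and the struct `rename_result` are hypothetical names introduced here. It opens file2, renames file1 over it, and then reports whether the old descriptor still refers to a linked file and whether the path "file2" still names the same inode. The Lustre-specific steps (O_LOV_DELAY_CREATE and the LL_IOC_LOV_SETSTRIPE ioctl) are deliberately left out.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct rename_result {
    unsigned long fd_nlink; /* link count of the inode behind the old fd */
    int same_inode;         /* 1 if path "file2" still names that inode */
};

/*
 * Runs the open/rename sequence in a fresh temporary directory.
 * Returns 0 on success, -1 if any system call fails.
 */
int rename_over_open_fd(struct rename_result *res)
{
    char dir[] = "/tmp/rename_demo_XXXXXX";
    char file1[64], file2[64];
    struct stat by_fd, by_path;
    int fd1, fd2 = -1, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(file1, sizeof(file1), "%s/file1", dir);
    snprintf(file2, sizeof(file2), "%s/file2", dir);

    /* Equivalent of "touch file1". */
    fd1 = open(file1, O_WRONLY | O_CREAT, 0666);
    if (fd1 < 0)
        goto out;
    close(fd1);

    /* Open file2 and keep the descriptor across the rename. */
    fd2 = open(file2, O_RDWR | O_CREAT, 0666);
    if (fd2 < 0)
        goto out;

    /* The rename replaces file2, unlinking the inode that fd2 refers to. */
    if (rename(file1, file2) != 0)
        goto out;

    if (fstat(fd2, &by_fd) != 0 || stat(file2, &by_path) != 0)
        goto out;

    res->fd_nlink = (unsigned long)by_fd.st_nlink;
    res->same_inode = by_fd.st_dev == by_path.st_dev &&
                      by_fd.st_ino == by_path.st_ino;
    rc = 0;
out:
    if (fd2 >= 0)
        close(fd2);
    unlink(file1);  /* may already be gone after the rename */
    unlink(file2);
    rmdir(dir);
    return rc;
}
```

After a successful call, `fd_nlink` is expected to be 0 and `same_inode` to be 0: the descriptor points at an unlinked file while the name "file2" now belongs to what used to be file1. A setstripe that is implemented as a re-open would therefore be operating on an unlinked file, which is the case the comment above describes.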