[LU-6903] racer file migration crash ASSERTION( lov->lo_type == LLT_RAID0 ) Created: 25/Jul/15 Updated: 03/Jun/16 Resolved: 31/Aug/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While testing http://review.whamcloud.com/#/c/13669/, I hit what appears to be a crash related to lfs migration (the crashing lfs process was invoked from file_migrate.sh):

<0>[117918.259129] LustreError: 14696:0:(lov_cl_internal.h:750:lov_r0()) ASSERTION( lov->lo_type == LLT_RAID0 ) failed:
<0>[117918.259574] LustreError: 14696:0:(lov_cl_internal.h:750:lov_r0()) LBUG
<4>[117918.259892] Pid: 14696, comm: lfs
<4>[117918.260049]
<4>[117918.260049] Call Trace:
<4>[117918.260399] [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.260657] [<ffffffffa03f9885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[117918.260864] [<ffffffffa03f9e87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[117918.261095] [<ffffffffa18c8fc3>] lov_find_cbdata_raid0+0xc3/0x100 [lov]
<4>[117918.261307] [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.261526] [<ffffffffa18c6f1a>] lov_object_find_cbdata+0x4a/0x120 [lov]
<4>[117918.261721] [<ffffffffa040df2b>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
<4>[117918.261931] [<ffffffffa14edb3b>] cl_object_find_cbdata+0x6b/0x120 [obdclass]
<4>[117918.262239] [<ffffffffa06f978c>] ll_d_iput+0x10c/0x540 [lustre]
<4>[117918.262460] [<ffffffff811a9149>] dentry_iput+0x89/0x110
<4>[117918.262629] [<ffffffff811a92c1>] d_kill+0x31/0x60
<4>[117918.262793] [<ffffffff811aaf2c>] dput+0x7c/0x160
<4>[117918.262957] [<ffffffff81190f3b>] __fput+0x1bb/0x280
<4>[117918.263124] [<ffffffff81191025>] fput+0x25/0x30
<4>[117918.263308] [<ffffffff8118c01d>] filp_close+0x5d/0x90
<4>[117918.263511] [<ffffffff8118c109>] sys_close+0xb9/0x120
<4>[117918.263675] [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
<4>[117918.263860]
<0>[117918.337379] Kernel panic - not syncing: LBUG
<4>[117918.337572] Pid: 14696, comm: lfs Tainted: P --------------- 2.6.32-rhe6.6-debug #1
<4>[117918.337918] Call Trace:
<4>[117918.338100] [<ffffffff8151dcd9>] ? panic+0xa7/0x16f
<4>[117918.338282] [<ffffffffa03f9edb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4>[117918.338474] [<ffffffffa18c8fc3>] ? lov_find_cbdata_raid0+0xc3/0x100 [lov]
<4>[117918.338676] [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.338862] [<ffffffffa18c6f1a>] ? lov_object_find_cbdata+0x4a/0x120 [lov]
<4>[117918.339126] [<ffffffffa040df2b>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
<4>[117918.339361] [<ffffffffa14edb3b>] ? cl_object_find_cbdata+0x6b/0x120 [obdclass]
<4>[117918.339666] [<ffffffffa06f978c>] ? ll_d_iput+0x10c/0x540 [lustre]
<4>[117918.339847] [<ffffffff811a9149>] ? dentry_iput+0x89/0x110
<4>[117918.340091] [<ffffffff811a92c1>] ? d_kill+0x31/0x60
<4>[117918.340287] [<ffffffff811aaf2c>] ? dput+0x7c/0x160
<4>[117918.340455] [<ffffffff81190f3b>] ? __fput+0x1bb/0x280
<4>[117918.340626] [<ffffffff81191025>] ? fput+0x25/0x30
<4>[117918.340792] [<ffffffff8118c01d>] ? filp_close+0x5d/0x90
<4>[117918.340997] [<ffffffff8118c109>] ? sys_close+0xb9/0x120
<4>[117918.341237] [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b

Crashdump is in /exports/crashdumps/192.168.10.210-2015-07-24-19:52:32; the tag in my tree is 20150723. I examined the object and it looks fine, including its layout type, so this is most likely a race. |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 27/Jul/15 ] |
|
The cause is clear: lov_object_find_cbdata() does not hold the layout type lock, so the layout can change underneath it (for example, during migration) between the type check and the RAID0-specific access. I will create a patch. |
| Comment by Gerrit Updater [ 27/Jul/15 ] |
|
Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15748 |
| Comment by Gerrit Updater [ 31/Aug/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15748/ |
| Comment by Joseph Gmitter (Inactive) [ 31/Aug/15 ] |
|
Landed for 2.8. |