[LU-6903] racer file migration crash ASSERTION( lov->lo_type == LLT_RAID0 )
Created: 25/Jul/15  Updated: 03/Jun/16  Resolved: 31/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Critical
Reporter: Oleg Drokin
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4840 Deadlock when truncating file during... Resolved
is related to LU-7073 racer with OST object migration hangs... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

While testing http://review.whamcloud.com/#/c/13669/ I hit what appears to be an lfs migration-related crash (the crashing lfs was called from file_migrate.sh):

<0>[117918.259129] LustreError: 14696:0:(lov_cl_internal.h:750:lov_r0()) ASSERTION( lov->lo_type == LLT_RAID0 ) failed: 
<0>[117918.259574] LustreError: 14696:0:(lov_cl_internal.h:750:lov_r0()) LBUG
<4>[117918.259892] Pid: 14696, comm: lfs
<4>[117918.260049] 
<4>[117918.260049] Call Trace:
<4>[117918.260399]  [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.260657]  [<ffffffffa03f9885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[117918.260864]  [<ffffffffa03f9e87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[117918.261095]  [<ffffffffa18c8fc3>] lov_find_cbdata_raid0+0xc3/0x100 [lov]
<4>[117918.261307]  [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.261526]  [<ffffffffa18c6f1a>] lov_object_find_cbdata+0x4a/0x120 [lov]
<4>[117918.261721]  [<ffffffffa040df2b>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
<4>[117918.261931]  [<ffffffffa14edb3b>] cl_object_find_cbdata+0x6b/0x120 [obdclass]
<4>[117918.262239]  [<ffffffffa06f978c>] ll_d_iput+0x10c/0x540 [lustre]
<4>[117918.262460]  [<ffffffff811a9149>] dentry_iput+0x89/0x110
<4>[117918.262629]  [<ffffffff811a92c1>] d_kill+0x31/0x60
<4>[117918.262793]  [<ffffffff811aaf2c>] dput+0x7c/0x160
<4>[117918.262957]  [<ffffffff81190f3b>] __fput+0x1bb/0x280
<4>[117918.263124]  [<ffffffff81191025>] fput+0x25/0x30
<4>[117918.263308]  [<ffffffff8118c01d>] filp_close+0x5d/0x90
<4>[117918.263511]  [<ffffffff8118c109>] sys_close+0xb9/0x120
<4>[117918.263675]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
<4>[117918.263860] 
<0>[117918.337379] Kernel panic - not syncing: LBUG
<4>[117918.337572] Pid: 14696, comm: lfs Tainted: P           ---------------    2.6.32-rhe6.6-debug #1
<4>[117918.337918] Call Trace:
<4>[117918.338100]  [<ffffffff8151dcd9>] ? panic+0xa7/0x16f
<4>[117918.338282]  [<ffffffffa03f9edb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4>[117918.338474]  [<ffffffffa18c8fc3>] ? lov_find_cbdata_raid0+0xc3/0x100 [lov]
<4>[117918.338676]  [<ffffffffa06f8000>] ? return_if_equal+0x0/0x30 [lustre]
<4>[117918.338862]  [<ffffffffa18c6f1a>] ? lov_object_find_cbdata+0x4a/0x120 [lov]
<4>[117918.339126]  [<ffffffffa040df2b>] ? cfs_hash_add_unique+0x1b/0x40 [libcfs]
<4>[117918.339361]  [<ffffffffa14edb3b>] ? cl_object_find_cbdata+0x6b/0x120 [obdclass]
<4>[117918.339666]  [<ffffffffa06f978c>] ? ll_d_iput+0x10c/0x540 [lustre]
<4>[117918.339847]  [<ffffffff811a9149>] ? dentry_iput+0x89/0x110
<4>[117918.340091]  [<ffffffff811a92c1>] ? d_kill+0x31/0x60
<4>[117918.340287]  [<ffffffff811aaf2c>] ? dput+0x7c/0x160
<4>[117918.340455]  [<ffffffff81190f3b>] ? __fput+0x1bb/0x280
<4>[117918.340626]  [<ffffffff81191025>] ? fput+0x25/0x30
<4>[117918.340792]  [<ffffffff8118c01d>] ? filp_close+0x5d/0x90
<4>[117918.340997]  [<ffffffff8118c109>] ? sys_close+0xb9/0x120
<4>[117918.341237]  [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b

Crashdump in /exports/crashdumps/192.168.10.210-2015-07-24-19:52:32

tag in my tree 20150723

I examined the object in the crashdump: it looks OK and its type is OK too, so this is likely a race.



 Comments   
Comment by Jinshan Xiong (Inactive) [ 27/Jul/15 ]

The cause is clear: lov_object_find_cbdata() doesn't hold the layout type lock, so the layout can change underneath it. I will create a patch.
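
For illustration only, here is a minimal user-space sketch of the pattern described above: the type check and the type-specific access happen under the same lock that a layout change must take. This is not the patch and not Lustre code. Every name in it (layout_obj, type_guard, find_cbdata, change_layout, the LAYOUT_* values) is a hypothetical stand-in, and a plain pthread mutex is assumed in place of whatever locking the real lov code uses.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the layout types; not Lustre's definitions. */
enum layout_type { LAYOUT_EMPTY, LAYOUT_RAID0 };

struct layout_obj {
	pthread_mutex_t  type_guard;  /* protects lo_type and stripe_data */
	enum layout_type lo_type;
	int              stripe_data; /* only meaningful for LAYOUT_RAID0 */
};

static void layout_obj_init(struct layout_obj *obj, enum layout_type type,
			    int data)
{
	pthread_mutex_init(&obj->type_guard, NULL);
	obj->lo_type = type;
	obj->stripe_data = data;
}

/* Plays the role of a RAID0-only accessor: asserts the type it relies on. */
static int raid0_data(const struct layout_obj *obj)
{
	assert(obj->lo_type == LAYOUT_RAID0);
	return obj->stripe_data;
}

/*
 * Lookup path: the type is checked and the RAID0-only accessor is called
 * while type_guard is held, so a concurrent change_layout() cannot switch
 * the type between the check and the use. Returns -1 if not RAID0.
 */
static int find_cbdata(struct layout_obj *obj)
{
	int rc = -1;

	pthread_mutex_lock(&obj->type_guard);
	if (obj->lo_type == LAYOUT_RAID0)
		rc = raid0_data(obj);
	pthread_mutex_unlock(&obj->type_guard);
	return rc;
}

/* Layout change path (e.g. migration): takes the same lock before writing. */
static void change_layout(struct layout_obj *obj, enum layout_type type,
			  int data)
{
	pthread_mutex_lock(&obj->type_guard);
	obj->lo_type = type;
	obj->stripe_data = data;
	pthread_mutex_unlock(&obj->type_guard);
}
```

Without the lock in find_cbdata(), a layout change could land between the type check and raid0_data(), which is the kind of window that would trip an assertion like the one in the trace.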

Comment by Gerrit Updater [ 27/Jul/15 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/15748
Subject: LU-6903 lov: call lov_object_find_cbdata() inside lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7e263f9d4e8f6f4ee3e6e3003dbdaf8815cc0c45

Comment by Gerrit Updater [ 31/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15748/
Subject: LU-6903 lov: call lov_object_find_cbdata() inside lock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7198cfebaa43584b558aa5ae672fe62cbd737d1a

Comment by Joseph Gmitter (Inactive) [ 31/Aug/15 ]

Landed for 2.8.
