[LU-5968] racer test 1: mv operation hung
Created: 02/Dec/14  Updated: 25/Jan/19  Resolved: 25/Jan/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.4
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jian Yu
Assignee: Joseph Gmitter (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: 22pl, dne, mq115
Environment:

Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/104/
MDSCOUNT=2


Issue Links:
Related
is related to LU-6085 racer stuck on mutex_lock in ll_setat... Resolved
Severity: 3
Rank (Obsolete): 16671

 Description   

While running the racer test with MDSCOUNT=2, the mv operation hung on the client node as follows:

LustreError: 29294:0:(xattr.c:510:ll_getxattr()) server bug: replied size 56 > 32 for 11 (trusted.lov)
INFO: task mv:26147 blocked for more than 120 seconds. 
      Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
mv            D 0000000000000000     0 26147  18764 0x00000080
 ffff88006b603cd8 0000000000000086 ffff88005fd60dc0 0000000000000000
 0000000000016840 ffff88005fd60dc0 ffffffff8100b9ce ffff88006b603cd8
 ffff880064a0baf8 ffff88006b603fd8 000000000000fbc8 ffff880064a0baf8
Call Trace:
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff8105546b>] ? mutex_spin_on_owner+0x9b/0xc0
 [<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8152a45b>] mutex_lock+0x2b/0x50
 [<ffffffffa1cdf99e>] ll_setattr_raw+0x2ee/0x1070 [lustre] 
 [<ffffffff81078fd7>] ? current_fs_time+0x27/0x30
 [<ffffffffa1ce0785>] ll_setattr+0x65/0xd0 [lustre] 
 [<ffffffff811a7ca8>] notify_change+0x168/0x340
 [<ffffffff8119b502>] ? user_path_at+0x62/0xa0
 [<ffffffff811862be>] chown_common+0x6e/0x90
 [<ffffffff8118658f>] sys_fchownat+0xbf/0xe0
 [<ffffffff811865d0>] sys_lchown+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
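
For readers triaging the trace above: frames prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so the blocked path is easier to read with them filtered out. The following is a minimal sketch of that filtering, not part of the Lustre test suite; the names FRAME_RE and reliable_frames are hypothetical, and the regular expression assumes the el6-style frame format shown in this ticket.

```python
import re

# Matches one frame line of a kernel call trace, for example:
#  [<ffffffffa1cdf99e>] ll_setattr_raw+0x2ee/0x1070 [lustre]
# An optional "?" after the address marks a frame the unwinder could not confirm.
# Assumption: frame format as printed by the 2.6.32 el6 kernel in this ticket.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (symbol, module) for every frame that is not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("guess"):
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# Call trace copied verbatim from this ticket.
TRACE = """\
 [<ffffffff8100b9ce>] ? common_interrupt+0xe/0x13
 [<ffffffff8105546b>] ? mutex_spin_on_owner+0x9b/0xc0
 [<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8152a45b>] mutex_lock+0x2b/0x50
 [<ffffffffa1cdf99e>] ll_setattr_raw+0x2ee/0x1070 [lustre]
 [<ffffffff81078fd7>] ? current_fs_time+0x27/0x30
 [<ffffffffa1ce0785>] ll_setattr+0x65/0xd0 [lustre]
 [<ffffffff811a7ca8>] notify_change+0x168/0x340
 [<ffffffff8119b502>] ? user_path_at+0x62/0xa0
 [<ffffffff811862be>] chown_common+0x6e/0x90
 [<ffffffff8118658f>] sys_fchownat+0xbf/0xe0
 [<ffffffff811865d0>] sys_lchown+0x20/0x30
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
"""

if __name__ == "__main__":
    # Print the confirmed call chain, innermost frame first.
    for symbol, module in reliable_frames(TRACE):
        print(f"{module:8s} {symbol}")
```

Read from the bottom up, the confirmed frames show the lchown system call entering ll_setattr and ll_setattr_raw in the lustre module and then blocking in mutex_lock.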

Maloo reports:
https://testing.hpdd.intel.com/test_sets/28604232-79df-11e4-807e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/b9d50256-7a21-11e4-bb7c-5254006e85c2

The failure was previously reported in LU-4105, which was fixed by the patch for LU-5144. However, the failure still occurred after the patch for LU-5144 landed on the Lustre b2_5 branch, so I am creating this new ticket to track the issue.



 Comments   
Comment by Andreas Dilger [ 03/Dec/14 ]

Is this problem also present on master?

Comment by Jian Yu [ 04/Dec/14 ]

"Is this problem also present on master?"

On the master branch, the racer test failed with LU-5915 under the DNE configuration.

Comment by Gerrit Updater [ 16/Jan/15 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/13433
Subject: LU-5968 mdt: return valid attribute only to client
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: b2b57084ed1a35d82ee1e0a80071a71143dc5fbf

Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ]

This patch may still be needed.

Comment by Joseph Gmitter (Inactive) [ 25/Jan/19 ]

There is no plan to push a patch to master, as this issue is not being hit.
