[LU-8807] racer test_1: (layout.c:2062:__req_capsule_get()) LBUG Created: 07/Nov/16  Updated: 21/Dec/16  Resolved: 21/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Full - EL7.2 Server/EL7.2 Client - DNE
master, build# 3468


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e9d2ad98-a26d-11e6-bf05-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

client test_log:

04:31:40:[53110.788399] LustreError: 20828:0:(ldlm_resource.c:874:ldlm_resource_complain()) lustre-OST0001-osc-ffff88004691b800: namespace resource [0xa6:0x0:0x0].0x0 (ffff880079103b40) refcount nonzero (1) after lock cleanup; forcing cleanup.
04:31:40:[53110.793207] LustreError: 20828:0:(ldlm_resource.c:1455:ldlm_resource_dump()) --- Resource: [0xa6:0x0:0x0].0x0 (ffff880079103b40) refcount = 2
04:31:40:[53110.797501] LustreError: 20828:0:(ldlm_resource.c:1458:ldlm_resource_dump()) Granted locks (in reverse order):
04:31:40:[53110.799997] LustreError: 20828:0:(ldlm_resource.c:1461:ldlm_resource_dump()) ### ### ns: lustre-OST0001-osc-ffff88004691b800 lock: ffff88004c9f8200/0x63275ddd857fc6d lrc: 3/0,1 mode: PW/PW res: [0xa6:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x526400000000 nid: local remote: 0x6943ada1f873b7a5 expref: -99 pid: 1080 timeout: 0 lvb_type: 1
04:31:40:[53510.956205] Lustre: 27937:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478172086/real 1478172086]  req@ffff880046af1500 x1549972390972608/t0(0) o36->lustre-MDT0001-mdc-ffff8800415bc000@10.2.4.176@tcp:12/10 lens 872/952 e 5 to 1 dl 1478172687 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
04:31:40:[53510.964116] Lustre: lustre-MDT0001-mdc-ffff8800415bc000: Connection to lustre-MDT0001 (at 10.2.4.176@tcp) was lost; in progress operations using this service will wait for recovery to complete
04:31:40:[53510.969142] Lustre: Skipped 1 previous similar message
04:31:40:[53510.976402] Lustre: lustre-MDT0001-mdc-ffff8800415bc000: Connection restored to 10.2.4.176@tcp (at 10.2.4.176@tcp)
04:31:40:[53510.979289] Lustre: Skipped 7 previous similar messages
04:31:40:[53518.256263] LustreError: 11171:0:(layout.c:2062:__req_capsule_get()) ASSERTION( msg != ((void *)0) ) failed: 
04:31:40:[53518.263177] LustreError: 11171:0:(layout.c:2062:__req_capsule_get()) LBUG
04:31:40:[53518.266145] Pid: 11171, comm: lfs
04:31:40:[53518.268672] 


 Comments   
Comment by Peter Jones [ 08/Nov/16 ]

Niu

Could you please advise on this issue?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 09/Nov/16 ]

ll_migrate() tries to read the reply buffer without first checking whether the request received a successful reply. I think the fix for LU-7396 is incomplete.
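The failure mode described above can be illustrated with a minimal, self-contained C sketch. Every type and function in it (`mock_request`, `mock_reply`, `mock_capsule_get_reply`, `mock_migrate_finish`) is a hypothetical stand-in. None of them is Lustre's real `ptlrpc_request`/`req_capsule` API, and the sketch is not the landed patch. It assumes only that a request which failed or timed out (like the expired MDT request in the client log) carries no reply message. Under that assumption, fetching a reply field without checking the status first trips an assertion, in the same way as `ASSERTION( msg != ((void *)0) )` in `__req_capsule_get()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for an RPC request and its
 * reply. These are NOT the real Lustre ptlrpc_request / req_capsule types. */
struct mock_reply {
	int body_flags;             /* example field carried in the reply body */
};

struct mock_request {
	int status;                 /* 0 on success, negative errno on failure */
	struct mock_reply *repmsg;  /* NULL when no reply was received */
};

/* Mirrors the spirit of the failed ASSERTION( msg != NULL ): fetching a
 * reply field from a request that has no reply is treated as a bug. */
static struct mock_reply *mock_capsule_get_reply(struct mock_request *req)
{
	assert(req->repmsg != NULL);
	return req->repmsg;
}

/* Hypothetical caller: check the request outcome BEFORE touching the reply,
 * and hand the error back to the caller instead of dereferencing NULL. */
int mock_migrate_finish(struct mock_request *req, int *flags_out)
{
	if (req->status != 0)
		return req->status;      /* e.g. -ETIMEDOUT after a lost reply */
	if (req->repmsg == NULL)
		return -EPROTO;          /* "success" but no reply: protocol error */

	*flags_out = mock_capsule_get_reply(req)->body_flags;
	return 0;
}
```

In this sketch, a timed-out request makes `mock_migrate_finish()` return `-ETIMEDOUT` and leaves `*flags_out` untouched, so the caller sees an ordinary error rather than an assertion failure.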

Comment by Gerrit Updater [ 09/Nov/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23666
Subject: LU-8807 llite: check reply status in ll_migrate()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ad151b8fa56d4560ca778df757ac4eb949fb38de

Comment by Gerrit Updater [ 21/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23666/
Subject: LU-8807 llite: check reply status in ll_migrate()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 056783782eab03b341c464c85ce4a803508e390b

Comment by Minh Diep [ 21/Dec/16 ]

Landed in 2.10

Generated at Sat Feb 10 02:20:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.