[LU-8807] racer test_1: (layout.c:2062:__req_capsule_get()) LBUG Created: 07/Nov/16 Updated: 21/Dec/16 Resolved: 21/Dec/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Full - EL7.2 Server/EL7.2 Client - DNE |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e9d2ad98-a26d-11e6-bf05-5254006e85c2.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

client test_log:

04:31:40:[53110.788399] LustreError: 20828:0:(ldlm_resource.c:874:ldlm_resource_complain()) lustre-OST0001-osc-ffff88004691b800: namespace resource [0xa6:0x0:0x0].0x0 (ffff880079103b40) refcount nonzero (1) after lock cleanup; forcing cleanup.
04:31:40:[53110.793207] LustreError: 20828:0:(ldlm_resource.c:1455:ldlm_resource_dump()) --- Resource: [0xa6:0x0:0x0].0x0 (ffff880079103b40) refcount = 2
04:31:40:[53110.797501] LustreError: 20828:0:(ldlm_resource.c:1458:ldlm_resource_dump()) Granted locks (in reverse order):
04:31:40:[53110.799997] LustreError: 20828:0:(ldlm_resource.c:1461:ldlm_resource_dump()) ### ### ns: lustre-OST0001-osc-ffff88004691b800 lock: ffff88004c9f8200/0x63275ddd857fc6d lrc: 3/0,1 mode: PW/PW res: [0xa6:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x526400000000 nid: local remote: 0x6943ada1f873b7a5 expref: -99 pid: 1080 timeout: 0 lvb_type: 1
04:31:40:[53510.956205] Lustre: 27937:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478172086/real 1478172086] req@ffff880046af1500 x1549972390972608/t0(0) o36->lustre-MDT0001-mdc-ffff8800415bc000@10.2.4.176@tcp:12/10 lens 872/952 e 5 to 1 dl 1478172687 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
04:31:40:[53510.964116] Lustre: lustre-MDT0001-mdc-ffff8800415bc000: Connection to lustre-MDT0001 (at 10.2.4.176@tcp) was lost; in progress operations using this service will wait for recovery to complete
04:31:40:[53510.969142] Lustre: Skipped 1 previous similar message
04:31:40:[53510.976402] Lustre: lustre-MDT0001-mdc-ffff8800415bc000: Connection restored to 10.2.4.176@tcp (at 10.2.4.176@tcp)
04:31:40:[53510.979289] Lustre: Skipped 7 previous similar messages
04:31:40:[53518.256263] LustreError: 11171:0:(layout.c:2062:__req_capsule_get()) ASSERTION( msg != ((void *)0) ) failed:
04:31:40:[53518.263177] LustreError: 11171:0:(layout.c:2062:__req_capsule_get()) LBUG
04:31:40:[53518.266145] Pid: 11171, comm: lfs
04:31:40:[53518.268672] |
| Comments |
| Comment by Peter Jones [ 08/Nov/16 ] |
|
Niu, could you please advise on this issue? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 09/Nov/16 ] |
|
ll_migrate() tries to read the reply buffer without checking whether the request was replied to successfully. I think the fix of |
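The failure pattern described above can be sketched in isolation. The following is a minimal, self-contained illustration, not the actual Lustre code or the landed patch: every name here (`mock_request`, `mock_reply`, `mock_capsule_get`, `mock_send_rpc`, `mock_migrate`) is a hypothetical stand-in. It assumes only what the log and the comment state, namely that reading the reply asserts on a NULL message and that the caller should check the request's return code first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real request/reply structures. */
struct mock_reply {
    int status;
};

struct mock_request {
    struct mock_reply *rq_repmsg; /* NULL when no reply was received */
};

/* Mimics the assertion seen in the log: a NULL reply message is fatal. */
static struct mock_reply *mock_capsule_get(struct mock_request *req)
{
    assert(req->rq_repmsg != NULL);
    return req->rq_repmsg;
}

/* Hypothetical RPC send: attaches a reply only when the RPC succeeds. */
static int mock_send_rpc(struct mock_request *req, int simulated_rc,
                         struct mock_reply *reply)
{
    req->rq_repmsg = (simulated_rc == 0) ? reply : NULL;
    return simulated_rc;
}

/* Caller pattern: check the return code before touching the reply buffer. */
static int mock_migrate(int simulated_rc, int *status_out)
{
    struct mock_request req = { NULL };
    struct mock_reply reply = { 7 };
    int rc = mock_send_rpc(&req, simulated_rc, &reply);

    if (rc != 0)
        return rc; /* no reply to read, so return the error instead */

    *status_out = mock_capsule_get(&req)->status;
    return 0;
}
```

With the `rc` check in place, a timed-out request (for example `mock_migrate(-ETIMEDOUT, &status)`) returns the error to the caller and leaves `status` untouched; without it, the same call would reach the assertion in `mock_capsule_get()`.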
| Comment by Gerrit Updater [ 09/Nov/16 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23666 |
| Comment by Gerrit Updater [ 21/Dec/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23666/ |
| Comment by Minh Diep [ 21/Dec/16 ] |
|
Landed in 2.10 |