[LU-7392] client evicted: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1) Created: 05/Nov/15 Updated: 08/Feb/18 Resolved: 08/Feb/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with zfs as the storage backend. DNE is enabled. MDSes are configured in an HA failover configuration. Sequence of events:
The problem might be related to Attached files:
|
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 05/Nov/15 ] |
|
Hi Jinshan, |
| Comment by Andreas Dilger [ 05/Nov/15 ] |
|
Peter thinks this may be related to the patch http://review.whamcloud.com/15127 " |
| Comment by Frank Heckes (Inactive) [ 30/Nov/15 ] |
|
Sorry for the delay. I'll take care to include the patch in the next soak build. |
| Comment by Frank Heckes (Inactive) [ 08/Jan/16 ] |
|
This error occurred again for build '20160106' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160106):
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 167-0: soaked-OST000f-osc-ffff8808301ba000: This client was evicted by soaked-OST000f; in progress operations using this service will fail.
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-OST000f-osc-ffff8808301ba000: namespace resource [0x7c0000401:0x18fbb85:0x0].0x0 (ffff88006f2206c0) refcount nonzero (2) after lock cleanup; forcing cleanup.
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x7c0000401:0x18fbb85:0x0].0x0 (ffff88006f2206c0) refcount = 3
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-OST000f-osc-ffff8808301ba000 lock: ffff880366080940/0xedab12f62583edad lrc: 3/0,1 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x526400000000 nid: local remote: 0x6dd9a1f2125effbd expref: -99 pid: 129953 timeout: 0 lvb_type: 1
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1523:ldlm_resource_dump()) Waiting locks:
lola-26.log:Jan 7 19:52:06 lola-26 kernel: LustreError: 130892:0:(ldlm_resource.c:1525:ldlm_resource_dump()) ### ### ns: soaked-OST000f-osc-ffff8808301ba000 lock: ffff880a0b5668c0/0xedab12f62583edbb lrc: 4/0,1 mode: --/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x106400020000 nid: local remote: 0x6dd9a1f2125effc4 expref: -99 pid: 129954 timeout: 0 lvb_type: 1
OSS
lola-5.log:Jan 7 19:51:46 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.126@o2ib100 ns: filter-soaked-OST000f_UUID lock: ffff880341ac0300/0x6dd9a1f2125effbd lrc: 3/0,0 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000000020 nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edad expref: 6 pid: 18388 timeout: 4385304734 lvb_type: 0
lola-5.log:Jan 7 19:51:46 lola-5 kernel: LustreError: 17976:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8802b92190c0 x1522673517776812/t0(0) o105->soaked-OST000f@192.168.1.126@o2ib100:15/16 lens 360/224 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
lola-5.log:Jan 7 19:51:46 lola-5 kernel: LustreError: 17976:0:(ldlm_lockd.c:689:ldlm_handle_ast_error()) ### client (nid 192.168.1.126@o2ib100) failed to reply to completion AST (req status 0 rc -5), evict it ns: filter-soaked-OST000f_UUID lock: ffff8803e925c3c0/0x6dd9a1f2125effc4 lrc: 3/0,0 mode: PW/PW res: [0x7c0000401:0x18fbb85:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x40000000020000 nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edbb expref: 4 pid: 18388 timeout: 0 lvb_type: 0
lola-5.log:Jan 7 19:51:47 lola-5 kernel: LustreError: 17976:0:(ldlm_lockd.c:689:ldlm_handle_ast_error()) Skipped 4 previous similar messages |
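The client and OSS excerpts above can be correlated through the lock handles: each dump prints the local handle after the `lock:` pointer and the peer's handle after `remote:`. The following is a minimal sketch for doing that matching, assuming the line layout seen in these excerpts. `lock_handles` and `pair_locks` are hypothetical helpers, not Lustre tools, and the sample lines are trimmed copies of the log lines above.

```python
import re

# Assumption: a lock dump contains "lock: <pointer>/<local handle>" and,
# later on the same line, "remote: <peer handle>", as in the excerpts above.
LOCK_RE = re.compile(r"lock: [0-9a-f]+/(0x[0-9a-f]+) .*? remote: (0x[0-9a-f]+)")


def lock_handles(line):
    """Return (local_handle, remote_handle) for a lock dump line, or None."""
    m = LOCK_RE.search(line)
    return (m.group(1), m.group(2)) if m else None


def pair_locks(client_lines, server_lines):
    """Return (client_handle, server_handle) pairs where each side names the other."""
    server = {}
    for line in server_lines:
        handles = lock_handles(line)
        if handles:
            server[handles[0]] = handles[1]
    pairs = []
    for line in client_lines:
        handles = lock_handles(line)
        # A pair matches only if the server lock points back at the client lock.
        if handles and server.get(handles[1]) == handles[0]:
            pairs.append(handles)
    return pairs


# Trimmed copies of the lola-26 (client) and lola-5 (OSS) lines quoted above.
client_lines = [
    "lock: ffff880366080940/0xedab12f62583edad lrc: 3/0,1 mode: PW/PW "
    "nid: local remote: 0x6dd9a1f2125effbd expref: -99 pid: 129953",
    "lock: ffff880a0b5668c0/0xedab12f62583edbb lrc: 4/0,1 mode: --/PW "
    "nid: local remote: 0x6dd9a1f2125effc4 expref: -99 pid: 129954",
]
server_lines = [
    "lock: ffff880341ac0300/0x6dd9a1f2125effbd lrc: 3/0,0 mode: PW/PW "
    "nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edad expref: 6",
    "lock: ffff8803e925c3c0/0x6dd9a1f2125effc4 lrc: 3/0,0 mode: PW/PW "
    "nid: 192.168.1.126@o2ib100 remote: 0xedab12f62583edbb expref: 4",
]

pairs = pair_locks(client_lines, server_lines)
for client_handle, server_handle in pairs:
    print(client_handle, "<->", server_handle)
```

For these excerpts, the granted client lock 0xedab12f62583edad pairs with the OSS lock whose callback timer expired, and the waiting client lock 0xedab12f62583edbb pairs with the OSS lock whose completion AST failed.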
| Comment by Åke Sandgren [ 11/Nov/16 ] |
|
Hi! Is there any progress on this problem? We are getting hit by a problem that, judging from the error messages, is a good match, except that there is no zfs involved here. We're running 2.8.56 + the fixes for (The clients also have the tentative fix for |
| Comment by Jinshan Xiong (Inactive) [ 08/Feb/18 ] |
|
close old tickets |