[LU-5496] fix for LU-5266 Created: 15/Aug/14 Updated: 28/Apr/15 Resolved: 02/Oct/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Vitaly Fertman | Assignee: | Li Wei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 15337 | ||||||||
| Description |
|
the fix in |
| Comments |
| Comment by Vitaly Fertman [ 15/Aug/14 ] |
| Comment by Peter Jones [ 15/Aug/14 ] |
|
Thanks Vitaly! |
| Comment by Li Wei (Inactive) [ 20/Aug/14 ] |
|
For the record, one way this may manifest on the client is:

Aug 19 19:13:34 lola-24 kernel: Lustre: 3728:0:(client.c:1926:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1408500806/real 1408500806] req@ffff880ff523c800 x1476487427817788/t0(0) o101->soaked-OST0000-osc-ffff8810329a9800@192.168.1.102@o2ib:28/4 lens 328/400 e 0 to 1 dl 1408500814 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
Aug 19 19:13:34 lola-24 kernel: Lustre: soaked-OST0000-osc-ffff8810329a9800: Connection to soaked-OST0000 (at 192.168.1.102@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Aug 19 19:13:34 lola-24 kernel: Lustre: soaked-OST0000-osc-ffff8810329a9800: Connection restored to soaked-OST0000 (at 192.168.1.102@o2ib)
Aug 19 19:13:34 lola-24 kernel: LustreError: 11-0: soaked-OST0000-osc-ffff8810329a9800: Communicating with 192.168.1.102@o2ib, operation ldlm_enqueue failed with -12.

On the OSS:

00010000:00010000:28.0:1408500814.882478:0:5651:0:(ldlm_lockd.c:1268:ldlm_handle_enqueue0()) @@@ found existing lock cookie 0x840d55bbc87132f5 req@ffff88082f115050 x1476487427817788/t0(0) o101->c1d7cd54-55f6-0482-0887-cf6de8216f19@192.168.1.124@o2ib1:0/0 lens 328/0 e 0 to 0 dl 1408500821 ref 1 fl Interpret:/2/ffffffff rc 0/-1
[...]
00010000:00000001:28.0:1408500814.882522:0:5651:0:(ldlm_lock.c:441:ldlm_lock_destroy_nolock()) Process leaving
00010000:00000001:28.0:1408500814.882523:0:5651:0:(ldlm_lock.c:1685:ldlm_lock_enqueue()) Process leaving via out (rc=4294967284 : 4294967284 : 0xfffffff4)
00010000:00000001:28.0:1408500814.882525:0:5651:0:(ldlm_lockd.c:1338:ldlm_handle_enqueue0()) Process leaving via out (rc=4294967284 : 4294967284 : 0xfffffff4)
00010000:00010000:28.0:1408500814.882529:0:5651:0:(ldlm_lockd.c:1422:ldlm_handle_enqueue0()) ### server-side enqueue handler, sending reply(err=-12, rc=-12) ns: filter-soaked-OST0000_UUID lock: ffff880823f1bbc0/0x840d55bbc87132f5 lrc: 1/0,0 mode: PW/PW res: [0x236111:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x44000000000000 nid: 192.168.1.124@o2ib1 remote: 0xcc587d962f96a2f5 expref: 1019 pid: 5651 timeout: 0 lvb_type: 0

The ENOMEM comes from here in ldlm_lock_enqueue():

    ldlm_resource_unlink_lock(lock);
    if (res->lr_type == LDLM_EXTENT && lock->l_tree_node == NULL) {
            if (node == NULL) {
                    ldlm_lock_destroy_nolock(lock);
                    GOTO(out, rc = -ENOMEM);
            }
            CFS_INIT_LIST_HEAD(&node->li_group);
            ldlm_interval_attach(node, lock);
            node = NULL;
    }
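
The "rc=4294967284 : 4294967284 : 0xfffffff4" values in the OSS debug log and the "-12" reported on the client are the same error code: the trace prints the return value as an unsigned number, and reinterpreting its low 32 bits as a signed integer gives -12, which is -ENOMEM on Linux. The following is only a minimal sketch of that conversion; rc_to_errno() and rc_to_errno_selfcheck() are hypothetical helpers written for illustration and are not part of Lustre.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Lustre function): reinterpret the low 32 bits
 * of a return code printed as unsigned in a debug trace as a signed,
 * two's-complement errno-style value. Written without relying on
 * implementation-defined unsigned-to-signed casts.
 */
static int32_t rc_to_errno(uint32_t raw)
{
	if (raw & 0x80000000u)
		/* Negative value: -(~raw + 1), computed without overflow. */
		return -(int32_t)(~raw) - 1;
	return (int32_t)raw;
}

/* Checks the value seen in the log above; assumes Linux, where ENOMEM is 12. */
static void rc_to_errno_selfcheck(void)
{
	assert(rc_to_errno(0xfffffff4u) == -12);
	assert(rc_to_errno(0xfffffff4u) == -ENOMEM);
	assert(rc_to_errno(0u) == 0);
}
```

Calling rc_to_errno(4294967284u) on the decimal form from the trace gives the same result as the hexadecimal form, since both denote the same 32-bit pattern.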
|
| Comment by Peter Jones [ 28/Aug/14 ] |
|
Landed for 2.7 |
| Comment by Vitaly Fertman [ 28/Aug/14 ] |
|
Heh, I did not manage to submit the 2nd version before the first one landed, so here is a separate patch: |
| Comment by Peter Jones [ 28/Aug/14 ] |
|
Heh. ok. |
| Comment by Jodi Levi (Inactive) [ 02/Oct/14 ] |
|
Patches have landed to Master |