Details
Bug
Resolution: Fixed
Blocker
None
Lustre 2.1.1
None
Hyperion - RHEL 5
3
6445
Description
mds-recovery fails; a single client is evicted.
Client:
---------------
Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
Lustre: DEBUG MARKER: mds has failed over 2 times, and counting...
LustreError: 11-0: an error occurred while communicating with 192.168.120.126@o2ib. The ldlm_enqueue operation failed with -107
Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection to service lustre-MDT0000 via nid 192.168.120.126@o2ib was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.7)
LustreError: 20567:0:(mdc_locks.c:652:mdc_enqueue()) ldlm_cli_enqueue error: -4
LustreError: 20567:0:(file.c:3329:ll_inode_revalidate_fini()) failure -4 inode 222298113
LustreError: 20742:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff810217f9ec00 x1394757178766278/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: lustre-MDT0000-mdc-ffff81021ccdac00: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
Lustre: DEBUG MARKER: Duration: 86400
LustreError: 17920:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) 192.168.117.3@o2ib rejected: o2iblnd fatal error
LustreError: 17920:0:(o2iblnd_cb.c:2532:kiblnd_rejected()) Skipped 39 previous similar messages
MDS:
---------------
Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
Lustre: lustre-MDT0000: sending delayed replies to recovered clients
Lustre: 25439:0:(mds_lov.c:1024:mds_notify()) MDS mdd_obd-lustre-MDT0000: in recovery, not resetting orphans on lustre-OST0000_UUID
Lustre: 25439:0:(mds_lov.c:1024:mds_notify()) Skipped 7 previous similar messages
Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0004_UUID now active, resetting orphans
Lustre: Skipped 7 previous similar messages
Lustre: DEBUG MARKER: mds has failed over 2 times, and counting...
md: rebuild md1 throttled due to IO
LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 192.168.114.116@o2ib ns: mdt-ffff81091498f800 lock: ffff810fbf9f66c0/0xcb280298ce1d3c25 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 217 type: IBT flags: 0x20 remote: 0x3c5e7588abacbec3 expref: 8 pid: 25553 timeout: 4299068451
LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 192.168.114.51@o2ib ns: mdt-ffff81091498f800 lock: ffff810fbf9f6480/0xcb280298ce1d3c17 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 217 type: IBT flags: 0x20 remote: 0x5711f697b9a89693 expref: 8 pid: 25553 timeout: 4299068451
LustreError: 25588:0:(ldlm_lockd.c:1210:ldlm_handle_enqueue0()) ### lock on destroyed export ffff81054ec6c000 ns: mdt-ffff81091498f800 lock: ffff810cef6a4480/0xcb280298ce1d3f2e lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 193 type: IBT flags: 0x4000000 remote: 0xfb40c962a891f585 expref: 3 pid: 25588 timeout: 0
LustreError: 25588:0:(ldlm_lib.c:2129:target_send_reply_msg()) @@@ processing error (-107) req@ffff810397453000 x1394757210221710/t0(0) o-1->7a66717e-dbe2-1092-ecee-6263c3bca713@NET_0x50000c0a8728f_UUID:0/0 lens 544/536 e 2 to 0 dl 1330146049 ref 1 fl Interpret:/ffffffff/ffffffff rc -107/-1
LustreError: 25616:0:(ldlm_lockd.c:1210:ldlm_handle_enqueue0()) ### lock on destroyed export ffff810550486000 ns: mdt-ffff81091498f800 lock: ffff8105542e5d80/0xcb280298ce1d40d2 lrc: 3/0,0 mode: PR/PR res: 222298113/3922531948 bits 0x3 rrc: 168 type: IBT flags: 0x4000000 remote: 0x3c5e7588abacbed1 expref: 3 pid: 25616 timeout: 0
LustreError: 25588:0:(ldlm_lib.c:2129:target_send_reply_msg()) Skipped 96 previous similar messages
Lustre: 25570:0:(ldlm_lib.c:877:target_handle_connect()) lustre-MDT0000: connection from bb5f6103-fd47-8201-1084-9a41a87168fe@192.168.114.116@o2ib t8590090887 exp 0000000000000000 cur 1330145971 last 0
Lustre: 25570:0:(ldlm_lib.c:877:target_handle_connect()) Skipped 127 previous similar messages
Lustre: 25582:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import lustre-MDT0000->NET_0x50000c0a87291_UUID netid 50000: select flavor null
Lustre: 25582:0:(sec.c:1474:sptlrpc_import_sec_adapt()) Skipped 136 previous similar messages
Lustre: DEBUG MARKER: Duration: 86400
md: rebuild md1 throttled due to IO
md: rebuild md1 throttled due to IO
md: rebuild md1 throttled due to IO
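When triaging a log like the MDS excerpt above, it can help to pull out which client NIDs were evicted by the lock callback timer. The following is only a rough sketch, not part of the original report: the function name evicted_clients and the regular expression are assumptions made for illustration, and the sample text is a trimmed copy of the two waiting_locks_callback() messages from the MDS log.

```python
import re

# Hypothetical helper, not from the ticket: matches the eviction message
# printed by waiting_locks_callback() in the MDS log excerpt.
EVICTION_RE = re.compile(
    r"lock callback timer expired after (\d+)s: evicting client at (\S+)"
)


def evicted_clients(log_text):
    """Return (nid, timer_seconds) for every eviction message in log_text."""
    return [(nid, int(secs)) for secs, nid in EVICTION_RE.findall(log_text)]


# Trimmed copies of the two eviction messages from the MDS log.
sample = (
    "LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock "
    "callback timer expired after 150s: evicting client at "
    "192.168.114.116@o2ib ns: mdt-ffff81091498f800 "
    "LustreError: 0:0:(ldlm_lockd.c:356:waiting_locks_callback()) ### lock "
    "callback timer expired after 150s: evicting client at "
    "192.168.114.51@o2ib ns: mdt-ffff81091498f800"
)

for nid, secs in evicted_clients(sample):
    print(nid, secs)
```

Because the pattern is applied with findall over the whole text, it works whether the log is stored one message per line or as a single unbroken run.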