Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0, Lustre 1.8.9
-
Lustre Client: b1_8
Lustre Client Build: http://build.whamcloud.com/job/lustre-b1_8/245/
Lustre Server: master
Lustre Server Build: http://build.whamcloud.com/job/lustre-master/1172/
Distro/Arch: RHEL6.3/x86_64
-
3
-
6181
Description
While running the runtests test on Lustre b1_8 clients against master servers, the test failed as follows:
copying /etc/hosts to /mnt/lustre/hosts.9085 again
cp: writing `/mnt/lustre/hosts.9085': Input/output error
 runtests : @@@@@@ FAIL: can't cp /etc/hosts to /mnt/lustre/hosts.9085 again 6
Dmesg on the client node client-12vm1 showed:
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.9085 again
LustreError: 11-0: an error occurred while communicating with 10.10.4.209@tcp. The obd_ping operation failed with -107
Lustre: lustre-OST0000-osc-ffff88007cea8800: Connection to service lustre-OST0000 via nid 10.10.4.209@tcp was lost; in progress operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
Lustre: Server lustre-OST0000_UUID version (2.3.58.0) is much newer than client version (1.8.8.60)
Lustre: Skipped 7 previous similar messages
LustreError: 10880:0:(ldlm_resource.c:519:ldlm_namespace_cleanup()) Namespace lustre-OST0000-osc-ffff88007cea8800 resource refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 10880:0:(ldlm_resource.c:524:ldlm_namespace_cleanup()) Resource: ffff88007b9de380 (1/0/0/0) (rc: 1)
Lustre: lustre-OST0000-osc-ffff88007cea8800: Connection restored to service lustre-OST0000 using nid 10.10.4.209@tcp.
LustreError: 10869:0:(lov_request.c:211:lov_update_enqueue_set()) enqueue objid 0x2 subobj 0x1 on OST idx 0: rc -5
Lustre: DEBUG MARKER: /usr/sbin/lctl mark runtests : @@@@@@ FAIL: can\'t cp \/etc\/hosts to \/mnt\/lustre\/hosts.9085 again 6
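The two negative return codes in the client log are Linux errno values: -107 is ENOTCONN and -5 is EIO, the latter matching the "Input/output error" that cp reported. As a reading aid, here is a minimal Python sketch that pulls such codes out of log lines. The regular expression, the function name, and the two-entry table are illustrative assumptions, not part of Lustre or its test suite:

```python
import re

# Linux errno values seen in the client log (hand-written table so the
# sketch does not depend on the errno numbering of the machine running it).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Hypothetical pattern: matches "failed with -107" and "rc -5" style fragments.
RC_PATTERN = re.compile(r"(?:failed with|rc:?)\s+-(\d+)")


def decode_return_codes(line):
    """Return (code, name, description) for each negative rc found in line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        results.append((-code, name, text))
    return results


if __name__ == "__main__":
    samples = [
        "LustreError: 11-0: an error occurred while communicating with "
        "10.10.4.209@tcp. The obd_ping operation failed with -107",
        "LustreError: 10869:0:(lov_request.c:211:lov_update_enqueue_set()) "
        "enqueue objid 0x2 subobj 0x1 on OST idx 0: rc -5",
    ]
    for sample in samples:
        print(decode_return_codes(sample))
```

Positive values such as "(rc: 1)" in the ldlm_namespace_cleanup() message are deliberately not matched, since only negative codes denote errors here.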
Dmesg on the OSS node client-12vm4 showed:
Lustre: DEBUG MARKER: copying /etc/hosts to /mnt/lustre/hosts.9085 again
LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### lock callback timer expired after 151s: evicting client at 10.10.4.206@tcp ns: filter-ffff880037b92000 lock: ffff88007b5f6000/0xf7c07c2f873c39f lrc: 3/0,0 mode: PR/PR res: 1/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 nid: 10.10.4.206@tcp remote: 0x904b6ab232a7b36 expref: 5 pid: 4268 timeout: 4296500377 lvb_type: 1
Lustre: DEBUG MARKER: /usr/sbin/lctl mark runtests : @@@@@@ FAIL: can\'t cp \/etc\/hosts to \/mnt\/lustre\/hosts.9085 again 6
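The OSS message records that the client at 10.10.4.206@tcp was evicted after a lock callback timer ran for 151s, which lines up with the "This client was evicted by lustre-OST0000" message on the client side. When scanning many such logs, the two fields can be extracted with a small Python sketch like the one below. The pattern and the function name are hypothetical and are written only against the message text quoted in this ticket:

```python
import re

# Hypothetical pattern based on the waiting_locks_callback() message text
# quoted in this ticket; other Lustre versions may word it differently.
EVICT_PATTERN = re.compile(
    r"lock callback timer expired after (?P<seconds>\d+)s: "
    r"evicting client at (?P<nid>\S+)"
)


def parse_eviction(line):
    """Return (timeout_seconds, client_nid), or None if the line does not match."""
    match = EVICT_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group("seconds")), match.group("nid")


if __name__ == "__main__":
    sample = (
        "LustreError: 0:0:(ldlm_lockd.c:391:waiting_locks_callback()) ### "
        "lock callback timer expired after 151s: evicting client at "
        "10.10.4.206@tcp ns: filter-ffff880037b92000"
    )
    print(parse_eviction(sample))
```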
Maloo report: https://maloo.whamcloud.com/test_sets/9ad0fc8a-6181-11e2-be04-52540035b04c