Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for Chris Horn <hornc@cray.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a207aa40-96c4-434c-bd4d-eef570e96859
test_77c failed with the following error:
dd (write) failed (2)
trevis-54vm1: 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00923709 s, 114 MB/s
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:6332:error()
= /usr/lib64/lustre/tests/sanityn.sh:3735:nrs_write_read()
= /usr/lib64/lustre/tests/sanityn.sh:3794:orr_trr()
= /usr/lib64/lustre/tests/sanityn.sh:3819:test_77c()
= /usr/lib64/lustre/tests/test-framework.sh:6636:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6683:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6524:run_test()
= /usr/lib64/lustre/tests/sanityn.sh:3824:main()
Dumping lctl log to /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.*.1635570109.log
CMD: trevis-54vm1.trevis.whamcloud.com,trevis-54vm2,trevis-54vm3,trevis-54vm4,trevis-54vm5 /usr/sbin/lctl dk > /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.debug_log.\$(hostname -s).1635570109.log; dmesg > /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.dmesg.\$(hostname -s).1635570109.log
The cluster hit network errors during the test, and the cause is not clear. Client console log:
[17983.580383] Lustre: DEBUG MARKER: declare -a pids_r;
for ((i = 0; i lustre-OST0003-osc-ffff8ae7a6f75800@10.9.6.102@tcp:17/18 lens 328/224 e 0 to 1 dl 1635570078 ref 2 fl Rpc:Xr/0/ffffffff rc 0/-1 job:'ldlm_bl_09.0'
[18040.721602] Lustre: lustre-OST0003-osc-ffff8ae7a6f75800: Connection to lustre-OST0003 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
[18057.099196] Lustre: 196580:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1635570036/real 0] req@00000000d56fcfbd x1715011554636160/t0(0) o400->lustre-OST0000-osc-ffff8ae7a6f75800@10.9.6.102@tcp:28/4 lens 224/224 e 0 to 1 dl 1635570093 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'
[18057.104557] Lustre: 196580:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[18057.106345] Lustre: lustre-OST0000-osc-ffff8ae7a6f75800: Connection to lustre-OST0000 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
[18067.338589] Lustre: 196579:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1635570046/real 0] req@00000000a0cd1f2a x1715011554638208/t0(0) o400->lustre-OST0000-osc-ffff8ae7a6f75800@10.9.6.102@tcp:28/4 lens 224/224 e 0 to 1 dl 1635570104 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'
[18067.344444] Lustre: 196579:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[18067.346517] Lustre: lustre-OST0002-osc-ffff8ae7a6f75800: Connection to lustre-OST0002 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
[18067.349960] Lustre: Skipped 4 previous similar messages
[18067.838060] Lustre: Evicted from lustre-OST0000_UUID (at 10.9.6.102@tcp) after server handle changed from 0x44b76394139bb0ba to 0x44b76394139c0ce7
[18067.842456] LustreError: 167-0: lustre-OST0000-osc-ffff8ae7a6f75800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
[18067.850084] Lustre: 196577:0:(llite_lib.c:3360:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.6.103@tcp:/lustre/fid: [0x200000404:0xb32:0x0]// may get corrupted (rc -5)
[18067.850086] Lustre: 196579:0:(llite_lib.c:3360:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.6.103@tcp:/lustre/fid: [0x200000404:0xb33:0x0]// may get corrupted (rc -5)
[18067.869310] Lustre: lustre-OST0003-osc-ffff8ae7a6f75800: Connection restored to 10.9.6.102@tcp (at 10.9.6.102@tcp)
[18067.878592] LustreError: 488862:0:(ldlm_resource.c:1124:ldlm_resource_complain()) lustre-OST0002-osc-ffff8ae7a6f75800: namespace resource [0x184:0x0:0x0].0x0 (000000008626a371) refcount nonzero (1) after lock cleanup; forcing cleanup.
[18067.885709] LustreError: 488862:0:(ldlm_resource.c:1124:ldlm_resource_complain()) Skipped 3 previous similar messages
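For triage, it may help to reduce a client console log like the one above to a per-OST timeline of connection events. The following is only a rough sketch: the helper name `summarize` and the regular expressions are assumptions written against the message wording quoted in this ticket, not part of Lustre or the test framework, and other Lustre versions may word these messages differently.

```python
import re

# Hypothetical patterns, derived only from the console messages quoted in this ticket.
EVENT_PATTERNS = [
    ("lost", re.compile(r"Connection to (lustre-OST[0-9a-f]{4}) .* was lost")),
    ("evicted", re.compile(r"This client was evicted by (lustre-OST[0-9a-f]{4})")),
    ("restored", re.compile(r"(lustre-OST[0-9a-f]{4})-osc-\S+: Connection restored")),
]
# Kernel timestamp prefix, e.g. "[18040.721602]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize(lines):
    """Return (timestamp, target, event) tuples for lost/evicted/restored messages."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        if not ts:
            continue  # skip lines without a kernel timestamp
        for kind, pattern in EVENT_PATTERNS:
            m = pattern.search(line)
            if m:
                events.append((float(ts.group(1)), m.group(1), kind))
                break
    return events


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize.py < sanityn.test_77c.dmesg.<host>.log
    for ts, target, kind in summarize(sys.stdin):
        print(f"{ts:14.6f}  {target}  {kind}")
```

Run against the excerpt above, this would list the lost, evicted, and restored events for each OST in timestamp order, which makes it easier to see that all affected targets sit behind the same NID (10.9.6.102@tcp).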
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_77c - dd (write) failed (2)