Lustre / LU-15185

sanityn test_77c: Error: 'dd (write) failed (2)'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for Chris Horn <hornc@cray.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a207aa40-96c4-434c-bd4d-eef570e96859

      test_77c failed with the following error:

      dd (write) failed (2)
      
      trevis-54vm1: 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00923709 s, 114 MB/s
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6332:error()
        = /usr/lib64/lustre/tests/sanityn.sh:3735:nrs_write_read()
        = /usr/lib64/lustre/tests/sanityn.sh:3794:orr_trr()
        = /usr/lib64/lustre/tests/sanityn.sh:3819:test_77c()
        = /usr/lib64/lustre/tests/test-framework.sh:6636:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:6683:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:6524:run_test()
        = /usr/lib64/lustre/tests/sanityn.sh:3824:main()
      Dumping lctl log to /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.*.1635570109.log
      CMD: trevis-54vm1.trevis.whamcloud.com,trevis-54vm2,trevis-54vm3,trevis-54vm4,trevis-54vm5 /usr/sbin/lctl dk > /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.debug_log.\$(hostname -s).1635570109.log;
      		dmesg > /autotest/autotest-2/2021-10-29/lustre-reviews_review-dne-zfs-part-5_83998_1_14_304f7726-8602-463b-b32a-48a2b320195d//sanityn.test_77c.dmesg.\$(hostname -s).1635570109.log
      

      The cluster hit network errors during the test (request timeouts, lost OST connections, and a client eviction). The cause is not clear:

      [17983.580383] Lustre: DEBUG MARKER: declare -a pids_r;
                     		for ((i = 0; i [...] lustre-OST0003-osc-ffff8ae7a6f75800@10.9.6.102@tcp:17/18 lens 328/224 e 0 to 1 dl 1635570078 ref 2 fl Rpc:Xr/0/ffffffff rc 0/-1 job:'ldlm_bl_09.0'
      [18040.721602] Lustre: lustre-OST0003-osc-ffff8ae7a6f75800: Connection to lustre-OST0003 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [18057.099196] Lustre: 196580:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1635570036/real 0]  req@00000000d56fcfbd x1715011554636160/t0(0) o400->lustre-OST0000-osc-ffff8ae7a6f75800@10.9.6.102@tcp:28/4 lens 224/224 e 0 to 1 dl 1635570093 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'
      [18057.104557] Lustre: 196580:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      [18057.106345] Lustre: lustre-OST0000-osc-ffff8ae7a6f75800: Connection to lustre-OST0000 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [18067.338589] Lustre: 196579:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1635570046/real 0]  req@00000000a0cd1f2a x1715011554638208/t0(0) o400->lustre-OST0000-osc-ffff8ae7a6f75800@10.9.6.102@tcp:28/4 lens 224/224 e 0 to 1 dl 1635570104 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'
      [18067.344444] Lustre: 196579:0:(client.c:2290:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      [18067.346517] Lustre: lustre-OST0002-osc-ffff8ae7a6f75800: Connection to lustre-OST0002 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [18067.349960] Lustre: Skipped 4 previous similar messages
      [18067.838060] Lustre: Evicted from lustre-OST0000_UUID (at 10.9.6.102@tcp) after server handle changed from 0x44b76394139bb0ba to 0x44b76394139c0ce7
      [18067.842456] LustreError: 167-0: lustre-OST0000-osc-ffff8ae7a6f75800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
      [18067.850084] Lustre: 196577:0:(llite_lib.c:3360:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.6.103@tcp:/lustre/fid: [0x200000404:0xb32:0x0]// may get corrupted (rc -5)
      [18067.850086] Lustre: 196579:0:(llite_lib.c:3360:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.6.103@tcp:/lustre/fid: [0x200000404:0xb33:0x0]// may get corrupted (rc -5)
      [18067.869310] Lustre: lustre-OST0003-osc-ffff8ae7a6f75800: Connection restored to 10.9.6.102@tcp (at 10.9.6.102@tcp)
      [18067.878592] LustreError: 488862:0:(ldlm_resource.c:1124:ldlm_resource_complain()) lustre-OST0002-osc-ffff8ae7a6f75800: namespace resource [0x184:0x0:0x0].0x0 (000000008626a371) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [18067.885709] LustreError: 488862:0:(ldlm_resource.c:1124:ldlm_resource_complain()) Skipped 3 previous similar messages
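
To make the sequence in the log above easier to follow, here is a rough triage sketch that reduces client dmesg output to a short event list: request timeouts, lost connections, evictions, and restored connections. It is not part of the Lustre test framework. The function name `summarize`, the event field names, and the regular expressions are assumptions written against the message formats quoted in this ticket, and may need adjusting for other Lustre versions.

```python
import re

# Kernel timestamp prefix, e.g. "[18057.099196]".
TS = r"\[\s*(?P<ts>\d+\.\d+)\]"

# Hypothetical patterns, derived only from the messages quoted in this ticket.
PATTERNS = {
    "expired": re.compile(
        TS + r".*ptlrpc_expire_one_request\(\)\).*\[sent (?P<sent>\d+)/real \d+\]"
        r".*->(?P<dev>\S+?)@.* dl (?P<dl>\d+) "
    ),
    "lost": re.compile(
        TS + r" Lustre: (?P<dev>\S+): Connection to \S+ \(at \S+\) was lost"
    ),
    "evicted": re.compile(
        TS + r" LustreError: 167-0: (?P<dev>\S+): This client was evicted"
    ),
    "restored": re.compile(TS + r" Lustre: (?P<dev>\S+): Connection restored"),
}


def summarize(lines):
    """Return one dict per recognized event, in log order."""
    events = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            event = {"ts": float(match["ts"]), "kind": kind, "dev": match["dev"]}
            if kind == "expired":
                # Seconds between the send time and the request deadline.
                event["window_s"] = int(match["dl"]) - int(match["sent"])
            events.append(event)
            break
    return events


# Three lines copied from the log in this ticket.
SAMPLE = [
    "[18040.721602] Lustre: lustre-OST0003-osc-ffff8ae7a6f75800: Connection to lustre-OST0003 (at 10.9.6.102@tcp) was lost; in progress operations using this service will wait for recovery to complete",
    "[18057.099196] Lustre: 196580:0:(client.c:2290:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1635570036/real 0]  req@00000000d56fcfbd x1715011554636160/t0(0) o400->lustre-OST0000-osc-ffff8ae7a6f75800@10.9.6.102@tcp:28/4 lens 224/224 e 0 to 1 dl 1635570093 ref 2 fl Rpc:XNr/0/ffffffff rc 0/-1 job:'kworker/u4:1.0'",
    "[18067.842456] LustreError: 167-0: lustre-OST0000-osc-ffff8ae7a6f75800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.",
]

if __name__ == "__main__":
    for event in summarize(SAMPLE):
        print(event)
```

Running the same function over the full dmesg log of each client and server node would show whether every affected OSC device points at the same server NID, which in the excerpt above is 10.9.6.102@tcp for all of the timeouts and lost connections.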
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanityn test_77c - dd (write) failed (2)

          People

            wc-triage WC Triage
            maloo Maloo