LU-12656

sanity-flr test 200 fails with 'failed writing to *:*'


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.5, Lustre 2.12.7
    • SLES12SP4, RHEL 8.0 and Ubuntu 18.04 clients
    • 3

    Description

      sanity-flr test_200 fails with messages similar to 'failed writing to 1540352:850912'. The test appears to be failing on clients running "new" kernels, specifically Ubuntu 18.04 and RHEL8. The failures started around 5 May 2019.

      The client test_log for https://testing.whamcloud.com/test_sets/f6a78ce6-b5bb-11e9-b88c-52540065bddc shows many resync failures and a write 'Input/output error':

      == sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
      Starting client: trevis-39vm5.trevis.whamcloud.com:  -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
      CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre2
      CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre2
      Starting client: trevis-39vm5.trevis.whamcloud.com:  -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
      CMD: trevis-39vm5.trevis.whamcloud.com mkdir -p /mnt/lustre3
      CMD: trevis-39vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-39vm8@tcp:/lustre /mnt/lustre3
      fail_loc=0x1A03
      CMD: trevis-39vm8 /usr/sbin/lctl set_param fail_loc=0x1A03
      fail_loc=0x1A03
      resync file /mnt/lustre3/f200.sanity-flr with 'mirror_io resync -e resync_start' ..Extending file size to 2917280 ..
      Extending file size to 5634016 ..
      failed
      …
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      resync file /mnt/lustre3/f200.sanity-flr with '/usr/bin/lfs mirror resync' ..failed
      write: Input/output error
       sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912 
       sanity-flr test_200: @@@@@@ FAIL: read failed 
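
When triaging several runs, it can help to count the resync failures per command instead of reading them off by eye. The following is a minimal sketch, not part of the Lustre test framework: the function name `tally_resync_failures` is hypothetical, and it assumes the log layout shown in the excerpt above, where the outcome is either appended to the "resync file" line or printed on a line by itself.

```python
import re
from collections import Counter

# Matches the start of a resync attempt and captures the quoted command.
RESYNC_RE = re.compile(r"resync file \S+ with '([^']+)'")


def tally_resync_failures(log_text):
    """Count resync attempts and failures per command in a client test_log.

    Assumes the layout seen in the excerpt: "failed" is either appended to
    the resync line itself or appears later on a line of its own.
    """
    attempts, failures = Counter(), Counter()
    pending = None  # command whose outcome has not been seen yet
    for raw in log_text.splitlines():
        line = raw.strip()
        match = RESYNC_RE.search(line)
        if match:
            pending = match.group(1)
            attempts[pending] += 1
        # Only a bare "failed" line or a resync line ending in "failed"
        # counts, so unrelated lines such as "read failed" are ignored.
        same_line_failure = match is not None and line.endswith("failed")
        if pending and (line == "failed" or same_line_failure):
            failures[pending] += 1
            pending = None
    return attempts, failures
```

Both return values are `Counter` objects keyed by the quoted command string. Lines elided from an excerpt (the `…` above) are naturally not counted, so totals are only meaningful on the full test_log.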
      

      There are errors in the OSS console log:

      [33656.735317] Lustre: DEBUG MARKER: == sanity-flr test 200: stress test ================================================================== 03:20:28 (1564716028)
      [33746.465968] Lustre: lustre-OST0006: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
      [33746.468305] Lustre: Skipped 2 previous similar messages
      [33746.691754] LustreError: 21525:0:(ldlm_lib.c:3255:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff9f4601388050 x1640713887127712/t0(0) o3->8f96d56b-3831-a498-7a9f-598dabb943ba@10.9.5.223@tcp:548/0 lens 488/440 e 0 to 0 dl 1564716153 ref 1 fl Interpret:/0/0 rc 0/0
      [33746.695913] Lustre: lustre-OST0006: Bulk IO read error with 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp), client will retry: rc -110
      [33747.468651] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1564716112/real 1564716112]  req@ffff9f45f8771f80 x1640688537737184/t0(0) o104->lustre-OST0001@10.9.5.223@tcp:15/16 lens 296/224 e 0 to 1 dl 1564716119 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
      [33747.473941] Lustre: 32287:0:(client.c:2134:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      [33752.042829] Lustre: lustre-OST0005: Client 8f96d56b-3831-a498-7a9f-598dabb943ba (at 10.9.5.223@tcp) reconnecting
      [33752.044858] Lustre: Skipped 2 previous similar messages
      [33797.383872] Lustre: lustre-OST0002: Client 9b851642-fac7-baed-c494-00770d32c258 (at 10.9.5.223@tcp) reconnecting
      [33797.384269] Lustre: lustre-OST0002: Connection restored to 577b503a-fd94-6a91-d5f9-abcccfabd52d (at 10.9.5.223@tcp)
      [33797.384271] Lustre: Skipped 63 previous similar messages
      [33797.388702] Lustre: Skipped 8 previous similar messages
      [33847.309939] LustreError: 30819:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 106s: evicting client at 10.9.5.223@tcp  ns: filter-lustre-OST0000_UUID lock: ffff9f45d47c4fc0/0xc82ebda2297890c1 lrc: 3/0,0 mode: PR/PR res: [0x480c:0x0:0x0].0x0 rrc: 6 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x60000400000020 nid: 10.9.5.223@tcp remote: 0xb3340d303f93846 expref: 9 pid: 19643 timeout: 33849 lvb_type: 1
      [33849.577110] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-flr test_200: @@@@@@ FAIL: failed writing to 1540352:850912 
      [33849.578735] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-flr test_200: @@@@@@ FAIL: read failed 
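
To see how long after the test start each server-side error occurred, the kernel timestamps can be compared against the DEBUG MARKER line. This is a minimal sketch under stated assumptions: the function name `errors_since_marker` and its `marker` parameter are hypothetical, and it assumes every console line begins with a `[seconds.micros]` timestamp as in the excerpt above.

```python
import re

# Kernel console lines start with "[seconds.micros] " followed by the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def errors_since_marker(console_text, marker="== sanity-flr test 200"):
    """Return (seconds_after_marker, message) for each LustreError line.

    Lines before the first line containing `marker` are ignored.
    """
    start = None
    found = []
    for raw in console_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines without a kernel timestamp
        stamp, message = float(m.group(1)), m.group(2)
        if start is None:
            if marker in message:
                start = stamp  # timestamp of the test-start marker
            continue
        if message.startswith("LustreError:"):
            found.append((round(stamp - start, 3), message))
    return found
```

Applied to the excerpt above, this places the bulk READ reconnect error roughly 90 seconds after the test start and the lock callback timeout with client eviction roughly 191 seconds after it.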
      

      Other test failures are at
      https://testing.whamcloud.com/test_sets/e4e45b80-705e-11e9-bd0e-52540065bddc


            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)