Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10721

Interop 2.10.3<->master recovery-small test_115a: BAD READ CHECKSUM: should have changed on the client or in transit

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.11.0, Lustre 2.12.0
    • None
    • 3
    • 9223372036854775807

    Description

      recovery-small test_115a - Timeout occurred after 212 mins, last suite running was recovery-small, restarting cluster to continue tests
      ^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^

      This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

      This issue relates to the following test suite run:
      https://testing.hpdd.intel.com/test_sets/4ee5d662-12aa-11e8-a6ad-52540065bddc

      test_115a failed with the following error:

      Timeout occurred after 212 mins, last suite running was recovery-small, restarting cluster to continue tests
      

      client: lustre-master tag-2.10.58
      server: 2.10.3

      OST console

      [ 9177.783390] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 115a: read: late REQ MDunlink and no bulk ===================================== 10:08:44 \(1518689324\)
      [ 9177.969217] Lustre: DEBUG MARKER: == recovery-small test 115a: read: late REQ MDunlink and no bulk ===================================== 10:08:44 (1518689324)
      [ 9178.139373] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
      				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
      				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
      [ 9179.546836] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0x8000051a
      [ 9182.363552] Lustre: *** cfs_fail_loc=51a, val=0***
      [ 9212.794207] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting
      [ 9213.793456] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9253.893995] LNet: Service thread pid 18623 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 9253.895896] Pid: 18623, comm: ll_ost_io00_012
      [ 9253.896368] 
      Call Trace:
      [ 9253.896822]  [<ffffffff816ab6b9>] schedule+0x29/0x70
      [ 9253.897357]  [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
      [ 9253.898152]  [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
      [ 9253.898933]  [<ffffffffc09d247e>] target_bulk_io+0x4ae/0xab0 [ptlrpc]
      [ 9253.899635]  [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
      [ 9253.900476]  [<ffffffffc0a7c482>] tgt_brw_read+0xf32/0x1850 [ptlrpc]
      [ 9253.901189]  [<ffffffffc0a16dc3>] ? lustre_pack_reply_v2+0x183/0x280 [ptlrpc]
      [ 9253.902042]  [<ffffffffc0a16f2f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc]
      [ 9253.902824]  [<ffffffffc09cfe60>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
      [ 9253.903575]  [<ffffffffc0a79da5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      [ 9253.904421]  [<ffffffffc0a22b16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      [ 9253.905276]  [<ffffffffc05d7bc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 9253.906048]  [<ffffffffc0a26252>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [ 9253.906800]  [<ffffffffc0a257c0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      [ 9253.907481]  [<ffffffff810b252f>] kthread+0xcf/0xe0
      [ 9253.908062]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
      [ 9253.908623]  [<ffffffff816b8798>] ret_from_fork+0x58/0x90
      [ 9253.909214]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
      
      [ 9253.910003] LustreError: dumping log to /tmp/lustre-log.1518689399.18623
      [ 9313.794986] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88006133d450 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:170/0 lens 608/432 e 4 to 0 dl 1518689465 ref 1 fl Interpret:/0/0 rc 0/0
      [ 9313.797912] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9313.799353] LNet: Service thread pid 18623 completed after 100.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [ 9320.792161] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting
      [ 9320.794621] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9420.796985] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff880060b31450 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:277/0 lens 608/432 e 0 to 0 dl 1518689572 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9420.799665] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9427.795775] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9527.797989] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88006027b850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:384/0 lens 608/432 e 0 to 0 dl 1518689679 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9527.800790] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9534.798632] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting
      [ 9534.799948] Lustre: Skipped 1 previous similar message
      [ 9534.800610] Lustre: lustre-OST0000: Connection restored to de427d50-d7c3-9c4d-0314-3a8b85bd9da6 (at 10.2.8.124@tcp)
      [ 9534.801699] Lustre: Skipped 117 previous similar messages
      [ 9534.803081] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9634.804986] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88006128b850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:491/0 lens 608/432 e 0 to 0 dl 1518689786 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9634.807683] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9641.800411] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9741.802995] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88005d038c50 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:598/0 lens 608/432 e 0 to 0 dl 1518689893 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9741.805670] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9748.800033] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9848.801984] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff880060638850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:705/0 lens 608/432 e 0 to 0 dl 1518690000 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9848.804674] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9855.800100] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting
      [ 9855.801538] Lustre: Skipped 2 previous similar messages
      [ 9855.803027] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [ 9955.804985] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff880060786050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:57/0 lens 608/432 e 0 to 0 dl 1518690107 ref 1 fl Interpret:/2/0 rc 0/0
      [ 9955.807591] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [ 9962.801022] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [10062.802989] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff880068906050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:164/0 lens 608/432 e 0 to 0 dl 1518690214 ref 1 fl Interpret:/2/0 rc 0/0
      [10062.805909] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [10069.802829] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [10169.804997] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88006133e050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:271/0 lens 608/432 e 0 to 0 dl 1518690321 ref 1 fl Interpret:/2/0 rc 0/0
      [10169.807940] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [10176.807295] Lustre: lustre-OST0000: Connection restored to de427d50-d7c3-9c4d-0314-3a8b85bd9da6 (at 10.2.8.124@tcp)
      [10176.808453] Lustre: Skipped 5 previous similar messages
      [10283.808108] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4)
      [10283.810562] LustreError: Skipped 1 previous similar message
      [10383.810991] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff88005c641c50 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:485/0 lens 608/432 e 0 to 0 dl 1518690535 ref 1 fl Interpret:/2/0 rc 0/0
      [10383.813599] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) Skipped 1 previous similar message
      [10383.814654] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110
      [10383.816086] Lustre: Skipped 1 previous similar message
      [10390.807059] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting
      

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: