Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.11.0, Lustre 2.12.0
-
None
-
3
-
9223372036854775807
Description
recovery-small test_115a - Timeout occurred after 212 mins, last suite running was recovery-small, restarting cluster to continue tests
^^^^^^^^^^^^^ DO NOT REMOVE LINE ABOVE ^^^^^^^^^^^^^
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run:
https://testing.hpdd.intel.com/test_sets/4ee5d662-12aa-11e8-a6ad-52540065bddc
test_115a failed with the following error:
Timeout occurred after 212 mins, last suite running was recovery-small, restarting cluster to continue tests
client: lustre-master tag-2.10.58
server: 2.10.3
OST console
[ 9177.783390] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == recovery-small test 115a: read: late REQ MDunlink and no bulk ===================================== 10:08:44 \(1518689324\) [ 9177.969217] Lustre: DEBUG MARKER: == recovery-small test 115a: read: late REQ MDunlink and no bulk ===================================== 10:08:44 (1518689324) [ 9178.139373] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null || /usr/sbin/lctl lustre_build_version 2>/dev/null || /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2 [ 9179.546836] Lustre: DEBUG MARKER: lctl set_param fail_val=0 fail_loc=0x8000051a [ 9182.363552] Lustre: *** cfs_fail_loc=51a, val=0*** [ 9212.794207] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting [ 9213.793456] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9253.893995] LNet: Service thread pid 18623 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 9253.895896] Pid: 18623, comm: ll_ost_io00_012 [ 9253.896368] Call Trace: [ 9253.896822] [<ffffffff816ab6b9>] schedule+0x29/0x70 [ 9253.897357] [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0 [ 9253.898152] [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10 [ 9253.898933] [<ffffffffc09d247e>] target_bulk_io+0x4ae/0xab0 [ptlrpc] [ 9253.899635] [<ffffffff810c6440>] ? default_wake_function+0x0/0x20 [ 9253.900476] [<ffffffffc0a7c482>] tgt_brw_read+0xf32/0x1850 [ptlrpc] [ 9253.901189] [<ffffffffc0a16dc3>] ? lustre_pack_reply_v2+0x183/0x280 [ptlrpc] [ 9253.902042] [<ffffffffc0a16f2f>] ? lustre_pack_reply_flags+0x6f/0x1e0 [ptlrpc] [ 9253.902824] [<ffffffffc09cfe60>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc] [ 9253.903575] [<ffffffffc0a79da5>] tgt_request_handle+0x925/0x1370 [ptlrpc] [ 9253.904421] [<ffffffffc0a22b16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc] [ 9253.905276] [<ffffffffc05d7bc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [ 9253.906048] [<ffffffffc0a26252>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [ 9253.906800] [<ffffffffc0a257c0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc] [ 9253.907481] [<ffffffff810b252f>] kthread+0xcf/0xe0 [ 9253.908062] [<ffffffff810b2460>] ? kthread+0x0/0xe0 [ 9253.908623] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [ 9253.909214] [<ffffffff810b2460>] ? kthread+0x0/0xe0 [ 9253.910003] LustreError: dumping log to /tmp/lustre-log.1518689399.18623 [ 9313.794986] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88006133d450 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:170/0 lens 608/432 e 4 to 0 dl 1518689465 ref 1 fl Interpret:/0/0 rc 0/0 [ 9313.797912] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9313.799353] LNet: Service thread pid 18623 completed after 100.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [ 9320.792161] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting [ 9320.794621] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9420.796985] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff880060b31450 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:277/0 lens 608/432 e 0 to 0 dl 1518689572 ref 1 fl Interpret:/2/0 rc 0/0 [ 9420.799665] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9427.795775] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9527.797989] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88006027b850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:384/0 lens 608/432 e 0 to 0 dl 1518689679 ref 1 fl Interpret:/2/0 rc 0/0 [ 9527.800790] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9534.798632] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting [ 9534.799948] Lustre: Skipped 1 previous similar message [ 9534.800610] Lustre: lustre-OST0000: Connection restored to de427d50-d7c3-9c4d-0314-3a8b85bd9da6 (at 10.2.8.124@tcp) [ 9534.801699] Lustre: Skipped 117 previous similar messages [ 9534.803081] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9634.804986] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88006128b850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:491/0 lens 608/432 e 0 to 0 dl 1518689786 ref 1 fl Interpret:/2/0 rc 0/0 [ 9634.807683] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9641.800411] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9741.802995] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88005d038c50 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:598/0 lens 608/432 e 0 to 0 dl 1518689893 ref 1 fl Interpret:/2/0 rc 0/0 [ 9741.805670] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9748.800033] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9848.801984] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff880060638850 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:705/0 lens 608/432 e 0 to 0 dl 1518690000 ref 1 fl Interpret:/2/0 rc 0/0 [ 9848.804674] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9855.800100] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting [ 9855.801538] Lustre: Skipped 2 previous similar messages [ 9855.803027] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [ 9955.804985] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff880060786050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:57/0 lens 608/432 e 0 to 0 dl 1518690107 ref 1 fl Interpret:/2/0 rc 0/0 [ 9955.807591] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [ 9962.801022] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [10062.802989] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff880068906050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:164/0 lens 608/432 e 0 to 0 dl 1518690214 ref 1 fl Interpret:/2/0 rc 0/0 [10062.805909] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [10069.802829] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [10169.804997] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88006133e050 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:271/0 lens 608/432 e 0 to 0 dl 1518690321 ref 1 fl Interpret:/2/0 rc 0/0 [10169.807940] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [10176.807295] Lustre: lustre-OST0000: Connection restored to de427d50-d7c3-9c4d-0314-3a8b85bd9da6 (at 10.2.8.124@tcp) [10176.808453] Lustre: Skipped 5 previous similar messages [10283.808108] LustreError: 132-0: lustre-OST0000: BAD READ CHECKSUM: should have changed on the client or in transit: from 10.2.8.124@tcp inode [0x200039611:0x9:0x0] object 0x0:59083 extent [0-4095], client returned csum 0 (type 4), server csum 98f94189 (type 4) [10283.810562] LustreError: Skipped 1 previous similar message [10383.810991] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) @@@ timeout on bulk READ after 100+0s req@ffff88005c641c50 x1592451778374256/t0(0) o3->8a9263ac-9afd-de65-c52c-eb00e5830c2e@10.2.8.124@tcp:485/0 lens 608/432 e 0 to 0 dl 1518690535 ref 1 fl Interpret:/2/0 rc 0/0 [10383.813599] LustreError: 18623:0:(ldlm_lib.c:3226:target_bulk_io()) Skipped 1 previous similar message [10383.814654] Lustre: lustre-OST0000: Bulk IO read error with 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp), client will retry: rc -110 [10383.816086] Lustre: Skipped 1 previous similar message [10390.807059] Lustre: lustre-OST0000: Client 8a9263ac-9afd-de65-c52c-eb00e5830c2e (at 10.2.8.124@tcp) reconnecting