[LU-10333] Ubuntu1604 client sanity-255a hung: buffer overflow detected Created: 06/Dec/17  Updated: 02/Feb/18  Resolved: 02/Feb/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: ubuntu
Environment:

server: 2.10.2 RC1 RHEL7
client: Ubuntu16.04


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_255a hung, client console shows

[16329.701229] LustreError: 9542:0:(symlink.c:94:ll_readlink_internal()) lustre: inode [0x200000401:0x4d:0x0]: symlink length 33 not expected 35
[16363.646193] LustreError: 11-0: MGC10.2.5.222@tcp: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[16363.648603] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[16402.644216] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[17110.477444] LustreError: 21259:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007690a000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[17548.198662] LustreError: 30782:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 412 sleeping for 1000ms
[17549.196068] LustreError: 30782:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 412 awake
aaaaaa
[18923.899874] LustreError: 20250:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007690a000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[18957.879890] LustreError: 22323:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007690a000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[19077.650016] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007690a000: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[19077.652737] LustreError: Skipped 1 previous similar message
[19084.652184] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[19226.978444] LustreError: 6099:0:(osc_request.c:1941:osc_build_rpc()) prep_req failed: -12
[19226.979200] LustreError: 6099:0:(osc_cache.c:2282:osc_check_rpcs()) Write request failed with -12
[19277.233584] LustreError: 13562:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007690a000: error writing proc 'stripecount=-1'='-1': rc = -34
[19296.770118] LustreError: 14821:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007690a000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[19296.771624] LustreError: 14821:0:(obd_config.c:1431:class_process_proc_param()) Skipped 1 previous similar message
[19410.960027] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742): from 10.2.2.44@tcp inode [0x2000013a1:0x2c54:0x0] object 0x0:30300 extent [0-4194303], original client csum 9e56117e (type 2), server csum 9e56117d (type 2), client csum now 9e56117d
[19410.962826] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa81200 x1585974601976528/t8589964424(8589964424) o4->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/416 e 0 to 0 dl 1512520511 ref 3 fl Interpret:R/4/0 rc 0/0
[19412.787614] LustreError: 133-1: lustre-OST0001-osc-ffff88007690a000: BAD READ CHECKSUM: from 10.2.2.44@tcp inode [0x2000013a1:0x2c54:0x0] object 0x0:30300 extent [0-4194303], client 606f3cce, server ec9f5fa3, cksum_type 1
[19412.789414] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa80300 x1585974601976704/t0(0) o3->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/400 e 0 to 0 dl 1512520513 ref 2 fl Interpret:RM/0/0 rc 4194304/4194304
[19414.809667] LustreError: 133-1: lustre-OST0001-osc-ffff88007690a000: BAD READ CHECKSUM: from 10.2.2.44@tcp inode [0x2000013a1:0x2c54:0x0] object 0x0:30300 extent [0-4194303], client cb101033, server 9e56117d, cksum_type 2
[19414.814167] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa80000 x1585974601976864/t0(0) o3->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/400 e 0 to 0 dl 1512520515 ref 2 fl Interpret:RM/0/0 rc 4194304/4194304
[19420.999278] LustreError: 133-1: lustre-OST0000-osc-ffff88007690a000: BAD READ CHECKSUM: from 10.2.2.44@tcp inode [0x2000013a1:0x2c55:0x0] object 0x0:29504 extent [0-1048575], client 2b9a32d1, server 5e84341b, cksum_type 2
[19421.001200] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa81b00 x1585974601977392/t0(0) o3->lustre-OST0000-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/400 e 0 to 0 dl 1512520521 ref 2 fl Interpret:RM/0/0 rc 1048576/1048576
[19432.445103] LustreError: 132-0: lustre-OST0000-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed on the client after we checksummed it - likely false positive due to mmap IO (bug 11742): from 10.2.2.44@tcp inode [0x2000013a1:0x2c57:0x0] object 0x0:29505 extent [0-4194303], original client csum 7d33b9a0 (type 2), server csum 7d33b99f (type 2), client csum now 7d33b99f
[19432.454390] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa80300 x1585974601977600/t8589965292(8589965292) o4->lustre-OST0000-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/416 e 0 to 0 dl 1512520533 ref 2 fl Interpret:RM/0/0 rc 0/0
[19433.809531] LustreError: 133-1: lustre-OST0000-osc-ffff88007690a000: BAD READ CHECKSUM: from 10.2.2.44@tcp inode [0x2000013a1:0x2c57:0x0] object 0x0:29505 extent [0-4194303], client a379b80f, server 7d33b99f, cksum_type 2
[19438.258914] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [0-4194303], original client csum 8b19b060 (type 1), server csum 8b19b05f (type 1), client csum now 8b19b060
[19441.603056] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [0-4194303], original client csum 8b19b060 (type 1), server csum 8b19b05f (type 1), client csum now 8b19b060
[19441.615157] LustreError: Skipped 3 previous similar messages
[19441.618023] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa80600 x1585974601978000/t8589964432(8589964432) o4->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/416 e 0 to 0 dl 1512520542 ref 2 fl Interpret:RM/0/0 rc 0/0
[19441.627090] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 5 previous similar messages
[19449.020947] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [4194304-8388607], original client csum 8b19b060 (type 1), server csum 8b19b05f (type 1), client csum now 8b19b060
[19449.030322] LustreError: Skipped 3 previous similar messages
[19460.457503] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [4194304-8388607], original client csum 8b19b060 (type 1), server csum 8b19b05f (type 1), client csum now 8b19b060
[19460.466868] LustreError: Skipped 3 previous similar messages
[19460.469660] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa81800 x1585974601978144/t8589964440(8589964440) o4->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/416 e 0 to 0 dl 1512520561 ref 2 fl Interpret:RM/0/0 rc 0/0
[19460.483185] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 7 previous similar messages
[19479.012358] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [4194304-8388607], original client csum 8b19b060 (type 1), server csum 8b19b05f (type 1), client csum now 8b19b060
[19479.022031] LustreError: Skipped 5 previous similar messages
[19484.796786] LustreError: 22890:0:(osc_request.c:1735:brw_interpret()) lustre-OST0001-osc-ffff88007690a000: too many resent retries for object: 0:30301, rc = -11.
[19496.387007] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff88003aa81800 x1585974601978624/t8589964458(8589964458) o4->lustre-OST0001-osc-ffff88007690a000@10.2.2.44@tcp:6/4 lens 608/416 e 0 to 0 dl 1512520597 ref 2 fl Interpret:RM/0/0 rc 0/0
[19496.396257] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 15 previous similar messages
[19513.893895] LustreError: 132-0: lustre-OST0001-osc-ffff88007690a000: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.2.2.44@tcp inode [0x2000013a1:0x2c59:0x0] object 0x0:30301 extent [4194304-8388607], original client csum 7d33b9a0 (type 2), server csum 7d33b99f (type 2), client csum now 7d33b9a0
[19513.903199] LustreError: Skipped 17 previous similar messages
[19532.383085] LustreError: 22889:0:(osc_request.c:1735:brw_interpret()) lustre-OST0001-osc-ffff88007690a000: too many resent retries for object: 0:30301, rc = -11.
[19532.389193] LustreError: 22889:0:(osc_request.c:1735:brw_interpret()) Skipped 1 previous similar message
[19538.435901] LustreError: 133-1: lustre-OST0000-osc-ffff88007690a000: BAD READ CHECKSUM: from 10.2.2.44@tcp inode [0x2000013a1:0x2c5b:0x0] object 0x0:29506 extent [0-4194303], client d6471036, server 9e56117d, cksum_type 2
[19543.815154] LustreError: 27584:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88004a8a5800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[19976.019915] LustreError: 8098:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88004a90a800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[19976.030344] LustreError: 8098:0:(obd_config.c:1431:class_process_proc_param()) Skipped 1 previous similar message
[20058.918205] LustreError: 167-0: lustre-OST0000-osc-ffff88007adfd800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
[20073.467574] LustreError: 21648:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 315 sleeping for 10000ms
[20083.472082] LustreError: 21648:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 315 awake
[20118.860169] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[20159.943785] LustreError: 25630:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff880048850000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[20159.950869] LustreError: 25630:0:(obd_config.c:1431:class_process_proc_param()) Skipped 1 previous similar message
[20564.796416] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -30
[20564.801740] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -30  req@ffff88001973ad00 x1585974609118848/t0(0) o4->lustre-OST0001-osc-ffff880048850000@10.2.2.44@tcp:6/4 lens 608/192 e 0 to 0 dl 1512521701 ref 2 fl Interpret:ReM/0/0 rc -30/-30
[20564.810208] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 13 previous similar messages
[20597.685674] LustreError: 1790:0:(osc_request.c:1941:osc_build_rpc()) prep_req failed: -22
[20597.693798] LustreError: 1790:0:(osc_cache.c:2282:osc_check_rpcs()) Write request failed with -22
[20600.802096] LustreError: 2253:0:(osc_request.c:1941:osc_build_rpc()) prep_req failed: -12
[20600.805126] LustreError: 2253:0:(osc_cache.c:2282:osc_check_rpcs()) Write request failed with -12
[20604.475552] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20604.483927] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -5  req@ffff88000b606a00 x1585974609121456/t0(0) o4->lustre-OST0001-osc-ffff880048850000@10.2.2.44@tcp:6/4 lens 608/192 e 0 to 0 dl 1512521742 ref 2 fl Interpret:ReM/0/0 rc -5/-5
[20605.501138] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20607.517296] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20614.541014] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20614.546880] LustreError: Skipped 1 previous similar message
[20625.565602] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20625.573546] LustreError: Skipped 1 previous similar message
[20625.577325] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -5  req@ffff88000b606a00 x1585974609121648/t0(0) o4->lustre-OST0001-osc-ffff880048850000@10.2.2.44@tcp:6/4 lens 608/192 e 0 to 0 dl 1512521763 ref 2 fl Interpret:ReM/0/0 rc -5/-5
[20625.585380] LustreError: 22890:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 5 previous similar messages
[20649.605612] LustreError: 11-0: lustre-OST0001-osc-ffff880048850000: operation ost_write to node 10.2.2.44@tcp failed: rc = -5
[20649.613485] LustreError: Skipped 2 previous similar messages
[20659.617115] LustreError: 22890:0:(osc_request.c:1735:brw_interpret()) lustre-OST0001-osc-ffff880048850000: too many resent retries for object: 0:30981, rc = -5.
[20659.627612] LustreError: 22890:0:(osc_request.c:1735:brw_interpret()) Skipped 1 previous similar message
[20663.648617] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -5  req@ffff88000b605b00 x1585974609122256/t0(0) o4->lustre-OST0001-osc-ffff880048850000@10.2.2.44@tcp:6/4 lens 608/192 e 0 to 0 dl 1512521801 ref 2 fl Interpret:ReM/0/0 rc -5/-5
[20663.660838] LustreError: 22889:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 3 previous similar messages
[20857.264695] LustreError: 22889:0:(mdc_locks.c:1191:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -110
*** buffer overflow detected ***: filefrag terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7ff72cd1c7e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7ff72cdbe11c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117120)[0x7ff72cdbc120]
/lib/x86_64-linux-gnu/libc.so.6(+0x116689)[0x7ff72cdbb689]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0x80)[0x7ff72cd206b0]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0xc90)[0x7ff72ccf2e00]
/lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7ff72cdbb714]
/lib/x86_64-linux-gnu/libc.so.6(__sprintf_chk+0x7d)[0x7ff72cdbb66d]
filefrag[0x4018c8]
filefrag[0x401ccf]
filefrag[0x4012f5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff72ccc5830]
filefrag[0x4015c9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 fd:01 396248                             /usr/sbin/filefrag
00602000-00603000 r--p 00002000 fd:01 396248                             /usr/sbin/filefrag
00603000-00604000 rw-p 00003000 fd:01 396248                             /usr/sbin/filefrag
0185b000-0187d000 rw-p 00000000 00:00 0                                  [heap]
7ff72ca8f000-7ff72caa5000 r-xp 00000000 fd:01 1442313                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff72caa5000-7ff72cca4000 ---p 00016000 fd:01 1442313                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff72cca4000-7ff72cca5000 rw-p 00015000 fd:01 1442313                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7ff72cca5000-7ff72ce65000 r-xp 00000000 fd:01 1446711                    /lib/x86_64-linux-gnu/libc-2.23.so
7ff72ce65000-7ff72d065000 ---p 001c0000 fd:01 1446711                    /lib/x86_64-linux-gnu/libc-2.23.so
7ff72d065000-7ff72d069000 r--p 001c0000 fd:01 1446711                    /lib/x86_64-linux-gnu/libc-2.23.so
7ff72d069000-7ff72d06b000 rw-p 001c4000 fd:01 1446711                    /lib/x86_64-linux-gnu/libc-2.23.so
7ff72d06b000-7ff72d06f000 rw-p 00000000 00:00 0 
7ff72d06f000-7ff72d095000 r-xp 00000000 fd:01 1446689                    /lib/x86_64-linux-gnu/ld-2.23.so
7ff72d287000-7ff72d28a000 rw-p 00000000 00:00 0 
7ff72d291000-7ff72d294000 rw-p 00000000 00:00 0 
7ff72d294000-7ff72d295000 r--p 00025000 fd:01 1446689                    /lib/x86_64-linux-gnu/ld-2.23.so
7ff72d295000-7ff72d296000 rw-p 00026000 fd:01 1446689                    /lib/x86_64-linux-gnu/ld-2.23.so
7ff72d296000-7ff72d297000 rw-p 00000000 00:00 0 
7ffe43dc2000-7ffe43de3000 rw-p 00000000 00:00 0                          [stack]
7ffe43df1000-7ffe43df3000 r--p 00000000 00:00 0                          [vvar]
7ffe43df3000-7ffe43df5000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[21197.439605] LustreError: 3487:0:(gss_cli_upcall.c:241:gss_do_ctx_init_rpc()) ioctl size 5, expect 80, please check lgss_keyring version
[21197.459216] LustreError: 3528:0:(ldlm_resource.c:346:lru_size_store()) lru_size: invalid value written
[21258.765681] LustreError: 5176:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88000b68c000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[21415.930222] LustreError: 8729:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff880039c45800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[21691.268255] LustreError: 32558:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 140c sleeping for 5000ms
[21696.268099] LustreError: 32558:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 140c awake
[21722.619266] LustreError: 5228:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 50e sleeping for 3000ms
[21724.499967] LustreError: dumping log to /tmp/lustre-log.1512522818.8722
[21725.620120] LustreError: 5228:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 50e awake
[21725.628164] LustreError: dumping log to /tmp/lustre-log.1512522819.5228
[21905.351114] LustreError: 18187:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x2000032e2:0x4083:0x0]: -5
[21905.364435] LustreError: 18187:0:(llite_lib.c:2308:ll_prep_inode()) new_inode -fatal: rc -5
[21942.352137] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[21947.566103] LustreError: 4228:0:(mdc_request.c:564:mdc_replay_open()) @@@ Can't properly replay without open data.  req@ffff8800188e4900 x1585994619683312/t30064849214(30064849214) o101->lustre-MDT0000-mdc-ffff880039c45800@10.2.5.222@tcp:12/10 lens 840/544 e 0 to 0 dl 1512523048 ref 2 fl Interpret:RP/4/0 rc 301/301
[21947.578895] LustreError: 4228:0:(client.c:3007:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff8800188e4900 x1585994619683312/t30064849214(30064849214) o101->lustre-MDT0000-mdc-ffff880039c45800@10.2.5.222@tcp:12/10 lens 840/544 e 0 to 0 dl 1512523048 ref 2 fl Interpret:RP/4/0 rc 301/301
[21974.357502] LustreError: 11-0: MGC10.2.5.222@tcp: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[21974.364067] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[21974.377632] LustreError: 4228:0:(mdc_request.c:564:mdc_replay_open()) @@@ Can't properly replay without open data.  req@ffff8800188e4900 x1585994619683312/t30064849214(30064849214) o101->lustre-MDT0000-mdc-ffff880039c45800@10.2.5.222@tcp:12/10 lens 840/544 e 0 to 0 dl 1512523075 ref 2 fl Interpret:RP/4/0 rc 301/301
[21974.390633] LustreError: 4228:0:(client.c:3007:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff8800188e7c00 x1585994619684592/t34359738371(34359738371) o101->lustre-MDT0000-mdc-ffff880039c45800@10.2.5.222@tcp:12/10 lens 816/544 e 0 to 0 dl 1512523075 ref 2 fl Interpret:RP/4/0 rc 301/301
[21974.399256] LustreError: 4228:0:(client.c:3007:ptlrpc_replay_interpret()) Skipped 2 previous similar messages
[21991.054569] LustreError: 4230:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 40f sleeping for 10000ms
[21991.058899] LustreError: 4230:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 40f awake
[22075.079263] LustreError: 4225:0:(events.c:199:client_bulk_callback()) event type 1, status -5, desc ffff88003a331600
[22077.834688] LustreError: 4225:0:(events.c:199:client_bulk_callback()) event type 2, status -5, desc ffff880039ee0a00
[22101.297714] LustreError: 11-0: lustre-OST0000-osc-ffff880039c45800: operation ldlm_enqueue to node 10.2.2.44@tcp failed: rc = -107
[22101.307021] LustreError: 167-0: lustre-OST0000-osc-ffff880039c45800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
[22102.840926] LustreError: 11-0: lustre-OST0001-osc-ffff880039c45800: operation obd_ping to node 10.2.2.44@tcp failed: rc = -107
[22102.848728] LustreError: 167-0: lustre-OST0001-osc-ffff880039c45800: This client was evicted by lustre-OST0001; in progress operations using this service will fail.
[22423.472188] LustreError: 11304:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff880039b84800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[22523.224264] LustreError: 11-0: lustre-MDT0000-mdc-ffff880039b84800: operation mds_readpage to node 10.2.5.222@tcp failed: rc = -12
[22593.475644] LustreError: 27842:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a3a5800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[22596.405790] LustreError: 28361:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a3a2000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[22596.411373] LustreError: 28361:0:(obd_config.c:1431:class_process_proc_param()) Skipped 1 previous similar message
[22596.419670] LustreError: 28359:0:(llite_lib.c:475:client_common_fill_super()) cannot mds_connect: rc = -2
[22596.428785] LustreError: 28359:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-2)
[22599.107084] LustreError: 28822:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a3a7800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[22604.956002] LustreError: 29860:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a3a5000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[22604.961521] LustreError: 29860:0:(obd_config.c:1431:class_process_proc_param()) Skipped 3 previous similar messages
[22605.035932] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007a3a6000: operation mds_get_root to node 10.2.5.222@tcp failed: rc = -22
[22605.050325] LustreError: 29883:0:(llite_lib.c:475:client_common_fill_super()) cannot mds_connect: rc = -22
[22605.065213] LustreError: 29883:0:(obd_mount.c:1505:lustre_fill_super()) Unable to mount  (-22)


 Comments   
Comment by Andreas Dilger [ 06/Dec/17 ]

It isn't clear what test is running that is generating these log messages. The sanity test_255a is about "lfs ladvise", but there are errors about symlinks, checksum errors, the class_proc_param() message (LU-9611, fixed in master), and "filefrag", but nothing that would make me think about ladvise.

I also don't see any of the "lctl mark" messages that report when the tests are started/finished?

Comment by Sarah Liu [ 07/Dec/17 ]

This the the Maloo link https://testing.hpdd.intel.com/test_sets/cad15292-db78-11e7-9c63-52540065bddc

test_255a was the last subtest ran and the system hung, the above log was shown in the client-1 screen but unfortunately it was incomplete. Here are other logs that I saw on client-2 and oss

client-2

[ 6257.561379] LustreError: 11-0: lustre-OST0000-osc-ffff88007a70c800: operation obd_ping to node 10.2.2.44@tcp failed: rc = -107
[ 6257.563862] LustreError: Skipped 1 previous similar message
[ 8013.017978] LustreError: 13686:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a70c800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[ 8045.561694] LustreError: 14050:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a70c800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[ 8182.562261] LustreError: 11-0: lustre-MDT0000-mdc-ffff88007a70c800: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[ 8182.565247] LustreError: Skipped 1 previous similar message
[ 8182.565975] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[ 8366.001548] LustreError: 18486:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a70c800: error writing proc 'stripecount=-1'='-1': rc = -34
[ 8387.707852] LustreError: 18711:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88007a70c800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[ 8387.709147] LustreError: 18711:0:(obd_config.c:1431:class_process_proc_param()) Skipped 1 previous similar message
[ 9207.561139] LustreError: 11-0: MGC10.2.5.222@tcp: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[ 9207.562647] LustreError: Skipped 1 previous similar message
[ 9207.563941] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[ 9248.370214] LustreError: 25890:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88003f0a3000: error writing proc 'stripeoffset=-1'='-1': rc = -34
[10347.193490] LustreError: 31970:0:(obd_config.c:1431:class_process_proc_param()) lustre-clilov-ffff88003f0a4800: error writing proc 'stripeoffset=-1'='-1': rc = -34
[11029.196104] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[11062.854193] LustreError: 11-0: lustre-MDT0000-mdc-ffff88003f0a4800: operation mds_statfs to node 10.2.5.222@tcp failed: rc = -107
[11062.855462] LustreError: Skipped 1 previous similar message
[11079.196782] LustreError: 11-0: MGC10.2.5.222@tcp: operation obd_ping to node 10.2.5.222@tcp failed: rc = -107
[11079.198179] LustreError: 166-1: MGC10.2.5.222@tcp: Connection to MGS (at 10.2.5.222@tcp) was lost; in progress operations using this service will fail
[11204.197436] LustreError: 11-0: lustre-OST0000-osc-ffff88003f0a4800: operation obd_ping to node 10.2.2.44@tcp failed: rc = -107
[11204.201672] LustreError: 167-0: lustre-OST0000-osc-ffff88003f0a4800: This client was evicted by lustre-OST0000; in progress operations using this service will fail.

oss

[27753.578591] Lustre: DEBUG MARKER: == sanity test 255a: check 'lfs ladvise -a willread' ================================================= 01:28:56 (1512523736)
[27755.680209] LustreError: 132955:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 237 sleeping for 4000ms
[27755.695266] LustreError: 132955:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 1 previous similar message
[27759.680197] LustreError: 132990:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 237 awake
[27759.719626] LustreError: 132955:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 237 sleeping for 4000ms
[27759.734506] LustreError: 132955:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 1 previous similar message
[27763.719211] LustreError: 132990:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 237 awake
[27763.732650] LustreError: 132990:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 1 previous similar message

Comment by Minh Diep [ 08/Dec/17 ]

this test passed in AT https://testing.hpdd.intel.com/test_sets/d212dae6-dc3e-11e7-a066-52540065bddc

Comment by Sarah Liu [ 02/Feb/18 ]

Did not see the failure in recent 2.10.3 and tag-2.10.57 testing, I think we can close it as can not reproduce

Generated at Sat Feb 10 02:34:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.