[LU-15376] sanity-benchmark test_iozone: fsync: Protocol error Created: 15/Dec/21  Updated: 04/Oct/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.8, Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: Jian Yu
Resolution: Unresolved Votes: 0
Labels: failing_tests

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a4e5bea6-99ab-4014-9c91-53044b574fe5

test_iozone failed with the following error:

iozone (1) failed

Hit this error with a 2.10.8 client and a 2.12.8 server; not sure whether it is a DCO issue.

	Include fsync in write timing
	>>> I/O Diagnostic mode enabled. <<<
	Performance measurements are invalid in this mode.
	Record Size 512 kB
	File size set to 5503128 kB
	Command line used: iozone -i 0 -i 1 -i 2 -e -+d -r 512 -s 5503128 -f /mnt/lustre/d0.iozone/iozone
	Output is in kBytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 kBytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                              random    random     bkwd    record    stride
              kB  reclen    write  rewrite    read    reread    read     write     read   rewrite      read   fwrite frewrite    fread  freread
         5503128     512
fsync: Protocol error

iozone: interrupted

exiting iozone
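The last iozone message has the shape that perror() produces: the name of the failing call followed by strerror(errno). On Linux, "Protocol error" is the text for EPROTO, so the fsync() that iozone runs here (the -e flag, "Include fsync in write timing") appears to have returned -1 with errno set to EPROTO. The sketch below is not iozone's code. It is a minimal Python model of the write-then-fsync step using the 512 kB record size, and it assumes iozone reports the failure through perror(). The helper name write_then_fsync is hypothetical.

```python
import errno
import os
from typing import Optional

RECORD_KB = 512  # record size taken from "-r 512" in the command line above


def write_then_fsync(path: str, size_kb: int, record_kb: int = RECORD_KB) -> Optional[str]:
    """Write at least size_kb of data in record_kb records, then fsync the file.

    Hypothetical helper: returns None on success, or a perror("fsync")-style
    message such as "fsync: Protocol error" when fsync() fails.
    """
    record = b"\xa5" * (record_kb * 1024)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written_kb = 0
        while written_kb < size_kb:
            view = memoryview(record)
            while view:  # os.write() may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
            written_kb += record_kb
        try:
            os.fsync(fd)  # with iozone -e, the flush is counted in the write timing
        except OSError as exc:
            # Same text perror("fsync") would print for this errno
            return f"fsync: {os.strerror(exc.errno)}"
        return None
    finally:
        os.close(fd)


# The message iozone printed corresponds to this errno on Linux
print(os.strerror(errno.EPROTO))
```

On a healthy local filesystem the helper returns None. Running it against a file under /mnt/lustre on an affected client could be one way to check whether fsync() alone, without iozone, surfaces the same error.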

 sanity-benchmark test_iozone: @@@@@@ FAIL: iozone (1) failed 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5337:error()
  = /usr/lib64/lustre/tests/sanity-benchmark.sh:130:test_iozone()
  = /usr/lib64/lustre/tests/test-framework.sh:5618:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5657:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5504:run_test()
  = /usr/lib64/lustre/tests/sanity-benchmark.sh:175:main()

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-benchmark test_iozone - iozone (1) failed



 Comments   
Comment by Minh Diep [ 14/Jun/22 ]

https://testing.whamcloud.com/test_sets/6c6114a7-2b43-42b6-bb96-7608d75045ca

Seems like the client got evicted.

[ 6953.344248] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 03:31:28 (1654313488)
[ 6965.443725] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1783480kB available, using 5502664kB file size
[ 6965.658945] Lustre: DEBUG MARKER: min OST has 1783480kB available, using 5502664kB file size
[ 6981.216048] Lustre: 10335:0:(client.c:2169:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1654313510/real 0] req@ffff9e629def5b00 x1734667382522880/t0(0) o400->lustre-MDT0000-mdc-ffff9e62ba648000@10.240.38.243@tcp:12/10 lens 224/224 e 0 to 1 dl 1654313517 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 6981.223121] Lustre: lustre-MDT0000-mdc-ffff9e62ba648000: Connection to lustre-MDT0000 (at 10.240.38.243@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 6982.203107] kworker/u4:0: page allocation failure: order:0, mode:0x20
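The console excerpt includes epoch timestamps that show when the failure happened relative to the start of the test. The sketch below assumes that "sent" is the request's send time and "dl" is its deadline, which is the usual reading of a ptlrpc request dump. Under that assumption, the short Python sketch extracts both values (the helper sent_and_deadline is hypothetical) and compares them with the iozone start marker.

```python
import re
from typing import Tuple

# ptlrpc_expire_one_request() line copied from the client console log above
EXPIRED_REQ = (
    "Lustre: 10335:0:(client.c:2169:ptlrpc_expire_one_request()) @@@ Request sent "
    "has timed out for sent delay: [sent 1654313510/real 0] req@ffff9e629def5b00 "
    "x1734667382522880/t0(0) o400->lustre-MDT0000-mdc-ffff9e62ba648000@10.240.38.243@tcp:12/10 "
    "lens 224/224 e 0 to 1 dl 1654313517 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1"
)
# Epoch seconds printed by the "== sanity-benchmark test iozone" debug marker
TEST_START = 1654313488


def sent_and_deadline(line: str) -> Tuple[int, int]:
    """Hypothetical helper: return the (sent, dl) epoch seconds from a request dump."""
    sent = re.search(r"\[sent (\d+)/", line)
    deadline = re.search(r"\bdl (\d+)\b", line)
    if sent is None or deadline is None:
        raise ValueError("not a ptlrpc request dump")
    return int(sent.group(1)), int(deadline.group(1))


sent, dl = sent_and_deadline(EXPIRED_REQ)
print(f"request sent {sent - TEST_START} s after the test marker")
print(f"deadline was {dl - sent} s after send")
```

For this log, the sketch reports that the request to lustre-MDT0000 was sent 22 s after the iozone marker with a 7 s deadline. Consistent with that, the kernel timestamps show the connection to lustre-MDT0000 being lost about 28 s after the marker, while the large iozone file was being written.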

Comment by Colin Faber [ 28/Sep/22 ]

Hi yujian 

Can you take a look? This looks very similar to the other protocol issue you've investigated recently.

Thank you!

Generated at Sat Feb 10 03:17:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.