[LU-11752] sanity test 271f fails with '1 READ RPC occured' Created: 10/Dec/18  Updated: 01/Apr/19  Resolved: 21/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: zfs
Environment: ZFS

Issue Links:
Duplicate
duplicates LU-11532 sanity test_271f defect: sanity.sh: l... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_271f started failing on November 27, 2018 with the error '1 READ RPC occured'. So far, this failure has only been seen in non-DNE configurations using ZFS.

In the logs for a recent failure at https://testing.whamcloud.com/test_sets/faa5559a-f9ef-11e8-8512-52540065bddc, the client test_log contains:

== sanity test 271f: DoM: read on open (200K file and read tail) ===================================== 23:15:29 (1544051729)
CMD: onyx-46vm9 /usr/sbin/lctl get_param -n version 2>/dev/null ||
				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
warning: '-M' deprecated, use '--mdt-index' or '-m' instead
1+0 records in
1+0 records out
200000 bytes (200 kB) copied, 0.00141796 s, 141 MB/s
1+0 records in
1+0 records out
200000 bytes (200 kB) copied, 0.00106307 s, 188 MB/s
Append to the same page
 sanity test_271f: @@@@@@ FAIL: 1 READ RPC occured 

There is no indication of any problems in the console logs.

This failure looks different from LU-11532 because it does not look like a bash/test-script error. In early November this test was modified by patch https://review.whamcloud.com/33490, but these failures did not start until some time after that patch landed.

Other similar failures are at
https://testing.whamcloud.com/test_sets/23e23310-f2a2-11e8-b67f-52540065bddc
https://testing.whamcloud.com/test_sets/512dc616-f2ea-11e8-b67f-52540065bddc
https://testing.whamcloud.com/test_sets/db8b58f8-f530-11e8-86c0-52540065bddc
https://testing.whamcloud.com/test_sets/d8091eb8-f72e-11e8-b67f-52540065bddc



 Comments   
Comment by Peter Jones [ 12/Dec/18 ]

Mike

Is this something that we should be concerned about for 2.12?

Peter

Comment by Gerrit Updater [ 13/Dec/18 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33847
Subject: LU-11752 test: add debug info into test 271f
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f2bbc8e2b17d83dfaa1af973aed50611782714bb

Comment by Andreas Dilger [ 05/Feb/19 ]

It looks like there is also a debug patch in LU-11532.

Comment by Andreas Dilger [ 04/Mar/19 ]

Is there any progress on this ticket? I hit it 5 times on my patch https://review.whamcloud.com/34367, which shouldn't have anything to do with DoM read.

Comment by Mikhail Pershin [ 10/Mar/19 ]

The patch was updated with the fix. The problem was an incorrect value of ocd_grant_blkbits during reconnect.
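A minimal sketch of how a mismatched block-size shift could turn read-on-open into an extra READ RPC. It rests on assumptions this ticket does not confirm: that the server returns file tail data starting at an offset aligned down to (1 << blkbits) bytes, with blkbits taken from the connect data, and that the client can skip the READ RPC only if that data covers the whole client page containing EOF. The helpers tail_start() and covers_client_page() are hypothetical and are not Lustre functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a Lustre function): offset at which tail data
 * would start if the server aligned the file tail down to (1 << blkbits).
 */
static uint64_t tail_start(uint64_t size, unsigned int blkbits)
{
	assert(blkbits < 64);	/* the shift must fit in a 64-bit value */
	uint64_t unit = UINT64_C(1) << blkbits;

	return size & ~(unit - 1);
}

/*
 * Hypothetical helper: true if tail data beginning at data_start (and
 * running to EOF) holds every existing byte of the client page that
 * contains EOF, so an append to that page would need no READ RPC.
 */
static bool covers_client_page(uint64_t size, uint64_t data_start,
			       unsigned int page_shift)
{
	uint64_t page_start = tail_start(size, page_shift);

	return data_start <= page_start;
}
```

For example, with 4 KiB client pages (page_shift 12) and a 200000-byte file, tail data aligned to 4 KiB starts at 196608, which is exactly the start of the client's last page, so the append can be served without a READ. If instead the client used 64 KiB pages (page_shift 16) and the server aligned a 300000-byte file to 4 KiB, the data would start at 299008, past the page start of 262144, and the client would have to read the rest of that page. This only illustrates why the client and server must agree on the page size. It is not the code changed by the patch.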

Comment by Gerrit Updater [ 21/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33847/
Subject: LU-11752 osc: pass client page size during reconnect too
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5bec8f95cc1028d207e55e659a27d80081864a83

Comment by Peter Jones [ 21/Mar/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 21/Mar/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34485
Subject: LU-11752 osc: pass client page size during reconnect too
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a524d7c5652af2406b6da748d47c9e83d913f168

Comment by Gerrit Updater [ 01/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34485/
Subject: LU-11752 osc: pass client page size during reconnect too
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: a4bef2a016339b7ab800e02884624a5916310f0d

Generated at Sat Feb 10 02:46:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.