[LU-10019] Interop - sanity test_77c: no checksum dump file on OSS Created: 22/Sep/17  Updated: 22/Aug/23  Resolved: 21/Mar/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Jian Yu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Servers: 2.10.1 RC1 b2_10, build 26
Client: 2.10, b2_10, build 5
ZFS


Issue Links:
Related
is related to LU-9604 sanity test_77c: no checksum dump fil... Resolved
is related to LU-17047 interop rolling-downgrade-client2 san... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/55451e1e-9f2d-11e7-ba27-5254006e85c2.

The sub-test test_77c failed with the following error:

no checksum dump file on OSS

test_log:

== sanity test 77c: checksum error on client read with debug ========================================= 06:01:48 (1505714508)
8+0 records in
8+0 records out
8388608 bytes (8.4 MB) copied, 0.267813 s, 31.3 MB/s
osc.lustre-OST0000-osc-ffff88003d08b000.checksum_dump=1
osc.lustre-OST0001-osc-ffff88003d08b000.checksum_dump=1
obdfilter.lustre-OST0000.checksum_dump=1
obdfilter.lustre-OST0001.checksum_dump=1
fail_loc=0x80000408
8+0 records in
8+0 records out
8388608 bytes (8.4 MB) copied, 1.57763 s, 5.3 MB/s
fail_loc=0
onyx-80: ls: cannot access -checksum_dump-ost-[0x200002340:0x677a:0x0]*: No such file or directory
 sanity test_77c: @@@@@@ FAIL: no checksum dump file on OSS 


 Comments   
Comment by Bruno Faccini (Inactive) [ 22/Sep/17 ]

According to the filename in the error message:

ls: cannot access -checksum_dump-ost-[0x200002340:0x677a:0x0]*: No such file or directory

it looks like the problem is that "lctl get_param -n debug_path" returned an empty string on the OSS, which left the dump file name without its path prefix.

So it is not related to LU-9604 at all.
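
The effect described above can be sketched in shell. This is only an illustration, not the actual sanity.sh code: the helper name build_dump_glob is hypothetical, and /tmp/lustre-log is assumed as an example debug_path value. It shows how an empty debug_path yields exactly the bare "-checksum_dump-ost-..." pattern seen in the log, and how a guard could report that condition explicitly.

```shell
#!/bin/bash
# Hypothetical helper (not from sanity.sh): build the glob used to look
# for a checksum dump file on the OSS.
#   $1 = value of "lctl get_param -n debug_path" (may be empty)
#   $2 = file FID, e.g. [0x200002340:0x677a:0x0]
build_dump_glob() {
    local debug_path=$1
    local fid=$2

    # Guard: with an empty prefix the glob can never match a dump file.
    if [ -z "$debug_path" ]; then
        echo "debug_path is empty, cannot locate checksum dump" >&2
        return 1
    fi
    printf '%s-checksum_dump-ost-%s*\n' "$debug_path" "$fid"
}

fid='[0x200002340:0x677a:0x0]'

# Unguarded concatenation with an empty prefix reproduces the pattern
# that "ls" complained about in the test log.
debug_path=''
printf '%s\n' "${debug_path}-checksum_dump-ost-${fid}*"

# With a non-empty prefix (assumed example value) the pattern is usable.
build_dump_glob /tmp/lustre-log "$fid"
```

A guard of this kind would turn the failure into an explicit "debug_path is empty" message instead of a misleading "no checksum dump file on OSS".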

Comment by Bruno Faccini (Inactive) [ 22/Sep/17 ]

Could this be a regression after patches from LU-8066 (particularly "LU-8066 libcfs: migrate to debugfs") landed?
Maybe the sanity/test_77c sub-test needs to be strengthened?

Comment by James A Simmons [ 22/Sep/17 ]

The migration to debugfs landed for Lustre 2.11. This bug is reported for Lustre 2.10.

Comment by Peter Jones [ 22/Sep/17 ]

Jian

Could you please look into this one?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 23/Sep/17 ]

James, I may have been wrong to suspect the patch from LU-8066: I looked at the next tag after it on the master branch instead of checking the b2_10 branch, sorry.

Comment by Jian Yu [ 03/Oct/17 ]

Hi Saurabh and Bruno,
I submitted a for-test-only patch https://review.whamcloud.com/29292 to run sanity test 77 on Lustre 2.10.0 clients with 2.10.1 RC1 and RC2 servers separately. All of the test runs passed with FSTYPE=zfs:
https://testing.hpdd.intel.com/test_sessions/f9db83a2-6f86-4cb5-bf04-a0f389b08811
https://testing.hpdd.intel.com/test_sessions/8c856d96-7b4d-4d6e-9e77-45a38b9ed9bd

I also tried to run the test manually, but still could not reproduce the failure. The test passed:
https://testing.hpdd.intel.com/test_sessions/58500240-a7d4-11e7-bb19-5254006e85c2

Comment by Jian Yu [ 16/Oct/17 ]

Hi Saurabh,
Is this failure reproducible for you?

Comment by Jian Yu [ 17/Nov/17 ]

I'm closing this ticket as "Cannot Reproduce". If the failure occurs again, please feel free to reopen the ticket.

Comment by Sarah Liu [ 21/Mar/22 ]

Hit a similar issue in interop testing between master and 2.14.
The error says "no checksum dump file on Client":
https://testing.whamcloud.com/test_sets/4834f8fb-fd5c-4194-9ccf-e9d835f42a9b

Generated at Sat Feb 10 02:31:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.