[LU-13751] sanity test_160j: FAIL: read changelog failed Created: 05/Jul/20  Updated: 10/Feb/21  Resolved: 08/Feb/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Major
Reporter: Maloo Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: rhel8.3

Severity: 3

 Description   

This issue was created by maloo for jianyu <yujian@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/737a1583-79d3-4fbc-a7a8-97e2c2a459e2

test_160j failed with the following error:

cat: -: Cannot send after transport endpoint shutdown
 sanity test_160j: @@@@@@ FAIL: read changelog failed

Console log on client:

Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-22vm4@tcp:/lustre /mnt/lustre2
Lustre: Mounted lustre-client
Lustre: 656:0:(llog_cat.c:834:llog_cat_process_common()) lustre-MDT0000-mdc-ffff8b40363b1000: can't find llog handle [0x51f:0x1:0x0]:0: rc = -108
LustreError: 656:0:(mdc_changelog.c:335:chlg_load()) lustre-MDT0000-mdc-ffff8b40363b1000: fail to process llog: rc = -108
Lustre: Unmounted lustre-client
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_160j: @@@@@@ FAIL: read changelog failed 

More failure instances on master branch:
https://testing.whamcloud.com/test_sets/91dbe253-024d-439c-8d6b-e025071d97a7
https://testing.whamcloud.com/test_sets/d9e11ade-9442-4759-b61e-18635b197bea
https://testing.whamcloud.com/test_sets/b21e9658-39e1-4e46-b916-84ba852b553c

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_160j - read changelog failed



 Comments   
Comment by Jian Yu [ 04/Dec/20 ]

This failure occurred 8 times in the past two weeks. It is affecting patch testing on the master branch.

Comment by Jian Yu [ 08/Dec/20 ]

The failure occurred 9 times in the past week.

Comment by Peter Jones [ 04/Jan/21 ]

Mike

Could you please investigate?

Thanks

Peter

Comment by John Hammond [ 04/Jan/21 ]

The first failure I could find with this error message was https://testing.whamcloud.com/sub_tests/57f6774c-e2bb-11e9-9874-52540065bddc

Comment by Mikhail Pershin [ 11/Jan/21 ]

I tend to think this is a test script issue. The llog read from the client sends an RPC to the server, and ptlrpc_wait_queue() may end up returning -ESHUTDOWN during umount, so I'd say that is OK. Test 160j was originally added in the context of LU-11626 to check that there is no LBUG due to a missing obd device. The correctness of this test is therefore not about "the changelog must be read after umount" but about the server not hitting an LBUG during that sequence. So I think we should simply treat that error as a valid outcome during the test.
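The suggestion to treat the shutdown error as a valid outcome could be sketched in shell roughly as follows. This is a minimal illustration, assuming the caller has already captured the exit status and stderr of the changelog read. The function name `changelog_read_ok` is hypothetical and is not taken from the Lustre test suite. The matched string is the message `cat` printed in the failure above, which is the Linux text for ESHUTDOWN (errno 108, shown in the console log as rc = -108). Note that, according to its subject line, the patch that actually landed removes the changelog read from test 160j rather than tolerating the error.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test suite): decide whether
# a changelog read that raced with a client umount should be accepted.
#   $1 - exit status of the read (e.g. of cat)
#   $2 - stderr captured from the read
changelog_read_ok() {
    local rc=$1
    local errmsg=$2

    # A clean read is always fine.
    [ "$rc" -eq 0 ] && return 0

    # ESHUTDOWN (errno 108 on Linux) is reported when the RPC is cut off
    # by umount. Test 160j only checks that no LBUG occurs, so this
    # specific error is tolerated.
    case "$errmsg" in
    *"Cannot send after transport endpoint shutdown"*)
        return 0
        ;;
    esac

    # Any other failure is still treated as a real error.
    return 1
}
```

Keying on the error text rather than the exit status keeps unrelated read failures (for example, I/O errors) visible as test failures.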

Comment by Peter Jones [ 11/Jan/21 ]

Thanks Mike. James will look into making that change.

Comment by Gerrit Updater [ 26/Jan/21 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41317
Subject: LU-13751 tests: remove error on changelog read 160j
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bf523dd41fd60ec0f585b898869567328394aeca

Comment by Gerrit Updater [ 08/Feb/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41317/
Subject: LU-13751 tests: remove read of changelog sanity 160j
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 388e8ed199b47337616beba573cb595343e71cca

Comment by Peter Jones [ 08/Feb/21 ]

Landed for 2.14

Generated at Sat Feb 10 03:03:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.