[LU-17100] sanityn test_100b: test_100b failed with 1 Created: 08/Sep/23  Updated: 08/Sep/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/415750f5-3380-4473-958e-9b9ced76ab0d

test_100b failed with the following error:

test_100b failed with 1

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/97846 - 4.18.0-477.15.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/97846 - 4.18.0-477.15.1.el8_lustre.x86_64

mkdir: cannot create directory '/mnt/lustre': Input/output error
lfs setstripe: cannot create composite file '/mnt/lustre/d100b.sanityn/dom': Cannot send after transport endpoint shutdown
dd: failed to open '/mnt/lustre2/d100b.sanityn/dom': No such file or directory
 sanityn test_100b: @@@@@@ FAIL: test_100b failed with 1 

The MDT0001/MDT0003 node is not able to communicate (evicted):

[19764.475180] Lustre: lustre-MDT0001: Client 1e5065e8-439e-4ac7-adcf-ad737755a8e1 (at 10.240.25.232@tcp) reconnecting
[19764.476940] Lustre: Skipped 7 previous similar messages
[19767.544960] Lustre: lustre-MDT0003: Client 1e5065e8-439e-4ac7-adcf-ad737755a8e1 (at 10.240.25.232@tcp) reconnecting
[19767.546741] Lustre: Skipped 1 previous similar message
[19770.352338] Lustre: lustre-OST0003-osc-MDT0001: Connection restored to  (at 10.240.25.234@tcp)
[19770.353857] Lustre: Skipped 5 previous similar messages
[19770.354805] LustreError: 167-0: lustre-OST0004-osc-MDT0003: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
[19770.618854] Lustre: lustre-MDT0001: Client dcd2d5cf-1394-47fa-8a1a-19e1bc89c924 (at 10.240.25.233@tcp) reconnecting
[19770.620623] Lustre: Skipped 3 previous similar messages
[19771.473936] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanityn test_100b: @@@@@@ FAIL: test_100b failed with 1 
[19771.649759] Lustre: DEBUG MARKER: sanityn test_100b: @@@@@@ FAIL: test_100b failed with 1
[19771.889262] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2023-09-07/lustre-reviews_review-dne-part-5_97846_7_28453664-6cdc-48f4-aefd-fb903ac40b43//sanityn.test_100b.debug_log.$(hostname -s).1694122485.log;
               		dmesg > /autotest/autotest-1/2023-09-07/lustre-reviews_review-d
[19773.277932] Lustre: lustre-MDT0001: Client lustre-MDT0001-lwp-OST0000_UUID (at 10.240.25.234@tcp) reconnecting
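
When triaging a run like this, it can help to pull the "167-0" eviction messages out of each node's console log and compare them across nodes. The following is a minimal sketch, not part of the test framework: the regular expression is modeled only on the message format visible in the excerpt above, and the names `EVICT_RE` and `find_evictions` are hypothetical.

```python
import re

# Modeled on the "LustreError: 167-0" eviction line shown in the console log above.
# Assumption: other Lustre versions may word this message differently.
EVICT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError: 167-0: "
    r"(?P<device>\S+): This client was evicted by (?P<target>[^;]+);"
)


def find_evictions(log_text):
    """Return (dmesg timestamp, evicted import device, evicting target) tuples."""
    return [
        (float(m.group("ts")), m.group("device"), m.group("target"))
        for m in EVICT_RE.finditer(log_text)
    ]


# Two lines copied from the MDT0001/MDT0003 console log above.
sample = (
    "[19770.354805] LustreError: 167-0: lustre-OST0004-osc-MDT0003: "
    "This client was evicted by lustre-OST0004; in progress operations "
    "using this service will fail.\n"
    "[19770.618854] Lustre: lustre-MDT0001: Client "
    "dcd2d5cf-1394-47fa-8a1a-19e1bc89c924 (at 10.240.25.233@tcp) reconnecting\n"
)

for ts, device, target in find_evictions(sample):
    print(f"{ts:.6f} {device} evicted by {target}")
```

In this excerpt the only match is the OSC device on MDT0003 being evicted by lustre-OST0004; the "reconnecting" lines are ignored by design.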

MDT0000/MDT0002 node:

[12740.984357] Lustre: DEBUG MARKER: == sanityn test 100b: DoM: no glimpse RPC for stat with IO lock (DoM only file) ========================================================== 21:34:37 (1694122477)
[12742.802635] Lustre: 3107:0:(client.c:2310:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1694122430/real 1694122430]  req@00000000faabeb67 x1776402845536448/t0(0) o400->lustre-MDT0001-osp-MDT0002@10.240.25.236@tcp:24/4 lens 224/224 e 0 to 1 dl 1694122479 ref 1 fl Rpc:XNQr/200/ffffffff rc 0/-1 job:'kworker.0' uid:0 gid:0
[12742.807631] Lustre: 3107:0:(client.c:2310:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[12742.809256] Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 10.240.25.236@tcp) was lost; in progress operations using this service will wait for recovery to complete
[12742.811905] Lustre: Skipped 7 previous similar messages
[12747.943403] Lustre: MGS: Client 5e5a0f93-5855-4281-a6d2-34377072c0f1 (at 10.240.25.232@tcp) reconnecting
[12748.213026] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanityn test_100b: @@@@@@ FAIL: test_100b failed with 1 
[12748.399036] Lustre: DEBUG MARKER: sanityn test_100b: @@@@@@ FAIL: test_100b failed with 1
[12748.633924] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2023-09-07/lustre-reviews_review-dne-part-5_97846_7_28453664-6cdc-48f4-aefd-fb903ac40b43//sanityn.test_100b.debug_log.$(hostname -s).1694122485.log;
               		dmesg > /autotest/autotest-1/2023-09-07/lustre-reviews_review-d
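
The timed-out request and the test start marker both carry Unix timestamps, so their relative timing can be checked directly. The sketch below assumes that "sent" and "dl" in the `ptlrpc_expire_one_request()` message are the send time and the deadline in epoch seconds, and that the number in parentheses at the end of the DEBUG MARKER line is the test start time; the helper names `request_times` and `marker_time` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Fragments copied from the MDT0000/MDT0002 console log above.
sample_req = (
    "[sent 1694122430/real 1694122430]  req@00000000faabeb67 "
    "x1776402845536448/t0(0) o400->lustre-MDT0001-osp-MDT0002@10.240.25.236@tcp:24/4 "
    "lens 224/224 e 0 to 1 dl 1694122479 ref 1"
)
sample_marker = (
    "== sanityn test 100b: DoM: no glimpse RPC for stat with IO lock "
    "(DoM only file) ===== 21:34:37 (1694122477)"
)


def request_times(line):
    """Return (sent, deadline) in epoch seconds, assuming the field meanings above."""
    sent = int(re.search(r"\[sent (\d+)/real \d+\]", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))
    return sent, deadline


def marker_time(line):
    """Return the epoch seconds in the trailing parentheses of a test start marker."""
    return int(re.search(r"\((\d+)\)\s*$", line).group(1))


sent, deadline = request_times(sample_req)
start = marker_time(sample_marker)

print("request sent:", datetime.fromtimestamp(sent, tz=timezone.utc).isoformat())
print("test started:", datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
print("seconds between send and test start:", start - sent)
print("seconds between test start and deadline:", deadline - start)
```

Under those assumptions, the request to lustre-MDT0001 was sent 47 seconds before test_100b started and expired 2 seconds after it began, which would suggest the connection problem predates this test rather than being triggered by it.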

The client is evicted from MDT0000:

[19863.843380] Lustre: lustre-MDT0001-mdc-ffff992d84e26000: Connection restored to 10.240.25.236@tcp (at 10.240.25.236@tcp)
[19866.910452] Lustre: lustre-MDT0003-mdc-ffff992d84e26000: Connection restored to 10.240.25.236@tcp (at 10.240.25.236@tcp)
[19866.912190] Lustre: Skipped 1 previous similar message
[19867.937153] LustreError: 167-0: lustre-OST0000-osc-ffff992d937a3000: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
[19867.939561] LustreError: Skipped 1 previous similar message
[19867.940674] Lustre: lustre-OST0001-osc-ffff992d937a3000: Connection restored to  (at 10.240.25.234@tcp)
[19867.942196] Lustre: Skipped 1 previous similar message
[19870.558213] Lustre: lustre-MDT0000-mdc-ffff992d937a3000: Connection to lustre-MDT0000 (at 10.240.25.235@tcp) was lost; in progress operations using this service will wait for recovery to complete
[19870.561039] Lustre: Skipped 1 previous similar message
[19870.571643] LustreError: 11-0: lustre-MDT0000-mdc-ffff992d937a3000: operation ldlm_enqueue to node 10.240.25.235@tcp failed: rc = -107
[19870.573702] LustreError: 167-0: lustre-MDT0000-mdc-ffff992d84e26000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[19870.576040] LustreError: Skipped 2 previous similar messages
[19870.577380] Lustre: Evicted from MGS (at 10.240.25.235@tcp) after server handle changed from 0x112b4fefc236a622 to 0x112b4fefc25116b2
[19870.579661] LustreError: 832905:0:(file.c:5360:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
[19870.584097] LustreError: 832910:0:(file.c:246:ll_close_inode_openhandle()) lustre-clilmv-ffff992d937a3000: inode [0x200000405:0xc5e:0x0] mdc close failed: rc = -108
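
The `rc` values in the client log are negative Linux errno codes, and two of them line up with the user-visible errors from the test output. The sketch below decodes them with a small hard-coded table (Linux errno numbers; other platforms number them differently). The names `LINUX_ERRNO` and `decode_rcs` are hypothetical, and the table covers only the codes that appear in this ticket.

```python
import re

# Linux errno numbers and their standard messages for the codes seen above.
# Assumption: the logs come from Linux nodes, as the kernel versions indicate.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

RC_RE = re.compile(r"rc = (-?\d+)")


def decode_rcs(log_text):
    """Return (rc, errno name, message) for every 'rc = N' found in the text."""
    decoded = []
    for m in RC_RE.finditer(log_text):
        rc = int(m.group(1))
        name, msg = LINUX_ERRNO.get(-rc, ("UNKNOWN", "not in table"))
        decoded.append((rc, name, msg))
    return decoded


# Three lines copied from the client console log above (timestamps omitted).
sample = (
    "LustreError: 11-0: lustre-MDT0000-mdc-ffff992d937a3000: operation "
    "ldlm_enqueue to node 10.240.25.235@tcp failed: rc = -107\n"
    "LustreError: 832905:0:(file.c:5360:ll_inode_revalidate_fini()) lustre: "
    "revalidate FID [0x200000007:0x1:0x0] error: rc = -5\n"
    "LustreError: 832910:0:(file.c:246:ll_close_inode_openhandle()) "
    "lustre-clilmv-ffff992d937a3000: inode [0x200000405:0xc5e:0x0] "
    "mdc close failed: rc = -108\n"
)

for rc, name, msg in decode_rcs(sample):
    print(f"rc = {rc}: {name} ({msg})")
```

Here -5 (EIO) corresponds to the `mkdir` "Input/output error" and -108 (ESHUTDOWN) to the `lfs setstripe` "Cannot send after transport endpoint shutdown" message reported by the test.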

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanityn test_100b - test_100b failed with 1


Generated at Sat Feb 10 03:32:37 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.