[LU-16398] ost-pools: FAIL: remove sub-test dirs failed Created: 14/Dec/22  Updated: 13/Jan/23  Resolved: 13/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16159 remove update logs after recovery abort Reopened
is related to LU-14474 Oops in llog_cat_prep_log() in sanity... Resolved
is related to LU-14932 runtests: test_1 llog_cat_cleanup()) ... Resolved
is related to LU-16335 "lfs rm_entry" failed to remove broke... Resolved
is related to LU-15139 sanity test_160h: dt_record_write() A... Resolved
Severity: 3

Description

This issue was created by maloo for Cyril Bordage <cbordage@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6c25ddb2-8f54-4bfc-b517-7d8fa10a26e0

[15276.445833] Lustre: lustre-MDT0000-mdc-ffff8e6f43d55000: Connection to lustre-MDT0000 (at 10.240.29.75@tcp) was lost; in progress operations using this service will wait for recovery to complete
[15282.525287] Lustre: 8050:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1670998319/real 1670998319]  req@00000000ed826699 x1752152989501440/t0(0) o400->MGC10.240.29.75@tcp@10.240.29.75@tcp:26/25 lens 224/224 e 0 to 1 dl 1670998326 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0'
[15282.529897] LustreError: 166-1: MGC10.240.29.75@tcp: Connection to MGS (at 10.240.29.75@tcp) was lost; in progress operations using this service will fail
[15282.535285] Lustre: Evicted from MGS (at 10.240.29.75@tcp) after server handle changed from 0xf1d807d0e20aa75a to 0xf1d807d0e20aeef7
[15282.537316] Lustre: MGC10.240.29.75@tcp: Connection restored to 10.240.29.75@tcp (at 10.240.29.75@tcp)
[15282.538776] Lustre: Skipped 1 previous similar message
[15282.540476] LustreError: 8047:0:(client.c:3253:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@000000000ae752ed x1752152938284544/t575525618194(575525618194) o101->lustre-MDT0000-mdc-ffff8e6f43d55000@10.240.29.75@tcp:12/10 lens 784/608 e 0 to 0 dl 1670998333 ref 2 fl Interpret:RPQU/4/0 rc 301/301 job:'lfs.0'
[15282.544713] LustreError: 8047:0:(client.c:3253:ptlrpc_replay_interpret()) Skipped 4 previous similar messages
[15316.942119] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_30-1 2>/dev/null || echo foo
[15329.368031] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_30-2 2>/dev/null || echo foo
[15329.850726] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == ost-pools test 31: OST pool spilling chained ========== 06:12:52 \(1670998372\)
[15330.287265] Lustre: DEBUG MARKER: == ost-pools test 31: OST pool spilling chained ========== 06:12:52 (1670998372)
[15336.189947] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-1 2>/dev/null || echo foo
[15341.427534] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-1 | grep -e lustre-OST0000_UUID | sort -u | tr '\n' ' ' 
[15346.697976] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-2 2>/dev/null || echo foo
[15351.904531] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-2 | grep -e lustre-OST0001_UUID | sort -u | tr '\n' ' ' 
[15357.089037] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-3 2>/dev/null || echo foo
[15362.271074] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-3 | grep -e lustre-OST0002_UUID | sort -u | tr '\n' ' ' 
[15367.429384] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-4 2>/dev/null || echo foo
[15372.625338] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-4 | grep -e lustre-OST0003_UUID | sort -u | tr '\n' ' ' 
[15396.253066] Lustre: lustre-OST0000-osc-ffff8e6f43d55000: disconnect after 23s idle
[15396.254390] Lustre: Skipped 7 previous similar messages
[15409.018878] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-1 2>/dev/null || echo foo
[15418.687254] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-2 2>/dev/null || echo foo
[15428.344557] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-3 2>/dev/null || echo foo
[15437.992415] Lustre: DEBUG MARKER: lctl get_param -n lov.lustre-*.pools.test_31-4 2>/dev/null || echo foo
[15438.453491] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == ost-pools test complete, duration 3469 sec ============ 06:14:41 \(1670998481\)
[15438.911011] Lustre: DEBUG MARKER: == ost-pools test complete, duration 3469 sec ============ 06:14:41 (1670998481)
[15441.136862] LustreError: 668781:0:(file.c:242:ll_close_inode_openhandle()) lustre-clilmv-ffff8e6f43d55000: inode [0x280000407:0x1:0x0] mdc close failed: rc = -2
[15441.139233] LustreError: 668781:0:(file.c:242:ll_close_inode_openhandle()) Skipped 4 previous similar messages
[15441.703081] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  ost-pools : @@@@@@ FAIL: remove sub-test dirs failed 
[15442.070656] Lustre: DEBUG MARKER: ost-pools : @@@@@@ FAIL: remove sub-test dirs failed
[15442.482095] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-2/2022-12-14/lustre-reviews_review-dne-part-6_91178_8_65e71c9b-ceca-42ee-bf2e-b293cbbbbbb5//ost-pools..debug_log.$(hostname -s).1670998485.log;
[15442.482095] 		dmesg > /autotest/autotest-2/2022-12-14/lustre-reviews_review-dne-part
[15452.573498] Lustre: lustre-MDT0000-mdc-ffff8e6f43d55000: Connection to lustre-MDT0000 (at 10.240.29.75@tcp) was lost; in progress operations using this service will wait for recovery to complete
[15452.583527] Lustre: Skipped 1 previous similar message
[15459.740896] Lustre: 8049:0:(client.c:2305:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1670998496/real 1670998496]  req@0000000075856cdc x1752152989699456/t0(0) o400->MGC10.240.29.75@tcp@10.240.29.75@tcp:26/25 lens 224/224 e 0 to 1 dl 1670998503 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/u4:2.0'
[15459.745591] LustreError: 166-1: MGC10.240.29.75@tcp: Connection to MGS (at 10.240.29.75@tcp) was lost; in progress operations using this service will fail
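
The FAIL line above is raised by the test framework's final cleanup, which deletes the per-test directories under the client mount after the suite completes; the preceding "mdc close failed: rc = -2" (-ENOENT) errors show the unlink path hitting MDT objects that no longer exist. A hedged sketch of that step (the real code is check_and_cleanup_lustre() in lustre/tests/test-framework.sh and differs in detail):

    # remove every dNN/fNN/RNN/sNN entry the sub-tests left behind;
    # one failed unlink is enough to fail the whole suite
    if is_mounted $MOUNT; then
            [ -n "$DIR" ] && rm -rf $DIR/[Rdfs][0-9]* ||
                    error "remove sub-test dirs failed"
    fi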


Comments
Comment by Alex Zhuravlev [ 14/Dec/22 ]

I think this is caused by LU-16159 ("remove update logs after recovery abort").
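
LU-16159 covers distributed-update llogs that are left behind when recovery is aborted; stale logs like that would keep the sub-test directories from being removed, matching the failure above. One hedged way to look for leftovers on the MDS (device name taken from this run's config):

    # list the llog catalogs held by the MDT; update logs orphaned by the
    # aborted recovery would show up here
    lctl --device lustre-MDT0000 llog_catlist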

Comment by Gerrit Updater [ 14/Dec/22 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49412
Subject: LU-16398 tests: exclude replay-single/100c until fixed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 75112019f87c5feb88d5e9cabed960dd6e04217e
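
Lustre test suites skip known-bad sub-tests through their ALWAYS_EXCEPT list, so the change amounts to roughly the following (a sketch; see the patch for the exact hunk):

    # lustre/tests/replay-single.sh: skip test_100c until this ticket is fixed
    ALWAYS_EXCEPT="$REPLAY_SINGLE_EXCEPT 100c"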

Comment by Andreas Dilger [ 13/Jan/23 ]

I think this cleanup issue is fixed by patch https://review.whamcloud.com/49335 "LU-16335 test: add fail_abort_cleanup()".
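
For context, after an aborted recovery a directory whose pending updates were discarded may no longer be removable with a plain rm, so a helper along these lines falls back to "lfs rm_entry" to drop the dangling name. This is only an illustrative sketch; the authoritative version is in the patch above:

    # illustrative only -- see https://review.whamcloud.com/49335 for the
    # real fail_abort_cleanup(); purge leftover files, then remove the
    # broken directory entry that survives the aborted recovery
    fail_abort_cleanup() {
            rm -rf $DIR/$tdir/*
            $LFS rm_entry $DIR/$tdir 2>/dev/null || true
    }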
