[LU-16987] runtests test_1: FAIL: copying /etc /bin - No space left on device Created: 27/Jul/23  Updated: 27/Jul/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for eaujames <eaujames@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/ba745172-f6b1-4f8d-806c-f07c4b42c5de

test_1 failed with the following error:

copying /etc /bin

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/96445 - 4.18.0-477.15.1.el8_8.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/96445 - 4.18.0-477.15.1.el8_lustre.x86_64

Test log:

striped dir -i1 -c2 -H all_char /mnt/lustre/d1.runtests
copying 590 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin  at Thu Jul 27 00:27:45 UTC 2023
tar: Removing leading `/' from member names
tar: etc/cups/subscriptions.conf: Cannot open: No space left on device
tar: etc/cups/client.conf: Cannot open: No space left on device
tar: Exiting with failure status due to previous errors
 runtests test_1: @@@@@@ FAIL: copying /etc /bin 
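A note on the error: ENOSPC from tar here does not necessarily mean the OSTs were actually full. The MDS2 log below shows object preallocation stalling on MDT0001, and a precreate reservation that cannot be satisfied can surface to the client as a create failure that tar reports as "No space left on device". A quick check at failure time would be something like the following (a sketch; /mnt/lustre is the mount point from the log above):

lfs df -h /mnt/lustre   # per-OST/MDT block usage for the filesystem
lfs df -i /mnt/lustre   # per-OST/MDT inode usage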

MDS2 log:

[ 7620.821470] Lustre: DEBUG MARKER: copying 590 files from /etc /bin to /mnt/lustre/d1.runtests/etc /bin at Thu Jul 27 00:27:45 UTC 2023
[ 7661.566335] Lustre: mdt00_004: service thread pid 27300 was inactive for 40.662 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 7661.569288] Pid: 27300, comm: mdt00_004 4.18.0-477.15.1.el8_lustre.x86_64 #1 SMP Wed Jul 19 16:54:24 UTC 2023
[ 7661.570858] Call Trace TBD:
[ 7661.571389] [<0>] osp_precreate_reserve+0x63b/0x9a0 [osp]
[ 7661.572292] [<0>] osp_declare_create+0x4cc/0x720 [osp]
[ 7661.573153] [<0>] lod_sub_declare_create+0x10f/0x260 [lod]
[ 7661.574083] [<0>] lod_qos_declare_object_on+0x1e5/0x430 [lod]
[ 7661.575047] [<0>] lod_ost_alloc_rr+0xba3/0x11e0 [lod]
[ 7661.575906] [<0>] lod_qos_prep_create+0xb1c/0x1350 [lod]
[ 7661.576805] [<0>] lod_prepare_create+0x231/0x320 [lod]
[ 7661.577675] [<0>] lod_declare_striped_create+0x1e2/0x930 [lod]
[ 7661.578647] [<0>] lod_declare_create+0x27c/0x520 [lod]
[ 7661.579514] [<0>] mdd_declare_create_object_internal+0xcd/0x370 [mdd]
[ 7661.580581] [<0>] mdd_declare_create_object.isra.36+0x49/0x880 [mdd]
[ 7661.581631] [<0>] mdd_declare_create+0x72/0x490 [mdd]
[ 7661.582478] [<0>] mdd_create+0x838/0x1a30 [mdd]
[ 7661.583243] [<0>] mdt_reint_open+0x286a/0x3220 [mdt]
[ 7661.584103] [<0>] mdt_reint_rec+0x11f/0x270 [mdt]
[ 7661.584932] [<0>] mdt_reint_internal+0x4d3/0x7f0 [mdt]
[ 7661.585807] [<0>] mdt_intent_open+0x13b/0x420 [mdt]
[ 7661.586645] [<0>] mdt_intent_opc+0x130/0xc30 [mdt]
[ 7661.587464] [<0>] mdt_intent_policy+0xe8/0x460 [mdt]
[ 7661.588303] [<0>] ldlm_lock_enqueue+0x47f/0xb20 [ptlrpc]
[ 7661.589273] [<0>] ldlm_handle_enqueue+0x643/0x1870 [ptlrpc]
[ 7661.590266] [<0>] tgt_enqueue+0xa8/0x230 [ptlrpc]
[ 7661.591142] [<0>] tgt_request_handle+0xd20/0x19c0 [ptlrpc]
[ 7661.592124] [<0>] ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
[ 7661.593205] [<0>] ptlrpc_main+0xc52/0x1510 [ptlrpc]
[ 7661.594079] [<0>] kthread+0x134/0x150
[ 7661.594713] [<0>] ret_from_fork+0x35/0x40
[ 8023.549721] Lustre: 10419:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690417468/real 1690417468]  req@00000000108824be x1772523706976448/t0(0) o6->lustre-OST0007-osc-MDT0001@10.240.43.202@tcp:28/4 lens 544/432 e 18 to 1 dl 1690418069 ref 1 fl Rpc:XQr/202/ffffffff rc -11/-1 uid:0 gid:0 job:'osp-syn-7-1.0'
[ 8023.554643] Lustre: 10419:0:(client.c:2309:ptlrpc_expire_one_request()) Skipped 26 previous similar messages
[ 8023.556231] Lustre: lustre-OST0007-osc-MDT0001: Connection to lustre-OST0007 (at 10.240.43.202@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 8023.558799] Lustre: Skipped 14 previous similar messages
[ 8023.560376] Lustre: lustre-OST0007-osc-MDT0001: Connection restored to  (at 10.240.43.202@tcp)
[ 8023.561793] Lustre: Skipped 16 previous similar messages
[ 8038.421874] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  runtests test_1: @@@@@@ FAIL: copying \/etc \/bin 
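The stack above shows an mdt service thread stuck for 40+ seconds in osp_precreate_reserve(), i.e. MDT0001 waiting to reserve precreated objects on an OST while its OST0007 connection was timing out; a reservation that cannot be satisfied can be returned to the client as ENOSPC, which matches the tar failure. If this reproduces, the precreate state could be inspected on the MDS with something like this (a sketch; these are the usual osp parameter names and may vary by release):

lctl get_param osp.lustre-OST*.prealloc_status      # 0 when precreation is healthy
lctl get_param osp.lustre-OST*.prealloc_last_id osp.lustre-OST*.prealloc_next_id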

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
runtests test_1 - copying /etc /bin



Comments
Comment by Etienne Aujames [ 27/Jul/23 ]

The following request, which triggered the MDT0001/OST reconnection, was sent during "replay-dual test 32" (sent timestamp 1690417468, roughly uptime 7423 s):

[ 8023.549721] Lustre: 10419:0:(client.c:2309:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1690417468/real 1690417468]  req@00000000108824be x1772523706976448/t0(0) o6->lustre-OST0007-osc-MDT0001@10.240.43.202@tcp:28/4 lens 544/432 e 18 to 1 dl 1690418069 ref 1 fl Rpc:XQr/202/ffffffff rc -11/-1 uid:0 gid:0 job:'osp-syn-7-1.0'
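
The timestamps line up with that reading: the request was sent at 1690417468 (uptime ~7423 s, i.e. during replay-dual test 32) and its deadline dl 1690418069 falls 601 s later, matching the expiry message logged at uptime 8023.5. runtests test_1 started copying at 00:27:45 UTC (uptime 7620.8), inside that window, so MDT0001 was still blocked on OST0007 when test_1 tried to create files. Opcode o6 with job 'osp-syn-7-1.0' reads like an OSP sync RPC from MDT0001 to OST0007, though that is an inference from the log rather than something confirmed here. The epoch conversions can be checked with:

date -u -d @1690417468   # Thu Jul 27 00:24:28 UTC 2023 (request sent)
date -u -d @1690418069   # Thu Jul 27 00:34:29 UTC 2023 (request deadline)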