[LU-15738] sanity-lfsck test 8: Fail to start LFSCK: Operation already in progress - script issue? Created: 12/Apr/22  Updated: 10/Oct/23  Resolved: 10/Oct/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11036 sanity-lfsck test_8: (16) Fail to sta... Resolved
is related to LU-10848 sanity-lfsck test_8: Expect 'scanned-... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/11ebd1ed-9cbe-4b23-a973-b112d6d46418

The failure is in sanity-lfsck test 8.

This may be a script issue: LFSCK appears to start on its own before the script can trigger it, so the script's own lfsck_start fails with "Operation already in progress":

CMD: onyx-72vm10 e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@onyx-209vm7: onyx-72vm10: ssh exited with exit code 1
CMD: onyx-72vm10 e2label /dev/mapper/mds1_flakey 2>/dev/null
CMD: onyx-209vm7.onyx.whamcloud.com lctl get_param -n at_max
CMD: onyx-72vm10 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.recovery_status | awk '/^status/ { print \$2 }'
CMD: onyx-72vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: onyx-72vm10 /usr/sbin/lctl set_param fail_loc=0x1601
fail_loc=0x1601
CMD: onyx-72vm10 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace
onyx-72vm10: Fail to start LFSCK: Operation already in progress
pdsh@onyx-209vm7: onyx-72vm10: ssh exited with exit code 114
CMD: onyx-72vm10 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
.............
current_position: 20101, [0x200000007:0x1:0x0], 0x50a44f29b1655d78
 sanity-lfsck test_8: @@@@@@ FAIL: (16) Fail to start LFSCK for namespace! 
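
The ssh exit code 114 in the log is EALREADY on Linux, matching the "Operation already in progress" message: an LFSCK instance was already active on the MDT when the test issued lfsck_start. One way to guard against this is to read the status field of lfsck_namespace first and start only once no scan is running. The following is a minimal sketch under stated assumptions, not the merged patch: the function names, the retry count, and the choice of scanning-phase1 and scanning-phase2 as the only "running" states are illustrative, and the lctl path and MDT name simply mirror the commands in the log above.

```shell
#!/bin/sh
# Hypothetical helpers; names and defaults are illustrative only.

# Print the value of the first "status:" line read from stdin
# (the format printed by "lctl get_param -n mdd.<MDT>.lfsck_namespace").
lfsck_status_from_output() {
	awk '/^status:/ { print $2; exit }'
}

# Return 0 if the given status string means a scan is in progress.
# Assumption: only the two scanning phases count as "running".
lfsck_is_running() {
	case "$1" in
	scanning-phase1|scanning-phase2) return 0 ;;
	*) return 1 ;;
	esac
}

# Wait until no namespace LFSCK is running on the MDT, then start one.
# $1: MDT name (default lustre-MDT0000), $2: max polls (default 30).
start_lfsck_when_idle() {
	mdt=${1:-lustre-MDT0000}
	tries=${2:-30}
	lctl=${LCTL:-/usr/sbin/lctl}

	while [ "$tries" -gt 0 ]; do
		status=$("$lctl" get_param -n "mdd.$mdt.lfsck_namespace" |
			 lfsck_status_from_output)
		if ! lfsck_is_running "$status"; then
			# Idle: safe to start without hitting EALREADY.
			"$lctl" lfsck_start -M "$mdt" -t namespace
			return $?
		fi
		tries=$((tries - 1))
		sleep 1
	done

	echo "LFSCK still running on $mdt (status: $status)" >&2
	return 1
}
```

On the MDS node this would be called as start_lfsck_when_idle lustre-MDT0000. Polling only explains away the symptom if the earlier scan finishes on its own; if the test depends on fail_loc taking effect during the scan, the already-running instance would instead have to be stopped before the restart.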


 Comments   
Comment by Andreas Dilger [ 18/Apr/22 ]

When you are not busy, can you please take a look?

Is this a test script issue, or a change caused by a recent patch?

Comment by Gerrit Updater [ 22/Jul/22 ]

"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48018
Subject: LU-15738 test: check lfsck status before starting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4c408e38840f32e450164b02e8942570e35af43d

Comment by Gerrit Updater [ 10/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48018/
Subject: LU-15738 test: check lfsck status before starting
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 29aaf679afac89359e1b116b8de0480f24b4e8ac

Comment by Peter Jones [ 10/Oct/22 ]

Landed for 2.16

Generated at Sat Feb 10 03:20:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.