[LU-11036] sanity-lfsck test_8: (16) Fail to start LFSCK for namespace! Created: 18/May/18  Updated: 23/Aug/23  Resolved: 23/Aug/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.10.4, Lustre 2.10.5, Lustre 2.13.0, Lustre 2.10.6, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.15.0, Lustre 2.15.3
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Major
Reporter: Sarah Liu
Assignee: Lai Siyao
Resolution: Fixed
Votes: 0
Labels: failing_tests, rhel8, ubuntu

Issue Links:
Duplicate
Related
is related to LU-15738 sanity-lfsck test 8: Fail to start LF... Resolved
is related to LU-15781 Ubuntu 22.04 LTS release support Open
Severity: 3

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/980cb918-5937-11e8-b9d3-52540065bddc

test_8 failed with the following error:

(16) Fail to start LFSCK for namespace!

Env: 2.10.4-RC1 EL7.5 server/Ubuntu16.04 client

test log

CMD: onyx-60vm11.onyx.hpdd.intel.com lctl get_param -n at_max
CMD: onyx-61vm2 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.recovery_status |
 awk '/^status/ { print \$2 }'
CMD: onyx-61vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: onyx-61vm2 /usr/sbin/lctl set_param fail_loc=0x1601
fail_loc=0x1601
CMD: onyx-61vm2 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace
onyx-61vm2: Fail to start LFSCK: Operation already in progress
 sanity-lfsck test_8: @@@@@@ FAIL: (16) Fail to start LFSCK for namespace!
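
"Operation already in progress" is the standard error text for EALREADY, so the log above is consistent with an LFSCK instance still being active on the MDT when `lfsck_start` was issued. A minimal sketch of how a test script could wait for a given LFSCK state before issuing the next command is shown below. This is an illustration only and is not the change made in any patch attached to this ticket. The helper name `wait_lfsck_status` and the `LFSCK_POLL_INTERVAL` variable are hypothetical, and the sketch assumes that the `lfsck_namespace` output contains a `status:` line, which the excerpts in this ticket do not show.

```shell
# Hypothetical helper (not part of the Lustre test framework): run a
# status-producing command repeatedly until its "status:" field equals the
# wanted value, or give up after MAX_TRIES attempts.
# Usage: wait_lfsck_status WANT MAX_TRIES CMD [ARGS...]
wait_lfsck_status() {
    want="$1"
    max_tries="$2"
    shift 2
    tries=0
    status=""
    while [ "$tries" -lt "$max_tries" ]; do
        # Same awk idiom as the recovery_status check in the test log.
        # Assumption: the output has a line of the form "status: <value>".
        status=$("$@" | awk '/^status:/ { print $2 }')
        [ "$status" = "$want" ] && return 0
        tries=$((tries + 1))
        # Poll interval in seconds; defaults to 1 (hypothetical knob).
        sleep "${LFSCK_POLL_INTERVAL:-1}"
    done
    echo "LFSCK status is '$status', wanted '$want'" >&2
    return 1
}
```

On the MDS node this might be called as `wait_lfsck_status completed 30 lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace`. Which state to wait for (`completed` here is an assumed value) depends on what the test intends to check.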

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lfsck test_8 - (16) Fail to start LFSCK for namespace!



 Comments   
Comment by Sarah Liu [ 18/May/18 ]

A search of Maloo results from the last 4 weeks suggests that this issue has only been seen on b2_10.

Comment by Peter Jones [ 18/May/18 ]

Fan Yong

Can you please comment?

Peter

Comment by Sarah Liu [ 20/May/18 ]

+1 on b2_10 https://testing.hpdd.intel.com/test_sets/03190216-5bc7-11e8-b9d3-52540065bddc

Comment by Gerrit Updater [ 21/May/18 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/32473
Subject: LU-11036 lfsck: enable full debug for sanity-lfsck t8
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 81eb432e837c0cfdb371556655212c8e74b58f08

Comment by nasf (Inactive) [ 06/Jun/18 ]

The issue can be reproduced with the debug patch, but because of ATM-1011 we cannot collect Lustre debug logs when the issue happens...

Comment by Bob Glossman (Inactive) [ 11/Jun/18 ]

another on master:
https://testing.hpdd.intel.com/test_sets/e02256a6-6dbc-11e8-a55d-52540065bddc

Comment by Fan Yong [ 15/Jul/18 ]

The issue can be reproduced with the debug patch, but because of ATM-1011 we cannot collect Lustre debug logs when the issue happens...

Comment by James Nunez (Inactive) [ 17/Jul/18 ]

Note that the underlying problem is that autotest merges the logs of all test suites that share a name into the last such suite that is run. Please see ATM-13 for more details.

So the logs do exist; you just need to look at the last test suite with the same name. For example, the logs for the sanity-lfsck test 8 failure in test session https://testing.whamcloud.com/test_sessions/ed59ced3-a15c-4d97-92d6-4e42608dae5f are all included at https://testing.whamcloud.com/sub_tests/8a218e5e-87e6-11e8-9e83-52540065bddc.

Comment by Fan Yong [ 08/Aug/18 ]

I do not think I have permission to view the related logs or ATM tickets.

Comment by Jian Yu [ 10/Sep/18 ]

+1 on master branch: https://testing.whamcloud.com/test_sets/853145de-b4ef-11e8-b86b-52540065bddc

Comment by Chris Horn [ 14/May/20 ]

+1 on master https://testing.whamcloud.com/test_sets/c8db4621-f9d4-44b0-b87d-0ba7b5ba25ec

Comment by Olaf Faaland [ 28/Jul/20 ]

+1 on master: https://testing.whamcloud.com/test_sets/f6a24f84-5a92-4a86-9549-8c421de119d9

Comment by Gerrit Updater [ 28/Jul/20 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/39526
Subject: LU-11036 tests: debug information for sanity-lfsck test_8
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1eebace3b664f82fb3b39cac6331e1afdd62b9e6

Comment by Gerrit Updater [ 26/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/39526/
Subject: LU-11036 tests: debug information for sanity-lfsck test_8
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a72fb33895572d73d6c8e9796c3e9c2ac61b0f93

Comment by Sarah Liu [ 22/Feb/22 ]

+1 on master https://testing.whamcloud.com/test_sets/aa87dbb4-8832-42fc-8073-e635fe0b0178

Comment by Sarah Liu [ 14/Dec/22 ]

+1 on 2.15.2 https://testing.whamcloud.com/test_sets/3b0da7be-576f-4cfd-8b8c-e2f9ec851051

name_hash_repaired: 0
linkea_overflow_cleared: 0
agent_entries_repaired: 0
success_count: 0
run_time_phase1: 5 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
average_speed_total: 0 items/sec
real_time_speed_phase1: 0 items/sec
real_time_speed_phase2: N/A
current_position: 20101, [0x200000007:0x1:0x0], 0x52a646ee23883686
 sanity-lfsck test_8: @@@@@@ FAIL: (16) Fail to start LFSCK for namespace!

Comment by Gerrit Updater [ 20/Jul/23 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51720
Subject: LU-11036 test: race in sanity-lfsck test_8
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6fa6c6e66e4bd9967d3408e955fd516d4fc71909

Comment by Gerrit Updater [ 01/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51720/
Subject: LU-11036 test: race in sanity-lfsck test_8
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f1ddb4093ed623c1382f75e47807e4081962cc3d

Comment by James A Simmons [ 23/Aug/23 ]

Patch landed. If more work is needed, please reopen.

Generated at Sat Feb 10 02:40:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.