[LU-15310] sanity test_160g: mds1: User cl7 not registered Created: 02/Dec/21  Updated: 19/Jun/23  Resolved: 19/Jun/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.7, Lustre 2.15.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Mikhail Pershin
Resolution: Duplicate Votes: 0
Labels: failing_tests

Issue Links:
Duplicate
duplicates LU-14893 'lctl --device scratch-MDT0000 change... Resolved
Related
Severity: 3

 Description   

This issue was created by maloo for Chris Horn <hornc@cray.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/6bbb92ce-2b80-41ac-bc9a-abd4ef50c767

test_160g failed with the following error:

mds1: User cl7 not registered
== sanity test 160g: changelog garbage collect (old users) =========================================== 10:15:50 (1638353750)
CMD: onyx-78vm18 /usr/sbin/lctl set_param fail_loc=0x1314
fail_loc=0x1314
CMD: onyx-78vm18 /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm
mdd.lustre-MDT0000.changelog_mask=+hsm
CMD: onyx-78vm18 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
Registered 1 changelog users: 'cl7'
CMD: onyx-78vm18 /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm
mdd.lustre-MDT0000.changelog_mask=+hsm
CMD: onyx-78vm18 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
Registered 1 changelog users: 'cl7 cl8'
total: 2 create in 0.00 seconds: 773.43 ops/second
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_max_idle_indexes
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.*.changelog_max_idle_indexes=0
mdd.lustre-MDT0000.changelog_max_idle_indexes=0
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_gc
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.*.changelog_gc=1
mdd.lustre-MDT0000.changelog_gc=1
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_min_gc_interval
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.*.changelog_min_gc_interval=2
mdd.lustre-MDT0000.changelog_min_gc_interval=2
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_min_free_cat_entries
CMD: onyx-78vm18 /usr/sbin/lctl set_param mdd.*.changelog_min_free_cat_entries=3
mdd.lustre-MDT0000.changelog_min_free_cat_entries=3
CMD: onyx-78vm18 /usr/sbin/lctl set_param fail_loc=0x1313 fail_val=3
fail_loc=0x1313
fail_val=3
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
lustre-MDT0000: clear the changelog for cl7 to record #31
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
mds1: verifying user cl7 clear:  29 + 2 == 31
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
total: 2 create in 0.01 seconds: 306.37 ops/second
CMD: onyx-78vm18 ps -e -o comm= | grep chlg_gc_thread
pdsh@onyx-78vm15: onyx-78vm18: ssh exited with exit code 1
CMD: onyx-78vm18 ps -e -o comm= | grep chlg_gc_thread
pdsh@onyx-78vm15: onyx-78vm18: ssh exited with exit code 1
CMD: onyx-78vm18 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
 sanity test_160g: @@@@@@ FAIL: mds1: User cl7 not registered 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5919:error()
  = /usr/lib64/lustre/tests/sanity.sh:13713:test_160g()
  = /usr/lib64/lustre/tests/test-framework.sh:6222:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:6271:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:6111:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:13731:main()

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_160g - mds1: User cl7 not registered

 Comments   
Comment by Andreas Dilger [ 03/Dec/21 ]

It looks like this test run was for a patch on b2_12, so I set Affects Version to 2.12.7, but please correct it if that is wrong. There were definitely some fixes to test_160g in master (LU-14058) and to the changelog expiry itself (LU-14699), so this may already be fixed.

Comment by Colin Faber [ 28/Sep/22 ]

Hi aioffe 

When you have some cycles, can you please take a look?

Thank you!

Comment by Alexandre Ioffe [ 21/Dec/22 ]

Colin, this is related to changelogs. Could you assign it to tappro?

Comment by Colin Faber [ 21/Dec/22 ]

tappro, can you take a look? Thank you!

Comment by Andreas Dilger [ 19/Jun/23 ]

This looks like an interop test failure due to LU-14893.

Generated at Sat Feb 10 03:17:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.