[LU-14087] sanity-hsm test 254b fails with 'Expected 0 (!= '60') active restore requests' Created: 29/Oct/20  Updated: 11/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RHEL8.2


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-hsm test_254b fails on el8.2 with "Expected 0 (!= '60') active restore requests".

Looking at the failure at https://testing.whamcloud.com/test_sets/39cda0dc-b495-4af9-b0ca-757042d6fd3a, we see the following in the suite_log:

== sanity-hsm test 254b: Request counters are correctly incremented and decremented ================== 01:46:54 (1603849614)
Will launch 60 requests of each type
CMD: trevis-4vm6 mkdir -p /tmp/arc1/sanity-hsm.test_254b/
Starting copytool agt1 on trevis-4vm6
CMD: trevis-4vm6 lhsmtool_posix  --daemon --hsm-root "/tmp/arc1/sanity-hsm.test_254b/" "/mnt/lustre2" < /dev/null > "/autotest/autotest-1/2020-10-27/lustre-reviews_review-dne-part-2_77301_1_104_721734b9-fe3f-443d-bbbf-9f1c00a88e0e/sanity-hsm.test_254b.copytool_log.trevis-4vm6.log" 2>&1
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.max_requests
CMD: trevis-4vm8 /usr/sbin/lctl set_param -n mdt.lustre-MDT0000.hsm.max_requests=60
CMD: trevis-4vm9 /usr/sbin/lctl set_param -n mdt.lustre-MDT0001.hsm.max_requests=60
CMD: trevis-4vm8 /usr/sbin/lctl set_param -n mdt.lustre-MDT0002.hsm.max_requests=60
CMD: trevis-4vm9 /usr/sbin/lctl set_param -n mdt.lustre-MDT0003.hsm.max_requests=60
Checking archive requests
CMD: trevis-4vm6 libtool execute pkill -STOP -x lhsmtool_posix
Copytool is suspended on trevis-4vm6
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.archive_count
CMD: trevis-4vm6 libtool execute pkill -CONT -x lhsmtool_posix
Copytool is continued on trevis-4vm6
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.archive_count
Checking restore requests
CMD: trevis-4vm6 libtool execute pkill -STOP -x lhsmtool_posix
Copytool is suspended on trevis-4vm6
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.restore_count
CMD: trevis-4vm6 libtool execute pkill -CONT -x lhsmtool_posix
Copytool is continued on trevis-4vm6
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions
CMD: trevis-4vm8 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.restore_count
 sanity-hsm test_254b: @@@@@@ FAIL: Expected 0 (!= '60')  active restore requests 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6254:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:4363:test_254b()

There is nothing obviously wrong in either the copytool log or the console logs.
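
The log shows restore_count being read right after the copytool is continued and still reporting 60. One way to tell a slow drain from a counter that never decrements is to poll the value for a bounded time instead of reading it once. The following is only a minimal sketch: wait_for_value and restore_count are hypothetical helper names, not functions from test-framework.sh or sanity-hsm.sh, and the parameter path is the one that appears in the log above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework):
# poll a command until it prints the expected value or the timeout expires.
# usage: wait_for_value EXPECTED TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for_value() {
    local expected=$1 timeout=$2
    shift 2
    local value="" elapsed=0

    while true; do
        # Treat a failing command as an empty reading rather than aborting.
        value=$("$@" 2>/dev/null) || value=""
        [ "$value" = "$expected" ] && return 0
        [ "$elapsed" -ge "$timeout" ] && break
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # Message format mirrors the failure seen in the suite_log.
    echo "Expected $expected (!= '$value') after ${timeout}s" >&2
    return 1
}

# Hypothetical wrapper: read the active restore counter on the MDS.
# The parameter name is taken from the log; run this on the MDS node.
restore_count() {
    lctl get_param -n mdt.lustre-MDT0000.hsm.restore_count
}

# Example (assumes a 30 second budget is acceptable):
#   wait_for_value 0 30 restore_count
```

If the counter reaches 0 within the budget, the failure is more likely a timing issue in the test; if it stays at 60, the counter itself is probably not being decremented.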

We’ve seen this at least once before:
https://testing.whamcloud.com/test_sets/8d8131a1-8b0e-4260-ae5a-ebc98157ca2d



 Comments   
Comment by Nikitas Angelinas [ 19/Oct/22 ]

+1 on master with archive requests: https://testing.whamcloud.com/test_sets/2b45cc81-2e99-45e9-a150-b97c2aa266a4

Comment by Arshad Hussain [ 29/Jun/23 ]

+1 on master (https://testing.whamcloud.com/sub_tests/45a59b3f-d5c3-4288-a4e5-4e497eb75e47)
