[LU-8133] ost-pools are not destroyed if test-case using ost-pools fail. Created: 12/May/16  Updated: 01/Mar/17  Resolved: 01/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Bhagyesh Dudhediya (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-8952 Handling test specific cleanup of ost... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It is found that if a test case involving OST pools fails, the pools created in that test case are not destroyed.
Below is the result when I deliberately made sanity/test_220 fail; the pool test_220 still exists in the system afterwards.

== sanity test 220: preallocated MDS objects still used if ENOSPC from OST == 15:27:51 (1463047071)
pdsh@Seagate: 169.254.90.7: ssh exited with exit code 3
UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       100000         218       99782   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID        50016       50016           0 100% /mnt/lustre[OST:0]
lustre-OST0001_UUID        50016       50016           0 100% /mnt/lustre[OST:1]

filesystem summary:          218         218           0 100% /mnt/lustre

fail_val=-1
fail_loc=0x229
169.254.90.6: Pool lustre.test_220 created
169.254.90.6: poolname is empty
169.254.90.6: argument lustre. must be <fsname>.<poolname>
169.254.90.6: pool_add: Invalid argument
pdsh@Seagate: 169.254.90.6: ssh exited with exit code 22
 sanity test_220: @@@@@@ FAIL: test_220 failed with 2 
  Trace dump:
  = /root/Desktop/code/lustre-wc-rel/lustre/tests/test-framework.sh:4673:error()
  = /root/Desktop/code/lustre-wc-rel/lustre/tests/test-framework.sh:4933:run_one()
  = /root/Desktop/code/lustre-wc-rel/lustre/tests/test-framework.sh:4969:run_one_logged()
  = /root/Desktop/code/lustre-wc-rel/lustre/tests/test-framework.sh:4775:run_test()
  = sanity.sh:11787:main()
Dumping lctl log to /tmp/test_logs/1463047063/sanity.test_220.*.1463047083.log
169.254.90.7: ssh: Could not resolve hostname Seagate: Temporary failure in name resolution
169.254.90.7: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
169.254.90.7: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
pdsh@Seagate: 169.254.90.7: ssh exited with exit code 12
169.254.90.6: ssh: Could not resolve hostname Seagate: Temporary failure in name resolution
169.254.90.6: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
169.254.90.6: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
pdsh@Seagate: 169.254.90.6: ssh exited with exit code 12
169.254.90.8: ssh: Could not resolve hostname Seagate: Temporary failure in name resolution
169.254.90.8: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
169.254.90.8: rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]
pdsh@Seagate: 169.254.90.8: ssh exited with exit code 12
test_220 returned 1
FAIL 220 (13s)
== sanity test complete, duration 21 sec == 15:28:04 (1463047084)
debug=super ioctl neterror warning dlmtrace error emerg ha rpctrace vfstrace config console lfsck
[root@Seagate tests]# lctl pool_list lustre 
Pools from lustre:
lustre.test_220       <==pool still exists

The same happens with other tests, such as replay-single/test_85b.



 Comments   
Comment by Andreas Dilger [ 12/May/16 ]

The proper way to handle this is to create a cleanup_220() function that does the test cleanup, and add an exit trap so that it is run even when the test fails. It turns out there is an existing cleanup_pools() helper function that could be used for this; the test just needs to set ${FSNAME}_CREATED_POOLS=$TESTNAME.
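
A minimal sketch of what that could look like, assuming the cleanup_pools(), error(), do_facet and run_test helpers from test-framework.sh behave as described above; cleanup_220() itself and the trap wiring are hypothetical:

cleanup_220() {
	trap 0                          # clear the EXIT trap
	cleanup_pools $FSNAME           # destroys pools recorded in ${FSNAME}_CREATED_POOLS
}

test_220() {
	trap cleanup_220 EXIT           # runs even if error() aborts the test

	# record the pool so cleanup_pools() knows about it
	eval export ${FSNAME}_CREATED_POOLS=$TESTNAME
	do_facet mgs $LCTL pool_new $FSNAME.$TESTNAME ||
		error "pool_new $FSNAME.$TESTNAME failed"

	# ... body of the test ...

	cleanup_220
}
run_test 220 "preallocated MDS objects still used if ENOSPC from OST"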

Comment by Bhagyesh Dudhediya (Inactive) [ 13/May/16 ]

Hello Andreas,
Thanks a lot for your time.
A pool is added to the ${FSNAME}_CREATED_POOLS list only when it is created via create_pool_nofail() or a function that internally calls create_pool(). In those cases cleanup_pools() does clean up the pools.
However, if pools are created with raw lctl commands, as in sanity/test_220 and replay-single/test_85b, cleanup_pools() will not help.
I guess destroy_pool_int() can do the job in such cases. We will also have to identify all the affected test cases and add a cleanup function (as you mentioned) for those that do not already have one.
Do correct me if I am missing anything.
Thanks!
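
A minimal sketch of the distinction described above; the behaviour of create_pool() and destroy_pool_int() is assumed from this comment rather than verified against test-framework.sh, and the OST name is only an example:

# Path 1: pool created through the framework helper; it gets recorded in
# ${FSNAME}_CREATED_POOLS, so a later "cleanup_pools $FSNAME" can destroy it.
create_pool $FSNAME.$TESTNAME

# Path 2: pool created with raw lctl commands, as in sanity/test_220 and
# replay-single/test_85b; nothing records it, so cleanup_pools() never sees it.
do_facet mgs $LCTL pool_new $FSNAME.$TESTNAME
do_facet mgs $LCTL pool_add $FSNAME.$TESTNAME $FSNAME-OST0000

# For path 2 the test (or its exit trap) has to clean up explicitly, e.g. via
# the destroy_pool_int() helper mentioned above, or manually:
do_facet mgs $LCTL pool_remove $FSNAME.$TESTNAME $FSNAME-OST0000
do_facet mgs $LCTL pool_destroy $FSNAME.$TESTNAME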

Comment by Andreas Dilger [ 01/Mar/17 ]

There is already a patch under LU-8952 to fix this.
