[LU-12848] Add test case for LU-11549
Created: 11/Oct/19  Updated: 25/Aug/21  Resolved: 25/Aug/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Andreas Dilger
Assignee: Alexander Zarochentsev
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11549 Unattached inodes after 3 min racer run. Resolved
is related to LU-13346 Fix link and rename race on zfs odb Resolved
Severity: 3

 Description   

The test case in patch https://review.whamcloud.com/35991 "LU-11549 tests: link succeded to an ophan remote object" does not currently work. It needs to be updated so that the OBD_RACE() condition is hit reliably.
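
The test logs in the comments below show the race point being armed with fail_loc=0x8000018a. As a rough aid for reading that value, the following sketch splits a fail_loc into a one-shot flag and a fail-site ID. It assumes that bit 31 (0x80000000) is the "fire once" flag and that the low 16 bits hold the site ID; the function name decode_fail_loc is hypothetical, and the layout should be checked against the fail-injection headers in the Lustre tree being tested.

```shell
#!/bin/bash
# Split a fail_loc value into the one-shot flag and the fail-site ID.
# Assumption (not confirmed by this ticket): bit 31 (0x80000000) means
# "fire once" and the low 16 bits are the site ID.
decode_fail_loc() {
	local val=$(( $1 ))                          # accepts hex such as 0x8000018a
	local once=$(( (val & 0x80000000) != 0 ))    # 1 if the one-shot bit is set
	local site=$(( val & 0xffff ))               # fail-site ID
	printf 'once=%d site=0x%x\n' "$once" "$site"
}

# Value taken from the logs in this ticket.
decode_fail_loc 0x8000018a
```

Under those assumptions, the value in the logs decodes to a one-shot trigger on site 0x18a, so the race point is expected to fire a single time per test run.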



 Comments   
Comment by Alexander Zarochentsev [ 20/Nov/19 ]

The test reveals problems with the ZFS backend (it is test #104 in our branch and test #105 in the patch for master):

== sanityn test 104: rename to an open file and link race should not cause fs corruption ============= 15:06:58 (1574089618)
fail_loc=0x8000018a
/usr/lib64/lustre/tests/sanityn.sh: line 4633: 17769 Terminated              $MULTIOP $DIR2/$tdir/mdt0dir/foodir/file2 Ow4096_c
rm: cannot remove '/mnt/lustre/d104.sanityn/mdt1dir/file2x': No such file or directory
 sanityn test_104: @@@@@@ FAIL: Removing test dir failed 
  Trace dump:
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:5988:error()
  = /usr/lib64/lustre/tests/sanityn.sh:4634:test_104()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:6272:run_one()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:6311:run_one_logged()
  = /usr/lib64/lustre/tests/../tests/test-framework.sh:6107:run_test()
  = /usr/lib64/lustre/tests/sanityn.sh:4636:main()
Dumping lctl log to /tmp/test_logs/1574089606/sanityn.test_104.*.1574089621.log
Resetting fail_loc on all nodes...done.
FAIL 104 (5s)
sanityn: FAIL: test_104 Removing test dir failed
Dumping lctl log to /tmp/test_logs/1574089606/sanityn..*.1574089624.log
Resetting fail_loc on all nodes...done.

The same failure is seen in Oleg's testing (http://testing.linuxhacker.ru:3333/lustre-reports/4579/results.html):

== sanityn test 105: A racy rename/link an open file should not cause fs corruption ================== 13:15:42 (1574273742)
fail_loc=0x8000018a
/home/green/git/lustre-release/lustre/tests/sanityn.sh: line 4905: 10460 Terminated              $MULTIOP $DIR2/$tdir/mdt0dir/foodir/file2 Ow4096_c
rm: cannot remove '/mnt/lustre/d105.sanityn/mdt1dir/file2x': No such file or directory
 sanityn test_105: @@@@@@ FAIL: Removing test dir failed 
  Trace dump:
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6108:error()
  = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4906:test_105()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6410:run_one()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6449:run_one_logged()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6280:run_test()
  = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4908:main()
Dumping lctl log to /tmp/testlogs//sanityn.test_105.*.1574273747.log
oleg256-server: Warning: Permanently added 'oleg256-client.virtnet' (ECDSA) to the list of known hosts.
oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.debug_log.oleg256-server.1574273747.log.Gd6d36" failed: Operation not permitted (1)
oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.dmesg.oleg256-server.1574273747.log.uwPdAz" failed: Operation not permitted (1)
oleg256-server: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
pdsh@oleg256-client: oleg256-server: ssh exited with exit code 23
Resetting fail_loc on all nodes...done.
FAIL 105 (6s)
cleanup: ======================================================
== sanityn test complete, duration 21 sec ============================================================ 13:15:51 (1574273751)
sanityn: FAIL: test_105 Removing test dir failed
rm: cannot remove '/mnt/lustre/d105.sanityn/mdt1dir': Directory not empty
 sanityn test_105: @@@@@@ FAIL: remove sub-test dirs failed 
  Trace dump:
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6108:error()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:5593:check_and_cleanup_lustre()
  = /home/green/git/lustre-release/lustre/tests/sanityn.sh:4920:main()
Dumping lctl log to /tmp/testlogs//sanityn.test_105.*.1574273752.log
oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.debug_log.oleg256-server.1574273752.log.65aqHO" failed: Operation not permitted (1)
oleg256-server: rsync: chown "/tmp/testlogs/.sanityn.test_105.dmesg.oleg256-server.1574273752.log.rhRRav" failed: Operation not permitted (1)
oleg256-server: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
pdsh@oleg256-client: oleg256-server: ssh exited with exit code 23

I believe this indicates filesystem corruption, but ZFS has no tool to check for it.

Comment by Alexander Zarochentsev [ 20/Nov/19 ]

The sanityn test 105 that illustrates the problem: https://review.whamcloud.com/#/c/35991/

Comment by Gerrit Updater [ 25/Aug/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/35991/
Subject: LU-12848 tests: link succeded to an ophan remote object
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dcad0502b5682ab76ce4456573dc7060bcce7da0

Comment by Peter Jones [ 25/Aug/21 ]

Landed for 2.15

Generated at Sat Feb 10 02:56:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.