[LU-2862] 2.1.4<->2.4.0 interop: replay-dual test_11: rm: cannot remove `/mnt/lustre/f11-[1-5]': No such file or directory Created: 25/Feb/13  Updated: 03/Jun/16  Resolved: 03/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: WC Triage
Resolution: Incomplete Votes: 0
Labels: mq213, yuc2
Environment:

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/176
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1269
Distro/Arch: RHEL6.3/x86_64


Severity: 3
Rank (Obsolete): 6928

 Description   

replay-dual test_11 failed as follows:

Starting mds1: -o loop,user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
CMD: client-19vm3 mkdir -p /mnt/mds1; mount -t lustre -o loop,user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
CMD: client-19vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin: NAME=autotest_config sh rpc.sh set_default_debug \"-1\" \" 0xffb7e3ff\" 2 
CMD: client-19vm3 e2label /dev/lvm-MDS/P1
Started lustre-MDT0000
CMD: client-19vm3 lctl set_param fail_loc=0
fail_loc=0
rm: cannot remove `/mnt/lustre/f11-[1-5]': No such file or directory
 replay-dual test_11: @@@@@@ FAIL: test_11 failed with 1
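
The bracketed name in the rm error is an unmatched shell glob: the files f11-1 through f11-5 did not exist at cleanup time, so bash passed the literal pattern `/mnt/lustre/f11-[1-5]' through to rm. This is consistent with the eviction shown in the console logs below, after which the uncommitted creates were lost. A minimal demonstration of the generic bash behavior, run in an empty scratch directory (not the test script itself):

# With the default nullglob off, a bracket pattern that matches no files
# is handed to rm as a literal string, producing this exact error shape.
cd "$(mktemp -d)"
rm f11-[1-5]
# rm: cannot remove `f11-[1-5]': No such file or directory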

Console log on the client node client-19vm1 showed:

08:16:21:Lustre: lustre-MDT0000-mdc-ffff880033305000: Connection restored to lustre-MDT0000 (at 10.10.4.222@tcp)
08:16:21:Lustre: Skipped 12 previous similar messages
08:16:21:Lustre: 8555:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1361549717/real 1361549717]  req@ffff88004306b800 x1427685198042724/t51539607554(51539607554) o36->lustre-MDT0000-mdc-ffff880075efa000@10.10.4.222@tcp:12/10 lens 488/416 e 1 to 1 dl 1361549778 ref 2 fl Rpc:X/4/ffffffff rc 0/-1
08:16:21:Lustre: 8555:0:(client.c:1817:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
08:16:21:LustreError: 8555:0:(client.c:2576:ptlrpc_replay_interpret()) request replay timed out, restarting recovery
08:16:21:LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
08:16:22:LustreError: 28715:0:(mdc_locks.c:736:mdc_enqueue()) ldlm_cli_enqueue: -4
08:16:22:LustreError: 28715:0:(dir.c:423:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at 0: rc -4
08:16:22:LustreError: 28715:0:(dir.c:648:ll_readdir()) error reading dir [0x200000007:0x1:0x0] at 0: rc -4
08:16:22:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_11: @@@@@@ FAIL: test_11 failed with 1
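
The rc -4 in the enqueue and readdir errors is -EINTR: once the MDS evicted the client ("This client was evicted by lustre-MDT0000"), the in-flight operations were interrupted and failed. To confirm an eviction from the client side, the MDC import state can be inspected with a standard lctl parameter (field names may vary slightly between releases):

# Show the MDC import on the client; during an eviction the state passes
# through EVICTED before the client reconnects and returns to FULL.
lctl get_param mdc.*.import | grep -E 'state|target'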

Console log on the MDS node client-19vm3 showed:

08:15:15:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o loop,user_xattr,acl  /dev/lvm-MDS/P1 /mnt/mds1
08:15:15:LDISKFS-fs (loop0): recovery complete
08:15:15:LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: 
08:15:16:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
08:15:16:LNet: 11113:0:(debug.c:324:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
08:15:16:LNet: 11113:0:(debug.c:324:libcfs_debug_str2mask()) Skipped 2 previous similar messages
08:15:16:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1
08:15:47:Lustre: *** cfs_fail_loc=119, val=2147483648***
08:15:47:LustreError: 10976:0:(ldlm_lib.c:2422:target_send_reply_msg()) @@@ dropping reply  req@ffff880053917850 x1427685198042724/t51539607554(51539607554) o36->2c21c195-e62a-2d57-459b-4cc847aed904@10.10.4.220@tcp:0/0 lens 488/448 e 1 to 0 dl 1361549749 ref 1 fl Complete:/4/0 rc 0/0
08:15:58:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
08:16:20:Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
08:16:20:Lustre: lustre-MDT0000: disconnecting 1 stale clients
08:16:20:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_11: @@@@@@ FAIL: test_11 failed with 1 
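
For context on the injection visible in the MDS log: fail_loc 0x119 is OBD_FAIL_MDS_REINT_NET_REP, which makes the MDS drop its reply to a metadata-modifying (reint) request; the "dropping reply" line at 08:15:47 is that injection firing, and val=2147483648 (0x80000000) appears to be the one-shot CFS_FAIL_ONCE flag. The client then has to replay the dropped transaction during recovery, the replay times out, and the MDS evicts the stale client, which is why the created files no longer exist when the test tries to remove them. A sketch of the injection sequence the logs reflect, using standard lctl commands (the exact test body in replay-dual.sh may differ):

# On the MDS, before failing it over: drop the next MDS_REINT reply.
# 0x119 = OBD_FAIL_MDS_REINT_NET_REP (drop reply to a reint request).
lctl set_param fail_loc=0x119
# ... the test framework then fails over and restarts the MDS ...
# Clear the injection afterwards, matching the log line at 08:15:58.
lctl set_param fail_loc=0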

Maloo report: https://maloo.whamcloud.com/test_sets/a19d48ee-7d78-11e2-85d0-52540035b04c



 Comments   
Comment by Jian Yu [ 13/Mar/13 ]

Lustre b2_1 client build: http://build.whamcloud.com/job/lustre-b2_1/186
Lustre master server build: http://build.whamcloud.com/job/lustre-master/1302
Distro/Arch: RHEL6.3/x86_64

The replay-dual test 11 passed: https://maloo.whamcloud.com/test_sets/1a958dc8-8b58-11e2-965f-52540035b04c

Comment by Sarah Liu [ 22/Apr/13 ]

Another failure: https://maloo.whamcloud.com/test_sets/8d87aa7a-a78a-11e2-b3cc-52540035b04c

Comment by Jian Yu [ 14/Aug/13 ]

Lustre client build: http://build.whamcloud.com/job/lustre-b2_1/215/ (2.1.6)
Lustre server build: http://build.whamcloud.com/job/lustre-b2_4/29/

replay-dual test 11 hit the same failure:
https://maloo.whamcloud.com/test_sets/00eecdfe-0479-11e3-a8e9-52540035b04c
