Details
Bug
Resolution: Fixed
Minor
Lustre 2.4.0, Lustre 2.1.5, Lustre 1.8.8
None
Lustre Branch: b1_8
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/238
Test Group: failover
3
6040
Description
While running replay-vbr under a failover configuration with hard failure mode (power off and on), test 0c failed as follows:
== replay-vbr test 0c: VBR: open (non O_CREAT) does not checks versions ============================== 11:36:50 (1357673810)
CMD: client-32vm3 /usr/sbin/lctl set_param *.lustre-MDT0000.sync_permission=0
client-32vm3: error: set_param: /proc/{fs,sys}/{lnet,lustre}/*/lustre-MDT0000/sync_permission: Found no match
CMD: client-32vm1.lab.whamcloud.com mkdir -p -m 755 /mnt/lustre/d0.replay-vbr/d0
CMD: client-32vm1.lab.whamcloud.com openfile -f O_RDWR:O_CREAT -m 0644 /mnt/lustre/d0.replay-vbr/d0/f0c
Succeed in opening file "/mnt/lustre/d0.replay-vbr/d0/f0c"(flags=O_RDWR, mode=644)
CMD: client-32vm3 sync
Filesystem 1K-blocks Used Available Use% Mounted on
client-32vm3:client-32vm7:/lustre 36535940 1696868 32982672 5% /mnt/lustre
CMD: client-32vm3 /usr/sbin/lctl --device %lustre-MDT0000 notransno
client-32vm3: opening /dev/obd failed: No such device
client-32vm3: hint: the kernel modules may not be loaded
client-32vm3: No device found for name lustre-MDT0000: No such device
CMD: client-32vm3 /usr/sbin/lctl --device %lustre-MDT0000 readonly
client-32vm3: opening /dev/obd failed: No such device
client-32vm3: hint: the kernel modules may not be loaded
client-32vm3: No device found for name lustre-MDT0000: No such device
CMD: client-32vm3 /usr/sbin/lctl mark mds REPLAY BARRIER on lustre-MDT0000
client-32vm3: opening /dev/lnet failed: No such device
client-32vm3: hint: the kernel modules may not be loaded
client-32vm3: IOC_LIBCFS_MARK_DEBUG failed: No such device
CMD: client-32vm5 chmod 777 /mnt/lustre/d0.replay-vbr/d0
CMD: client-32vm5 chmod 666 /mnt/lustre/d0.replay-vbr/d0/f0c
CMD: client-32vm1.lab.whamcloud.com rm -f /tmp/multiop_bg.pid.16759 && MULTIOP_PID_FILE=/tmp/multiop_bg.pid.16759 LUSTRE= runmultiop_bg_pause /mnt/lustre/d0.replay-vbr/d0/f0c o_c
multiop /mnt/lustre/d0.replay-vbr/d0/f0c vo_c
TMPPIPE=/tmp/multiop_open_wait_pipe.20009
CMD: client-32vm1.lab.whamcloud.com cat /tmp/multiop_bg.pid.16759
node client-32vm1.lab.whamcloud.com multiop_bg started multiop_pid=20017
CMD: client-32vm5 grep -c /mnt/lustre' ' /proc/mounts
Stopping client client-32vm5 /mnt/lustre (opts:)
CMD: client-32vm5 lsof -t /mnt/lustre
CMD: client-32vm5 umount /mnt/lustre 2>&1
Failing mds on node client-32vm3
CMD: client-32vm3 lctl dl
client-32vm3: error: dl: No such file or directory opening /proc/fs/lustre/devices
client-32vm3: opening /dev/obd failed: No such device
client-32vm3: hint: the kernel modules may not be loaded
client-32vm3: Error getting device list: No such device: check dmesg.
+ pm -h powerman --off client-32vm3
pm: warning: server version (2.3.5) != client (2.3.12)
Command completed successfully
affected facets:
+ pm -h powerman --on client-32vm3
pm: warning: server version (2.3.5) != client (2.3.12)
Command completed successfully
df pid is 20158
CMD: hostname
pdsh@client-32vm1: gethostbyname("hostname") failed
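The MDS failover pair is client-32vm3 and client-32vm7. In the previous test, 0b, the MDS had already failed over from client-32vm3 to client-32vm7, so in test 0c the active MDS should be client-32vm7, not client-32vm3. The log shows that the lctl commands were nevertheless sent to client-32vm3, where they failed with "No such device".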
This is a test script issue that occurs under the failover configuration.
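The bookkeeping the test script appears to be missing can be sketched roughly as follows. This is only an illustration, not the actual fix and not code from the Lustre test framework: the function and variable names (active_mds_node, record_mds_failover, mds_cmd, MDS_ACTIVE and so on) are hypothetical, and the starting state (client-32vm3 active) is an assumption. Only the two node names come from the log above. The idea is to record each failover and address commands to whichever node is currently active instead of a fixed host name.

```shell
#!/bin/bash
# Hypothetical sketch: track which node of an MDS failover pair is active.
# All names here are illustrative; none come from the Lustre test framework.

MDS_PRIMARY="client-32vm3"     # node names taken from the log in this ticket
MDS_FAILOVER="client-32vm7"
MDS_ACTIVE="$MDS_PRIMARY"      # assumed starting state

# Print the node that currently hosts the MDS.
active_mds_node() {
    echo "$MDS_ACTIVE"
}

# Record a failover: the active role moves to the other node of the pair.
record_mds_failover() {
    if [ "$MDS_ACTIVE" = "$MDS_PRIMARY" ]; then
        MDS_ACTIVE="$MDS_FAILOVER"
    else
        MDS_ACTIVE="$MDS_PRIMARY"
    fi
}

# Build a command line addressed to the active node, not a fixed host.
# The line is only printed here; a real script would pass the node and
# command to whatever remote execution helper it uses.
mds_cmd() {
    echo "CMD: $(active_mds_node) $*"
}

record_mds_failover            # corresponds to the failover in test 0b
mds_cmd lctl dl                # now addressed to client-32vm7
```

With the failover from test 0b recorded, the command in test 0c is addressed to client-32vm7 rather than to client-32vm3.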