[LU-699] replay-dual test_1 fails to remount mdt Created: 21/Sep/11  Updated: 03/Jun/16  Resolved: 03/Jun/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.7
Fix Version/s: Lustre 2.2.0

Type: Bug
Priority: Minor
Reporter: Minh Diep
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: TCP
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC2
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre.g65156ed.x86_64)
Build: http://newbuild.whamcloud.com/job/lustre-master/228/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/
Network: TCP


Issue Links:
Duplicate
is duplicated by LU-1012 replay-vbr: test_1b failure. Resolved
Severity: 3
Rank (Obsolete): 4884

 Description   

v2_1_0_RC2 testing

Report: https://maloo.whamcloud.com/test_sets/396f9254-e440-11e0-9909-52540025f9af

== replay-dual test 1: |X| simple create == 06:21:32 (1316524892)
Filesystem 1K-blocks Used Available Use% Mounted on
fat-intel-1vm3@tcp:/lustre
79724848 3154788 72568440 5% /mnt/lustre
Failing mds on node fat-intel-1vm3
Stopping /mnt/mds (opts
affected facets: mds
df pid is 14339
Failover mds to fat-intel-1vm3
06:21:50 (1316524910) waiting for fat-intel-1vm3 network 900 secs ...
06:21:50 (1316524910) network interface is UP
Starting mds: -o user_xattr,acl /dev/lvm/P0 /mnt/mds
fat-intel-1vm3: mount.lustre: mount /dev/mapper/lvm-P0 at /mnt/mds failed: No such file or directory
fat-intel-1vm3: Is the MGS specification correct?
fat-intel-1vm3: Is the filesystem name correct?
fat-intel-1vm3: If upgrading, is the copied client log valid? (see upgrade docs)
mount -t lustre /dev/lvm/P0 /mnt/mds
Start of /dev/lvm/P0 on mds failed 2
replay-dual test_1: @@@@@@ FAIL: Restart of mds failed!
Dumping lctl log to /logdir/test_logs/2011-09-19/lustre-mixed-el6-x86_64_283_-7f6a2ad2c9e0/replay-dual.test_1.*.1316524910.log

MDS console shows

06:21:42:Lustre: server umount lustre-MDT0000 complete
06:21:51:LDISKFS-fs warning (device dm-0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
06:21:52:LDISKFS-fs (dm-0): recovery complete
06:21:52:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode
06:21:52:LustreError: 8035:0:(obd_mount.c:294:ldd_parse()) cannot open CONFIGS/mountdata: rc = -2
06:21:52:LustreError: 8035:0:(obd_mount.c:1373:server_kernel_mount()) premount parse options failed: rc = -2
06:21:52:LustreError: 8035:0:(obd_mount.c:1681:server_fill_super()) Unable to mount device /dev/mapper/lvm-P0: -2
06:21:52:LustreError: 8035:0:(obd_mount.c:2160:lustre_fill_super()) Unable to mount (-2)
06:21:52:Lustre: DEBUG MARKER: replay-dual test_1: @@@@@@ FAIL: Restart of mds failed!
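
The `rc = -2` values in the console log follow the kernel convention of returning a negated errno, which is consistent with the "No such file or directory" text printed by mount.lustre. A minimal sketch for decoding such return codes is below; `describe_rc` is a hypothetical helper name used only for illustration and is not part of Lustre or its test framework.

```python
import errno
import os


def describe_rc(rc: int) -> tuple[str, str]:
    """Map a kernel-style negative return code to (symbolic name, message).

    Assumption: rc is a negated errno value, as in "rc = -2" above.
    """
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_rc(-2)
    print(f"rc = -2 is {name}: {message}")
```

Read this way, the failure at ldd_parse() means CONFIGS/mountdata could not be found on the freshly mounted ldiskfs device.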



 Comments   
Comment by Andreas Dilger [ 21/Sep/11 ]

> 06:21:51:LDISKFS-fs warning (device dm-0): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.

FYI, this message is a complete red herring for the MDT (i.e. unrelated to this or any problem, in case there was any uncertainty), and we should remove it from our ldiskfs filesystems. I don't think that extents on the MDT can possibly help, and will likely hurt performance since directory blocks are allocated one-at-a-time and storing them as extents is less efficient.

Comment by Jian Yu [ 23/Sep/11 ]

Lustre Clients:
Tag: 1.8.6-wc1
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32_131.2.1.el6)
Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el6,ib_stack=inkernel/
Network: TCP (1GigE)
ENABLE_QUOTA=yes

Lustre Servers:
Tag: v2_1_0_0_RC2
Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-131.6.1.el6_lustre)
Build: http://newbuild.whamcloud.com/job/lustre-master/283/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/

replay-dual test passed in manual run: https://maloo.whamcloud.com/test_sets/a60c9bd8-e5c5-11e0-9909-52540025f9af

Comment by nasf (Inactive) [ 18/Oct/11 ]

Another failure on lustre-2.1:

https://maloo.whamcloud.com/test_sets/3055c2c6-f6bd-11e0-a451-52540025f9af

Comment by Jinshan Xiong (Inactive) [ 19/Oct/11 ]

I took a close look at this problem because it may block my IR landings.

The problem is evidently caused by corruption of lustre_disk_data. When we modify lustre_disk_data, we only wait for the log to be committed, NOT for the data to actually be written to disk. So in this test case, if the data has not reached disk by the time we mark the device read-only, we are in trouble because we lose the updated lustre_disk_data.

I'm going to fix this problem by using O_SYNC to update lustre_disk_data if this is hit in the test.
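
For readers unfamiliar with O_SYNC, here is a user-space sketch of the idea: open the file with O_SYNC so each write returns only after the data has been handed to stable storage. This is an analogy only. The actual change is kernel code in lustre/obdclass/obd_mount.c and is not reproduced here; `write_durably` and the file name in the usage example are hypothetical.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> None:
    """Write payload to path using O_SYNC (illustrative helper, not Lustre code)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical stand-in for a small on-disk configuration file.
    target = os.path.join(tempfile.mkdtemp(), "mountdata")
    write_durably(target, b"example configuration bytes")
    with open(target, "rb") as f:
        print(f.read())
```

Without O_SYNC (or an explicit fsync), the write can sit in the page cache, which matches the window described above: the device is switched to read-only before the updated data reaches disk.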

Comment by Jinshan Xiong (Inactive) [ 19/Oct/11 ]

patch is at: http://review.whamcloud.com/1557

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » i686,client,el5,ofa #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 20/Oct/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Build Master (Inactive) [ 21/Oct/11 ]

Integrated in lustre-master » i686,server,el5,ofa #305
LU-699 tests: replay-dual test_1 failure

Oleg Drokin : 886e67d4fe87d293952a11e7f41b98a8c3abeddd
Files :

  • lustre/obdclass/obd_mount.c

Comment by Jay Lan (Inactive) [ 30/Apr/12 ]

I seem to have hit the data corruption problem in replay-dual tests 16 and 20.

Why was this ticket not marked RESOLVED?

Comment by Jay Lan (Inactive) [ 01/May/12 ]

I rebuilt the Lustre server with this patch. It still failed, though in a different way.

Generated at Sat Feb 10 01:09:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.