[LU-10012] conf-sanity test_50c: test_50c returned 1 Created: 20/Sep/17  Updated: 21/Sep/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/fdfcb666-9c9e-11e7-ba27-5254006e85c2.

The sub-test test_50c failed with the following error:

test_50c returned 1

test log shows:

CMD: trevis-45vm7 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
CMD: trevis-45vm7 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Commit the device label on lustre-mdt1/mdt1
CMD: trevis-45vm7 sync; sleep 1; sync
CMD: trevis-45vm7 zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null
no label for lustre-mdt1/mdt1
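
For reference, the grep pattern in the commands above can be exercised without a ZFS pool. The following is a minimal sketch, not the test framework's code: the helper name svname_matches_test_pattern is hypothetical, and the sample values are illustrative (only lustre:MDT0000 appears in this ticket). It shows which svname strings the pattern accepts, and that an empty value (as when zfs get prints nothing) does not match.

```shell
# Hypothetical helper: succeeds when a svname value matches the pattern the
# test greps for (a ':' followed by three letters and four digits).
svname_matches_test_pattern() {
    printf '%s\n' "$1" | grep -q -E ':[a-zA-Z]{3}[0-9]{4}'
}

# Sample values are illustrative; the empty string stands for "no output".
for value in 'lustre:MDT0000' 'lustre-MDT0000' ''; do
    if svname_matches_test_pattern "$value"; then
        echo "'$value' matches"
    else
        echo "'$value' does not match"
    fi
done
```

Only the first value matches, so a dataset that reports no lustre:svname at all fails this check in the same way as one whose label uses a different separator.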


 Comments   
Comment by Andreas Dilger [ 21/Sep/17 ]

Earlier in the logs for this test, we can see that lustre-mdt1/mdt1 and the OSTs are mounted and functional, but after unmount all of the devices appear to have been erased:

CMD: trevis-45vm7 tunefs.lustre --quiet --writeconf lustre-mdt1/mdt1
trevis-45vm7: tunefs.lustre FATAL: Device lustre-mdt1/mdt1 has not been formatted with mkfs.lustre
trevis-45vm7: tunefs.lustre: exiting with 19 (No such device)
checking for existing Lustre data: not found
CMD: trevis-45vm8 tunefs.lustre --quiet --writeconf lustre-ost1/ost1
trevis-45vm8: tunefs.lustre FATAL: Device lustre-ost1/ost1 has not been formatted with mkfs.lustre
trevis-45vm8: tunefs.lustre: exiting with 19 (No such device)
checking for existing Lustre data: not found
CMD: trevis-45vm8 tunefs.lustre --quiet --writeconf lustre-ost2/ost2
trevis-45vm8: tunefs.lustre FATAL: Device lustre-ost2/ost2 has not been formatted with mkfs.lustre
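
When checking whether the datasets were really erased, it may help to separate "the dataset is gone" from "the dataset exists but carries no Lustre label". The sketch below is one way to interpret the output of the zfs get command used by the test. The helper name classify_svname is hypothetical. It relies on zfs get -H -o value printing "-" for a user property that is not set, and printing nothing on stdout when the dataset cannot be found (stderr is discarded by the 2>/dev/null in the test's command).

```shell
# Hypothetical helper: interpret the value printed by
#   zfs get -H -o value lustre:svname <dataset> 2>/dev/null
classify_svname() {
    case "$1" in
        '')  echo 'no output: dataset or pool missing, or zfs failed' ;;
        '-') echo 'dataset exists but lustre:svname is not set' ;;
        *)   echo "label present: $1" ;;
    esac
}

# On a server with ZFS installed one might run (not executed here):
#   classify_svname "$(zfs get -H -o value lustre:svname lustre-mdt1/mdt1 2>/dev/null)"
classify_svname 'lustre:MDT0000'
classify_svname '-'
classify_svname ''
```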

The test then tries to reformat the MDT filesystem, which appears to succeed before the test hits the error:

Format mds1: lustre-mdt1/mdt1
CMD: trevis-45vm7 mkfs.lustre --mgs --fsname=lustre --mdt --index=0 --param=sys.timeout=20 --param=lov.stripesize=1048576 --param=lov.stripecount=0 --param=mdt.identity_upcall=/usr/sbin/l_getidentity --backfstype=zfs --device-size=200000 --reformat lustre-mdt1/mdt1 /dev/lvm-Role_MDS/P1

   Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: zfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: 
Parameters: sys.timeout=20 lov.stripesize=1048576 lov.stripecount=0 mdt.identity_upcall=/usr/sbin/l_getidentity
mkfs_cmd = zpool create -f -O canmount=off lustre-mdt1 /dev/lvm-Role_MDS/P1
mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
Writing lustre-mdt1/mdt1 properties
  lustre:sys.timeout=20
  lustre:lov.stripesize=1048576
  lustre:lov.stripecount=0
  lustre:mdt.identity_upcall=/usr/sbin/l_getidentity
  lustre:version=1
  lustre:flags=101
  lustre:index=0
  lustre:fsname=lustre
  lustre:svname=lustre:MDT0000
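
As a cross-check of the output above, Flags 0x65 and lustre:flags=101 are the same value (0x65 is 101 in decimal), and it decomposes into exactly four bits, matching the four names printed by mkfs.lustre. The sketch below decodes the value. The function name decode_ldd_flags is hypothetical, and the bit-to-name mapping is an assumption based on the LDD_F_* definitions in lustre_disk.h as recalled, so it should be verified against the Lustre source in use.

```shell
# Hypothetical decoder for the mkfs.lustre "Flags" value.
# Bit values are an ASSUMPTION (see the note above), not taken from this ticket.
decode_ldd_flags() {
    flags=$(( $1 ))
    names=''
    for pair in 1:MDT 2:OST 4:MGS 32:first_time 64:update; do
        bit=${pair%%:*}     # numeric bit value before the ':'
        name=${pair#*:}     # flag name after the ':'
        if [ $(( flags & bit )) -ne 0 ]; then
            names="${names:+$names }$name"
        fi
    done
    echo "$names"
}

decode_ldd_flags 0x65   # prints: MDT MGS first_time update
decode_ldd_flags 101    # same value in decimal, same output
```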

The MDS dmesg log reports:
https://testing.hpdd.intel.com/test_logs/033a3e1e-9c9f-11e7-ba27-5254006e85c2/show_text

WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.
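
To find which pools were suspended across a set of saved dmesg logs, the warning can be matched mechanically. This is a small sketch rather than an existing tool: the function name suspended_pools is hypothetical, and it assumes the message wording is exactly as quoted above.

```shell
# Hypothetical filter: read kernel log text on stdin and print the name of
# every pool reported as suspended.
suspended_pools() {
    sed -n "s|.*Pool '\([^']*\)' has encountered an uncorrectable I/O failure and has been suspended.*|\1|p"
}

# Example using the line from this ticket; prints: lustre-mdt1
printf '%s\n' "WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended." | suspended_pools
```

On a live node the same filter could be fed with dmesg | suspended_pools.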
Comment by John Hammond [ 21/Sep/17 ]

Can the console logs from the test VMs be reconstructed? They seem to be missing on maloo. Also, it would be useful to look at the messages from the host node (trevis-45) from the same time period (for any I/O errors or out-of-space messages).
