Details
- Type: Bug
- Resolution: Not a Bug
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.9.0
- Environment: CentOS 7 in a Hyper-V VM
Description
I'm trying to build a dRAID-based OST for Lustre.
Initially filed as [thegreatgazoo/zfs issue|https://github.com/thegreatgazoo/zfs/issues/2].
A generic Lustre MGS/MDT is up and running.
Fresh VM (4 CPUs, 4 GB RAM) with a CentOS 7 "minimal" install and 18 SCSI disks (backed by image files).
Run yum -y update ; reboot, then run setup-node.sh NODE from the workstation.
SSH to the node and run ./mkzpool.sh:
[root@node26 ~]# ./mkzpool.sh + zpool list + grep -w 'no pools available' + zpool destroy oss3pool + zpool list + grep -w 'no pools available' no pools available + '[' -f 17.nvl ']' + draidcfg -r 17.nvl dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare Using 32 base permutations 15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0, 5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16, 10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8, 13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10, 13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15, 8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10, 16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3, 5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13, 4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1, 10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15, 2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14, 15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13, 1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15, 3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5, 15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11, 14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6, 7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15, 16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8, 0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13, 8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1, 9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1, 8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16, 5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2, 15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3, 15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14, 15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6, 15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1, 0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13, 7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4, 14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2, 9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16, 4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15, + zpool create -f oss3pool draid1 
cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr + zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT oss3pool 14,7G 612K 14,7G - 0% 0% 1.00x ONLINE - + zpool status pool: oss3pool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM oss3pool ONLINE 0 0 0 draid1-0 ONLINE 0 0 0 sdb ONLINE 0 0 0 sdc ONLINE 0 0 0 sdd ONLINE 0 0 0 sde ONLINE 0 0 0 sdf ONLINE 0 0 0 sdg ONLINE 0 0 0 sdh ONLINE 0 0 0 sdi ONLINE 0 0 0 sdj ONLINE 0 0 0 sdk ONLINE 0 0 0 sdl ONLINE 0 0 0 sdm ONLINE 0 0 0 sdn ONLINE 0 0 0 sdo ONLINE 0 0 0 sdp ONLINE 0 0 0 sdq ONLINE 0 0 0 sdr ONLINE 0 0 0 spares $draid1-0-s0 AVAIL $draid1-0-s1 AVAIL errors: No known data errors + grep oss3pool + mount oss3pool on /oss3pool type zfs (rw,xattr,noacl) + mkfs.lustre --reformat --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01 Permanent disk data: Target: ZFS01:OST0003 Index: 3 Lustre FS: ZFS01 Mount type: zfs Flags: 0x62 (OST first_time update ) Persistent mount opts: Parameters: mgsnode=172.17.32.220@tcp mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01 Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=98 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01:OST0003 lustre:mgsnode=172.17.32.220@tcp + '[' -d /lustre/ZFS01/. 
']' + mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01 arg[0] = /sbin/mount.lustre arg[1] = -v arg[2] = -o arg[3] = rw arg[4] = oss3pool/ZFS01 arg[5] = /lustre/ZFS01 source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01 options = rw checking for existing Lustre data: found Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=34 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01:OST0003 lustre:mgsnode=172.17.32.220@tcp mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01 mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use retries left: 0 mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use The target service's index is already in use. (oss3pool/ZFS01) [root@node26 ~]# mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01 arg[0] = /sbin/mount.lustre arg[1] = -v arg[2] = -o arg[3] = rw arg[4] = oss3pool/ZFS01 arg[5] = /lustre/ZFS01 source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01 options = rw checking for existing Lustre data: found mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
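The "Address already in use" failure above means OST index 3 was already registered with the MGS from an earlier format, so a re-format that keeps the same index needs --replace (as the later runs in this ticket use). As a minimal sketch, a hypothetical helper that assembles that command line might look like:

```shell
# Hypothetical helper (not from the ticket's scripts): build the mkfs.lustre
# command line for re-formatting an OST whose index is already registered
# with the MGS. --replace tells the MGS this target replaces the previously
# registered OST0003 instead of claiming a fresh index.
build_mkfs_cmd() {
  local fsname=$1 index=$2 mgsnode=$3 dataset=$4
  echo "mkfs.lustre --reformat --replace --ost --backfstype=zfs" \
       "--fsname=$fsname --index=$index --mgsnode=$mgsnode $dataset"
}

build_mkfs_cmd ZFS01 3 mgs@tcp0 oss3pool/ZFS01
```

The alternative is to format with an unused --index, which is what the reporter's "mistake with --index" later in this ticket turned out to be.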
Attachments
- collect-info.sh
- 1 kB
- mkzpool.sh.txt
- 0.7 kB
Activity
I understand; I have been there before. Please let us know if we can close this ticket. If you have any dRAID testing questions, please keep in contact with us.
Wow!
It was a Hyper-V host out of my control...
Thanks for the pointer, I'll go dig.
Looks like you are getting further in the process, but failing early in the mount due to underlying disk errors:
[ 1946.710418] LNet: HW CPU cores: 4, npartitions: 1 [ 1946.718890] alg: No test for adler32 (adler32-zlib) [ 1946.719309] alg: No test for crc32 (crc32-table) [ 1951.762047] sha512_ssse3: Using AVX optimized SHA-512 implementation [ 1954.979635] Lustre: Lustre: Build Version: 2.9.0 [ 1955.302507] LNet: Added LNI 172.17.32.226@tcp [8/256/0/180] [ 1955.302711] LNet: Accept secure, port 988 [ 1959.056006] GPT:disk_guids don't match. [ 1959.056034] GPT:partition_entry_array_crc32 values don't match: 0x5d3c877c != 0x443a8464 [ 1959.056037] GPT: Use GNU Parted to correct GPT errors. [ 1959.056059] sdb: sdb1 sdb9 [ 1959.230444] sdb: sdb1 sdb9 [ 1959.406374] GPT:disk_guids don't match. [ 1959.406384] GPT:partition_entry_array_crc32 values don't match: 0x29fa53ae != 0xa19347b0 [ 1959.406387] GPT: Use GNU Parted to correct GPT errors. [ 1959.406408] sdc: sdc1 sdc9 [ 1959.610229] sdc: sdc1 sdc9 [ 1959.821418] Alternate GPT is invalid, using primary GPT. [ 1959.821444] sdd: sdd1 sdd9 [ 1959.903091] sdd: sdd1 sdd9 [ 1960.088271] GPT:disk_guids don't match. [ 1960.088279] GPT:partition_entry_array_crc32 values don't match: 0xda543dc7 != 0xdb0d75f4 [ 1960.088281] GPT: Use GNU Parted to correct GPT errors. [ 1960.088302] sde: sde1 sde9 [ 1960.324063] sde: sde1 sde9 [ 1960.347788] sde: sde1 sde9 [ 1960.515198] Alternate GPT is invalid, using primary GPT. [ 1960.515225] sdf: sdf1 sdf9 [ 1960.845503] sdf: sdf1 sdf9 [ 1960.869365] sdf: sdf1 sdf9 [ 1961.018646] GPT:disk_guids don't match. [ 1961.018654] GPT:partition_entry_array_crc32 values don't match: 0xf42c8d7b != 0x97a63590 [ 1961.018657] GPT: Use GNU Parted to correct GPT errors. [ 1961.018679] sdg: sdg1 sdg9 [ 1961.349725] sdg: sdg1 sdg9 [ 1961.373959] sdg: sdg1 sdg9 [ 1961.524544] Alternate GPT is invalid, using primary GPT. [ 1961.524569] sdh: sdh1 sdh9 [ 1961.655219] sdh: sdh1 sdh9 [ 1961.814506] GPT:disk_guids don't match. 
[ 1961.814515] GPT:partition_entry_array_crc32 values don't match: 0x3d5540f9 != 0x85f3e2e6 [ 1961.814517] GPT: Use GNU Parted to correct GPT errors. [ 1961.814537] sdi: sdi1 sdi9 [ 1961.867240] sdi: sdi1 sdi9 [ 1962.081393] Alternate GPT is invalid, using primary GPT. [ 1962.081420] sdj: sdj1 sdj9 [ 1962.261463] sdj: sdj1 sdj9 [ 1962.485817] Alternate GPT is invalid, using primary GPT. [ 1962.485841] sdk: sdk1 sdk9 [ 1962.617151] sdk: sdk1 sdk9 [ 1962.828196] GPT:disk_guids don't match. [ 1962.828206] GPT:partition_entry_array_crc32 values don't match: 0x7cf05c31 != 0xbfb68e7 [ 1962.828208] GPT: Use GNU Parted to correct GPT errors. [ 1962.828232] sdl: sdl1 sdl9 [ 1962.990115] sdl: sdl1 sdl9 [ 1963.188994] GPT:disk_guids don't match. [ 1963.189028] GPT:partition_entry_array_crc32 values don't match: 0x38ff1612 != 0xc53037f5 [ 1963.189031] GPT: Use GNU Parted to correct GPT errors. [ 1963.189055] sdm: sdm1 sdm9 [ 1963.453695] sdm: sdm1 sdm9 [ 1963.622171] GPT:disk_guids don't match. [ 1963.622179] GPT:partition_entry_array_crc32 values don't match: 0x6577aef4 != 0x1624515d [ 1963.622182] GPT: Use GNU Parted to correct GPT errors. [ 1963.622202] sdn: sdn1 sdn9 [ 1963.927932] sdn: sdn1 sdn9 [ 1964.131710] Alternate GPT is invalid, using primary GPT. [ 1964.131737] sdo: sdo1 sdo9 [ 1964.304545] sdo: sdo1 sdo9 [ 1964.537353] Alternate GPT is invalid, using primary GPT. [ 1964.537380] sdp: sdp1 sdp9 [ 1964.608130] sdp: sdp1 sdp9 [ 1964.861371] Alternate GPT is invalid, using primary GPT. [ 1964.861397] sdq: sdq1 sdq9 [ 1964.988531] sdq: sdq1 sdq9 [ 1965.295413] GPT:disk_guids don't match. [ 1965.295421] GPT:partition_entry_array_crc32 values don't match: 0x2cd0988d != 0x828d383e [ 1965.295424] GPT: Use GNU Parted to correct GPT errors. [ 1965.295458] sdr: sdr1 sdr9 [ 1965.577126] sdr: sdr1 sdr9
I have seen this before when the disks are exact clones of each other and have identical UUIDs/WWIDs, but there are probably other causes as well.
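A quick way to test the cloned-disk theory is to look for duplicate identifiers across the virtual disks. The helper below is a sketch (not part of the ticket's scripts): it takes "device id" pairs, e.g. from `lsblk -dno NAME,WWN` or `ls -l /dev/disk/by-id`, and prints any id shared by more than one device.

```shell
# Sketch of a duplicate-ID check for cloned virtual disks (hypothetical
# helper). Reads "device id" pairs on stdin and prints every id that
# appears on more than one device; identical serials/WWIDs handed out by
# the Hyper-V host would show up here.
find_duplicate_ids() {
  awk '{print $2}' | sort | uniq -d
}

# Example: sdb and sdd are clones sharing one WWID.
printf '%s\n' "sdb 3600-aaaa" "sdc 3600-bbbb" "sdd 3600-aaaa" | find_duplicate_ids
# prints: 3600-aaaa
```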
Well, the same result on different days, this time with some quite randomly chosen modules loaded:
[root@node26 ~]# ./mkzpool.sh + modules=(spl zfs lnet lustre ost osd_zfs) + typeset -a modules + ./collect-info.sh adding: debug_info.20170502_050611.727878977_0400-3460-node26/ (stored 0%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/Now (deflated 51%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.script.log (deflated 89%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsmod.txt (deflated 66%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.df.txt (deflated 54%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.mount.txt (deflated 74%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events.txt (deflated 61%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events_verbose.txt (deflated 79%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dk.txt (deflated 12%) adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.messages (deflated 84%) + modFilter= + for module in '${modules[*]}' + echo '+ [spl]' + [spl] ++ test -z '' ++ echo '' + modFilter=spl + modprobe -v spl + for module in '${modules[*]}' + echo '+ [zfs]' + [zfs] ++ test -z spl ++ echo 'spl|' + modFilter='spl|zfs' + modprobe -v zfs + for module in '${modules[*]}' + echo '+ [lnet]' + [lnet] ++ test -z 'spl|zfs' ++ echo 'spl|zfs|' + 
modFilter='spl|zfs|lnet' + modprobe -v lnet insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/libcfs.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/lnet.ko + for module in '${modules[*]}' + echo '+ [lustre]' + [lustre] ++ test -z 'spl|zfs|lnet' ++ echo 'spl|zfs|lnet|' + modFilter='spl|zfs|lnet|lustre' + modprobe -v lustre insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/obdclass.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ptlrpc.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fld.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fid.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lov.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/mdc.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lmv.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lustre.ko + for module in '${modules[*]}' + echo '+ [ost]' + [ost] ++ test -z 'spl|zfs|lnet|lustre' ++ echo 'spl|zfs|lnet|lustre|' + modFilter='spl|zfs|lnet|lustre|ost' + modprobe -v ost insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ost.ko + for module in '${modules[*]}' + echo '+ [osd_zfs]' + [osd_zfs] ++ test -z 'spl|zfs|lnet|lustre|ost' ++ echo 'spl|zfs|lnet|lustre|ost|' + modFilter='spl|zfs|lnet|lustre|ost|osd_zfs' + modprobe -v osd_zfs insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lquota.ko insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/osd_zfs.ko + lsmod + grep -E 'spl|zfs|lnet|lustre|ost|osd_zfs' osd_zfs 252589 0 lquota 354067 1 osd_zfs ost 14991 0 lustre 816649 0 lmv 222021 1 lustre mdc 173180 1 lustre lov 295937 1 lustre fid 90581 2 mdc,osd_zfs fld 85860 3 fid,lmv,osd_zfs ptlrpc 2129791 8 fid,fld,lmv,mdc,lov,ost,lquota,lustre obdclass 1909130 20 fid,fld,lmv,mdc,lov,ost,lquota,lustre,ptlrpc,osd_zfs lnet 444969 4 
lustre,obdclass,ptlrpc,ksocklnd libcfs 405310 13 fid,fld,lmv,mdc,lov,ost,lnet,lquota,lustre,obdclass,ptlrpc,osd_zfs,ksocklnd zfs 4026085 1 osd_zfs zunicode 331170 1 zfs zavl 19839 1 zfs icp 299501 1 zfs zcommon 77836 2 zfs,osd_zfs znvpair 93348 3 zfs,zcommon,osd_zfs spl 130321 6 icp,zfs,zavl,zcommon,znvpair,osd_zfs zlib_deflate 26914 1 spl + zpool list + grep -w 'no pools available' + zpool destroy oss3pool + zpool list + grep -w 'no pools available' no pools available + '[' -f 17.nvl ']' + draidcfg -r 17.nvl dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare Using 32 base permutations 15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0, 5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16, 10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8, 13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10, 13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15, 8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10, 16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3, 5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13, 4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1, 10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15, 2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14, 15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13, 1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15, 3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5, 15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11, 14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6, 7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15, 16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8, 0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13, 8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1, 9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1, 8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16, 5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2, 15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3, 15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14, 15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6, 15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 
9,10,11, 6,16, 1, 0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13, 7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4, 14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2, 9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16, 4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15, + zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr + zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT oss3pool 14,7G 612K 14,7G - 0% 0% 1.00x ONLINE - + zpool status pool: oss3pool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM oss3pool ONLINE 0 0 0 draid1-0 ONLINE 0 0 0 sdb ONLINE 0 0 0 sdc ONLINE 0 0 0 sdd ONLINE 0 0 0 sde ONLINE 0 0 0 sdf ONLINE 0 0 0 sdg ONLINE 0 0 0 sdh ONLINE 0 0 0 sdi ONLINE 0 0 0 sdj ONLINE 0 0 0 sdk ONLINE 0 0 0 sdl ONLINE 0 0 0 sdm ONLINE 0 0 0 sdn ONLINE 0 0 0 sdo ONLINE 0 0 0 sdp ONLINE 0 0 0 sdq ONLINE 0 0 0 sdr ONLINE 0 0 0 spares $draid1-0-s0 AVAIL $draid1-0-s1 AVAIL errors: No known data errors + mount + grep oss3pool oss3pool on /oss3pool type zfs (rw,xattr,noacl) + ./collect-info.sh adding: debug_info.20170502_050643.778657696_0400-4594-node26/ (stored 0%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/Now (deflated 51%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.script.log (deflated 90%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsmod.txt (deflated 67%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.df.txt (deflated 55%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.mount.txt (deflated 74%) adding: 
debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events.txt (deflated 69%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dk.txt (deflated 68%) adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.messages (deflated 84%) + mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01 Permanent disk data: Target: ZFS01-OST0003 Index: 3 Lustre FS: ZFS01 Mount type: zfs Flags: 0x42 (OST update ) Persistent mount opts: Parameters: mgsnode=172.17.32.220@tcp mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01 Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=66 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01-OST0003 lustre:mgsnode=172.17.32.220@tcp + ./collect-info.sh adding: debug_info.20170502_050655.040550023_0400-5778-node26/ (stored 0%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/Now (deflated 51%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.script.log (deflated 90%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsmod.txt (deflated 67%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.df.txt (deflated 56%) adding: 
debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.mount.txt (deflated 74%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events.txt (deflated 69%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dk.txt (deflated 9%) adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.messages (deflated 84%) + '[' -d /lustre/ZFS01/. ']' + mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01 arg[0] = /sbin/mount.lustre arg[1] = -v arg[2] = -o arg[3] = rw arg[4] = oss3pool/ZFS01 arg[5] = /lustre/ZFS01 source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01 options = rw checking for existing Lustre data: found Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=2 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01-OST0003 lustre:mgsnode=172.17.32.220@tcp mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
Debug info will arrive in a few minutes...
BTW, there are quite a few (37) modules here:
[root@node26 ~]# find . -name '*.ko' ./spl/module/spl/spl.ko ./spl/module/splat/splat.ko ./zfs/module/avl/zavl.ko ./zfs/module/icp/icp.ko ./zfs/module/nvpair/znvpair.ko ./zfs/module/unicode/zunicode.ko ./zfs/module/zcommon/zcommon.ko ./zfs/module/zfs/zfs.ko ./zfs/module/zpios/zpios.ko ./lustre-release/libcfs/libcfs/libcfs.ko ./lustre-release/lnet/klnds/o2iblnd/ko2iblnd.ko ./lustre-release/lnet/klnds/socklnd/ksocklnd.ko ./lustre-release/lnet/lnet/lnet.ko ./lustre-release/lnet/selftest/lnet_selftest.ko ./lustre-release/lustre/fid/fid.ko ./lustre-release/lustre/fld/fld.ko ./lustre-release/lustre/lfsck/lfsck.ko ./lustre-release/lustre/llite/llite_lloop.ko ./lustre-release/lustre/llite/lustre.ko ./lustre-release/lustre/lmv/lmv.ko ./lustre-release/lustre/lod/lod.ko ./lustre-release/lustre/lov/lov.ko ./lustre-release/lustre/mdc/mdc.ko ./lustre-release/lustre/mdd/mdd.ko ./lustre-release/lustre/mdt/mdt.ko ./lustre-release/lustre/mgc/mgc.ko ./lustre-release/lustre/mgs/mgs.ko ./lustre-release/lustre/obdclass/obdclass.ko ./lustre-release/lustre/obdclass/llog_test.ko ./lustre-release/lustre/obdecho/obdecho.ko ./lustre-release/lustre/ofd/ofd.ko ./lustre-release/lustre/osc/osc.ko ./lustre-release/lustre/osd-zfs/osd_zfs.ko ./lustre-release/lustre/osp/osp.ko ./lustre-release/lustre/ost/ost.ko ./lustre-release/lustre/ptlrpc/ptlrpc.ko ./lustre-release/lustre/quota/lquota.ko
and it's not obvious to me which ones to load, or when...
In debug_info.20170424_044934.896525307_0400-5362-node26.zip one can see the spl and zfs modules loaded.
Ok, I'll try to load all or some of them now and re-try.
Right, but look at your lsmod output – it does not appear that lustre or lnet are loaded.
$ grep lustre OUTPUT.lsmod.txt
$
This is why all of your Lustre commands are failing, with errors such as: invalid parameter 'dump_kernel'
open(dump_kernel) failed: No such file or directory
Could you please load the Lustre and LNet kernel modules and try this again? Also, I do not see output from the MDS; if there are still issues there, that would be helpful.
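The load order that eventually worked in this ticket (see the mkzpool.sh trace above) is: ZFS stack first, then LNet/Lustre, then the OST/OSD glue. A sketch of that sequence, with a DRY_RUN switch (my addition, not in the original script) so it can be exercised without Lustre installed:

```shell
# Sketch of the module load order used by mkzpool.sh in this ticket:
# spl -> zfs -> lnet -> lustre -> ost -> osd_zfs.
# DRY_RUN=1 only prints the modprobe commands instead of running them.
load_lustre_zfs_modules() {
  local m
  for m in spl zfs lnet lustre ost osd_zfs; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "modprobe -v $m"
    else
      modprobe -v "$m" || return 1
    fi
  done
}

DRY_RUN=1 load_lustre_zfs_modules
```

modprobe resolves the remaining dependencies (libcfs, obdclass, ptlrpc, lquota, ...) automatically, as the insmod lines in the trace above show.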
Thank you
Hi there,
Yes, it was installed, from a source build (make install).
For example, one can see:
[root@node26 ~]# lustre_ lustre_req_history lustre_routes_config lustre_rsync lustre_rmmod lustre_routes_conversion lustre_start [root@node26 ~]# lustre_
Greetings,
Maybe I am not looking at this right, but it does not look like Lustre is installed on the OSS node? Can you confirm? In the rpm list I didn't see the RPMs, and the Lustre commands did not appear to run.
I've added calls to collect-info.sh right into the mkzpool.sh script (with some sleep/sync/sleep magic so the last zip is kept).
Here we are:
- debug_info.20170424_044901.648221747_0400-3235-node26.zip
- debug_info.20170424_044924.231035970_0400-4268-node26.zip
- debug_info.20170424_044934.896525307_0400-5362-node26.zip
- console at hang (I don't know what "dcla" means here)
[root@node26 ~]# ./mkzpool.sh + ./collect-info.sh adding: debug_info.20170424_044901.648221747_0400-3235-node26/ (stored 0%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/Now (deflated 51%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.script.log (deflated 89%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lsmod.txt (deflated 66%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.df.txt (deflated 54%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.mount.txt (deflated 74%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.zpool_events.txt (deflated 62%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.zpool_events_verbose.txt (deflated 79%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lctl_dk.txt (deflated 12%) adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.messages (deflated 83%) + zpool list + grep -w 'no pools available' + zpool destroy oss3pool + zpool list + grep -w 'no pools available' no pools available + '[' -f 17.nvl ']' + draidcfg -r 17.nvl dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare Using 32 base permutations 15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0, 5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16, 10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 
9, 0, 8, 13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10, 13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15, 8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10, 16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3, 5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13, 4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1, 10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15, 2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14, 15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13, 1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15, 3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5, 15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11, 14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6, 7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15, 16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8, 0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13, 8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1, 9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1, 8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16, 5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2, 15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3, 15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14, 15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6, 15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1, 0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13, 7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4, 14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2, 9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16, 4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15, + zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr + zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT oss3pool 14,7G 612K 14,7G - 0% 0% 1.00x ONLINE - + zpool status pool: oss3pool state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM oss3pool ONLINE 0 0 0 draid1-0 ONLINE 0 0 0 sdb ONLINE 0 0 0 sdc ONLINE 0 0 0 
sdd ONLINE 0 0 0 sde ONLINE 0 0 0 sdf ONLINE 0 0 0 sdg ONLINE 0 0 0 sdh ONLINE 0 0 0 sdi ONLINE 0 0 0 sdj ONLINE 0 0 0 sdk ONLINE 0 0 0 sdl ONLINE 0 0 0 sdm ONLINE 0 0 0 sdn ONLINE 0 0 0 sdo ONLINE 0 0 0 sdp ONLINE 0 0 0 sdq ONLINE 0 0 0 sdr ONLINE 0 0 0 spares $draid1-0-s0 AVAIL $draid1-0-s1 AVAIL errors: No known data errors + grep oss3pool + mount oss3pool on /oss3pool type zfs (rw,xattr,noacl) + ./collect-info.sh adding: debug_info.20170424_044924.231035970_0400-4268-node26/ (stored 0%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/Now (deflated 51%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.script.log (deflated 89%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lsmod.txt (deflated 66%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.df.txt (deflated 55%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.mount.txt (deflated 74%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.zpool_events.txt (deflated 71%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lctl_dk.txt (deflated 12%) adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.messages (deflated 83%) + mkfs.lustre --reformat --replace --ost 
--backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01 Permanent disk data: Target: ZFS01-OST0003 Index: 3 Lustre FS: ZFS01 Mount type: zfs Flags: 0x42 (OST update ) Persistent mount opts: Parameters: mgsnode=172.17.32.220@tcp mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01 Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=66 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01-OST0003 lustre:mgsnode=172.17.32.220@tcp + ./collect-info.sh adding: debug_info.20170424_044934.896525307_0400-5362-node26/ (stored 0%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/Now (deflated 51%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.script.log (deflated 89%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.rpm-qa.txt (deflated 69%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lsmod.txt (deflated 66%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lsblk.txt (deflated 79%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.df.txt (deflated 55%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.mount.txt (deflated 74%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.dmesg.txt (deflated 73%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.zpool_events.txt (deflated 72%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lctl_dl.txt (stored 0%) adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lctl_dk.txt (deflated 12%) adding: 
debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.messages (deflated 83%) + '[' -d /lustre/ZFS01/. ']' + mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01 arg[0] = /sbin/mount.lustre arg[1] = -v arg[2] = -o arg[3] = rw arg[4] = oss3pool/ZFS01 arg[5] = /lustre/ZFS01 source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01 options = rw checking for existing Lustre data: found Writing oss3pool/ZFS01 properties lustre:version=1 lustre:flags=2 lustre:index=3 lustre:fsname=ZFS01 lustre:svname=ZFS01-OST0003 lustre:mgsnode=172.17.32.220@tcp mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
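For readers without the attachment: a collect-info.sh-style capture can be sketched roughly as below. This is a hypothetical minimal version (the real script also grabs lctl dk, zpool events, dmesg, etc., and zips rather than tars); it writes a few OUTPUT.* files into a timestamped directory matching the debug_info.<timestamp>-<pid>-<host> naming seen above.

```shell
# Hypothetical minimal collect-info-style capture (not the attached script).
# Creates debug_info.<timestamp>-<pid>-<host>/OUTPUT.* and archives it.
collect_info() {
  local dir="debug_info.$(date +%Y%m%d_%H%M%S)-$$-$(hostname -s)"
  mkdir -p "$dir"
  uname -a > "$dir/OUTPUT.uname.txt"
  mount    > "$dir/OUTPUT.mount.txt"
  df -h    > "$dir/OUTPUT.df.txt" 2>/dev/null
  tar -czf "$dir.tar.gz" "$dir"
  echo "$dir.tar.gz"   # print the archive name for the caller
}
```

Calling collect_info at each step of mkzpool.sh, as done above, gives one archive per phase of the run.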
And yes, after fixing my mistake with --index, it now crashes on the first mount.
Thanks for the support, folks!
I'll try to contact the Hyper-V admin and re-create the disk set to work this out.
But it may take a while.