[LU-9370] Lustre 2.9 + zfs 0.7 + draid = OSS hangup Created: 20/Apr/17 Updated: 21/Dec/17 Resolved: 21/Dec/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | jno | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 1 |
| Labels: | zfs | ||
| Environment: |
CentOS 7 in a Hyper-V vm |
||
| Attachments: |
|
| Epic/Theme: | lustre-2.9, zfs |
| Business Value: | 1 |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I'm trying to build a dRAID-based OST for Lustre. This was initially reported as [https://github.com/thegreatgazoo/zfs/issues/2|thegreatgazoo/zfs issue #2]. A generic Lustre MGS/MDT is already up and running. The OSS is a fresh VM (4 CPU, 4 GB RAM) with a CentOS 7 "minimal" install and 18 SCSI disks (image files). I run yum -y update ; reboot, then run setup-node.sh NODE from the workstation. [root@node26 ~]# ./mkzpool.sh
+ zpool list
+ grep -w 'no pools available'
+ zpool destroy oss3pool
+ zpool list
+ grep -w 'no pools available'
no pools available
+ '[' -f 17.nvl ']'
+ draidcfg -r 17.nvl
dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
Using 32 base permutations
15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
+ zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
+ zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
oss3pool 14,7G 612K 14,7G - 0% 0% 1.00x ONLINE -
+ zpool status
pool: oss3pool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
oss3pool ONLINE 0 0 0
draid1-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
sdh ONLINE 0 0 0
sdi ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
sdl ONLINE 0 0 0
sdm ONLINE 0 0 0
sdn ONLINE 0 0 0
sdo ONLINE 0 0 0
sdp ONLINE 0 0 0
sdq ONLINE 0 0 0
sdr ONLINE 0 0 0
spares
$draid1-0-s0 AVAIL
$draid1-0-s1 AVAIL
errors: No known data errors
+ grep oss3pool
+ mount
oss3pool on /oss3pool type zfs (rw,xattr,noacl)
+ mkfs.lustre --reformat --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
Permanent disk data:
Target: ZFS01:OST0003
Index: 3
Lustre FS: ZFS01
Mount type: zfs
Flags: 0x62
(OST first_time update )
Persistent mount opts:
Parameters: mgsnode=172.17.32.220@tcp
mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=98
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01:OST0003
lustre:mgsnode=172.17.32.220@tcp
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=34
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01:OST0003
lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use retries left: 0
mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use
The target service's index is already in use. (oss3pool/ZFS01)
[root@node26 ~]# mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
|
| Comments |
| Comment by Andreas Dilger [ 20/Apr/17 ] |
|
The "address already in use" message means that you have previously formatted an OST with the same index for this filesystem. You could use a new index, or use the --replace option to re-use the same index. |
| Comment by jno [ 20/Apr/17 ] |
|
OK, but why does it hang?? PS: Yes, the EADDRINUSE is gone, but it still hangs:
+ mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
Permanent disk data:
Target: ZFS01-OST0003
Index: 3
Lustre FS: ZFS01
Mount type: zfs
Flags: 0x42
(OST update )
Persistent mount opts:
Parameters: mgsnode=172.17.32.220@tcp
mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=66
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01-OST0003
lustre:mgsnode=172.17.32.220@tcp
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=2
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01-OST0003
lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
|
| Comment by John Salinas (Inactive) [ 21/Apr/17 ] |
|
Is the mount hanging or the system hanging? Can you provide the following for each node:
mkdir -p "$Dir" ./show_kernelmod_params.sh > "$Dir"/OUTPUT.show_kernelmod_params.txt |
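(The snippet above appears truncated. Judging by the OUTPUT.* files that show up later in this ticket, the full collection step presumably looks roughly like the sketch below; the directory naming is hypothetical.)
Dir=debug_info.$(date +%Y%m%d_%H%M%S)-$(hostname)   # hypothetical naming, modelled on the zip names later in this ticket
mkdir -p "$Dir"
lsmod                      > "$Dir"/OUTPUT.lsmod.txt
dmesg                      > "$Dir"/OUTPUT.dmesg.txt
mount                      > "$Dir"/OUTPUT.mount.txt
zpool events               > "$Dir"/OUTPUT.zpool_events.txt
lctl dl                    > "$Dir"/OUTPUT.lctl_dl.txt
lctl dk                    > "$Dir"/OUTPUT.lctl_dk.txt
./show_kernelmod_params.sh > "$Dir"/OUTPUT.show_kernelmod_params.txt
zip -r "$Dir".zip "$Dir"
|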
| Comment by jno [ 24/Apr/17 ] |
|
Well, it's the entire system (VM) that hangs (crashes), not just the mount, an application, or a session. I'm not always lucky enough to catch the panic on the console; usually it just hangs. There are
|
| Comment by jno [ 24/Apr/17 ] |
|
I've added calls to collect-info.sh right into the mkzpool.sh script (with sleep/sync/sleep magic so the last zip is kept). Here we are:
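A rough sketch of that hook (the exact wrapper is hypothetical, but this is the idea: flush and pause so the last zip survives the hang):
./collect-info.sh          # writes and zips a debug_info.<timestamp>-<pid>-<host> directory
sleep 2; sync; sleep 2     # give the zip a chance to reach disk before the node locks up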
|
| Comment by John Salinas (Inactive) [ 24/Apr/17 ] |
|
Greetings. Maybe I am not looking at this right, but it does not look like Lustre is installed on the OSS node; can you confirm? In the RPM list I didn't see the Lustre RPMs, and the Lustre commands did not appear to run. |
| Comment by jno [ 25/Apr/17 ] |
|
Hi there,
Yes, it was installed, from a build (make install). E.g., tab completion shows:
[root@node26 ~]# lustre_
lustre_req_history       lustre_routes_config       lustre_rsync
lustre_rmmod             lustre_routes_conversion   lustre_start
[root@node26 ~]# lustre_ |
| Comment by John Salinas (Inactive) [ 25/Apr/17 ] |
|
Right, but look at your lsmod output: it does not appear that lustre or lnet are loaded.
$ grep lustre OUTPUT.lsmod.txt
This is why all of your Lustre commands are failing, such as:
invalid parameter 'dump_kernel'
Could you please load the Lustre and LNet kernel modules and try this again? Also, I do not see output from the MDS; if there are still issues, that would be helpful too. Thank you
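For what it's worth, a load sequence along these lines should bring the stack up (it matches the module set the reporter scripts in the next comments):
modprobe -v spl
modprobe -v zfs
modprobe -v lnet        # pulls in libcfs
modprobe -v lustre      # pulls in obdclass, ptlrpc, fid, fld, ...
modprobe -v ost
modprobe -v osd_zfs     # pulls in lquota
lsmod | grep -E 'spl|zfs|lnet|lustre|ost|osd_zfs'
|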
| Comment by jno [ 02/May/17 ] |
|
BTW, there are quite a few (37) modules here:
[root@node26 ~]# find . -name '*.ko'
./spl/module/spl/spl.ko
./spl/module/splat/splat.ko
./zfs/module/avl/zavl.ko
./zfs/module/icp/icp.ko
./zfs/module/nvpair/znvpair.ko
./zfs/module/unicode/zunicode.ko
./zfs/module/zcommon/zcommon.ko
./zfs/module/zfs/zfs.ko
./zfs/module/zpios/zpios.ko
./lustre-release/libcfs/libcfs/libcfs.ko
./lustre-release/lnet/klnds/o2iblnd/ko2iblnd.ko
./lustre-release/lnet/klnds/socklnd/ksocklnd.ko
./lustre-release/lnet/lnet/lnet.ko
./lustre-release/lnet/selftest/lnet_selftest.ko
./lustre-release/lustre/fid/fid.ko
./lustre-release/lustre/fld/fld.ko
./lustre-release/lustre/lfsck/lfsck.ko
./lustre-release/lustre/llite/llite_lloop.ko
./lustre-release/lustre/llite/lustre.ko
./lustre-release/lustre/lmv/lmv.ko
./lustre-release/lustre/lod/lod.ko
./lustre-release/lustre/lov/lov.ko
./lustre-release/lustre/mdc/mdc.ko
./lustre-release/lustre/mdd/mdd.ko
./lustre-release/lustre/mdt/mdt.ko
./lustre-release/lustre/mgc/mgc.ko
./lustre-release/lustre/mgs/mgs.ko
./lustre-release/lustre/obdclass/obdclass.ko
./lustre-release/lustre/obdclass/llog_test.ko
./lustre-release/lustre/obdecho/obdecho.ko
./lustre-release/lustre/ofd/ofd.ko
./lustre-release/lustre/osc/osc.ko
./lustre-release/lustre/osd-zfs/osd_zfs.ko
./lustre-release/lustre/osp/osp.ko
./lustre-release/lustre/ost/ost.ko
./lustre-release/lustre/ptlrpc/ptlrpc.ko
./lustre-release/lustre/quota/lquota.ko
and it's not obvious to me which of them to load, and when... OK, I'll try to load all or some of them now and re-try. |
| Comment by jno [ 02/May/17 ] |
|
Well, same result on a different day, with some fairly randomly chosen modules loaded:
[root@node26 ~]# ./mkzpool.sh
+ modules=(spl zfs lnet lustre ost osd_zfs)
+ typeset -a modules
+ ./collect-info.sh
adding: debug_info.20170502_050611.727878977_0400-3460-node26/ (stored 0%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/Now (deflated 51%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.script.log (deflated 89%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.rpm-qa.txt (deflated 69%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsmod.txt (deflated 66%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsblk.txt (deflated 79%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.df.txt (deflated 54%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.mount.txt (deflated 74%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.dmesg.txt (deflated 73%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events.txt (deflated 61%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events_verbose.txt (deflated 79%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dl.txt (stored 0%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dk.txt (deflated 12%)
adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.messages (deflated 84%)
+ modFilter=
+ for module in '${modules[*]}'
+ echo '+ [spl]'
+ [spl]
++ test -z ''
++ echo ''
+ modFilter=spl
+ modprobe -v spl
+ for module in '${modules[*]}'
+ echo '+ [zfs]'
+ [zfs]
++ test -z spl
++ echo 'spl|'
+ modFilter='spl|zfs'
+ modprobe -v zfs
+ for module in '${modules[*]}'
+ echo '+ [lnet]'
+ [lnet]
++ test -z 'spl|zfs'
++ echo 'spl|zfs|'
+ modFilter='spl|zfs|lnet'
+ modprobe -v lnet
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/lnet.ko
+ for module in '${modules[*]}'
+ echo '+ [lustre]'
+ [lustre]
++ test -z 'spl|zfs|lnet'
++ echo 'spl|zfs|lnet|'
+ modFilter='spl|zfs|lnet|lustre'
+ modprobe -v lustre
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fld.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lmv.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lustre.ko
+ for module in '${modules[*]}'
+ echo '+ [ost]'
+ [ost]
++ test -z 'spl|zfs|lnet|lustre'
++ echo 'spl|zfs|lnet|lustre|'
+ modFilter='spl|zfs|lnet|lustre|ost'
+ modprobe -v ost
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ost.ko
+ for module in '${modules[*]}'
+ echo '+ [osd_zfs]'
+ [osd_zfs]
++ test -z 'spl|zfs|lnet|lustre|ost'
++ echo 'spl|zfs|lnet|lustre|ost|'
+ modFilter='spl|zfs|lnet|lustre|ost|osd_zfs'
+ modprobe -v osd_zfs
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lquota.ko
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/osd_zfs.ko
+ lsmod
+ grep -E 'spl|zfs|lnet|lustre|ost|osd_zfs'
osd_zfs 252589 0
lquota 354067 1 osd_zfs
ost 14991 0
lustre 816649 0
lmv 222021 1 lustre
mdc 173180 1 lustre
lov 295937 1 lustre
fid 90581 2 mdc,osd_zfs
fld 85860 3 fid,lmv,osd_zfs
ptlrpc 2129791 8 fid,fld,lmv,mdc,lov,ost,lquota,lustre
obdclass 1909130 20 fid,fld,lmv,mdc,lov,ost,lquota,lustre,ptlrpc,osd_zfs
lnet 444969 4 lustre,obdclass,ptlrpc,ksocklnd
libcfs 405310 13 fid,fld,lmv,mdc,lov,ost,lnet,lquota,lustre,obdclass,ptlrpc,osd_zfs,ksocklnd
zfs 4026085 1 osd_zfs
zunicode 331170 1 zfs
zavl 19839 1 zfs
icp 299501 1 zfs
zcommon 77836 2 zfs,osd_zfs
znvpair 93348 3 zfs,zcommon,osd_zfs
spl 130321 6 icp,zfs,zavl,zcommon,znvpair,osd_zfs
zlib_deflate 26914 1 spl
+ zpool list
+ grep -w 'no pools available'
+ zpool destroy oss3pool
+ zpool list
+ grep -w 'no pools available'
no pools available
+ '[' -f 17.nvl ']'
+ draidcfg -r 17.nvl
dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
Using 32 base permutations
15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
+ zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
+ zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
oss3pool 14,7G 612K 14,7G - 0% 0% 1.00x ONLINE -
+ zpool status
pool: oss3pool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
oss3pool ONLINE 0 0 0
draid1-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
sdh ONLINE 0 0 0
sdi ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
sdl ONLINE 0 0 0
sdm ONLINE 0 0 0
sdn ONLINE 0 0 0
sdo ONLINE 0 0 0
sdp ONLINE 0 0 0
sdq ONLINE 0 0 0
sdr ONLINE 0 0 0
spares
$draid1-0-s0 AVAIL
$draid1-0-s1 AVAIL
errors: No known data errors
+ mount
+ grep oss3pool
oss3pool on /oss3pool type zfs (rw,xattr,noacl)
+ ./collect-info.sh
adding: debug_info.20170502_050643.778657696_0400-4594-node26/ (stored 0%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/Now (deflated 51%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.script.log (deflated 90%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.rpm-qa.txt (deflated 69%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsmod.txt (deflated 67%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsblk.txt (deflated 79%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.df.txt (deflated 55%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.mount.txt (deflated 74%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.dmesg.txt (deflated 73%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events.txt (deflated 69%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dl.txt (stored 0%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dk.txt (deflated 68%)
adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.messages (deflated 84%)
+ mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
Permanent disk data:
Target: ZFS01-OST0003
Index: 3
Lustre FS: ZFS01
Mount type: zfs
Flags: 0x42
(OST update )
Persistent mount opts:
Parameters: mgsnode=172.17.32.220@tcp
mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=66
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01-OST0003
lustre:mgsnode=172.17.32.220@tcp
+ ./collect-info.sh
adding: debug_info.20170502_050655.040550023_0400-5778-node26/ (stored 0%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/Now (deflated 51%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.script.log (deflated 90%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.rpm-qa.txt (deflated 69%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsmod.txt (deflated 67%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsblk.txt (deflated 79%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.df.txt (deflated 56%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.mount.txt (deflated 74%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.dmesg.txt (deflated 73%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events.txt (deflated 69%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dl.txt (stored 0%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dk.txt (deflated 9%)
adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.messages (deflated 84%)
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
lustre:version=1
lustre:flags=2
lustre:index=3
lustre:fsname=ZFS01
lustre:svname=ZFS01-OST0003
lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
Debug info will arrive in a few minutes... |
| Comment by John Salinas (Inactive) [ 02/May/17 ] |
|
Looks like you are getting further in the process but failing early in the mount process due to underlying errors:
[ 1946.710418] LNet: HW CPU cores: 4, npartitions: 1
[ 1946.718890] alg: No test for adler32 (adler32-zlib)
[ 1946.719309] alg: No test for crc32 (crc32-table)
[ 1951.762047] sha512_ssse3: Using AVX optimized SHA-512 implementation
[ 1954.979635] Lustre: Lustre: Build Version: 2.9.0
[ 1955.302507] LNet: Added LNI 172.17.32.226@tcp [8/256/0/180]
[ 1955.302711] LNet: Accept secure, port 988
[ 1959.056006] GPT:disk_guids don't match.
[ 1959.056034] GPT:partition_entry_array_crc32 values don't match: 0x5d3c877c != 0x443a8464
[ 1959.056037] GPT: Use GNU Parted to correct GPT errors.
[ 1959.056059] sdb: sdb1 sdb9
[ 1959.230444] sdb: sdb1 sdb9
[ 1959.406374] GPT:disk_guids don't match.
[ 1959.406384] GPT:partition_entry_array_crc32 values don't match: 0x29fa53ae != 0xa19347b0
[ 1959.406387] GPT: Use GNU Parted to correct GPT errors.
[ 1959.406408] sdc: sdc1 sdc9
[ 1959.610229] sdc: sdc1 sdc9
[ 1959.821418] Alternate GPT is invalid, using primary GPT.
[ 1959.821444] sdd: sdd1 sdd9
[ 1959.903091] sdd: sdd1 sdd9
[ 1960.088271] GPT:disk_guids don't match.
[ 1960.088279] GPT:partition_entry_array_crc32 values don't match: 0xda543dc7 != 0xdb0d75f4
[ 1960.088281] GPT: Use GNU Parted to correct GPT errors.
[ 1960.088302] sde: sde1 sde9
[ 1960.324063] sde: sde1 sde9
[ 1960.347788] sde: sde1 sde9
[ 1960.515198] Alternate GPT is invalid, using primary GPT.
[ 1960.515225] sdf: sdf1 sdf9
[ 1960.845503] sdf: sdf1 sdf9
[ 1960.869365] sdf: sdf1 sdf9
[ 1961.018646] GPT:disk_guids don't match.
[ 1961.018654] GPT:partition_entry_array_crc32 values don't match: 0xf42c8d7b != 0x97a63590
[ 1961.018657] GPT: Use GNU Parted to correct GPT errors.
[ 1961.018679] sdg: sdg1 sdg9
[ 1961.349725] sdg: sdg1 sdg9
[ 1961.373959] sdg: sdg1 sdg9
[ 1961.524544] Alternate GPT is invalid, using primary GPT.
[ 1961.524569] sdh: sdh1 sdh9
[ 1961.655219] sdh: sdh1 sdh9
[ 1961.814506] GPT:disk_guids don't match.
[ 1961.814515] GPT:partition_entry_array_crc32 values don't match: 0x3d5540f9 != 0x85f3e2e6
[ 1961.814517] GPT: Use GNU Parted to correct GPT errors.
[ 1961.814537] sdi: sdi1 sdi9
[ 1961.867240] sdi: sdi1 sdi9
[ 1962.081393] Alternate GPT is invalid, using primary GPT.
[ 1962.081420] sdj: sdj1 sdj9
[ 1962.261463] sdj: sdj1 sdj9
[ 1962.485817] Alternate GPT is invalid, using primary GPT.
[ 1962.485841] sdk: sdk1 sdk9
[ 1962.617151] sdk: sdk1 sdk9
[ 1962.828196] GPT:disk_guids don't match.
[ 1962.828206] GPT:partition_entry_array_crc32 values don't match: 0x7cf05c31 != 0xbfb68e7
[ 1962.828208] GPT: Use GNU Parted to correct GPT errors.
[ 1962.828232] sdl: sdl1 sdl9
[ 1962.990115] sdl: sdl1 sdl9
[ 1963.188994] GPT:disk_guids don't match.
[ 1963.189028] GPT:partition_entry_array_crc32 values don't match: 0x38ff1612 != 0xc53037f5
[ 1963.189031] GPT: Use GNU Parted to correct GPT errors.
[ 1963.189055] sdm: sdm1 sdm9
[ 1963.453695] sdm: sdm1 sdm9
[ 1963.622171] GPT:disk_guids don't match.
[ 1963.622179] GPT:partition_entry_array_crc32 values don't match: 0x6577aef4 != 0x1624515d
[ 1963.622182] GPT: Use GNU Parted to correct GPT errors.
[ 1963.622202] sdn: sdn1 sdn9
[ 1963.927932] sdn: sdn1 sdn9
[ 1964.131710] Alternate GPT is invalid, using primary GPT.
[ 1964.131737] sdo: sdo1 sdo9
[ 1964.304545] sdo: sdo1 sdo9
[ 1964.537353] Alternate GPT is invalid, using primary GPT.
[ 1964.537380] sdp: sdp1 sdp9
[ 1964.608130] sdp: sdp1 sdp9
[ 1964.861371] Alternate GPT is invalid, using primary GPT.
[ 1964.861397] sdq: sdq1 sdq9
[ 1964.988531] sdq: sdq1 sdq9
[ 1965.295413] GPT:disk_guids don't match.
[ 1965.295421] GPT:partition_entry_array_crc32 values don't match: 0x2cd0988d != 0x828d383e
[ 1965.295424] GPT: Use GNU Parted to correct GPT errors.
[ 1965.295458] sdr: sdr1 sdr9
[ 1965.577126] sdr: sdr1 sdr9
I have seen this before when disks are exact clones of each other and have identical UUID/WWIDs, but there are probably other reasons as well.
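A quick way to check whether the virtual disks really are clones of one another (a generic check, not taken from this ticket) is to compare their reported identities; identical WWN/serial values across sdb..sdr would confirm it:
lsblk -d -o NAME,SIZE,WWN,SERIAL
ls -l /dev/disk/by-id/        # duplicate or missing by-id entries for sdb..sdr are another hint
|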
| Comment by jno [ 03/May/17 ] |
|
Wow! It was a Hyper-V host out of my control... Thanks for the pointer, off to dig into it. |
| Comment by John Salinas (Inactive) [ 03/May/17 ] |
|
I understand; I have been there before. Please let us know if we can close this ticket. If you have any dRAID testing questions, please stay in contact with us. |
| Comment by jno [ 05/May/17 ] |
|
Thanks for the support, folks! I'll try to contact the Hyper-V admin and re-create the disk set to work it out. |