[LU-9370] Lustre 2.9 + zfs 0.7 + draid = OSS hangup Created: 20/Apr/17  Updated: 21/Dec/17  Resolved: 21/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: jno Assignee: WC Triage
Resolution: Not a Bug Votes: 1
Labels: zfs
Environment:

CentOS 7 in a Hyper-V vm


Attachments: PNG File Screenshot from 2017-04-24 11-33-03.png     PNG File Screenshot from 2017-04-24 11-33-27.png     PNG File Screenshot from 2017-04-24 11-50-13.png     File collect-info.sh     Zip Archive debug_info.20170424_042703.274836974_0400-4268-node26.zip     Zip Archive debug_info.20170424_043500.551470248_0400-3183-node26.zip     Zip Archive debug_info.20170424_044901.648221747_0400-3235-node26.zip     Zip Archive debug_info.20170424_044924.231035970_0400-4268-node26.zip     Zip Archive debug_info.20170424_044934.896525307_0400-5362-node26.zip     Zip Archive debug_info.20170502_050611.727878977_0400-3460-node26.zip     Zip Archive debug_info.20170502_050643.778657696_0400-4594-node26.zip     Zip Archive debug_info.20170502_050655.040550023_0400-5778-node26.zip     Text File mkzpool.sh.txt     Text File setup-node.sh.txt    
Epic/Theme: lustre-2.9, zfs
Business Value: 1
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I'm trying to build a dRAID-based OST for Lustre.

Initially reported as a thegreatgazoo/zfs issue: https://github.com/thegreatgazoo/zfs/issues/2

A generic Lustre MGS/MDT is up and running.

Fresh VM (4 CPUs, 4 GB RAM) with a CentOS 7 "minimal" install and 18 SCSI disks (image files).

Perform yum -y update ; reboot, then run setup-node.sh NODE from the workstation.
SSH to the NODE and run ./mkzpool.sh:

[root@node26 ~]# ./mkzpool.sh 
+ zpool list
+ grep -w 'no pools available'
+ zpool destroy oss3pool
+ zpool list
+ grep -w 'no pools available'
no pools available
+ '[' -f 17.nvl ']'
+ draidcfg -r 17.nvl
dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
Using 32 base permutations
  15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
   5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
  10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
  13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
  13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
   8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
  16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
   5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
   4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
  10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
   2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
  15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
   1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
   3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
  15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
  14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
   7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
  16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
   0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
   8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
   9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
   8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
   5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
  15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
  15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
  15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
  15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
   0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
   7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
  14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
   9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
   4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
+ zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
+ zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
oss3pool  14,7G   612K  14,7G         -     0%     0%  1.00x  ONLINE  -
+ zpool status
  pool: oss3pool
 state: ONLINE
  scan: none requested
config:

	NAME            STATE     READ WRITE CKSUM
	oss3pool        ONLINE       0     0     0
	  draid1-0      ONLINE       0     0     0
	    sdb         ONLINE       0     0     0
	    sdc         ONLINE       0     0     0
	    sdd         ONLINE       0     0     0
	    sde         ONLINE       0     0     0
	    sdf         ONLINE       0     0     0
	    sdg         ONLINE       0     0     0
	    sdh         ONLINE       0     0     0
	    sdi         ONLINE       0     0     0
	    sdj         ONLINE       0     0     0
	    sdk         ONLINE       0     0     0
	    sdl         ONLINE       0     0     0
	    sdm         ONLINE       0     0     0
	    sdn         ONLINE       0     0     0
	    sdo         ONLINE       0     0     0
	    sdp         ONLINE       0     0     0
	    sdq         ONLINE       0     0     0
	    sdr         ONLINE       0     0     0
	spares
	  $draid1-0-s0  AVAIL   
	  $draid1-0-s1  AVAIL   

errors: No known data errors
+ grep oss3pool
+ mount
oss3pool on /oss3pool type zfs (rw,xattr,noacl)
+ mkfs.lustre --reformat --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01

   Permanent disk data:
Target:     ZFS01:OST0003
Index:      3
Lustre FS:  ZFS01
Mount type: zfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: 
Parameters: mgsnode=172.17.32.220@tcp

mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
  lustre:version=1
  lustre:flags=98
  lustre:index=3
  lustre:fsname=ZFS01
  lustre:svname=ZFS01:OST0003
  lustre:mgsnode=172.17.32.220@tcp
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
  lustre:version=1
  lustre:flags=34
  lustre:index=3
  lustre:fsname=ZFS01
  lustre:svname=ZFS01:OST0003
  lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use retries left: 0
mount.lustre: mount oss3pool/ZFS01 at /lustre/ZFS01 failed: Address already in use
The target service's index is already in use. (oss3pool/ZFS01)
[root@node26 ~]# mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,virgin,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
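
The trace above stops at this point; the node hangs. For reference, here is a minimal sketch of what mkzpool.sh appears to do, reconstructed from the set -x trace (the attached mkzpool.sh.txt is authoritative; variable names, the mkdir fallback, and set -ex are assumptions):

#!/bin/bash
set -ex

POOL=oss3pool
FS=ZFS01

# Start from a clean slate: destroy the pool unless none exists yet.
zpool list | grep -w 'no pools available' || zpool destroy $POOL

# 17.nvl holds the dRAID permutation map (generated beforehand); -r re-reads and
# prints it: 3 x (4 data + 1 parity) + 2 distributed spares = 17 child drives.
[ -f 17.nvl ] && draidcfg -r 17.nvl

# Create the draid1 pool over the 17 SCSI disks using that config
# (the real trace lists /dev/sdb ... /dev/sdr explicitly instead of a glob).
zpool create -f $POOL draid1 cfg=17.nvl /dev/sd[b-r]
zpool list
zpool status
mount | grep $POOL

# Format the dataset as a Lustre OST and try to mount it; this is where the node hangs.
mkfs.lustre --reformat --ost --backfstype=zfs --fsname=$FS --index=3 --mgsnode=mgs@tcp0 $POOL/$FS
[ -d /lustre/$FS/. ] || mkdir -p /lustre/$FS
mount -v -t lustre $POOL/$FS /lustre/$FS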



 Comments   
Comment by Andreas Dilger [ 20/Apr/17 ]

The "address already in use" message means that you have previously formatted an OST with the same index for this filesystem. You could use a new index, or use the --replace option to re-use the same index.

Comment by jno [ 20/Apr/17 ]

OK, but why does it hang?

PS. Yes, the EADDRINUSE is gone, but it still hangs:

+ mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
Permanent disk data:
Target: ZFS01-OST0003
Index: 3
Lustre FS: ZFS01
Mount type: zfs
Flags: 0x42
 (OST update )
Persistent mount opts: 
Parameters: mgsnode=172.17.32.220@tcp
mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
 lustre:version=1
 lustre:flags=66
 lustre:index=3
 lustre:fsname=ZFS01
 lustre:svname=ZFS01-OST0003
 lustre:mgsnode=172.17.32.220@tcp
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
 lustre:version=1
 lustre:flags=2
 lustre:index=3
 lustre:fsname=ZFS01
 lustre:svname=ZFS01-OST0003
 lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01

 

Comment by John Salinas (Inactive) [ 21/Apr/17 ]

Is the mount hanging or the system hanging? Can you provide the following for each node:

  1. cat collect_info.sh

     Now=$(date +%Y%m%d_%H%M%S | tr -d '\n'; echo -n "$$"; echo $HOSTNAME)   # timestamp + PID + hostname
     Dir="debug_info.$Now"

     mkdir -p "$Dir"

     ./show_kernelmod_params.sh > "$Dir"/OUTPUT.show_kernelmod_params.txt
     cat /sys/kernel/debug/tracing/trace > "$Dir"/OUTPUT.kernel_debug_trace.txt
     dmesg > "$Dir"/OUTPUT.dmesg.txt 2>&1
     zpool events > "$Dir"/OUTPUT.zpool_events.txt 2>&1
     zpool events -v > "$Dir"/OUTPUT.zpool_events_verbose.txt 2>&1
     lctl dl > "$Dir"/OUTPUT.lctl_dl.txt 2>&1
     lctl dk > "$Dir"/OUTPUT.lctl_dk.txt 2>&1
     cp /var/log/messages "$Dir"/OUTPUT.messages

Comment by jno [ 24/Apr/17 ]

Well, 

It's the entire system (VM) that hangs (crashes), not just a mount, an app, or a session.

I'm not lucky enough to see the panic on the console every time; usually it just hangs.
 

Attached are:

  • collect-info.sh - the script used to collect debug info
  • debug_info.20170424_042703.274836974_0400-4268-node26.zip - collected info before the crash
  • debug_info.20170424_043500.551470248_0400-3183-node26.zip - collected info after the crash
  • (screenshot) - console at crash, scrolled up
  • (screenshot) - console at crash, scrolled down
     
    [root@node26 ~]# ./mkzpool.sh 
     + zpool list
     + grep -w 'no pools available'
     + zpool destroy oss3pool
     + zpool list
     + grep -w 'no pools available'
     no pools available
     + '[' -f 17.nvl ']'
     + draidcfg -r 17.nvl
     dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
     Using 32 base permutations
     15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
     5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
     10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
     13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
     13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
     8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
     16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
     5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
     4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
     10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
     2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
     15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
     1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
     3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
     15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
     14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
     7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
     16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
     0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
     8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
     9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
     8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
     5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
     15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
     15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
     15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
     15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
     0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
     7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
     14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
     9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
     4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
     + zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
     + zpool list
     NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
     oss3pool  14,7G   612K  14,7G         -     0%     0%  1.00x  ONLINE  -
     + zpool status
       pool: oss3pool
      state: ONLINE
       scan: none requested
     config:

     	NAME            STATE     READ WRITE CKSUM
     	oss3pool        ONLINE       0     0     0
     	  draid1-0      ONLINE       0     0     0
     	    sdb         ONLINE       0     0     0
     	    sdc         ONLINE       0     0     0
     	    sdd         ONLINE       0     0     0
     	    sde         ONLINE       0     0     0
     	    sdf         ONLINE       0     0     0
     	    sdg         ONLINE       0     0     0
     	    sdh         ONLINE       0     0     0
     	    sdi         ONLINE       0     0     0
     	    sdj         ONLINE       0     0     0
     	    sdk         ONLINE       0     0     0
     	    sdl         ONLINE       0     0     0
     	    sdm         ONLINE       0     0     0
     	    sdn         ONLINE       0     0     0
     	    sdo         ONLINE       0     0     0
     	    sdp         ONLINE       0     0     0
     	    sdq         ONLINE       0     0     0
     	    sdr         ONLINE       0     0     0
     	spares
     	  $draid1-0-s0  AVAIL
     	  $draid1-0-s1  AVAIL

     errors: No known data errors
     + mount
     + grep oss3pool
     oss3pool on /oss3pool type zfs (rw,xattr,noacl)
     + mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
    Permanent disk data:
     Target: ZFS01-OST0003
     Index: 3
     Lustre FS: ZFS01
     Mount type: zfs
     Flags: 0x42
     (OST update )
     Persistent mount opts: 
     Parameters: mgsnode=172.17.32.220@tcp
    mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
     Writing oss3pool/ZFS01 properties
     lustre:version=1
     lustre:flags=66
     lustre:index=3
     lustre:fsname=ZFS01
     lustre:svname=ZFS01-OST0003
     lustre:mgsnode=172.17.32.220@tcp
     + '[' -d /lustre/ZFS01/. ']'
     + mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
     arg[0] = /sbin/mount.lustre
     arg[1] = -v
     arg[2] = -o
     arg[3] = rw
     arg[4] = oss3pool/ZFS01
     arg[5] = /lustre/ZFS01
     source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
     options = rw
     checking for existing Lustre data: found
     Writing oss3pool/ZFS01 properties
     lustre:version=1
     lustre:flags=2
     lustre:index=3
     lustre:fsname=ZFS01
     lustre:svname=ZFS01-OST0003
     lustre:mgsnode=172.17.32.220@tcp
     mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
    
    

     
    Here it hangs.

Comment by jno [ 24/Apr/17 ]

I've added calls to collect-info.sh right into the mkzpool.sh script (with sleep/sync/sleep magic so the last zip is preserved).
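
A hedged sketch of what that "magic" amounts to (the durations are guesses, not the exact edit):

./collect-info.sh    # writes and zips a debug_info.<timestamp>-<pid>-<host> directory
sleep 2              # let the zip finish writing
sync                 # flush it to the virtual disks
sleep 5              # give the hypervisor time to commit it before the node hangs/crashes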

Here we are:

  • debug_info.20170424_044901.648221747_0400-3235-node26.zip
  • debug_info.20170424_044924.231035970_0400-4268-node26.zip
  • debug_info.20170424_044934.896525307_0400-5362-node26.zip
  • (screenshot) - console at hang (I don't know what "dcla" means here)
     
    [root@node26 ~]# ./mkzpool.sh 
    + ./collect-info.sh
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/ (stored 0%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/Now (deflated 51%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.script.log (deflated 89%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.rpm-qa.txt (deflated 69%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lsmod.txt (deflated 66%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lsblk.txt (deflated 79%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.df.txt (deflated 54%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.mount.txt (deflated 74%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.dmesg.txt (deflated 73%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.zpool_events.txt (deflated 62%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.zpool_events_verbose.txt (deflated 79%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lctl_dl.txt (stored 0%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.lctl_dk.txt (deflated 12%)
      adding: debug_info.20170424_044901.648221747_0400-3235-node26/OUTPUT.messages (deflated 83%)
    + zpool list
    + grep -w 'no pools available'
    + zpool destroy oss3pool
    + zpool list
    + grep -w 'no pools available'
    no pools available
    + '[' -f 17.nvl ']'
    + draidcfg -r 17.nvl
    dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
    Using 32 base permutations
      15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
       5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
      10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
      13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
      13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
       8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
      16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
       5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
       4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
      10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
       2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
      15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
       1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
       3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
      15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
      14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
       7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
      16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
       0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
       8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
       9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
       8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
       5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
      15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
      15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
      15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
      15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
       0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
       7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
      14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
       9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
       4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
    + zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
    + zpool list
    NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
    oss3pool  14,7G   612K  14,7G         -     0%     0%  1.00x  ONLINE  -
    + zpool status
      pool: oss3pool
     state: ONLINE
      scan: none requested
    config:
    
    	NAME            STATE     READ WRITE CKSUM
    	oss3pool        ONLINE       0     0     0
    	  draid1-0      ONLINE       0     0     0
    	    sdb         ONLINE       0     0     0
    	    sdc         ONLINE       0     0     0
    	    sdd         ONLINE       0     0     0
    	    sde         ONLINE       0     0     0
    	    sdf         ONLINE       0     0     0
    	    sdg         ONLINE       0     0     0
    	    sdh         ONLINE       0     0     0
    	    sdi         ONLINE       0     0     0
    	    sdj         ONLINE       0     0     0
    	    sdk         ONLINE       0     0     0
    	    sdl         ONLINE       0     0     0
    	    sdm         ONLINE       0     0     0
    	    sdn         ONLINE       0     0     0
    	    sdo         ONLINE       0     0     0
    	    sdp         ONLINE       0     0     0
    	    sdq         ONLINE       0     0     0
    	    sdr         ONLINE       0     0     0
    	spares
    	  $draid1-0-s0  AVAIL   
    	  $draid1-0-s1  AVAIL   
    
    errors: No known data errors
    + grep oss3pool
    + mount
    oss3pool on /oss3pool type zfs (rw,xattr,noacl)
    + ./collect-info.sh
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/ (stored 0%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/Now (deflated 51%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.script.log (deflated 89%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.rpm-qa.txt (deflated 69%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lsmod.txt (deflated 66%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lsblk.txt (deflated 79%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.df.txt (deflated 55%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.mount.txt (deflated 74%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.dmesg.txt (deflated 73%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.zpool_events.txt (deflated 71%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lctl_dl.txt (stored 0%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.lctl_dk.txt (deflated 12%)
      adding: debug_info.20170424_044924.231035970_0400-4268-node26/OUTPUT.messages (deflated 83%)
    + mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01
    
       Permanent disk data:
    Target:     ZFS01-OST0003
    Index:      3
    Lustre FS:  ZFS01
    Mount type: zfs
    Flags:      0x42
                  (OST update )
    Persistent mount opts: 
    Parameters: mgsnode=172.17.32.220@tcp
    
    mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
    Writing oss3pool/ZFS01 properties
      lustre:version=1
      lustre:flags=66
      lustre:index=3
      lustre:fsname=ZFS01
      lustre:svname=ZFS01-OST0003
      lustre:mgsnode=172.17.32.220@tcp
    + ./collect-info.sh
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/ (stored 0%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/Now (deflated 51%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.script.log (deflated 89%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.rpm-qa.txt (deflated 69%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lsmod.txt (deflated 66%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lsblk.txt (deflated 79%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.df.txt (deflated 55%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.mount.txt (deflated 74%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.dmesg.txt (deflated 73%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.zpool_events.txt (deflated 72%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lctl_dl.txt (stored 0%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.lctl_dk.txt (deflated 12%)
      adding: debug_info.20170424_044934.896525307_0400-5362-node26/OUTPUT.messages (deflated 83%)
    + '[' -d /lustre/ZFS01/. ']'
    + mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
    arg[0] = /sbin/mount.lustre
    arg[1] = -v
    arg[2] = -o
    arg[3] = rw
    arg[4] = oss3pool/ZFS01
    arg[5] = /lustre/ZFS01
    source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
    options = rw
    checking for existing Lustre data: found
    Writing oss3pool/ZFS01 properties
      lustre:version=1
      lustre:flags=2
      lustre:index=3
      lustre:fsname=ZFS01
      lustre:svname=ZFS01-OST0003
      lustre:mgsnode=172.17.32.220@tcp
    mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01
    
    
    
    
    

 And yes, after fixing my mistake with --index, it now crashes on the first mount.

Comment by John Salinas (Inactive) [ 24/Apr/17 ]

Greetings,

Maybe I am not looking at this right, but it does not look like Lustre is installed on the OSS node. Can you confirm? In the RPM list I didn't see the Lustre RPMs, and the Lustre commands did not appear to run.

Comment by jno [ 25/Apr/17 ]

Hi there,

 

Yes, it was installed, from a source build (make install).

For example, one can see:

[root@node26 ~]# lustre_
lustre_req_history lustre_routes_config lustre_rsync
lustre_rmmod lustre_routes_conversion lustre_start
[root@node26 ~]# lustre_
Comment by John Salinas (Inactive) [ 25/Apr/17 ]

Right, but look at your lsmod output: it does not appear that the lustre or lnet modules are loaded.

$ grep lustre OUTPUT.lsmod.txt
$

This is why all of your Lustre commands are failing, for example:

invalid parameter 'dump_kernel'
open(dump_kernel) failed: No such file or directory

Could you please load the Lustre and LNet kernel modules and try this again? Also, I do not see output from the MDS; if there are still issues, that would be helpful.
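
A minimal load sequence for an OSS with a ZFS backend might look like this (a sketch, assuming the modules were installed into /lib/modules for the running kernel):

depmod -a                 # make sure modprobe can resolve the freshly installed modules
modprobe lnet             # pulls in libcfs
lctl network up           # bring up LNet (uses the configured networks, tcp by default)
modprobe lustre           # core stack: obdclass, ptlrpc, ...
modprobe osd_zfs          # ZFS OSD, required for zfs-backed targets
modprobe ost
lsmod | grep -E 'lnet|lustre|osd_zfs|ost'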

Thank you

Comment by jno [ 02/May/17 ]

BTW, there are quite a few (37) modules here:

[root@node26 ~]# find . -name '*.ko' 
./spl/module/spl/spl.ko
./spl/module/splat/splat.ko
./zfs/module/avl/zavl.ko
./zfs/module/icp/icp.ko
./zfs/module/nvpair/znvpair.ko
./zfs/module/unicode/zunicode.ko
./zfs/module/zcommon/zcommon.ko
./zfs/module/zfs/zfs.ko
./zfs/module/zpios/zpios.ko
./lustre-release/libcfs/libcfs/libcfs.ko
./lustre-release/lnet/klnds/o2iblnd/ko2iblnd.ko
./lustre-release/lnet/klnds/socklnd/ksocklnd.ko
./lustre-release/lnet/lnet/lnet.ko
./lustre-release/lnet/selftest/lnet_selftest.ko
./lustre-release/lustre/fid/fid.ko
./lustre-release/lustre/fld/fld.ko
./lustre-release/lustre/lfsck/lfsck.ko
./lustre-release/lustre/llite/llite_lloop.ko
./lustre-release/lustre/llite/lustre.ko
./lustre-release/lustre/lmv/lmv.ko
./lustre-release/lustre/lod/lod.ko
./lustre-release/lustre/lov/lov.ko
./lustre-release/lustre/mdc/mdc.ko
./lustre-release/lustre/mdd/mdd.ko
./lustre-release/lustre/mdt/mdt.ko
./lustre-release/lustre/mgc/mgc.ko
./lustre-release/lustre/mgs/mgs.ko
./lustre-release/lustre/obdclass/obdclass.ko
./lustre-release/lustre/obdclass/llog_test.ko
./lustre-release/lustre/obdecho/obdecho.ko
./lustre-release/lustre/ofd/ofd.ko
./lustre-release/lustre/osc/osc.ko
./lustre-release/lustre/osd-zfs/osd_zfs.ko
./lustre-release/lustre/osp/osp.ko
./lustre-release/lustre/ost/ost.ko
./lustre-release/lustre/ptlrpc/ptlrpc.ko
./lustre-release/lustre/quota/lquota.ko

and it's not obvious to me which ones to load, and when...
In debug_info.20170424_044934.896525307_0400-5362-node26.zip one can see the spl and zfs modules loaded.
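
(Presumably modprobe resolves the dependencies itself from the depmod data, so loading only the top-level modules should be enough; a hedged example, assuming depmod -a was run after make install:)

modprobe -v osd_zfs   # should pull in spl, zfs, libcfs, lnet, obdclass, ptlrpc, lquota, ...
modprobe -v ost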

Ok, I'll try to load all or some of them now and re-try.

Comment by jno [ 02/May/17 ]

Well, same result on different days, with some rather randomly chosen modules loaded:

[root@node26 ~]# ./mkzpool.sh 
+ modules=(spl zfs lnet lustre ost osd_zfs)
+ typeset -a modules
+ ./collect-info.sh
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/ (stored 0%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/Now (deflated 51%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.script.log (deflated 89%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.rpm-qa.txt (deflated 69%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsmod.txt (deflated 66%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lsblk.txt (deflated 79%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.df.txt (deflated 54%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.mount.txt (deflated 74%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.dmesg.txt (deflated 73%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events.txt (deflated 61%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.zpool_events_verbose.txt (deflated 79%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dl.txt (stored 0%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.lctl_dk.txt (deflated 12%)
 adding: debug_info.20170502_050611.727878977_0400-3460-node26/OUTPUT.messages (deflated 84%)
+ modFilter=
+ for module in '${modules[*]}'
+ echo '+ [spl]'
+ [spl]
++ test -z ''
++ echo ''
+ modFilter=spl
+ modprobe -v spl
+ for module in '${modules[*]}'
+ echo '+ [zfs]'
+ [zfs]
++ test -z spl
++ echo 'spl|'
+ modFilter='spl|zfs'
+ modprobe -v zfs
+ for module in '${modules[*]}'
+ echo '+ [lnet]'
+ [lnet]
++ test -z 'spl|zfs'
++ echo 'spl|zfs|'
+ modFilter='spl|zfs|lnet'
+ modprobe -v lnet
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/libcfs.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/net/lustre/lnet.ko 
+ for module in '${modules[*]}'
+ echo '+ [lustre]'
+ [lustre]
++ test -z 'spl|zfs|lnet'
++ echo 'spl|zfs|lnet|'
+ modFilter='spl|zfs|lnet|lustre'
+ modprobe -v lustre
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/obdclass.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ptlrpc.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fld.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/fid.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lov.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/mdc.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lmv.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lustre.ko 
+ for module in '${modules[*]}'
+ echo '+ [ost]'
+ [ost]
++ test -z 'spl|zfs|lnet|lustre'
++ echo 'spl|zfs|lnet|lustre|'
+ modFilter='spl|zfs|lnet|lustre|ost'
+ modprobe -v ost
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/ost.ko 
+ for module in '${modules[*]}'
+ echo '+ [osd_zfs]'
+ [osd_zfs]
++ test -z 'spl|zfs|lnet|lustre|ost'
++ echo 'spl|zfs|lnet|lustre|ost|'
+ modFilter='spl|zfs|lnet|lustre|ost|osd_zfs'
+ modprobe -v osd_zfs
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/lquota.ko 
insmod /lib/modules/3.10.0-514.16.1.el7.x86_64/extra/kernel/fs/lustre/osd_zfs.ko 
+ lsmod
+ grep -E 'spl|zfs|lnet|lustre|ost|osd_zfs'
osd_zfs 252589 0 
lquota 354067 1 osd_zfs
ost 14991 0 
lustre 816649 0 
lmv 222021 1 lustre
mdc 173180 1 lustre
lov 295937 1 lustre
fid 90581 2 mdc,osd_zfs
fld 85860 3 fid,lmv,osd_zfs
ptlrpc 2129791 8 fid,fld,lmv,mdc,lov,ost,lquota,lustre
obdclass 1909130 20 fid,fld,lmv,mdc,lov,ost,lquota,lustre,ptlrpc,osd_zfs
lnet 444969 4 lustre,obdclass,ptlrpc,ksocklnd
libcfs 405310 13 fid,fld,lmv,mdc,lov,ost,lnet,lquota,lustre,obdclass,ptlrpc,osd_zfs,ksocklnd
zfs 4026085 1 osd_zfs
zunicode 331170 1 zfs
zavl 19839 1 zfs
icp 299501 1 zfs
zcommon 77836 2 zfs,osd_zfs
znvpair 93348 3 zfs,zcommon,osd_zfs
spl 130321 6 icp,zfs,zavl,zcommon,znvpair,osd_zfs
zlib_deflate 26914 1 spl
+ zpool list
+ grep -w 'no pools available'
+ zpool destroy oss3pool
+ zpool list
+ grep -w 'no pools available'
no pools available
+ '[' -f 17.nvl ']'
+ draidcfg -r 17.nvl
dRAID1 vdev of 17 child drives: 3 x (4 data + 1 parity) and 2 distributed spare
Using 32 base permutations
 15, 2, 8, 7,10, 5, 4,16, 1,13,14, 9,11,12, 3, 6, 0,
 5,15,14, 9, 0,11,13, 4, 3,12, 8,10, 7, 1, 6, 2,16,
 10,11,14, 5,15, 2,13, 6, 1, 3, 4, 7,12,16, 9, 0, 8,
 13, 2,12,14, 8, 0, 7, 4, 9,15,11, 6, 3,16, 1, 5,10,
 13, 5, 2,16, 6, 0, 4, 8,10, 1, 3,14, 9,11,12, 7,15,
 8,12, 3,14, 0, 4,16, 6, 2,11, 1, 7, 9,15,13, 5,10,
 16,14, 2, 9, 7, 4,11, 0, 6,12,10, 8, 1,13,15, 5, 3,
 5,16, 6, 1,10,15,11, 3, 8,14, 2,12, 0, 7, 9, 4,13,
 4,12, 8,10,14, 9, 6,11,15, 0, 3,13, 7, 2, 5,16, 1,
 10,14,16,11,12, 2, 5, 3, 4, 7, 0, 1, 6, 9,13, 8,15,
 2, 1,11,15,16, 6,12, 3,10,13, 8, 5, 4, 0, 7, 9,14,
 15,14, 1, 5,16, 2,12, 8, 9, 6,11,10, 3, 0, 7, 4,13,
 1, 5,10, 9, 2, 8, 4,16, 7,11, 3,12, 6,14, 0,13,15,
 3, 7,16,10,13, 2, 6, 8,14,15,12,11, 0, 9, 1, 4, 5,
 15, 2,14, 8, 5,16, 3,13, 4, 1, 9,12,10, 0, 6, 7,11,
 14,12,11,15,16,10, 2, 9, 8, 4, 3, 1,13, 5, 7, 0, 6,
 7,13, 2,11,14, 0, 1, 8, 9,10,16, 4, 6,12, 5, 3,15,
 16, 1,11, 4, 3, 9, 6,13, 5, 7,10,15,14,12, 2, 0, 8,
 0, 5, 2,10,16,12, 6, 3,11,14, 1, 9, 7,15, 4, 8,13,
 8,13,11, 4,10, 6, 7,16, 5,12, 9,14, 2, 3, 0,15, 1,
 9, 6,12,16, 4, 7, 3, 0, 2,15,13, 8,11,14, 5,10, 1,
 8,12, 0, 6,15, 7, 4,13,14,10, 1, 9, 5, 3,11, 2,16,
 5,15, 9,10,16, 6,11, 0, 7,13, 8,14, 3, 4, 1,12, 2,
 15,14, 2, 9, 4,11, 7, 1, 6,10, 5, 0, 8,12,13,16, 3,
 15,16, 0,10, 3,12,11, 7, 1, 8, 6,13, 4, 5, 9, 2,14,
 15, 4, 7,13,14, 2, 9,10,16, 1,11,12, 8, 0, 3, 5, 6,
 15, 8,13, 0, 4, 7, 3,14, 5,12, 2, 9,10,11, 6,16, 1,
 0, 7, 5, 3, 1,14,16, 4, 2,15,12, 8,10, 6, 9,11,13,
 7, 6, 0,15,16,11, 8, 1, 5,12,13,14,10, 9, 3, 2, 4,
 14,16,10, 6, 4,13, 3, 1,15,12,11, 8, 9, 5, 0, 7, 2,
 9, 3, 5,15,10,11, 8, 7, 2,14, 6,13, 0, 4, 1,12,16,
 4, 6, 7,14, 5, 3,12, 1,13, 9,16, 2, 0,10, 8,11,15,
+ zpool create -f oss3pool draid1 cfg=17.nvl /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr
+ zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
oss3pool  14,7G   612K  14,7G         -     0%     0%  1.00x  ONLINE  -
+ zpool status
  pool: oss3pool
 state: ONLINE
  scan: none requested
config:

	NAME            STATE     READ WRITE CKSUM
	oss3pool        ONLINE       0     0     0
	  draid1-0      ONLINE       0     0     0
	    sdb         ONLINE       0     0     0
	    sdc         ONLINE       0     0     0
	    sdd         ONLINE       0     0     0
	    sde         ONLINE       0     0     0
	    sdf         ONLINE       0     0     0
	    sdg         ONLINE       0     0     0
	    sdh         ONLINE       0     0     0
	    sdi         ONLINE       0     0     0
	    sdj         ONLINE       0     0     0
	    sdk         ONLINE       0     0     0
	    sdl         ONLINE       0     0     0
	    sdm         ONLINE       0     0     0
	    sdn         ONLINE       0     0     0
	    sdo         ONLINE       0     0     0
	    sdp         ONLINE       0     0     0
	    sdq         ONLINE       0     0     0
	    sdr         ONLINE       0     0     0
	spares
	  $draid1-0-s0  AVAIL
	  $draid1-0-s1  AVAIL

errors: No known data errors
+ mount
+ grep oss3pool
oss3pool on /oss3pool type zfs (rw,xattr,noacl)
+ ./collect-info.sh
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/ (stored 0%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/Now (deflated 51%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.script.log (deflated 90%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.rpm-qa.txt (deflated 69%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsmod.txt (deflated 67%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lsblk.txt (deflated 79%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.df.txt (deflated 55%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.mount.txt (deflated 74%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.dmesg.txt (deflated 73%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events.txt (deflated 69%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dl.txt (stored 0%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.lctl_dk.txt (deflated 68%)
 adding: debug_info.20170502_050643.778657696_0400-4594-node26/OUTPUT.messages (deflated 84%)
+ mkfs.lustre --reformat --replace --ost --backfstype=zfs --fsname=ZFS01 --index=3 --mgsnode=mgs@tcp0 oss3pool/ZFS01

 Permanent disk data:
Target: ZFS01-OST0003
Index: 3
Lustre FS: ZFS01
Mount type: zfs
Flags: 0x42
 (OST update )
Persistent mount opts: 
Parameters: mgsnode=172.17.32.220@tcp

mkfs_cmd = zfs create -o canmount=off -o xattr=sa oss3pool/ZFS01
Writing oss3pool/ZFS01 properties
 lustre:version=1
 lustre:flags=66
 lustre:index=3
 lustre:fsname=ZFS01
 lustre:svname=ZFS01-OST0003
 lustre:mgsnode=172.17.32.220@tcp
+ ./collect-info.sh
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/ (stored 0%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/Now (deflated 51%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.script.log (deflated 90%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.rpm-qa.txt (deflated 69%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsmod.txt (deflated 67%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lsblk.txt (deflated 79%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.df.txt (deflated 56%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.mount.txt (deflated 74%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.show_kernelmod_params.txt (deflated 67%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.kernel_debug_trace.txt (deflated 57%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.dmesg.txt (deflated 73%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events.txt (deflated 69%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.zpool_events_verbose.txt (deflated 85%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dl.txt (stored 0%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.lctl_dk.txt (deflated 9%)
 adding: debug_info.20170502_050655.040550023_0400-5778-node26/OUTPUT.messages (deflated 84%)
+ '[' -d /lustre/ZFS01/. ']'
+ mount -v -t lustre oss3pool/ZFS01 /lustre/ZFS01
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = oss3pool/ZFS01
arg[5] = /lustre/ZFS01
source = oss3pool/ZFS01 (oss3pool/ZFS01), target = /lustre/ZFS01
options = rw
checking for existing Lustre data: found
Writing oss3pool/ZFS01 properties
 lustre:version=1
 lustre:flags=2
 lustre:index=3
 lustre:fsname=ZFS01
 lustre:svname=ZFS01-OST0003
 lustre:mgsnode=172.17.32.220@tcp
mounting device oss3pool/ZFS01 at /lustre/ZFS01, flags=0x1000000 options=osd=osd-zfs,,mgsnode=172.17.32.220@tcp,update,param=mgsnode=172.17.32.220@tcp,svname=ZFS01-OST0003,device=oss3pool/ZFS01

Debug info will arrive in a few minutes...

Comment by John Salinas (Inactive) [ 02/May/17 ]

Looks like you are getting further along now, but the mount fails early due to underlying disk errors:

[ 1946.710418] LNet: HW CPU cores: 4, npartitions: 1
[ 1946.718890] alg: No test for adler32 (adler32-zlib)
[ 1946.719309] alg: No test for crc32 (crc32-table)
[ 1951.762047] sha512_ssse3: Using AVX optimized SHA-512 implementation
[ 1954.979635] Lustre: Lustre: Build Version: 2.9.0
[ 1955.302507] LNet: Added LNI 172.17.32.226@tcp [8/256/0/180]
[ 1955.302711] LNet: Accept secure, port 988
[ 1959.056006] GPT:disk_guids don't match.
[ 1959.056034] GPT:partition_entry_array_crc32 values don't match: 0x5d3c877c != 0x443a8464
[ 1959.056037] GPT: Use GNU Parted to correct GPT errors.
[ 1959.056059]  sdb: sdb1 sdb9
[ 1959.230444]  sdb: sdb1 sdb9
[ 1959.406374] GPT:disk_guids don't match.
[ 1959.406384] GPT:partition_entry_array_crc32 values don't match: 0x29fa53ae != 0xa19347b0
[ 1959.406387] GPT: Use GNU Parted to correct GPT errors.
[ 1959.406408]  sdc: sdc1 sdc9
[ 1959.610229]  sdc: sdc1 sdc9
[ 1959.821418] Alternate GPT is invalid, using primary GPT.
[ 1959.821444]  sdd: sdd1 sdd9
[ 1959.903091]  sdd: sdd1 sdd9
[ 1960.088271] GPT:disk_guids don't match.
[ 1960.088279] GPT:partition_entry_array_crc32 values don't match: 0xda543dc7 != 0xdb0d75f4
[ 1960.088281] GPT: Use GNU Parted to correct GPT errors.
[ 1960.088302]  sde: sde1 sde9
[ 1960.324063]  sde: sde1 sde9
[ 1960.347788]  sde: sde1 sde9
[ 1960.515198] Alternate GPT is invalid, using primary GPT.
[ 1960.515225]  sdf: sdf1 sdf9
[ 1960.845503]  sdf: sdf1 sdf9
[ 1960.869365]  sdf: sdf1 sdf9
[ 1961.018646] GPT:disk_guids don't match.
[ 1961.018654] GPT:partition_entry_array_crc32 values don't match: 0xf42c8d7b != 0x97a63590
[ 1961.018657] GPT: Use GNU Parted to correct GPT errors.
[ 1961.018679]  sdg: sdg1 sdg9
[ 1961.349725]  sdg: sdg1 sdg9
[ 1961.373959]  sdg: sdg1 sdg9
[ 1961.524544] Alternate GPT is invalid, using primary GPT.
[ 1961.524569]  sdh: sdh1 sdh9
[ 1961.655219]  sdh: sdh1 sdh9
[ 1961.814506] GPT:disk_guids don't match.
[ 1961.814515] GPT:partition_entry_array_crc32 values don't match: 0x3d5540f9 != 0x85f3e2e6
[ 1961.814517] GPT: Use GNU Parted to correct GPT errors.
[ 1961.814537]  sdi: sdi1 sdi9
[ 1961.867240]  sdi: sdi1 sdi9
[ 1962.081393] Alternate GPT is invalid, using primary GPT.
[ 1962.081420]  sdj: sdj1 sdj9
[ 1962.261463]  sdj: sdj1 sdj9
[ 1962.485817] Alternate GPT is invalid, using primary GPT.
[ 1962.485841]  sdk: sdk1 sdk9
[ 1962.617151]  sdk: sdk1 sdk9
[ 1962.828196] GPT:disk_guids don't match.
[ 1962.828206] GPT:partition_entry_array_crc32 values don't match: 0x7cf05c31 != 0xbfb68e7
[ 1962.828208] GPT: Use GNU Parted to correct GPT errors.
[ 1962.828232]  sdl: sdl1 sdl9
[ 1962.990115]  sdl: sdl1 sdl9
[ 1963.188994] GPT:disk_guids don't match.
[ 1963.189028] GPT:partition_entry_array_crc32 values don't match: 0x38ff1612 != 0xc53037f5
[ 1963.189031] GPT: Use GNU Parted to correct GPT errors.
[ 1963.189055]  sdm: sdm1 sdm9
[ 1963.453695]  sdm: sdm1 sdm9
[ 1963.622171] GPT:disk_guids don't match.
[ 1963.622179] GPT:partition_entry_array_crc32 values don't match: 0x6577aef4 != 0x1624515d
[ 1963.622182] GPT: Use GNU Parted to correct GPT errors.
[ 1963.622202]  sdn: sdn1 sdn9
[ 1963.927932]  sdn: sdn1 sdn9
[ 1964.131710] Alternate GPT is invalid, using primary GPT.
[ 1964.131737]  sdo: sdo1 sdo9
[ 1964.304545]  sdo: sdo1 sdo9
[ 1964.537353] Alternate GPT is invalid, using primary GPT.
[ 1964.537380]  sdp: sdp1 sdp9
[ 1964.608130]  sdp: sdp1 sdp9
[ 1964.861371] Alternate GPT is invalid, using primary GPT.
[ 1964.861397]  sdq: sdq1 sdq9
[ 1964.988531]  sdq: sdq1 sdq9
[ 1965.295413] GPT:disk_guids don't match.
[ 1965.295421] GPT:partition_entry_array_crc32 values don't match: 0x2cd0988d != 0x828d383e
[ 1965.295424] GPT: Use GNU Parted to correct GPT errors.
[ 1965.295458]  sdr: sdr1 sdr9
[ 1965.577126]  sdr: sdr1 sdr9 

I have seen this before when disks are exact clones of each other and have identical UUID/WWIDs, but there are probably other reasons as well.
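
One quick way to check whether the virtual disks really present unique identities (a hedged suggestion; column availability depends on the lsblk version):

lsblk -o NAME,SIZE,SERIAL,WWN /dev/sd[b-r]     # duplicate SERIAL/WWN values point at cloned disk images
ls -l /dev/disk/by-id/ | grep -E 'sd[b-r]$'    # whole-disk by-id links; clones tend to collide here too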

Comment by jno [ 03/May/17 ]

Wow!

The Hyper-V host is out of my control...

Thanks for the pointer, I'll go dig.

Comment by John Salinas (Inactive) [ 03/May/17 ]

I understand, I have been there before. Please let us know if we can close this ticket. If you have any dRAID testing questions please keep in contact with us.

Comment by jno [ 05/May/17 ]

Thanks for support, folks!

I'll try to contact the Hyper-V admin and re-create the disk set to sort this out, but it may take a while.
