[LU-8508] kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1 Created: 16/Aug/16 Updated: 16/Sep/16 Resolved: 16/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Adam Roe (Inactive) | Assignee: | Kit Westneat |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS 7.2 (Kernel: Various) Hardware: kernel-3.10.0-327.22.2.el7_lustre.x86_64 |
||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
Lustre DNE2 testing: I noticed an issue with the latest master builds. When mounting storage targets on servers other than the one with the MGT, I get a kernel panic with the output below. To the best of my ability, I have validated that this is not a network problem. I have also tried an FE build, which works, and another master build (3419), which also works:
[root@zlfs2-oss1 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/nvme0n1
arg[5] = /mnt/MDT0000
source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/nvme0n1 at /mnt/MDT0000, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-MDT0000,device=/dev/nvme0n1
mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:Kernel panic - not syncing: LBUG
Attached is some debugging / more info.
Builds Tried: |
| Comments |
| Comment by Adam Roe (Inactive) [ 16/Aug/16 ] |
|
Format Commands
[root@lfsmaster FORMAT_SCRIPTS]# cat zlfs2-dne2-ldiskfs_1MDT.sh
#!/bin/bash
MGSNID=192.168.5.21@o2ib0
FSNAME=zlfs2
RECORDSIZE=1024k
ssh zlfs2-mds1 zpool destroy MGT0000
ssh zlfs2-mds1 zpool create MGT0000 mirror -f /dev/sdc /dev/sdd
ssh zlfs2-mds1 mkfs.lustre --reformat --mgs --backfstype=zfs MGT0000/MGT0000
ssh zlfs2-mds1 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=0 /dev/nvme0n1
ssh zlfs2-mds1 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=1 /dev/nvme1n1
ssh zlfs2-mds1 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=2 /dev/nvme2n1
ssh zlfs2-mds1 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=3 /dev/nvme3n1
ssh zlfs2-mds2 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=4 /dev/nvme0n1
ssh zlfs2-mds2 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=5 /dev/nvme1n1
ssh zlfs2-mds2 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=6 /dev/nvme2n1
ssh zlfs2-mds2 mkfs.lustre --reformat --ost --fsname=${FSNAME} --mgsnode=${MGSNID} --index=7 /dev/nvme3n1
#
ssh zlfs2-oss1 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=0 /dev/nvme0n1
ssh zlfs2-oss2 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=1 /dev/nvme0n1
ssh zlfs2-oss3 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=2 /dev/nvme0n1
ssh zlfs2-oss4 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=3 /dev/nvme0n1
ssh zlfs2-oss5 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=4 /dev/nvme0n1
ssh zlfs2-oss6 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=5 /dev/nvme0n1
ssh zlfs2-oss7 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=6 /dev/nvme0n1
ssh zlfs2-oss8 mkfs.lustre --reformat --mdt --fsname=${FSNAME} --mgsnode=${MGSNID} --index=7 /dev/nvme0n1
Output
[root@lfsmaster FORMAT_SCRIPTS]# ./zlfs2-dne2-ldiskfs_1MDT.sh
Permanent disk data:
Target: MGS
Index: unassigned
Lustre FS:
Mount type: zfs
Flags: 0x64
(MGS first_time update )
Persistent mount opts:
Parameters:
mkfs_cmd = zfs create -o canmount=off -o xattr=sa MGT0000/MGT0000
Writing MGT0000/MGT0000 properties
lustre:version=1
lustre:flags=100
lustre:index=65535
lustre:svname=MGS
Permanent disk data:
Target: zlfs2:OST0000
Index: 0
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:OST0000
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0000 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme0n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0001
Index: 1
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme1n1
target name zlfs2:OST0001
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0001 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme1n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0002
Index: 2
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme2n1
target name zlfs2:OST0002
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0002 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme2n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0003
Index: 3
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme3n1
target name zlfs2:OST0003
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0003 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme3n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0004
Index: 4
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:OST0004
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0004 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme0n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0005
Index: 5
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme1n1
target name zlfs2:OST0005
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0005 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme1n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0006
Index: 6
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme2n1
target name zlfs2:OST0006
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0006 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme2n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:OST0007
Index: 7
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x62
(OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 763097MB
formatting backing filesystem ldiskfs on /dev/nvme3n1
target name zlfs2:OST0007
4k blocks 195353046
options -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:OST0007 -J size=400 -I 256 -i 69905 -q -O extents,uninit_bg,dir_nlink,quota,huge_file,flex_bg -G 256 -E resize="4290772992",lazy_journal_init -F /dev/nvme3n1 195353046
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0000
Index: 0
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0000
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0000 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0001
Index: 1
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0001
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0001 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0002
Index: 2
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0002
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0002 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0003
Index: 3
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0003
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0003 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0004
Index: 4
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0004
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0004 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0005
Index: 5
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0005
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0005 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0006
Index: 6
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0006
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0006 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
Permanent disk data:
Target: zlfs2:MDT0007
Index: 7
Lustre FS: zlfs2
Mount type: ldiskfs
Flags: 0x61
(MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.5.21@o2ib
device size = 1907729MB
formatting backing filesystem ldiskfs on /dev/nvme0n1
target name zlfs2:MDT0007
4k blocks 488378646
options -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L zlfs2:MDT0007 -J size=4096 -I 512 -i 2048 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/nvme0n1 488378646
Writing CONFIGS/mountdata
|
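The MDT section of the format script above repeats one mkfs.lustre line per target, differing only in host and index. A minimal sketch of generating those lines in a loop could look like the following. The function name print_mdt_format_cmds and its parameters are hypothetical, and the host naming (prefix plus index + 1) and the fixed /dev/nvme0n1 device are assumptions taken from this particular script. It only prints the commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print, but do not execute,
# one "mkfs.lustre --mdt" command per MDT index, mirroring the script above.
# Assumption: host N+1 carries MDT index N, and every MDT is on /dev/nvme0n1.
print_mdt_format_cmds() {
    hostprefix="$1"   # e.g. zlfs2-oss
    fsname="$2"       # e.g. zlfs2
    mgsnid="$3"       # e.g. 192.168.5.21@o2ib0
    count="$4"        # number of MDTs to format
    idx=0
    while [ "$idx" -lt "$count" ]; do
        host="${hostprefix}$((idx + 1))"
        echo "ssh ${host} mkfs.lustre --reformat --mdt --fsname=${fsname} --mgsnode=${mgsnid} --index=${idx} /dev/nvme0n1"
        idx=$((idx + 1))
    done
}
```

For example, print_mdt_format_cmds zlfs2-oss zlfs2 192.168.5.21@o2ib0 8 prints the eight MDT lines of the script. The output can be reviewed before being piped to a shell.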
| Comment by Adam Roe (Inactive) [ 16/Aug/16 ] |
|
/var/log/messages
[root@zlfs2-oss1 ~]# tail -f /var/log/messages
Aug 16 21:51:43 zlfs2-oss1 kernel: Lustre: Failing over zlfs2-MDT0000
Aug 16 21:51:49 zlfs2-oss1 kernel: Lustre: zlfs2-MDT0000: Not available for connect from 192.168.5.21@o2ib (stopping)
Aug 16 21:51:49 zlfs2-oss1 kernel: LustreError: 3732:0:(import.c:338:ptlrpc_invalidate_import()) zlfs2-OST0004_UUID: rc = -110 waiting for callback (1 != 0)
Aug 16 21:51:49 zlfs2-oss1 kernel: LustreError: 3732:0:(import.c:364:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff881fe3818300 x1542853838766384/t0(0) o8->zlfs2-OST0004-osc-MDT0000@192.168.5.22@o2ib:28/4 lens 520/544 e 0 to 0 dl 1471380708 ref 2 fl UnregRPC:EN/0/ffffffff rc -5/-1
Aug 16 21:51:49 zlfs2-oss1 kernel: LustreError: 3732:0:(import.c:378:ptlrpc_invalidate_import()) zlfs2-OST0004_UUID: Unregistering RPCs found (1). Network is sluggish? Waiting them to error out.
Aug 16 21:52:14 zlfs2-oss1 kernel: Lustre: zlfs2-MDT0000: Not available for connect from 192.168.5.21@o2ib (stopping)
Aug 16 21:52:14 zlfs2-oss1 kernel: Lustre: Skipped 3 previous similar messages
Aug 16 21:52:16 zlfs2-oss1 systemd-logind: New session 6 of user root.
Aug 16 21:52:16 zlfs2-oss1 systemd: Started Session 6 of user root.
Aug 16 21:52:16 zlfs2-oss1 systemd: Starting Session 6 of user root.
Aug 16 21:52:33 zlfs2-oss1 kernel: LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Aug 16 21:52:33 zlfs2-oss1 kernel: LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
Aug 16 21:52:33 zlfs2-oss1 kernel: Pid: 3842, comm: osp-syn-4-0
Aug 16 21:52:33 zlfs2-oss1 kernel: #012Call Trace:
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0ba27d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0ba2d75>] lbug_with_loc+0x45/0xc0 [libcfs]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0cdd9d8>] lu_device_fini+0xb8/0xc0 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0cc2b42>] ls_device_put+0x82/0x2a0 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0cc2e3d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0c9fa1c>] llog_osd_cleanup+0x3c/0x50 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0c9cc23>] __llog_ctxt_put+0x93/0x140 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0c9d113>] llog_cleanup+0xc3/0x490 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0c9381d>] ? llog_handle_put+0x2d/0x70 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0c938b9>] ? llog_close+0x59/0x1a0 [obdclass]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa166830b>] osp_sync_thread+0x46b/0x990 [osp]
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa1667ea0>] ? osp_sync_thread+0x0/0x990 [osp]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff816469d8>] ret_from_fork+0x58/0x90
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
Aug 16 21:52:33 zlfs2-oss1 kernel:
Aug 16 21:52:33 zlfs2-oss1 kernel: Kernel panic - not syncing: LBUG
Message from syslogd@zlfs2-oss1 at Aug 16 21:52:33 ...
kernel:Kernel panic - not syncing: LBUG
Aug 16 21:52:33 zlfs2-oss1 kernel: CPU: 16 PID: 3842 Comm: osp-syn-4-0 Tainted: P OE ------------ 3.10.0-327.22.2.el7_lustre.x86_64 #1
Aug 16 21:52:33 zlfs2-oss1 kernel: Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
Aug 16 21:52:33 zlfs2-oss1 kernel: ffffffffa0bbfdef 000000008386d797 ffff880ff7ba7bf8 ffffffff81636324
Aug 16 21:52:33 zlfs2-oss1 kernel: ffff880ff7ba7c78 ffffffff8162fb9a ffffffff00000008 ffff880ff7ba7c88
Aug 16 21:52:33 zlfs2-oss1 kernel: ffff880ff7ba7c28 000000008386d797 ffffffffa0d0d1d5 0000000000000000
Aug 16 21:52:33 zlfs2-oss1 kernel: Call Trace:
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff81636324>] dump_stack+0x19/0x1b
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffff8162fb9a>] panic+0xd8/0x1e7
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0ba2ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
Aug 16 21:52:33 zlfs2-oss1 kernel: [<ffffffffa0cdd9d8>] lu_device_fini+0xb8/0xc0 [obdclass] |
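When a log like the one above is long, it can help to reduce it to the distinct assertion and LBUG messages. The following is a minimal sketch using only grep, sed, sort and uniq. The function name lbug_summary is hypothetical, and the pattern assumes the "LustreError: pid:0:(file:line:func()) message" layout seen in this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre tool): list each distinct
# ASSERTION/LBUG message in a syslog-style file with an occurrence count.
# Assumes lines shaped like:
#   ... LustreError: 3842:0:(lu_object.c:1243:lu_device_fini()) LBUG
lbug_summary() {
    logfile="$1"
    # Keep only LustreError lines that mention ASSERTION or LBUG,
    # strip everything up to the "(file:line:func())" location,
    # then count identical "location message" strings, most frequent first.
    grep -E 'LustreError: .*(ASSERTION|LBUG)' "$logfile" \
        | sed -E 's/^.*LustreError: [0-9]+:[0-9]+:\(([^)]*\(\))\) /\1 /' \
        | sort | uniq -c | sort -rn
}
```

Run as lbug_summary /var/log/messages. If the file also contains the "Message from syslogd" console broadcasts, those are counted as extra occurrences of the same message.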
| Comment by Peter Jones [ 16/Aug/16 ] |
|
Fan Yong, what do you advise here? Peter |
| Comment by Adam Roe (Inactive) [ 17/Aug/16 ] |
|
I have just tried this with build 3420 and get the output below. I then reprovisioned with build 3419 to validate, and everything works fine, so I suspect one of the patches in b3420 is causing this.
[root@zlfs2-oss1 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/MDT0000
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/nvme0n1
arg[5] = /mnt/MDT0000
source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/MDT0000
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/nvme0n1 at /mnt/MDT0000, flags=0x1000000 options=osd=osd-ldiskfs,user_xattr,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-MDT0000,device=/dev/nvme0n1
mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
Message from syslogd@zlfs2-oss1 at Aug 17 08:06:45 ...
kernel:LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Message from syslogd@zlfs2-oss1 at Aug 17 08:06:45 ...
kernel:LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) LBUG
/var/log/messages + syslogd
Aug 17 08:06:44 zlfs2-oss1 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Aug 17 08:06:45 zlfs2-oss1 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3788:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
Aug 17 08:06:45 zlfs2-oss1 kernel: Lustre: ctl-zlfs2-MDT0000: No data found on store. Initialize space
Aug 17 08:06:45 zlfs2-oss1 kernel: Lustre: zlfs2-MDT0000: new disk, initializing
Aug 17 08:06:45 zlfs2-oss1 kernel: Lustre: ctl-zlfs2-MDT0000: super-sequence allocation rc = 0 [0x0000000200000400-0x0000000240000400]:0:mdt
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3788:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3788:0:(nodemap_storage.c:1313:nodemap_fs_init()) zlfs2-MDD0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3788:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -22
Aug 17 08:06:45 zlfs2-oss1 kernel: Lustre: Failing over zlfs2-MDT0000
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Aug 17 08:06:45 zlfs2-oss1 kernel: LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) LBUG
Aug 17 08:06:45 zlfs2-oss1 kernel: Pid: 3895, comm: osp-syn-3-0
Message from syslogd@zlfs2-oss1 at Aug 17 08:06:45 ...
kernel:LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Message from syslogd@zlfs2-oss1 at Aug 17 08:06:45 ...
kernel:LustreError: 3895:0:(lu_object.c:1243:lu_device_fini()) LBUG
Aug 17 08:06:45 zlfs2-oss1 kernel: #012Call Trace:
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0c017d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0c01d75>] lbug_with_loc+0x45/0xc0 [libcfs]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0d239d8>] lu_device_fini+0xb8/0xc0 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0d08b42>] ls_device_put+0x82/0x2a0 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0d08e3d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0ce5a1c>] llog_osd_cleanup+0x3c/0x50 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0ce2c23>] __llog_ctxt_put+0x93/0x140 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0ce3113>] llog_cleanup+0xc3/0x490 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0cd981d>] ? llog_handle_put+0x2d/0x70 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa0cd98b9>] ? llog_close+0x59/0x1a0 [obdclass]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa166430b>] osp_sync_thread+0x46b/0x990 [osp]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffffa1663ea0>] ? osp_sync_thread+0x0/0x990 [osp]
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffff816469d8>] ret_from_fork+0x58/0x90
Aug 17 08:06:45 zlfs2-oss1 kernel: [<ffffffff810a5a20>] ? kthread+0x0/0xe0
Aug 17 08:06:45 zlfs2-oss1 kernel: |
| Comment by Gerrit Updater [ 17/Aug/16 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/21961 |
| Comment by Adam Roe (Inactive) [ 17/Aug/16 ] |
|
Okay, I tested the build; here is an update. It didn't crash the system, but it fails to mount the target (see below). Some targets mount and some won't. There is strange behavior going on; I will report back soon.
Mount
[root@lfsmaster FORMAT_SCRIPTS]# ssh zlfs2-mds2
Last login: Wed Aug 17 08:46:40 2016 from 10.10.100.99
[root@zlfs2-mds2 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/OST0004
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/nvme0n1
arg[5] = /mnt/OST0004
source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/OST0004
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/nvme0n1 at /mnt/OST0004, flags=0x1000000 options=osd=osd-ldiskfs,,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-OST0004,device=/dev/nvme0n1
mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument retries left: 0
mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
Logs
Aug 17 18:10:22 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
Aug 17 18:10:22 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Aug 17 18:10:23 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
Aug 17 18:10:23 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: zlfs2-OST0004: new disk, initializing
Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: srv-zlfs2-OST0004: No data found on store. Initialize space
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(nodemap_storage.c:1315:nodemap_fs_init()) zlfs2-OST0004: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed5c0[0x0, 1, [0x1:0x0:0x0] hash exist]{
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed610
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59100osd-ldiskfs-object@ffff882022c59100(i:ffff881ff8e91e88:78/106428767)[plain]
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed5c0
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed800[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed850
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59e00osd-ldiskfs-object@ffff882022c59e00(i:ffff881ff8e88d88:77/106428733)[plain]
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed800
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed2c0[0x0, 1, [0xa:0x0:0x0] hash exist]{
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed310
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c58600osd-ldiskfs-object@ffff882022c58600(i:ffff881ff8e9af88:79/106428801)[plain]
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed2c0
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff882007239140[0x0, 1, [0xa:0x2:0x0] hash exist]{
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff882007239190
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200f958800osd-ldiskfs-object@ffff88200f958800(i:ffff881ff8ea80c8:80/106428802)[plain]
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff882007239140
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffb9ed440[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffb9ed490
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882022c59600osd-ldiskfs-object@ffff882022c59600(i:ffff881ff8e71e88:5898241/3450894875)[plain]
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffb9ed440
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(obd_config.c:578:class_setup()) setup zlfs2-OST0004 failed (-22)
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14203:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.5.21@o2ib: cfg command failed: rc = -22
Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: cmd=cf003 0:zlfs2-OST0004 1:dev 2:0 3:f
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 15b-f: MGC192.168.5.21@o2ib: The configuration from log 'zlfs2-OST0004'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server zlfs2-OST0004: -22
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -22
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_config.c:625:class_cleanup()) Device 3 not setup
Aug 17 18:10:23 zlfs2-mds2 kernel: Lustre: server umount zlfs2-OST0004 complete
Aug 17 18:10:23 zlfs2-mds2 kernel: LustreError: 14037:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount /dev/nvme0n1 (-22)
|
| Comment by Adam Roe (Inactive) [ 17/Aug/16 ] |
|
Okay, so the strange behavior: the first target I try to mount that isn't on the same server as the MGT fails and gets stuck in this state. It is not mounted but appears to hold a lock somewhere; it's as if the service starts without a target.
mount.lustre: mount /dev/nvme1n1 at /mnt/OST0005 failed: Operation already in progress
The target service is already running. (/dev/nvme1n1)
All other targets after the first error out, but do not get stuck:
mount.lustre: mount /dev/nvme0n1 at /mnt/MDT0004 failed: Invalid argument
This may have multiple causes.
Are the mount options correct?
Check the syslog for more info.
If I then run the mount command a second time, the target mounts. But I have not found a way to recover the first, locked target; I have to reboot and remount. |
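The "second attempt succeeds" behavior described above can be reproduced with a small retry wrapper. This is only a sketch to illustrate the observation, not a fix for the underlying bug, and it would not help the first target that stays stuck. The function name retry_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command up to N times, stopping at the
# first success. Illustrates the "mount works on the second try"
# observation; it does not address the root cause.
retry_cmd() {
    attempts="$1"   # maximum number of tries
    shift           # the remaining arguments are the command to run
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}
```

For example, retry_cmd 2 mount -t lustre /dev/nvme0n1 /mnt/MDT0004 would issue the mount at most twice and return non-zero if both attempts fail.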
| Comment by Adam Roe (Inactive) [ 17/Aug/16 ] |
|
Some more verbose information on the failed first mount:
[root@zlfs2-mds2 ~]# mount -vvv -t lustre /dev/nvme0n1 /mnt/OST0004
arg[0] = /sbin/mount.lustre
arg[1] = -v
arg[2] = -o
arg[3] = rw
arg[4] = /dev/nvme0n1
arg[5] = /mnt/OST0004
source = /dev/nvme0n1 (/dev/nvme0n1), target = /mnt/OST0004
options = rw
checking for existing Lustre data: found
Reading CONFIGS/mountdata
Writing CONFIGS/mountdata
mounting device /dev/nvme0n1 at /mnt/OST0004, flags=0x1000000 options=osd=osd-ldiskfs,,errors=remount-ro,mgsnode=192.168.5.21@o2ib,virgin,update,param=mgsnode=192.168.5.21@o2ib,svname=zlfs2-OST0004,device=/dev/nvme0n1
mount.lustre: cannot parse scheduler options for '/sys/block/nvme0n1/queue/scheduler'
mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument retries left: 0
mount.lustre: mount /dev/nvme0n1 at /mnt/OST0004 failed: Invalid argument
This may have multiple causes. Are the mount options correct? Check the syslog for more info.
Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): file extents enabled, maximum tree depth=5
Aug 17 22:10:37 zlfs2-mds2 kernel: LDISKFS-fs (nvme0n1): mounted filesystem with ordered data mode. Opts: ,errors=remount-ro,no_mbcache
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(mgc_request.c:257:do_config_log_add()) MGC192.168.5.21@o2ib: failed processing log, type 4: rc = -22
Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: zlfs2-OST0004: new disk, initializing
Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: srv-zlfs2-OST0004: No data found on store. Initialize space
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(nodemap_storage.c:1315:nodemap_fs_init()) zlfs2-OST0004: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa035080[0x0, 1, [0x1:0x0:0x0] hash exist]{
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa0350d0
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff882006418700osd-ldiskfs-object@ffff882006418700(i:ffff881ff9da1e88:78/1354905553)[plain]
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa035080
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa034fc0[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa035010
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641b700osd-ldiskfs-object@ffff88200641b700(i:ffff881ff9d98d88:77/1354905519)[plain]
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa034fc0
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa034c00[0x0, 1, [0xa:0x0:0x0] hash exist]{
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa034c50
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641b100osd-ldiskfs-object@ffff88200641b100(i:ffff881ff9daaf88:79/1354905587)[plain]
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa034c00
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff88202684cd80[0x0, 1, [0xa:0x2:0x0] hash exist]{
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff88202684cdd0
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff881ffa237e00osd-ldiskfs-object@ffff881ffa237e00(i:ffff882024048948:80/1354905588)[plain]
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff88202684cd80
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff881ffa035380[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff881ffa0353d0
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff88200641be00osd-ldiskfs-object@ffff88200641be00(i:ffff882024309a48:7864321/2679038361)[plain]
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff881ffa035380
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(obd_config.c:578:class_setup()) setup zlfs2-OST0004 failed (-22)
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4314:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.5.21@o2ib: cfg command failed: rc = -22
Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: cmd=cf003 0:zlfs2-OST0004 1:dev 2:0 3:f
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 15b-f: MGC192.168.5.21@o2ib: The configuration from log 'zlfs2-OST0004'failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server zlfs2-OST0004: -22
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -22
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_config.c:625:class_cleanup()) Device 3 not setup
Aug 17 22:10:38 zlfs2-mds2 kernel: Lustre: server umount zlfs2-OST0004 complete
Aug 17 22:10:38 zlfs2-mds2 kernel: LustreError: 4148:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount /dev/nvme0n1 (-22)
I then try to mount again and get the same as above, plus this extra message:
The target service is already running. (/dev/nvme0n1)
After I reboot the server, mounting that target still fails. However, if I mount a different target on the same server beforehand, say nvme1n1, I am then able to mount nvme0n1 without issue. |
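The "rc = -22" in the kernel messages and the "Invalid argument" text printed by mount.lustre refer to the same error, since Linux kernel code reports failures as negated errno values. A minimal Python sketch for decoding such codes when reading these logs; the helper name `decode_rc` is hypothetical and not part of any Lustre tool:

```python
import errno
import os


def decode_rc(rc):
    """Map a kernel-style negative return code to (symbol, message).

    Hypothetical helper for reading logs; not a Lustre utility.
    """
    code = abs(rc)
    # errno.errorcode maps numeric codes to symbolic names such as "EINVAL".
    symbol = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return symbol, os.strerror(code)


# -22 is the code seen in class_setup() and lustre_fill_super() above.
print(decode_rc(-22))
```

The symbolic name is the stable part; the message text returned by `os.strerror` depends on the platform's C library.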
| Comment by nasf (Inactive) [ 18/Aug/16 ] |
|
The failure to mount the OST is a separate issue from the original "ASSERTION( atomic_read(&d->ld_ref) == 0 )".
Did you mount the MGS before mounting the MDT or OST? If not, please mount the MGS (that is, the MGT on the MGS node) first. Otherwise, please enable -1 level Lustre kernel debug on both the MGS and the OSS/MDS, then try again and attach the Lustre debug logs. Thanks! |
| Comment by Gerrit Updater [ 18/Aug/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/22004 |
| Comment by Kit Westneat [ 18/Aug/16 ] |
|
This patch is still a work in progress, but addresses both these issues. |
| Comment by Kit Westneat [ 18/Aug/16 ] |
|
BTW the cause of the second bug is that if a new OST mounts before the MGC has pulled the nodemap config from the MGS, it creates a new blank config on disk. Part of that code was erroneously assuming that it was in the MGS, as normally all new records are created there and then sent to the OSTs, so it was returning an error. That's why the first OST failed to mount. When the other OSTs were mounted, the MGC was already connected to the MGS, so it was able to pull the config and save it properly. That's why the other OSTs were able to mount after rebooting, but nvme0n1 wasn't able to until the others were mounted. |
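The ordering dependency described above can be sketched as a toy model. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, it is not Lustre code, and it models only the failure, not the change in patch 22004.

```python
EINVAL = 22  # matches the rc = -22 seen in the logs


def save_nodemap_config(on_mgs, mgc_has_config):
    """Toy model of the nodemap config step during target setup.

    Hypothetical names throughout. Returns 0 on success or -EINVAL.
    """
    if mgc_has_config:
        # The MGC already pulled the config from the MGS: save it as-is.
        return 0
    # No config pulled yet, so a new blank config is created on disk.
    if not on_mgs:
        # Buggy assumption: new records are only ever created on the MGS.
        return -EINVAL
    return 0


# First OST mounted on a non-MGS server, before the MGC has the config.
print(save_nodemap_config(on_mgs=False, mgc_has_config=False))
# A later target on the same server, after the MGC has pulled the config.
print(save_nodemap_config(on_mgs=False, mgc_has_config=True))
```

In this model only the first call fails, which mirrors the report that nvme0n1 mounts once another target on the same server has been mounted first.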
| Comment by Peter Jones [ 01/Sep/16 ] |
|
Kit, will you be refreshing the patch in light of Andreas's review feedback? Peter |
| Comment by Kit Westneat [ 01/Sep/16 ] |
|
Hey Peter, I wasn't planning on it since he +1'd it, unless there were other issues found, but I can if that's desired.
|
| Comment by Peter Jones [ 01/Sep/16 ] |
|
Kit, I think that at the moment a second reviewer is holding off in anticipation of another version, given that there are quite a number of comments, so I think it would be good to refresh it. Peter |
| Comment by Kit Westneat [ 01/Sep/16 ] |
|
Hey Peter, Are we talking about change 22004? I only see two style comments from Andreas. There are a few over 80 chars autocomments as well, but I thought we were ignoring those now to match the Linux style guide. I'll refresh it, but I want to make sure I'm not missing something. Thanks, |
| Comment by Peter Jones [ 01/Sep/16 ] |
|
Hi Kit I checked with Oleg and you are right - sorry about that - so I have requested a second reviewer so that we can get this landed Peter |
| Comment by Kit Westneat [ 01/Sep/16 ] |
|
Hey Peter, No problem. I made the changes, would it be better to upload them and face the tests again, or leave it as is? Thanks, |
| Comment by Peter Jones [ 01/Sep/16 ] |
|
Let's see how the second review goes to see whether the refresh is needed |
| Comment by Gerrit Updater [ 15/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22004/ |
| Comment by Peter Jones [ 16/Sep/16 ] |
|
Landed for 2.9 |