Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.9.0
-
Running Lustre 2.9 + coral-betal-combined branch based on RC3:
kmod-lustre-tests-2.9.0_dirty-1.el7.centos.x86_64
lustre-tests-2.9.0_dirty-1.el7.centos.x86_64
lustre-osd-zfs-mount-2.9.0_dirty-1.el7.centos.x86_64
lustre-2.9.0_dirty-1.el7.centos.x86_64
lustre-iokit-2.9.0_dirty-1.el7.centos.x86_64
kmod-lustre-2.9.0_dirty-1.el7.centos.x86_64
kmod-lustre-osd-zfs-2.9.0_dirty-1.el7.centos.x86_64
[root@wolf-3 debug_info.20170330_034804_14535_wolf-3.wolf.hpdd.intel.com]# rpm -qa |grep -i zfs
libzfs2-0.7.0-rc3_28_g4661777.el7.centos.x86_64
kmod-zfs-0.7.0-rc3_28_g4661777.el7.centos.x86_64
zfs-kmod-debuginfo-0.7.0-rc3_28_g4661777.el7.centos.x86_64
lustre-osd-zfs-mount-2.9.0_dirty-1.el7.centos.x86_64
zfs-0.7.0-rc3_28_g4661777.el7.centos.x86_64
zfs-test-0.7.0-rc3_28_g4661777.el7.centos.x86_64
kmod-lustre-osd-zfs-2.9.0_dirty-1.el7.centos.x86_64
zfs-debuginfo-0.7.0-rc3_28_g4661777.el7.centos.x86_64
Pool configuration:
quick_oss1.sh:zpool create -f -o ashift=12 -o cachefile=none -O recordsize=16MB ost0 draid2 cfg=test_2_5_4_18_draidcfg.nvl mpathaa mpathab mpathac mpathad mpathae mpathaf mpathag mpathah mpathai mpathaj mpathak mpathal mpatham mpathan mpathao mpathap mpathaq mpathar
quick_oss1.sh:zpool status -v ost0
quick_oss1.sh:zpool feature@large_blocks=enabled ost0
quick_oss1.sh:zpool get all ost0 |grep large_blocks
quick_oss2.sh:zpool create -f -o ashift=12 -o cachefile=none -O recordsize=16MB ost1 draid2 cfg=test_2_5_4_18_draidcfg.nvl mpatha mpathb mpathc mpathd mpathe mpathf mpathg mpathh mpathi mpathj mpathk mpathl mpathm mpathn mpatho mpathp mpathq mpathr
quick_oss2.sh:zpool status -v ost1
quick_oss2.sh:zpool feature@large_blocks=enabled ost1
quick_oss2.sh:zpool get all ost1 |grep large_blocks
Example from ost1.
pool: ost1
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
ost1 ONLINE 0 0 0
draid2-0 ONLINE 0 0 0
mpatha ONLINE 0 0 0
mpathb ONLINE 0 0 0
mpathc ONLINE 0 0 0
mpathd ONLINE 0 0 0
mpathe ONLINE 0 0 0
mpathf ONLINE 0 0 0
mpathg ONLINE 0 0 0
mpathh ONLINE 0 0 0
mpathi ONLINE 0 0 0
mpathj ONLINE 0 0 0
mpathk ONLINE 0 0 0
mpathl ONLINE 0 0 0
mpathm ONLINE 0 0 0
mpathn ONLINE 0 0 0
mpatho ONLINE 0 0 0
mpathp ONLINE 0 0 0
mpathq ONLINE 0 0 0
mpathr ONLINE 0 0 0
spares
$draid2-0-s0 AVAIL
$draid2-0-s1 AVAIL
$draid2-0-s2 AVAIL
$draid2-0-s3 AVAIL
Description
Running Lustre 2.9 + coral-betal-combined branch based on RC3:
IOR tests:
IOR-3.0.1: MPI Coordinated Test of Parallel I/O
Began: Thu Mar 30 00:07:33 2017
Command line used: /home/johnsali/wolf-3/ior/src/ior -a POSIX -F -N 4 -d 2 -i 1 -s 1024 -b 1m -t 1m
Machine: Linux wolf-6.wolf.hpdd.intel.com
Test 0 started: Thu Mar 30 00:07:33 2017
Summary:
api = POSIX
test filename = testFile
access = file-per-process
ordering in a file = sequential offsets
ordering inter file= no tasks offsets
clients = 4 (1 per node)
repetitions = 1
xfersize = 1 MiB
blocksize = 1 MiB
aggregate filesize = 4 GiB
access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---------- --------- -------- -------- -------- -------- ----
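As a quick sanity check on the summary above, the aggregate file size follows from the IOR command-line flags: `-N 4` tasks, `-s 1024` segments per task, and `-b 1m` bytes per block. The sketch below only redoes that arithmetic; the function name `ior_aggregate_bytes` is made up for illustration and is not part of IOR.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def ior_aggregate_bytes(num_tasks, segment_count, block_size):
    """Total bytes written when each task writes segment_count blocks of block_size bytes."""
    return num_tasks * segment_count * block_size


# Values taken from the command line in this report: -N 4 -s 1024 -b 1m
total = ior_aggregate_bytes(num_tasks=4, segment_count=1024, block_size=1 * MIB)
print(total // GIB, "GiB")  # 4 GiB, matching "aggregate filesize = 4 GiB"
```

With `-F` (file-per-process), this total is spread over four 1 GiB files, one per client.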
While IOR was writing we hit the following error:
[19744.556366] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1c:0x0] from accounting ZAP for usr 0: rc = -5
[19744.580303] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) Skipped 1 previous similar message
[19745.014350] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1c:0x0] from accounting ZAP for grp 0: rc = -5
[19745.037113] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) Skipped 2 previous similar messages
[19768.423554] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1f:0x0] from accounting ZAP for usr 0: rc = -52
[19768.586567] LustreError: 84625:0:(osd_object.c:603:osd_object_destroy()) lsdraid-OST0000: failed to remove [0x100000000:0x1f:0x0] from accounting ZAP for grp 0: rc = -52
[19779.750997] LustreError: 52432:0:(osd_object.c:745:osd_attr_get()) ASSERTION( obj->oo_db ) failed:
[19779.751007] LustreError: 50225:0:(osd_object.c:745:osd_attr_get()) ASSERTION( obj->oo_db ) failed:
[19779.751010] LustreError: 50225:0:(osd_object.c:745:osd_attr_get()) LBUG
[19779.751012] Pid: 50225, comm: ll_ost01_002
[19779.751012] Call Trace:
[19779.751043] [<ffffffffa0a1b7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[19779.751054] [<ffffffffa0a1b841>] lbug_with_loc+0x41/0xb0 [libcfs]
[19779.751072] [<ffffffffa0968210>] osd_attr_set+0x0/0xce0 [osd_zfs]
[19779.751096] [<ffffffffa0f1b405>] ofd_attr_get+0xa5/0x230 [ofd]
[19779.751111] [<ffffffffa0f29bfd>] ofd_lvbo_init+0x42d/0xb02 [ofd]
[19779.751248] [<ffffffffa0cd22d9>] ldlm_handle_enqueue0+0x8f9/0x1680 [ptlrpc]
[19779.751322] [<ffffffffa0cfa0f0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
[19779.751407] [<ffffffffa0d52dc2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[19779.751483] [<ffffffffa0d57225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[19779.751545] [<ffffffffa0d031ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[19779.751563] [<ffffffffa0a28128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[19779.751621] [<ffffffffa0d00d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[19779.751635] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
[19779.751639] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
[19779.751708] [<ffffffffa0d07260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[19779.751765] [<ffffffffa0d067c0>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
[19779.751775] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[19779.751779] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[19779.751789] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
[19779.751794] [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
[19779.751795]
[19779.751797] Kernel panic - not syncing: LBUG
[19779.751801] CPU: 26 PID: 50225 Comm: ll_ost01_002 Tainted: G IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1
[19779.751803] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[19779.751813] ffffffffa0a38d4c 00000000e5fc8e4d ffff880fe9b33a78 ffffffff81636431
[19779.751820] ffff880fe9b33af8 ffffffff8162fcc0 ffffffff00000008 ffff880fe9b33b08
[19779.751827] ffff880fe9b33aa8 00000000e5fc8e4d 00000000e5fc8e4d 0000000000000092
[19779.751828] Call Trace:
[19779.751843] [<ffffffff81636431>] dump_stack+0x19/0x1b
[19779.751847] [<ffffffff8162fcc0>] panic+0xd8/0x1e7
[19779.751859] [<ffffffffa0a1b859>] lbug_with_loc+0x59/0xb0 [libcfs]
[19779.751871] [<ffffffffa0968210>] osd_attr_get+0x2d0/0x2d0 [osd_zfs]
[19779.751885] [<ffffffffa0f1b405>] ofd_attr_get+0xa5/0x230 [ofd]
[19779.751898] [<ffffffffa0f29bfd>] ofd_lvbo_init+0x42d/0xb02 [ofd]
[19779.751952] [<ffffffffa0cd22d9>] ldlm_handle_enqueue0+0x8f9/0x1680 [ptlrpc]
[19779.752010] [<ffffffffa0cfa0f0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
[19779.752084] [<ffffffffa0d52dc2>] tgt_enqueue+0x62/0x210 [ptlrpc]
[19779.752165] [<ffffffffa0d57225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
[19779.752238] [<ffffffffa0d031ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[19779.752255] [<ffffffffa0a28128>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
[19779.752326] [<ffffffffa0d00d68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[19779.752333] [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
[19779.752337] [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
[19779.752409] [<ffffffffa0d07260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
[19779.752482] [<ffffffffa0d067c0>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
[19779.752489] [<ffffffff810a5b8f>] kthread+0xcf/0xe0
[19779.752494] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
[19779.752500] [<ffffffff81646a98>] ret_from_fork+0x58/0x90
[19779.752505] [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
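The `rc` values in the `osd_object_destroy()` messages are negated Linux errno codes. The sketch below pulls them out of such lines and names them. The helper `decode_rc_lines` and the small `LINUX_ERRNO` table are illustrative, not Lustre tooling. 5 is EIO and 52 is EBADE on Linux. As far as I know, ZFS on Linux defines its ECKSUM error as EBADE, so `rc = -52` may point at a checksum failure, but the report itself does not confirm that reading.

```python
import re

# Linux errno values, hard-coded so the sketch does not depend on the host OS.
LINUX_ERRNO = {5: "EIO", 52: "EBADE"}

# Matches the "failed to remove ... from accounting ZAP" messages shown in the log.
RC_PATTERN = re.compile(
    r"osd_object_destroy\(\)\) (\S+): failed to remove (\[[^\]]+\]) "
    r"from accounting ZAP for (usr|grp) \d+: rc = (-?\d+)"
)


def decode_rc_lines(log_text):
    """Return (target, fid, kind, rc, errno_name) for each matching message."""
    rows = []
    for m in RC_PATTERN.finditer(log_text):
        rc = int(m.group(4))
        name = LINUX_ERRNO.get(-rc, "unknown")
        rows.append((m.group(1), m.group(2), m.group(3), rc, name))
    return rows


# Two messages copied from the log in this report.
sample = (
    "[19744.556366] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) "
    "lsdraid-OST0000: failed to remove [0x100000000:0x1c:0x0] "
    "from accounting ZAP for usr 0: rc = -5 "
    "[19768.423554] LustreError: 84625:0:(osd_object.c:597:osd_object_destroy()) "
    "lsdraid-OST0000: failed to remove [0x100000000:0x1f:0x0] "
    "from accounting ZAP for usr 0: rc = -52"
)

for row in decode_rc_lines(sample):
    print(row)
```

The pattern works on the log whether it is kept one message per line or flattened into a single line, since it does not rely on line boundaries.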
osd-zfs/osd_object.c:937
932  * dmu_tx_hold_bonus(tx, oid) called and then assigned
933  * to a transaction group.
934  */
935 static int osd_attr_set(const struct lu_env *env, struct dt_object *dt,
936                         const struct lu_attr *la, struct thandle *handle)
937 {
938         struct osd_thread_info *info = osd_oti_get(env);
939         sa_bulk_attr_t *bulk = osd_oti_get(env)->oti_attr_bulk;
940         struct osd_object *obj = osd_dt_obj(dt);
941         struct osd_device *osd = osd_obj2dev(obj);
ofd/ofd_objects.c:780
775  * \retval 0 if successful
776  * \retval negative value on error
777  */
778 int ofd_attr_get(const struct lu_env *env, struct ofd_object *fo,
779                  struct lu_attr *la)
780 {
781         int rc = 0;
782
783         ENTRY;
784
Dump is at:
/scratch/dumps/wolf-3.wolf.hpdd.intel.com/10.8.1.3-2017-03-30-00:08:02/