Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.6.0
-
3
-
12217
Description
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/c2950242-75cc-11e3-b3c0-52540035b04c.
The sub-test test_1c failed with the following error:
test failed to respond and timed out
Info required for matching: sanity-scrub 1c
This tests seems to be failing regularly and always on umount mdt though there seems to be multiple culprits (always mdt console log):
review (ldiskfs):
http://maloo.whamcloud.com/test_sets/c2950242-75cc-11e3-b3c0-52540035b04c
19:21:12:LustreError: 4484:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880053639400 x1456340331892024/t0(0) o13->lustre-OST0003-osc-MDT0000@10.10.16.81@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 19:21:12:LustreError: 4484:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 6 previous similar messages 19:21:12:BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 19:21:12:IP: [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40 19:21:12:PGD 0 19:21:12:Oops: 0002 [#1] SMP 19:21:12:last sysfs file: /sys/devices/system/cpu/possible 19:21:12:CPU 0 19:21:12:Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest] 19:21:12: 19:21:12:Pid: 4485, comm: ptlrpcd_1 Not tainted 2.6.32-358.23.2.el6_lustre.gc9be53c.x86_64 #1 Red Hat KVM 19:21:12:RIP: 0010:[<ffffffff81510dcf>] [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40 19:21:12:RSP: 0018:ffff880055fc7c60 EFLAGS: 00010082 19:21:12:RAX: 0000000000010000 RBX: 0000000000000030 RCX: 0000000000000000 19:21:12:RDX: 0000000000000282 RSI: 0000000000000003 RDI: 0000000000000030 19:21:12:RBP: ffff880055fc7c60 R08: 00000000ffffff0a R09: 00000000fffffffb 19:21:12:R10: 0000000000000000 R11: 000000000000000f R12: ffff880053639400 19:21:12:R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000 19:21:12:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000 19:21:12:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b 19:21:12:CR2: 0000000000000030 CR3: 0000000037aba000 CR4: 00000000000006f0 19:21:12:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 19:21:12:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 19:21:12:Process ptlrpcd_1 (pid: 4485, threadinfo ffff880055fc6000, task ffff880037b60080) 19:21:12:Stack: 19:21:12: ffff880055fc7ca0 ffffffff81055ab2 0000000000000000 ffff880037a5c000 19:21:12:<d> ffff880053639400 00000000fffffffb ffff88007c897800 00000000ebc0de03 19:21:12:<d> ffff880055fc7cd0 ffffffffa14afa66 ffff880053639400 ffff880079b29e40 19:21:12:Call Trace: 19:21:12: [<ffffffff81055ab2>] __wake_up+0x32/0x70 19:21:12: [<ffffffffa14afa66>] osp_statfs_interpret+0x86/0x3b0 [osp] 19:21:12: [<ffffffffa0b0bef4>] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc] 19:21:12: [<ffffffffa0b385db>] ptlrpcd_check+0x53b/0x560 [ptlrpc] 19:21:12: [<ffffffffa0b38b9d>] ptlrpcd+0x2ad/0x3f0 [ptlrpc] 19:21:12: [<ffffffff81063990>] ? default_wake_function+0x0/0x20 19:21:12: [<ffffffffa0b388f0>] ? ptlrpcd+0x0/0x3f0 [ptlrpc] 19:21:12: [<ffffffff81096a36>] kthread+0x96/0xa0 19:21:12: [<ffffffff8100c0ca>] child_rip+0xa/0x20 19:21:12: [<ffffffff810969a0>] ? kthread+0x0/0xa0 19:21:12: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 19:21:12:Code: c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 c8 c1 e8 10 39 c1 74 0e f3 90 0f 1f 44 00 00 19:21:12:RIP [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40 19:21:12: RSP <ffff880055fc7c60> 19:21:12:CR2: 0000000000000030
review-dne (ldiskfs)
https://maloo.whamcloud.com/test_sets/50a115f0-6c16-11e3-8fd6-52540035b04c
10:41:26:LustreError: 2822:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880063df4c00 x1455238614618676/t0(0) o13->lustre-OST0004-osc-MDT0000@10.10.4.237@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 10:41:26:LustreError: 2822:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 8 previous similar messages 10:41:26:Lustre: lustre-MDT0000: Not available for connect from 10.10.4.237@tcp (stopping) 10:41:26:Lustre: Skipped 6 previous similar messages 10:41:26:LustreError: 10839:0:(lfsck_lib.c:202:lfsck_tgt_descs_fini()) ASSERTION( list_empty(<d->ltd_layout_list) ) failed: 10:41:26:LustreError: 10839:0:(lfsck_lib.c:202:lfsck_tgt_descs_fini()) LBUG 10:41:26:Pid: 10839, comm: umount 10:41:26: 10:41:26:Call Trace: 10:41:26: [<ffffffffa0479895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 10:41:26: [<ffffffffa0479e97>] lbug_with_loc+0x47/0xb0 [libcfs] 10:41:26: [<ffffffffa0e41bd6>] lfsck_tgt_descs_fini+0x606/0x670 [lfsck] 10:41:26: [<ffffffffa0e41d3d>] lfsck_instance_cleanup+0xfd/0x360 [lfsck] 10:41:26: [<ffffffffa0e430f0>] lfsck_degister+0xa0/0xb0 [lfsck] 10:41:27: [<ffffffffa0fc1a8e>] mdd_process_config+0x22e/0x5e0 [mdd] 10:41:27: [<ffffffffa0e8d4b3>] ? lustre_cfg_new+0x2d3/0x6e0 [mdt] 10:41:27: [<ffffffffa0e8da36>] mdt_stack_fini+0x176/0xb90 [mdt] 10:41:27: [<ffffffffa0fc05a5>] ? mdd_init_capa_ctxt+0xf5/0x130 [mdd] 10:41:27: [<ffffffffa0e8eab6>] mdt_device_fini+0x666/0xda0 [mdt] 10:41:27: [<ffffffffa05ae706>] ? class_disconnect_exports+0x116/0x2f0 [obdclass] 10:41:27: [<ffffffffa05d2813>] class_cleanup+0x573/0xd30 [obdclass] 10:41:27: [<ffffffffa05ac056>] ? class_name2dev+0x56/0xe0 [obdclass] 10:41:27: [<ffffffffa05d453a>] class_process_config+0x156a/0x1ad0 [obdclass] 10:41:27: [<ffffffffa05cc813>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass] 10:41:27: [<ffffffffa05d4c19>] class_manual_cleanup+0x179/0x6f0 [obdclass] 10:41:27: [<ffffffffa05ac056>] ? class_name2dev+0x56/0xe0 [obdclass] 10:41:27: [<ffffffffa060ce19>] server_put_super+0x8d9/0xe30 [obdclass] 10:41:27: [<ffffffff8118366b>] generic_shutdown_super+0x5b/0xe0 10:41:27: [<ffffffff81183756>] kill_anon_super+0x16/0x60 10:41:27: [<ffffffffa05d6ad6>] lustre_kill_super+0x36/0x60 [obdclass] 10:41:27: [<ffffffff81183ef7>] deactivate_super+0x57/0x80 10:41:27: [<ffffffff811a21ef>] mntput_no_expire+0xbf/0x110 10:41:27: [<ffffffff811a2c5b>] sys_umount+0x7b/0x3a0 10:41:27: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b 10:41:27: 10:41:27:Kernel panic - not syncing: LBUG
review-dne:
https://maloo.whamcloud.com/test_sets/8e6d11fa-6a6c-11e3-9248-52540035b04c
00:29:59:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1 00:29:59:LustreError: 6423:0:(osp_precreate.c:481:osp_precreate_send()) lustre-OST0001-osc-MDT0000: can't precreate: rc = -5 00:29:59:LustreError: 6423:0:(osp_precreate.c:481:osp_precreate_send()) Skipped 1 previous similar message 00:29:59:LustreError: 6423:0:(osp_precreate.c:984:osp_precreate_thread()) lustre-OST0001-osc-MDT0000: cannot precreate objects: rc = -5 00:29:59:LustreError: 6423:0:(osp_precreate.c:984:osp_precreate_thread()) Skipped 1 previous similar message 00:29:59:LustreError: 6427:0:(osp_precreate.c:734:osp_precreate_cleanup_orphans()) lustre-OST0003-osc-MDT0000: cannot cleanup orphans: rc = -5 00:29:59:Lustre: lustre-MDT0000: Not available for connect from 10.10.16.178@tcp (stopping) 00:29:59:Lustre: Skipped 66 previous similar messages 00:29:59:LustreError: 3144:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88006635c000 x1454998270941364/t0(0) o13->lustre-OST0007-osc-MDT0000@10.10.16.184@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 00:29:59:LustreError: 3144:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 19 previous similar messages 00:29:59:LustreError: 166-1: MGC10.10.16.179@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail 00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck? 00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck? 00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck? 00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck? 00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
Though this last one may be better put under LU-4449
Attachments
Issue Links
- is duplicated by
-
LU-4519 (osp_sync.c:351:osp_sync_interpret()) ASSERTION( d->opd_pre != ((void *)0) )
- Resolved