Lustre / LU-4456

Test failure on sanity-scrub test_1c: hang on umount of mdt


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.6.0
    • Fix Version/s: Lustre 2.6.0
    • 3
    • 12217

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/c2950242-75cc-11e3-b3c0-52540035b04c.

      The sub-test test_1c failed with the following error:

      test failed to respond and timed out

      Info required for matching: sanity-scrub 1c

      This test seems to be failing regularly, always on umount of the MDT, though there appear to be multiple culprits (all excerpts below are from the MDT console log):

      review (ldiskfs):
      http://maloo.whamcloud.com/test_sets/c2950242-75cc-11e3-b3c0-52540035b04c

      19:21:12:LustreError: 4484:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880053639400 x1456340331892024/t0(0) o13->lustre-OST0003-osc-MDT0000@10.10.16.81@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      19:21:12:LustreError: 4484:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 6 previous similar messages
      19:21:12:BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      19:21:12:IP: [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40
      19:21:12:PGD 0 
      19:21:12:Oops: 0002 [#1] SMP 
      19:21:12:last sysfs file: /sys/devices/system/cpu/possible
      19:21:12:CPU 0 
      19:21:12:Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) ldiskfs(U) sha512_generic sha256_generic jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
      19:21:12:
      19:21:12:Pid: 4485, comm: ptlrpcd_1 Not tainted 2.6.32-358.23.2.el6_lustre.gc9be53c.x86_64 #1 Red Hat KVM
      19:21:12:RIP: 0010:[<ffffffff81510dcf>]  [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40
      19:21:12:RSP: 0018:ffff880055fc7c60  EFLAGS: 00010082
      19:21:12:RAX: 0000000000010000 RBX: 0000000000000030 RCX: 0000000000000000
      19:21:12:RDX: 0000000000000282 RSI: 0000000000000003 RDI: 0000000000000030
      19:21:12:RBP: ffff880055fc7c60 R08: 00000000ffffff0a R09: 00000000fffffffb
      19:21:12:R10: 0000000000000000 R11: 000000000000000f R12: ffff880053639400
      19:21:12:R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
      19:21:12:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      19:21:12:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      19:21:12:CR2: 0000000000000030 CR3: 0000000037aba000 CR4: 00000000000006f0
      19:21:12:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      19:21:12:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      19:21:12:Process ptlrpcd_1 (pid: 4485, threadinfo ffff880055fc6000, task ffff880037b60080)
      19:21:12:Stack:
      19:21:12: ffff880055fc7ca0 ffffffff81055ab2 0000000000000000 ffff880037a5c000
      19:21:12:<d> ffff880053639400 00000000fffffffb ffff88007c897800 00000000ebc0de03
      19:21:12:<d> ffff880055fc7cd0 ffffffffa14afa66 ffff880053639400 ffff880079b29e40
      19:21:12:Call Trace:
      19:21:12: [<ffffffff81055ab2>] __wake_up+0x32/0x70
      19:21:12: [<ffffffffa14afa66>] osp_statfs_interpret+0x86/0x3b0 [osp]
      19:21:12: [<ffffffffa0b0bef4>] ptlrpc_check_set+0x2c4/0x1b40 [ptlrpc]
      19:21:12: [<ffffffffa0b385db>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
      19:21:12: [<ffffffffa0b38b9d>] ptlrpcd+0x2ad/0x3f0 [ptlrpc]
      19:21:12: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
      19:21:12: [<ffffffffa0b388f0>] ? ptlrpcd+0x0/0x3f0 [ptlrpc]
      19:21:12: [<ffffffff81096a36>] kthread+0x96/0xa0
      19:21:12: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      19:21:12: [<ffffffff810969a0>] ? kthread+0x0/0xa0
      19:21:12: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      19:21:12:Code: c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 c8 c1 e8 10 39 c1 74 0e f3 90 0f 1f 44 00 00 
      19:21:12:RIP  [<ffffffff81510dcf>] _spin_lock_irqsave+0x1f/0x40
      19:21:12: RSP <ffff880055fc7c60>
      19:21:12:CR2: 0000000000000030
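
      A note on the fault address in this first oops: `__wake_up()` begins by taking the wait queue's spinlock, and the faulting address (0x30) equals the register value passed to `_spin_lock_irqsave`. That pattern is consistent with `osp_statfs_interpret()` waking a wait queue embedded at byte offset 0x30 inside a structure whose base pointer was NULL (or already torn down) at the time of the callback. A minimal userspace sketch of why the fault address equals the member offset; the struct names here are illustrative, not the real osp types:

      ```c
      #include <stddef.h>
      #include <stdio.h>

      /* Illustrative layout only -- not the real Lustre osp structures. */
      struct waitq {
          unsigned long lock;          /* __wake_up() takes this lock first */
      };

      struct pre_state {
          char pad[0x30];              /* fields preceding the wait queue */
          struct waitq user_waitq;     /* lands at byte offset 0x30 */
      };

      int main(void)
      {
          /* With a NULL base pointer, &base->user_waitq is just the member
           * offset, so spin_lock_irqsave() faults at that address -- which
           * is exactly what "NULL pointer dereference at ...30" reports. */
          printf("%#zx\n", (size_t)offsetof(struct pre_state, user_waitq));
          return 0;
      }
      ```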
      

      review-dne (ldiskfs):
      https://maloo.whamcloud.com/test_sets/50a115f0-6c16-11e3-8fd6-52540035b04c

      10:41:26:LustreError: 2822:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff880063df4c00 x1455238614618676/t0(0) o13->lustre-OST0004-osc-MDT0000@10.10.4.237@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      10:41:26:LustreError: 2822:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 8 previous similar messages
      10:41:26:Lustre: lustre-MDT0000: Not available for connect from 10.10.4.237@tcp (stopping)
      10:41:26:Lustre: Skipped 6 previous similar messages
      10:41:26:LustreError: 10839:0:(lfsck_lib.c:202:lfsck_tgt_descs_fini()) ASSERTION( list_empty(&ltd->ltd_layout_list) ) failed: 
      10:41:26:LustreError: 10839:0:(lfsck_lib.c:202:lfsck_tgt_descs_fini()) LBUG
      10:41:26:Pid: 10839, comm: umount
      10:41:26:
      10:41:26:Call Trace:
      10:41:26: [<ffffffffa0479895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      10:41:26: [<ffffffffa0479e97>] lbug_with_loc+0x47/0xb0 [libcfs]
      10:41:26: [<ffffffffa0e41bd6>] lfsck_tgt_descs_fini+0x606/0x670 [lfsck]
      10:41:26: [<ffffffffa0e41d3d>] lfsck_instance_cleanup+0xfd/0x360 [lfsck]
      10:41:26: [<ffffffffa0e430f0>] lfsck_degister+0xa0/0xb0 [lfsck]
      10:41:27: [<ffffffffa0fc1a8e>] mdd_process_config+0x22e/0x5e0 [mdd]
      10:41:27: [<ffffffffa0e8d4b3>] ? lustre_cfg_new+0x2d3/0x6e0 [mdt]
      10:41:27: [<ffffffffa0e8da36>] mdt_stack_fini+0x176/0xb90 [mdt]
      10:41:27: [<ffffffffa0fc05a5>] ? mdd_init_capa_ctxt+0xf5/0x130 [mdd]
      10:41:27: [<ffffffffa0e8eab6>] mdt_device_fini+0x666/0xda0 [mdt]
      10:41:27: [<ffffffffa05ae706>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
      10:41:27: [<ffffffffa05d2813>] class_cleanup+0x573/0xd30 [obdclass]
      10:41:27: [<ffffffffa05ac056>] ? class_name2dev+0x56/0xe0 [obdclass]
      10:41:27: [<ffffffffa05d453a>] class_process_config+0x156a/0x1ad0 [obdclass]
      10:41:27: [<ffffffffa05cc813>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
      10:41:27: [<ffffffffa05d4c19>] class_manual_cleanup+0x179/0x6f0 [obdclass]
      10:41:27: [<ffffffffa05ac056>] ? class_name2dev+0x56/0xe0 [obdclass]
      10:41:27: [<ffffffffa060ce19>] server_put_super+0x8d9/0xe30 [obdclass]
      10:41:27: [<ffffffff8118366b>] generic_shutdown_super+0x5b/0xe0
      10:41:27: [<ffffffff81183756>] kill_anon_super+0x16/0x60
      10:41:27: [<ffffffffa05d6ad6>] lustre_kill_super+0x36/0x60 [obdclass]
      10:41:27: [<ffffffff81183ef7>] deactivate_super+0x57/0x80
      10:41:27: [<ffffffff811a21ef>] mntput_no_expire+0xbf/0x110
      10:41:27: [<ffffffff811a2c5b>] sys_umount+0x7b/0x3a0
      10:41:27: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      10:41:27:
      10:41:27:Kernel panic - not syncing: LBUG
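
      The second failure is a cleanup-ordering assertion rather than a memory fault: `lfsck_tgt_descs_fini()` LASSERTs that each target's `ltd_layout_list` is empty before tearing the descriptor table down, and on this umount some target was still linked. A minimal sketch of how that check trips, using a toy intrusive list in place of the kernel's `list_head` (the names are illustrative):

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Toy circular intrusive list mirroring the kernel's list_head. */
      struct list_head { struct list_head *next, *prev; };

      static void list_init(struct list_head *h) { h->next = h->prev = h; }

      static void list_add(struct list_head *n, struct list_head *h)
      {
          n->next = h->next;
          n->prev = h;
          h->next->prev = n;
          h->next = n;
      }

      static bool list_empty(const struct list_head *h) { return h->next == h; }

      int main(void)
      {
          struct list_head ltd_layout_list;   /* per-target list, as in lfsck */
          struct list_head entry;             /* something still queued on it */

          list_init(&ltd_layout_list);
          list_add(&entry, &ltd_layout_list);

          /* fini runs without the entry having been removed, so the
           * LASSERT(list_empty(...)) condition evaluates to false -> LBUG. */
          printf("%d\n", list_empty(&ltd_layout_list));   /* prints 0 */
          return 0;
      }
      ```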
      

      review-dne:
      https://maloo.whamcloud.com/test_sets/8e6d11fa-6a6c-11e3-9248-52540035b04c

      00:29:59:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
      00:29:59:LustreError: 6423:0:(osp_precreate.c:481:osp_precreate_send()) lustre-OST0001-osc-MDT0000: can't precreate: rc = -5
      00:29:59:LustreError: 6423:0:(osp_precreate.c:481:osp_precreate_send()) Skipped 1 previous similar message
      00:29:59:LustreError: 6423:0:(osp_precreate.c:984:osp_precreate_thread()) lustre-OST0001-osc-MDT0000: cannot precreate objects: rc = -5
      00:29:59:LustreError: 6423:0:(osp_precreate.c:984:osp_precreate_thread()) Skipped 1 previous similar message
      00:29:59:LustreError: 6427:0:(osp_precreate.c:734:osp_precreate_cleanup_orphans()) lustre-OST0003-osc-MDT0000: cannot cleanup orphans: rc = -5
      00:29:59:Lustre: lustre-MDT0000: Not available for connect from 10.10.16.178@tcp (stopping)
      00:29:59:Lustre: Skipped 66 previous similar messages
      00:29:59:LustreError: 3144:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88006635c000 x1454998270941364/t0(0) o13->lustre-OST0007-osc-MDT0000@10.10.16.184@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      00:29:59:LustreError: 3144:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 19 previous similar messages
      00:29:59:LustreError: 166-1: MGC10.10.16.179@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
      00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
      00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
      00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
      00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 5. Is it stuck?
      00:29:59:Lustre: MGS is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 5. Is it stuck?
      

      Though this last one may be better filed under LU-4449.

    People

      Assignee: yong.fan nasf (Inactive)
      Reporter: maloo Maloo

              Dates

                Created:
                Updated:
                Resolved: