Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2505

lfsck: BUG: soft lockup - CPU#0 stuck for 67s! [umount:22194]

    XMLWordPrintable

Details

    • 3
    • 5875

    Description

      lfsck test hung as follows:

      <~snip~>
      creating /mnt/lustre/d0.lfsck/testfile.30
      creating files in /mnt/lustre/d0.lfsck/d9
      creating files in /mnt/lustre/d0.lfsck/d10
      creating files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      linking files in /mnt/lustre/d0.lfsck/d11
      CMD: client-26vm8 lctl get_param -n obdfilter.*.uuid
      CMD: client-26vm8 lctl get_param -n obdfilter.lustre-OST0000.mntdev
      Stopping clients: client-26vm1,client-26vm2.lab.whamcloud.com /mnt/lustre (opts:-f)
      

      Console log on client-26vm2 showed that:

      Lustre: 4267:0:(llite_lib.c:2290:ll_dirty_page_discard_warn()) dirty page discard: 10.10.4.154@tcp:/lustre/fid: [0x200002b13:0x1224:0x0]/ may get corrupted (rc -108)
      Lustre: 4267:0:(llite_lib.c:2290:ll_dirty_page_discard_warn()) dirty page discard: 10.10.4.154@tcp:/lustre/fid: [0x200002b13:0x1237:0x0]/ may get corrupted (rc -108)
      Lustre: 22194:0:(cl_lock.c:2026:cl_lock_page_out()) Writing 1 pages error: -108
      BUG: soft lockup - CPU#0 stuck for 67s! [umount:22194]
      Modules linked in: lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) ext2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      CPU 0 
      Modules linked in: lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) ext2 nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
      
      Pid: 22194, comm: umount Not tainted 2.6.32-279.14.1.el6.x86_64 #1 Red Hat KVM
      RIP: 0010:[<ffffffff8150098e>]  [<ffffffff8150098e>] _spin_lock+0x1e/0x30
      RSP: 0018:ffff88007b7ef508  EFLAGS: 00000206
      RAX: 0000000000000001 RBX: ffff88007b7ef508 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff88007ae2c400 RDI: ffff880076b3a734
      RBP: ffffffff8100bc0e R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800760154e0
      R13: ffff8800760154e0 R14: ffff880024017cf0 R15: ffff88007b7ef488
      FS:  00007f7464baa740(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007f74642c81b0 CR3: 0000000078444000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount (pid: 22194, threadinfo ffff88007b7ee000, task ffff88007d059500)
      Stack:
       ffff88007b7ef5a8 ffffffffa08ee29c ffff8800754f3d98 ffff880073c08400
      <d> ffff88007ae2c400 ffff880078d76c38 ffff88007ae2c510 ffff88007ae2c4f0
      <d> ffff880076b3a648 0000000100000001 ffff88007ae2c4f8 ffff88007d0449e8
      Call Trace:
       [<ffffffffa08ee29c>] ? osc_io_submit+0x31c/0x540 [osc]
       [<ffffffffa081c148>] ? cl_io_submit_rw+0x78/0x180 [obdclass]
       [<ffffffffa0958ad5>] ? lov_io_submit+0x355/0xc30 [lov]
       [<ffffffffa0813c65>] ? cl_page_put+0x1e5/0x580 [obdclass]
       [<ffffffffa081680a>] ? pgoff_at_lock+0x1a/0x70 [obdclass]
       [<ffffffffa081c148>] ? cl_io_submit_rw+0x78/0x180 [obdclass]
       [<ffffffffa081f29f>] ? cl_io_submit_sync+0x9f/0x210 [obdclass]
       [<ffffffffa08166c7>] ? cl_lock_page_out+0x207/0x330 [obdclass]
       [<ffffffffa0816860>] ? pageout_cb+0x0/0x100 [obdclass]
       [<ffffffffa08ea99f>] ? osc_lock_flush+0x4f/0x90 [osc]
       [<ffffffffa08eaaaf>] ? osc_lock_cancel+0xcf/0x1b0 [osc]
       [<ffffffffa08152a5>] ? cl_lock_cancel0+0x75/0x160 [obdclass]
       [<ffffffffa0815f8b>] ? cl_lock_cancel+0x13b/0x140 [obdclass]
       [<ffffffffa08ebb8a>] ? osc_ldlm_blocking_ast+0x13a/0x380 [osc]
       [<ffffffffa0420cc0>] ? ldlm_cancel_callback+0x60/0x100 [ptlrpc]
       [<ffffffffa043b14b>] ? ldlm_cli_cancel_local+0x7b/0x380 [ptlrpc]
       [<ffffffffa043efd8>] ? ldlm_cli_cancel+0x58/0x3a0 [ptlrpc]
       [<ffffffffa0428db8>] ? cleanup_resource+0x168/0x300 [ptlrpc]
       [<ffffffffa06baefe>] ? cfs_hash_spin_lock+0xe/0x10 [libcfs]
       [<ffffffffa0428f7a>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc]
       [<ffffffffa06bb18f>] ? cfs_hash_for_each_relax+0x17f/0x380 [libcfs]
       [<ffffffffa0428f50>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa0428f50>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
       [<ffffffffa06bcbaf>] ? cfs_hash_for_each_nolock+0x7f/0x1c0 [libcfs]
       [<ffffffffa0425a49>] ? ldlm_namespace_cleanup+0x29/0xb0 [ptlrpc]
       [<ffffffffa08e2122>] ? osc_import_event+0x3d2/0x1470 [osc]
       [<ffffffffa0487c59>] ? ptlrpc_invalidate_import+0x2a9/0x8d0 [ptlrpc]
       [<ffffffffa08e283c>] ? osc_import_event+0xaec/0x1470 [osc]
       [<ffffffffa045f53d>] ? ptlrpc_set_import_active+0x33d/0x480 [ptlrpc]
       [<ffffffffa08dd070>] ? osc_iocontrol+0xc80/0x11b0 [osc]
       [<ffffffff8127cf26>] ? vsnprintf+0x2b6/0x5f0
       [<ffffffffa092f7ff>] ? lov_iocontrol+0x27f/0x1e90 [lov]
       [<ffffffffa064eadf>] ? lmv_iocontrol+0x25f/0x1cf0 [lmv]
       [<ffffffff81188961>] ? path_put+0x31/0x40
       [<ffffffff81188325>] ? putname+0x35/0x50
       [<ffffffffa09dd640>] ? ll_umount_begin+0x2c0/0xc30 [lustre]
       [<ffffffff8119b52a>] ? sys_umount+0x11a/0x3a0
       [<ffffffff810d6d42>] ? audit_syscall_entry+0x272/0x2a0
       [<ffffffff81503ade>] ? do_page_fault+0x3e/0xa0
       [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
      

      The full console log is attached.

      Maloo report: https://maloo.whamcloud.com/test_sets/679bdff0-4828-11e2-a406-52540035b04c

      Attachments

        1. client-12vm1-console.log
          102 kB
          Jian Yu
        2. client-26vm2-console.log
          733 kB
          Jian Yu

        Issue Links

          Activity

            People

              jay Jinshan Xiong (Inactive)
              yujian Jian Yu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: