Lustre / LU-11003

LDLM lock list problem in sanity-dom test_12

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0
    • None
    • None
    • 3

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1a3238a0-5122-11e8-abc3-52540065bddc

      test_12 failed with the following error:

      Test crashed during sanity-dom test_12
      

      The dmesg on the client is full of:

      [19557.893256] ------------[ cut here ]------------
      [19557.893710] WARNING: CPU: 1 PID: 13247 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
      [19557.894327] list_del corruption. prev->next should be ffff880003a47010, but was 5a5a5a5a5a5a5a5a
      [19557.894951] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) brd loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) devlink iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 ppdev virtio_balloon i2c_core parport_pc parport joydev pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi ata_piix libata virtio_blk crct10dif_pclmul crct10dif_common 8139too crc32c_intel serio_raw virtio_pci
      [19557.901071]  virtio_ring virtio 8139cp mii floppy [last unloaded: libcfs]
      [19557.901559] CPU: 1 PID: 13247 Comm: unlinkmany Tainted: G           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
      [19557.902302] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [19557.902712] Call Trace:
      [19557.902914]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      [19557.903299]  [<ffffffff8108ae58>] __warn+0xd8/0x100
      [19557.903656]  [<ffffffff8108aedf>] warn_slowpath_fmt+0x5f/0x80
      [19557.904074]  [<ffffffff81343ee1>] __list_del_entry+0xa1/0xd0
      [19557.904515]  [<ffffffffc09e9143>] ldlm_lock_remove_from_lru_nolock+0x33/0xd0 [ptlrpc]
      [19557.905090]  [<ffffffffc09e98e6>] ldlm_lock_touch_in_lru+0x86/0x200 [ptlrpc]
      [19557.905606]  [<ffffffffc09eabbc>] lock_matches+0x17c/0x180 [ptlrpc]
      [19557.906075]  [<ffffffffc09ebe1c>] ldlm_lock_match+0x12c/0xa30 [ptlrpc]
      [19557.906545]  [<ffffffff81333563>] ? number.isra.2+0x323/0x360
      [19557.906968]  [<ffffffffc0b43312>] mdc_lock_match+0xb2/0x180 [mdc]
      [19557.907407]  [<ffffffffc09a872d>] lmv_lock_match+0x27d/0x4f0 [lmv]
      [19557.907879]  [<ffffffffc0cb3639>] ll_have_md_lock+0x129/0x550 [lustre]
      [19557.908370]  [<ffffffffc0cd7256>] ll_lock_cancel_bits+0x276/0x5b0 [lustre]
      [19557.908863]  [<ffffffffc0cd7769>] ll_md_blocking_ast+0x79/0x2b0 [lustre]
      [19557.909361]  [<ffffffffc09ef1ca>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
      [19557.909867]  [<ffffffffc09f9f90>] ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
      [19557.910386]  [<ffffffffc09fea4a>] ldlm_cli_cancel_list_local+0xea/0x280 [ptlrpc]
      [19557.910941]  [<ffffffffc09fed6b>] ldlm_cancel_resource_local+0x18b/0x280 [ptlrpc]
      [19557.911482]  [<ffffffffc0b3df92>] mdc_resource_get_unused+0x142/0x2a0 [mdc]
      [19557.911986]  [<ffffffffc0b3ef03>] mdc_unlink+0x273/0x450 [mdc]
      [19557.912407]  [<ffffffffc09b4adb>] lmv_unlink+0x5ab/0x960 [lmv]
      [19557.912829]  [<ffffffffc0cd79d7>] ? ll_i2suppgid+0x37/0x40 [lustre]
      [19557.913294]  [<ffffffffc0cd7a04>] ? ll_i2gids+0x24/0xb0 [lustre]
      [19557.913741]  [<ffffffffc0cdcd57>] ll_unlink+0x177/0x5d0 [lustre]
      [19557.914189]  [<ffffffff812124ec>] vfs_unlink+0x10c/0x190
      [19557.914569]  [<ffffffff812171b5>] do_unlinkat+0x285/0x2d0
      [19557.914963]  [<ffffffff81218166>] SyS_unlink+0x16/0x20
      [19557.915338]  [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
      [19557.915760] ---[ end trace d066d424bfa3c997 ]---
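
The warning above comes from the consistency check that a debug-enabled list deletion runs before unlinking an entry. The following is a minimal userspace sketch of that kind of check, not the kernel's own code: `checked_list_del`, `demo_poison`, and the two `demo_*_delete` helpers are hypothetical names. It also assumes that the repeated 0x5a byte in the reported value is a fill pattern written over freed memory; the report itself does not state this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Byte pattern seen in the report; assumed here to mark freed memory. */
#define DEMO_FREE_POISON 0x5a

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail_demo(struct list_head *entry, struct list_head *head)
{
	struct list_head *last = head->prev;

	entry->prev = last;
	entry->next = head;
	last->next = entry;
	head->prev = entry;
}

/* Simulate a node being freed and poisoned while it is still linked. */
static void demo_poison(struct list_head *node)
{
	memset(node, DEMO_FREE_POISON, sizeof(*node));
}

/* Unlink entry only if both neighbours still point back at it. */
static int checked_list_del(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return 0;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return 0;
	}
	next->prev = prev;
	prev->next = next;
	return 1;
}

/* Intact list head <-> a <-> b: deleting b passes the check. */
int demo_healthy_delete(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_tail_demo(&a, &head);
	list_add_tail_demo(&b, &head);
	return checked_list_del(&b);
}

/* Same list, but a is poisoned first: deleting b trips the prev->next check. */
int demo_stale_neighbour_delete(void)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_tail_demo(&a, &head);
	list_add_tail_demo(&b, &head);
	demo_poison(&a);
	return checked_list_del(&b);
}
```

In the second helper the entry being deleted is intact; the damage is in its neighbour. Under the assumption above, the same reading of the trace would mean that the lock preceding this one on the LRU list had already been freed while still linked.
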
      

      The client eventually terminates with:

      [19558.738430] LustreError: 13247:0:(ldlm_lock.c:286:ldlm_lock_add_to_lru_nolock()) ASSERTION( list_empty(&lock->l_lru) ) failed: 
      [19558.739284] LustreError: 13247:0:(ldlm_lock.c:286:ldlm_lock_add_to_lru_nolock()) LBUG
      [19558.739855] Pid: 13247, comm: unlinkmany
      [19558.740156] 
      Call Trace:
      [19558.740461]  [<ffffffffc06867ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [19558.740948]  [<ffffffffc068683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [19558.741419]  [<ffffffffc09e96c5>] ldlm_lock_add_to_lru_nolock+0xa5/0x110 [ptlrpc]
      [19558.741982]  [<ffffffffc09e9773>] ldlm_lock_add_to_lru+0x43/0x130 [ptlrpc]
      [19558.742504]  [<ffffffffc09eb398>] ldlm_lock_decref_internal+0x2f8/0xa80 [ptlrpc]
      [19558.743071]  [<ffffffffc08274b3>] ? class_handle2object+0xc3/0x1c0 [obdclass]
      [19558.743608]  [<ffffffffc09ebb4e>] ldlm_lock_decref+0x2e/0x80 [ptlrpc]
      [19558.744099]  [<ffffffffc0c943ba>] ll_intent_drop_lock.part.14+0x4a/0x170 [lustre]
      [19558.744654]  [<ffffffffc0c95a48>] ll_lookup_finish_locks+0x208/0x780 [lustre]
      [19558.745194]  [<ffffffffc0ca5291>] ll_inode_revalidate+0x1c1/0x800 [lustre]
      [19558.745704]  [<ffffffffc0ca859b>] ll_inode_permission+0x1db/0x3c0 [lustre]
      [19558.746216]  [<ffffffff81210eb1>] __inode_permission+0x71/0xd0
      [19558.746645]  [<ffffffff81210f28>] inode_permission+0x18/0x50
      [19558.747070]  [<ffffffff8121305e>] link_path_walk+0x27e/0x8b0
      [19558.747491]  [<ffffffffc0b3ee96>] ? mdc_unlink+0x206/0x450 [mdc]
      [19558.747937]  [<ffffffff812137eb>] path_lookupat+0x6b/0x7c0
      [19558.748362]  [<ffffffffc0a14b9c>] ? __ptlrpc_req_finished+0x9c/0x730 [ptlrpc]
      [19558.748887]  [<ffffffff811e4049>] ? kmem_cache_alloc+0x179/0x1f0
      [19558.749334]  [<ffffffff8121695f>] ? getname_flags+0x4f/0x1a0
      [19558.749749]  [<ffffffff81213f6b>] filename_lookup+0x2b/0xc0
      [19558.750164]  [<ffffffff81216ce7>] user_path_parent+0x47/0x70
      [19558.750581]  [<ffffffff81216f94>] do_unlinkat+0x64/0x2d0
      [19558.750979]  [<ffffffff81218166>] SyS_unlink+0x16/0x20
      [19558.751364]  [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
      [19558.751801] 
      [19558.751929] Kernel panic - not syncing: LBUG
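
The assertion in this trace requires that a lock's `l_lru` linkage be empty (unlinked) at the moment the lock is added to the LRU list. The invariant can be sketched in userspace as follows. This is an illustration only: `struct demo_lock`, `demo_add_to_lru`, and the `*_demo` list helpers are hypothetical stand-ins, not the real `struct ldlm_lock` or `ldlm_lock_add_to_lru_nolock()`, and the sketch returns `false` where the real code asserts.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A node is "empty" when it points only at itself, i.e. is on no list. */
static bool list_empty_demo(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_demo(struct list_head *entry, struct list_head *head)
{
	struct list_head *last = head->prev;

	entry->prev = last;
	entry->next = head;
	last->next = entry;
	head->prev = entry;
}

/* Unlink the entry and reset it so that list_empty_demo() is true again. */
static void list_del_init_demo(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical lock with an LRU linkage; not the real struct ldlm_lock. */
struct demo_lock {
	struct list_head l_lru;
};

/*
 * Add the lock to the LRU only if it is not already linked anywhere.
 * The code in the trace asserts this precondition and LBUGs on failure;
 * this sketch reports the violation through its return value instead.
 */
static bool demo_add_to_lru(struct demo_lock *lock, struct list_head *lru)
{
	if (!list_empty_demo(&lock->l_lru))
		return false;
	list_add_tail_demo(&lock->l_lru, lru);
	return true;
}
```

A lock that is added, removed with `list_del_init_demo()`, and added again satisfies the precondition each time. A second add without an intervening removal, or an add of a lock whose linkage still holds stale pointers, does not.
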
      

      Since this started happening after the latest ibits conversion logic landed, I suspect that change might be the culprit.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-dom test_12 - Test crashed during sanity-dom test_12

            People

              tappro Mikhail Pershin
              maloo Maloo