Details
- Bug
- Resolution: Fixed
- Minor
- None
- None
- 3
Description
This issue was created by maloo for Oleg Drokin <green@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1a3238a0-5122-11e8-abc3-52540065bddc
test_12 failed with the following error:
Test crashed during sanity-dom test_12
The dmesg on the client is full of:
[19557.893256] ------------[ cut here ]------------
[19557.893710] WARNING: CPU: 1 PID: 13247 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
[19557.894327] list_del corruption. prev->next should be ffff880003a47010, but was 5a5a5a5a5a5a5a5a
[19557.894951] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) brd loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) devlink iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 ppdev virtio_balloon i2c_core parport_pc parport joydev pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi ata_piix libata virtio_blk crct10dif_pclmul crct10dif_common 8139too crc32c_intel serio_raw virtio_pci
[19557.901071] virtio_ring virtio 8139cp mii floppy [last unloaded: libcfs]
[19557.901559] CPU: 1 PID: 13247 Comm: unlinkmany Tainted: G OE ------------ 3.10.0-693.21.1.el7.x86_64 #1
[19557.902302] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[19557.902712] Call Trace:
[19557.902914] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[19557.903299] [<ffffffff8108ae58>] __warn+0xd8/0x100
[19557.903656] [<ffffffff8108aedf>] warn_slowpath_fmt+0x5f/0x80
[19557.904074] [<ffffffff81343ee1>] __list_del_entry+0xa1/0xd0
[19557.904515] [<ffffffffc09e9143>] ldlm_lock_remove_from_lru_nolock+0x33/0xd0 [ptlrpc]
[19557.905090] [<ffffffffc09e98e6>] ldlm_lock_touch_in_lru+0x86/0x200 [ptlrpc]
[19557.905606] [<ffffffffc09eabbc>] lock_matches+0x17c/0x180 [ptlrpc]
[19557.906075] [<ffffffffc09ebe1c>] ldlm_lock_match+0x12c/0xa30 [ptlrpc]
[19557.906545] [<ffffffff81333563>] ? number.isra.2+0x323/0x360
[19557.906968] [<ffffffffc0b43312>] mdc_lock_match+0xb2/0x180 [mdc]
[19557.907407] [<ffffffffc09a872d>] lmv_lock_match+0x27d/0x4f0 [lmv]
[19557.907879] [<ffffffffc0cb3639>] ll_have_md_lock+0x129/0x550 [lustre]
[19557.908370] [<ffffffffc0cd7256>] ll_lock_cancel_bits+0x276/0x5b0 [lustre]
[19557.908863] [<ffffffffc0cd7769>] ll_md_blocking_ast+0x79/0x2b0 [lustre]
[19557.909361] [<ffffffffc09ef1ca>] ldlm_cancel_callback+0x8a/0x330 [ptlrpc]
[19557.909867] [<ffffffffc09f9f90>] ldlm_cli_cancel_local+0xa0/0x420 [ptlrpc]
[19557.910386] [<ffffffffc09fea4a>] ldlm_cli_cancel_list_local+0xea/0x280 [ptlrpc]
[19557.910941] [<ffffffffc09fed6b>] ldlm_cancel_resource_local+0x18b/0x280 [ptlrpc]
[19557.911482] [<ffffffffc0b3df92>] mdc_resource_get_unused+0x142/0x2a0 [mdc]
[19557.911986] [<ffffffffc0b3ef03>] mdc_unlink+0x273/0x450 [mdc]
[19557.912407] [<ffffffffc09b4adb>] lmv_unlink+0x5ab/0x960 [lmv]
[19557.912829] [<ffffffffc0cd79d7>] ? ll_i2suppgid+0x37/0x40 [lustre]
[19557.913294] [<ffffffffc0cd7a04>] ? ll_i2gids+0x24/0xb0 [lustre]
[19557.913741] [<ffffffffc0cdcd57>] ll_unlink+0x177/0x5d0 [lustre]
[19557.914189] [<ffffffff812124ec>] vfs_unlink+0x10c/0x190
[19557.914569] [<ffffffff812171b5>] do_unlinkat+0x285/0x2d0
[19557.914963] [<ffffffff81218166>] SyS_unlink+0x16/0x20
[19557.915338] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[19557.915760] ---[ end trace d066d424bfa3c997 ]---
The client eventually terminates with:
[19558.738430] LustreError: 13247:0:(ldlm_lock.c:286:ldlm_lock_add_to_lru_nolock()) ASSERTION( list_empty(&lock->l_lru) ) failed:
[19558.739284] LustreError: 13247:0:(ldlm_lock.c:286:ldlm_lock_add_to_lru_nolock()) LBUG
[19558.739855] Pid: 13247, comm: unlinkmany
[19558.740156] Call Trace:
[19558.740461] [<ffffffffc06867ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
[19558.740948] [<ffffffffc068683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[19558.741419] [<ffffffffc09e96c5>] ldlm_lock_add_to_lru_nolock+0xa5/0x110 [ptlrpc]
[19558.741982] [<ffffffffc09e9773>] ldlm_lock_add_to_lru+0x43/0x130 [ptlrpc]
[19558.742504] [<ffffffffc09eb398>] ldlm_lock_decref_internal+0x2f8/0xa80 [ptlrpc]
[19558.743071] [<ffffffffc08274b3>] ? class_handle2object+0xc3/0x1c0 [obdclass]
[19558.743608] [<ffffffffc09ebb4e>] ldlm_lock_decref+0x2e/0x80 [ptlrpc]
[19558.744099] [<ffffffffc0c943ba>] ll_intent_drop_lock.part.14+0x4a/0x170 [lustre]
[19558.744654] [<ffffffffc0c95a48>] ll_lookup_finish_locks+0x208/0x780 [lustre]
[19558.745194] [<ffffffffc0ca5291>] ll_inode_revalidate+0x1c1/0x800 [lustre]
[19558.745704] [<ffffffffc0ca859b>] ll_inode_permission+0x1db/0x3c0 [lustre]
[19558.746216] [<ffffffff81210eb1>] __inode_permission+0x71/0xd0
[19558.746645] [<ffffffff81210f28>] inode_permission+0x18/0x50
[19558.747070] [<ffffffff8121305e>] link_path_walk+0x27e/0x8b0
[19558.747491] [<ffffffffc0b3ee96>] ? mdc_unlink+0x206/0x450 [mdc]
[19558.747937] [<ffffffff812137eb>] path_lookupat+0x6b/0x7c0
[19558.748362] [<ffffffffc0a14b9c>] ? __ptlrpc_req_finished+0x9c/0x730 [ptlrpc]
[19558.748887] [<ffffffff811e4049>] ? kmem_cache_alloc+0x179/0x1f0
[19558.749334] [<ffffffff8121695f>] ? getname_flags+0x4f/0x1a0
[19558.749749] [<ffffffff81213f6b>] filename_lookup+0x2b/0xc0
[19558.750164] [<ffffffff81216ce7>] user_path_parent+0x47/0x70
[19558.750581] [<ffffffff81216f94>] do_unlinkat+0x64/0x2d0
[19558.750979] [<ffffffff81218166>] SyS_unlink+0x16/0x20
[19558.751364] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[19558.751801]
[19558.751929] Kernel panic - not syncing: LBUG
Since this happened after the latest ibits conversion logic landed, I suspect that change might be the culprit.
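For readers unfamiliar with the first warning, here is a rough user-space sketch of the kind of check that produces a "list_del corruption" message: before unlinking an entry, confirm that both neighbours still point back at it. The `checked_list_del` and `demo_stale_neighbour` names are made up for illustration, and treating the 5a5a5a5a5a5a5a5a value as a poison pattern written over a still-linked neighbour is an assumption, not something the trace proves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	assert(entry != head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry only if both neighbours still point back at it. */
static bool checked_list_del(struct list_head *entry)
{
	struct list_head *prev = entry->prev;
	struct list_head *next = entry->next;

	if (prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return false;
	}
	if (next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return false;
	}
	next->prev = prev;
	prev->next = next;
	return true;
}

/*
 * Hypothetical scenario: node a is overwritten with 0x5a bytes while it
 * is still linked, so deleting its neighbour b trips the check.
 */
static bool demo_stale_neighbour(void)
{
	struct list_head lru, a, b;

	list_init(&lru);
	list_add_tail(&a, &lru);
	list_add_tail(&b, &lru);
	memset(&a, 0x5a, sizeof(a));	/* simulate a being poisoned */
	return checked_list_del(&b);	/* b.prev is &a, a.next is poison */
}
```

Calling `demo_stale_neighbour()` reports the corruption and refuses to unlink, which is the same shape as the warning in the first trace.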
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-dom test_12 - Test crashed during sanity-dom test_12
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32692/
Subject: LU-11003 ldlm: don't add canceling lock back to LRU
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ad52f394bd8208c87398a3f593112c5340a7ceec
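The idea in the patch subject can be illustrated with a toy model. This is not the merged patch and does not use the real ldlm structures: `toy_lock`, its `canceling` flag, and every `toy_*` function are hypothetical names invented for this sketch. It only shows the general guard of refusing to re-link a lock onto the LRU once its cancellation has started.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

/* Hypothetical, simplified lock: only the fields this sketch needs. */
struct toy_lock {
	struct list_head l_lru;	/* linkage on the unused-lock LRU */
	bool canceling;		/* set once cancellation has started */
};

static void toy_add_to_lru(struct toy_lock *lk, struct list_head *lru)
{
	/* Same invariant as the failed assertion: must not be linked yet. */
	assert(list_empty(&lk->l_lru));
	list_add_tail(&lk->l_lru, lru);
}

/* Cancellation takes the lock off the LRU and marks it. */
static void toy_begin_cancel(struct toy_lock *lk)
{
	lk->canceling = true;
	if (!list_empty(&lk->l_lru))
		list_del_init(&lk->l_lru);
}

/* Refresh LRU position, but never re-link a lock that is being cancelled. */
static bool toy_touch_in_lru(struct toy_lock *lk, struct list_head *lru)
{
	if (lk->canceling)
		return false;
	if (!list_empty(&lk->l_lru))
		list_del_init(&lk->l_lru);
	toy_add_to_lru(lk, lru);
	return true;
}

/* Returns true if a cancelled lock stays off the LRU after a touch. */
static bool demo_cancel_then_touch(void)
{
	struct list_head lru;
	struct toy_lock lk;

	list_init(&lru);
	list_init(&lk.l_lru);
	lk.canceling = false;

	toy_add_to_lru(&lk, &lru);
	toy_begin_cancel(&lk);
	toy_touch_in_lru(&lk, &lru);	/* skipped because of the guard */
	return list_empty(&lk.l_lru) && list_empty(&lru);
}
```

Without the `canceling` check in `toy_touch_in_lru`, the touch would put the cancelled lock back on the list, and a later add or free would find stale linkage, the same kind of symptom the two traces in the description show.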