[LU-9485] conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object Created: 10/May/17  Updated: 22/Jun/17  Resolved: 22/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Environment:

FSTYPE=zfs


Attachments: Text File vmcore-dmesg.txt    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for John Hammond <john.hammond@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2d9acc98-3546-11e7-b0a8-5254006e85c2.

The sub-test test_46a failed with the following error:

test failed to respond and timed out

From the client:

01:20:15:[ 8727.094028] Lustre: Mounted lustre-client
01:20:15:[ 8727.326631] LustreError: 14973:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
01:20:15:[ 8727.329204] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
01:20:15:[ 8727.331709] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
01:20:15:[ 8727.334195] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
01:20:15:[ 8727.336551] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
01:20:15:[ 8727.338877] LustreError: 14973:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
01:20:15:[ 8727.343166] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
01:20:15:[ 8727.345641] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
01:20:15:[ 8727.348135] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
01:20:15:[ 8727.350427] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
01:20:15:[ 8727.352806] LustreError: 14973:0:(llite_lib.c:2256:ll_prep_inode()) new_inode -fatal: rc -22
01:20:15:[ 8727.362035] BUG: unable to handle kernel paging request at ffffffa0c9b0b0ff
01:20:15:[ 8727.363006] IP: [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
01:20:15:[ 8727.363006] PGD 19bd067 PUD 0 
01:20:15:[ 8727.363006] Oops: 0010 [#1] SMP 
01:20:15:[ 8727.363006] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel parport_pc parport lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper 8139too ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel serio_raw ttm virtio_pci virtio_ring virtio drm 8139cp i2c_core mii libata floppy [last unloaded: libcfs]
01:20:15:[ 8727.363006] CPU: 0 PID: 14979 Comm: bash Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64 #1
01:20:15:[ 8727.363006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
01:20:15:[ 8727.363006] task: ffff88007ae76dd0 ti: ffff88007a240000 task.ti: ffff88007a240000
01:20:15:[ 8727.363006] RIP: 0010:[<ffffffa0c9b0b0ff>]  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
01:20:15:[ 8727.363006] RSP: 0018:ffff88007fc03e78  EFLAGS: 00010286
01:20:15:[ 8727.363006] RAX: ffffffa0c9b0b0ff RBX: ffffffff81a1a5c0 RCX: 0000000000019b00
01:20:15:[ 8727.363006] RDX: ffff880067b8caa7 RSI: 0000000000000003 RDI: ffff880067b8caa7
01:20:15:[ 8727.363006] RBP: ffff88007fc03ed0 R08: ffff88007b7b7e00 R09: 000000018010000f
01:20:15:[ 8727.363006] R10: 0000000000000001 R11: ffffea0001ededc0 R12: 000000000000000a
01:20:15:[ 8727.363006] R13: 0000000000000001 R14: ff8800658b5fb0ff R15: ffff88007fc101c0
01:20:15:[ 8727.363006] FS:  00007f4f234d3740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
01:20:15:[ 8727.363006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
01:20:15:[ 8727.363006] CR2: ffffffa0c9b0b0ff CR3: 00000000782bb000 CR4: 00000000000406f0
01:20:15:[ 8727.363006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
01:20:15:[ 8727.363006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
01:20:15:[ 8727.363006] Stack:
01:20:15:[ 8727.363006]  ffffffff81139bbd ffff880066441800 0000000000000001 ffff88007fc101e8
01:20:15:[ 8727.363006]  ffff88007ae76dd0 ffff88007a243fd8 0000000000000009 0000000000000009
01:20:15:[ 8727.363006]  ffff880067d22f58 ffffffff819b3108 0000000000000001 ffff88007fc03f40
01:20:15:[ 8727.363006] Call Trace:
01:20:15:[ 8727.363006]  <IRQ> 
01:20:15:[ 8727.363006]  [<ffffffff81139bbd>] ? rcu_process_callbacks+0x1dd/0x550
01:20:15:[ 8727.363006]  [<ffffffff8108f63f>] __do_softirq+0xef/0x280
01:20:15:[ 8727.363006]  [<ffffffff81698c1c>] call_softirq+0x1c/0x30
01:20:15:[ 8727.363006]  [<ffffffff8102d365>] do_softirq+0x65/0xa0
01:20:15:[ 8727.363006]  [<ffffffff8108f9d5>] irq_exit+0x115/0x120
01:20:15:[ 8727.363006]  [<ffffffff81699895>] smp_apic_timer_interrupt+0x45/0x60
01:20:15:[ 8727.363006]  [<ffffffff81697ddd>] apic_timer_interrupt+0x6d/0x80
01:20:15:[ 8727.363006]  <EOI> 
01:20:15:[ 8727.363006]  [<ffffffff8121ccf3>] ? dup_fd+0x153/0x2d0
01:20:15:[ 8727.363006]  [<ffffffff8121ccbf>] ? dup_fd+0x11f/0x2d0
01:20:15:[ 8727.363006]  [<ffffffff81083f70>] copy_process+0xc10/0x1960
01:20:15:[ 8727.363006]  [<ffffffff81084e71>] do_fork+0x91/0x2c0
01:20:16:[ 8727.363006]  [<ffffffff81085126>] SyS_clone+0x16/0x20
01:20:16:[ 8727.363006]  [<ffffffff816974d9>] stub_clone+0x69/0x90
01:20:16:[ 8727.363006]  [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
01:20:16:[ 8727.363006] Code:  Bad RIP value.
01:20:16:[ 8727.363006] RIP  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
01:20:16:[ 8727.363006]  RSP <ffff88007fc03e78>
01:20:16:[ 8727.363006] CR2: ffffffa0c9b0b0ff

https://testing.hpdd.intel.com/test_logs/2f0f1c78-3546-11e7-b0a8-5254006e85c2/show_text

Info required for matching: conf-sanity 46a



 Comments   
Comment by Jian Yu [ 11/May/17 ]

This has been blocking patch review testing on review-zfs-part-2 session against master branch since 2017-05-09:
https://testing.hpdd.intel.com/test_sets/fb1a837a-3537-11e7-a0e9-5254006e85c2
https://testing.hpdd.intel.com/test_sets/2d9acc98-3546-11e7-b0a8-5254006e85c2
https://testing.hpdd.intel.com/test_sets/94fc8c78-3550-11e7-b0a8-5254006e85c2
https://testing.hpdd.intel.com/test_sets/83fa4200-3599-11e7-b0a8-5254006e85c2
https://testing.hpdd.intel.com/test_sets/709ad224-35c6-11e7-8847-5254006e85c2

Comment by Peter Jones [ 16/May/17 ]

Jinshan

Could you please look into this one?

Thanks

Peter

Comment by Jinshan Xiong (Inactive) [ 16/May/17 ]

This bug may imply several latent issues.

1. wait_osc_import_state client ost FULL is broken by the commit: 5c315184152, and the update in question is:

diff --git a/lustre/tests/test-framework.sh b/lustre/tests/test-framework.sh
index d27cb30656..37e8475891 100755
--- a/lustre/tests/test-framework.sh
+++ b/lustre/tests/test-framework.sh
@@ -6020,9 +6020,8 @@ _wait_osc_import_state() {
 
        if [[ $facet == client* ]]; then
                # During setup time, the osc might not be setup, it need wait
-               # until list_param can return valid value. And also if there
-               # are mulitple osc entries we should list all of them before
-               # go to wait.
+               # until list_param can return valid value.
+               param="osc.${ost%?}[^mM]*.ost_server_uuid"
                params=$($LCTL list_param $param 2>/dev/null || true)
                while [ -z "$params" ]; do
                        if [ $i -ge $maxtime ]; then

As you can see, the variable ost is not even defined, so it just lists the existing OSCs.

2. Somehow the above error triggered memory corruption somewhere. I checked the corresponding code of PFL in the LOV layer and they seemed to be ok.

Comment by Gerrit Updater [ 16/May/17 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/27148
Subject: LU-9485 test: revert a change from LU-5361
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ddf69a94fd2d77c78185709c39b70309c3b0322a

Comment by Jinshan Xiong (Inactive) [ 16/May/17 ]

The above patch should be able to hide the issue

However, I think we should assign someone to reproduce this issue manually(I couldn't reproduce it with my VM), and once we have a reliable way of reproducing it, we should load a debug kernel to catch the memory corruption issue.

Comment by Jinshan Xiong (Inactive) [ 17/May/17 ]

It would be easier if we had a crash dump for this.

Comment by Jian Yu [ 17/May/17 ]

Hi Jinshan,

It would be easier if we had a crash dump for this.

FYI, from the client console log https://testing.hpdd.intel.com/test_logs/2f0f1c78-3546-11e7-b0a8-5254006e85c2/show_text , you can find the following messages:

01:20:17:         Starting Kdump Vmcore Save Service...
01:20:17:kdump: dump target is trevis-2.trevis.hpdd.intel.com:/export/scratch
01:20:17:kdump: saving to /kdumproot/scratch//dumps/trevis-41vm1.trevis.hpdd.intel.com/10.9.5.237-2017-05-10-01:19:54/
01:20:17:kdump: saving vmcore-dmesg.txt
01:20:17:kdump: saving vmcore-dmesg.txt complete
01:20:17:kdump: saving vmcore
01:20:17:
Checking for memory holes          : [  0.0 %] /
Checking for memory holes          : [100.0 %] |
Excluding unnecessary pages        : [100.0 %] \
Checking for memory holes          : [100.0 %] -
Checking for memory holes          : [100.0 %] /
Excluding unnecessary pages        : [100.0 %] |
Copying data                       : [ 69.0 %] \
Copying data                       : [100.0 %] -
Copying data                       : [100.0 %] /
01:20:17:kdump: saving vmcore complete

Comment by Jinshan Xiong (Inactive) [ 17/May/17 ]

I looked at the crash dump at https://testing.hpdd.intel.com/test_sets/e592b0c0-39b7-11e7-8847-5254006e85c2

It turned out the address of ll_inode_destroy_callback is 0xffffffffa0ca60b0, and the corresponding rcu_head should be 0xffff880066067328, but somehow it was referenced as 0xffff880066067327, i.e. one byte difference.

Since the rcu list is maintained by kernel, the pointer to rcu_head of inode is dereferenced from the next of the previous rcu_head, so it's possible that that piece of memory is somehow reused and decreased by 1.

Comment by John Hammond [ 17/May/17 ]

Jinshan, please note that this has only been seen on zfs test sessions and only for changes that included the last commit of the May 9th landing batch (e0deeb237ae111e959e916ae910d15272ed4c120).

So it may be from one of:

e0deeb2 LU-9140 nrs: add some debug log for NRS TBF
57cc12f LU-9312 hsm: release restore lock without object
8946ce9 LU-9447 o2iblnd: Check for 2 arg ib_alloc_pd
af73b39 LU-9443 mdd: omit changelog records for volatile files
b0404c1 LU-9369 lfs: make lfs find work correctly
92bbd06 LU-8746 update: restore tdtd_refcount during failure
e5eec25 LU-9385 mdt: remove XATTR locking from mdt_add_dirty_flag()
77a60b0 LU-9404 mdt: set HSM xattr only when needed
cd58a3b LU-9403 mdt: prevent HSM leak on re-archive
945fd61 LU-8364 ldiskfs: fixes for failover mode
87a233e LU-9411 tests: skip llapi_layout_test 30, 31 for interop
dded1e7 LU-9312 hsm: add a cookie indexed request hash
d29e759 LU-9312 hsm: fix error handling around mdt_hsm_get_md_hsm()
2585154 LU-9119 lnet: selftest MR fix
f2cba08 LU-9394 osd: __osd_obj2dnode() to return negative errors
ad75291 LU-9040 scrub: detect dead loop scanning
6746484 LU-8998 llapi: add LLAPI_LAYOUT_COMP_USE_PREV
293da20 LU-9201 test: avoid long sleeps in mount_facet()
191608c LU-9205 tests: fix failures in CLIENTONLY mode
1b3028a LU-8367 osp: orphan cleanup do not wait for reserved
9a6b262 LU-9285 osp: revert patches LU-8367 and LU-8973
e9a1d3d LU-9183 llite: handle removal the pos argument of generic_write_sync
4d1bb7a LU-9183 llite: handle make the string hashes salt the hash
cf4481e LU-9125 test: Update setstripe options
de7a14b LU-4423 ptlrpc: use 64-bit times for request times
4ce3219 LU-4017 quota: cleanup to improve quota codes
672986c LU-8376 ost: enhance end to end bulk cksum error report
6dc05f0 LU-8650 mdt: enable REP-ACK for DNE
c156613 LU-8307 ldlm: cond_resched in ldlm_bl_thread_main
2ffbcc9 LU-9430 utils: fix logic errors and putchar in sk_name2hmac()
7d30275 LU-7062 ldlm: GPF in _ldlm_lock_debug()
Comment by Jinshan Xiong (Inactive) [ 17/May/17 ]

Those changes seem pretty simple on the client side. Also the test only handles a plain file w/o PFL layout.

I wonder if there is a test env related or server side change around May 9th so that the clients become more likely to see that OST2 is not ready in conf-sanity:46a, and then the memory corruption is just a fallout in the failure path. The issue may have been existing but wasn't triggered somehow

Comment by John Hammond [ 18/May/17 ]

> I wonder if there is a test env related or server side change around May 9th so that the clients become more likely to see that OST2 is not ready in conf-sanity:46a, and then the memory corruption is just a fallout in the failure path. The issue may have been existing but wasn't triggered somehow

That's what I was thinking. Maybe one of:

293da20 LU-9201 test: avoid long sleeps in mount_facet()
1b3028a LU-8367 osp: orphan cleanup do not wait for reserved
9a6b262 LU-9285 osp: revert patches LU-8367 and LU-8973

Comment by Jinshan Xiong (Inactive) [ 18/May/17 ]

I'm more concerned about the memory corruption caused by the failure path. If you have time, please double check the code path and see if you can find something new.

Comment by John Hammond [ 22/May/17 ]

Hi Jinshan, I double checked but didn't find anything. Can you try running with a debug enable kernel and injecting the error see in lsme_unpack()?

Comment by Jinshan Xiong (Inactive) [ 22/May/17 ]

I injected the error before but it didn't cause the kernel crash. Debug kernel probably won't help in this case. We need to load a kernel with kasan enabled, but that will need a Ubuntu test node. Too much work for now.

Comment by Sarah Liu [ 03/Jun/17 ]

with debug kernel, PTLDEBUG=-1

running: conf-sanity ONLY=46a 
run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh
[ 1539.918416] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Sat Jun 3 00:21:33 UTC 2017
[ 1540.345014] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101
[ 1540.509025] Lustre: Unmounted lustre-client
[ 1594.485769] LNet: Removed LNI 10.9.6.136@tcp
[ 1595.152511] LNet: 22761:0:(workitem.c:319:cfs_wi_sched_destroy()) waiting for 2 threads of WI sched[cfs_rh] to terminate
[ 1596.205687] LNet: HW nodes: 1, HW CPU cores: 2, npartitions: 1
[ 1596.208649] alg: No test for adler32 (adler32-zlib)
[ 1596.209259] alg: No test for crc32 (crc32-table)
[ 1597.151613] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 00:22:29 (1496449349)
[ 1598.952811] Lustre: Lustre: Build Version: 2.9.58_22_gdb59ecb
[ 1598.987313] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
[ 1598.988049] LNet: Accept secure, port 988
[ 1599.099714] Lustre: Echo OBD driver; http://www.lustre.org/
[ 1634.101425] Lustre: Mounted lustre-client
[ 1640.492000] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800362bb000.ost_server_uuid 40
[ 1640.724598] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800362bb000.ost_server_uuid in FULL state after 0 sec
[ 1640.792619] Lustre: Mounted lustre-client
[ 1640.978831] LustreError: 25798:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
[ 1640.979873] Lustre: 25798:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
[ 1640.980876] Lustre: 25798:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
[ 1640.981880] Lustre: 25798:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0:2
[ 1640.982705] Lustre: 25798:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 0 subobj 0x0:34
[ 1640.983574] LustreError: 25798:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
[ 1640.985037] Lustre: 25798:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
[ 1640.986008] Lustre: 25798:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
[ 1640.987005] Lustre: 25798:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 1 subobj 0x0:2
[ 1640.987830] Lustre: 25798:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 0 subobj 0x0:34
[ 1640.988788] LustreError: 25798:0:(llite_lib.c:2257:ll_prep_inode()) new_inode -fatal: rc -22
[ 1640.992995] ------------[ cut here ]------------
[ 1640.993476] WARNING: at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[ 1640.994159] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
[ 1640.995086] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper virtio_balloon i2c_piix4 i2c_core ppdev pcspkr ablk_helper cryptd parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common crc32c_intel 8139too serio_raw virtio_pci virtio_ring virtio ata_piix 8139cp mii libata floppy [last unloaded: libcfs]
[ 1641.003768] CPU: 0 PID: 25801 Comm: grep Tainted: G           OE  ------------   3.10.0-514.16.1.el7.x86_64.debug #1
[ 1641.004767] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1641.005314]  ffff88007d203d48 000000000a3b44cf ffff88007d203d00 ffffffff81759e39
[ 1641.006097]  ffff88007d203d38 ffffffff81095cd0 ffff88007d203df0 ffffffff81afe840
[ 1641.006875]  ffffffff819f6712 0000000000000296 ffffffff81afe840 ffff88007d203da0
[ 1641.007654] Call Trace:
[ 1641.007904]  <IRQ>  [<ffffffff81759e39>] dump_stack+0x19/0x1b
[ 1641.008504]  [<ffffffff81095cd0>] warn_slowpath_common+0x70/0xb0
[ 1641.009079]  [<ffffffff81095d6c>] warn_slowpath_fmt+0x5c/0x80
[ 1641.009632]  [<ffffffff813aa023>] debug_print_object+0x83/0xa0
[ 1641.010191]  [<ffffffff813aafe0>] debug_object_active_state+0x150/0x190
[ 1641.010837]  [<ffffffff81228f6c>] ? kfree+0xdc/0x2d0
[ 1641.011314]  [<ffffffff811723d6>] rcu_process_callbacks+0x286/0x830
[ 1641.011919]  [<ffffffff810a11df>] __do_softirq+0x10f/0x470
[ 1641.012443]  [<ffffffff8176fb7c>] call_softirq+0x1c/0x30
[ 1641.012962]  [<ffffffff81035bed>] do_softirq+0x8d/0xc0
[ 1641.013454]  [<ffffffff810a1785>] irq_exit+0x125/0x140
[ 1641.013953]  [<ffffffff81770815>] smp_apic_timer_interrupt+0x45/0x60
[ 1641.014555]  [<ffffffff8176ed32>] apic_timer_interrupt+0x72/0x80
[ 1641.015131]  <EOI>  [<ffffffff811ce00e>] ? get_page_from_freelist+0x35e/0xb50
[ 1641.015850]  [<ffffffff811cdd8f>] ? get_page_from_freelist+0xdf/0xb50
[ 1641.016463]  [<ffffffff811cea03>] __alloc_pages_nodemask+0x203/0x580
[ 1641.017072]  [<ffffffff8121c63a>] alloc_pages_vma+0x9a/0x150
[ 1641.017621]  [<ffffffff811f9712>] handle_mm_fault+0xc92/0x1000
[ 1641.018184]  [<ffffffff81768959>] ? __do_page_fault+0x109/0x4e0
[ 1641.018754]  [<ffffffff817689c7>] __do_page_fault+0x177/0x4e0
[ 1641.019302]  [<ffffffff81768e16>] trace_do_page_fault+0x56/0x2c0
[ 1641.019881]  [<ffffffff8176842b>] do_async_page_fault+0x1b/0xd0
[ 1641.020447]  [<ffffffff81764a18>] async_page_fault+0x28/0x30
[ 1641.020994] ---[ end trace 9779c2b1cea644fb ]---
[ 1641.021458] ------------[ cut here ]------------
[ 1641.021920] WARNING: at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[ 1641.022635] ODEBUG: deactivate not available (active state 0) object type: rcu_head hint:           (null)
[ 1641.023558] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper virtio_balloon i2c_piix4 i2c_core ppdev pcspkr ablk_helper cryptd parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common crc32c_intel 8139too serio_raw virtio_pci virtio_ring virtio ata_piix 8139cp mii libata floppy [last unloaded: libcfs]
[ 1641.033087] CPU: 0 PID: 25801 Comm: grep Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64.debug #1
[ 1641.034082] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1641.034629]  ffff88007d203d50 000000000a3b44cf ffff88007d203d08 ffffffff81759e39
[ 1641.035413]  ffff88007d203d40 ffffffff81095cd0 ffff88007d203df0 ffffffff81afe840
[ 1641.036201]  ffffffff819cee2f 0000000000000286 ffffffff81afe840 ffff88007d203da8
[ 1641.036991] Call Trace:
[ 1641.037235]  <IRQ>  [<ffffffff81759e39>] dump_stack+0x19/0x1b
[ 1641.037829]  [<ffffffff81095cd0>] warn_slowpath_common+0x70/0xb0
[ 1641.038409]  [<ffffffff81095d6c>] warn_slowpath_fmt+0x5c/0x80
[ 1641.038962]  [<ffffffff813aa023>] debug_print_object+0x83/0xa0
[ 1641.039520]  [<ffffffff813aa9c2>] debug_object_deactivate+0x142/0x1a0
[ 1641.040147]  [<ffffffff81228f6c>] ? kfree+0xdc/0x2d0
[ 1641.040626]  [<ffffffff811723e5>] rcu_process_callbacks+0x295/0x830
[ 1641.041236]  [<ffffffff810a11df>] __do_softirq+0x10f/0x470
[ 1641.041770]  [<ffffffff8176fb7c>] call_softirq+0x1c/0x30
[ 1641.042353]  [<ffffffff81035bed>] do_softirq+0x8d/0xc0
[ 1641.042855]  [<ffffffff810a1785>] irq_exit+0x125/0x140
[ 1641.043354]  [<ffffffff81770815>] smp_apic_timer_interrupt+0x45/0x60
[ 1641.043967]  [<ffffffff8176ed32>] apic_timer_interrupt+0x72/0x80
[ 1641.044543]  <EOI>  [<ffffffff811ce00e>] ? get_page_from_freelist+0x35e/0xb50
[ 1641.045270]  [<ffffffff811cdd8f>] ? get_page_from_freelist+0xdf/0xb50
[ 1641.045887]  [<ffffffff811cea03>] __alloc_pages_nodemask+0x203/0x580
[ 1641.046500]  [<ffffffff8121c63a>] alloc_pages_vma+0x9a/0x150
[ 1641.047048]  [<ffffffff811f9712>] handle_mm_fault+0xc92/0x1000
[ 1641.047610]  [<ffffffff81768959>] ? __do_page_fault+0x109/0x4e0
[ 1641.048181]  [<ffffffff817689c7>] __do_page_fault+0x177/0x4e0
[ 1641.048735]  [<ffffffff81768e16>] trace_do_page_fault+0x56/0x2c0
[ 1641.049307]  [<ffffffff8176842b>] do_async_page_fault+0x1b/0xd0
[ 1641.049878]  [<ffffffff81764a18>] async_page_fault+0x28/0x30
[ 1641.050421] ---[ end trace 9779c2b1cea644fc ]---
[ 1641.050894] BUG: unable to handle kernel paging request at ffffffa0bbe200ff
[ 1641.051603] IP: [<ffffffa0bbe200ff>] 0xffffffa0bbe200fe
[ 1641.052131] PGD 1abd067 PUD 0 
[ 1641.052560] Oops: 0010 [#1] SMP 
[ 1641.053100] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper virtio_balloon i2c_piix4 i2c_core ppdev pcspkr ablk_helper cryptd parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk crct10dif_pclmul crct10dif_common crc32c_intel 8139too serio_raw virtio_pci virtio_ring virtio ata_piix 8139cp mii libata floppy [last unloaded: libcfs]
[ 1641.062225] CPU: 0 PID: 25801 Comm: grep Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64.debug #1
[ 1641.063219] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1641.063772] task: ffff880079024000 ti: ffff8800493b0000 task.ti: ffff8800493b0000
[ 1641.064483] RIP: 0010:[<ffffffa0bbe200ff>]  [<ffffffa0bbe200ff>] 0xffffffa0bbe200fe
[ 1641.065235] RSP: 0000:ffff88007d203e58  EFLAGS: 00010286
[ 1641.065752] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000006
[ 1641.066426] RDX: ffffffa0bbe200ff RSI: 0000000000000001 RDI: ffff8800425918af
[ 1641.067111] RBP: ffff88007d203ec0 R08: 0000000000000000 R09: 0000000000000001
[ 1641.067792] R10: 0000000000000000 R11: ffff88007d2039fe R12: 0000000000000001
[ 1641.068463] R13: ffff8800425918af R14: 0000000000000001 R15: ff880060db0d58ff
[ 1641.069147] FS:  00007f37f7791740(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
[ 1641.069919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1641.070466] CR2: ffffffa0bbe200ff CR3: 000000004938a000 CR4: 00000000000406f0
[ 1641.071154] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1641.071835] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1641.072506] Stack:
[ 1641.072718]  ffffffff8117238a ffff88007b2e56a8 ffff88007d3d0760 ffffffff81b38400
[ 1641.073497]  0000000000000000 ffff88007d3d0788 ffff880079024000 ffff8800493b3fd8
[ 1641.074279]  0000000000000009 0000000000000009 0000000000000010 ffffffff81ab3188
[ 1641.075063] Call Trace:
[ 1641.075316]  <IRQ> 
[ 1641.075512]  [<ffffffff8117238a>] ? rcu_process_callbacks+0x23a/0x830
[ 1641.076203]  [<ffffffff810a11df>] __do_softirq+0x10f/0x470
[ 1641.076738]  [<ffffffff8176fb7c>] call_softirq+0x1c/0x30
[ 1641.077255]  [<ffffffff81035bed>] do_softirq+0x8d/0xc0
[ 1641.077755]  [<ffffffff810a1785>] irq_exit+0x125/0x140
[ 1641.078247]  [<ffffffff81770815>] smp_apic_timer_interrupt+0x45/0x60
[ 1641.078855]  [<ffffffff8176ed32>] apic_timer_interrupt+0x72/0x80
[ 1641.079432]  <EOI> 
[ 1641.079627]  [<ffffffff811ce00e>] ? get_page_from_freelist+0x35e/0xb50
[ 1641.080313]  [<ffffffff811cdd8f>] ? get_page_from_freelist+0xdf/0xb50
[ 1641.080935]  [<ffffffff811cea03>] __alloc_pages_nodemask+0x203/0x580
[ 1641.081537]  [<ffffffff8121c63a>] alloc_pages_vma+0x9a/0x150
[ 1641.082087]  [<ffffffff811f9712>] handle_mm_fault+0xc92/0x1000
[ 1641.082641]  [<ffffffff81768959>] ? __do_page_fault+0x109/0x4e0
[ 1641.083216]  [<ffffffff817689c7>] __do_page_fault+0x177/0x4e0
[ 1641.083777]  [<ffffffff81768e16>] trace_do_page_fault+0x56/0x2c0
[ 1641.084349]  [<ffffffff8176842b>] do_async_page_fault+0x1b/0xd0
[ 1641.084919]  [<ffffffff81764a18>] async_page_fault+0x28/0x30
[ 1641.085455] Code:  Bad RIP value.
[ 1641.085830] RIP  [<ffffffa0bbe200ff>] 0xffffffa0bbe200fe
[ 1641.086363]  RSP <ffff88007d203e58>
[ 1641.086714] CR2: ffffffa0bbe200ff
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-514.16.1.el7.x86_64.debug (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Apr 12 15:21:14 UTC 2017
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.16.1.el7.x86_64.debug root=UUID=94c86014-757d-40d3-8e56-e1aec6f393a5 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000009f7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000021000000-0x0000000034f5cfff] usable
[    0.000000] BIOS-e820: [mem 0x0000000034fff800-0x0000000034ffffff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fffe000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.4 present.
[    0.000000] Hypervisor detected: KVM
[    0.000000] e820: last_pfn = 0x35000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7010600070106, new 0x7010600070106
[    0.000000] x2apic enabled by BIOS, switching to x2apic ops
[    0.000000] found SMP MP-table at [mem 0x000f1fe0-0x000f1fef] mapped at [ffff8800000f1fe0]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] RAMDISK: [mem 0x30f4e000-0x31ffffff]
[    0.000000] ACPI: RSDP 00000000000f1e10 00014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 000000007ffffb04 00030 (v01 BOCHS  BXPCRSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 000000007ffff177 00074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 000000007fffe040 01137 (v01   BXPC   BXDSDT 00000001 INTL 20150619)
[    0.000000] ACPI: FACS 000000007fffe000 00040
[    0.000000] ACPI: SSDT 000000007ffff1eb 00899 (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 000000007ffffa84 00080 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.000000] Setting APIC routing to cluster x2apic.
[    0.000000] NUMA turned off
[    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000034ffffff]
[    0.000000] Initmem setup node 0 [mem 0x00000000-0x34ffffff]
[    0.000000]   NODE_DATA [mem 0x34f36000-0x34f5cfff]
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:34ee6001, primary cpu clock
[    0.000000] kvm-clock: using sched offset of 94846834025781 cycles
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009efff]
[    0.000000]   node   0: [mem 0x21000000-0x34f5cfff]
[    0.000000] ACPI: PM-Timer IO Port: 0x608
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: NR_CPUS/possible_cpus limit of 1 reached.  Processor 1/0x1 ignored.
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: 2 Processors exceeds NR_CPUS limit of 1
[    0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at include/linux/cpumask.h:116 cpumask_check.part.1+0x1e/0x27()
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-514.16.1.el7.x86_64.debug #1
[    0.000000] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[    0.000000]  0000000000000000 b437bd1d0e6f8cfa ffffffff81aafe30 ffffffff81759e39
[    0.000000]  ffffffff81aafe68 ffffffff81095cd0 0000000000000002 0000000000000001
[    0.000000]  0000000032000000 0000000000000000 0000000000000000 ffffffff81aafe78
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81759e39>] dump_stack+0x19/0x1b
[    0.000000]  [<ffffffff81095cd0>] warn_slowpath_common+0x70/0xb0
[    0.000000]  [<ffffffff81095e1a>] warn_slowpath_null+0x1a/0x20
[    0.000000]  [<ffffffff817522cc>] cpumask_check.part.1+0x1e/0x27
[    0.000000]  [<ffffffff8109a7cf>] set_cpu_possible+0x4f/0x60
[    0.000000]  [<ffffffff81e29fe7>] prefill_possible_map+0xf7/0x105
[    0.000000]  [<ffffffff81e209b8>] setup_arch+0xd44/0xf0d
[    0.000000]  [<ffffffff8175214c>] ? printk+0x72/0x91
[    0.000000]  [<ffffffff81e15d1b>] start_kernel+0xed/0x489
[    0.000000]  [<ffffffff81e15120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81e155ee>] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [<ffffffff81e15742>] x86_64_start_kernel+0x152/0x175
[    0.000000] ---[ end trace 3b72577c75e4d8f3 ]---
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0x00100000-0x20ffffff]
[    0.000000] PM: Registered nosave memory: [mem 0x34f5d000-0x34ffffff]
[    0.000000] e820: [mem 0x80000000-0xfeffbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] setup_percpu: NR_CPUS:5120 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
[    0.000000] PERCPU: Embedded 482 pages/cpu @ffff880034c00000 s1936216 r8192 d29864 u2097152
[    0.000000] KVM setup async PF for cpu 0
[    0.000000] kvm-stealtime: cpu 0, msr 34c0f600
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 80613
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.16.1.el7.x86_64.debug root=UUID=94c86014-757d-40d3-8e56-e1aec6f393a5 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
[    0.000000] Misrouted IRQ fixup and polling support enabled
[    0.000000] This may significantly impact system performance
[    0.000000] Disabling memory control group subsystem
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
[    0.000000] Memory: 261628k/868352k available (7627k kernel code, 540692k absent, 66032k reserved, 4894k data, 6804k init)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU lockdep checking is enabled.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=1.
[    0.000000] NR_IRQS:327936 nr_irqs:256 0
[    0.000000] Spurious LAPIC timer interrupt on cpu 0
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
[    0.000000] ... MAX_LOCK_DEPTH:          48
[    0.000000] ... MAX_LOCKDEP_KEYS:        8191
[    0.000000] ... CLASSHASH_SIZE:          4096
[    0.000000] ... MAX_LOCKDEP_ENTRIES:     32768
[    0.000000] ... MAX_LOCKDEP_CHAINS:      65536
[    0.000000] ... CHAINHASH_SIZE:          32768
[    0.000000]  memory used by lock dependency info: 8671 kB
[    0.000000]  per task-struct memory footprint: 2688 bytes
[    0.000000] kmemleak: Kernel memory leak detector disabled
[    0.000000] tsc: Detected 2593.746 MHz processor
[    0.123552] Calibrating delay loop (skipped) preset value.. 5187.49 BogoMIPS (lpj=2593746)
[    0.124514] pid_max: default: 32768 minimum: 301
[    0.125115] Security Framework initialized
[    0.125578] SELinux:  Initializing.
[    0.126156] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.127064] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.127845] Mount-cache hash table entries: 4096
[    0.129197] Initializing cgroup subsys memory
[    0.129714] Initializing cgroup subsys devices
[    0.130247] Initializing cgroup subsys freezer
[    0.130747] Initializing cgroup subsys net_cls
[    0.131261] Initializing cgroup subsys blkio
[    0.131735] Initializing cgroup subsys perf_event
[    0.132269] Initializing cgroup subsys hugetlb
[    0.132755] Initializing cgroup subsys pids
[    0.133238] Initializing cgroup subsys net_prio

Comment by Gerrit Updater [ 03/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27148/
Subject: LU-9485 test: revert a change from LU-5361
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e125515c4c48094379cf828085ddcd166861cb6a

Comment by Jinshan Xiong (Inactive) [ 05/Jun/17 ]

Sarah - as we discussed last Friday, please make crash dump work and enable MALLOC debug only.

Comment by Sarah Liu [ 05/Jun/17 ]

please check crash dump on
[root@trevis-58vm1 ~]# ls /var/crash/
127.0.0.1-2017-06-05-22:42:02

Comment by Jinshan Xiong (Inactive) [ 06/Jun/17 ]

I found something about this issue.

It turned out the previous data structure on the RCU list was as follows:

[ 3782.106995] next = ffff8800afe2d2a7, list = ffff88003553dd00, list func:           (null)
[ 3782.107782] next list: next = ff88007085b2f0ff, list = ffff8800afe2d2a7, list_func = ffffffa0b53fe0ff

Unfortunately it doesn't come with a free function that means it should be freed by free_rcu. I also turned on malloc debug log but it seems this piece of memory has never been used by Lustre. It's not the corresponding dentry that failed inode initialization either.

Comment by Oleg Drokin [ 08/Jun/17 ]

There's interesting message I see in the above dmesg dump:

[ 1640.994159] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)

I guess that means a double free or something similar?

Comment by Jinshan Xiong (Inactive) [ 13/Jun/17 ]

Oleg - if this were double free, wouldn't it be caught at the 2nd time of call_rcu()?

Comment by Zhenyu Xu [ 14/Jun/17 ]

FYI, tried several error injections on my local VM (ldiskfs backend), cannot trigger the memory corruption yet.

Comment by Sarah Liu [ 14/Jun/17 ]

this issue is easy to reproduce on zfs.

Comment by Gerrit Updater [ 15/Jun/17 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27658
Subject: LU-9485 debug: check a suspicion
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 233342f4850ce7dea0c5026b5d09b0f8dc773968

Comment by Zhenyu Xu [ 15/Jun/17 ]

Hi Sarah,

Can you try to reproduce this issue again with this debug patch? I want to know whether it has something to do with the memory corruption, if the reproduction fails the assertion which I adds, then I'll make another fix patch.

Comment by Sarah Liu [ 15/Jun/17 ]

Sure, I will update the ticket if I get sth.

Comment by Sarah Liu [ 15/Jun/17 ]

Bobi,

Your debug patch is based on Jinshan's https://review.whamcloud.com/#/c/27148/, with his patch, it is not easy to reproduce. So I added the line back to test-framework.sh and reproduce it.

running: conf-sanity ONLY=46a 
run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh
[15069.537681] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Jun 15 21:38:44 UTC 2017
[15069.902165] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101
[15070.034791] Lustre: Unmounted lustre-client
[15119.059428] Key type lgssc unregistered
[15119.059933] Lustre: 18766:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
[15124.614091] LNet: Removed LNI 10.9.6.136@tcp
[15125.631080] LNet: HW nodes: 1, HW CPU cores: 4, npartitions: 1
[15125.634023] alg: No test for adler32 (adler32-zlib)
[15125.634592] alg: No test for crc32 (crc32-table)
[15126.534615] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:39:40 (1497562780)
[15128.207906] Lustre: Lustre: Build Version: 2.9.59_16_g233342f
[15128.232085] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
[15128.232728] LNet: Accept secure, port 988
[15129.739865] Lustre: 19557:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
[15129.741378] Lustre: 19557:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
[15129.742225] Key type lgssc registered
[15129.794743] Lustre: Echo OBD driver; http://www.lustre.org/
[15167.777159] Lustre: Mounted lustre-client
[15173.996457] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid 40
[15174.200433] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid in FULL state after 0 sec
[15174.277151] Lustre: Mounted lustre-client
[15174.378436] LustreError: 21918:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
[15174.379473] Lustre: 21918:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
[15174.380433] Lustre: 21918:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
[15174.381483] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
[15174.382314] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
[15174.383184] LustreError: 21918:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
[15174.384297] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 
[15174.385191] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG

Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
 kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 

Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
 kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG
[15174.387462] Pid: 21918, comm: stat
[15174.387797] 
[15174.387797] Call Trace:
[15174.388182]  [<ffffffffa05467ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
[15174.388817]  [<ffffffffa054687c>] lbug_with_loc+0x4c/0xb0 [libcfs]
[15174.389508]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
[15174.390133]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
[15174.390717]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
[15174.391397]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
[15174.391956]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
[15174.392599]  [<ffffffffa08cb580>] ? lustre_msg_buf+0x0/0x60 [ptlrpc]
[15174.393298]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
[15174.393957]  [<ffffffffa08cf4e0>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
[15174.394632]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
[15174.395373]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
[15174.395998]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
[15174.396518]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
[15174.397147]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
[15174.397660]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
[15174.398373]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
[15174.398983]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
[15174.399579]  [<ffffffff81208add>] lookup_real+0x1d/0x50
[15174.400159]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
[15174.400684]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
[15174.401252]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
[15174.401783]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
[15174.402346]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
[15174.402945]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
[15174.403480]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
[15174.404092]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
[15174.404623]  [<ffffffff81210451>] user_path_at+0x11/0x20
[15174.405174]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
[15174.405679]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
[15174.406259]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
[15174.406853]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
[15174.407435]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
[15174.408068]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
[15174.408581]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
[15174.409197] 
[15174.409364] Kernel panic - not syncing: LBUG
[15174.409818] CPU: 3 PID: 21918 Comm: stat Tainted: G           OE  ------------   3.10.0-514.21.1.el7.x86_64 #1
[15174.410752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[15174.411297]  ffffffffa0564e8b 000000000694263c ffff880081abf820 ffffffff81686f13
[15174.412073]  ffff880081abf8a0 ffffffff8168031a ffffffff00000008 ffff880081abf8b0
[15174.412848]  ffff880081abf850 000000000694263c 000000000694263c ffff88013fd8f838
[15174.413621] Call Trace:
[15174.413871]  [<ffffffff81686f13>] dump_stack+0x19/0x1b
[15174.414358]  [<ffffffff8168031a>] panic+0xe3/0x1f2
[15174.414825]  [<ffffffffa0546894>] lbug_with_loc+0x64/0xb0 [libcfs]
[15174.415415]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
[15174.416038]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
[15174.416601]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
[15174.417216]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
[15174.417773]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
[15174.418414]  [<ffffffffa08cb580>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
[15174.419074]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
[15174.419852]  [<ffffffffa08cf4e0>] ? lustre_swab_generic_32s+0x20/0x20 [ptlrpc]
[15174.420543]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
[15174.421200]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
[15174.421830]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
[15174.422328]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
[15174.422904]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
[15174.423403]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
[15174.424179]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
[15174.424789]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
[15174.425368]  [<ffffffff81208add>] lookup_real+0x1d/0x50
[15174.425864]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
[15174.426378]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
[15174.426878]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
[15174.427409]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
[15174.427971]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
[15174.428504]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
[15174.429036]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
[15174.429587]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
[15174.430112]  [<ffffffff81210451>] user_path_at+0x11/0x20
[15174.430631]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
[15174.431135]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
[15174.431648]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
[15174.432238]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
[15174.432815]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
[15174.433365]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
[15174.433864]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-514.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu May 25 17:04:51 UTC 2017
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.21.1.el7.x86_64 root=UUID=f513f02d-54c7-42d0-bc3f-1a4b0d47ac07 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
Comment by Zhenyu Xu [ 16/Jun/17 ]

Would you please add the line back to test-framework.sh and apply the updated patch (patchset 2) and try to reproduce it? Thank you.

Comment by Sarah Liu [ 16/Jun/17 ]

I update the patch base on yours with the problematic line reinstated and adding test-parameter.

Comment by Sarah Liu [ 16/Jun/17 ]

I have run patchset #2 locally with the line added several times and cannot reproduce it this time, but the test failed as
https://testing.hpdd.intel.com/test_sets/3cec9368-52e0-11e7-8a1b-5254006e85c2

[19954.374299] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:51:35 (1497649895)
[19956.043116] Lustre: Lustre: Build Version: 2.9.59_16_gbe2e8e5
[19956.068256] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
[19956.068913] LNet: Accept secure, port 988
[19957.576952] Lustre: 2814:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
[19957.578575] Lustre: 2814:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
[19957.579450] Key type lgssc registered
[19957.631952] Lustre: Echo OBD driver; http://www.lustre.org/
[19994.598809] Lustre: Mounted lustre-client
[19999.823173] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid 40
[20000.027082] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid in FULL state after 0 sec
[20000.102424] Lustre: Mounted lustre-client
[20000.202406] LustreError: 5182:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
[20000.203449] Lustre: 5182:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
[20000.204419] Lustre: 5182:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
[20000.205502] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
[20000.206337] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
[20000.207177] LustreError: 5182:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
[20000.208389] LustreError: 5182:0:(llite_lib.c:2273:ll_prep_inode()) new_inode -fatal: rc -22
[20000.368646] Lustre: DEBUG MARKER: conf-sanity test_46a: @@@@@@ FAIL: stat /mnt/lustre/widestripe failed
[20002.819582] LustreError: 5512:0:(obd_config.c:1429:class_process_proc_param()) lov.: error writing proc entry 'stripecount': rc = -34
[20008.013856] Lustre: Unmounted lustre-client
conf-sanity returned 0
Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre (opts:)
Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre2 (opts:)
Stopping /mnt/lustre-mds1 (opts:-f) on trevis-73vm11
[20031.824103] Key type lgssc unregistered
[20031.824613] Lustre: 6228:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
[20037.824058] LNet: Removed LNI 10.9.6.136@tcp
modules unloaded.
Finished at Fri Jun 16 21:53:00 UTC 2017 in 176s
auster: completed with rc 0

Comment by Zhenyu Xu [ 17/Jun/17 ]

that's a good news, meaning that the patch fixes the memory issue.

Comment by Peter Jones [ 17/Jun/17 ]

Bobijam

So patchset 2 was more than just a diagnostic patch - it was a possible (and now proven) fix? Does that mean that we should rollback to v2 of the patch and get reviews with a view to landing this patch?

Peter

Comment by Zhenyu Xu [ 17/Jun/17 ]

yes, patchset 2 should be the fix patch, I'll update it and request reviews for future landing.

Comment by Gerrit Updater [ 22/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27658/
Subject: LU-9485 llite: check the return value of cl_file_inode_init()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ca41d88d9228e469a8728c20c21c7e919b77dc12

Comment by Peter Jones [ 22/Jun/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:26:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.