Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version: Lustre 2.8.0
Severity: 3
Description
Hit this during a local racer test on master.
5[77465]: segfault at 8 ip 00000031f720b3f3 sp 00007fff8f7c3e50 error 4 in ld-2.12.so[31f7200000+20000]
LustreError: 20700:0:(mdd_object.c:70:mdd_la_get()) lustre-MDD0000: object [0x200000404:0x4774:0x0] not found: rc = -2
LustreError: 20700:0:(mdd_object.c:70:mdd_la_get()) Skipped 1 previous similar message
Lustre: 42406:0:(osd_internal.h:1087:osd_trans_exec_check()) op 9: used 10, used now 10, reserved 5
Lustre: 42406:0:(osd_handler.c:902:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0
Lustre: 42406:0:(osd_handler.c:909:osd_trans_dump_creds()) attr_set: 2/2/0, xattr_set: 1/64/0
Lustre: 42406:0:(osd_handler.c:919:osd_trans_dump_creds()) write: 6/14/0, punch: 0/0/0, quota 4/4/0
Lustre: 42406:0:(osd_handler.c:926:osd_trans_dump_creds()) insert: 0/0/0, delete: 1/5/10
Lustre: 42406:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 1/1/0
LustreError: 42406:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
Pid: 42406, comm: mdt_out00_004
Call Trace:
[<ffffffffa05b4875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa05b4e77>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0f30631>] osd_index_ea_delete+0x7b1/0xe10 [osd_ldiskfs]
[<ffffffffa0999f90>] out_obj_index_delete+0x150/0x370 [ptlrpc]
[<ffffffffa099a1d8>] out_tx_index_delete_exec+0x28/0x190 [ptlrpc]
[<ffffffffa098e0ca>] out_tx_end+0xda/0x5d0 [ptlrpc]
[<ffffffffa09931df>] out_handle+0x7af/0x1950 [ptlrpc]
[<ffffffffa05c0c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa098afc2>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
[<ffffffffa09331a1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
[<ffffffffa0932360>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
[<ffffffff8109e66e>] kthread+0x9e/0xc0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109e5d0>] ? kthread+0x0/0xc0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1438790617.42406
Message from syslogd@testnode at Aug 5 09:48:22 ...
kernel:LustreError: 8393:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
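For triage, the credit dump in the log can be scanned mechanically. The following is a minimal sketch, not part of Lustre: the helper name over_budget is hypothetical, and it assumes each triple printed by osd_trans_dump_creds() reads as operations/declared credits/used credits. That reading is inferred from this log only ("delete: 1/5/10" appears alongside "used 10 ... reserved 5") and has not been checked against the source.

```python
import re

# One "name: a/b/c" triple from an osd_trans_dump_creds() message.
# The "quota" field is printed without a colon in the log, so the colon is optional.
TRIPLE = re.compile(r"(\w+):?\s+(\d+)/(\d+)/(\d+)")
TAG = "osd_trans_dump_creds())"


def over_budget(log_text):
    """Return {op: (declared, used)} for ops whose used credits exceed declared.

    Assumption (inferred from the log, not from Lustre source): the three
    numbers are operations / declared credits / used credits.
    """
    found = {}
    for line in log_text.splitlines():
        if TAG not in line:
            continue
        # Only scan the message text that follows the function tag.
        message = line.split(TAG, 1)[1]
        for name, _ops, declared, used in TRIPLE.findall(message):
            if int(used) > int(declared):
                found[name] = (int(declared), int(used))
    return found


sample = """\
Lustre: 42406:0:(osd_handler.c:902:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0
Lustre: 42406:0:(osd_handler.c:909:osd_trans_dump_creds()) attr_set: 2/2/0, xattr_set: 1/64/0
Lustre: 42406:0:(osd_handler.c:919:osd_trans_dump_creds()) write: 6/14/0, punch: 0/0/0, quota 4/4/0
Lustre: 42406:0:(osd_handler.c:926:osd_trans_dump_creds()) insert: 0/0/0, delete: 1/5/10
Lustre: 42406:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 1/1/0
"""

print(over_budget(sample))
```

On the dump from the description this flags only the delete operation, which matches the osd_index_ea_delete frame in the call trace.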
Attachments
- racer_LBUG.tar (0.2 kB)
Activity
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15924/
Subject: LU-6969 osd: remove agent inodes in a separate transaction
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0887b89c0c4e2b7c5a7ba3365e758a7d94c667fa
We hit this issue in sanity test_51e. With the patch indicated in this ticket (http://review.whamcloud.com/#/c/15924) applied, the issue did not reproduce even after running the test 50 times.
We have asked the test team to verify the patch against any scenarios in which this reproduces, and have also started multiple runs for the Intel-specific failures such as racer, test_1, etc.
Kalpak, did you test http://review.whamcloud.com/15924 to see if it fixes this issue?
Two more instances seen in el7 testing on master:
https://testing.hpdd.intel.com/test_sets/1960c0a2-60ef-11e5-b495-5254006e85c2
https://testing.hpdd.intel.com/test_sets/1a18a8ca-60ef-11e5-b495-5254006e85c2
This reproduces frequently in tests on el7 servers.
We have also hit this issue during master testing. It has reproduced multiple times in the last few days.
I think it would be good to mark this issue as a blocker for 2.8.0.
Another instance seen in el7 client/server testing on master:
https://testing.hpdd.intel.com/test_sets/4f0cbd82-5d6b-11e5-80c4-5254006e85c2
From the console log of the MDS:
14:33:34:[ 7016.394189] LustreError: 4685:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
14:33:34:[ 7016.394773] Pid: 4685, comm: mdt_out00_001
14:33:34:[ 7016.395093]
14:33:34:[ 7016.395093] Call Trace:
14:33:34:[ 7016.395500] [<ffffffffa062a7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
14:33:34:[ 7016.396023] [<ffffffffa062ad75>] lbug_with_loc+0x45/0xc0 [libcfs]
14:33:34:[ 7016.396542] [<ffffffffa0c08a5e>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs]
14:33:34:[ 7016.397083] [<ffffffffa0bdc857>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
14:33:34:[ 7016.397664] [<ffffffff811ac1be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
14:33:34:[ 7016.398235] [<ffffffffa0a30fb1>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
14:33:34:[ 7016.398805] [<ffffffffa076ae83>] ? lu_context_init+0xd3/0x1f0 [obdclass]
14:33:34:[ 7016.399351] [<ffffffffa0a311d5>] out_tx_index_delete_exec+0x25/0x180 [ptlrpc]
14:33:34:[ 7016.399985] [<ffffffffa0a2b98e>] out_tx_end+0xde/0x5e0 [ptlrpc]
14:33:34:[ 7016.400493] [<ffffffffa0a2f607>] out_handle+0xe77/0x18d0 [ptlrpc]
14:33:34:[ 7016.401083] [<ffffffffa097aaa0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
14:33:34:[ 7016.401606] [<ffffffffa0a25723>] tgt_request_handle+0x7f3/0x1190 [ptlrpc]
14:33:34:[ 7016.402134] [<ffffffffa09cdf5b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
14:33:34:[ 7016.402763] [<ffffffffa09cbd68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
14:33:34:[ 7016.403269] [<ffffffff810a9672>] ? default_wake_function+0x12/0x20
14:33:34:[ 7016.403758] [<ffffffff810a08a8>] ? __wake_up_common+0x58/0x90
14:33:34:[ 7016.404214] [<ffffffffa09d1700>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
14:33:34:[ 7016.404700] [<ffffffff810ad906>] ? __dequeue_entity+0x26/0x40
14:33:34:[ 7016.405131] [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
14:33:34:[ 7016.405583] [<ffffffffa09d0b90>] ? ptlrpc_main+0x0/0x1e90 [ptlrpc]
14:33:34:[ 7016.406057] [<ffffffff810973af>] kthread+0xcf/0xe0
14:33:34:[ 7016.406460] [<ffffffff810972e0>] ? kthread+0x0/0xe0
14:33:34:[ 7016.406813] [<ffffffff81615198>] ret_from_fork+0x58/0x90
14:33:34:[ 7016.407216] [<ffffffff810972e0>] ? kthread+0x0/0xe0
14:33:34:[ 7016.407627]
14:33:34:[ 7016.407840] Kernel panic - not syncing: LBUG
14:33:34:[ 7016.408176] CPU: 1 PID: 4685 Comm: mdt_out00_001 Tainted: GF O-------------- 3.10.0-229.14.1.el7_lustre.g630ab85.x86_64 #1
14:33:34:[ 7016.408827] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
14:33:34:[ 7016.408827] ffffffffa0647ecf 0000000001e4f211 ffff88007b4039c0 ffffffff8160533a
14:33:34:[ 7016.408827] ffff88007b403a40 ffffffff815febae ffffffff00000008 ffff88007b403a50
14:33:34:[ 7016.408827] ffff88007b4039f0 0000000001e4f211 ffffffffa0c0a7d0 0000000000000246
14:33:34:[ 7016.408827] Call Trace:
14:33:34:[ 7016.408827] [<ffffffff8160533a>] dump_stack+0x19/0x1b
14:33:34:[ 7016.408827] [<ffffffff815febae>] panic+0xd8/0x1e7
14:33:34:[ 7016.408827] [<ffffffffa062addb>] lbug_with_loc+0xab/0xc0 [libcfs]
14:33:34:[ 7016.408827] [<ffffffffa0c08a5e>] osd_trans_exec_check.part.91+0x1a/0x1a [osd_ldiskfs]
14:33:34:[ 7016.408827] [<ffffffffa0bdc857>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
14:33:34:[ 7016.408827] [<ffffffff811ac1be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
14:33:34:[ 7016.408827] [<ffffffffa0a30fb1>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa076ae83>] ? lu_context_init+0xd3/0x1f0 [obdclass]
14:33:34:[ 7016.408827] [<ffffffffa0a311d5>] out_tx_index_delete_exec+0x25/0x180 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa0a2b98e>] out_tx_end+0xde/0x5e0 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa0a2f607>] out_handle+0xe77/0x18d0 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa097aaa0>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa0a25723>] tgt_request_handle+0x7f3/0x1190 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa09cdf5b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa09cbd68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffff810a9672>] ? default_wake_function+0x12/0x20
14:33:34:[ 7016.408827] [<ffffffff810a08a8>] ? __wake_up_common+0x58/0x90
14:33:34:[ 7016.408827] [<ffffffffa09d1700>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffff810ad906>] ? __dequeue_entity+0x26/0x40
14:33:34:[ 7016.408827] [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
14:33:34:[ 7016.408827] [<ffffffffa09d0b90>] ? ptlrpc_register_service+0xfc0/0xfc0 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffff810973af>] kthread+0xcf/0xe0
14:33:34:[ 7016.408827] [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
14:33:34:[ 7016.408827] [<ffffffff81615198>] ret_from_fork+0x58/0x90
14:33:34:[ 7016.408827] [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
14:33:34:[ 7016.408827] drm_kms_helper: panic occurred, switching back to text console
14:33:34:[ 7016.408827] ------------[ cut here ]------------
14:33:34:[ 7016.408827] kernel BUG at arch/x86/mm/pageattr.c:216!
14:33:34:[ 7016.408827] invalid opcode: 0000 [#1] SMP
14:33:34:[ 7016.408827] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ppdev ib_sa serio_raw pcspkr virtio_balloon i2c_piix4 ib_mad parport_pc parport ib_core ib_addr ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus syscopyarea sysfillrect sysimgblt virtio_pci virtio_ring virtio drm_kms_helper 8139cp mii ata_piix ttm drm i2c_core libata floppy
14:33:34:[ 7016.408827] CPU: 1 PID: 4685 Comm: mdt_out00_001 Tainted: GF O-------------- 3.10.0-229.14.1.el7_lustre.g630ab85.x86_64 #1
14:33:34:[ 7016.408827] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
14:33:34:[ 7016.408827] task: ffff8800697e71c0 ti: ffff88007b400000 task.ti: ffff88007b400000
14:33:34:[ 7016.408827] RIP: 0010:[<ffffffff8105c2ef>] [<ffffffff8105c2ef>] change_page_attr_set_clr+0x4ef/0x500
14:33:34:[ 7016.408827] RSP: 0018:ffff88007b4031c0 EFLAGS: 00010046
14:33:34:[ 7016.408827] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000010
14:33:34:[ 7016.408827] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000080000000
14:33:34:[ 7016.408827] RBP: ffff88007b403258 R08: 0000000000000004 R09: 000000000006d4cb
14:33:34:[ 7016.408827] R10: 0000000000003689 R11: ffffffff811902af R12: 0000000000000010
14:33:34:[ 7016.408827] R13: 0000000000000000 R14: 0000000000000200 R15: 0000000000000005
14:33:34:[ 7016.408827] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
14:33:34:[ 7016.408827] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
14:33:34:[ 7016.408827] CR2: 00007f79ce24d018 CR3: 000000000190e000 CR4: 00000000000006e0
14:33:34:[ 7016.408827] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
14:33:34:[ 7016.408827] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
14:33:34:[ 7016.408827] Stack:
14:33:34:[ 7016.408827] 000000045a5a5a5a 0000000000000000 0000000000000000 ffff88006df2b000
14:33:34:[ 7016.408827] ffff8800697e71c0 0000000000000000 0000000000000000 0000000000000010
14:33:34:[ 7016.408827] 0000000000000000 0000000500000001 000000000006d4cb 0000020000000000
14:33:34:[ 7016.408827] Call Trace:
14:33:34:[ 7016.408827] [<ffffffff8105c646>] _set_pages_array+0xe6/0x130
14:33:34:[ 7016.408827] [<ffffffff8105c6c3>] set_pages_array_wc+0x13/0x20
14:33:34:[ 7016.408827] [<ffffffffa00cf3af>] ttm_set_pages_caching+0x2f/0x70 [ttm]
14:33:34:[ 7016.408827] [<ffffffffa00cf4f4>] ttm_alloc_new_pages.isra.7+0xb4/0x180 [ttm]
14:33:34:[ 7016.408827] [<ffffffffa00cfe50>] ttm_pool_populate+0x3e0/0x500 [ttm]
14:33:34:[ 7016.408827] [<ffffffffa013332e>] cirrus_ttm_tt_populate+0xe/0x10 [cirrus]
14:33:34:[ 7016.408827] [<ffffffffa00cc6dd>] ttm_bo_move_memcpy+0x65d/0x6e0 [ttm]
14:33:34:[ 7016.408827] [<ffffffff8118fa7e>] ? map_vm_area+0x2e/0x40
14:33:34:[ 7016.408827] [<ffffffffa00c82c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
14:33:34:[ 7016.408827] [<ffffffffa01332d8>] cirrus_bo_move+0x18/0x20 [cirrus]
14:33:34:[ 7016.408827] [<ffffffffa00c9de5>] ttm_bo_handle_move_mem+0x265/0x5b0 [ttm]
14:33:34:[ 7016.408827] [<ffffffff81601bf4>] ? __slab_free+0x10e/0x277
14:33:34:[ 7016.408827] [<ffffffff8118f273>] ? __free_vmap_area+0xb3/0xf0
14:33:34:[ 7016.408827] [<ffffffffa00ca74a>] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
14:33:34:[ 7016.408827] [<ffffffffa00cae17>] ttm_bo_validate+0x247/0x260 [ttm]
14:33:34:[ 7016.408827] [<ffffffff81059e69>] ? iounmap+0x79/0xa0
14:33:34:[ 7016.408827] [<ffffffff81050000>] ? kgdb_arch_late+0x80/0x180
14:33:34:[ 7016.408827] [<ffffffffa0133ac2>] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
14:33:34:[ 7016.408827] [<ffffffffa0131c84>] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
14:33:34:[ 7016.408827] [<ffffffffa0132479>] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
14:33:34:[ 7016.408827] [<ffffffffa00e8939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
14:33:34:[ 7016.408827] [<ffffffffa00e96bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
14:33:34:[ 7016.408827] [<ffffffffa0088711>] drm_mode_set_config_internal+0x61/0xe0 [drm]
14:33:34:[ 7016.408827] [<ffffffffa00f0e83>] restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
14:33:34:[ 7016.408827] [<ffffffffa00f1045>] drm_fb_helper_force_kernel_mode+0x75/0xb0 [drm_kms_helper]
14:33:34:[ 7016.408827] [<ffffffffa00f1d59>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
14:33:34:[ 7016.408827] [<ffffffff81610bec>] notifier_call_chain+0x4c/0x70
14:33:34:[ 7016.408827] [<ffffffff81610c4a>] atomic_notifier_call_chain+0x1a/0x20
14:33:34:[ 7016.408827] [<ffffffff815febdc>] panic+0x106/0x1e7
14:33:34:[ 7016.408827] [<ffffffffa062addb>] lbug_with_loc+0xab/0xc0 [libcfs]
14:33:34:[ 7016.408827] [<ffffffffa0c08a5e>] osd_trans_exec_check.part.91+0x1a/0x1a [osd_ldiskfs]
14:33:34:[ 7016.408827] [<ffffffffa0bdc857>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
14:33:34:[ 7016.408827] [<ffffffff811ac1be>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
14:33:34:[ 7016.408827] [<ffffffffa0a30fb1>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
14:33:34:[ 7016.408827] [<ffffffffa076ae83>] ? lu_context_init+0xd3/0x1f0 [obdclass]
14:33:34:[ 7016.408827] [<ffffffffa0a311d5>] out_tx_index_delete_exec+0x25/0x180 [ptlrpc]
15:34:29:********** Timeout by autotest system **********
I think several recent failures of el7 client/server on master that had been classified as LU-5500 are in fact more instances of this bug. One example is https://testing.hpdd.intel.com/test_sets/2da949ac-5213-11e5-aed3-5254006e85c2
More instances on master:
replay-dual: https://testing.hpdd.intel.com/test_sets/ce9cc5be-54ed-11e5-9cd2-5254006e85c2
sanity-quota: https://testing.hpdd.intel.com/test_sets/cfae1cb4-54ed-11e5-9cd2-5254006e85c2
racer: https://testing.hpdd.intel.com/test_sets/ccb8516e-54ed-11e5-9cd2-5254006e85c2
Hit this issue in sanity test_17n on the master branch:
https://testing.hpdd.intel.com/test_sets/d44ca36c-54ed-11e5-9cd2-5254006e85c2
client and server: lustre-master build# 3175 RHEL7 DNE
15:47:04:[ 1734.620293] Lustre: lustre-MDT0001: Recovery over after 0:04, of 5 clients 5 recovered and 0 were evicted.
15:47:04:[ 1734.797772] Lustre: 5127:0:(osd_internal.h:1087:osd_trans_exec_check()) op 9: used 8, used now 8, reserved 5
15:47:04:[ 1734.798682] Lustre: 5127:0:(osd_handler.c:902:osd_trans_dump_creds()) create: 1/8/0, destroy: 0/0/0
15:47:04:[ 1734.799491] Lustre: 5127:0:(osd_handler.c:909:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 0/0/0
15:47:04:[ 1734.800321] Lustre: 5127:0:(osd_handler.c:919:osd_trans_dump_creds()) write: 8/36/0, punch: 0/0/0, quota 2/2/0
15:47:04:[ 1734.801212] Lustre: 5127:0:(osd_handler.c:926:osd_trans_dump_creds()) insert: 2/53/0, delete: 1/5/8
15:47:04:[ 1734.802011] Lustre: 5127:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 1/1/0
15:47:04:[ 1734.802807] LustreError: 5127:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
15:47:04:[ 1734.803540] Pid: 5127, comm: mdt_out00_003
15:47:04:[ 1734.803938]
15:47:04:[ 1734.803938] Call Trace:
15:47:04:[ 1734.804394] [<ffffffffa06197d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
15:47:04:[ 1734.805110] [<ffffffffa0619d75>] lbug_with_loc+0x45/0xc0 [libcfs]
15:47:06:[ 1734.805766] [<ffffffffa0c0088e>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs]
15:47:06:[ 1734.806516] [<ffffffffa0bd47d7>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
15:47:06:[ 1734.807250] [<ffffffff811abe7e>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
15:47:06:[ 1734.808058] [<ffffffffa0a25d71>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
15:47:06:[ 1734.808864] [<ffffffffa0759d73>] ? lu_context_init+0xd3/0x1f0 [obdclass]
15:47:06:[ 1734.809583] [<ffffffffa0a25f95>] out_tx_index_delete_exec+0x25/0x180 [ptlrpc]
15:47:06:[ 1734.810395] [<ffffffffa0a2074e>] out_tx_end+0xde/0x5e0 [ptlrpc]
15:47:06:[ 1734.811057] [<ffffffffa0a243c7>] out_handle+0xe77/0x18d0 [ptlrpc]
15:47:06:[ 1734.811691] [<ffffffffa096f8f0>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
15:47:06:[ 1734.812429] [<ffffffffa0a1a479>] tgt_request_handle+0x719/0x1170 [ptlrpc]
15:47:06:[ 1734.813182] [<ffffffffa09c2d5b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
15:47:06:[ 1734.813883] [<ffffffffa09c0b68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
15:47:06:[ 1734.814498] [<ffffffff810a9662>] ? default_wake_function+0x12/0x20
15:47:06:[ 1734.815046] [<ffffffff810a0898>] ? __wake_up_common+0x58/0x90
15:47:06:[ 1734.815596] [<ffffffffa09c6500>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
15:47:07:[ 1734.816154] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
15:47:07:[ 1734.816687] [<ffffffffa09c5990>] ? ptlrpc_main+0x0/0x1e90 [ptlrpc]
15:47:07:[ 1734.817244] [<ffffffff8109739f>] kthread+0xcf/0xe0
15:47:07:[ 1734.817661] [<ffffffff810972d0>] ? kthread+0x0/0xe0
15:47:07:[ 1734.818124] [<ffffffff81615018>] ret_from_fork+0x58/0x90
15:47:07:[ 1734.818586] [<ffffffff810972d0>] ? kthread+0x0/0xe0
15:47:07:[ 1734.819039]
15:47:07:[ 1734.822140] Kernel panic - not syncing: LBUG
15:47:07:[ 1734.822532] CPU: 0 PID: 5127 Comm: mdt_out00_003 Tainted: GF O-------------- 3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
15:47:07:[ 1734.823024] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
15:47:07:[ 1734.823024] ffffffffa0636ecf 00000000fbb88fcf ffff8800643c79c0 ffffffff816051aa
15:47:07:[ 1734.823024] ffff8800643c7a40 ffffffff815fea1e ffffffff00000008 ffff8800643c7a50
15:47:07:[ 1734.823024] ffff8800643c79f0 00000000fbb88fcf ffffffffa0c027d0 0000000000000246
15:47:07:[ 1734.823024] Call Trace:
15:47:07:[ 1734.823024] [<ffffffff816051aa>] dump_stack+0x19/0x1b
15:47:07:[ 1734.823024] [<ffffffff815fea1e>] panic+0xd8/0x1e7
15:47:07:[ 1734.823024] [<ffffffffa0619ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
15:47:07:[ 1734.823024] [<ffffffffa0c0088e>] osd_trans_exec_check.part.91+0x1a/0x1a [osd_ldiskfs]
15:47:07:[ 1734.823024] [<ffffffffa0bd47d7>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
15:47:07:[ 1734.823024] [<ffffffff811abe7e>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
15:47:07:[ 1734.823024] [<ffffffffa0a25d71>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa0759d73>] ? lu_context_init+0xd3/0x1f0 [obdclass]
15:47:07:[ 1734.823024] [<ffffffffa0a25f95>] out_tx_index_delete_exec+0x25/0x180 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa0a2074e>] out_tx_end+0xde/0x5e0 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa0a243c7>] out_handle+0xe77/0x18d0 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa096f8f0>] ? target_send_reply_msg+0x170/0x170 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa0a1a479>] tgt_request_handle+0x719/0x1170 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa09c2d5b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffffa09c0b68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffff810a9662>] ? default_wake_function+0x12/0x20
15:47:07:[ 1734.823024] [<ffffffff810a0898>] ? __wake_up_common+0x58/0x90
15:47:07:[ 1734.823024] [<ffffffffa09c6500>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffff810ad8b6>] ? __dequeue_entity+0x26/0x40
15:47:07:[ 1734.823024] [<ffffffffa09c5990>] ? ptlrpc_register_service+0xfc0/0xfc0 [ptlrpc]
15:47:07:[ 1734.823024] [<ffffffff8109739f>] kthread+0xcf/0xe0
15:47:07:[ 1734.823024] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
15:47:07:[ 1734.823024] [<ffffffff81615018>] ret_from_fork+0x58/0x90
15:47:07:[ 1734.823024] [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
15:47:07:[ 1734.823024] drm_kms_helper: panic occurred, switching back to text console
15:47:07:[ 1734.823024] ------------[ cut here ]------------
15:47:07:[ 1734.823024] kernel BUG at arch/x86/mm/pageattr.c:216!
15:47:07:[ 1734.823024] invalid opcode: 0000 [#1] SMP
15:47:07:[ 1734.823024] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr virtio_balloon serio_raw i2c_piix4 parport_pc parport ext4 mbcache jbd2 ata_generic pata_acpi 8139too virtio_blk cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper 8139cp ttm ata_piix mii virtio_pci drm virtio_ring virtio i2c_core libata floppy
15:47:07:[ 1734.823024] CPU: 0 PID: 5127 Comm: mdt_out00_003 Tainted: GF O-------------- 3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
15:47:07:[ 1734.823024] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
15:47:07:[ 1734.823024] task: ffff8800642fc440 ti: ffff8800643c4000 task.ti: ffff8800643c4000
15:47:07:[ 1734.823024] RIP: 0010:[<ffffffff8105c2ef>] [<ffffffff8105c2ef>] change_page_attr_set_clr+0x4ef/0x500
15:47:08:[ 1734.823024] RSP: 0018:ffff8800643c71c0 EFLAGS: 00010046
15:47:08:[ 1734.823024] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000010
15:47:08:[ 1734.823024] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000080000000
15:47:08:[ 1734.823024] RBP: ffff8800643c7258 R08: 0000000000000004 R09: 000000000006adf7
15:47:08:[ 1734.823024] R10: 0000000000003689 R11: 0000000000000002 R12: 0000000000000010
15:47:08:[ 1734.823024] R13: 0000000000000000 R14: 0000000000000200 R15: 0000000000000005
15:47:08:[ 1734.823024] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
15:47:08:[ 1734.823024] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
15:47:08:[ 1734.823024] CR2: 00007f8a24fe8220 CR3: 000000000190e000 CR4: 00000000000006f0
15:47:08:[ 1734.823024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
15:47:08:[ 1734.823024] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
15:47:08:[ 1734.823024] Stack:
15:47:08:[ 1734.823024] 00000004a0a8a89c ffff880000000000 0000000000000000 ffff88006b751000
15:47:08:[ 1734.823024] ffff8800642fc440 0000000000000000 0000000000000000 0000000000000010
15:47:08:[ 1734.823024] 0000000000000000 0000000500000001 000000000006adf7 0000020000000000
15:47:08:[ 1734.823024] Call Trace:
15:47:08:[ 1734.823024] [<ffffffff8105c646>] _set_pages_array+0xe6/0x130
15:47:08:[ 1734.823024] [<ffffffff8105c6c3>] set_pages_array_wc+0x13/0x20
15:47:08:[ 1734.823024] [<ffffffffa00dd3af>] ttm_set_pages_caching+0x2f/0x70 [ttm]
15:47:08:[ 1734.823024] [<ffffffffa00dd4f4>] ttm_alloc_new_pages.isra.7+0xb4/0x180 [ttm]
15:47:08:[ 1734.823024] [<ffffffffa00dde50>] ttm_pool_populate+0x3e0/0x500 [ttm]
15:47:08:[ 1734.823024] [<ffffffffa012332e>] cirrus_ttm_tt_populate+0xe/0x10 [cirrus]
15:47:08:[ 1734.823024] [<ffffffffa00da6dd>] ttm_bo_move_memcpy+0x65d/0x6e0 [ttm]
15:47:08:[ 1734.823024] [<ffffffff8118f73e>] ? map_vm_area+0x2e/0x40
15:47:08:[ 1734.823024] [<ffffffffa00d62c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
15:47:08:[ 1734.823024] [<ffffffffa01232d8>] cirrus_bo_move+0x18/0x20 [cirrus]
15:47:08:[ 1734.823024] [<ffffffffa00d7de5>] ttm_bo_handle_move_mem+0x265/0x5b0 [ttm]
15:47:08:[ 1734.823024] [<ffffffff81601a64>] ? __slab_free+0x10e/0x277
15:47:08:[ 1734.823024] [<ffffffff8118ef33>] ? __free_vmap_area+0xb3/0xf0
15:47:08:[ 1734.823024] [<ffffffffa00d874a>] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
15:47:08:[ 1734.823024] [<ffffffffa00d8e17>] ttm_bo_validate+0x247/0x260 [ttm]
15:47:08:[ 1734.823024] [<ffffffff81059e69>] ? iounmap+0x79/0xa0
15:47:08:[ 1734.823024] [<ffffffff81050000>] ? kgdb_arch_late+0x80/0x180
15:47:08:[ 1734.823024] [<ffffffffa0123ac2>] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
15:47:08:[ 1734.823024] [<ffffffffa0121c84>] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
15:47:08:[ 1734.823024] [<ffffffffa0122479>] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
15:47:08:[ 1734.823024] [<ffffffffa0103939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
15:47:08:[ 1734.823024] [<ffffffffa01046bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
15:47:08:[ 1734.823024] [<ffffffffa0096711>] drm_mode_set_config_internal+0x61/0xe0 [drm]
15:47:08:[ 1734.823024] [<ffffffffa010be83>] restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
15:47:08:[ 1734.823024] [<ffffffffa010c045>] drm_fb_helper_force_kernel_mode+0x75/0xb0 [drm_kms_helper]
15:47:08:[ 1734.823024] [<ffffffffa010cd59>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
15:47:08:[ 1734.823024] [<ffffffff81610a6c>] notifier_call_chain+0x4c/0x70
15:47:08:[ 1734.823024] [<ffffffff81610aca>] atomic_notifier_call_chain+0x1a/0x20
15:47:08:[ 1734.823024] [<ffffffff815fea4c>] panic+0x106/0x1e7
15:47:08:[ 1734.823024] [<ffffffffa0619ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
15:47:08:[ 1734.823024] [<ffffffffa0c0088e>] osd_trans_exec_check.part.91+0x1a/0x1a [osd_ldiskfs]
15:47:08:[ 1734.823024] [<ffffffffa0bd47d7>] osd_index_ea_delete+0x6d7/0xad0 [osd_ldiskfs]
15:47:08:[ 1734.823024] [<ffffffff811abe7e>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
15:47:08:[ 1734.823024] [<ffffffffa0a25d71>] out_obj_index_delete+0x111/0x2f0 [ptlrpc]
16:46:00:********** Timeout by autotest system **********
Looking at the call trace, it seems we also hit this issue in the osd_xattr_set() path while running sanity test_51e.
LustreError: 122338:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
Lustre: 122338:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 1/1/1, ref_del: 0/0/0
Pid: 122338, comm: mdt01_006
Call Trace:
[<ffffffffa0521875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0521e77>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa10628c8>] osd_xattr_set+0x5d8/0x6c0 [osd_ldiskfs]
[<ffffffffa111c683>] lod_sub_object_xattr_set+0x223/0x460 [lod]
[<ffffffffa101268b>] ? ldiskfs_xattr_inode_get+0xdb/0xf0 [ldiskfs]
[<ffffffffa11088b6>] lod_xattr_set_internal+0x126/0x2b0 [lod]
[<ffffffffa11143b6>] lod_xattr_set+0x156/0x3e0 [lod]
[<ffffffffa101531b>] ? ldiskfs_xattr_trusted_get+0x2b/0x30 [ldiskfs]
[<ffffffffa0e9bd8b>] dt_xattr_set.clone.2+0x9b/0x1b0 [mdd]
[<ffffffffa0e9c78d>] mdd_links_write+0x12d/0x1f0 [mdd]
[<ffffffffa0ea2d72>] mdd_links_rename+0x302/0x540 [mdd]
[<ffffffffa0eaa736>] mdd_link+0xff6/0x1170 [mdd]
[<ffffffffa0f23c61>] mdt_reint_link+0x9b1/0xb40 [mdt]
[<ffffffffa0f1b6ac>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
[<ffffffffa0904882>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]
[<ffffffffa0f1f86d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa0f0b78b>] mdt_reint_internal+0x62b/0xb80 [mdt]
[<ffffffffa0946792>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
[<ffffffffa08ee3a1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
[<ffffffffa08ed560>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
[<ffffffff8109ac66>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109abd0>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Landed for 2.8.0