[LU-7343] sanity test_129: iam_lfix_init_new+0x5/0x20 [osd_ldiskfs] Created: 27/Oct/15 Updated: 04/Feb/16 Resolved: 30/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/012ee7ca-7c33-11e5-9851-5254006e85c2.
The sub-test test_129 failed with the following error:
test failed to respond and timed out
kernel panic
19:17:40:[ 5923.422794] Lustre: DEBUG MARKER: test -e /sys/fs/ldiskfs/dm-3/max_dir_size
19:17:40:[ 5923.504780] Lustre: DEBUG MARKER: echo 12288 >/sys/fs/ldiskfs/dm-3/max_dir_size
19:17:40:[ 5929.137273] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
19:17:40:[ 5929.140181] IP: [<ffffffffa0c7e965>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
19:17:40:[ 5929.140181] PGD 0
19:17:40:[ 5929.140181] Oops: 0000 [#1] SMP
19:17:40:[ 5929.140181] Modules linked in: osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgc(OEN) osd_ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) sha512_generic(E) crypto_null(E) libcfs(OEN) ldiskfs(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) sunrpc(E) fscache(E) iscsi_boot_sysfs(E) af_packet(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ppdev(E) parport_pc(E) parport(E) pvpanic(E) virtio_balloon(E) 8139too(E) 8139cp(E) mii(E) processor(E) i2c_piix4(E) serio_raw(E) pcspkr(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) ata_piix(E) virtio_blk(E) ahci(E) libahci(E) floppy(E) uhci_hcd(E) ehci_hcd(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) ttm(E) drm(E) virtio_pci(E) virtio_ring(E) virtio(E) usbcore(E) usb_common(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
19:17:40:[ 5929.140181] Supported: No, Unsupported modules are loaded
19:17:40:[ 5929.140181] CPU: 0 PID: 9485 Comm: mdt00_004 Tainted: G OEN 3.12.48-52.27_lustre.gaaf427b-default #1
19:17:40:[ 5929.140181] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
19:17:40:[ 5929.140181] task: ffff88005416c1c0 ti: ffff88005416e000 task.ti: ffff88005416e000
19:17:40:[ 5929.140181] RIP: 0010:[<ffffffffa0c7e965>] [<ffffffffa0c7e965>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
19:17:40:[ 5929.140181] RSP: 0018:ffff88005416f518 EFLAGS: 00010286
19:17:40:[ 5929.140181] RAX: ffff880052ec9978 RBX: ffff88005416f598 RCX: 000000000000250d
19:17:40:[ 5929.140181] RDX: ffffffffa0cb4a20 RSI: ffffffffffffffe4 RDI: ffff88005ae88a88
19:17:40:[ 5929.140181] RBP: ffff8800384aa6d8 R08: 7010000000000000 R09: 007a6db9b8080000
19:17:40:[ 5929.140181] R10: ff679264e3666e02 R11: 000000000000000f R12: ffff880054044338
19:17:40:[ 5929.140181] R13: ffff880054044360 R14: ffff88005416f678 R15: ffff88005ae88a88
19:17:40:[ 5929.140181] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
19:17:40:[ 5929.140181] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
19:17:40:[ 5929.140181] CR2: 000000000000000c CR3: 0000000036421000 CR4: 00000000000006f0
19:17:40:[ 5929.140181] Stack:
19:17:40:[ 5929.140181] ffffffffa0c7dd8e ffff88005416f598 ffff880052ec9978 ffff88005d0533e8
19:17:40:[ 5929.140181] ffffffffa0c7c6ee 0000000000000000 ffff880052ec9978 ffffffffffffffe4
19:17:41:[ 5929.140181] 0000000000000000 ffff880054044338 ffff8800384aa6d8 ffff880054044360
19:17:41:[ 5929.140181] Call Trace:
19:17:41:[ 5929.140181] [<ffffffffa0c7dd8e>] iam_add_rec+0x1ce/0x2c0 [osd_ldiskfs]
19:17:41:[ 5929.140181] [<ffffffffa0c7e650>] iam_insert+0x90/0xe0 [osd_ldiskfs]
19:17:41:[ 5929.140181] [<ffffffffa0c75eb5>] osd_oi_iam_refresh.isra.15+0x125/0x2a0 [osd_ldiskfs]
19:17:41:[ 5929.140181] [<ffffffffa0c789df>] osd_oi_insert+0x13f/0x490 [osd_ldiskfs]
19:17:41:[ 5929.140181] [<ffffffffa0c73122>] osd_object_ea_create+0x632/0xb90 [osd_ldiskfs]
19:17:41:[ 5929.140181] [<ffffffffa0ef11ec>] lod_sub_object_create+0x1ec/0x470 [lod]
19:17:41:[ 5929.140181] [<ffffffffa0ee87e7>] lod_object_create+0xa7/0x200 [lod]
19:17:41:[ 5929.140181] [<ffffffffa0f51571>] mdd_object_create_internal+0xb1/0x270 [mdd]
19:17:41:[ 5929.140181] [<ffffffffa0f3c735>] mdd_object_create+0x55/0xa30 [mdd]
19:17:41:[ 5929.140181] [<ffffffffa0f47940>] mdd_create+0xca0/0x1240 [mdd]
19:17:41:[ 5929.140181] [<ffffffffa0e19952>] mdt_reint_open+0x2082/0x30c0 [mdt]
19:17:41:[ 5929.140181] [<ffffffffa0e0e1e6>] mdt_reint_rec+0x76/0x200 [mdt]
19:17:41:[ 5929.140181] [<ffffffffa0df2547>] mdt_reint_internal+0x5c7/0xaa0 [mdt]
19:17:41:[ 5929.140181] [<ffffffffa0df2b82>] mdt_intent_reint+0x162/0x410 [mdt]
19:17:41:[ 5929.140181] [<ffffffffa0dfbecc>] mdt_intent_policy+0x59c/0xb50 [mdt]
19:17:41:[ 5929.140181] [<ffffffffa09d7f73>] ldlm_lock_enqueue+0x323/0x890 [ptlrpc]
19:17:41:[ 5929.140181] [<ffffffffa0a00661>] ldlm_handle_enqueue0+0x741/0x1870 [ptlrpc]
19:17:41:[ 5929.140181] [<ffffffffa0a83b1d>] tgt_enqueue+0x5d/0x210 [ptlrpc]
19:17:41:[ 5929.140181] [<ffffffffa0a87513>] tgt_request_handle+0x7e3/0x1190 [ptlrpc]
19:17:41:[ 5929.140181] [<ffffffffa0a31da9>] ptlrpc_server_handle_request+0x209/0xa70 [ptlrpc]
19:17:42:[ 5929.140181] [<ffffffffa0a354ba>] ptlrpc_main+0xb2a/0x1ea0 [ptlrpc]
19:17:42:[ 5929.140181] [<ffffffff81077114>] kthread+0xb4/0xc0
19:17:42:[ 5929.140181] [<ffffffff81521198>] ret_from_fork+0x58/0x90
19:17:42:[ 5929.140181] Code: 8b 72 18 0f b7 40 02 01 ce 0f af c6 48 98 48 03 47 10 48 39 47 18 0f 94 c0 0f b6 c0 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <48> 8b 46 28 ba 76 19 00 00 31 c9 66 89 10 66 89 48 02 c3 0f 1f
20:17:47:********** Timeout by autotest system **********
Info required for matching: sanity 129 |
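One possible reading of the register dump above, offered as an interpretation and not something this ticket confirms: the faulting instruction bytes `<48> 8b 46 28` decode to `mov 0x28(%rsi),%rax`, RSI holds `ffffffffffffffe4`, and CR2 (the faulting address) is `000000000000000c`. The Python sketch below checks that arithmetic. It suggests RSI may have held a negative errno value (-28, ENOSPC) that was dereferenced as if it were a valid pointer. The helper names are illustrative and are not part of Lustre or the kernel.

```python
import errno

MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def fault_address(base, displacement):
    """Effective address of a base+displacement access, wrapped to 64 bits."""
    return (base + displacement) & MASK64


# Values copied from the oops in the description.
rsi = 0xFFFFFFFFFFFFFFE4   # RSI at the time of the fault
cr2 = 0x000000000000000C   # CR2: address that could not be accessed
disp = 0x28                # displacement in `mov 0x28(%rsi),%rax`

signed_rsi = to_signed64(rsi)
# Assumption: a small negative value in a pointer register is an errno code.
errno_name = errno.errorcode.get(-signed_rsi)
computed = fault_address(rsi, disp)

print(f"RSI as signed integer: {signed_rsi} ({errno_name})")
print(f"RSI + {disp:#x} = {computed:#018x}, CR2 = {cr2:#018x}")
print("matches CR2:", computed == cr2)
```

If this reading is right, it would be consistent with test_129 lowering max_dir_size just before the crash, but the root cause is not stated in the description itself.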
| Comments |
| Comment by Bob Glossman (Inactive) [ 28/Oct/15 ] |
|
another instance, also sles12 on master: |
| Comment by Bob Glossman (Inactive) [ 28/Oct/15 ] |
|
this only started happening in the last few days. I'm wondering if something bad landed in master recently. I note there was at least one mod that touched a sles12 ldiskfs patch: |
| Comment by Sarah Liu [ 04/Nov/15 ] |
|
I also hit this problem when testing interop between 2.7.0 client and master DNE server:
server: lustre-master build #3226 RHEL7
[15575.420548] Lustre: DEBUG MARKER: == sanity test 129: test directory size limit ========================== 17:59:30 (1446602370)
[15579.590647] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[15579.601130] IP: [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15579.610740] PGD 0
[15579.614558] Oops: 0000 [#1] SMP
[15579.619693] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) xprtrdma sunrpc ib_isert iscsi_target_mod intel_powerclamp ib_iser coretemp libiscsi intel_rapl scsi_transport_iscsi kvm_intel kvm crct10dif_pclmul ib_srpt crc32_pclmul crc32c_intel ioatdma iTCO_wdt ghash_clmulni_intel target_core_mod iTCO_vendor_support ipmi_devintf aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me sb_edac lpc_ich i2c_i801 mei pcspkr edac_core shpchp mfd_core wmi ipmi_si ipmi_msghandler ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mlx4_ib ib_sa mlx4_en ib_mad vxlan ib_core ip_tunnel ib_addr mgag200 syscopyarea sysfillrect sysimgblt isci drm_kms_helper igb ttm libsas ahci ptp scsi_transport_sas libahci pps_core drm mlx4_core libata dca i2c_algo_bit ntb i2c_core [last unloaded: llog_test]
[15579.739283] CPU: 17 PID: 6104 Comm: mdt00_011 Tainted: GF IO-------------- 3.10.0-229.14.1.el7_lustre.x86_64 #1
[15579.753381] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.99.99.x045.022820121209 02/28/2012
[15579.766718] task: ffff88041ac0ad80 ti: ffff8808085dc000 task.ti: ffff8808085dc000
[15579.776980] RIP: 0010:[<ffffffffa0dc0635>] [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15579.789649] RSP: 0018:ffff8808085df420 EFLAGS: 00010286
[15579.797534] RAX: ffff88042509d750 RBX: ffff8808085df4b0 RCX: 0000000000000000
[15579.807480] RDX: ffffffffa0df09a0 RSI: ffffffffffffffe4 RDI: ffff88041d5f2508
[15579.817405] RBP: ffff8808085df498 R08: ffff88042509d098 R09: 0000000000000000
[15579.827360] R10: ffff8800a97a8394 R11: 000000000000000f R12: ffff88042edb8cf0
[15579.837310] R13: ffff8803e649c338 R14: ffff88041d5f2508 R15: ffff8803e649c360
[15579.847214] FS: 0000000000000000(0000) GS:ffff88042f720000(0000) knlGS:0000000000000000
[15579.858203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15579.866560] CR2: 000000000000000c CR3: 000000000190e000 CR4: 00000000000407e0
[15579.876464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15579.886310] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15579.896160] Stack:
[15579.900258] ffffffffa0dbf8bc 0000000000000000 ffff880417a42708 ffff88042509d750
[15579.910404] ffff8808085df4b0 ffff8808085df590 ffff8808085df4a8 ffff88042509d090
[15579.920543] ffffffffffffffe4 00000000c5388ef7 0000000000000000 ffff8803e649c338
[15579.930668] Call Trace:
[15579.935197] [<ffffffffa0dbf8bc>] ? iam_add_rec+0x1fc/0x2e0 [osd_ldiskfs]
[15579.944606] [<ffffffffa0dc1070>] ? iam_lfix_split+0x140/0x140 [osd_ldiskfs]
[15579.954306] [<ffffffffa0dc024e>] iam_insert+0xce/0x120 [osd_ldiskfs]
[15579.963316] [<ffffffffa0db73e5>] osd_oi_iam_refresh.isra.15+0x125/0x2a0 [osd_ldiskfs]
[15579.973982] [<ffffffffa0db9f77>] osd_oi_insert+0x147/0x490 [osd_ldiskfs]
[15579.983426] [<ffffffffa05044c4>] ? libcfs_log_return+0x24/0x30 [libcfs]
[15579.992721] [<ffffffffa0db4652>] osd_object_ea_create+0x632/0xb90 [osd_ldiskfs]
[15580.002779] [<ffffffffa101eb82>] lod_sub_object_create+0x1f2/0x480 [lod]
[15580.012096] [<ffffffffa101614f>] lod_object_create+0xaf/0x200 [lod]
[15580.020908] [<ffffffffa107b965>] mdd_object_create_internal+0xb5/0x280 [mdd]
[15580.030542] [<ffffffffa10669f6>] mdd_object_create+0x76/0xa30 [mdd]
[15580.039294] [<ffffffffa10706e7>] ? mdd_declare_create+0x447/0xd30 [mdd]
[15580.048435] [<ffffffffa1071ca0>] mdd_create+0xcd0/0x1270 [mdd]
[15580.056712] [<ffffffffa0f4c90e>] mdt_reint_open+0x20ce/0x3110 [mdt]
[15580.065422] [<ffffffffa0f410d0>] mdt_reint_rec+0x80/0x210 [mdt]
[15580.073723] [<ffffffffa0f24969>] mdt_reint_internal+0x5d9/0xab0 [mdt]
[15580.082606] [<ffffffffa0f24fa2>] mdt_intent_reint+0x162/0x410 [mdt]
[15580.091281] [<ffffffffa0f2e7ea>] mdt_intent_policy+0x57a/0xb50 [mdt]
[15580.100079] [<ffffffffa0a831b3>] ldlm_lock_enqueue+0x353/0x8c0 [ptlrpc]
[15580.109156] [<ffffffffa0aac702>] ldlm_handle_enqueue0+0x762/0x1850 [ptlrpc]
[15580.118555] [<ffffffffa0507827>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15580.127459] [<ffffffffa0b32862>] tgt_enqueue+0x62/0x210 [ptlrpc]
[15580.135720] [<ffffffffa0b36f23>] tgt_request_handle+0x7f3/0x1190 [ptlrpc]
[15580.144825] [<ffffffffa0adf1bb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[15580.154804] [<ffffffffa0adcfc8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[15580.163735] [<ffffffffa0507827>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15580.172470] [<ffffffffa0ae2960>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
[15580.180770] [<ffffffffa0ae1df0>] ? ptlrpc_register_service+0xfc0/0xfc0 [ptlrpc]
[15580.190286] [<ffffffff810973af>] kthread+0xcf/0xe0
[15580.196951] [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
[15580.205457] [<ffffffff81615198>] ret_from_fork+0x58/0x90
[15580.212643] [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
[15580.221095] Code: 40 28 48 8b 0a 8b 72 18 0f b7 40 02 01 ce 0f af c6 48 98 48 03 47 10 48 39 47 18 0f 94 c0 0f b6 c0 c3 0f 1f 40 00 66 66 66 66 90 <48> 8b 46 28 55 ba 76 19 00 00 31 c9 48 89 e5 66 89 10 66 89 48
[15580.245333] RIP [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15580.254621] RSP <ffff8808085df420>
[15580.259700] CR2: 000000000000000c
[15581.093976] ---[ end trace cc2db69d6da44786 ]---
[15581.177368] Kernel panic - not syncing: Fatal exception
Message from syslogd@onyx-25
[15581.273758] drm_kms_helper: panic occurred, switching back to text console
[15581.283334] ------------[ cut here ]------------ |
| Comment by Peter Jones [ 04/Nov/15 ] |
|
Fan Yong, could you please look into this issue? Thanks, Peter |
| Comment by Gerrit Updater [ 13/Nov/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/17148 |
| Comment by nasf (Inactive) [ 13/Nov/15 ] |
|
This patch affects all 3.1x-based kernels, such as those on the SLES12 and RHEL7 platforms. It should land before 2.8 GA. |
| Comment by Gerrit Updater [ 30/Nov/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17148/ |
| Comment by Joseph Gmitter (Inactive) [ 30/Nov/15 ] |
|
Landed for 2.8 |