[LU-7343] sanity test_129: iam_lfix_init_new+0x5/0x20 [osd_ldiskfs] Created: 27/Oct/15  Updated: 04/Feb/16  Resolved: 30/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/012ee7ca-7c33-11e5-9851-5254006e85c2.

The sub-test test_129 failed with the following error:

test failed to respond and timed out

Kernel panic, seen with SLES12 on master.
Panic stack trace from the console log on MDS2:

19:17:40:[ 5923.422794] Lustre: DEBUG MARKER: test -e /sys/fs/ldiskfs/dm-3/max_dir_size
19:17:40:[ 5923.504780] Lustre: DEBUG MARKER: echo 12288 >/sys/fs/ldiskfs/dm-3/max_dir_size
19:17:40:[ 5929.137273] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
19:17:40:[ 5929.140181] IP: [<ffffffffa0c7e965>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
19:17:40:[ 5929.140181] PGD 0 
19:17:40:[ 5929.140181] Oops: 0000 [#1] SMP 
19:17:40:[ 5929.140181] Modules linked in: osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgc(OEN) osd_ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) sha512_generic(E) crypto_null(E) libcfs(OEN) ldiskfs(OEN) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) sunrpc(E) fscache(E) iscsi_boot_sysfs(E) af_packet(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ppdev(E) parport_pc(E) parport(E) pvpanic(E) virtio_balloon(E) 8139too(E) 8139cp(E) mii(E) processor(E) i2c_piix4(E) serio_raw(E) pcspkr(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) ata_piix(E) virtio_blk(E) ahci(E) libahci(E) floppy(E) uhci_hcd(E) ehci_hcd(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) drm_kms_helper(E) ttm(E) drm(E) virtio_pci(E) virtio_ring(E) virtio(E) usbcore(E) usb_common(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
19:17:40:[ 5929.140181] Supported: No, Unsupported modules are loaded
19:17:40:[ 5929.140181] CPU: 0 PID: 9485 Comm: mdt00_004 Tainted: G           OEN  3.12.48-52.27_lustre.gaaf427b-default #1
19:17:40:[ 5929.140181] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
19:17:40:[ 5929.140181] task: ffff88005416c1c0 ti: ffff88005416e000 task.ti: ffff88005416e000
19:17:40:[ 5929.140181] RIP: 0010:[<ffffffffa0c7e965>]  [<ffffffffa0c7e965>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
19:17:40:[ 5929.140181] RSP: 0018:ffff88005416f518  EFLAGS: 00010286
19:17:40:[ 5929.140181] RAX: ffff880052ec9978 RBX: ffff88005416f598 RCX: 000000000000250d
19:17:40:[ 5929.140181] RDX: ffffffffa0cb4a20 RSI: ffffffffffffffe4 RDI: ffff88005ae88a88
19:17:40:[ 5929.140181] RBP: ffff8800384aa6d8 R08: 7010000000000000 R09: 007a6db9b8080000
19:17:40:[ 5929.140181] R10: ff679264e3666e02 R11: 000000000000000f R12: ffff880054044338
19:17:40:[ 5929.140181] R13: ffff880054044360 R14: ffff88005416f678 R15: ffff88005ae88a88
19:17:40:[ 5929.140181] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
19:17:40:[ 5929.140181] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
19:17:40:[ 5929.140181] CR2: 000000000000000c CR3: 0000000036421000 CR4: 00000000000006f0
19:17:40:[ 5929.140181] Stack:
19:17:40:[ 5929.140181]  ffffffffa0c7dd8e ffff88005416f598 ffff880052ec9978 ffff88005d0533e8
19:17:40:[ 5929.140181]  ffffffffa0c7c6ee 0000000000000000 ffff880052ec9978 ffffffffffffffe4
19:17:41:[ 5929.140181]  0000000000000000 ffff880054044338 ffff8800384aa6d8 ffff880054044360
19:17:41:[ 5929.140181] Call Trace:
19:17:41:[ 5929.140181]  [<ffffffffa0c7dd8e>] iam_add_rec+0x1ce/0x2c0 [osd_ldiskfs]
19:17:41:[ 5929.140181]  [<ffffffffa0c7e650>] iam_insert+0x90/0xe0 [osd_ldiskfs]
19:17:41:[ 5929.140181]  [<ffffffffa0c75eb5>] osd_oi_iam_refresh.isra.15+0x125/0x2a0 [osd_ldiskfs]
19:17:41:[ 5929.140181]  [<ffffffffa0c789df>] osd_oi_insert+0x13f/0x490 [osd_ldiskfs]
19:17:41:[ 5929.140181]  [<ffffffffa0c73122>] osd_object_ea_create+0x632/0xb90 [osd_ldiskfs]
19:17:41:[ 5929.140181]  [<ffffffffa0ef11ec>] lod_sub_object_create+0x1ec/0x470 [lod]
19:17:41:[ 5929.140181]  [<ffffffffa0ee87e7>] lod_object_create+0xa7/0x200 [lod]
19:17:41:[ 5929.140181]  [<ffffffffa0f51571>] mdd_object_create_internal+0xb1/0x270 [mdd]
19:17:41:[ 5929.140181]  [<ffffffffa0f3c735>] mdd_object_create+0x55/0xa30 [mdd]
19:17:41:[ 5929.140181]  [<ffffffffa0f47940>] mdd_create+0xca0/0x1240 [mdd]
19:17:41:[ 5929.140181]  [<ffffffffa0e19952>] mdt_reint_open+0x2082/0x30c0 [mdt]
19:17:41:[ 5929.140181]  [<ffffffffa0e0e1e6>] mdt_reint_rec+0x76/0x200 [mdt]
19:17:41:[ 5929.140181]  [<ffffffffa0df2547>] mdt_reint_internal+0x5c7/0xaa0 [mdt]
19:17:41:[ 5929.140181]  [<ffffffffa0df2b82>] mdt_intent_reint+0x162/0x410 [mdt]
19:17:41:[ 5929.140181]  [<ffffffffa0dfbecc>] mdt_intent_policy+0x59c/0xb50 [mdt]
19:17:41:[ 5929.140181]  [<ffffffffa09d7f73>] ldlm_lock_enqueue+0x323/0x890 [ptlrpc]
19:17:41:[ 5929.140181]  [<ffffffffa0a00661>] ldlm_handle_enqueue0+0x741/0x1870 [ptlrpc]
19:17:41:[ 5929.140181]  [<ffffffffa0a83b1d>] tgt_enqueue+0x5d/0x210 [ptlrpc]
19:17:41:[ 5929.140181]  [<ffffffffa0a87513>] tgt_request_handle+0x7e3/0x1190 [ptlrpc]
19:17:41:[ 5929.140181]  [<ffffffffa0a31da9>] ptlrpc_server_handle_request+0x209/0xa70 [ptlrpc]
19:17:42:[ 5929.140181]  [<ffffffffa0a354ba>] ptlrpc_main+0xb2a/0x1ea0 [ptlrpc]
19:17:42:[ 5929.140181]  [<ffffffff81077114>] kthread+0xb4/0xc0
19:17:42:[ 5929.140181]  [<ffffffff81521198>] ret_from_fork+0x58/0x90
19:17:42:[ 5929.140181] Code: 8b 72 18 0f b7 40 02 01 ce 0f af c6 48 98 48 03 47 10 48 39 47 18 0f 94 c0 0f b6 c0 c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <48> 8b 46 28 ba 76 19 00 00 31 c9 66 89 10 66 89 48 02 c3 0f 1f 
20:17:47:********** Timeout by autotest system **********
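The register dump points at a bad buffer pointer rather than a plain NULL. The sketch below works through the arithmetic under two assumptions that the logs do not state. First, the bytes at the faulting RIP, `48 8b 46 28`, decode to `mov 0x28(%rsi),%rax`, a load through the second argument. Second, the error-pointer range follows the kernel's `include/linux/err.h` convention. On x86_64 that 0x28 offset plausibly corresponds to `b_data` in `struct buffer_head`, but that is also an assumption. The RHEL7 trace shows the same RSI and CR2 values.

```c
#include <errno.h>
#include <stdint.h>

/* Register values copied from the oops reports (identical in the
 * SLES12 and RHEL7 traces). */
#define OOPS_RSI 0xffffffffffffffe4ULL
#define OOPS_CR2 0x000000000000000cULL

/* Assumption: "48 8b 46 28" at the faulting RIP decodes to
 * mov 0x28(%rsi),%rax, i.e. a load from RSI + 0x28. */
#define LOAD_OFFSET 0x28ULL

/* Mirrors MAX_ERRNO from the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

/* Reinterpret a 64-bit register as a signed (two's complement) value. */
static int64_t reg_as_signed(uint64_t reg)
{
    return (int64_t)reg;
}

/* True if the value lies in the top 4095 values of the address space,
 * the range the kernel reserves for ERR_PTR()-encoded error codes. */
static int is_err_value(uint64_t reg)
{
    return reg >= (uint64_t)-MAX_ERRNO;
}

/* True if the register holds an error pointer encoding -ENOSPC. */
static int is_enospc(uint64_t reg)
{
    return is_err_value(reg) && -reg_as_signed(reg) == ENOSPC;
}

/* Address touched by a base + displacement load; unsigned wrap-around
 * matches what the CPU computes. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
    return base + disp;
}
```

Read this way, RSI holds an error-encoded pointer for -28, which is -ENOSPC on Linux, and the load at offset 0x28 wraps around to 0xc, matching CR2 in both reports. That fits test 129 having just lowered max_dir_size to 12288 in the SLES12 log, although the log itself does not show which call returned -ENOSPC.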

Info required for matching: sanity 129



 Comments   
Comment by Bob Glossman (Inactive) [ 28/Oct/15 ]

Another instance, also SLES12 on master:
https://testing.hpdd.intel.com/test_sets/adfe8040-7d7e-11e5-82ee-5254006e85c2

Comment by Bob Glossman (Inactive) [ 28/Oct/15 ]

This only started happening in the last few days. I'm wondering whether something bad landed in master recently. I note that at least one change touched a SLES12 ldiskfs patch: LU-7261 ldiskfs: fix large_xattr overwrite, commit 66ca2bc59135b00cd20a4e5095a23cf54cdfa2eb

Comment by Sarah Liu [ 04/Nov/15 ]

I also hit this problem when testing interop between a 2.7.0 client and a master DNE server:

server: lustre-master build #3226 RHEL7
client: 2.7.0

[15575.420548] Lustre: DEBUG MARKER: == sanity test 129: test directory size limit ========================== 17:59:30 (1446602370)
[15579.590647] BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
[15579.601130] IP: [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15579.610740] PGD 0 
[15579.614558] Oops: 0000 [#1] SMP 
[15579.619693] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) xprtrdma sunrpc ib_isert iscsi_target_mod intel_powerclamp ib_iser coretemp libiscsi intel_rapl scsi_transport_iscsi kvm_intel kvm crct10dif_pclmul ib_srpt crc32_pclmul crc32c_intel ioatdma iTCO_wdt ghash_clmulni_intel target_core_mod iTCO_vendor_support ipmi_devintf aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mei_me sb_edac lpc_ich i2c_i801 mei pcspkr edac_core shpchp mfd_core wmi ipmi_si ipmi_msghandler ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_common mlx4_ib ib_sa mlx4_en ib_mad vxlan ib_core ip_tunnel ib_addr mgag200 syscopyarea sysfillrect sysimgblt isci drm_kms_helper igb ttm libsas ahci ptp scsi_transport_sas libahci pps_core drm mlx4_core libata dca i2c_algo_bit ntb i2c_core [last unloaded: llog_test]
[15579.739283] CPU: 17 PID: 6104 Comm: mdt00_011 Tainted: GF         IO--------------   3.10.0-229.14.1.el7_lustre.x86_64 #1
[15579.753381] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.99.99.x045.022820121209 02/28/2012
[15579.766718] task: ffff88041ac0ad80 ti: ffff8808085dc000 task.ti: ffff8808085dc000
[15579.776980] RIP: 0010:[<ffffffffa0dc0635>]  [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15579.789649] RSP: 0018:ffff8808085df420  EFLAGS: 00010286
[15579.797534] RAX: ffff88042509d750 RBX: ffff8808085df4b0 RCX: 0000000000000000
[15579.807480] RDX: ffffffffa0df09a0 RSI: ffffffffffffffe4 RDI: ffff88041d5f2508
[15579.817405] RBP: ffff8808085df498 R08: ffff88042509d098 R09: 0000000000000000
[15579.827360] R10: ffff8800a97a8394 R11: 000000000000000f R12: ffff88042edb8cf0
[15579.837310] R13: ffff8803e649c338 R14: ffff88041d5f2508 R15: ffff8803e649c360
[15579.847214] FS:  0000000000000000(0000) GS:ffff88042f720000(0000) knlGS:0000000000000000
[15579.858203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15579.866560] CR2: 000000000000000c CR3: 000000000190e000 CR4: 00000000000407e0
[15579.876464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[15579.886310] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[15579.896160] Stack:
[15579.900258]  ffffffffa0dbf8bc 0000000000000000 ffff880417a42708 ffff88042509d750
[15579.910404]  ffff8808085df4b0 ffff8808085df590 ffff8808085df4a8 ffff88042509d090
[15579.920543]  ffffffffffffffe4 00000000c5388ef7 0000000000000000 ffff8803e649c338
[15579.930668] Call Trace:
[15579.935197]  [<ffffffffa0dbf8bc>] ? iam_add_rec+0x1fc/0x2e0 [osd_ldiskfs]
[15579.944606]  [<ffffffffa0dc1070>] ? iam_lfix_split+0x140/0x140 [osd_ldiskfs]
[15579.954306]  [<ffffffffa0dc024e>] iam_insert+0xce/0x120 [osd_ldiskfs]
[15579.963316]  [<ffffffffa0db73e5>] osd_oi_iam_refresh.isra.15+0x125/0x2a0 [osd_ldiskfs]
[15579.973982]  [<ffffffffa0db9f77>] osd_oi_insert+0x147/0x490 [osd_ldiskfs]
[15579.983426]  [<ffffffffa05044c4>] ? libcfs_log_return+0x24/0x30 [libcfs]
[15579.992721]  [<ffffffffa0db4652>] osd_object_ea_create+0x632/0xb90 [osd_ldiskfs]
[15580.002779]  [<ffffffffa101eb82>] lod_sub_object_create+0x1f2/0x480 [lod]
[15580.012096]  [<ffffffffa101614f>] lod_object_create+0xaf/0x200 [lod]
[15580.020908]  [<ffffffffa107b965>] mdd_object_create_internal+0xb5/0x280 [mdd]
[15580.030542]  [<ffffffffa10669f6>] mdd_object_create+0x76/0xa30 [mdd]
[15580.039294]  [<ffffffffa10706e7>] ? mdd_declare_create+0x447/0xd30 [mdd]
[15580.048435]  [<ffffffffa1071ca0>] mdd_create+0xcd0/0x1270 [mdd]
[15580.056712]  [<ffffffffa0f4c90e>] mdt_reint_open+0x20ce/0x3110 [mdt]
[15580.065422]  [<ffffffffa0f410d0>] mdt_reint_rec+0x80/0x210 [mdt]
[15580.073723]  [<ffffffffa0f24969>] mdt_reint_internal+0x5d9/0xab0 [mdt]
[15580.082606]  [<ffffffffa0f24fa2>] mdt_intent_reint+0x162/0x410 [mdt]
[15580.091281]  [<ffffffffa0f2e7ea>] mdt_intent_policy+0x57a/0xb50 [mdt]
[15580.100079]  [<ffffffffa0a831b3>] ldlm_lock_enqueue+0x353/0x8c0 [ptlrpc]
[15580.109156]  [<ffffffffa0aac702>] ldlm_handle_enqueue0+0x762/0x1850 [ptlrpc]
[15580.118555]  [<ffffffffa0507827>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15580.127459]  [<ffffffffa0b32862>] tgt_enqueue+0x62/0x210 [ptlrpc]
[15580.135720]  [<ffffffffa0b36f23>] tgt_request_handle+0x7f3/0x1190 [ptlrpc]
[15580.144825]  [<ffffffffa0adf1bb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
[15580.154804]  [<ffffffffa0adcfc8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[15580.163735]  [<ffffffffa0507827>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15580.172470]  [<ffffffffa0ae2960>] ptlrpc_main+0xb70/0x1e90 [ptlrpc]
[15580.180770]  [<ffffffffa0ae1df0>] ? ptlrpc_register_service+0xfc0/0xfc0 [ptlrpc]
[15580.190286]  [<ffffffff810973af>] kthread+0xcf/0xe0
[15580.196951]  [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
[15580.205457]  [<ffffffff81615198>] ret_from_fork+0x58/0x90
[15580.212643]  [<ffffffff810972e0>] ? kthread_create_on_node+0x140/0x140
[15580.221095] Code: 40 28 48 8b 0a 8b 72 18 0f b7 40 02 01 ce 0f af c6 48 98 48 03 47 10 48 39 47 18 0f 94 c0 0f b6 c0 c3 0f 1f 40 00 66 66 66 66 90 <48> 8b 46 28 55 ba 76 19 00 00 31 c9 48 89 e5 66 89 10 66 89 48 
[15580.245333] RIP  [<ffffffffa0dc0635>] iam_lfix_init_new+0x5/0x20 [osd_ldiskfs]
[15580.254621]  RSP <ffff8808085df420>
[15580.259700] CR2: 000000000000000c
[15581.093976] ---[ end trace cc2db69d6da44786 ]---
[15581.177368] Kernel panic - not syncing: Fatal exception

Message from syslogd@onyx-25
[15581.273758] drm_kms_helper: panic occurred, switching back to text console
[15581.283334] ------------[ cut here ]------------
Comment by Peter Jones [ 04/Nov/15 ]

Fan Yong

Could you please look into this issue?

Thanks

Peter

Comment by Gerrit Updater [ 13/Nov/15 ]

Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/17148
Subject: LU-7343 osd-ldiskfs: handle ldiskfs_append failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 38aef40fcf64cdb519d37e4f3af56be0142964fe

Comment by nasf (Inactive) [ 13/Nov/15 ]

This patch affects all 3.1x-based kernels, such as those used on the SLES12 and RHEL7 platforms. It should land before 2.8 GA.
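The patch subject points at a caller that does not handle an `ldiskfs_append` failure. A minimal userspace sketch of the defensive pattern follows, assuming (as the subject and the remark about 3.1x kernels suggest) that on these kernels the append helper reports failure with an error pointer rather than NULL. `append_block`, `add_leaf`, and `struct block` are hypothetical stand-ins, and `ERR_PTR`/`PTR_ERR`/`IS_ERR` are simplified copies of the kernel's `include/linux/err.h` helpers. This is not the code that landed in http://review.whamcloud.com/17148.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's include/linux/err.h helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical block buffer, standing in for struct buffer_head. */
struct block {
    char data[64];
};

static struct block spare;

/* Hypothetical append helper: under the assumed newer-kernel convention
 * it reports failure as ERR_PTR(-errno), never as NULL. A limit of 0
 * means "no size limit". */
static struct block *append_block(size_t cur_size, size_t limit)
{
    if (limit != 0 && cur_size >= limit)
        return ERR_PTR(-ENOSPC);
    memset(&spare, 0, sizeof(spare));
    return &spare;
}

/* Caller pattern: check IS_ERR() before touching the buffer and
 * propagate the error; a NULL-only check would let ERR_PTR(-ENOSPC)
 * through to the write below. */
static int add_leaf(size_t cur_size, size_t limit)
{
    struct block *blk = append_block(cur_size, limit);

    if (IS_ERR(blk))
        return (int)PTR_ERR(blk);
    if (blk == NULL)
        return -EIO; /* defensive: older helpers returned NULL */
    blk->data[0] = 0x76; /* initialise the new leaf header */
    return 0;
}
```

Calling `add_leaf(12288, 12288)` returns `-ENOSPC` instead of writing through a pointer near the top of the address space, which is the class of fault seen in both traces.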

Comment by Gerrit Updater [ 30/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17148/
Subject: LU-7343 osd-ldiskfs: handle ldiskfs_append failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 113aac9c212d63ec880a9731bd9a364f9b9a99bf

Comment by Joseph Gmitter (Inactive) [ 30/Nov/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:08:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.