[LU-11453] sanity test 184a: Basic layout swap panics on Power8
Created: 01/Oct/18  Updated: 26/Jul/19  Resolved: 23/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Critical
Reporter: James A Simmons
Assignee: Yang Sheng
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Power8 clients and Power8 servers running ZFS. All with the RHEL7.5 alt 4.14 kernel.


Issue Links:
Related
is related to LU-6387 Add Power8 support to Lustre Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running the sanity test on Power8, I see the following crash:

[Sat Sep 29 15:15:49 2018][19080.929496] Lustre: DEBUG MARKER: == sanity test 184a: Basic layout swap =============================================================== 15:15:49 (1538248549)

[Sat Sep 29 15:15:50 2018][19082.085163] list_add corruption. next->prev should be prev (c000000761c5bc70), but was           (null). (next=c000000761e5bcd0).

[Sat Sep 29 15:15:50 2018][19082.085313] -----------[ cut here ]-----------

[Sat Sep 29 15:15:50 2018][19082.085379] WARNING: CPU: 29 PID: 12368 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0

[Sat Sep 29 15:15:50 2018][19082.085471] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) xt_multiport xt_comment ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter mst_pciconf(OE) nfsv3 nfs_acl nfs lockd grace fscache ip6t_REJECT nf_reject_ipv6 rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack libcrc32c ip6table_filter ip6_tables dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler leds_powernv ibmpowernv

[Sat Sep 29 15:15:50 2018][19082.086251]  sg powernv_op_panel uio_pdrv_genirq i2c_opal uio i2c_core powernv_rng shpchp binfmt_misc knem(OE) ip_tables mlx5_ib(OE) ib_core(OE) mlx5_core(OE) ipr mlxfw(OE) devlink tg3 libata cxl ptp pps_core nvme(OE) nvme_core(OE) mlx_compat(OE) sunrpc [last unloaded: mst_pci]

[Sat Sep 29 15:15:50 2018][19082.086517] CPU: 29 PID: 12368 Comm: ll_ost00_007 Tainted: P        W  OE  ------------   4.14.0-49.6.1.el7a.ppc64le #1

....

[Sat Sep 29 15:15:50 2018][19082.087654] NIP [c0000000006a5dd4] __list_add_valid+0xb4/0xc0

[Sat Sep 29 15:15:50 2018][19082.087724] LR [c0000000006a5dd0] __list_add_valid+0xb0/0xc0

[Sat Sep 29 15:15:50 2018][19082.087794] Call Trace:

[Sat Sep 29 15:15:50 2018][19082.087826] [c000001dde6e73e0] [c0000000006a5dd0] __list_add_valid+0xb0/0xc0 (unreliable)

[Sat Sep 29 15:15:50 2018][19082.087942] [c000001dde6e7440] [d00000002edcf1fc] class_handle_hash+0x13c/0x610 [obdclass]

[Sat Sep 29 15:15:50 2018][19082.088102] [c000001dde6e7500] [d00000002f54b528] ldlm_lock_create+0x2e8/0xfa0 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.088248] [c000001dde6e75e0] [d00000002f577840] ldlm_cli_enqueue_local+0x140/0xdd0 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.088351] [c000001dde6e76e0] [d000000034b1bf58] ofd_destroy_by_fid+0x1c8/0x660 [ofd]

[Sat Sep 29 15:15:50 2018][19082.088446] [c000001dde6e7800] [d000000034b07914] ofd_destroy_hdl+0x3a4/0x10c0 [ofd]

[Sat Sep 29 15:15:50 2018][19082.088680] [c000001dde6e7930] [d00000002f67cc34] tgt_handle_request0+0x1e4/0x9b0 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.088913] [c000001dde6e79e0] [d00000002f6864c8] tgt_request_handle+0x848/0x1bd0 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.089147] [c000001dde6e7af0] [d00000002f5ee6e0] ptlrpc_server_handle_request+0x450/0x1480 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.089403] [c000001dde6e7c20] [d00000002f5f46b0] ptlrpc_main+0xde0/0x1e80 [ptlrpc]

[Sat Sep 29 15:15:50 2018][19082.089563] [c000001dde6e7dc0] [c000000000171ce8] kthread+0x168/0x1b0

[Sat Sep 29 15:15:50 2018][19082.089698] [c000001dde6e7e30] [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4
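
For readers unfamiliar with the warning at the top of the trace, the following is a simplified user-space sketch of the kind of link consistency check that emits it. It is not the kernel's `__list_add_valid()` source: the function name `list_add_valid` and the stand-in `struct list_head` are illustrative assumptions, and only the two neighbour checks are modelled.

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Simplified model of a list-insertion sanity check. Before a new
 * entry is linked between prev and next, the two neighbours must
 * still point at each other. Returns false and prints a message
 * when they do not.
 */
static bool list_add_valid(struct list_head *prev, struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	return true;
}
```

In the log above, the first condition is the one that fired: `next->prev` was read as NULL rather than pointing back at `prev`.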



 Comments   
Comment by Peter Jones [ 01/Oct/18 ]

Yang Sheng

Could you please investigate?

Peter

Comment by Yang Sheng [ 05/Oct/18 ]

Hi, James,

Does it panic every time or just hit randomly?

Thanks,
YangSheng

Comment by James A Simmons [ 05/Oct/18 ]

Every time I run the sanity test.

Comment by Gerrit Updater [ 08/Oct/18 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33317
Subject: LU-11453 class: test patch
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ee0489082215ae9b524a05bd9b99c945867c8e50

Comment by Yang Sheng [ 08/Oct/18 ]

Hi, James,

Could you test with this patch?

Thanks,
YangSheng

Comment by James A Simmons [ 09/Oct/18 ]

Just tested it. It works!!!

Comment by Yang Sheng [ 09/Oct/18 ]

Thank you for confirming that.

Comment by Gerrit Updater [ 23/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33317/
Subject: LU-11453 class: use INIT_LIST_HEAD_RCU instead INIT_LIST_HEAD
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 68bc3984975bb72f730d8a8ab7aa2d836e50abe5
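
The ticket does not spell out the root cause, so the sketch below only illustrates how the two initializers named in the patch subject differ, as far as I understand the 4.14-era kernel headers. It is a user-space approximation: the `WRITE_ONCE()` macro here is a rough stand-in built on a volatile store, and `struct list_head` is redefined locally. Both are assumptions for illustration, not the kernel definitions.

```c
/* Minimal stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Rough user-space approximation of the kernel's WRITE_ONCE(). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Plain initializer: only ->next is stored through WRITE_ONCE(). */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	list->prev = list;
}

/* RCU variant: both pointers are stored through WRITE_ONCE(). */
static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	WRITE_ONCE(list->prev, list);
}
```

Either helper leaves an empty head pointing at itself in both directions; the difference is only in how the `->prev` store is issued.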

Comment by Peter Jones [ 23/Oct/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 14/Dec/18 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33875
Subject: LU-11453 misc: add compat for INIT_LIST_HEAD_RCU
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 35bfc75c76d445e6a56e489c4b429b152df1401d

Comment by Gerrit Updater [ 04/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33875/
Subject: LU-11453 misc: add compat for INIT_LIST_HEAD_RCU
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 773e60669b53b0ca2fb48723a21dcddba592af9a
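
A compat shim of this kind usually takes the shape of a guarded fallback definition. The sketch below is a guess at the general pattern, not the content of the merged patch: `HAVE_INIT_LIST_HEAD_RCU` is a hypothetical feature macro that a configure-time probe would define when the kernel headers already provide the helper, and the local `struct list_head` and `WRITE_ONCE()` are user-space stand-ins.

```c
/* Minimal stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Fallback WRITE_ONCE() for this user-space sketch only. */
#ifndef WRITE_ONCE
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#endif

/*
 * HAVE_INIT_LIST_HEAD_RCU is hypothetical: when it is not defined,
 * supply a local INIT_LIST_HEAD_RCU() so callers build unchanged.
 */
#ifndef HAVE_INIT_LIST_HEAD_RCU
static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
{
	WRITE_ONCE(list->next, list);
	WRITE_ONCE(list->prev, list);
}
#endif
```

With a guard like this, call sites can use `INIT_LIST_HEAD_RCU()` unconditionally, and the fallback drops out on kernels that already ship it.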

Comment by Peter Jones [ 04/Jan/19 ]

Do we need https://review.whamcloud.com/33875 ported to b2_12?

Comment by Gerrit Updater [ 18/Jul/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35564
Subject: LU-11453 misc: add compat for INIT_LIST_HEAD_RCU
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b9104650bcc005e36c05c31f3f6488e5ee6fc46e

Comment by Gerrit Updater [ 26/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35564/
Subject: LU-11453 misc: add compat for INIT_LIST_HEAD_RCU
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: da84dce903cfbbdeec5f25dd435d00660dad7903
