[LU-11453] sanity test 184a: Basic layout swap panics on Power8 Created: 01/Oct/18 Updated: 26/Jul/19 Resolved: 23/Oct/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James A Simmons | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Power8 clients and Power8 servers running ZFS. All with the RHEL7.5 alt 4.14 kernel. |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Running the sanity test on Power8, I see the following crash:

[Sat Sep 29 15:15:49 2018][19080.929496] Lustre: DEBUG MARKER: == sanity test 184a: Basic layout swap =============================================================== 15:15:49 (1538248549)
[Sat Sep 29 15:15:50 2018][19082.085163] list_add corruption. next->prev should be prev (c000000761c5bc70), but was (null). (next=c000000761e5bcd0).
[Sat Sep 29 15:15:50 2018][19082.085313] -----------
[Sat Sep 29 15:15:50 2018][19082.085379] WARNING: CPU: 29 PID: 12368 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
[Sat Sep 29 15:15:50 2018][19082.085471] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) xt_multiport xt_comment ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter mst_pciconf(OE) nfsv3 nfs_acl nfs lockd grace fscache ip6t_REJECT nf_reject_ipv6 rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack libcrc32c ip6table_filter ip6_tables dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas ipmi_powernv ipmi_devintf ipmi_msghandler leds_powernv ibmpowernv
[Sat Sep 29 15:15:50 2018][19082.086251] sg powernv_op_panel uio_pdrv_genirq i2c_opal uio i2c_core powernv_rng shpchp binfmt_misc knem(OE) ip_tables mlx5_ib(OE) ib_core(OE) mlx5_core(OE) ipr mlxfw(OE) devlink tg3 libata cxl ptp pps_core nvme(OE) nvme_core(OE) mlx_compat(OE) sunrpc [last unloaded: mst_pci]
[Sat Sep 29 15:15:50 2018][19082.086517] CPU: 29 PID: 12368 Comm: ll_ost00_007 Tainted: P W OE ------------ 4.14.0-49.6.1.el7a.ppc64le #1
....
[Sat Sep 29 15:15:50 2018][19082.087654] NIP [c0000000006a5dd4] __list_add_valid+0xb4/0xc0
[Sat Sep 29 15:15:50 2018][19082.087724] LR [c0000000006a5dd0] __list_add_valid+0xb0/0xc0
[Sat Sep 29 15:15:50 2018][19082.087794] Call Trace:
[Sat Sep 29 15:15:50 2018][19082.087826] [c000001dde6e73e0] [c0000000006a5dd0] __list_add_valid+0xb0/0xc0 (unreliable)
[Sat Sep 29 15:15:50 2018][19082.087942] [c000001dde6e7440] [d00000002edcf1fc] class_handle_hash+0x13c/0x610 [obdclass]
[Sat Sep 29 15:15:50 2018][19082.088102] [c000001dde6e7500] [d00000002f54b528] ldlm_lock_create+0x2e8/0xfa0 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.088248] [c000001dde6e75e0] [d00000002f577840] ldlm_cli_enqueue_local+0x140/0xdd0 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.088351] [c000001dde6e76e0] [d000000034b1bf58] ofd_destroy_by_fid+0x1c8/0x660 [ofd]
[Sat Sep 29 15:15:50 2018][19082.088446] [c000001dde6e7800] [d000000034b07914] ofd_destroy_hdl+0x3a4/0x10c0 [ofd]
[Sat Sep 29 15:15:50 2018][19082.088680] [c000001dde6e7930] [d00000002f67cc34] tgt_handle_request0+0x1e4/0x9b0 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.088913] [c000001dde6e79e0] [d00000002f6864c8] tgt_request_handle+0x848/0x1bd0 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.089147] [c000001dde6e7af0] [d00000002f5ee6e0] ptlrpc_server_handle_request+0x450/0x1480 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.089403] [c000001dde6e7c20] [d00000002f5f46b0] ptlrpc_main+0xde0/0x1e80 [ptlrpc]
[Sat Sep 29 15:15:50 2018][19082.089563] [c000001dde6e7dc0] [c000000000171ce8] kthread+0x168/0x1b0
[Sat Sep 29 15:15:50 2018][19082.089698] [c000001dde6e7e30] [c00000000000b628] ret_from_kernel_thread+0x5c/0xb4 |
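For readers unfamiliar with the "list_add corruption" warning in the log above, the following is a minimal user-space sketch of the kind of consistency check that `__list_add_valid` performs before linking a new entry into a doubly linked list. It is not the kernel source and not the fix from this ticket; the names `list_node`, `list_init`, `list_add_is_valid`, and `list_add_checked` are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative node type, loosely modeled on the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty circular list points at itself in both directions. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Check that inserting `entry` between `prev` and `next` is safe:
 * the two neighbours must still point at each other, and the new
 * entry must not already be one of them.
 */
static bool list_add_is_valid(struct list_node *entry,
                              struct list_node *prev,
                              struct list_node *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    if (entry == prev || entry == next) {
        fprintf(stderr, "list_add double add (entry=%p).\n", (void *)entry);
        return false;
    }
    return true;
}

/* Insert `entry` right after `head`, refusing if the list looks corrupted. */
static bool list_add_checked(struct list_node *entry, struct list_node *head)
{
    struct list_node *prev = head;
    struct list_node *next = head->next;

    if (!list_add_is_valid(entry, prev, next))
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

In the reported crash, the first condition is the one that fired: the `prev` pointer of the existing neighbour was null instead of pointing back at the list position being inserted after. The sketch only shows how such corruption is detected; it says nothing about what caused it in `class_handle_hash`.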
| Comments |
| Comment by Peter Jones [ 01/Oct/18 ] |
|
Yang Sheng Could you please investigate? Peter |
| Comment by Yang Sheng [ 05/Oct/18 ] |
|
Hi, James, Does it panic every time or just hit randomly? Thanks, |
| Comment by James A Simmons [ 05/Oct/18 ] |
|
Every time I run the sanity test. |
| Comment by Gerrit Updater [ 08/Oct/18 ] |
|
Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33317 |
| Comment by Yang Sheng [ 08/Oct/18 ] |
|
Hi, James, Could you test with this patch? Thanks, |
| Comment by James A Simmons [ 09/Oct/18 ] |
|
Just tested it. It works!!! |
| Comment by Yang Sheng [ 09/Oct/18 ] |
|
Thank you for confirming that. |
| Comment by Gerrit Updater [ 23/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33317/ |
| Comment by Peter Jones [ 23/Oct/18 ] |
|
Landed for 2.12 |
| Comment by Gerrit Updater [ 14/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33875 |
| Comment by Gerrit Updater [ 04/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33875/ |
| Comment by Peter Jones [ 04/Jan/19 ] |
|
Do we need https://review.whamcloud.com/33875 ported to b2_12? |
| Comment by Gerrit Updater [ 18/Jul/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35564 |
| Comment by Gerrit Updater [ 26/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35564/ |