Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.12.0
-
None
-
2.12-RC2 lustre-master-ib build #173 EL7.6 DNE
-
3
-
9223372036854775807
Description
MDS 0 hit a kernel panic after running for about 24 hours.
MDS 0 console
[18476.473668] Lustre: MGS: haven't heard from client 8e54524b-a52c-7091-0e06-f2d4a89dd59c (at 192.168.1.109@o2ib) in 227 seconds. I think it's dead, and I am evicting it. exp ffff9fcae4159400, cur 1544788503 expire 1544788353 last 1544788276
[18520.421813] LNet: 28111:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 192.168.1.109@o2ib: 0 seconds
[18520.433290] LNet: 28111:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Skipped 1 previous similar message
[18521.243163] LustreError: 137-5: soaked-MDT0001_UUID: not available for connect from 192.168.1.126@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[18521.263041] LustreError: Skipped 122 previous similar messages
[18571.421594] LNet: 28111:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 192.168.1.109@o2ib: 1 seconds
[18620.421324] LNet: 28111:0:(o2iblnd_cb.c:3370:kiblnd_check_conns()) Timed out tx for 192.168.1.109@o2ib: 0 seconds
[18703.424361] Lustre: MGS: Connection restored to 192.168.1.109@o2ib (at 192.168.1.109@o2ib)
[18704.407564] Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.109@o2ib, removing former export from same NID
[18737.363675] Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.110@o2ib, removing former export from same NID
[18737.376324] Lustre: Skipped 1 previous similar message
[18737.382130] Lustre: soaked-MDT0000: Connection restored to 192.168.1.110@o2ib (at 192.168.1.110@o2ib)
[18737.392457] Lustre: Skipped 2 previous similar messages
[18737.423007] LustreError: 31941:0:(osd_oi.c:761:osd_oi_insert()) dm-2: the FID [0x20000c768:0x179ce:0x0] is used by two objects: 402128901/3006680378 357564421/3006680381
[18737.440029] LNet: 28116:0:(o2iblnd_cb.c:408:kiblnd_handle_rx()) PUT_NACK from 192.168.1.110@o2ib
[18741.123580] LustreError: 167-0: soaked-MDT0001-osp-MDT0000: This client was evicted by soaked-MDT0001; in progress operations using this service will fail.
[18753.014392] BUG: unable to handle kernel NULL pointer dereference at (null)
[18753.023170] IP: [< (null)>] (null)
[18753.028819] PGD 0
[18753.031075] Oops: 0010 [#1] SMP
[18753.034700] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) ib_core(OE) dm_round_robin sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm iTCO_wdt irqbypass iTCO_vendor_support sg crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr ipmi_ssif mei_me mei lpc_ich wmi i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler dm_multipath dm_mod auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops isci igb ttm mpt2sas ahci crct10dif_pclmul crct10dif_common libsas libahci crc32c_intel ptp drm mlx4_core(OE) raid_class libata pps_core drm_panel_orientation_quirks scsi_transport_sas mlx_compat(OE) dca devlink i2c_algo_bit
[18753.146014] CPU: 0 PID: 43000 Comm: mdt_out00_019 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[18753.159604] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[18753.172121] task: ffff9fcf2591e180 ti: ffff9fcad0a8c000 task.ti: ffff9fcad0a8c000
[18753.180476] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[18753.188859] RSP: 0018:ffff9fcad0a8fb60 EFLAGS: 00010246
[18753.194788] RAX: 0000000000000000 RBX: ffff9fc7a8190000 RCX: 0000000000000002
[18753.202754] RDX: ffffffffc12dc770 RSI: ffff9fcad0a8fb68 RDI: ffff9fc7a8190008
[18753.210718] RBP: ffff9fcad0a8fba0 R08: 0000000000000004 R09: 0000000000000000
[18753.218682] R10: 0000000000000001 R11: 00000000007fffff R12: ffff9fc736087300
[18753.226645] R13: ffff9fcb00494c48 R14: ffff9fcefba3a200 R15: ffff9fc7a8190008
[18753.234611] FS: 0000000000000000(0000) GS:ffff9fcb2e000000(0000) knlGS:0000000000000000
[18753.243647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18753.250052] CR2: 0000000000000000 CR3: 0000000551a10000 CR4: 00000000000607f0
[18753.258016] Call Trace:
[18753.260766] [<ffffffffc12dabee>] ? osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs]
[18753.269028] [<ffffffffc12dadc7>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
[18753.276640] [<ffffffffc0e84038>] dt_index_walk+0xf8/0x430 [obdclass]
[18753.283850] [<ffffffffc0e84370>] ? dt_index_walk+0x430/0x430 [obdclass]
[18753.291350] [<ffffffffc0e85444>] dt_index_read+0x394/0x6a0 [obdclass]
[18753.298701] [<ffffffffc10ceb32>] tgt_obd_idx_read+0x612/0x860 [ptlrpc]
[18753.306117] [<ffffffffc10d135a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[18753.313825] [<ffffffffc10aaa51>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[18753.322284] [<ffffffffc0bdebde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[18753.330186] [<ffffffffc107592b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[18753.338795] [<ffffffffc10727b5>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
[18753.346380] [<ffffffff9fcd67c2>] ? default_wake_function+0x12/0x20
[18753.353379] [<ffffffff9fccba9b>] ? __wake_up_common+0x5b/0x90
[18753.359934] [<ffffffffc107925c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[18753.366951] [<ffffffffc1078760>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
[18753.375209] [<ffffffff9fcc1c31>] kthread+0xd1/0xe0
[18753.380662] [<ffffffff9fcc1b60>] ? insert_kthread_work+0x40/0x40
[18753.387480] [<ffffffffa0374c37>] ret_from_fork_nospec_begin+0x21/0x21
[18753.394766] [<ffffffff9fcc1b60>] ? insert_kthread_work+0x40/0x40
[18753.401564] Code: Bad RIP value.
[18753.405279] RIP [< (null)>] (null)
[18753.411023] RSP <ffff9fcad0a8fb60>
[18753.414916] CR2: 0000000000000000
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-957.el7_lustre.x86_64 (jenkins@trevis-309-el7-x8664-2.trevis.whamcloud.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Sat Dec 8 05:53:16 UTC 2018
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.el7_lustre.x86_64 ro console=ttyS0,115200 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr disable_cpu_apicid=0 elfcorehdr=869816K
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000001000-0x000000000008efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000008f000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000002b000000-0x000000003516dfff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bb3c7000-0x00000000bdd2efff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000bdd2f000-0x00000000bddccfff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bddcd000-0x00000000bdea0fff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000bdea1000-0x00000000bdf2efff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000bdf2f000-0x00000000bdfabfff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000be000000-0x00000000cfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed19000-0x00000000fed19fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ffa20000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.6 present.
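For triage, the frames of the call trace in the console output above can be pulled out of the raw text with a short script. This is only a minimal sketch in Python, assuming the oops frame format seen in this ticket (`[<addr>] symbol+0xoff/0xlen [module]`, with a leading `?` marking frames the unwinder is unsure about). `FRAME_RE` and `parse_call_trace` are hypothetical helper names, not part of Lustre or any crash tooling, and frames without a module suffix are labeled "kernel" purely as a convention of this sketch.

```python
import re

# One call-trace frame, e.g.
#   [<ffffffffc12dadc7>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs]
# Group 1: optional "? " (unreliable frame), group 2: symbol,
# group 3: optional module name in square brackets.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_call_trace(console_text):
    """Return (symbol, module, reliable) tuples in trace order.

    Works on text where messages are one per line or run together,
    because it scans for frame patterns rather than splitting lines.
    """
    frames = []
    for match in FRAME_RE.finditer(console_text):
        uncertain, symbol, module = match.groups()
        # Frames with no "[module]" suffix come from the core kernel image.
        frames.append((symbol, module or "kernel", uncertain is None))
    return frames


# A few frames copied from the trace in this ticket.
sample = (
    "[18753.260766] [<ffffffffc12dabee>] ? osd_ldiskfs_it_fill+0xbe/0x260 [osd_ldiskfs] "
    "[18753.269028] [<ffffffffc12dadc7>] osd_it_ea_load+0x37/0x100 [osd_ldiskfs] "
    "[18753.276640] [<ffffffffc0e84038>] dt_index_walk+0xf8/0x430 [obdclass] "
    "[18753.375209] [<ffffffff9fcc1c31>] kthread+0xd1/0xe0 "
    "[18753.380662] [<ffffffff9fcc1b60>] ? insert_kthread_work+0x40/0x40"
)

for symbol, module, reliable in parse_call_trace(sample):
    marker = "" if reliable else "? "
    print(f"{marker}{symbol} [{module}]")
```

Filtering on the `reliable` flag leaves the frames without a `?` marker, which in this trace run from `osd_it_ea_load` up through `ptlrpc_main`.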