Lustre / LU-10871

MDS hit LBUG: LustreError: 2566:0:(osd_handler.c:3304:osd_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.14.0
    • Environment: 2.11.0 RC3, lustre-master-ib build 73
    • Severity: 3

    Description

      After soak testing had been running for about 7 hours, the MDS hit an LBUG:

      Apr 1 23:58:08 soak-11 multipathd: sdp: mark as failed
      Apr 1 23:58:08 soak-11 multipathd: 360080e50001fedb80000015952012962: Entering recovery mode: max_retries=300
      Apr 1 23:58:08 soak-11 multipathd: 360080e50001fedb80000015952012962: remaining active paths: 0
      Apr 1 23:58:09 soak-11 kernel: LustreError: 2566:0:(osd_handler.c:3853:osd_ref_del()) soaked-MDT0003: nlink == 0 on [0x2c000d70d:0x389:0x0], maybe an upgraded file? (LU-3915)
      Apr 1 23:58:09 soak-11 kernel: LustreError: 2566:0:(osd_handler.c:3304:osd_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed: 
      Apr 1 23:58:09 soak-11 kernel: LustreError: 2566:0:(osd_handler.c:3304:osd_destroy()) LBUG
      Apr 1 23:58:09 soak-11 kernel: Pid: 2566, comm: mdt_out00_004
      Apr 1 23:58:09 soak-11 kernel: Call Trace:
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc0dc47ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc0dc483c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc142a0b0>] osd_destroy+0x4a0/0x760 [osd_ldiskfs]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc1429237>] ? osd_ref_del+0x2c7/0x6a0 [osd_ldiskfs]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc1428589>] ? osd_attr_set+0x199/0xb80 [osd_ldiskfs]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffff816b3232>] ? down_write+0x12/0x3d
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11c9141>] out_obj_destroy+0x101/0x2c0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11c93b0>] out_tx_destroy_exec+0x20/0x190 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11c3e91>] out_tx_end+0xe1/0x5c0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11c7a72>] out_handle+0x1442/0x1bb0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11568a2>] ? lustre_msg_get_opc+0x22/0xf0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11be0a9>] ? tgt_request_preprocess.isra.28+0x299/0x7a0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11beeda>] tgt_request_handle+0x92a/0x13b0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc1164813>] ptlrpc_server_handle_request+0x253/0xab0 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffffc11616c8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      Apr 1 23:58:09 soak-11 kernel: [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      Apr 1 23:58:09 soak-11 kernel: [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      Apr 1 23:58:10 soak-11 kernel: [<ffffffffc1167fc2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      Apr 1 23:58:10 soak-11 kernel: [<ffffffffc1167530>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      Apr 1 23:58:10 soak-11 kernel: [<ffffffff810b4031>] kthread+0xd1/0xe0
      Apr 1 23:58:10 soak-11 kernel: [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      Apr 1 23:58:10 soak-11 kernel: [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
      Apr 1 23:58:10 soak-11 kernel: [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      Apr 1 23:58:10 soak-11 kernel:
      
      Apr 1 23:58:10 soak-11 kernel: Kernel panic - not syncing: LBUG
      Apr 2 00:02:02 soak-11 systemd: Starting Stop Read-Ahead Data Collection...
      Apr 2 00:02:02 soak-11 rsyslogd: action 'action 0' resumed (module 'builtin:omfwd') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
      Apr 2 00:02:02 soak-11 rsyslogd: action 'action 0' resumed (module 'builtin:omfwd') [v8.24.0 try http://www.rsyslog.com/e/2359 ]
      Apr 2 00:02:02 soak-11 systemd: Started Stop Read-Ahead Data Collection.
      Apr 2 00:03:37 soak-11 chronyd[1280]: Source 64.6.144.6 replaced with 69.10.161.7
      Apr 2 00:10:01 soak-11 systemd: Created slice User Slice of root.
      Apr 2 00:10:01 soak-11 systemd: Starting User Slice of root.
      Apr 2 00:10:01 soak-11 systemd: Started Session 1 of user root.
      Apr 2 00:10:01 soak-11 systemd: Starting Session 1 of user root.
      Apr 2 00:10:01 soak-11 CROND[2234]: (root) CMD (/usr/lib64/sa/sa1 1 1)
      Apr 2 00:10:01 soak-11 systemd: Removed slice User Slice of root.
      Apr 2 00:10:01 soak-11 systemd: Stopping User Slice of root.
      Apr 2 00:15:54 soak-11 systemd: Starting Cleanup of Temporary Directories...
      Apr 2 00:15:54 soak-11 systemd: Started Cleanup of Temporary Directories.
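
      The sequence in the log above — an osd_ref_del() warning that nlink is already 0, immediately followed by osd_destroy() tripping ASSERTION( !lu_object_is_dying(...) ) — suggests a destroy being replayed against an object whose header was already marked dying. The following is a simplified, self-contained model of that interplay (NOT actual Lustre code; the structures and function names here are hypothetical stand-ins that only mirror the real ones):

      ```c
      /*
       * Simplified model of the osd_ref_del()/osd_destroy() interaction
       * suggested by the crash log.  All types and helpers below are
       * stand-ins, not the real Lustre definitions.
       */
      #include <assert.h>
      #include <stdio.h>

      #define LU_OBJECT_HEARD_BANSHEE 0x1  /* "dying" flag bit (modeled) */

      struct lu_object_header_model {
              unsigned int loh_flags;
      };

      struct osd_object_model {
              struct lu_object_header_model loh;
              int nlink;
      };

      static int lu_object_is_dying_model(const struct lu_object_header_model *h)
      {
              return h->loh_flags & LU_OBJECT_HEARD_BANSHEE;
      }

      /* ref_del: drop one link; an object already at nlink == 0 is doomed */
      static void osd_ref_del_model(struct osd_object_model *o)
      {
              if (o->nlink == 0) {
                      /* analogue of "nlink == 0 ... maybe an upgraded file?" */
                      printf("ref_del: nlink already 0, marking object dying\n");
                      o->loh.loh_flags |= LU_OBJECT_HEARD_BANSHEE;
                      return;
              }
              o->nlink--;
      }

      /* destroy: asserts the object was not already marked dying */
      static void osd_destroy_model(struct osd_object_model *o)
      {
              /* modeled equivalent of the LASSERT that fired in the log */
              assert(!lu_object_is_dying_model(&o->loh));
              printf("destroy: ok, nlink=%d\n", o->nlink);
      }

      int main(void)
      {
              struct osd_object_model o = { { 0 }, 1 };

              osd_ref_del_model(&o);   /* nlink 1 -> 0: fine */
              osd_destroy_model(&o);   /* passes: not yet marked dying */

              /* a second ref_del on an nlink==0 object marks it dying; a
               * subsequent destroy in the same OUT transaction would LBUG */
              osd_ref_del_model(&o);
              printf("dying=%d\n", lu_object_is_dying_model(&o.loh));
              return 0;
      }
      ```

      In this model, calling the destroy step again after the object is flagged dying would fail the assertion, which is the shape of the LBUG reported here.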
      
      

      Attachments

      Issue Links

      Activity

            adilger Andreas Dilger added a comment -

            Hit in Janitor testing: https://testing.whamcloud.com/gerrit-janitor/39666/testresults/sanityn-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/
            sarah Sarah Liu added a comment -

            Hit this again on lustre-master-ib build #437.

            sarah Sarah Liu added a comment -

            Hit the problem again on the master branch (tag 2.13.52).
            The crash dump can be found on spirit at /scratch/dumps/soak-8.spirit.whamcloud.com/10.10.1.108-2020-02-15-11\:36\:21/

            [ 2575.647060] Lustre: Skipped 17 previous similar messages
            [ 2575.694765] LustreError: dumping log to /tmp/lustre-log.1581765979.5236
            [ 2576.220180] LNet: 4683:0:(o2iblnd_cb.c:3393:kiblnd_check_conns()) Timed out tx for 192.168.1.110@o2ib: 0 seconds
            [ 2576.231562] LNet: 4683:0:(o2iblnd_cb.c:3393:kiblnd_check_conns()) Skipped 6 previous similar messages
            [ 2576.241901] LNetError: 4683:0:(lib-msg.c:479:lnet_handle_local_failure()) ni 192.168.1.108@o2ib added to recovery queue. Health = 900
            [ 2606.654791] LustreError: 6202:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1581765710, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-soaked-MDT0000_UUID lock: ffff88f8c632d580/0xa21a133420fc92f4 lrc: 3/1,0 mode: --/PR res: [0x20000040e:0x3:0x0].0x0 bits 0x13/0x0 rrc: 13 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 6202 timeout: 0 lvb_type: 0
            [ 2606.698855] LustreError: 6202:0:(ldlm_request.c:124:ldlm_expired_completion_wait()) Skipped 3 previous similar messages
            [ 2643.197397] LNetError: 4683:0:(lib-msg.c:479:lnet_handle_local_failure()) ni 192.168.1.108@o2ib added to recovery queue. Health = 900
            [ 2643.210833] LNetError: 4683:0:(lib-msg.c:479:lnet_handle_local_failure()) Skipped 1 previous similar message
            [ 2767.192219] Lustre: MGS: Connection restored to 5c9f140e-4423-4 (at 192.168.1.110@o2ib)
            [ 2767.201198] Lustre: Skipped 1 previous similar message
            [ 2769.137827] Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.110@o2ib, removing former export from same NID
            [ 2769.150490] Lustre: Skipped 1 previous similar message
            [ 2853.947102] Lustre: soaked-MDT0002-osp-MDT0000: Connection restored to 192.168.1.110@o2ib (at 192.168.1.110@o2ib)
            [ 2853.958590] Lustre: Skipped 2 previous similar messages
            [ 2870.647167] Lustre: 4940:0:(service.c:1440:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply  req@ffff88f8af2fda00 x1658449052245760/t0(0) o36->775f92d9-5437-4@192.168.1.127@o2ib:509/0 lens 528/2888 e 24 to 0 dl 1581766279 ref 2 fl Interpret:/0/0 rc 0/0 job:''
            [ 2871.318872] Lustre: 4858:0:(service.c:1440:ptlrpc_at_send_early_reply()) @@@ Could not add any time (5/5), not sending early reply  req@ffff88f4963b2d00 x1658449307111936/t0(0) o36->4d2dbe7c-0a77-4@192.168.1.126@o2ib:509/0 lens 528/2888 e 24 to 0 dl 1581766279 ref 2 fl Interpret:/0/0 rc 0/0 job:''
            [ 2871.348299] Lustre: 4858:0:(service.c:1440:ptlrpc_at_send_early_reply()) Skipped 2 previous similar messages
            [ 2877.572936] Lustre: soaked-MDT0000: Client 4d2dbe7c-0a77-4 (at 192.168.1.126@o2ib) reconnecting
            [ 3153.983138] Lustre: soaked-MDT0000: Received new LWP connection from 192.168.1.110@o2ib, removing former export from same NID
            [ 3153.995796] Lustre: Skipped 1 previous similar message
            [ 3153.996927] Lustre: soaked-MDT0000: Connection restored to soaked-MDT0001-mdtlov_UUID (at 192.168.1.109@o2ib)
            [ 3153.996931] Lustre: Skipped 4 previous similar messages
            [ 3154.093652] LustreError: 5720:0:(osd_handler.c:4169:osd_ref_del()) soaked-MDT0000: nlink == 0 on [0x200011d4e:0x9eaf:0x0], maybe an upgraded file? (LU-3915)
            [ 3154.109319] LustreError: 5720:0:(osd_handler.c:3592:osd_destroy()) ASSERTION( !lu_object_is_dying(dt->do_lu.lo_header) ) failed:
            [ 3154.122342] LustreError: 5720:0:(osd_handler.c:3592:osd_destroy()) LBUG
            [ 3154.129735] Pid: 5720, comm: mdt_out01_006 3.10.0-1062.9.1.el7_lustre.x86_64 #1 SMP Wed Feb 12 06:45:58 UTC 2020
            [ 3154.141104] Call Trace:
            [ 3154.143850]  [<ffffffffc0b0ffac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            [ 3154.151193]  [<ffffffffc0b1005c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            [ 3154.158133]  [<ffffffffc1224f80>] osd_destroy+0x4a0/0x760 [osd_ldiskfs]
            [ 3154.165558]  [<ffffffffc0f92321>] out_obj_destroy+0x101/0x2c0 [ptlrpc]
            [ 3154.172976]  [<ffffffffc0f92590>] out_tx_destroy_exec+0x20/0x190 [ptlrpc]
            [ 3154.180640]  [<ffffffffc0f8c991>] out_tx_end+0xe1/0x5c0 [ptlrpc]
            [ 3154.187425]  [<ffffffffc0f90c52>] out_handle+0x1442/0x1bb0 [ptlrpc]
            [ 3154.194490]  [<ffffffffc0f8972a>] tgt_request_handle+0x95a/0x1610 [ptlrpc]
            [ 3154.202224]  [<ffffffffc0f2b0f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
            [ 3154.210846]  [<ffffffffc0f2f4f4>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
            [ 3154.217898]  [<ffffffffa58c61f1>] kthread+0xd1/0xe0
            [ 3154.223385]  [<ffffffffa5f8dd37>] ret_from_fork_nospec_end+0x0/0x39
            [ 3154.230407]  [<ffffffffffffffff>] 0xffffffffffffffff
            [ 3154.235988] Kernel panic - not syncing: LBUG
            [ 3154.240763] CPU: 30 PID: 5720 Comm: mdt_out01_006 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.9.1.el7_lustre.x86_64 #1
            [ 3154.254834] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
            [ 3154.267355] Call Trace:
            [ 3154.270096]  [<ffffffffa5f7ac23>] dump_stack+0x19/0x1b
            [ 3154.275834]  [<ffffffffa5f74967>] panic+0xe8/0x21f
            [ 3154.281187]  [<ffffffffc0b100ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
            [ 3154.288092]  [<ffffffffc1224f80>] osd_destroy+0x4a0/0x760 [osd_ldiskfs]
            [ 3154.295481]  [<ffffffffa5f80caa>] ? _cond_resched+0x3a/0x50
            [ 3154.301699]  [<ffffffffa5f7fd42>] ? down_write+0x12/0x3d
            [ 3154.307674]  [<ffffffffc0f92321>] out_obj_destroy+0x101/0x2c0 [ptlrpc]
            [ 3154.314999]  [<ffffffffc0f92590>] out_tx_destroy_exec+0x20/0x190 [ptlrpc]
            [ 3154.322612]  [<ffffffffc0f8c991>] out_tx_end+0xe1/0x5c0 [ptlrpc]
            [ 3154.329353]  [<ffffffffc0f90c52>] out_handle+0x1442/0x1bb0 [ptlrpc]
            [ 3154.336372]  [<ffffffffa596ad65>] ? tracing_is_on+0x15/0x30
            [ 3154.342627]  [<ffffffffc0f8972a>] tgt_request_handle+0x95a/0x1610 [ptlrpc]
            [ 3154.350305]  [<ffffffffc0af700e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
            [ 3154.358212]  [<ffffffffc0f2b0f6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
            [ 3154.366791]  [<ffffffffc0f2787b>] ? ptlrpc_wait_event+0x12b/0x4f0 [ptlrpc]
            [ 3154.374495]  [<ffffffffa58d3360>] ? task_rq_unlock+0x20/0x20
            [ 3154.380825]  [<ffffffffa58d3903>] ? __wake_up+0x13/0x20
            [ 3154.386689]  [<ffffffffc0f2f4f4>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
            [ 3154.393738]  [<ffffffffc0f2e940>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
            [ 3154.401985]  [<ffffffffa58c61f1>] kthread+0xd1/0xe0
            [ 3154.407429]  [<ffffffffa58c6120>] ? insert_kthread_work+0x40/0x40
            [ 3154.414231]  [<ffffffffa5f8dd37>] ret_from_fork_nospec_begin+0x21/0x21
            [ 3154.421517]  [<ffffffffa58c6120>] ? insert_kthread_work+0x40/0x40
            [    0.000000] Initializing cgroup subsys cpuset
            [    0.000000] Initializing cgroup subsys cpu
            

            hongchao.zhang Hongchao Zhang added a comment -

            Hi Cliff,
            Thanks, but I can't log in to spirit anymore (it worked previously). I have reopened DCO-5820 to request access to spirit.

            cliffw Cliff White (Inactive) added a comment -

            The dump is on spirit: spirit.hpdd.intel.com

            hongchao.zhang Hongchao Zhang added a comment -

            Hi Cliff,

            Is the core dump at /scratch/dumps on onyx.hpdd.intel.com? I can't find it in that directory.
            Thanks!
            pjones Peter Jones added a comment -

            Hongchao

            Can you please investigate?

            Thanks

            Peter

            adilger Andreas Dilger added a comment - edited

            The LU-9808 failure is on the same assertion, but it is in the mdt code instead of the out code. Since the recovery-small test_102 failure is easily reproducible, it might make sense to debug and fix that rather than spend time trying to understand this less common failure.


            cliffw Cliff White (Inactive) added a comment -

            Core dump is available on soak at /scratch/dumps
            vmcore-dmesg is attached.

            People

              Assignee: wc-triage WC Triage
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 7