
[LU-8990] Failback LBUG lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
    • Environment: Soak cluster lustre: 2.9.51_4_g39af202 - tip of master on 12/30
    • Severity: 3

    Description

      The system completed failover of lola-8 to lola-9 (MDS failover).
      Attempting to fail back triggered the assertion.

      <4>Lustre: Failing over soaked-MDT0000
      <3>LustreError: 34433:0:(osp_precreate.c:912:osp_precreate_cleanup_orphans()) soaked-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
      <3>LustreError: 34433:0:(osp_precreate.c:912:osp_precreate_cleanup_orphans()) Skipped 10 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.126@o2ib100 (stopping)
      <6>Lustre: Skipped 3 previous similar messages
      <3>LustreError: 34664:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0000-mdtlov: statfs: rc = -108
      <3>LustreError: 34664:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 43 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:882:ldlm_resource_complain()) mdt-soaked-MDT0000_UUID: namespace resource [0x20002ac2a:0x1fbe:0x0].0xe8519c29 (ffff8803f859fec0) refcount nonzero (1) after lock cleanup; forcing cleanup.
      <3>LustreError: 34700:0:(ldlm_resource.c:882:ldlm_resource_complain()) Skipped 5 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xe8519c29 (ffff8803f859fec0) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1469:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0000_UUID lock: ffff8804001e4b40/0xda913c453295768d lrc: 2/0,1 mode: PW/PW res: [0x20002ac2a:0x1fbe:0x0].0xe8519c29 bits 0x2 rrc: 2 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 expref: -99 pid: 34549 timeout: 0 lvb_type: 0
      <3>LustreError: 34700:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Skipped 4 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac14:0xd2e4:0x0].0x786cbc9c (ffff8807f5e87b00) refcount = 5
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1484:ldlm_resource_dump()) Waiting locks:
      <3>LustreError: 34700:0:(ldlm_resource.c:1486:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0000_UUID lock: ffff880836be2b80/0xda913c4532957639 lrc: 2/0,1 mode: --/PW res: [0x20002ac14:0xd2e4:0x0].0x786cbc9c bits 0x2 rrc: 5 type: IBT flags: 0x40316400000020 nid: local remote: 0x0 expref: -99 pid: 34463 timeout: 0 lvb_type: 0
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xc7d60cfe (ffff8807f5e87a40) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0x0 (ffff88041165ac00) refcount = 9
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0x78a9f709 (ffff880401cd9b40) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xd5a94b89 (ffff8803f859fbc0) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac14:0xd2e4:0x0].0x0 (ffff8807fbdccf00) refcount = 9
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 9135:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8803f0bb0c80 x1555552176176736/t0(0) o13->soaked-OST000f-osc-MDT0000@192.168.1.105@o2ib10:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      <3>LustreError: 9135:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 6 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.107@o2ib10 (stopping)
      <6>Lustre: Skipped 7 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.117@o2ib100 (stopping)
      <6>Lustre: Skipped 15 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.130@o2ib100 (stopping)
      <6>Lustre: Skipped 6 previous similar messages
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.133@o2ib100 (stopping)
      <6>Lustre: Skipped 9 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.106@o2ib10 (stopping)
      <6>Lustre: Skipped 23 previous similar messages
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <3>LustreError: 34661:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0000-mdtlov: statfs: rc = -108
      <3>LustreError: 34661:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 201 previous similar messages
      <0>LustreError: 34700:0:(mdt_handler.c:4565:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
      <0>LustreError: 9095:0:(lod_dev.c:1654:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: lu is ffff8803ffd54000
      <0>LustreError: 9095:0:(lod_dev.c:1654:lod_device_free()) LBUG
      <4>Pid: 9095, comm: obd_zombid
      

      A crash dump is available on lola-9 - /var/crash/127.0.0.1-2017-01-05-05:48:01

    Activity


            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31431/
            Subject: LU-8990 lod: put root at cleanup
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 34289a7be2e6ba42c6091ccd8835bd8f3eca9385

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31431
            Subject: LU-8990 lod: put root at cleanup
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 348a1b52d538b9f26f213d766bef1f359f651e42

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31143/
            Subject: LU-8990 lod: put root at cleanup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 94fc345399b3cd94a96aa4b3f607f2dc9d669a98

            cliffw Cliff White (Inactive) added a comment -

            Your new patch has been running for several days. I am not seeing any output or hard failures. I have grepped for 'lod_device' in the logs but found nothing; are there other strings I should search for to get you the output you need? Should I perhaps force a crash dump? If you have a login to the system now, you can also force a dump yourself if you need one.
            laisiyao Lai Siyao added a comment -

            This should be a different issue, I'll look into it later.

            cliffw Cliff White (Inactive) added a comment - edited

            Not seeing any hard faults yet, but many watchdog/hung-thread warnings, mostly during recovery after OSS failover. Example:

            Feb 12 22:24:35 soak-8 kernel: LNet: Service thread pid 2499 was inactive for 200.49s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
            Feb 12 22:24:35 soak-8 kernel: Pid: 2499, comm: mdt01_003
            Feb 12 22:24:35 soak-8 kernel: #012Call Trace:
            Feb 12 22:24:35 soak-8 kernel: [<ffffffff81033519>] ? sched_clock+0x9/0x10
            Feb 12 22:24:35 soak-8 kernel: [<ffffffff816ab6b9>] schedule+0x29/0x70
            Feb 12 22:24:35 soak-8 kernel: [<ffffffff816a9004>] schedule_timeout+0x174/0x2c0
            Feb 12 22:24:35 soak-8 kernel: [<ffffffff8109a6c0>] ? process_timeout+0x0/0x10
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc0e21eb1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc17ae760>] osp_precreate_reserve+0x2e0/0x810 [osp]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffff810c6440>] ? default_wake_function+0x0/0x20
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc17a3c53>] osp_declare_create+0x193/0x590 [osp]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc0f404a9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16f47dc>] lod_sub_declare_create+0xdc/0x210 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16eda4e>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16ee9ca>] lod_alloc_rr.constprop.18+0x70a/0x1010 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16f317d>] lod_qos_prep_create+0xced/0x1820 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16f000e>] ? lod_alloc_qos.constprop.17+0xd3e/0x1590 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16f420d>] lod_prepare_create+0x25d/0x360 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16e5f7e>] lod_declare_striped_create+0x1ee/0x970 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16f47dc>] ? lod_sub_declare_create+0xdc/0x210 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc16ea2b4>] lod_declare_create+0x204/0x590 [lod]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc0f60619>] ? lu_context_refill+0x19/0x50 [obdclass]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc175c3ef>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
            Feb 12 22:24:35 soak-8 kernel: [<ffffffffc174cb63>] mdd_declare_create+0x53/0xe30 [mdd]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc1750e89>] mdd_create+0x879/0x1410 [mdd]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc1605106>] mdt_reint_open+0x2206/0x3260 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc0f73d2e>] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15e8b43>] ? ucred_set_jobid+0x53/0x70 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15f9400>] mdt_reint_rec+0x80/0x210 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15d8f8b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15e5437>] mdt_intent_reint+0x157/0x420 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15dc0b2>] mdt_intent_opc+0x442/0xad0 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc113f470>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc15e3c63>] mdt_intent_policy+0x1a3/0x360 [mdt]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc10ef202>] ldlm_lock_enqueue+0x382/0x8f0 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc1117753>] ldlm_handle_enqueue0+0x8f3/0x13e0 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc113f4f0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc119d202>] tgt_enqueue+0x62/0x210 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc11a5405>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc114958e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc1146448>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff810c6452>] ? default_wake_function+0x12/0x20
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff810bc0f8>] ? __wake_up_common+0x58/0x90
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc114cd42>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffffc114c2b0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff810b252f>] kthread+0xcf/0xe0
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff810b2460>] ? kthread+0x0/0xe0
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90
            Feb 12 22:24:36 soak-8 kernel: [<ffffffff810b2460>] ? kthread+0x0/0xe0
            Feb 12 22:24:36 soak-8 kernel:
            Feb 12 22:24:36 soak-8 kernel: LustreError: dumping log to /tmp/lustre-log.1518474276.2499
            

            Logs are available on soak - /scratch/logs/syslog.

            laisiyao Lai Siyao added a comment -

            Hi Cliff, I just updated the patch; could you run the soak test again?

            laisiyao Lai Siyao added a comment -

            The crash dump shows lod->lod_md_root is not NULL at lod_device_free(), which means lod_md_root is released too early (in precleanup) and some request re-initialized it after that. I'll move the release to the real cleanup phase.
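
            For illustration, below is a minimal, self-contained C sketch of the lifecycle problem described in this comment. It is a hypothetical model (toy_device, cache_root, put_root and so on are made-up names, not the actual lod API): it only shows why dropping the cached root at precleanup and letting a late request re-cache it leaves a stray reference that trips an "ld_ref == 0"-style assertion at free time, and why doing the put at real cleanup time avoids it. The actual change is the "LU-8990 lod: put root at cleanup" patch in Gerrit.

            /*
             * Simplified, hypothetical model of the race described above -- NOT the
             * actual Lustre lod code.  It only illustrates why releasing a cached
             * root object at precleanup time can trip an ld_ref-style assertion
             * if a late request re-caches the root before the device is freed.
             */
            #include <assert.h>
            #include <stdio.h>

            struct toy_object {
                    int refcount;                   /* stands in for lu->ld_ref */
            };

            struct toy_device {
                    struct toy_object dev_obj;      /* refcount checked at free time */
                    struct toy_object *md_root;     /* cached root, like lod_md_root */
            };

            /* caching the root takes a reference on the device */
            static void cache_root(struct toy_device *d, struct toy_object *root)
            {
                    d->md_root = root;
                    d->dev_obj.refcount++;
            }

            static void put_root(struct toy_device *d)
            {
                    if (d->md_root != NULL) {
                            d->md_root = NULL;
                            d->dev_obj.refcount--;
                    }
            }

            /* a request arriving while the device is still processing RPCs */
            static void late_request(struct toy_device *d, struct toy_object *root)
            {
                    if (d->md_root == NULL)
                            cache_root(d, root);    /* re-initializes the cached root */
            }

            static void device_free(struct toy_device *d)
            {
                    /* mirrors ASSERTION( atomic_read(&lu->ld_ref) == 0 ) */
                    assert(d->dev_obj.refcount == 0);
                    printf("device freed cleanly\n");
            }

            int main(void)
            {
                    struct toy_device dev = { { 0 }, NULL };
                    struct toy_object root = { 0 };

                    cache_root(&dev, &root);

                    /* buggy ordering: drop the root too early (at "precleanup") ... */
                    put_root(&dev);
                    /* ... a request still in flight re-caches it ... */
                    late_request(&dev, &root);
                    /* ... so the leftover reference would trip the assertion at free
                     * time.  The fix is to call put_root() here, at real cleanup time,
                     * after no new requests can arrive. */
                    put_root(&dev);         /* comment this out to reproduce the LBUG */

                    device_free(&dev);
                    return 0;
            }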

            laisiyao Lai Siyao added a comment -

            I still need access to the crash dump to verify some details, but I don't have an account on Spirit yet; I just created a ticket for it: https://jira.hpdd.intel.com/browse/DCO-7884


            People

              Assignee: laisiyao Lai Siyao
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: