Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11111

crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ)

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.10.4
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Labels:
      None
    • Environment:
      x86_64, zfs 0.7.9, centos7.4, 2.10.4 plus the "lfsck: load object attr when prepare LFSCK request" patch from LU-10988
    • Severity:
      3
    • Rank (Obsolete):
      9223372036854775807

      Description

      Hi,

      I hit this just now when doing a lfsck for LU-10677.

      lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C

      2018-07-02 19:00:24 [699080.215320] Lustre: debug daemon will attempt to start writing to /mnt/root/root/lfsck.debug_daemon.dagg-MDTs.20180702.1 (20480000kB max)
      2018-07-02 19:51:27 [702136.419050] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ) ) failed: 
      2018-07-02 19:51:27 [702136.432255] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) LBUG
      2018-07-02 19:51:27 [702136.440349] Pid: 199045, comm: mdt01_013
      2018-07-02 19:51:27 [702136.445132] 
      2018-07-02 19:51:27 [702136.445132] Call Trace:
      2018-07-02 19:51:27 [702136.450734]  [<ffffffffc05267ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2018-07-02 19:51:27 [702136.458083]  [<ffffffffc052683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2018-07-02 19:51:27 [702136.465078]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:27 [702136.472242]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:27 [702136.479747]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:27 [702136.486988]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:27 [702136.493616]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:27 [702136.500846]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:27 [702136.507628]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:27 [702136.514928]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.521402]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.529074]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.537493]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.545015]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.552008]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.558588]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.565605]  [<ffffffffc0e20ae0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.572580]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.578163]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.583819]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.589900]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.595542] 
      2018-07-02 19:51:28 [702136.597694] Kernel panic - not syncing: LBUG
      2018-07-02 19:51:28 [702136.602600] CPU: 9 PID: 199045 Comm: mdt01_013 Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
      2018-07-02 19:51:28 [702136.613744] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
      2018-07-02 19:51:28 [702136.621857] Call Trace:
      2018-07-02 19:51:28 [702136.624946]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      2018-07-02 19:51:28 [702136.630704]  [<ffffffff816a8634>] panic+0xe8/0x21f
      2018-07-02 19:51:28 [702136.636117]  [<ffffffffc0526854>] lbug_with_loc+0x64/0xb0 [libcfs]
      2018-07-02 19:51:28 [702136.642927]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:28 [702136.649895]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:28 [702136.657201]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:28 [702136.664232]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:28 [702136.670661]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:28 [702136.677681]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:28 [702136.684247]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:28 [702136.691331]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.697583]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.705025]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.713235]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.720538]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.727314]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.733666]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.740438]  [<ffffffffc0e20ae0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      2018-07-02 19:51:28 [702136.748310]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.753661]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.760227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.766108]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.772696] Kernel Offset: disabled
      

      the debug_daemon running at the time, but the file is 6.1G and probably truncated because of the crash. let me know if you want it anyway.

      I've mounted the MDT's back up again with -o skip_lfsck.

      I think the crash happened somewhere in phase2 of the namespace lfsck. the lfsck state after the reboot is below.

      [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222041
      time_since_last_completed: 2305149 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14945199
      checked_phase2: 4683016
      updated_phase1: 0
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2205524
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 324817
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 1
      dangling_repaired: 92
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639566
      striped_shards_repaired: 1
      striped_shards_failed: 0
      striped_shards_skipped: 3
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 9
      run_time_phase1: 2357 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6340 items/sec
      average_speed_phase2: 4878 objs/sec
      average_speed_total: 5917 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222004
      time_since_last_completed: 2305186 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14356291
      checked_phase2: 4705808
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 2185653
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 323999
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 11
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639415
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2374 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6047 items/sec
      average_speed_phase2: 4901 objs/sec
      average_speed_total: 5717 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528221995
      time_since_last_completed: 2305195 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14368285
      checked_phase2: 4739156
      updated_phase1: 2
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2187046
      dirent_repaired: 0
      linkea_repaired: 1
      nlinks_repaired: 0
      multiple_linked_checked: 324797
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 88
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 1
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639666
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2387 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6019 items/sec
      average_speed_phase2: 4936 objs/sec
      average_speed_total: 5708 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      

      cheers,
      robin

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                laisiyao Lai Siyao
                Reporter:
                scadmin SC Admin
              • Votes:
                0 Vote for this issue
                Watchers:
                9 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: