Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11111

crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ)

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0, Lustre 2.12.1
    • Lustre 2.10.4
    • None
    • x86_64, zfs 0.7.9, centos7.4, 2.10.4 plus the "lfsck: load object attr when prepare LFSCK request" patch from LU-10988
    • 3
    • 9223372036854775807

    Description

      Hi,

      I hit this just now when doing a lfsck for LU-10677.

      lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C

      2018-07-02 19:00:24 [699080.215320] Lustre: debug daemon will attempt to start writing to /mnt/root/root/lfsck.debug_daemon.dagg-MDTs.20180702.1 (20480000kB max)
      2018-07-02 19:51:27 [702136.419050] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ) ) failed: 
      2018-07-02 19:51:27 [702136.432255] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) LBUG
      2018-07-02 19:51:27 [702136.440349] Pid: 199045, comm: mdt01_013
      2018-07-02 19:51:27 [702136.445132] 
      2018-07-02 19:51:27 [702136.445132] Call Trace:
      2018-07-02 19:51:27 [702136.450734]  [<ffffffffc05267ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2018-07-02 19:51:27 [702136.458083]  [<ffffffffc052683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2018-07-02 19:51:27 [702136.465078]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:27 [702136.472242]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:27 [702136.479747]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:27 [702136.486988]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:27 [702136.493616]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:27 [702136.500846]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:27 [702136.507628]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:27 [702136.514928]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.521402]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.529074]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.537493]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.545015]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.552008]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.558588]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.565605]  [<ffffffffc0e20ae0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.572580]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.578163]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.583819]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.589900]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.595542] 
      2018-07-02 19:51:28 [702136.597694] Kernel panic - not syncing: LBUG
      2018-07-02 19:51:28 [702136.602600] CPU: 9 PID: 199045 Comm: mdt01_013 Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
      2018-07-02 19:51:28 [702136.613744] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
      2018-07-02 19:51:28 [702136.621857] Call Trace:
      2018-07-02 19:51:28 [702136.624946]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      2018-07-02 19:51:28 [702136.630704]  [<ffffffff816a8634>] panic+0xe8/0x21f
      2018-07-02 19:51:28 [702136.636117]  [<ffffffffc0526854>] lbug_with_loc+0x64/0xb0 [libcfs]
      2018-07-02 19:51:28 [702136.642927]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:28 [702136.649895]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:28 [702136.657201]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:28 [702136.664232]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:28 [702136.670661]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:28 [702136.677681]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:28 [702136.684247]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:28 [702136.691331]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.697583]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.705025]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.713235]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.720538]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.727314]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.733666]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.740438]  [<ffffffffc0e20ae0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      2018-07-02 19:51:28 [702136.748310]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.753661]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.760227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.766108]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.772696] Kernel Offset: disabled
      

      the debug_daemon running at the time, but the file is 6.1G and probably truncated because of the crash. let me know if you want it anyway.

      I've mounted the MDT's back up again with -o skip_lfsck.

      I think the crash happened somewhere in phase2 of the namespace lfsck. the lfsck state after the reboot is below.

      [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222041
      time_since_last_completed: 2305149 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14945199
      checked_phase2: 4683016
      updated_phase1: 0
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2205524
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 324817
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 1
      dangling_repaired: 92
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639566
      striped_shards_repaired: 1
      striped_shards_failed: 0
      striped_shards_skipped: 3
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 9
      run_time_phase1: 2357 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6340 items/sec
      average_speed_phase2: 4878 objs/sec
      average_speed_total: 5917 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222004
      time_since_last_completed: 2305186 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14356291
      checked_phase2: 4705808
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 2185653
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 323999
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 11
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639415
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2374 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6047 items/sec
      average_speed_phase2: 4901 objs/sec
      average_speed_total: 5717 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528221995
      time_since_last_completed: 2305195 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14368285
      checked_phase2: 4739156
      updated_phase1: 2
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2187046
      dirent_repaired: 0
      linkea_repaired: 1
      nlinks_repaired: 0
      multiple_linked_checked: 324797
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 88
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 1
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639666
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2387 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6019 items/sec
      average_speed_phase2: 4936 objs/sec
      average_speed_total: 5708 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      

      cheers,
      robin

      Attachments

        Issue Links

          Activity

            [LU-11111] crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ)
            scadmin SC Admin added a comment -

            thanks.
            I applied it last night and let the original lfsck continue.
            it finished ok (output below).

            because it's been a long time, and we've had many more crashes from LU-11082, I started another namespace lfsck.

            this hung most of the way through phase2 on mdt0 and didn't complete. it completed on mdt1,2.
            I waited about 15mins after all the counters stopped going up on mdt0 and then tried a lfsck_stop, but it just blocked. lfsck_namespace was using 100% of a core. looked hung. I had to power cycle the MDS.
            MDTs mounted with -o skip_lfsck again now.

            do you want me to open a new ticket for this hang or stick with this ticket?

            there are a lot of problems repaired in both of the below. presumable because of LU-10887 and LU-11082 and maybe LU-10677.

            output from continued lfsck from ages ago (that crashed in phase2) ->

            [warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1532449023
            time_since_last_completed: 22 seconds
            latest_start_time: 1532447977
            time_since_latest_start: 1068 seconds
            last_checkpoint_time: 1532449023
            time_since_last_checkpoint: 22 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 14945199
            checked_phase2: 6638062
            updated_phase1: 0
            updated_phase2: 1
            failed_phase1: 0
            failed_phase2: 0
            directories: 2205524
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 324817
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 1
            dangling_repaired: 92
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1639566
            striped_shards_repaired: 1
            striped_shards_failed: 0
            striped_shards_skipped: 3
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 10
            run_time_phase1: 2357 seconds
            run_time_phase2: 1972 seconds
            average_speed_phase1: 6340 items/sec
            average_speed_phase2: 3366 objs/sec
            average_speed_total: 4985 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1532448851
            time_since_last_completed: 194 seconds
            latest_start_time: 1532447985
            time_since_latest_start: 1060 seconds
            last_checkpoint_time: 1532448851
            time_since_last_checkpoint: 194 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 14356291
            checked_phase2: 6221317
            updated_phase1: 0
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2185653
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 323999
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 11
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1639415
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 5
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 11
            run_time_phase1: 2374 seconds
            run_time_phase2: 1800 seconds
            average_speed_phase1: 6047 items/sec
            average_speed_phase2: 3456 objs/sec
            average_speed_total: 4929 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1532448851
            time_since_last_completed: 194 seconds
            latest_start_time: 1532448011
            time_since_latest_start: 1034 seconds
            last_checkpoint_time: 1532448851
            time_since_last_checkpoint: 194 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 14368285
            checked_phase2: 6254745
            updated_phase1: 2
            updated_phase2: 1
            failed_phase1: 0
            failed_phase2: 0
            directories: 2187046
            dirent_repaired: 0
            linkea_repaired: 1
            nlinks_repaired: 0
            multiple_linked_checked: 324797
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 88
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 1
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1639666
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 4
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 11
            run_time_phase1: 2387 seconds
            run_time_phase2: 1799 seconds
            average_speed_phase1: 6019 items/sec
            average_speed_phase2: 3476 objs/sec
            average_speed_total: 4926 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            

            output from next namespace lfsck that stopped progressing on mdt0 (before power-off of MDS) ->

            [warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: scanning-phase2
            flags: scanned-once,inconsistent
            param: all_targets,create_mdtobj
            last_completed_time: 1532449023
            time_since_last_completed: 5691 seconds
            latest_start_time: 1532449114
            time_since_latest_start: 5600 seconds
            last_checkpoint_time: 1532453018
            time_since_last_checkpoint: 1696 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 18305111
            checked_phase2: 5342520
            updated_phase1: 44
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2368594
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 381414
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 18
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1761665
            striped_shards_repaired: 20
            striped_shards_failed: 0
            striped_shards_skipped: 84
            name_hash_repaired: 44
            linkea_overflow_cleared: 0
            success_count: 10
            run_time_phase1: 2823 seconds
            run_time_phase2: 2776 seconds
            average_speed_phase1: 6484 items/sec
            average_speed_phase2: 1924 objs/sec
            average_speed_total: 4223 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: 35 objs/sec
            current_position: [0x20001f8cf:0xcddd:0x0]
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1532453205
            time_since_last_completed: 1509 seconds
            latest_start_time: 1532449114
            time_since_latest_start: 5600 seconds
            last_checkpoint_time: 1532453205
            time_since_last_checkpoint: 1509 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 17716791
            checked_phase2: 6134353
            updated_phase1: 155
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2348707
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 381110
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 0
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1761553
            striped_shards_repaired: 78
            striped_shards_failed: 0
            striped_shards_skipped: 102
            name_hash_repaired: 155
            linkea_overflow_cleared: 0
            success_count: 12
            run_time_phase1: 2707 seconds
            run_time_phase2: 1267 seconds
            average_speed_phase1: 6544 items/sec
            average_speed_phase2: 4841 objs/sec
            average_speed_total: 6001 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1532453199
            time_since_last_completed: 1515 seconds
            latest_start_time: 1532449114
            time_since_latest_start: 5600 seconds
            last_checkpoint_time: 1532453199
            time_since_last_checkpoint: 1515 seconds
            latest_start_position: 266, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 17724868
            checked_phase2: 6127858
            updated_phase1: 1
            updated_phase2: 0
            failed_phase1: 0
            failed_phase2: 0
            directories: 2349705
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 380857
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 36
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1761725
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 102
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 12
            run_time_phase1: 2698 seconds
            run_time_phase2: 1261 seconds
            average_speed_phase1: 6569 items/sec
            average_speed_phase2: 4859 objs/sec
            average_speed_total: 6024 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            

            cheers,
            robin

            scadmin SC Admin added a comment - thanks. I applied it last night and let the original lfsck continue. it finished ok (output below). because it's been a long time, and we've had many more crashes from LU-11082 , I started another namespace lfsck. this hung most of the way through phase2 on mdt0 and didn't complete. it completed on mdt1,2. I waited about 15mins after all the counters stopped going up on mdt0 and then tried a lfsck_stop, but it just blocked. lfsck_namespace was using 100% of a core. looked hung. I had to power cycle the MDS. MDTs mounted with -o skip_lfsck again now. do you want me to open a new ticket for this hang or stick with this ticket? there are a lot of problems repaired in both of the below. presumable because of LU-10887 and LU-11082 and maybe LU-10677 . output from continued lfsck from ages ago (that crashed in phase2) -> [warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: param: all_targets,create_mdtobj last_completed_time: 1532449023 time_since_last_completed: 22 seconds latest_start_time: 1532447977 time_since_latest_start: 1068 seconds last_checkpoint_time: 1532449023 time_since_last_checkpoint: 22 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 14945199 checked_phase2: 6638062 updated_phase1: 0 updated_phase2: 1 failed_phase1: 0 failed_phase2: 0 directories: 2205524 dirent_repaired: 0 linkea_repaired: 0 nlinks_repaired: 0 multiple_linked_checked: 324817 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 1 dangling_repaired: 92 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1639566 striped_shards_repaired: 1 striped_shards_failed: 0 striped_shards_skipped: 3 name_hash_repaired: 0 linkea_overflow_cleared: 0 success_count: 10 run_time_phase1: 2357 seconds run_time_phase2: 1972 seconds average_speed_phase1: 6340 items/sec average_speed_phase2: 3366 objs/sec average_speed_total: 4985 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: param: all_targets,create_mdtobj last_completed_time: 1532448851 time_since_last_completed: 194 seconds latest_start_time: 1532447985 time_since_latest_start: 1060 seconds last_checkpoint_time: 1532448851 time_since_last_checkpoint: 194 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 14356291 checked_phase2: 6221317 updated_phase1: 0 updated_phase2: 0 failed_phase1: 0 failed_phase2: 0 directories: 2185653 dirent_repaired: 0 linkea_repaired: 0 nlinks_repaired: 0 multiple_linked_checked: 323999 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 0 dangling_repaired: 11 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1639415 striped_shards_repaired: 0 striped_shards_failed: 0 striped_shards_skipped: 5 name_hash_repaired: 0 linkea_overflow_cleared: 0 success_count: 11 run_time_phase1: 2374 seconds run_time_phase2: 1800 seconds average_speed_phase1: 6047 items/sec average_speed_phase2: 3456 objs/sec average_speed_total: 4929 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: param: all_targets,create_mdtobj last_completed_time: 1532448851 time_since_last_completed: 194 seconds latest_start_time: 1532448011 time_since_latest_start: 1034 seconds last_checkpoint_time: 1532448851 time_since_last_checkpoint: 194 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 14368285 checked_phase2: 6254745 updated_phase1: 2 updated_phase2: 1 failed_phase1: 0 failed_phase2: 0 directories: 2187046 dirent_repaired: 0 linkea_repaired: 1 nlinks_repaired: 0 multiple_linked_checked: 324797 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 0 dangling_repaired: 88 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 1 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1639666 striped_shards_repaired: 0 striped_shards_failed: 0 striped_shards_skipped: 4 name_hash_repaired: 0 linkea_overflow_cleared: 0 success_count: 11 run_time_phase1: 2387 seconds run_time_phase2: 1799 seconds average_speed_phase1: 6019 items/sec average_speed_phase2: 3476 objs/sec average_speed_total: 4926 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A output from next namespace lfsck that stopped progressing on mdt0 (before power-off of MDS) -> [warble1]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace name: lfsck_namespace magic: 0xa0621a0b version: 2 status: scanning-phase2 flags: scanned-once,inconsistent param: all_targets,create_mdtobj last_completed_time: 1532449023 time_since_last_completed: 5691 seconds latest_start_time: 1532449114 time_since_latest_start: 5600 seconds last_checkpoint_time: 1532453018 time_since_last_checkpoint: 1696 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 18305111 checked_phase2: 5342520 updated_phase1: 44 updated_phase2: 0 failed_phase1: 0 failed_phase2: 0 directories: 2368594 dirent_repaired: 0 linkea_repaired: 0 nlinks_repaired: 0 multiple_linked_checked: 381414 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 0 dangling_repaired: 18 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1761665 striped_shards_repaired: 20 striped_shards_failed: 0 striped_shards_skipped: 84 name_hash_repaired: 44 linkea_overflow_cleared: 0 success_count: 10 run_time_phase1: 2823 seconds run_time_phase2: 2776 seconds average_speed_phase1: 6484 items/sec average_speed_phase2: 1924 objs/sec average_speed_total: 4223 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: 35 objs/sec current_position: [0x20001f8cf:0xcddd:0x0] name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: param: all_targets,create_mdtobj last_completed_time: 1532453205 time_since_last_completed: 1509 seconds latest_start_time: 1532449114 time_since_latest_start: 5600 seconds last_checkpoint_time: 1532453205 time_since_last_checkpoint: 1509 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 17716791 checked_phase2: 6134353 updated_phase1: 155 updated_phase2: 0 failed_phase1: 0 failed_phase2: 0 directories: 2348707 dirent_repaired: 0 linkea_repaired: 0 nlinks_repaired: 0 multiple_linked_checked: 381110 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 0 dangling_repaired: 0 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1761553 striped_shards_repaired: 78 striped_shards_failed: 0 striped_shards_skipped: 102 name_hash_repaired: 155 linkea_overflow_cleared: 0 success_count: 12 run_time_phase1: 2707 seconds run_time_phase2: 1267 seconds average_speed_phase1: 6544 items/sec average_speed_phase2: 4841 objs/sec average_speed_total: 6001 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A name: lfsck_namespace magic: 0xa0621a0b version: 2 status: completed flags: param: all_targets,create_mdtobj last_completed_time: 1532453199 time_since_last_completed: 1515 seconds latest_start_time: 1532449114 time_since_latest_start: 5600 seconds last_checkpoint_time: 1532453199 time_since_last_checkpoint: 1515 seconds latest_start_position: 266, N/A, N/A last_checkpoint_position: 35184372088832, N/A, N/A first_failure_position: N/A, N/A, N/A checked_phase1: 17724868 checked_phase2: 6127858 updated_phase1: 1 updated_phase2: 0 failed_phase1: 0 failed_phase2: 0 directories: 2349705 dirent_repaired: 0 linkea_repaired: 0 nlinks_repaired: 0 multiple_linked_checked: 380857 multiple_linked_repaired: 0 unknown_inconsistency: 0 unmatched_pairs_repaired: 0 dangling_repaired: 36 multiple_referenced_repaired: 0 bad_file_type_repaired: 0 lost_dirent_repaired: 0 local_lost_found_scanned: 0 local_lost_found_moved: 0 local_lost_found_skipped: 0 local_lost_found_failed: 0 striped_dirs_scanned: 0 striped_dirs_repaired: 0 striped_dirs_failed: 0 striped_dirs_disabled: 0 striped_dirs_skipped: 0 striped_shards_scanned: 1761725 striped_shards_repaired: 0 striped_shards_failed: 0 striped_shards_skipped: 102 name_hash_repaired: 0 linkea_overflow_cleared: 0 success_count: 12 run_time_phase1: 2698 seconds run_time_phase2: 1261 seconds average_speed_phase1: 6569 items/sec average_speed_phase2: 4859 objs/sec average_speed_total: 6024 items/sec real_time_speed_phase1: N/A real_time_speed_phase2: N/A current_position: N/A cheers, robin
            laisiyao Lai Siyao added a comment -

            I just uploaded a fix, you can apply it and update MDS only because it's a fix for MDT code.

            laisiyao Lai Siyao added a comment - I just uploaded a fix, you can apply it and update MDS only because it's a fix for MDT code.

            Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32796
            Subject: LU-11111 mdd: don't assert on ORPHAN flag
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 693fe452ec4349bb6ea12b8901b9ec1665d68e95

            gerrit Gerrit Updater added a comment - Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32796 Subject: LU-11111 mdd: don't assert on ORPHAN flag Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 693fe452ec4349bb6ea12b8901b9ec1665d68e95
            laisiyao Lai Siyao added a comment -

            This may be because LFSCK added an orphan back to its original parent directory, and then it was unlinked again, which triggered this assertion, I'll commit a fix soon.

            laisiyao Lai Siyao added a comment - This may be because LFSCK added an orphan back to its original parent directory, and then it was unlinked again, which triggered this assertion, I'll commit a fix soon.
            scadmin SC Admin added a comment -

            sorry, the console log above is all we have. we don't have kernel crash dumps setup.

            cheers,
            robin

            scadmin SC Admin added a comment - sorry, the console log above is all we have. we don't have kernel crash dumps setup. cheers, robin
            laisiyao Lai Siyao added a comment -

            Can you post the stack trace of all processes in the crash dump?

            laisiyao Lai Siyao added a comment - Can you post the stack trace of all processes in the crash dump?
            pjones Peter Jones added a comment -

            Lai

            Could you please assist?

            Thanks

            Peter

            pjones Peter Jones added a comment - Lai Could you please assist? Thanks Peter
            scadmin SC Admin added a comment -

            FWIW the last little bit of the debug_file is

            ...
            00100000:10000000:3.0:1530524123.223490:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
            00100000:10000000:9.0:1530524123.224700:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
            00100000:10000000:3.0:1530524123.224783:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
            00100000:10000000:9.0:1530524123.225897:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
            00100000:10000000:3.0:1530524123.225978:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
            00100000:10000000:9.0:1530524123.227163:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
            00100000:10000000:3.0:1530524123.227248:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 1, old 0, new 1: rc = 0
            00100000:10000000:9.0:1530524123.228443:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 2, old 1, new 3: rc = 0
            00100000:10000000:9.0:1530524123.231812:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x68002f9d1:0xa47c:0x0], index = 0, flags = 0, flags2 = 0, depth = 0
            00100000:10000000:9.0:1530524123.231905:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x2800301a1:0xa489:0x0], index = 2, flags = 0, flags2 = 0, depth = 0
            00100000:10000000:3.0:1530524123.231921:0:375744:0:(lfsck_engine.c:1092:lfsck_master_engine()) LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 35184372088832, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 375744, rc = 1
            00100000:10000000:3.0:1530524123.231926:0:375744:0:(lfsck_lib.c:2577:lfsck_post_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace post, rc = 1
            00100000:10000000:9.0:1530524123.231951:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x20001f032:0xa4fa:0x0], index = 1, flags = 0, flags2 = 0, depth = 0
            00100000:10000000:9.0:1530524123.231956:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 0 [0x68002f9d1:0xa47c:0x0] of the striped directory [0x68002f9d1:0xa47c:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
            00100000:10000000:9.0:1530524123.231985:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 1 [0x20001f032:0xa4fa:0x0] of the striped directory [0x20001f032:0xa4fa:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
            00100000:10000000:9.0:1530524123.232007:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 2 [0x2800301a1:0xa489:0x0] of the striped directory [0x2800301a1:0xa489:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0
            00100000:10000000:9.0:1530524123.232034:0:377274:0:(lfsck_engine.c:1650:lfsck_assistant_engine()) dagg-MDT0002-osd: lfsck_namespace LFSCK assistant thread post
            00100000:10000000:3.0:1530524123.232035:0:375744:0:(lfsck_lib.c:2589:lfsck_post_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace post, rc = 1
            00100000:10000000:5.0F:1530524123.232052:0:17117:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0001-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 3
            00100000:10000000:13.0F:1530524123.232053:0:24114:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 2
            00100000:10000000:3.0:1530524123.232053:0:375744:0:(lfsck_namespace.c:4397:lfsck_namespace_post()) dagg-MDT0002-osd: namespace LFSCK post done: rc = 0
            00100000:10000000:3.0:1530524123.232054:0:375744:0:(lfsck_lib.c:2607:lfsck_double_scan_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace double_scan, status 2
            00100000:10000000:1.0F:1530524123.232061:0:375552:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0000-osd: namespace LFSCK phase2 scan start
            00100000:10000000:9.0:1530524123.232068:0:377274:0:(lfsck_engine.c:1669:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0
            00100000:10000000:9.0:1530524123.232070:0:377274:0:(lfsck_engine.c:1687:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant sync before the second-stage scaning
            00100000:10000000:3.0:1530524123.232072:0:375744:0:(lfsck_lib.c:2617:lfsck_double_scan_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace double_scan, status 0
            00100000:10000000:11.0F:1530524123.232106:0:377219:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0001-osd: namespace LFSCK phase2 scan start
            00100000:10000000:1.0:1530524123.232204:0:375552:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0000-osd: stop to scan backend /lost+found: rc = 0
            00100000:10000000:11.0:1530524123.232339:0:377219:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0001-osd: stop to scan backend /lost+found: rc = 0
            00100000:10000000:11.0:1530524123.463692:0:377274:0:(lfsck_engine.c:1694:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0
            00100000:10000000:11.0:1530524123.463694:0:377274:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0002-osd: namespace LFSCK phase2 scan start
            00100000:10000000:11.0:1530524123.463722:0:377274:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0002-osd: stop to scan backend /lost+found: rc = 0
            00100000:10000000:3.0F:1530524329.684216:0:375552:0:(lfsck_namespace.c:2253:lfsck_namespace_repair_unmatched_pairs()) dagg-MDT0000-osd: namespace LFSCK rebuild dotdot name entry for the object [0x20001f034:0xb6c2:0x0], new parent [0x68002f9d2:0x866d:0x0]: rc = 1
            00100000:10000000:2.0F:1530524869.309940:0:377274:0:(lfsck_namespace.c:1217:lfsck_namespace_insert_normal()) dagg-MDT0002-osd: namespace LFSCK insert object [0x68002c327:0xc8c9:0x0] with the name 00000001000002CB00000083 and type 100000 to the parent [0x2000182d2:0x2c6d:0x0]: rc = 1
            

            cheers,
            robin

            scadmin SC Admin added a comment - FWIW the last little bit of the debug_file is ... 00100000:10000000:3.0:1530524123.223490:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 1, old 0, new 1: rc = 0 00100000:10000000:9.0:1530524123.224700:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa471:0x0] in the trace file, flags 2, old 1, new 3: rc = 0 00100000:10000000:3.0:1530524123.224783:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 1, old 0, new 1: rc = 0 00100000:10000000:9.0:1530524123.225897:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa45c:0x0] in the trace file, flags 2, old 1, new 3: rc = 0 00100000:10000000:3.0:1530524123.225978:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 1, old 0, new 1: rc = 0 00100000:10000000:9.0:1530524123.227163:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa46a:0x0] in the trace file, flags 2, old 1, new 3: rc = 0 00100000:10000000:3.0:1530524123.227248:0:375744:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 1, old 0, new 1: rc = 0 00100000:10000000:9.0:1530524123.228443:0:377274:0:(lfsck_namespace.c:576:lfsck_namespace_trace_update()) dagg-MDT0002-osd: namespace LFSCK add flags for [0x68002f9d1:0xa476:0x0] in the trace file, flags 2, old 1, new 3: rc = 0 00100000:10000000:9.0:1530524123.231812:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x68002f9d1:0xa47c:0x0], index = 0, flags = 0, flags2 = 0, depth = 0 00100000:10000000:9.0:1530524123.231905:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x2800301a1:0xa489:0x0], index = 2, flags = 0, flags2 = 0, depth = 0 00100000:10000000:3.0:1530524123.231921:0:375744:0:(lfsck_engine.c:1092:lfsck_master_engine()) LFSCK exit: oit_flags = 0x60003, dir_flags = 0x8006, oit_cookie = 35184372088832, dir_cookie = 0x0, parent = [0x0:0x0:0x0], pid = 375744, rc = 1 00100000:10000000:3.0:1530524123.231926:0:375744:0:(lfsck_lib.c:2577:lfsck_post_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace post, rc = 1 00100000:10000000:9.0:1530524123.231951:0:377274:0:(lfsck_striped_dir.c:505:lfsck_record_lmv()) dagg-MDT0002-osd: record slave LMV EA for the striped directory [0x68002fa24:0x172d1:0x0]: shard = [0x20001f032:0xa4fa:0x0], index = 1, flags = 0, flags2 = 0, depth = 0 00100000:10000000:9.0:1530524123.231956:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 0 [0x68002f9d1:0xa47c:0x0] of the striped directory [0x68002f9d1:0xa47c:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0 00100000:10000000:9.0:1530524123.231985:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 1 [0x20001f032:0xa4fa:0x0] of the striped directory [0x20001f032:0xa4fa:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0 00100000:10000000:9.0:1530524123.232007:0:377274:0:(lfsck_striped_dir.c:2189:lfsck_namespace_striped_dir_rescan()) dagg-MDT0002-osd: namespace LFSCK repair the shard 2 [0x2800301a1:0xa489:0x0] of the striped directory [0x2800301a1:0xa489:0x0] with dangling no/no, rename no/no, llinkea no/no, repair_lmvea no/no: rc = 0 00100000:10000000:9.0:1530524123.232034:0:377274:0:(lfsck_engine.c:1650:lfsck_assistant_engine()) dagg-MDT0002-osd: lfsck_namespace LFSCK assistant thread post 00100000:10000000:3.0:1530524123.232035:0:375744:0:(lfsck_lib.c:2589:lfsck_post_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace post, rc = 1 00100000:10000000:5.0F:1530524123.232052:0:17117:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0001-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 3 00100000:10000000:13.0F:1530524123.232053:0:24114:0:(lfsck_namespace.c:4721:lfsck_namespace_in_notify()) dagg-MDT0000-osd: namespace LFSCK handles notify 3 from MDT 2, status 1, flags 2 00100000:10000000:3.0:1530524123.232053:0:375744:0:(lfsck_namespace.c:4397:lfsck_namespace_post()) dagg-MDT0002-osd: namespace LFSCK post done: rc = 0 00100000:10000000:3.0:1530524123.232054:0:375744:0:(lfsck_lib.c:2607:lfsck_double_scan_generic()) dagg-MDT0002-osd: waiting for assistant to do lfsck_namespace double_scan, status 2 00100000:10000000:1.0F:1530524123.232061:0:375552:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0000-osd: namespace LFSCK phase2 scan start 00100000:10000000:9.0:1530524123.232068:0:377274:0:(lfsck_engine.c:1669:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant notified others for lfsck_namespace post: rc = 0 00100000:10000000:9.0:1530524123.232070:0:377274:0:(lfsck_engine.c:1687:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant sync before the second-stage scaning 00100000:10000000:3.0:1530524123.232072:0:375744:0:(lfsck_lib.c:2617:lfsck_double_scan_generic()) dagg-MDT0002-osd: the assistant has done lfsck_namespace double_scan, status 0 00100000:10000000:11.0F:1530524123.232106:0:377219:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0001-osd: namespace LFSCK phase2 scan start 00100000:10000000:1.0:1530524123.232204:0:375552:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0000-osd: stop to scan backend /lost+found: rc = 0 00100000:10000000:11.0:1530524123.232339:0:377219:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0001-osd: stop to scan backend /lost+found: rc = 0 00100000:10000000:11.0:1530524123.463692:0:377274:0:(lfsck_engine.c:1694:lfsck_assistant_engine()) dagg-MDT0002-osd: LFSCK assistant phase2 scan start, synced: rc = 0 00100000:10000000:11.0:1530524123.463694:0:377274:0:(lfsck_namespace.c:6229:lfsck_namespace_assistant_handler_p2()) dagg-MDT0002-osd: namespace LFSCK phase2 scan start 00100000:10000000:11.0:1530524123.463722:0:377274:0:(lfsck_namespace.c:5944:lfsck_namespace_scan_local_lpf()) dagg-MDT0002-osd: stop to scan backend /lost+found: rc = 0 00100000:10000000:3.0F:1530524329.684216:0:375552:0:(lfsck_namespace.c:2253:lfsck_namespace_repair_unmatched_pairs()) dagg-MDT0000-osd: namespace LFSCK rebuild dotdot name entry for the object [0x20001f034:0xb6c2:0x0], new parent [0x68002f9d2:0x866d:0x0]: rc = 1 00100000:10000000:2.0F:1530524869.309940:0:377274:0:(lfsck_namespace.c:1217:lfsck_namespace_insert_normal()) dagg-MDT0002-osd: namespace LFSCK insert object [0x68002c327:0xc8c9:0x0] with the name 00000001000002CB00000083 and type 100000 to the parent [0x2000182d2:0x2c6d:0x0]: rc = 1 cheers, robin

            People

              laisiyao Lai Siyao
              scadmin SC Admin
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: