
[LU-11111] crash doing LFSCK: orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.1
    • Affects Version/s: Lustre 2.10.4
    • Labels: None
    • Environment: x86_64, zfs 0.7.9, centos7.4, 2.10.4 plus the "lfsck: load object attr when prepare LFSCK request" patch from LU-10988
    • Severity: 3
    • 9223372036854775807

    Description

      Hi,

      I hit this just now while doing an lfsck for LU-10677.

      lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C

      2018-07-02 19:00:24 [699080.215320] Lustre: debug daemon will attempt to start writing to /mnt/root/root/lfsck.debug_daemon.dagg-MDTs.20180702.1 (20480000kB max)
      2018-07-02 19:51:27 [702136.419050] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) ASSERTION( !(obj->mod_flags & ORPHAN_OBJ) ) failed: 
      2018-07-02 19:51:27 [702136.432255] LustreError: 199045:0:(mdd_orphans.c:205:orph_index_insert()) LBUG
      2018-07-02 19:51:27 [702136.440349] Pid: 199045, comm: mdt01_013
      2018-07-02 19:51:27 [702136.445132] 
      2018-07-02 19:51:27 [702136.445132] Call Trace:
      2018-07-02 19:51:27 [702136.450734]  [<ffffffffc05267ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2018-07-02 19:51:27 [702136.458083]  [<ffffffffc052683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2018-07-02 19:51:27 [702136.465078]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:27 [702136.472242]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:27 [702136.479747]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:27 [702136.486988]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:27 [702136.493616]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:27 [702136.500846]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:27 [702136.507628]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:27 [702136.514928]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.521402]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.529074]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.537493]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.545015]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.552008]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.558588]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.565605]  [<ffffffffc0e20ae0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.572580]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.578163]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.583819]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.589900]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-07-02 19:51:28 [702136.595542] 
      2018-07-02 19:51:28 [702136.597694] Kernel panic - not syncing: LBUG
      2018-07-02 19:51:28 [702136.602600] CPU: 9 PID: 199045 Comm: mdt01_013 Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
      2018-07-02 19:51:28 [702136.613744] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
      2018-07-02 19:51:28 [702136.621857] Call Trace:
      2018-07-02 19:51:28 [702136.624946]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      2018-07-02 19:51:28 [702136.630704]  [<ffffffff816a8634>] panic+0xe8/0x21f
      2018-07-02 19:51:28 [702136.636117]  [<ffffffffc0526854>] lbug_with_loc+0x64/0xb0 [libcfs]
      2018-07-02 19:51:28 [702136.642927]  [<ffffffffc11e3827>] __mdd_orphan_add+0x8e7/0xa20 [mdd]
      2018-07-02 19:51:28 [702136.649895]  [<ffffffffc09f720c>] ? osd_write_locked+0x3c/0x60 [osd_zfs]
      2018-07-02 19:51:28 [702136.657201]  [<ffffffffc11d33ce>] mdd_finish_unlink+0x17e/0x410 [mdd]
      2018-07-02 19:51:28 [702136.664232]  [<ffffffffc11d56e4>] mdd_unlink+0xae4/0xbf0 [mdd]
      2018-07-02 19:51:28 [702136.670661]  [<ffffffffc10bc728>] mdt_reint_unlink+0xc28/0x11d0 [mdt]
      2018-07-02 19:51:28 [702136.677681]  [<ffffffffc10bfb33>] mdt_reint_rec+0x83/0x210 [mdt]
      2018-07-02 19:51:28 [702136.684247]  [<ffffffffc10a137b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      2018-07-02 19:51:28 [702136.691331]  [<ffffffffc10acf07>] mdt_reint+0x67/0x140 [mdt]
      2018-07-02 19:51:28 [702136.697583]  [<ffffffffc0e752ba>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      2018-07-02 19:51:28 [702136.705025]  [<ffffffffc0e1de2b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      2018-07-02 19:51:28 [702136.713235]  [<ffffffffc0e1a458>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      2018-07-02 19:51:28 [702136.720538]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      2018-07-02 19:51:28 [702136.727314]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      2018-07-02 19:51:28 [702136.733666]  [<ffffffffc0e21572>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      2018-07-02 19:51:28 [702136.740438]  [<ffffffffc0e20ae0>] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
      2018-07-02 19:51:28 [702136.748310]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-07-02 19:51:28 [702136.753661]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.760227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-07-02 19:51:28 [702136.766108]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-07-02 19:51:28 [702136.772696] Kernel Offset: disabled
      

      The debug_daemon was running at the time, but the file is 6.1G and is probably truncated because of the crash. Let me know if you want it anyway.

      I've mounted the MDTs back up again with -o skip_lfsck.
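
      For reference, a hedged sketch of how ZFS-backed MDTs can be remounted with the in-progress LFSCK skipped (the pool/dataset names and mountpoints below are hypothetical, not taken from this system):

      # remount each MDT with LFSCK skipped; adjust pool/dataset and mountpoint to your layout
      mount -t lustre -o skip_lfsck mdt0-pool/dagg-MDT0000 /mnt/lustre/dagg-MDT0000
      mount -t lustre -o skip_lfsck mdt1-pool/dagg-MDT0001 /mnt/lustre/dagg-MDT0001
      mount -t lustre -o skip_lfsck mdt2-pool/dagg-MDT0002 /mnt/lustre/dagg-MDT0002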

      I think the crash happened somewhere in phase 2 of the namespace lfsck. The lfsck state after the reboot is below.

      [warble2]root: lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222041
      time_since_last_completed: 2305149 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14945199
      checked_phase2: 4683016
      updated_phase1: 0
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2205524
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 324817
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 1
      dangling_repaired: 92
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639566
      striped_shards_repaired: 1
      striped_shards_failed: 0
      striped_shards_skipped: 3
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 9
      run_time_phase1: 2357 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6340 items/sec
      average_speed_phase2: 4878 objs/sec
      average_speed_total: 5917 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528222004
      time_since_last_completed: 2305186 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14356291
      checked_phase2: 4705808
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      directories: 2185653
      dirent_repaired: 0
      linkea_repaired: 0
      nlinks_repaired: 0
      multiple_linked_checked: 323999
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 11
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 0
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639415
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2374 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6047 items/sec
      average_speed_phase2: 4901 objs/sec
      average_speed_total: 5717 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      name: lfsck_namespace
      magic: 0xa0621a0b
      version: 2
      status: crashed
      flags: scanned-once,inconsistent
      param: all_targets,create_mdtobj
      last_completed_time: 1528221995
      time_since_last_completed: 2305195 seconds
      latest_start_time: 1530521735
      time_since_latest_start: 5455 seconds
      last_checkpoint_time: 1530525083
      time_since_last_checkpoint: 2107 seconds
      latest_start_position: 266, N/A, N/A
      last_checkpoint_position: 35184372088832, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 14368285
      checked_phase2: 4739156
      updated_phase1: 2
      updated_phase2: 1
      failed_phase1: 0
      failed_phase2: 0
      directories: 2187046
      dirent_repaired: 0
      linkea_repaired: 1
      nlinks_repaired: 0
      multiple_linked_checked: 324797
      multiple_linked_repaired: 0
      unknown_inconsistency: 0
      unmatched_pairs_repaired: 0
      dangling_repaired: 88
      multiple_referenced_repaired: 0
      bad_file_type_repaired: 0
      lost_dirent_repaired: 1
      local_lost_found_scanned: 0
      local_lost_found_moved: 0
      local_lost_found_skipped: 0
      local_lost_found_failed: 0
      striped_dirs_scanned: 0
      striped_dirs_repaired: 0
      striped_dirs_failed: 0
      striped_dirs_disabled: 0
      striped_dirs_skipped: 0
      striped_shards_scanned: 1639666
      striped_shards_repaired: 0
      striped_shards_failed: 0
      striped_shards_skipped: 4
      name_hash_repaired: 0
      linkea_overflow_cleared: 0
      success_count: 10
      run_time_phase1: 2387 seconds
      run_time_phase2: 960 seconds
      average_speed_phase1: 6019 items/sec
      average_speed_phase2: 4936 objs/sec
      average_speed_total: 5708 items/sec
      real_time_speed_phase1: N/A
      real_time_speed_phase2: N/A
      current_position: N/A
      

      cheers,
      robin

    Activity


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34528/
            Subject: LU-11111 lfsck: skip orphan processing
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 92912099c0ee982d11019b2c586e255654eb9f45

            adilger Andreas Dilger added a comment -

            Tracking remaining patch in LU-12127.

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34528
            Subject: LU-11111 lfsck: skip orphan processing
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 9847af4cbd7b2040e65decc589ecee2808eee521

            adilger Andreas Dilger added a comment -

            Cory, that patch resolved the aforementioned race condition (LFSCK on an orphan object that also had a corrupt layout), but it was having problems in testing, and I thought it prudent to fix the issue with the simpler (if slightly imperfect) version of the patch from Shadow, which we had eventually gotten working.

            I don't want to abandon Fan Yong's patch, since it shows how to fix the issue completely, but I don't think it is a priority to land at this point. I agree that it would probably make sense to open a different ticket, link the patch there, and close this one.
            spitzcor Cory Spitz added a comment -

            Since https://review.whamcloud.com/#/c/32959/ was landed after Andreas recommended "to land this patch as-is, with the knowledge that there is a rare corner case that is missed," should we keep this ticket open or just track the remaining work somewhere else? Either way, it seems that https://review.whamcloud.com/#/c/34174/ should be abandoned now.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32959/
            Subject: LU-11111 lfsck: skip orphan processing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 077570483e75e0610fd45149b926097547c434b8

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34174
            Subject: LU-11111 lfsck: skip orphan processing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5a7be2a5cd89b67b55daa42725efee62b8e3fb57

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32935/
            Subject: LU-11111 lfsck: print correct parent FID
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3583e1fd998c56977adde9abe9c9a9dc9bd88d58
            laisiyao Lai Siyao added a comment - edited

            See the Lustre manual: https://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.deactivating_mdt_ost

            After you copy all dirs/files to MDT0, remove the old dirs/files (or you can keep them, but they will not be accessible after deactivation and will continue to consume space on MDT0 and the OSTs), permanently deactivate MDT1-2, and then you can stop MDT1-2.

            I'm working on LU-11301 and will update you when I make progress.
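
            A hedged sketch of the permanent deactivation step from that manual section, using the dagg fsname from this ticket (run on the MGS; the MDT indices and mountpoints are illustrative, and the exact syntax should be checked against the manual for your Lustre version):

            # permanently deactivate MDT0001 and MDT0002 (files/dirs still on them become inaccessible)
            lctl conf_param dagg-MDT0001.mdc.active=0
            lctl conf_param dagg-MDT0002.mdc.active=0
            # once deactivated and their content migrated/removed, the MDTs can be unmounted on their servers
            umount /mnt/lustre/dagg-MDT0001
            umount /mnt/lustre/dagg-MDT0002
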
            scadmin SC Admin added a comment -

            Hi Lai,

            LU-11301 has by far the highest priority for us.

            It hangs the entire cluster on average about once a week, I guess, i.e. the cluster is down. It has also taken to evicting most clients recently, which means a total reboot of clients plus an unknown number of jobs lost.
            These incidents tend to happen in batches, so we can sometimes go 2 or 3 weeks without incident and then hit 3 crashes in a row.
            LU-11082 and LU-11301 have both had much the same symptoms.

            We have inherited striping on all dirs across all 3 MDTs. This is the only DNE config that makes sense to us: how do we know where users are going to put hundreds of millions of files across tens of thousands of dirs? We cannot possibly know this. If we set inherited dir striping then the MDTs work like OSTs, and on average we will get good balance across them. Multiple MDTs had some file stat() performance benefits, and we assume performance for the rest of the ops will improve over time as DNE matures. AFAIK nobody does more than the usual amount of mkdir/rmdir, which, yes, is ultra-slow.

            So if this isn't a supported config, or is something that's never going to work well, then a) please make that clear in the docs, and b) there is a window of time while the filesystem isn't yet over half full: we can take a few weeks or a month to rsync all files (probably <2 PiB) onto an unstriped top-level dir on MDT0 (which we left there for this purpose), take a couple of days of downtime to tidy up, and then destroy MDT1 and MDT2 and make the MDT0 pool bigger, i.e. remove DNE entirely. That's the only other config that makes sense to us. Will this approach work?

            On your points:

            1. Cool, thanks.
            2. We applied the patch from LU-11062 so stack traces are back. Will re-try at some stage.
            3. Also being able to corrupt the filesystem with just some mkdirs.
            4. LU-11301

            cheers,
            robin
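
            For context, a hedged sketch of the two directory layouts being discussed (the paths are hypothetical, and the lfs option names should be checked against your Lustre version's lfs(1)):

            # default ("inherited") directory striping across all 3 MDTs for new subdirectories
            lfs setdirstripe -D -c 3 /mnt/dagg
            # an unstriped top-level directory kept on MDT0, e.g. as an rsync migration target
            lfs mkdir -i 0 /mnt/dagg/mdt0-only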

            People

              Assignee: laisiyao Lai Siyao
              Reporter: scadmin SC Admin
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: