Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0, Lustre 2.10.5
    • Lustre 2.10.3

    Description

      Hi,

      we tested the warble1 hardware as thoroughly as we could for about a week and found no hardware issues. we also replaced the SAS cards and cables just to be safe.

      warble1 is now running kernel 3.10.0-693.21.1.el7.x86_64 and zfs 0.7.8, with these patches applied:

      usr/src/lustre-2.10.3/lu10212-estale.patch
      usr/src/lustre-2.10.3/lu10707-ksocklnd-revert-jiffies.patch
      usr/src/lustre-2.10.3/lu10707-lnet-route-jiffies.patch
      usr/src/lustre-2.10.3/lu10887-lfsck.patch
      usr/src/lustre-2.10.3/lu8990-put-root.patch
      

      when the dagg MDT's were mounted on warble1 they COMPLETED ok and then about 5 seconds later it hit an LBUG in lfsck.

      ...
      2018-05-02 22:06:06 [ 2919.828067] Lustre: dagg-MDT0000: Client 22c84389-af1f-9970-0e9b-70c3a4861afd (at 10.8.49.155@tcp201) reconnecting
      2018-05-02 22:06:06 [ 2919.828113] Lustre: dagg-MDT0002: Recovery already passed deadline 0:31. If you do not want to wait more, please abort the recovery by force.
      2018-05-02 22:06:38 [ 2951.686211] Lustre: dagg-MDT0002: recovery is timed out, evict stale exports
      2018-05-02 22:06:38 [ 2951.694197] Lustre: dagg-MDT0002: disconnecting 1 stale clients
      2018-05-02 22:06:38 [ 2951.736799] Lustre: 24680:0:(ldlm_lib.c:2544:target_recovery_thread()) too long recovery - read logs
      2018-05-02 22:06:38 [ 2951.746774] Lustre: dagg-MDT0002: Recovery over after 6:24, of 125 clients 124 recovered and 1 was evicted.
      2018-05-02 22:06:38 [ 2951.746775] LustreError: dumping log to /tmp/lustre-log.1525262798.24680
      2018-05-02 22:06:44 [ 2957.910031] LustreError: 33236:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
      2018-05-02 22:06:44 [ 2957.917615] Pid: 33236, comm: lfsck_namespace
      2018-05-02 22:06:44 [ 2957.922760]
      2018-05-02 22:06:44 [ 2957.922760] Call Trace:
      2018-05-02 22:06:44 [ 2957.928142]  [<ffffffffc06457ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      2018-05-02 22:06:44 [ 2957.935374]  [<ffffffffc064583c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      2018-05-02 22:06:44 [ 2957.942270]  [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
      2018-05-02 22:06:44 [ 2957.949398]  [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
      2018-05-02 22:06:44 [ 2957.957911]  [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
      2018-05-02 22:06:44 [ 2957.965289]  [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
      2018-05-02 22:06:44 [ 2957.974129]  [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
      2018-05-02 22:06:44 [ 2957.983217]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
      2018-05-02 22:06:44 [ 2957.989375]  [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
      2018-05-02 22:06:44 [ 2957.997154]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
      2018-05-02 22:06:44 [ 2958.003538]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
      2018-05-02 22:06:44 [ 2958.009648]  [<ffffffff810c7c70>] ? default_wake_function+0x0/0x20
      2018-05-02 22:06:44 [ 2958.016449]  [<ffffffffc11405c0>] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
      2018-05-02 22:06:44 [ 2958.024186]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-05-02 22:06:44 [ 2958.029662]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-05-02 22:06:44 [ 2958.035220]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-05-02 22:06:44 [ 2958.041197]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      2018-05-02 22:06:44 [ 2958.046723]
      2018-05-02 22:06:44 [ 2958.048771] Kernel panic - not syncing: LBUG
      2018-05-02 22:06:44 [ 2958.053576] CPU: 2 PID: 33236 Comm: lfsck_namespace Tainted: P           OE  ------------   3.10.0-693.21.1.el7.x86_64 #1
      2018-05-02 22:06:44 [ 2958.065051] Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 1.3.7 02/08/2018
      2018-05-02 22:06:44 [ 2958.073066] Call Trace:
      2018-05-02 22:06:44 [ 2958.076060]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      2018-05-02 22:06:44 [ 2958.081738]  [<ffffffff816a8634>] panic+0xe8/0x21f
      2018-05-02 22:06:44 [ 2958.087058]  [<ffffffffc0645854>] lbug_with_loc+0x64/0xb0 [libcfs]
      2018-05-02 22:06:44 [ 2958.093781]  [<ffffffffc0d82573>] dt_mode_to_dft+0x73/0x80 [obdclass]
      2018-05-02 22:06:44 [ 2958.100741]  [<ffffffffc115ac81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
      2018-05-02 22:06:45 [ 2958.109091]  [<ffffffffc0d7ea22>] ? htable_lookup+0x102/0x180 [obdclass]
      2018-05-02 22:06:45 [ 2958.116294]  [<ffffffffc1186f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
      2018-05-02 22:06:45 [ 2958.124963]  [<ffffffffc115ce71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
      2018-05-02 22:06:45 [ 2958.133889]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
      2018-05-02 22:06:45 [ 2958.139866]  [<ffffffffc114098e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
      2018-05-02 22:06:45 [ 2958.147492]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
      2018-05-02 22:06:45 [ 2958.153724]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
      2018-05-02 22:06:45 [ 2958.159689]  [<ffffffff810c7c70>] ? wake_up_state+0x20/0x20
      2018-05-02 22:06:45 [ 2958.165742]  [<ffffffffc11405c0>] ? lfsck_master_engine+0x1310/0x1310 [lfsck]
      2018-05-02 22:06:45 [ 2958.173343]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      2018-05-02 22:06:45 [ 2958.178685]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-05-02 22:06:45 [ 2958.185227]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
      2018-05-02 22:06:45 [ 2958.191065]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      2018-05-02 22:06:45 [ 2958.197613] Kernel Offset: disabled
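For triage, the LustreError lines in the console log above carry the pid, source file, line number and function that raised the LBUG. A minimal Python sketch that pulls those fields out, assuming the line layout seen in this log; the pattern and the helper name parse_lustre_error are illustrative and not part of any Lustre tool:

```python
import re

# Matches console lines shaped like
#   LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
# The second numeric field is skipped because the log does not say what it is.
LUSTRE_ERROR_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def parse_lustre_error(line):
    """Return a dict of pid, file, line, func and msg, or None if no match."""
    match = LUSTRE_ERROR_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


if __name__ == "__main__":
    sample = ("2018-05-02 22:06:44 [ 2957.910031] LustreError: "
              "33236:0:(dt_object.c:213:dt_mode_to_dft()) LBUG")
    print(parse_lustre_error(sample))
```

Lines without the parenthesised source location, such as the "dumping log to" message, return None and can be ignored.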
      

      I've failed the MDTs back to warble2 and mounted them by hand with -o skip_lfsck

      cheers,
      robin

          Activity

            [LU-10988] LBUG in lfsck
            scadmin SC Admin added a comment -

            yup, that worked. thanks!

            lfsck on the MDT1,MDT2 where it was 'crashed' has now completed:

            [warble1]root:   lctl get_param -n mdd.dagg-MDT000*.lfsck_namespace
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: stopped
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1523258154
            time_since_last_completed: 2084076 seconds
            latest_start_time: 1523283745
            time_since_latest_start: 2058485 seconds
            last_checkpoint_time: 1523284186
            time_since_last_checkpoint: 2058044 seconds
            latest_start_position: 3643308, [0x20001072c:0x16502:0x0], 0x75092e6bed520000
            last_checkpoint_position: 3643308, [0x20001072c:0x16502:0x0], 0x750934ffbf07ffff
            first_failure_position: 3643307, N/A, N/A
            checked_phase1: 2426567
            checked_phase2: 0
            updated_phase1: 0
            updated_phase2: 0
            failed_phase1: 1
            failed_phase2: 0
            directories: 202176
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 14706
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 0
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 0
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 1
            striped_shards_scanned: 142443
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 5
            run_time_phase1: 180 seconds
            run_time_phase2: 0 seconds
            average_speed_phase1: 13480 items/sec
            average_speed_phase2: 0 objs/sec
            average_speed_total: 13480 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1525341840
            time_since_last_completed: 390 seconds
            latest_start_time: 1525339672
            time_since_latest_start: 2558 seconds
            last_checkpoint_time: 1525341840
            time_since_last_checkpoint: 390 seconds
            latest_start_position: 2934356, [0x28001c941:0x9c24:0x0], 0x0
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 7946529
            checked_phase2: 3312630
            updated_phase1: 1
            updated_phase2: 1
            failed_phase1: 0
            failed_phase2: 0
            directories: 1414222
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 52655
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 3
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 1
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1060672
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 6
            run_time_phase1: 1488 seconds
            run_time_phase2: 931 seconds
            average_speed_phase1: 5340 items/sec
            average_speed_phase2: 3558 objs/sec
            average_speed_total: 4654 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
            name: lfsck_namespace
            magic: 0xa0621a0b
            version: 2
            status: completed
            flags:
            param: all_targets,create_mdtobj
            last_completed_time: 1525341827
            time_since_last_completed: 403 seconds
            latest_start_time: 1525339070
            time_since_latest_start: 3160 seconds
            last_checkpoint_time: 1525341827
            time_since_last_checkpoint: 403 seconds
            latest_start_position: 2965156, N/A, N/A
            last_checkpoint_position: 35184372088832, N/A, N/A
            first_failure_position: N/A, N/A, N/A
            checked_phase1: 7917065
            checked_phase2: 3318839
            updated_phase1: 0
            updated_phase2: 1
            failed_phase1: 0
            failed_phase2: 0
            directories: 1425028
            dirent_repaired: 0
            linkea_repaired: 0
            nlinks_repaired: 0
            multiple_linked_checked: 52856
            multiple_linked_repaired: 0
            unknown_inconsistency: 0
            unmatched_pairs_repaired: 0
            dangling_repaired: 0
            multiple_referenced_repaired: 0
            bad_file_type_repaired: 0
            lost_dirent_repaired: 1
            local_lost_found_scanned: 0
            local_lost_found_moved: 0
            local_lost_found_skipped: 0
            local_lost_found_failed: 0
            striped_dirs_scanned: 0
            striped_dirs_repaired: 0
            striped_dirs_failed: 0
            striped_dirs_disabled: 0
            striped_dirs_skipped: 0
            striped_shards_scanned: 1068469
            striped_shards_repaired: 0
            striped_shards_failed: 0
            striped_shards_skipped: 0
            name_hash_repaired: 0
            linkea_overflow_cleared: 0
            success_count: 6
            run_time_phase1: 2016 seconds
            run_time_phase2: 919 seconds
            average_speed_phase1: 3927 items/sec
            average_speed_phase2: 3611 objs/sec
            average_speed_total: 3828 items/sec
            real_time_speed_phase1: N/A
            real_time_speed_phase2: N/A
            current_position: N/A
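The get_param output above concatenates one lfsck_namespace block per MDT, each starting with a "name:" line. A minimal Python sketch, assuming that layout, splits the text into per-target records and recomputes average_speed_phase1 as checked_phase1 divided by run_time_phase1. The integer division is inferred from the numbers shown here, not taken from the Lustre source, and the helper names are made up for illustration:

```python
def parse_lfsck_status(text):
    """Split concatenated lfsck_namespace output into one dict per target."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only; values such as FIDs contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":  # each target's block starts with "name:"
            records.append({})
        if records:  # ignore anything before the first block (e.g. the prompt)
            records[-1][key] = value
    return records


def leading_int(value):
    """Return the integer at the start of a field such as '1488 seconds'."""
    return int(value.split()[0])


def phase1_speed(record):
    """Recompute phase 1 speed as checked_phase1 // run_time_phase1."""
    seconds = leading_int(record["run_time_phase1"])
    if seconds == 0:
        return 0
    return leading_int(record["checked_phase1"]) // seconds


# A trimmed copy of two of the blocks shown above.
SAMPLE = """\
name: lfsck_namespace
status: stopped
checked_phase1: 2426567
run_time_phase1: 180 seconds
average_speed_phase1: 13480 items/sec
name: lfsck_namespace
status: completed
checked_phase1: 7946529
run_time_phase1: 1488 seconds
average_speed_phase1: 5340 items/sec
"""

if __name__ == "__main__":
    for index, record in enumerate(parse_lfsck_status(SAMPLE)):
        print(index, record["status"], phase1_speed(record),
              record["average_speed_phase1"])
```

Filtering the records on the status field is a quick way to spot a target that was left in the stopped state and still needs a fresh scan.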
            

            MDT0 still needs to be re-scanned I think though 'cos it was 'stopped' as part of the crashing stuff in LU-10887

            I'll start a whole new scan again now...

            lctl lfsck_start -M dagg-MDT0000 -t namespace -A -r -C
            

            cheers,
            robin


            yong.fan nasf (Inactive) added a comment -

            scadmin, thanks for the new test results. I have updated the patch https://review.whamcloud.com/32245. That should fix the problem. Would you please try again (patch set 2 or newer)? Thanks!
            scadmin SC Admin added a comment -

            no probs.

            2018-05-03 01:02:53 [10209.454437] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) ASSERTION( 0 ) failed: invalid mode 0
            2018-05-03 01:02:53 [10209.464798] LustreError: 35645:0:(dt_object.c:213:dt_mode_to_dft()) LBUG
            2018-05-03 01:02:53 [10209.472260] Pid: 35645, comm: lfsck_namespace
            2018-05-03 01:02:53 [10209.477336] 
            2018-05-03 01:02:53 [10209.477336] Call Trace:
            2018-05-03 01:02:53 [10209.482611]  [<ffffffffc05f67ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
            2018-05-03 01:02:53 [10209.489784]  [<ffffffffc05f683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            2018-05-03 01:02:53 [10209.496628]  [<ffffffffc0b275a1>] dt_mode_to_dft+0xa1/0xb0 [obdclass]
            2018-05-03 01:02:53 [10209.503721]  [<ffffffffc14b5c81>] lfsck_namespace_repair_dangling+0x621/0xf40 [lfsck]
            2018-05-03 01:02:53 [10209.512193]  [<ffffffffc0b23a22>] ? htable_lookup+0x102/0x180 [obdclass]
            2018-05-03 01:02:53 [10209.519533]  [<ffffffffc14e1f4a>] lfsck_namespace_striped_dir_rescan+0x86a/0x1220 [lfsck]
            2018-05-03 01:02:53 [10209.528336]  [<ffffffffc14b7e71>] lfsck_namespace_assistant_handler_p1+0x18d1/0x1f40 [lfsck]
            2018-05-03 01:02:53 [10209.537370]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
            2018-05-03 01:02:53 [10209.543458]  [<ffffffffc149b98e>] lfsck_assistant_engine+0x3ce/0x20b0 [lfsck]
            2018-05-03 01:02:53 [10209.551182]  [<ffffffff810cb0b5>] ? sched_clock_cpu+0x85/0xc0
            2018-05-03 01:02:53 [10209.557512]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
            2018-05-03 01:02:53 [10209.563565]  [<ffffffff810c7c70>] ? default_wake_function+0x0/0x20
            2018-05-03 01:02:53 [10209.570305]  [<ffffffffc149b5c0>] ? lfsck_assistant_engine+0x0/0x20b0 [lfsck]
            2018-05-03 01:02:53 [10209.578004]  [<ffffffff810b4031>] kthread+0xd1/0xe0
            2018-05-03 01:02:53 [10209.583438]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
            2018-05-03 01:02:53 [10209.588968]  [<ffffffff816c055d>] ret_from_fork+0x5d/0xb0
            2018-05-03 01:02:53 [10209.594914]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
            2018-05-03 01:02:53 [10209.600428] 
            2018-05-03 01:02:53 [10209.602473] Kernel panic - not syncing: LBUG
            

            cheers,
            robin


            yong.fan nasf (Inactive) added a comment -

            scadmin, would you please apply the patch https://review.whamcloud.com/32245? That will help us understand what the bad file "mode" is when the problem is hit. Thanks!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/32245
            Subject: LU-10988 obdclass: show invalid mode for dt_mode_to_dft
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b4b453e5a8d4ff2cb51b4b261b54457d3258b8c6

            scadmin SC Admin added a comment -

            yes, that patch is what I've called lu10887-lfsck.patch and is applied.

            cheers,
            robin

            pjones Peter Jones added a comment -

            usr/src/lustre-2.10.3/lu10887-lfsck.patch

            So you have applied the patch https://review.whamcloud.com/31915/ on the MDT, right? If so, there must be other corner cases in lfsck_namespace_repair_dangling() that fail to set the “mode”. I will investigate further. In any case, the existing patches are still valid.
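The assertion message "invalid mode 0" reported earlier in this ticket suggests that the mode handed to dt_mode_to_dft() carried no file-type bits at all. A minimal Python sketch of that kind of mapping, assuming it selects on the S_IFMT bits of a POSIX mode; the type labels and the function name classify_mode are hypothetical and this is not the Lustre implementation:

```python
import stat

# Hypothetical labels for illustration; not the Lustre dt_format_type values.
_TYPE_NAMES = {
    stat.S_IFDIR: "directory",
    stat.S_IFREG: "regular",
    stat.S_IFLNK: "symlink",
    stat.S_IFCHR: "node",
    stat.S_IFBLK: "node",
    stat.S_IFIFO: "node",
    stat.S_IFSOCK: "node",
}


def classify_mode(mode):
    """Map a POSIX mode to a coarse object type; raise on an unknown type."""
    file_type = stat.S_IFMT(mode)  # keep only the file-type bits
    try:
        return _TYPE_NAMES[file_type]
    except KeyError:
        # A mode of 0 lands here: it has permission bits of 0 and no type.
        raise ValueError(f"invalid mode {mode:o}") from None


if __name__ == "__main__":
    print(classify_mode(stat.S_IFDIR | 0o755))
    try:
        classify_mode(0)
    except ValueError as err:
        print(err)
```

In this sketch the caller gets an error it can handle instead of an assertion failure, which is one way a repair path could skip an object whose mode was never filled in.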


            People

              yong.fan nasf (Inactive)
              scadmin SC Admin