Lustre / LU-14175

OI Scrub triggered followed by LBUG ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.12.5
    • Labels: None
    • Environment: CentOS 7.6
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      I'm opening this as Sev2 because we have an OST down on Oak. The problem started this morning on one OST (note that Oak was recently upgraded from 2.10 to 2.12.5):

       

      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: Recovery over after 2:42, of 1789 clients 1631 recovered and 158 were evicted.
      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: Skipped 3 previous similar messages
      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x10400013a0:371764 to 0x10400013a0:371809
      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x1040000bd0:3790954 to 0x1040000bd0:3790977
      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x0:33786809 to 0x0:33786849
      Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x1040000400:3170249 to 0x1040000400:3170273
      Dec 02 09:13:02 oak-io1-s2 kernel: Lustre: oak-OST000b: trigger OI scrub by RPC for the [0x1000b0000:0x10c759a:0x0] with flags 0x4a, rc = 0
      
      [root@oak-io1-s2 ~]# lctl get_param -n osd-ldiskfs.oak-OST000b.oi_scrub
      name: OI_scrub
      magic: 0x4c5fd252
      oi_files: 64
      status: scanning
      flags: auto
      param:
      time_since_last_completed: N/A
      time_since_latest_start: 16 seconds
      time_since_last_checkpoint: 16 seconds
      latest_start_position: 12
      last_checkpoint_position: 11
      first_failure_position: N/A
      checked: 1186
      updated: 0
      failed: 0
      prior_updated: 0
      noscrub: 4
      igif: 0
      success_count: 0
      run_time: 16 seconds
      average_speed: 74 objects/sec
      real-time_speed: 74 objects/sec
      current_position: 1263
      scrub_in_prior: no
      scrub_full_speed: yes
      partial_scan: no
      lf_scanned: 0
      lf_repaired: 0
      lf_failed: 0
      [root@oak-io1-s2 ~]# 
      Message from syslogd@oak-io1-s2 at Dec  2 09:13:19 ...
       kernel:LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11
      
      Message from syslogd@oak-io1-s2 at Dec  2 09:13:19 ...
       kernel:LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
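The FID in the assertion message can be decoded by hand to see why the check fails. The sketch below assumes the FID is an IDIF FID, where (per the layout in Lustre's lustre_fid.h) the sequence lies in the range 0x100000000 to 0x1ffffffff and bits 16 to 31 of the sequence carry the OST index. The helper names `parse_fid` and `idif_ost_index` are hypothetical and exist only for this illustration; how `idx1` is actually derived inside `osd_obj_update_entry()` is assumed, not confirmed here.

```python
import re

# IDIF sequence range (assumed from lustre_fid.h: FID_SEQ_IDIF .. FID_SEQ_IDIF_MAX).
FID_SEQ_IDIF = 0x100000000
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF


def parse_fid(text):
    """Extract the first [seq:oid:ver] FID found in a log line as three ints."""
    m = re.search(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]", text)
    if m is None:
        raise ValueError("no FID found in: %r" % text)
    return tuple(int(g, 16) for g in m.groups())


def idif_ost_index(seq):
    """Return the OST index encoded in an IDIF sequence, or None if not IDIF."""
    if not FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX:
        return None
    return (seq >> 16) & 0xFFFF


assert_line = (
    "ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID "
    "[0x1000a0000:0x1d37dd1:0x0], not match the device index 11"
)
trigger_line = (
    "oak-OST000b: trigger OI scrub by RPC for the "
    "[0x1000b0000:0x10c759a:0x0] with flags 0x4a, rc = 0"
)

seq, oid, ver = parse_fid(assert_line)
device_index = int(re.search(r"device index (\d+)", assert_line).group(1))
fid_index = idif_ost_index(seq)

print(f"FID seq {seq:#x} encodes OST index {fid_index}; device index is {device_index}")
print("assertion would hold:", fid_index == 0 or fid_index == device_index)

trigger_seq, _, _ = parse_fid(trigger_line)
print(f"scrub-trigger FID encodes OST index {idif_ost_index(trigger_seq)}")
```

Read this way, the FID that triggered the scrub carries index 11 (OST000b), while the FID that trips the assertion carries index 10, which is neither 0 nor the device index.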
      

      The backtrace from the first occurrence (03:41) is:

      Dec  2 03:41:08 oak-io1-s2 kernel: LustreError: 255421:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11
      Dec  2 03:41:08 oak-io1-s2 kernel: LustreError: 255421:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
      Dec  2 03:41:08 oak-io1-s2 kernel: Pid: 255421, comm: OI_scrub 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
      Dec  2 03:41:08 oak-io1-s2 kernel: Call Trace:
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc0b3e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc0b3e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc1458149>] osd_obj_update_entry+0x969/0x980 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc145a8a0>] osd_obj_map_update+0x1a0/0x340 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14471a9>] osd_oi_update+0x69/0x290 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc145c71c>] osd_scrub_refresh_mapping+0x27c/0x440 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14611e0>] osd_scrub_check_update+0x280/0x10f0 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14620b5>] osd_scrub_exec+0x65/0x4f0 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14629e8>] osd_inode_iteration+0x4a8/0xcf0 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffc1463ad9>] osd_scrub_main+0x8a9/0xe40 [osd_ldiskfs]
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffaa4c2e81>] kthread+0xd1/0xe0
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffaab77c37>] ret_from_fork_nospec_end+0x0/0x39
      Dec  2 03:41:08 oak-io1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
      

      We ran fsck on the device and then the issue occurred again:

      Dec  2 09:13:19 oak-io1-s2 kernel: LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11
      Dec  2 09:13:19 oak-io1-s2 kernel: LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
      Dec  2 09:13:19 oak-io1-s2 kernel: Pid: 291930, comm: OI_scrub 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
      Dec  2 09:13:19 oak-io1-s2 kernel: Call Trace:
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc0cbe7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc0cbe87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15c6149>] osd_obj_update_entry+0x969/0x980 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15c88a0>] osd_obj_map_update+0x1a0/0x340 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15b51a9>] osd_oi_update+0x69/0x290 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15ca71c>] osd_scrub_refresh_mapping+0x27c/0x440 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15cf1e0>] osd_scrub_check_update+0x280/0x10f0 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d00b5>] osd_scrub_exec+0x65/0x4f0 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d09e8>] osd_inode_iteration+0x4a8/0xcf0 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d1ad9>] osd_scrub_main+0x8a9/0xe40 [osd_ldiskfs]
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffbcac2e81>] kthread+0xd1/0xe0
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffbd177c37>] ret_from_fork_nospec_end+0x0/0x39
      Dec  2 09:13:19 oak-io1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
      

      Do you have any idea how to find which file this is? I'm thinking of remounting with noscrub to avoid the LBUG; that will be my next step.

      Thanks!
      Stephane
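On the question of locating the object, one possible starting point is the conventional ldiskfs OST layout, in which legacy (IDIF) objects live under O/0/d<objid % 32>/<objid>, with the object ID written in decimal. The sketch below computes that candidate path for the FID in the assertion. It assumes the object sits in its usual place, which may not be true for an object the scrub considers inconsistent. The function names are hypothetical and used only for illustration.

```python
def idif_object_id(seq, oid):
    """Rebuild the 48-bit object ID of an IDIF FID.

    Assumed layout: the low 16 bits of the sequence hold bits 32..47
    of the object ID, and the FID's oid field holds the low 32 bits.
    """
    return ((seq & 0xFFFF) << 32) | oid


def candidate_object_path(seq, oid, subdirs=32):
    """Return the conventional O/0/d<N>/<objid> path for an IDIF object.

    subdirs=32 is the usual number of d* subdirectories on an ldiskfs OST;
    treat it as an assumption for any particular filesystem.
    """
    objid = idif_object_id(seq, oid)
    return f"O/0/d{objid % subdirs}/{objid}"


# FID taken from the assertion message: [0x1000a0000:0x1d37dd1:0x0]
path = candidate_object_path(0x1000A0000, 0x1D37DD1)
print(path)
```

For this FID the candidate path is O/0/d17/30637521. If an object exists there, it could be inspected read-only with a tool such as debugfs to see which file it claims to belong to.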

            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: Stephane Thiell (sthiell)