Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version: Lustre 2.12.5
Environment: CentOS 7.6
Severity: 2
Description
I'm opening this as Sev2 because we have an OST down on Oak. This morning we hit a problem with one OST on Oak (note that Oak was recently upgraded from 2.10 to 2.12.5):
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: Recovery over after 2:42, of 1789 clients 1631 recovered and 158 were evicted.
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: Skipped 3 previous similar messages
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x10400013a0:371764 to 0x10400013a0:371809
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x1040000bd0:3790954 to 0x1040000bd0:3790977
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x0:33786809 to 0x0:33786849
Dec 02 09:13:01 oak-io1-s2 kernel: Lustre: oak-OST000b: deleting orphan objects from 0x1040000400:3170249 to 0x1040000400:3170273
Dec 02 09:13:02 oak-io1-s2 kernel: Lustre: oak-OST000b: trigger OI scrub by RPC for the [0x1000b0000:0x10c759a:0x0] with flags 0x4a, rc = 0
[root@oak-io1-s2 ~]# lctl get_param -n osd-ldiskfs.oak-OST000b.oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: scanning
flags: auto
param:
time_since_last_completed: N/A
time_since_latest_start: 16 seconds
time_since_last_checkpoint: 16 seconds
latest_start_position: 12
last_checkpoint_position: 11
first_failure_position: N/A
checked: 1186
updated: 0
failed: 0
prior_updated: 0
noscrub: 4
igif: 0
success_count: 0
run_time: 16 seconds
average_speed: 74 objects/sec
real-time_speed: 74 objects/sec
current_position: 1263
scrub_in_prior: no
scrub_full_speed: yes
partial_scan: no
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
[root@oak-io1-s2 ~]#
Message from syslogd@oak-io1-s2 at Dec 2 09:13:19 ...
kernel:LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11

Message from syslogd@oak-io1-s2 at Dec 2 09:13:19 ...
kernel:LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
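While the scrub runs, its progress can be followed by polling this output. Each line is a "key: value" pair, so a small parser is enough. The sketch below is hypothetical (the helper name scrub_field is mine, and it assumes the exact line format shown above). It extracts a single field such as status or checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Find "key: value" in the text printed by
 * `lctl get_param -n osd-ldiskfs.<target>.oi_scrub` and copy the value
 * into out. Returns false if the key is absent or the value does not fit.
 */
static bool scrub_field(const char *text, const char *key,
                        char *out, size_t outlen)
{
    assert(text != NULL && key != NULL && out != NULL && outlen > 0);
    size_t klen = strlen(key);
    const char *line = text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "<key>:" at the very start of the line. */
        if (llen > klen && strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *val = line + klen + 1;
            /* Skip the space(s) after the colon. */
            while (val < line + llen && *val == ' ')
                val++;
            size_t vlen = (size_t)(line + llen - val);
            if (vlen >= outlen)
                return false;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return true;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return false;
}
```

Requiring the colon immediately after the key keeps a key from matching a longer key that merely starts with it.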
The backtrace is:
Dec 2 03:41:08 oak-io1-s2 kernel: LustreError: 255421:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11
Dec 2 03:41:08 oak-io1-s2 kernel: LustreError: 255421:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
Dec 2 03:41:08 oak-io1-s2 kernel: Pid: 255421, comm: OI_scrub 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
Dec 2 03:41:08 oak-io1-s2 kernel: Call Trace:
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc0b3e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc0b3e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc1458149>] osd_obj_update_entry+0x969/0x980 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc145a8a0>] osd_obj_map_update+0x1a0/0x340 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14471a9>] osd_oi_update+0x69/0x290 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc145c71c>] osd_scrub_refresh_mapping+0x27c/0x440 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14611e0>] osd_scrub_check_update+0x280/0x10f0 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14620b5>] osd_scrub_exec+0x65/0x4f0 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc14629e8>] osd_inode_iteration+0x4a8/0xcf0 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffc1463ad9>] osd_scrub_main+0x8a9/0xe40 [osd_ldiskfs]
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffaa4c2e81>] kthread+0xd1/0xe0
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffaab77c37>] ret_from_fork_nospec_end+0x0/0x39
Dec 2 03:41:08 oak-io1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
We ran fsck on the device, but the issue occurred again:
Dec 2 09:13:19 oak-io1-s2 kernel: LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) ASSERTION( idx1 == 0 || idx1 == osd->od_index ) failed: invalid given FID [0x1000a0000:0x1d37dd1:0x0], not match the device index 11
Dec 2 09:13:19 oak-io1-s2 kernel: LustreError: 291930:0:(osd_compat.c:701:osd_obj_update_entry()) LBUG
Dec 2 09:13:19 oak-io1-s2 kernel: Pid: 291930, comm: OI_scrub 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
Dec 2 09:13:19 oak-io1-s2 kernel: Call Trace:
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc0cbe7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc0cbe87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15c6149>] osd_obj_update_entry+0x969/0x980 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15c88a0>] osd_obj_map_update+0x1a0/0x340 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15b51a9>] osd_oi_update+0x69/0x290 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15ca71c>] osd_scrub_refresh_mapping+0x27c/0x440 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15cf1e0>] osd_scrub_check_update+0x280/0x10f0 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d00b5>] osd_scrub_exec+0x65/0x4f0 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d09e8>] osd_inode_iteration+0x4a8/0xcf0 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffc15d1ad9>] osd_scrub_main+0x8a9/0xe40 [osd_ldiskfs]
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffbcac2e81>] kthread+0xd1/0xe0
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffbd177c37>] ret_from_fork_nospec_end+0x0/0x39
Dec 2 09:13:19 oak-io1-s2 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
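The assertion becomes readable once the FID is decoded. This sketch assumes the IDIF encoding used for OST objects: the sequence lies in 0x100000000..0x1ffffffff, the OST index sits in bits 16-31 of the sequence, and the high 16 bits of the object id sit in bits 0-15. It mirrors the condition in osd_obj_update_entry(). The struct and helper names are hypothetical stand-ins, not the real Lustre definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for Lustre's struct lu_fid (seq:oid:ver). */
struct fid {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Assumed IDIF sequence range. */
#define IDIF_SEQ_MIN 0x100000000ULL
#define IDIF_SEQ_MAX 0x1ffffffffULL

static bool fid_is_idif_sketch(const struct fid *f)
{
    return f->seq >= IDIF_SEQ_MIN && f->seq <= IDIF_SEQ_MAX;
}

/* OST index carried in bits 16..31 of an IDIF sequence. */
static uint32_t idif_ost_idx(const struct fid *f)
{
    return (uint32_t)((f->seq >> 16) & 0xffff);
}

/* 48-bit object id: high 16 bits from the sequence, low 32 from oid. */
static uint64_t idif_obj_id(const struct fid *f)
{
    return ((f->seq & 0xffff) << 32) | f->oid;
}

/* Same condition as the failed ASSERTION: idx1 == 0 || idx1 == od_index. */
static bool idif_matches_device(const struct fid *f, uint32_t od_index)
{
    uint32_t idx1 = idif_ost_idx(f);
    return idx1 == 0 || idx1 == od_index;
}

/* User-space analogue of the LASSERT: aborts on a mismatch. */
static void check_like_lassert(const struct fid *f, uint32_t od_index)
{
    assert(idif_matches_device(f, od_index));
}
```

With these helpers, [0x1000a0000:0x1d37dd1:0x0] decodes to OST index 10 (OST000a) and object id 30637521, so it fails the check on OST index 11. The FID from the earlier "trigger OI scrub" message, [0x1000b0000:0x10c759a:0x0], decodes to index 11 and passes.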
Do you have any idea how to find which file this is? I'm thinking of remounting with noscrub to avoid the LBUG; that will be my next step.
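One way to find the backing object on the OST itself is to compute its on-disk path from the decoded object id. This is an assumption on my part: it relies on the traditional ldiskfs OST layout, in which IDIF objects live under O/0/d<N mod 32>/<N>. The helper below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the ldiskfs path of an IDIF OST object,
 * assuming the O/0/d<objid % 32>/<objid> layout. Returns the length
 * written (excluding the NUL), or -1 if the buffer is too small.
 */
static int ost_object_path(uint64_t objid, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "O/0/d%u/%llu",
                     (unsigned)(objid % 32), (unsigned long long)objid);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

For object id 30637521 this yields O/0/d17/30637521. If the object is at its expected location, it could be inspected read-only with something like `debugfs -c -R "stat O/0/d17/30637521" /dev/<ost-device>`, where the device name is a placeholder.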
Thanks!
Stephane
Issue Links
- is related to LU-14119 FID-in-LMA [fid1] does not match the object self-fid [fid2] (Resolved)