Details
Bug
Resolution: Unresolved
Major
Lustre 2.18.0
None
RHEL 10.0 client and server
3
9223372036854775807
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fe52c8a0-6c56-4985-88f3-7cca6609b99e
test_18a failed with the following error:
== sanity-lfsck test 18a: Find out orphan OST-object and repair it (1) ========================================================== 22:28:12 (1766183292)
#####
The target MDT-object is there, but related stripe information
is lost or partly lost. The LFSCK should regenerate the missing
layout EA entries.
#####
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0126913 s, 165 MB/s
[0x200000404:0x73:0x0]
/mnt/lustre/d18a.sanity-lfsck/a1/f1
lmm_stripe_count: 1
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 108 0x6c 0x300000401
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.013098 s, 160 MB/s
[0x240000404:0x2:0x0]
/mnt/lustre/d18a.sanity-lfsck/a2/f2
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
obdidx objid objid group
1 2 0x2 0x340000401
0 2 0x2 0x300000403
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.0167845 s, 125 MB/s
[0x200000404:0x75:0x0]
/mnt/lustre/d18a.sanity-lfsck/f3
lcm_layout_gen: 3
lcm_mirror_count: 1
lcm_entry_count: 2
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x300000401:0x6d:0x0] }
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x340000403:0x2:0x0] }
Inject failure, to make the MDT-object lost its layout EA
CMD: trevis-152vm88 /usr/sbin/lctl set_param fail_loc=0x1615
fail_loc=0x1615
chown: warning: '.' should be ':': '1.1'
CMD: trevis-152vm89 /usr/sbin/lctl set_param fail_loc=0x1615
fail_loc=0x1615
chown: warning: '.' should be ':': '1.1'
chown: warning: '.' should be ':': '1.1'
CMD: trevis-152vm88 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
CMD: trevis-152vm89 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
The file size should be incorrect since layout EA is lost
Trigger layout LFSCK on all devices to find out orphan OST-object
CMD: trevis-152vm88 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t layout -r -o
Started LFSCK on the device lustre-MDT0000: scrub layout
CMD: trevis-152vm88 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: trevis-152vm89 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: trevis-152vm88 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: trevis-152vm89 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.lfsck_layout |
awk '/^status/ { print \$2 }'
CMD: trevis-152vm87 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0000.lfsck_layout
CMD: trevis-152vm87 /usr/sbin/lctl get_param -n obdfilter.lustre-OST0001.lfsck_layout
CMD: trevis-152vm88 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout
sanity-lfsck test_18a: @@@@@@ FAIL: (6.1) Expect 3 fixed on mds1, but got: 248
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/119916 - 6.12.0-55.43.1.el10_0.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/119916 - 6.12.0-55.43.1_lustre.el10.x86_64
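For triage, it may help to pull individual counters out of saved lfsck_layout output and compare them against the expected value of 3. A minimal sketch, assuming the counter checked at step (6.1) appears as a `repaired_orphan: N` line (an assumption; the log above does not include the lfsck_layout output itself). The helper name `lfsck_field` and the sample text are hypothetical and for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "name: value" field
# from lfsck_layout-style text read on stdin.
lfsck_field() {
    awk -v key="$1" '$1 == (key ":") { print $2 }'
}

# Made-up sample text, NOT captured from the failing run. The field name
# repaired_orphan is an assumption; 248 is the count from the FAIL message.
sample='status: completed
repaired_orphan: 248'

printf '%s\n' "$sample" | lfsck_field status            # prints: completed
printf '%s\n' "$sample" | lfsck_field repaired_orphan   # prints: 248
```

On a live system the same helper could presumably be fed from the command already shown in the log, e.g. `lctl get_param -n mdd.lustre-MDT0000.lfsck_layout | lfsck_field status`.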
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lfsck test_18a - (6.1) Expect 3 fixed on mds1, but got: 248
Issue Links
- is related to LU-18667 RHEL 10.0 support (Open)