Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Labels: None
Environment: Upgrade from 2.5.3 to master (latest commit LU-6012, I99ea077ae79fcdfedd7bb16c2a664714e0ea5ea3), downgrade to 2.5.1
Severity: 3
Rank: 16860
Description
After downgrading from master (latest commit was I99ea077ae79fcdfedd7bb16c2a664714e0ea5ea3), we are unable to use lfs setstripe to place files on specific OSTs, and setstripe with a stripe count greater than 1 appears to succeed, but the files created are single-striped.
Viz:
lfs setstripe -i 2 some_file
error on ioctl 0x4008669a for 'some_file' (3): File too large
error: setstripe: create stripe file 'some_file' failed

lfs setstripe -c 2 some_file
lfs getstripe some_file
some_file
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
        obdidx           objid           objid           group
             0        86890763       0x52dd90b               0
Digging on the client, I've traced this to a failed RPC to the MDT:
00000002:00100000:5.0:1418933152.848140:0:4339:0:(mdc_locks.c:642:mdc_finish_enqueue()) @@@ op: 1 disposition: 3, status: -27 req@ffff8803c4f9d000 x1487836602112680/t0(0) o101->perses1-MDT0000-mdc-ffff8803f3c1e400@4@gni:12/10 lens 600/544 e 0 to 0 dl 1418933314 ref 1 fl Complete:R/0/0 rc 301/301
And looking on the MDT, I see the failure starting here:
00000004:00000001:2.0:1418933152.838169:0:625:0:(osp_precreate.c:1057:osp_precreate_reserve()) Process entered
00000004:00000001:2.0:1418933152.838170:0:625:0:(osp_precreate.c:1139:osp_precreate_reserve()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
(Note the -22 is turned into -27 before being passed back to the client.)
Looking at the code, we see that the only way for osp_precreate_reserve() to return -EINVAL here is if it is already set in d->opd_pre_status.
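For reference, here is a minimal standalone model of that logic (a sketch, not the verbatim Lustre source; the struct and function names follow osp_precreate.c, everything else is illustrative):

#include <errno.h>
#include <stdio.h>

/* Stand-in for the one field of struct osp_device that matters here. */
struct osp_device {
        int opd_pre_status;     /* sticky precreate status for this OST */
};

/* Models the early bail-out in osp_precreate_reserve(): only a few
 * "transient" statuses let the caller wait for precreated objects;
 * anything else (e.g. a stored -EINVAL) is handed back as-is. */
static int osp_precreate_reserve(struct osp_device *d)
{
        int rc = d->opd_pre_status;

        if (rc != 0 && rc != -ENOSPC && rc != -ENODEV && rc != -EAGAIN)
                return rc;

        /* ... normal path: take an object from the precreated pool ... */
        return 0;
}

int main(void)
{
        struct osp_device d = { .opd_pre_status = -EINVAL };

        printf("reserve rc = %d\n", osp_precreate_reserve(&d)); /* -22 */
        return 0;
}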
After logging the system startup, I see that being set here (MDT log):
00000100:00000040:4.0:1418933124.438340:0:668:0:(client.c:1176:ptlrpc_check_status()) @@@ status is -22 req@ffff88022d32cc00 x1487859182731960/t0(0) o5->perses1-OST0005-osc-MDT0000@26@gni:28/4 lens 432/400 e 0 to 0 dl 1418933286 ref 2 fl Rpc:RN/0/0 rc 0/-22
00000004:00020000:4.0:1418933124.438394:0:668:0:(osp_precreate.c:736:osp_precreate_cleanup_orphans()) perses1-OST0005-osc-MDT0000: cannot cleanup orphans: rc = -22
^-- Which sets the -EINVAL in d->opd_pre_status.
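And the set side, continuing the model above (again a sketch rather than the verbatim source): osp_precreate_cleanup_orphans() latches the failure of the orphan-cleanup RPC into opd_pre_status, poisoning every later reservation:

/* Continuing the model above: when the orphan-cleanup (o5/OST_CREATE)
 * RPC fails, its return code is latched into opd_pre_status, so the
 * rc = -22 seen here becomes the -EINVAL that osp_precreate_reserve()
 * returns from then on. rpc_rc stands in for the real RPC plumbing. */
static int osp_precreate_cleanup_orphans(struct osp_device *d, int rpc_rc)
{
        if (rpc_rc != 0) {
                fprintf(stderr, "cannot cleanup orphans: rc = %d\n", rpc_rc);
                d->opd_pre_status = rpc_rc;
        }
        return rpc_rc;
}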
And that RPC is coming from this action on the MDT:
00000004:00080000:4.0:1418933124.429108:0:668:0:(osp_precreate.c:643:osp_precreate_cleanup_orphans()) perses1-OST0005-osc-MDT0000: going to cleanup orphans since [0x100050000:0x490dcec:0x0]
Looking at logs from OST0005, I see the following:
00000004:00000002:2.0:1418933124.428804:0:24404:0:(osd_handler.c:487:osd_check_lma()) perses1-OST0005: FID [0x100000000:0x0:0x0] != self_fid [0x100050000:0x0:0x0]
00000004:00000001:2.0:1418933124.428806:0:24404:0:(osd_handler.c:491:osd_check_lma()) Process leaving (rc=18446744073709551538 : -78 : ffffffffffffffb2)
00000001:00000001:2.0:1418933124.428807:0:24404:0:(osd_compat.c:330:osd_lookup_in_remote_parent()) Process entered
00000001:00000001:2.0:1418933124.428810:0:24404:0:(osd_compat.c:349:osd_lookup_in_remote_parent()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
00000004:00000001:2.0:1418933124.428811:0:24404:0:(osd_handler.c:624:osd_fid_lookup()) Process leaving via out (rc=18446744073709551501 : -115 : 0xffffffffffffff8d)
00000020:00000001:2.0:1418933124.428813:0:24404:0:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4278189824 : 4278189824 : feffff00)
00000004:00000010:2.0:1418933124.428815:0:24404:0:(osd_handler.c:721:osd_object_free()) kfreed 'obj': 176 at ffff880328190e00.
00002000:00000001:2.0:1418933124.428816:0:24404:0:(ofd_dev.c:327:ofd_object_free()) Process entered
00002000:00000040:2.0:1418933124.428817:0:24404:0:(ofd_dev.c:331:ofd_object_free()) object free, fid = [0x100000000:0x0:0x0]
00002000:00000010:2.0:1418933124.428818:0:24404:0:(ofd_dev.c:335:ofd_object_free()) slab-freed '(of)': 160 at ffff880250060310.
00002000:00000001:2.0:1418933124.428819:0:24404:0:(ofd_dev.c:336:ofd_object_free()) Process leaving
00000020:00000001:2.0:1418933124.428819:0:24404:0:(lu_object.c:242:lu_object_alloc()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
00000020:00000001:2.0:1418933124.428821:0:24404:0:(dt_object.c:386:dt_find_or_create()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
00002000:00000010:2.0:1418933124.428822:0:24404:0:(ofd_fs.c:291:ofd_seq_load()) kfreed 'oseq': 96 at ffff8802b52ab140.
00002000:00000001:2.0:1418933124.428824:0:24404:0:(ofd_fs.c:292:ofd_seq_load()) Process leaving (rc=18446744073709551501 : -115 : ffffffffffffff8d)
00002000:00020000:2.0:1418933124.428825:0:24404:0:(ofd_obd.c:1209:ofd_create()) perses1-OST0005: Can't find FID Sequence 0x0: rc = -115
00002000:00000001:2.0:1418933124.428826:0:24404:0:(ofd_obd.c:1210:ofd_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000010:00000001:2.0:1418933124.428827:0:24404:0:(obd_class.h:840:obd_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
And this error eventually gets bubbled back to the MDT.
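The FID mismatch at the top of that trace is the interesting part. Here is a small standalone model of the check in osd_check_lma() (a sketch; the field names follow Lustre's struct lu_fid, the rest is illustrative): the FID the upper layer asks for must match the self-FID stored in the object's LMA EA, and a mismatch yields -EREMCHG (the -78 in the trace):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for Lustre's struct lu_fid. */
struct lu_fid {
        uint64_t f_seq;         /* sequence */
        uint32_t f_oid;         /* object id within the sequence */
        uint32_t f_ver;         /* version */
};

/* Models osd_check_lma(): compare the FID the caller asked for with
 * the self-FID recorded in the object's LMA EA. The log above shows
 * [0x100000000:0x0:0x0] vs [0x100050000:0x0:0x0] failing this check. */
static int check_lma(const struct lu_fid *asked, const struct lu_fid *self)
{
        if (asked->f_seq != self->f_seq || asked->f_oid != self->f_oid)
                return -EREMCHG;        /* -78, as in the trace */
        return 0;
}

int main(void)
{
        struct lu_fid asked = { .f_seq = 0x100000000ULL };  /* old IDIF */
        struct lu_fid self  = { .f_seq = 0x100050000ULL };  /* new IDIF */

        printf("rc = %d\n", check_lma(&asked, &self));      /* -78 */
        return 0;
}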
I can't figure out what items should be in that FID sequence, or how those orphans came to be.
I'm posting this mainly so the underlying issue can hopefully be identified, but I'd also welcome any ideas on how to get the live system working again. We would prefer not to have to reformat.
I'm going to attach logs from the client, MDS, and an OSS (all OSSes report this issue).
Attachments
Issue Links
- is related to: LU-6724 Downgrading from 2.8 with DNE2 patches to 2.5 servers fails: unsupported read-only filesystem feature(s) (Resolved)
The issue you hit is that when your system was upgraded to master, master generated new-formatted IDIFs (which contain the real OST index) and stored them in the OST-objects' LMA EA. There are three cases that can cause a new-formatted IDIF to be generated:
1) OST-objects created after the upgrade: master currently generates new-formatted IDIFs for newly created OST-objects.
2) For old OST-objects created before the upgrade, OI scrub will convert the old-formatted IDIF to the new format.
3) For old OST-objects created before the upgrade, even if OI scrub has not run, the system auto-checks on access whether the FID given by the upper layer (new-formatted IDIF) matches the self-FID in the LMA EA (old-formatted IDIF); if they do not match, the self-FID is conditionally converted to a new-formatted IDIF. There is a proc switch under osd-ldiskfs, lma_self_repair (settable e.g. via lctl set_param osd-ldiskfs.*.lma_self_repair=0), that controls this behaviour; it is enabled by default.
If an OST-object carries a new-formatted IDIF in its LMA EA, regardless of whether it is a newly created or a converted one, then after the system is downgraded to Lustre 2.5 the old code finds that the FID given by the upper layer (old-formatted IDIF) does not match the self-FID in the LMA EA (new-formatted IDIF). Because b2_5 has no further compatibility handling for this, it causes all kinds of unexpected failures.
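The two FIDs in the OST log make the format difference concrete. A small standalone model (mirroring fid_idif_seq() from lustre_fid.h as I understand it; treat the details as illustrative): the new-formatted IDIF embeds the real OST index in bits 16-31 of the sequence, while the old format effectively used index 0, which is exactly the [0x100000000:...] vs [0x100050000:...] mismatch seen on OST0005:

#include <stdint.h>
#include <stdio.h>

#define FID_SEQ_IDIF 0x100000000ULL

/* Models how an IDIF sequence is composed: base | OST index | top
 * bits of the object id. With ost_idx = 0 this degenerates to the
 * old format (0x100000000); with the real index 5 it yields the
 * 0x100050000 stored in the LMA EA by master. */
static uint64_t fid_idif_seq(uint64_t objid, uint32_t ost_idx)
{
        return FID_SEQ_IDIF | ((uint64_t)ost_idx << 16) |
               ((objid >> 32) & 0xffff);
}

int main(void)
{
        printf("old IDIF seq: 0x%llx\n",
               (unsigned long long)fid_idif_seq(0, 0));   /* 0x100000000 */
        printf("new IDIF seq: 0x%llx\n",
               (unsigned long long)fid_idif_seq(0, 5));   /* 0x100050000 */
        return 0;
}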
To resolve the compatibility issues, there are two possible solutions:
1) We can make a patch for b2_5 to recognize the new-formatted IDIF.
2) Taking the long view, we may need to gate new-formatted IDIF generation the way "dirdata" is handled: only after the administrator runs "tune2fs -O new-idif" would osd-ldiskfs generate new-formatted IDIFs (or convert old-formatted ones), and downgrade would no longer be allowed from that point on.