[LU-6687] ALL osp-sync in D state Created: 03/Jun/15  Updated: 16/Oct/15  Resolved: 16/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5297 osp_sync_thread can't handle invalid ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After we reboot and mount the MDT, all osp-sync threads are in D state and we see the following errors:

Jun 3 14:38:03 nbp8-mds1 kernel: LustreError: 5838:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp8-OST009e-osc-MDT0000: invalid setattr record, lsr_valid:100
Jun 3 14:38:03 nbp8-mds1 kernel: LustreError: 5838:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 1554778 previous similar messages
Jun 3 14:40:11 nbp8-mds1 kernel: LustreError: 6043:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp8-OST010e-osc-MDT0000: invalid setattr record, lsr_valid:8191
Jun 3 14:40:11 nbp8-mds1 kernel: LustreError: 6043:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 2608741 previous similar messages
Jun 3 14:44:27 nbp8-mds1 kernel: LustreError: 6342:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp8-OST0133-osc-MDT0000: invalid setattr record, lsr_valid:8191
Jun 3 14:44:27 nbp8-mds1 kernel: LustreError: 6342:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 10802201 previous similar messages
Jun 3 14:47:16 nbp8-mds1 pcp-pmie[4713]: High 1-minute load average 321load@nbp8-mds1
Jun 3 14:52:59 nbp8-mds1 kernel: LustreError: 6220:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp8-OST0045-osc-MDT0000: invalid setattr record, lsr_valid:68
Jun 3 14:52:59 nbp8-mds1 kernel: LustreError: 6220:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 58201221 previous similar messages
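
To get an overview of which devices are reporting invalid records, the messages above can be summarized with a small script. This is a minimal sketch that assumes the two message shapes shown in this excerpt; the function name `summarize` is made up for illustration, and the script does not interpret the lsr_valid values or the "Skipped" counts, it only collects them.

```python
import re

# Patterns for the two message shapes seen in the log excerpt above.
INVALID_RE = re.compile(
    r"(?P<device>\S+): invalid setattr record, lsr_valid:(?P<valid>\d+)"
)
SKIPPED_RE = re.compile(r"Skipped (?P<count>\d+) previous similar messages")


def summarize(lines):
    """Return (valid_by_device, skipped_counts) for an iterable of syslog lines.

    valid_by_device maps each device name to the set of lsr_valid values
    reported for it; skipped_counts lists the "Skipped N" numbers in order.
    """
    valid_by_device = {}
    skipped_counts = []
    for line in lines:
        match = INVALID_RE.search(line)
        if match:
            values = valid_by_device.setdefault(match.group("device"), set())
            values.add(int(match.group("valid")))
            continue
        match = SKIPPED_RE.search(line)
        if match:
            skipped_counts.append(int(match.group("count")))
    return valid_by_device, skipped_counts


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as sample input.
    sample = [
        "Jun 3 14:38:03 nbp8-mds1 kernel: LustreError: 5838:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp8-OST009e-osc-MDT0000: invalid setattr record, lsr_valid:100",
        "Jun 3 14:38:03 nbp8-mds1 kernel: LustreError: 5838:0:(osp_sync.c:487:osp_sync_new_setattr_job()) Skipped 1554778 previous similar messages",
    ]
    devices, skipped = summarize(sample)
    for device in sorted(devices):
        print(device, sorted(devices[device]))
    print("skipped counts:", skipped)
```

In practice the input would be the lines of the syslog file rather than the hard-coded sample.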



 Comments   
Comment by Peter Jones [ 04/Jun/15 ]

Niu

Could you please advise?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 04/Jun/15 ]

It looks like the related patch was applied correctly in your 2.5.3 tree (https://github.com/jlan/lustre-nas/commit/fb970b342a7fac22a17b4932e11febb6963b3dff).

Is this an upgraded system, and is this the first mount after upgrading? I'm wondering whether these invalid records are leftovers from the old system.

Comment by Niu Yawei (Inactive) [ 04/Jun/15 ]

BTW, because of LU-5297, the osp sync thread can't handle invalid records properly.

Comment by Mahmoud Hanafi [ 04/Jun/15 ]

This system was upgraded to 2.5.3. The problem happens every time the MDT is mounted. How do we go about fixing the invalid records?

We are working on cleaning up a lot of mismatches between object UID/GID and MDT records. These were most likely caused by LU-6574.

Comment by Niu Yawei (Inactive) [ 04/Jun/15 ]

Hmm, fixing these invalid records manually will be troublesome (there isn't any llog editing tool, so you would have to use a hex editor to modify the records...)

Actually, if there are only a few leftover records, we can just delete all of them by removing the llog files, and then move on to mounting the MDT.

1. Run lctl --device $MDTDEV llog_catlist to show all the catalogs for the unlink/setattr records;
2. Mount the MDT as ldiskfs and find all the catalogs under /O;
3. Use llog_reader to show the plain logs that belong to these catalogs;
4. Use llog_reader to see how many leftover records are in the plain logs;
5. Remove all the plain logs and catalogs; they will be recreated on the next mount.

Comment by Mahmoud Hanafi [ 04/Jun/15 ]

nbp8-mds1 ~ # lctl --device 6 llog_catlist
OBD_IOC_CATLOGLIST failed: Operation not supported

It looks like there are a lot of records. I am not sure I understand items #4 and #5.

Comment by Niu Yawei (Inactive) [ 05/Jun/15 ]

Hmm, it looks like llog_catlist is only available in master for now.

OK. Each chown or unlink on the MDT generates an llog record in an llog file, and this record is used to sync the operation to the OST objects. Once the sync to the OST is done, the record is removed from the llog file. Usually, after a clean shutdown, there won't be any leftover records in the llog files. In your case, however, there are some invalid records that can't be processed and so are never removed.

Let's look at the on-disk structure of the llog files:

  • There is a global CATLOGLIST file which stores the catalog IDs for each OST ("lctl llog_catlist" prints the content of this file);
  • Each catalog is a plain log index, which stores plain log IDs;
  • Each plain log stores the unlink/setattr records I mentioned above.

I think that should make #4 and #5 of my previous comment easier to understand?
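
The three-level layout described above can be pictured with a toy in-memory model. This is only an illustrative sketch: the class names, field names, and IDs are invented here and do not reflect the real on-disk llog format or any Lustre API. It shows how "how many leftover records" (step #4) is a count over the plain logs reachable from each OST's catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlainLog:
    """A plain log: holds the pending unlink/setattr records."""
    log_id: str
    records: List[str] = field(default_factory=list)


@dataclass
class Catalog:
    """A catalog: an index of plain logs."""
    catalog_id: str
    plain_logs: List[PlainLog] = field(default_factory=list)


@dataclass
class CatLogList:
    """The global list: one catalog per OST."""
    catalogs_by_ost: Dict[str, Catalog] = field(default_factory=dict)


def leftover_records(catlist: CatLogList) -> Dict[str, int]:
    """Count the records still pending for each OST."""
    return {
        ost: sum(len(plain.records) for plain in catalog.plain_logs)
        for ost, catalog in catlist.catalogs_by_ost.items()
    }


if __name__ == "__main__":
    # Hypothetical IDs and records, for illustration only.
    catlist = CatLogList({
        "OST0000": Catalog("cat-a", [PlainLog("log-1", ["setattr", "unlink"])]),
        "OST0001": Catalog("cat-b", [PlainLog("log-2", [])]),
    })
    print(leftover_records(catlist))
```

After a clean shutdown every count in this model would be zero; the invalid records in this ticket correspond to entries that never leave the plain logs.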

Given that "lctl llog_catlist" isn't supported in 2.5, you can remove all the leftover records as follows: mount the MDT as ldiskfs, find all the files with numerical names under /O/1/, and remove them all. (It's best to back up these files first.)
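
The find-and-back-up part of the procedure above could be scripted roughly as follows. This is a sketch under assumptions: the mount point `/mnt/mdt_ldiskfs` and the backup directory are hypothetical placeholders, the helper names are made up, and the script deliberately only lists and copies files. Removal is left as a manual step once the backup has been checked.

```python
import os
import re
import shutil

# A file qualifies if its whole name consists of ASCII digits.
NUMERIC_NAME = re.compile(r"[0-9]+")


def find_numeric_files(root):
    """Return sorted paths of files under root whose names are all digits."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if NUMERIC_NAME.fullmatch(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def backup_files(paths, root, backup_dir):
    """Copy each path into backup_dir, keeping its path relative to root."""
    copied = []
    for path in paths:
        dest = os.path.join(backup_dir, os.path.relpath(path, root))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(path, dest)
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # Hypothetical locations: adjust to where the MDT is mounted as ldiskfs.
    llog_root = "/mnt/mdt_ldiskfs/O/1"
    backup_dir = "/root/llog-backup"
    files = find_numeric_files(llog_root)
    print("found", len(files), "numerically named files")
    backup_files(files, llog_root, backup_dir)
```

Keeping the relative path in the backup means files with the same name in different subdirectories do not overwrite each other.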

Comment by Mahmoud Hanafi [ 16/Oct/15 ]

Please close this case

Comment by Peter Jones [ 16/Oct/15 ]

ok Mahmoud
