[LU-5265] Lustre clients hang while OI_Scrub is running Created: 27/Jun/14  Updated: 27/Aug/14  Resolved: 27/Aug/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Bruno Travouillon (Inactive) Assignee: nasf (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

RHEL6


Attachments: HTML File mds_top_oi_scrub    
Severity: 3
Rank (Obsolete): 14692

 Description   

Context:
OI_Scrub has been triggered after failover of the MDT on the failover MDS. (related to LU-4554)

---8<---
LustreError: 0-0: ptmp2-MDT0000: trigger OI scrub by RPC for [0x22cb1aa25:0xfabf:0x0], rc = 0 [1]
LustreError: 0-0: spool2-MDT0000: trigger OI scrub by RPC for [0x20cf1887f:0x92c:0x0], rc = 0 [1]
---8<---

Issue:
Lustre clients hung while trying to read from or write to the filesystem, receiving an EINPROGRESS error from the server for each request until the OI_Scrub process completed.

However, the following commands still worked: ls, cd, df.

Due to the number of inodes, the OI_Scrub took 3 hours to complete, stalling production.

OI_Scrub status once completed:
---8<---

# cat /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 1
status: completed
flags:
param:
time_since_last_completed: 382 seconds
time_since_latest_start: 11068 seconds
time_since_last_checkpoint: 382 seconds
latest_start_position: 12
last_checkpoint_position: 499122177
first_failure_position: N/A
checked: 190095126
updated: 2
failed: 0
prior_updated: 0
noscrub: 1965
igif: 239
success_count: 3
run_time: 10685 seconds
average_speed: 17790 objects/sec
real-time_speed: N/A
current_position: N/A
---8<---

run_time/3600 = 10685/3600 ~= 2.97 hours.
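The arithmetic above can be cross-checked directly against the oi_scrub counters. A minimal sketch, run here against a scratch copy of the relevant fields from the dump above (on a live MDS the input would be the oi_scrub proc file itself):

```shell
# Reproduce the run-time and speed figures from the oi_scrub status.
# Sample values are copied from the dump in this ticket; /tmp/oi_scrub.sample
# stands in for /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/oi_scrub.
cat <<'EOF' > /tmp/oi_scrub.sample
checked: 190095126
run_time: 10685 seconds
average_speed: 17790 objects/sec
EOF

awk '/^checked:/  { checked = $2 }
     /^run_time:/ { rt = $2 }
     END {
       printf "run_time: %.2f hours\n", rt / 3600          # 10685/3600
       printf "derived speed: %d objects/sec\n", checked / rt
     }' /tmp/oi_scrub.sample
```

The derived speed (checked/run_time) matches the reported average_speed of 17790 objects/sec, confirming the counters are internally consistent.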

As a workaround, auto_scrub was disabled (echo 0 > /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/auto_scrub).
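The workaround boils down to one write to the proc file. A sketch, using a scratch file so the commands can run outside an MDS; on the real server the path is the one reported above:

```shell
# Disable automatic OI scrub so a detected inconsistency no longer
# triggers a full background scrub. AUTO_SCRUB defaults to a scratch
# file for illustration; on the MDS it would be:
#   /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/auto_scrub
AUTO_SCRUB=${AUTO_SCRUB:-/tmp/auto_scrub}
echo 0 > "$AUTO_SCRUB"   # 0 = disabled, 1 = enabled
cat "$AUTO_SCRUB"        # verify the setting took effect
```

Note this setting does not persist across remounts of the MDT unless set again.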

We have since upgraded to Lustre 2.4.3 with the patch from LU-4554. The customer would like to enable the auto_scrub feature in order to get a consistent OI table, but cannot accept such an impact on the production systems.

According to the "OI Scrub and Inode Iterator Solution Architecture", clients can access the MDT while OI Scrub is running. Apart from FID-to-path operations and accessing a parent from a non-directory child, all other operations behave normally.



 Comments   
Comment by Bruno Travouillon (Inactive) [ 27/Jun/14 ]

Top on the MDS while OI_Scrub was running

Comment by Peter Jones [ 27/Jun/14 ]

Fan Yong

Could you please advise on this issue?

Thanks

Peter

Comment by nasf (Inactive) [ 30/Jun/14 ]

While the OI scrub is rebuilding the OI files, a client that accesses the system with name-based RPCs, such as lookup, is not affected. But if the client sends a FID-based RPC to the MDS and the related FID mapping has not been rebuilt yet, it gets -EINPROGRESS until that FID mapping is rebuilt; in the worst case the application has to wait until the OI scrub finishes. FID-based RPCs usually come from previously connected clients that cached FIDs on the client side before the upgrade or before the MDS file-level backup/restore. For a newly connected client, a FID-based RPC always follows a name-based RPC (except for FID-to-path), so newly connected clients are not affected.

So your case above is normal. Since your system has already run OI scrub, the inconsistent entries should already be fixed, so even if you enable "auto_scrub", the OI scrub should not be triggered unless it finds some new inconsistency (very rare). On the other hand, even while the OI scrub is rebuilding the OI files, NOT all FIDs are affected: if the application opens, reads, or writes a file whose FID is not cached on the client, or whose FID mapping has already been rebuilt, then the application is not affected by the OI scrub.

So please tell me whether your system often hits OI mapping failures (and triggers OI scrub) or not. If not, then enabling "auto_scrub" will be OK. Otherwise, it means the OI scrub cannot rebuild the OI files completely, and there must be other hidden bugs.

Comment by Bruno Travouillon (Inactive) [ 09/Jul/14 ]

Thanks for your clear answer.

However, can you tell me how to check if OI scrub is triggered while auto_scrub is off?

In osd_fid_lookup(), the LCONSOLE message "trigger OI scrub by RPC for DFID" is only displayed when auto_scrub is on.

Should I check on clients' consoles?

Comment by nasf (Inactive) [ 11/Jul/14 ]

If auto_scrub is disabled, then the OI scrub will NOT be triggered automatically even if some inconsistency is detected during normal processing, so you will NOT see any message about the OI scrub auto-running on the MDS. In that case, the administrator can still trigger the OI scrub manually via "lctl lfsck_start".

The OI scrub is server-side work; in any case, the client will NOT print any message.
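A minimal sketch of the manual trigger mentioned above, to be run on the MDS node. The MDT device name is taken from this report; the `-t scrub` option (limiting the run to OI scrub) is an assumption about the lctl lfsck_start syntax in this Lustre release, so check `lctl lfsck_start --help` first:

```shell
# Manually start an OI scrub on the MDT when auto_scrub is disabled.
# MDT name comes from this ticket; -t scrub is assumed to restrict the
# LFSCK run to the OI scrub component. Only built and echoed here, since
# lctl requires a live MDS.
MDT=ptmp2-MDT0000
CMD="lctl lfsck_start -M $MDT -t scrub"
echo "$CMD"   # on the MDS, execute this command as root
```

Progress can then be watched through the same oi_scrub proc file shown in the description.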

Comment by Bruno Travouillon (Inactive) [ 17/Jul/14 ]

Understood. We should enable auto_scrub by the beginning of September.

We will then be able to monitor the OI mapping failures and open a new ticket if we hit some issue.

Thanks.

Generated at Sat Feb 10 01:49:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.