LU-5265: Lustre clients hang while OI_Scrub is running

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • Lustre 2.4.2
    • None
    • RHEL6
    • 3
    • 14692

    Description

      Context:
      OI_Scrub was triggered on the failover MDS after an MDT failover (related to LU-4554).

      ---8<---
      LustreError: 0-0: ptmp2-MDT0000: trigger OI scrub by RPC for [0x22cb1aa25:0xfabf:0x0], rc = 0 [1]
      LustreError: 0-0: spool2-MDT0000: trigger OI scrub by RPC for [0x20cf1887f:0x92c:0x0], rc = 0 [1]
      ---8<---

      Issue:
      Lustre clients hung while reading from or writing to the filesystem: the server returned EINPROGRESS for every request until the OI_Scrub process completed.

      However, the following commands still worked: ls, cd, df.

      Because of the number of inodes, OI_Scrub took about 3 hours to complete, and production was hung for that whole time.

      OI_Scrub status once completed:
      ---8<---

      # cat /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/oi_scrub
        name: OI_scrub
        magic: 0x4c5fd252
        oi_files: 1
        status: completed
        flags:
        param:
        time_since_last_completed: 382 seconds
        time_since_latest_start: 11068 seconds
        time_since_last_checkpoint: 382 seconds
        latest_start_position: 12
        last_checkpoint_position: 499122177
        first_failure_position: N/A
        checked: 190095126
        updated: 2
        failed: 0
        prior_updated: 0
        noscrub: 1965
        igif: 239
        success_count: 3
        run_time: 10685 seconds
        average_speed: 17790 objects/sec
        real-time_speed: N/A
        current_position: N/A
        ---8<---

      run_time/3600 = 10685/3600 ~= 2.97 hours.
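
      The arithmetic above can be cross-checked against the other counters in the status output. The following is a minimal Python sketch, not a Lustre tool: the names parse_oi_scrub, first_int and summarize are hypothetical, and the parsing assumes every status line has the "key: value" shape shown in this report. It recomputes the run time in hours, the checked/run_time ratio (which happens to match the reported average_speed here), and the elapsed time between the latest start and the last completion.

```python
# Sketch: parse oi_scrub status text and re-derive the figures quoted above.
# parse_oi_scrub(), first_int() and summarize() are illustrative names only.

# A subset of the status output from this report, copied verbatim.
SAMPLE = """\
status: completed
time_since_last_completed: 382 seconds
time_since_latest_start: 11068 seconds
checked: 190095126
run_time: 10685 seconds
average_speed: 17790 objects/sec
"""


def parse_oi_scrub(text):
    """Return a dict mapping each key to its raw string value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon, such as separators
            fields[key.strip()] = value.strip()
    return fields


def first_int(value):
    """Leading integer of a value such as '10685 seconds'."""
    return int(value.split()[0])


def summarize(fields):
    """Derive hours, objects per second and elapsed seconds from the counters."""
    run_time = first_int(fields["run_time"])
    checked = first_int(fields["checked"])
    started = first_int(fields["time_since_latest_start"])
    completed = first_int(fields["time_since_last_completed"])
    return {
        "hours": run_time / 3600,
        "objects_per_sec": checked // run_time,
        "elapsed_seconds": started - completed,
    }


if __name__ == "__main__":
    print(summarize(parse_oi_scrub(SAMPLE)))
```

      On the values from this report, the elapsed time derived from the two time_since_* counters differs from run_time by one second, so the counters are consistent with a single uninterrupted run of roughly three hours.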

      As a workaround, auto_scrub has been disabled (echo 0 > /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/auto_scrub)
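
      For sites that script this workaround, the same write can be wrapped in a small helper. This is only a sketch under stated assumptions: the function names are hypothetical, the path layout is the one quoted above, writing 0 is the value used in this report, and writing 1 to re-enable is an assumption that should be checked against the documentation for the installed Lustre version. It has to run as root on the MDS that currently has the target mounted.

```python
# Sketch: read and write the auto_scrub tunable used in the workaround above.
# Function names are illustrative; only the /proc path comes from this report.
from pathlib import Path

PROC_ROOT = Path("/proc/fs/lustre/osd-ldiskfs")


def auto_scrub_path(target, root=PROC_ROOT):
    """Path of the auto_scrub file for a target such as 'ptmp2-MDT0000'."""
    return Path(root) / target / "auto_scrub"


def set_auto_scrub(target, enabled, root=PROC_ROOT):
    """Write 0 to disable; writing 1 to enable is an assumption (see above)."""
    path = auto_scrub_path(target, root)
    path.write_text("1\n" if enabled else "0\n")
    return path


def get_auto_scrub(target, root=PROC_ROOT):
    """Return the current value as a stripped string."""
    return auto_scrub_path(target, root).read_text().strip()


if __name__ == "__main__":
    # Equivalent to: echo 0 > /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/auto_scrub
    set_auto_scrub("ptmp2-MDT0000", enabled=False)
    print(get_auto_scrub("ptmp2-MDT0000"))
```

      The root parameter exists only so the helper can be exercised against a scratch directory before it is pointed at a live MDS.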

      We have since upgraded to Lustre 2.4.3 with the patch from LU-4554. The customer would like to enable the auto_scrub feature in order to get a consistent OI table, but cannot accept such an impact on the production systems.

      According to the "OI Scrub and inode Iterator Solution Architecture" document, clients can access the MDT while OI Scrub is running: except for FID-to-path lookups and accessing the parent from a non-directory child, operations behave as normal.

          People

            yong.fan nasf (Inactive)
            bruno.travouillon Bruno Travouillon (Inactive)