  1. Lustre
  2. LU-4626

directories missing after upgrade from 1.8 to 2.3 then 2.4.1 then 2.4.2


Details

    • Bug
    • Resolution: Low Priority
    • Critical
    • None
    • Lustre 2.4.2
    • None
    • Lustre servers and clients RHEL6, clients running Lustre 1.8.9, file system upgraded from at least 1.8 (could be 1.6)
    • 3
    • 12656

    Description

      We have a test file system that was created with Lustre 1.8 (or possibly 1.6), then briefly updated to 2.3, then to 2.4.1, and now to 2.4.2. On this file system a few directories have become inaccessible after the latest upgrade. I believe they were accessible while we were still running 2.4.1, but I'm not sure.

      All clients are currently running 1.8.9.

      Trying to ls one of the directories produces an error on the command line, but nothing in any of the system logs that I could find.

      [bnh65367@p60-storage ~]$ ls -l /mnt/play01 |grep p60
      ls: cannot access /mnt/play01/p45: No such file or directory
      ls: cannot access /mnt/play01/p60: No such file or directory
      d?????????? ? ? ? ? ? p60
      [bnh65367@p60-storage ~]$ ls -l /mnt/play01/p60
      ls: cannot access /mnt/play01/p60: No such file or directory
      [bnh65367@p60-storage ~]$

      Trying to touch one of the missing directories produces the following message on the MDS and an input/output error on the client command line.

      Feb 11 19:13:23 cs04r-sc-mds02-03 kernel: LustreError: 14367:0:(mdt_open.c:1694:mdt_reint_open()) play01-MDT0000: name p60 present, but fid [0x45828f:0x7f3b41ef:0x0] invalid
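The FID in the message above can be decoded to see which kind of object the MDS is complaining about. The sketch below is illustrative only and is not part of the original report. It assumes the IGIF convention described in the Lustre documentation, where a sequence number in the range [12, 0xffffffff] means the FID was derived from a 1.x inode number (the sequence) and generation (the object ID). The function name `classify_fid` is hypothetical.

```shell
#!/usr/bin/env bash
# Classify a Lustre FID of the form [seq:oid:ver] by its sequence number.
# Assumption: the IGIF range is [12, 0xffffffff]; in that range the
# sequence is taken to be the inode number and the oid the generation.
classify_fid() {
    local fid=${1#\[}     # strip the leading '['
    fid=${fid%\]}         # strip the trailing ']'
    local seq oid ver
    IFS=: read -r seq oid ver <<< "$fid"
    if (( seq >= 12 && seq <= 0xffffffff )); then
        # Print decimal values, which are convenient for inode lookups.
        printf 'IGIF inode=%d generation=%d\n' "$((seq))" "$((oid))"
    else
        printf 'non-IGIF seq=%s\n' "$seq"
    fi
}

# Usage: classify_fid '[0x45828f:0x7f3b41ef:0x0]'
```

Under that assumption, the FID reported for p60 falls in the IGIF range, which would be consistent with a directory created before the upgrade from 1.8.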

      I'm trying to understand whether this is expected, and whether we are likely to see it if we upgrade our production file systems directly from 1.8 to 2.4.2. And of course we need to fix it. To me it looks like LU-3934 could be related, though if I understand that bug correctly, it should already be fixed. Maybe it will fix itself (by automatically starting OI scrub)?

      Is this sufficiently different from LU-3934 and unexpected that I should open a new ticket?

      The file system was upgraded a few hours ago. On the MDS, lctl get_param 'osd-ldiskfs.*.oi_scrub' reports the status init for both the MDT and the MGT (see below). Does this mean the scrub has not been started and that I should start it? How would I start it?

      sudo lctl get_param 'osd-ldiskfs.*.oi_scrub'
      osd-ldiskfs.MGS.oi_scrub=
      name: OI_scrub
      magic: 0x4c5fd252
      oi_files: 64
      status: init
      flags:
      param:
      time_since_last_completed: N/A
      time_since_latest_start: N/A
      time_since_last_checkpoint: N/A
      latest_start_position: N/A
      last_checkpoint_position: N/A
      first_failure_position: N/A
      checked: 0
      updated: 0
      failed: 0
      prior_updated: 0
      noscrub: 0
      igif: 0
      success_count: 0
      run_time: 0 seconds
      average_speed: 0 objects/sec
      real-time_speed: N/A
      current_position: N/A
      osd-ldiskfs.play01-MDT0000.oi_scrub=
      name: OI_scrub
      magic: 0x4c5fd252
      oi_files: 64
      status: init
      flags:
      param:
      time_since_last_completed: N/A
      time_since_latest_start: N/A
      time_since_last_checkpoint: N/A
      latest_start_position: N/A
      last_checkpoint_position: N/A
      first_failure_position: N/A
      checked: 0
      updated: 0
      failed: 0
      prior_updated: 0
      noscrub: 0
      igif: 0
      success_count: 0
      run_time: 0 seconds
      average_speed: 0 objects/sec
      real-time_speed: N/A
      current_position: N/A
      [bnh65367@cs04r-sc-mds02-03 ~]$
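For repeated checks, the long output above can be reduced to one line per target. This is a minimal sketch, not part of the original report: it relies only on the `osd-ldiskfs.<target>.oi_scrub=` header lines and the `status:` field shown in the output above, and the function name `oi_scrub_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `lctl get_param 'osd-ldiskfs.*.oi_scrub'` output on stdin and
# print "<target> <status>" for each OI scrub section.
oi_scrub_status() {
    awk '
        # Section header, e.g. "osd-ldiskfs.play01-MDT0000.oi_scrub="
        /^[[:space:]]*osd-ldiskfs\..*\.oi_scrub=/ {
            target = $1
            sub(/^osd-ldiskfs\./, "", target)
            sub(/\.oi_scrub=.*$/, "", target)
        }
        # Status line within the current section, e.g. "status: init"
        $1 == "status:" { print target, $2 }
    '
}

# Usage on the MDS (not run here):
#   sudo lctl get_param 'osd-ldiskfs.*.oi_scrub' | oi_scrub_status
```

Fed the output above, this would list both MGS and play01-MDT0000 with status init.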

      Note that since this is a test file system, I'm going to leave it in this state a little longer (a day or two) in case there is additional information I should collect. Sometime next week, though, I will need to start the OI scrub in the hope that it fixes the problem.
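On starting the scrub manually: the Lustre manual documents `lctl lfsck_start -M <MDT device>` for this, run on the MDS. The wrapper below is a hedged sketch rather than a confirmed procedure for this file system. The function name `start_oi_scrub` and the `RUN` variable are hypothetical, and the available options should be checked against the `lctl` shipped with 2.4.2 before running anything. It only prints the command unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Build the command to start an OI scrub on one MDT device.
# Dry run by default: prints the command. Set RUN=1 to execute it.
# Assumption: `lctl lfsck_start -M <device>` behaves as documented in
# the Lustre manual on the installed version.
start_oi_scrub() {
    local device=$1
    local cmd=(lctl lfsck_start -M "$device")
    if [ "${RUN:-0}" = 1 ]; then
        "${cmd[@]}"
    else
        printf '%s\n' "${cmd[*]}"
    fi
}

# Usage (dry run): start_oi_scrub play01-MDT0000
```

Progress could then be followed by re-reading the oi_scrub parameter shown earlier and watching the status and checked fields.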

            People

              laisiyao Lai Siyao
              ferner Frederik Ferner (Inactive)