
ll_inode_revalidate_fini()) failure -13

Details

    • Story
    • Resolution: Not a Bug
    • Major
    • None
    • None
    • None
    • Lustre 2.1.5 on CentOS 6.3
    • 10976

    Description

      Our Lustre client reported the following errors continuously:

      Oct 10 16:06:13 avmmst1a kernel: Lustre: Mounted modelfs-client
      Oct 10 16:06:13 avmmst1a kernel: Lustre: Mounted preposfs-client
      Oct 10 16:06:18 avmmst1a kernel: LustreError: 54029:0:(mdc_locks.c:736:mdc_enqueue()) ldlm_cli_enqueue: -13
      Oct 10 16:06:18 avmmst1a kernel: LustreError: 54029:0:(file.c:2196:ll_inode_revalidate_fini()) failure -13 inode 452984833
      Oct 10 16:06:25 avmmst1a kernel: LustreError: 54029:0:(mdc_locks.c:736:mdc_enqueue()) ldlm_cli_enqueue: -13
      Oct 10 16:06:25 avmmst1a kernel: LustreError: 54029:0:(file.c:2196:ll_inode_revalidate_fini()) failure -13 inode 159383553

      We searched online but found no hints. What causes these errors? Thanks.

      Regards,
      Patrick

      Attachments

        Activity

          [LU-4084] ll_inode_revalidate_fini()) failure -13
          pjones Peter Jones added a comment -

          That's great! Thanks for letting us know.


          ctcychan Patrick Chan (Inactive) added a comment -

          We've solved the problem.

          The error appears whenever Nagios periodically runs the check_disk plugin on the Lustre client to check the disk capacity of the Lustre file system.

          However, the user nagios only exists on the master node (the Lustre client); the MDS has no such user.

          After we added a local user nagios on the MDS, the error no longer appears.
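
          The fix above comes down to making sure the account that runs the check on the client also resolves on the MDS. A minimal sketch of that check, meant to be run on each node, might look like this; the helper name `user_known` is hypothetical, and it relies only on the standard `id` utility.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given account name resolves on this node
# (through local files, NIS, or whatever name service the node uses).
user_known() {
    id -u "$1" >/dev/null 2>&1
}

# Run on both the client and the MDS and compare the results.
for u in nagios; do
    if user_known "$u"; then
        echo "$u: known (uid $(id -u "$u"))"
    else
        echo "$u: unknown on $(uname -n)"
    fi
done
```

          If the account is reported as unknown on the MDS but known on the client, that matches the situation described in this comment.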
          laisiyao Lai Siyao added a comment -

          It would be better to do this in a script if the system is busy, in case the old logs get discarded.
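
          One way such a script could be structured is sketched below: it reads log lines on standard input and runs a command each time the failure message shows up. `DUMP_CMD` and `watch_for_failure` are hypothetical names; how the dump is actually triggered on the MDS (for example, running `lctl dk` there remotely) is left as an assumption.

```shell
#!/bin/sh
# Sketch: run a command each time a line matching the failure appears on stdin.
# DUMP_CMD is a hypothetical hook; on a real system it might trigger `lctl dk`
# on the MDS, which this sketch does not attempt to do itself.
DUMP_CMD=${DUMP_CMD:-"echo dump-requested"}

watch_for_failure() {
    while IFS= read -r line; do
        case "$line" in
            *'ll_inode_revalidate_fini()) failure'*)
                # Unquoted on purpose so DUMP_CMD may carry arguments.
                $DUMP_CMD
                ;;
        esac
    done
}

# Typical use on the client (not run here):
#   tail -F /var/log/messages | watch_for_failure
```

          Reacting to the message as soon as it is logged reduces the chance that the relevant entries have already been pushed out of the limited debug buffer.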

          ctcychan Patrick Chan (Inactive) added a comment -

          I just discovered that both inodes mentioned in the errors (159383553 and 452984833) belong to the top-level directory of the mount point.
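
          A quick way to confirm this kind of finding, without scanning the whole file system, is to compare the inode number from the log with the inode of the mount point directory itself. The sketch below assumes only POSIX `ls -di` and `awk`; the function names are hypothetical.

```shell
#!/bin/sh
# Print the inode number of a path (`ls -di` prints "<inode> <path>").
inode_of() {
    ls -di -- "$1" | awk '{ print $1 }'
}

# Succeed if inode number $1 is the inode of directory $2 itself.
is_root_inode() {
    [ "$(inode_of "$2")" = "$1" ]
}

# Example, using the mount point placeholder from this thread (not run here):
#   is_root_inode 452984833 /lustre_mnt_point && echo "inode is the mount root"
```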

          ctcychan Patrick Chan (Inactive) added a comment -

          The 'll_inode_revalidate_fini())' failure message appears only occasionally, so I need to watch /var/log/messages on the client and capture the debug buffer on the MDS promptly.

          The log file lustre_debug2.log.gz has been uploaded. It is the debug buffer dumped about 10 seconds after 'll_inode_revalidate_fini()) failure' appeared on the client.

          As you suggested, I've added inode and trace to the debug parameter and increased the debug buffer size to 512MB.
          laisiyao Lai Siyao added a comment -

          I can't find any error messages in this log, which means -13 (-EACCES) does not come from the on-disk filesystem: mdt_getattr_internal() prints an error if it fails to get attributes from disk. Did you dump the debug logs right after you saw the failure? The debug log size is limited, so it contains only the most recent messages.

          To further understand this failure, could you enable more debugging on the MDS with `lctl set_param debug=+inode` and `lctl set_param debug=+trace`, which enable debugging of inode access and function tracing? You can use `lctl get_param debug_mb` and `lctl set_param debug_mb=<debug_size>` to check and increase the debug memory size.

          You can also dump the debug logs on the client, which may help you find the file name.

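
          The debug settings requested above can be collected into a small script so they are applied the same way each time. This is only a sketch: the `LCTL` override (handy for a dry run with `LCTL=echo`), the function names, and the `DUMP_FILE` path are assumptions, while the `lctl` commands themselves are the ones quoted in this comment plus `lctl dk` with an output file.

```shell
#!/bin/sh
# Sketch of the MDS-side debug setup described above.
# LCTL can be overridden (e.g. LCTL=echo) for a dry run; the default is the real tool.
LCTL=${LCTL:-lctl}
DEBUG_MB=${DEBUG_MB:-512}                      # buffer size in MB (512 is the value used in this thread)
DUMP_FILE=${DUMP_FILE:-/tmp/lustre_debug.log}  # hypothetical output path

enable_mds_debug() {
    $LCTL set_param debug=+inode          # inode access debugging
    $LCTL set_param debug=+trace          # function tracing
    $LCTL get_param debug_mb              # show the current buffer size
    $LCTL set_param debug_mb="$DEBUG_MB"  # enlarge the debug buffer
}

dump_debug_log() {
    # Write the kernel debug buffer to a file.
    $LCTL dk "$DUMP_FILE"
}
```

          Calling `enable_mds_debug` once ahead of time and `dump_debug_log` right after the failure is logged keeps the dump as close to the event as possible.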

          ctcychan Patrick Chan (Inactive) added a comment -

          I ran 'lctl dk' on the MDS and uploaded the file lustre_debug.log.gz to ftp.whamcloud.com/uploads.

          By the way, I tried to find out whether inode 159383553 is the culprit. I scanned the whole Lustre file system for that inode but couldn't find any matching file.

          client> find /lustre_mnt_point -inum 159383553
          laisiyao Lai Siyao added a comment -

          The next time you see this error, could you use `lctl dk` on MDS to dump debug logs right after that?


          ctcychan Patrick Chan (Inactive) added a comment -

          Lai Siyao,

          There are no logs on the MDS.

          Bruno,

          The MDS is also a NIS server; both servers and clients use the same NIS server.

          Patrick

          bfaccini Bruno Faccini (Inactive) added a comment -

          Could you also check whether the clients and servers share the same UID/GID databases?
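
          One possible way to run this check is to save the account list on a client and on a server (for example with `getent passwd > file`) and compare the two files. The sketch below does the comparison with `awk`; the function name `compare_passwd` and the output wording are hypothetical.

```shell
#!/bin/sh
# Sketch: compare two passwd-format dumps (name:passwd:uid:gid:...).
# $1 = dump taken on the client, $2 = dump taken on the server.
# Prints accounts from the client dump that are missing from the server dump
# or that have a different UID/GID there.
compare_passwd() {
    awk -F: '
        FILENAME == ARGV[1]        { ids[$1] = $3 ":" $4; next }  # load the server dump
        !($1 in ids)               { print $1 ": missing"; next }
        ids[$1] != ($3 ":" $4)     { print $1 ": uid/gid mismatch" }
    ' "$2" "$1"
}

# Example (file names are placeholders, not run here):
#   compare_passwd client_passwd.txt mds_passwd.txt
```

          An empty result means every account seen on the client resolves to the same UID and GID on the server.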

          People

            laisiyao Lai Siyao
            ctcychan Patrick Chan (Inactive)