Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4380

data corruption when copy a file to a new directory (sles11sp2 only)

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.4.1
    • None
    • server: centos 2.1.5 server OR centos 2.4.1 server
      client: sles11sp2 2.4.1 client

      Source can be found at github.com/jlan/lustre-nas. The tag for the client is 2.4.1-1nasC.
    • 3
    • 12006

    Description

      Users reported a data corruption problem. We have a test script to reproduce the problem.

      When run in a Lustre file system with a sles11sp2 host as the remote host, the script fails (sum reports 00000). It works if the remote host is running sles11sp1 or CentOS.

      — cut here for test5.sh —
      #!/bin/sh

      host=${1:-endeavour2}
      rm -fr zz hosts
      cp /etc/hosts hosts
      #fsync hosts
      ssh $host "cd $PWD && mkdir -p zz && cp hosts zz/"
      sum hosts zz/hosts
      — cut here —

      Good result:
      ./test5.sh r301i0n0
      61609 41 hosts
      61609 41 zz/hosts

      Bad result:
      ./test5.sh r401i0n2
      61609 41 hosts
      00000 41 zz/hosts

      Notes:

      • If the copied file is small enough (e.g., /etc/motd), the script succeeds.
      • If you uncomment the fsync, the script succeeds.
      • When it fails, stat reports no blocks have been allocated to the zz/hosts file:

      $ stat zz/hosts
      File: `zz/hosts'
      Size: 41820 Blocks: 0 IO Block: 2097152 regular file
      Device: 914ef3a8h/2437870504d Inode: 163153538715835056 Links: 1
      Access: (0644/rw-rr-) Uid: (10491/dtalcott) Gid: ( 1179/ cstaff)
      Access: 2013-12-12 09:24:46.000000000 -0800
      Modify: 2013-12-12 09:24:46.000000000 -0800
      Change: 2013-12-12 09:24:46.000000000 -0800

      • If you run in an NFS file system, the script usually succeeds, but sometimes reports a no such file error on the sum of zz/hosts. After a few seconds, though, the file appears, with the correct sum. (Typical NFS behavior.)
      • Acts the same on nbp7 and nbp8.

      Attachments

        1. LU-4380-debug.patch
          0.5 kB
          Niu Yawei
        2. LU4380.dbg.20131224
          2.76 MB
          Jay Lan
        3. LU4380.dbg.20121230.tgz
          2.17 MB
          Jay Lan
        4. LU4380.dbg.20121230.resend.tgz
          2.17 MB
          Jay Lan

        Issue Links

          Activity

            People

              bogl Bob Glossman (Inactive)
              jaylan Jay Lan (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: