
LU-4380: data corruption when copying a file to a new directory (sles11sp2 only)

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • Affects Version/s: Lustre 2.4.1
    • None
    • Environment:
      server: centos 2.1.5 server OR centos 2.4.1 server
      client: sles11sp2 2.4.1 client

      Source can be found at github.com/jlan/lustre-nas. The tag for the client is 2.4.1-1nasC.
    • Severity: 3
    • 12006

    Description

      Users reported a data corruption problem. We have a test script to reproduce the problem.

      When run in a Lustre file system with a sles11sp2 host as the remote host, the script fails (sum reports 00000). It works if the remote host is running sles11sp1 or CentOS.

      — cut here for test5.sh —
      #!/bin/sh

      host=${1:-endeavour2}
      rm -fr zz hosts
      cp /etc/hosts hosts
      #fsync hosts
      ssh $host "cd $PWD && mkdir -p zz && cp hosts zz/"
      sum hosts zz/hosts
      — cut here —

      Good result:
      ./test5.sh r301i0n0
      61609 41 hosts
      61609 41 zz/hosts

      Bad result:
      ./test5.sh r401i0n2
      61609 41 hosts
      00000 41 zz/hosts

      Notes:

      • If the copied file is small enough (e.g., /etc/motd), the script succeeds.
      • If you uncomment the fsync, the script succeeds. (A related check is sketched after these notes.)
      • When it fails, stat reports no blocks have been allocated to the zz/hosts file:

      $ stat zz/hosts
      File: `zz/hosts'
      Size: 41820 Blocks: 0 IO Block: 2097152 regular file
      Device: 914ef3a8h/2437870504d Inode: 163153538715835056 Links: 1
      Access: (0644/-rw-r--r--) Uid: (10491/dtalcott) Gid: ( 1179/ cstaff)
      Access: 2013-12-12 09:24:46.000000000 -0800
      Modify: 2013-12-12 09:24:46.000000000 -0800
      Change: 2013-12-12 09:24:46.000000000 -0800

      • If you run in an NFS file system, the script usually succeeds, but it sometimes reports a "no such file" error on the sum of zz/hosts. After a few seconds, though, the file appears with the correct sum. (Typical NFS behavior.)
      • Acts the same on nbp7 and nbp8.
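
      The fsync and stat notes above suggest the remote host simply does not yet see the locally cached data as allocated blocks. A minimal sketch of one way to check that from the remote side (hypothetical, not part of the original report; it reuses the host convention from test5.sh and assumes GNU stat on the remote host):

      #!/bin/sh
      # Compare the remote host's view of the source file's allocated blocks
      # before and after the locally cached data is forced to disk.
      host=${1:-endeavour2}
      rm -f hosts
      cp /etc/hosts hosts
      ssh $host "cd $PWD && stat -c 'before flush: %b blocks' hosts"
      sync   # plays a role similar to the commented-out fsync in test5.sh
      ssh $host "cd $PWD && stat -c 'after flush: %b blocks' hosts"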

      Attachments

        1. LU4380.dbg.20121230.resend.tgz
          2.17 MB
        2. LU4380.dbg.20121230.tgz
          2.17 MB
        3. LU4380.dbg.20131224
          2.76 MB
        4. LU-4380-debug.patch
          0.5 kB

        Issue Links

          Activity

            [LU-4380] data corruption when copying a file to a new directory (sles11sp2 only)
            pjones Peter Jones added a comment -

            ok - thanks Jay!


            jaylan Jay Lan (Inactive) added a comment -

            We tested the 2.1.5 server with the LU-3219 patch and the problem went away.

            Since we are somehow no longer able to reproduce the problem with our 2.4.0 server (yes, LU-3219 was included in the 2.4.0 release), we can close this ticket. Thanks for your help!

            niu Niu Yawei (Inactive) added a comment -

            00000010:00000001:0.0:1389375898.914586:0:15540:0:(ost_handler.c:1261:ost_get_info()) Process leaving (rc=0 : 0 : 0)

            Mahmoud, it looks like your OST is running 2.1.5 and does not have patch 58444c4e9bc58e192f0bc0c163a5d51d42ba4255 (LU-3219) applied, so data corruption is expected.
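
            A minimal sketch (hypothetical, for reference; it assumes the OSS packages were built from a checkout of the github.com/jlan/lustre-nas tree named in the Environment field) of how one could check whether that commit is present in a given server build:

            # Run in the source tree the server was built from; reports whether the
            # LU-3219 commit is an ancestor of the checked-out build revision.
            cd lustre-nas
            if git merge-base --is-ancestor 58444c4e9bc58e192f0bc0c163a5d51d42ba4255 HEAD; then
                echo "LU-3219 patch present"
            else
                echo "LU-3219 patch missing"
            fi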

            mhanafi Mahmoud Hanafi added a comment -

            I have gathered clean debug logs from the local client, remote client, and OSS. The files are too large to attach here, so I have uploaded them to your FTP site 'ftp://ftp.whamcloud.com/uploads'.

            The filename is "LU_4380.debug.tgz".

            $ tar tzvf LU_4380.debug.tgz
            -rw-r--r-- root/root 215807901 2014-01-10 09:45 lu-4380.out.LOCALHOST
            -rw-r--r-- root/root   1198791 2014-01-10 09:45 lu-4380.out.OSS
            -rw-r--r-- root/root 135327548 2014-01-10 09:45 lu-4380.out.REMOTEHOST

            niu Niu Yawei (Inactive) added a comment -

            > Which of the above could have changed the outcome?

            None of the patches seems related to this problem, and I don't see why upgrading the MDS would change the outcome (I think this is a problem involving only the client and OST). Could you verify the client and OSS versions? Do they all have patch 58444c4e9bc58e192f0bc0c163a5d51d42ba4255 (LU-3219) applied?

            > Also, do you expect it to work correctly when running 2.4.1 client against 2.1.5 server? I am still able to reproduce against 2.1.5 server.

            Does the 2.1.5 server have the patch 58444c4e9bc58e192f0bc0c163a5d51d42ba4255 applied?
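
            For reference, a minimal sketch of the kind of check being asked for here (hypothetical; it assumes lctl is in PATH and an RPM-based install on each node):

            # On each client and on the OSS:
            lctl get_param -n version    # running Lustre version
            rpm -qa | grep -i lustre     # installed Lustre packages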

            jaylan Jay Lan (Inactive) added a comment -

            I was wrong in saying that the reproducer can be run against a 2.4.1 CentOS server. It was actually a 2.4.0 server with patches. The branch was nas-2.4.0-1 and the tag was 2.4.0-3nasS.

            We recently updated the 2.4.0 MDS (for testing LU-4403). Well, I am not able to reproduce the problem any more. The patches I picked up were:
            LU-4179 mdt: skip open lock enqueue during resent
            LU-3992 libcfs: Fix NUMA emulated mode
            LU-4139 quota: improve write performance when over softlimit
            LU-4336 quota: improper assert in osc_quota_chkdq()
            LU-4403 mds: extra lock during resend lock lookup
            LU-4028 quota: improve lfs quota output

            Which of the above could have changed the outcome?

            Also, do you expect it to work correctly when running a 2.4.1 client against a 2.1.5 server? I am still able to reproduce against the 2.1.5 server.

            niu Niu Yawei (Inactive) added a comment -

            > I had a problem that I was not able to stop debug_daemon until good data were flushed out of the debug file at the OST side. You need to tell me how to address that problem so that I can produce an OST log for you.

            You can try executing 'lctl clear' on the OSS to clear the debug buffer before testing.
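
            For reference, a minimal sketch (hypothetical file names and buffer size) of a full collection sequence on the OSS around one run of the reproducer:

            # On the OSS:
            lctl clear                                          # drop whatever is already buffered
            lctl debug_daemon start /tmp/lu4380-oss.dbg 1024    # dump the debug buffer to a file, up to 1024 MB
            # ... run ./test5.sh from the client ...
            lctl debug_daemon stop
            lctl debug_file /tmp/lu4380-oss.dbg /tmp/lu4380-oss.txt   # convert the binary dump to text
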
            jaylan Jay Lan (Inactive) added a comment - - edited

            Please discard LU4380.dbg.20121230.tgz. The two files contained in that tarball had confusing names (besides, the date should be 2013). Here is the new tarball: LU4380.dbg.20131230.resend.tgz. It contains the same two files with new names:
            LU4380.dbg.local.20131230
            LU4380.dbg.remote.20131230

            The fiemap actually happened on the remote client, the one that actually did the file creation and content copying.

            I had a problem that I was not able to stop debug_daemon until good data were flushed out of the debug file at the OST side. You need to tell me how to address that problem so that I can produce an OST log for you.

            niu Niu Yawei (Inactive) added a comment -

            Thank you Jay.

            I guess wr is the client which executes the test script (the one that cp's the hosts file from /etc/hosts)? I ask because I see the blocking AST in the wr log:

            00010000:00000001:28.0F:1388433400.002345:0:95598:0:(ldlm_lockd.c:1694:ldlm_handle_bl_callback()) Process entered
            00000100:00000001:22.0:1388433400.002345:0:4805:0:(events.c:407:reply_out_callback()) Process leaving
            00000100:00000001:29.0:1388433400.002346:0:54104:0:(service.c:1571:ptlrpc_server_hpreq_fini()) Process entered
            00000100:00000001:29.0:1388433400.002346:0:54104:0:(service.c:1582:ptlrpc_server_hpreq_fini()) Process leaving
            00000100:00000001:29.0:1388433400.002347:0:54104:0:(service.c:2078:ptlrpc_server_handle_request()) Process leaving (rc=1 : 1 : 1)
            00000400:00000001:29.0:1388433400.002348:0:54104:0:(watchdog.c:448:lc_watchdog_disable()) Process entered
            00010000:00010000:28.0:1388433400.002348:0:95598:0:(ldlm_lockd.c:1696:ldlm_handle_bl_callback()) ### client blocking AST callback handler ns: nbp8-OST0050-osc-ffff8807e657ec00 lock: ffff880aa9efb480/0x11c2317b4b200a63 lrc: 3/0,0 mode: PW/PW res: [0x3552ee:0x0:0x0].0 rrc: 1 type: EXT [0->18446744073709551615] (req 40960->18446744073709551615) flags: 0x420000000000 nid: local remote: 0x42e20173aa80c345 expref: -99 pid: 63286 timeout: 0 lvb_type: 1
            00000400:00000001:29.0:1388433400.002349:0:54104:0:(watchdog.c:456:lc_watchdog_disable()) Process leaving
            00010000:00010000:28.0:1388433400.002354:0:95598:0:(ldlm_lockd.c:1709:ldlm_handle_bl_callback()) Lock ffff880aa9efb480 already unused, calling callback (ffffffffa08f79e0)
            00000020:00000001:28.0:1388433400.002372:0:95598:0:(cl_lock.c:357:cl_lock_get_trust()) acquiring trusted reference: 0 ffff88089dfea238 18446744072108337004
            00000020:00000001:28.0:1388433400.002374:0:95598:0:(cl_lock.c:150:cl_lock_trace0()) got mutex: ffff88089dfea238@(1 ffff880f181f8340 1 5 0 0 0 0)(ffff880404adca70/1/1) at cl_lock_mutex_tail():668
            00000020:00000001:28.0:1388433400.002377:0:95598:0:(cl_lock.c:1839:cl_lock_cancel()) Process entered
            00000020:00010000:28.0:1388433400.002378:0:95598:0:(cl_lock.c:150:cl_lock_trace0()) cancel lock: ffff88089dfea238@(1 ffff880f181f8340 1 5 0 0 0 0)(ffff880404adca70/1/1) at cl_lock_cancel():1840
            00000020:00000001:28.0:1388433400.002381:0:95598:0:(cl_lock.c:804:cl_lock_cancel0()) Process entered
            00000008:00000001:28.0:1388433400.002382:0:95598:0:(osc_lock.c:1305:osc_lock_flush()) Process entered
            00000008:00000001:28.0:1388433400.002383:0:95598:0:(osc_cache.c:2827:osc_cache_writeback_range()) Process entered
            00000008:00000001:28.0:1388433400.002386:0:95598:0:(osc_cache.c:2770:osc_cache_wait_range()) Process entered
            00000008:00000020:28.0:1388433400.002387:0:95598:0:(osc_cache.c:2807:osc_cache_wait_range()) obj ffff88105cc78408 ready 0|-|- wr 0|-|- rd 0|- sync file range.
            00000008:00000001:28.0:1388433400.002388:0:95598:0:(osc_cache.c:2808:osc_cache_wait_range()) Process leaving (rc=0 : 0 : 0)
            00000008:00000020:28.0:1388433400.002389:0:95598:0:(osc_cache.c:2923:osc_cache_writeback_range()) obj ffff88105cc78408 ready 0|-|- wr 0|-|- rd 0|- pageout [0, 18446744073709551615], 0.
            

            I think obj ffff88105cc78408 should be the source hosts file on Lustre, and when the remote client tries to cp it into the zz directory, a blocking AST should be sent to the local client.

            The interesting thing is that I didn't see fiemap calls on the remote client (the 'rd' client); maybe it did the copy by a normal read. Anyway, I didn't see anything wrong in the log. Did the test succeed or not?

            Since the remote client didn't call fiemap, we don't need the OST log for now, thank you.
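
            For reference, a minimal sketch of the kind of scan referred to here (hypothetical; it assumes a case-insensitive grep on the decoded dumps from the resent tarball above is enough, since fiemap handling shows up as function names containing "fiemap" in a full debug trace):

            grep -ic fiemap LU4380.dbg.remote.20131230    # count of matching lines, 0 if none
            grep -i  fiemap LU4380.dbg.local.20131230 | head
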
            jaylan Jay Lan (Inactive) added a comment - - edited

            I tried to run the test on a Lustre filesystem that uses older hardware but has much less activity. I set the debug file size to 2G.

            The problem was that "lctl debug_daemon stop" hung until the 2G ran out, so the debug file missed most of the test. The same thing happened when I specified 1G.
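
            One hedged possibility for keeping the window of interest in the dump (the particular debug flags and buffer size below are only an illustration, not something recommended in this ticket) is to cut the trace volume before starting debug_daemon:

            # On the node being traced, before 'lctl debug_daemon start':
            lctl set_param debug="dlmtrace rpctrace vfstrace cache"   # restrict debug flags to the areas of interest
            lctl set_param debug_mb=256                               # shrink the in-memory debug buffer
            lctl clear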

            People

              bogl Bob Glossman (Inactive)
              jaylan Jay Lan (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: