LU-4840: Deadlock when truncating file during lfs migrate

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.4.2
    • Severity: 3
    • 13336

    Description

      While a file is being migrated with "lfs migrate", if another process tries to truncate it, both the lfs migrate and the truncating process deadlock.

      As a result, neither process ever finishes (unless killed), and watchdog messages report that the processes made no progress for the last XXX seconds.

      Here is a reproducer:

      [root@lustre24cli ~]# cat reproducer.sh
      #!/bin/sh
      
      FS=/test
      FILE=${FS}/file
      
      rm -f ${FILE}
      # Create a file on OST 1 of size 512M
      lfs setstripe -o 1 -c 1 ${FILE}
      dd if=/dev/zero of=${FILE} bs=1M count=512
      
      echo 3 > /proc/sys/vm/drop_caches
      
      # Launch a migrate to OST 0 and a bit later open it for write
      lfs migrate -i 0 --block ${FILE} &
      sleep 2
      dd if=/dev/zero of=${FILE} bs=1M count=512 
      

      Once the second dd tries to open the file, both the lfs and dd processes hang forever with the following stacks:

      lfs stack:

      [<ffffffff8128e864>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffffa08d98dd>] ll_file_io_generic+0x29d/0x600 [lustre]
      [<ffffffffa08d9d7f>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      [<ffffffffa08da61c>] ll_file_read+0x16c/0x2a0 [lustre]
      [<ffffffff811896b5>] vfs_read+0xb5/0x1a0
      [<ffffffff811897f1>] sys_read+0x51/0x90
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      dd stack:

      [<ffffffffa03436fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa04779fa>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
      [<ffffffffa04781eb>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
      [<ffffffffa0478d6e>] cl_lock_request+0x7e/0x270 [obdclass]
      [<ffffffffa047e00c>] cl_io_lock+0x3cc/0x560 [obdclass]
      [<ffffffffa047e242>] cl_io_loop+0xa2/0x1b0 [obdclass]
      [<ffffffffa092a8c8>] cl_setattr_ost+0x208/0x2c0 [lustre]
      [<ffffffffa08f8a0e>] ll_setattr_raw+0x9ce/0x1000 [lustre]
      [<ffffffffa08f909b>] ll_setattr+0x5b/0xf0 [lustre]
      [<ffffffff811a7348>] notify_change+0x168/0x340
      [<ffffffff81187074>] do_truncate+0x64/0xa0
      [<ffffffff8119bcc1>] do_filp_open+0x861/0xd20
      [<ffffffff81185d39>] do_sys_open+0x69/0x140
      [<ffffffff81185e50>] sys_open+0x20/0x30
      [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      

          Activity


            Thanks Jinshan.

            I still have a question regarding when to check the data version. In which cases could it fail if we both get and check it under the file lease? Also, swap layout will perform a data version check. Is the following sequence flawed?

            • open source file
            • get lease
            • open volatile
            • get data version
            • copy file content
            • put lease
            • swap layout (pass in data version for check)
            • close volatile
            • close source file
            hdoreau Henri Doreau (Inactive) added a comment
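
            For illustration, here is a minimal C sketch of the sequence listed above, written against liblustreapi as I understand it. The llapi_lease_get/llapi_lease_put, llapi_get_data_version and llapi_fswap_layouts calls, the LL_LEASE_RDLCK and SWAP_LAYOUTS_CHECK_DV1 constants, and the copy_file_data helper are assumptions for this sketch, not taken from the ticket or its attachment; a plain temporary file stands in for the volatile file.

            /*
             * Sketch only: lease-protected migration sequence.
             * All llapi names and flags below are assumptions about liblustreapi.
             */
            #include <fcntl.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            /* hypothetical helper; a lease-checking copy loop is sketched further below */
            int copy_file_data(int src_fd, int dst_fd);

            static int migrate_with_lease(const char *src, const char *tmp)
            {
                    __u64 dv = 0;
                    int fd, vfd, rc;

                    fd = open(src, O_RDONLY);                  /* open source file */
                    if (fd < 0)
                            return -1;

                    rc = llapi_lease_get(fd, LL_LEASE_RDLCK);  /* get lease */
                    if (rc < 0)
                            goto out_src;

                    vfd = open(tmp, O_RDWR | O_CREAT, 0600);   /* stand-in for the volatile file */
                    if (vfd < 0)
                            goto out_lease;

                    rc = llapi_get_data_version(fd, &dv, 0);   /* get data version */
                    if (rc < 0)
                            goto out_vol;

                    rc = copy_file_data(fd, vfd);              /* copy file content */
                    if (rc < 0)
                            goto out_vol;

                    llapi_lease_put(fd);                       /* put lease */
                    /* swap layout, asking the MDT to verify the recorded data version */
                    rc = llapi_fswap_layouts(fd, vfd, dv, 0, SWAP_LAYOUTS_CHECK_DV1);

                    close(vfd);                                /* close volatile */
                    close(fd);                                 /* close source file */
                    return rc;

            out_vol:
                    close(vfd);
            out_lease:
                    llapi_lease_put(fd);
            out_src:
                    close(fd);
                    return rc;
            }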

            Please check the attachment for the implementation of migration.

            The procedure is a bit like HSM release, where close and swap layout should be an atomic operation. Also, you need to periodically check that the lease is still valid in the middle of the data copy, so that copying can abort if the file is opened by someone else.

            Please take a look at it and I'll be happy to answer questions.

            jay Jinshan Xiong (Inactive) added a comment
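
            A sketch of the data-copy loop with the periodic lease check described above, so the copy aborts as soon as another open revokes the lease. llapi_lease_check, its assumed return value (the lease mode still held), and the chunk size / check interval are assumptions for illustration.

            #include <errno.h>
            #include <unistd.h>
            #include <lustre/lustreapi.h>

            #define COPY_CHUNK      (1024 * 1024)
            #define CHECK_EVERY     64              /* re-check the lease every 64 chunks */

            int copy_file_data(int src_fd, int dst_fd)
            {
                    static char buf[COPY_CHUNK];
                    unsigned long chunks = 0;
                    ssize_t got, put;

                    for (;;) {
                            /* assumed to return the lease mode still held by src_fd */
                            if ((chunks++ % CHECK_EVERY) == 0 &&
                                llapi_lease_check(src_fd) != LL_LEASE_RDLCK)
                                    return -EBUSY;  /* file opened by someone else: abort */

                            got = read(src_fd, buf, sizeof(buf));
                            if (got < 0)
                                    return -errno;
                            if (got == 0)
                                    return 0;       /* EOF: copy complete */

                            put = write(dst_fd, buf, got);
                            if (put != got)         /* treat short writes as errors, for brevity */
                                    return -EIO;
                    }
            }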

            Use file lease to implement migration.

            jay Jinshan Xiong (Inactive) added a comment

            Here's a patch aimed at solving it:

            http://review.whamcloud.com/10013

            hdoreau Henri Doreau (Inactive) added a comment

            OK, CEA will do it.
            Could you confirm some aspects of open lease behaviour so we can be sure we are in line: is there only an exclusive open lease for now? When are leases revoked by concurrent access?

            adegremont Aurelien Degremont (Inactive) added a comment

            That's true, Aurelien. Since the migration was originally implemented by CEA, would it be possible for CEA to pick it up again and reimplement it with open lease?

            Jinshan

            jay Jinshan Xiong (Inactive) added a comment

            I see no objection to replacing the grouplock-based code with an open-lease mechanism. The current code uses this lock because open lease did not exist at that time, and we were advised to do it this way as there was no other mechanism to protect against concurrent access. I think it will be cleaner.

            adegremont Aurelien Degremont (Inactive) added a comment
            jay Jinshan Xiong (Inactive) added a comment - edited

            This is a good chance to reimplement migration with open lease.

            Aurelien and JC, do you have any input on this?

            bobijam Zhenyu Xu added a comment -

            This relates to the lfs migrate implementation (LU-2445): lfs migrate first takes a group lock to limit concurrent OST access from other clients, then uses VFS reads and writes to migrate the data from the source OSTs to the destination OSTs, and finally drops the group lock.

            In this case, lfs migrate gets the group lock; then the other process on the same client node tries to truncate the file, which takes the inode truncate semaphore (lli_trunc_sem), enqueues an OST lock, and waits for it to be granted.

            When lfs migrate reaches its read phase, it tries to take the truncate semaphore as well.

            The truncating process cannot get its OST lock granted, since the OST cannot revoke the group lock held by the lfs migrate process. The deadlock follows.

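            Reduced to its essentials, the ordering described above is a classic AB-BA deadlock. The sketch below is only an analogy in plain pthreads, not Lustre code: extent_lock stands in for the OST extent/group lock and trunc_sem for lli_trunc_sem, and both thread bodies are hypothetical.

            /* Analogy only: two locks taken in opposite order by two threads. */
            #include <pthread.h>
            #include <unistd.h>

            static pthread_mutex_t extent_lock = PTHREAD_MUTEX_INITIALIZER; /* group/extent lock */
            static pthread_mutex_t trunc_sem   = PTHREAD_MUTEX_INITIALIZER; /* lli_trunc_sem */

            static void *migrate_thread(void *arg)          /* plays the role of lfs migrate */
            {
                    (void)arg;
                    pthread_mutex_lock(&extent_lock);       /* group lock taken first */
                    sleep(1);                               /* ... data copy in progress ... */
                    pthread_mutex_lock(&trunc_sem);         /* read path needs trunc_sem: blocks */
                    pthread_mutex_unlock(&trunc_sem);
                    pthread_mutex_unlock(&extent_lock);
                    return NULL;
            }

            static void *truncate_thread(void *arg)         /* plays the role of the truncating process */
            {
                    (void)arg;
                    pthread_mutex_lock(&trunc_sem);         /* truncate takes lli_trunc_sem first */
                    sleep(1);
                    pthread_mutex_lock(&extent_lock);       /* then waits for the extent lock: blocks */
                    pthread_mutex_unlock(&extent_lock);
                    pthread_mutex_unlock(&trunc_sem);
                    return NULL;
            }

            int main(void)
            {
                    pthread_t t1, t2;

                    pthread_create(&t1, NULL, migrate_thread, NULL);
                    pthread_create(&t2, NULL, truncate_thread, NULL);
                    pthread_join(t1, NULL);                 /* never returns: both threads deadlock */
                    pthread_join(t2, NULL);
                    return 0;
            }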

            I can also reproduce it on master. The exact details of the lfs and dd task stacks are a little different but the deadlock is still there.

            bogl Bob Glossman (Inactive) added a comment
            green Oleg Drokin added a comment -

            So I tried the script on master and it also happens there.


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: patrick.valentin Patrick Valentin (Inactive)
              Votes: 0
              Watchers: 20
