Lustre LU-11247: chgrp, OST mount, MDT/MGS journal deadlock


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This is similar to LU-11119, LU-11227, and LU-11236. However, those issues are fixed by https://review.whamcloud.com/#/c/32964/, whereas this one is not. If an OST holding one of a file's objects is unmounted when chgrp is run on that file, chgrp hangs (which is acceptable) and remounting the OST also hangs (which is not):

      o:~# export OSTCOUNT=2
      o:~# $LUSTRE/tests/llmount.sh
      ...
      o:~# lfs setstripe -c2 /mnt/lustre/f0
      o:~# chown sanity: /mnt/lustre/f0
      o:~# umount /mnt/lustre-ost1
      o:~# sudo -u sanity chgrp gsanity1 /mnt/lustre/f0 &
      [1] 31691
      o:~# mount /tmp/lustre-ost1 /mnt/lustre-ost1 -t lustre -o loop
      

      Stack traces:

      31692 chgrp
      [<ffffffffc0de8520>] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
      [<ffffffffc0de88ad>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [<ffffffffc10a8c57>] mdc_reint+0x57/0x160 [mdc]
      [<ffffffffc10a91ae>] mdc_setattr+0x1ae/0x4a0 [mdc]
      [<ffffffffc0d544ff>] lmv_setattr+0x20f/0x3b0 [lmv]
      [<ffffffffc16a42f7>] ll_setattr_raw+0x7e7/0x1290 [lustre]
      [<ffffffffc16a4e0c>] ll_setattr+0x6c/0xd0 [lustre]
      [<ffffffffb9239af4>] notify_change+0x2c4/0x420
      [<ffffffffb921840c>] chown_common+0x19c/0x1d0
      [<ffffffffb92199ef>] SyS_fchownat+0xcf/0x120
      [<ffffffffb972082f>] system_call_fastpath+0x1c/0x21
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      30381 mdt01_001
      [<ffffffffc0de8520>] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
      [<ffffffffc0de88ad>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [<ffffffffc15ec733>] osp_remote_sync+0xd3/0x200 [osp]
      [<ffffffffc15d3cef>] osp_attr_set+0x4bf/0x5d0 [osp]
      [<ffffffffc15846b8>] lod_sub_attr_set+0x1c8/0x460 [lod]
      [<ffffffffc15630e0>] lod_obj_stripe_attr_set_cb+0x40/0x100 [lod]
      [<ffffffffc156f91e>] lod_obj_for_each_stripe+0x11e/0x2d0 [lod]
      [<ffffffffc157104b>] lod_attr_set+0x3db/0x9e0 [lod]
      [<ffffffffc1438e40>] mdd_attr_set_internal+0x120/0x2a0 [mdd]
      [<ffffffffc1439c2d>] mdd_attr_set+0x8bd/0xcf0 [mdd]
      [<ffffffffc14aa31f>] mdt_attr_set+0x19f/0xbb0 [mdt]
      [<ffffffffc14ab589>] mdt_reint_setattr+0x609/0xa90 [mdt]
      [<ffffffffc14aba93>] mdt_reint_rec+0x83/0x210 [mdt]
      [<ffffffffc148b1d2>] mdt_reint_internal+0x6b2/0xa80 [mdt]
      [<ffffffffc14961e7>] mdt_reint+0x67/0x140 [mdt]
      [<ffffffffc0e5e2aa>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [<ffffffffc0e0140b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [<ffffffffc0e04c44>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
      [<ffffffffb90bb161>] kthread+0xd1/0xe0
      [<ffffffffb9720677>] ret_from_fork_nospec_end+0x0/0x39
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      31708 mount.lustre
      [<ffffffffc0de8520>] ptlrpc_set_wait+0x480/0x790 [ptlrpc]
      [<ffffffffc0de88ad>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
      [<ffffffffc0d7eb44>] mgc_target_register+0x134/0x4c0 [mgc]
      [<ffffffffc0d81e3b>] mgc_set_info_async+0x37b/0x1610 [mgc]
      [<ffffffffc0bde93b>] server_start_targets+0x116b/0x2a30 [obdclass]
      [<ffffffffc0be12fc>] server_fill_super+0x10fc/0x18c0 [obdclass]
      [<ffffffffc0bb65f8>] lustre_fill_super+0x328/0x950 [obdclass]
      [<ffffffffb921ef3f>] mount_nodev+0x4f/0xb0
      [<ffffffffc0bae748>] lustre_mount+0x38/0x60 [obdclass]
      [<ffffffffb921fabe>] mount_fs+0x3e/0x1b0
      [<ffffffffb923d097>] vfs_kern_mount+0x67/0x110
      [<ffffffffb923f6bf>] do_mount+0x1ef/0xce0
      [<ffffffffb92404f3>] SyS_mount+0x83/0xd0
      [<ffffffffb972082f>] system_call_fastpath+0x1c/0x21
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      30373 ll_mgs_0001
      [<ffffffffc0240495>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [<ffffffffc0241987>] __jbd2_journal_force_commit+0x57/0xb0 [jbd2]
      [<ffffffffc0241a21>] jbd2_journal_force_commit+0x21/0x30 [jbd2]
      [<ffffffffc12b9539>] ldiskfs_force_commit+0x29/0x30 [ldiskfs]
      [<ffffffffc1342290>] osd_sync+0x50/0x180 [osd_ldiskfs]
      [<ffffffffc13c3bed>] mgs_target_reg+0x62d/0x1320 [mgs]
      [<ffffffffc0e5e2aa>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [<ffffffffc0e0140b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [<ffffffffc0e04c44>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
      [<ffffffffb90bb161>] kthread+0xd1/0xe0
      [<ffffffffb9720677>] ret_from_fork_nospec_end+0x0/0x39
      [<ffffffffffffffff>] 0xffffffffffffffff
      

      Note that reproducing this requires a combined MGS/MDT, i.e. the MGS and the MDT share one device and therefore one ldiskfs journal.
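      One way to read the four stack traces above is as a wait-for cycle. This is an interpretation of the traces, not something stated in the ticket: mdt01_001 is presumably holding an open transaction handle from mdd_attr_set() while it waits in osp_remote_sync() for the unmounted OST; the OST remount cannot come up until mgc_target_register() gets a reply from the MGS; and ll_mgs_0001 is waiting in jbd2_journal_force_commit(), which, with a combined MGS/MDT, cannot finish while the MDT's handle stays open. The sketch below encodes that reading as hand-written edge lists and checks each for a loop with coreutils `tsort`, which exits non-zero when its input contains a cycle. The node names are hypothetical labels (thread name plus PID), and the `separate` graph is an assumed configuration with the MGS on its own device, included only for comparison; it was not tested here.

```shell
# Hypothetical wait-for graphs hand-built from the stack traces above.
# Each "A B" pair reads "A cannot make progress until B does".
# Node names are illustrative labels, not Lustre identifiers.

# Combined MGS/MDT: the MGS force-commit waits on the shared journal,
# which cannot commit while mdt01_001 keeps its transaction handle open.
shared='
chgrp_31692            mdt01_001_30381
mdt01_001_30381        ost1_mount_31708
ost1_mount_31708       ll_mgs_0001_30373
ll_mgs_0001_30373      mdt_mgs_jbd2_commit
mdt_mgs_jbd2_commit    mdt01_001_30381
'

# Assumed alternative: MGS on its own device, so its journal commit
# depends on nothing else in this graph.
separate='
chgrp_31692            mdt01_001_30381
mdt01_001_30381        ost1_mount_31708
ost1_mount_31708       ll_mgs_0001_30373
ll_mgs_0001_30373      mgs_jbd2_commit
'

# Succeeds when the edge list in $1 contains a cycle.
# Assumes GNU coreutils tsort, which exits non-zero when it finds a loop.
has_cycle() {
    ! printf '%s\n' "$1" | tsort >/dev/null 2>&1
}

if has_cycle "$shared"; then shared_verdict=deadlock; else shared_verdict=ok; fi
if has_cycle "$separate"; then separate_verdict=deadlock; else separate_verdict=ok; fi

echo "combined MGS/MDT journal: $shared_verdict"
echo "separate MGS journal:     $separate_verdict"
```

      Under this model, only the combined MGS/MDT graph closes the loop. In the separate graph chgrp still waits for the OST, but the remount can finish registering with the MGS, after which the OSP request can be served.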


            People

              Assignee: WC Triage (wc-triage)
              Reporter: John Hammond (jhammond)