Lustre / LU-16206

PCC crashes MDS: mdt_big_xattr_get()) ASSERTION( info->mti_big_lmm_used == 0 ) failed


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0, Lustre 2.15.1
    • Labels: None
    • Environment: Linux 5.4.0-1091-azure #96~18.04.1-Ubuntu SMP Tue Aug 30 19:15:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3

    Description

      Reproducible on 2.15.1 and 2.14.0.  Both clients and servers are running Ubuntu 18.04 as shown in Environment.

      Steps to reproduce:

      # confirm hsm is enabled
      mds-node:~# lctl get_param mdt.lustrefs-MDT0000.hsm_control
      mdt.lustrefs-MDT0000.hsm_control=enabled

      # setup pcc on client 0
      client-0:~# mkdir /pcc
      client-0:~# chmod 777 /pcc /lustre
      client-0:~# lhsmtool_posix --daemon --hsm-root /pcc --archive=2 /lustre < /dev/null > /tmp/copytool_log 2>&1
      client-0:~# lctl pcc add /lustre /pcc -p "gid={0},gid={2001} rwid=2"
      # setup pcc on client 1
      client-1:~# mkdir /pcc
      client-1:~# chmod 777 /pcc /lustre
      client-1:~# lhsmtool_posix --daemon --hsm-root /pcc --archive=3 /lustre < /dev/null > /tmp/copytool_log 2>&1
      client-1:~# lctl pcc add /lustre /pcc -p "gid={0},gid={2001} rwid=3"
      # create file on client 0 and confirm in-cache
      client-0:~# echo "test" > /lustre/test
      client-0:~# lfs pcc state /lustre/test
      file: /lustre/test, type: readwrite, PCC file: /pcc/0001/0000/0402/0000/0002/0000/0x200000402:0x1:0x0, user number: 0, flags: 0
      # read file from client 1
      client-1:~# lfs pcc state /lustre/test
      file: /lustre/test, type: none
      client-1:~# cat /lustre/test
      cat: /lustre/test: No data available
      client-1:~# cat /lustre/test
      test
      client-1:~# lfs pcc state /lustre/test
      file: /lustre/test, type: none
      # check pcc state, and attempt to attach again on client 0
      client-0:~# lfs pcc state /lustre/test
      file: /lustre/test, type: none
      client-0:~# lfs pcc attach -i 2 /lustre/test
      ^C^C^C^C^C^C^C^C^C   <---- hang
      # while client 0 is hanging, check state on client 1
      client-1:~# lfs pcc state /lustre/test
      ^C^C^C^C  <---- hang
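
      For convenience, the interactive steps above are consolidated into a single script below.
      This is only a sketch: it assumes passwordless root ssh to client-0 and client-1, that
      /lustre is mounted on both clients, and that HSM is already enabled on the MDT as shown at
      the top of the transcript; the hostnames, /pcc path, PCC rules, and archive IDs are taken
      verbatim from the steps above.

      #!/bin/bash
      # Sketch of the reproducer; the ssh orchestration is an assumption for convenience.
      set -x

      # Prepare the cache directory on both clients.
      for host in client-0 client-1; do
          ssh "$host" 'mkdir -p /pcc && chmod 777 /pcc /lustre'
      done

      # Start a copytool and register a PCC backend on each client.
      ssh client-0 'lhsmtool_posix --daemon --hsm-root /pcc --archive=2 /lustre </dev/null >/tmp/copytool_log 2>&1'
      ssh client-0 'lctl pcc add /lustre /pcc -p "gid={0},gid={2001} rwid=2"'
      ssh client-1 'lhsmtool_posix --daemon --hsm-root /pcc --archive=3 /lustre </dev/null >/tmp/copytool_log 2>&1'
      ssh client-1 'lctl pcc add /lustre /pcc -p "gid={0},gid={2001} rwid=3"'

      # Create the file on client-0 (auto-attached into its PCC), then read it
      # twice from client-1 and check its state, as in the transcript.
      ssh client-0 'echo test > /lustre/test && lfs pcc state /lustre/test'
      ssh client-1 'cat /lustre/test; cat /lustre/test; lfs pcc state /lustre/test'

      # Attempt to re-attach from client-0.  This is the command that hangs;
      # the MDS hits the LBUG a few minutes later.
      ssh client-0 'lfs pcc state /lustre/test'
      ssh client-0 'lfs pcc attach -i 2 /lustre/test'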

      In the interactive run, the stuck commands return a few minutes later.  Examining the MDS
      shows that it crashed and rebooted; the relevant dmesg output is:

      [ 3266.211270] LustreError: 11458:0:(mdt_handler.c:960:mdt_big_xattr_get()) ASSERTION( info->mti_big_lmm_used == 0 ) failed:
      [ 3266.217023] LustreError: 11458:0:(mdt_handler.c:960:mdt_big_xattr_get()) LBUG
      [ 3266.220653] Pid: 11458, comm: mdt_rdpg02_001 5.4.0-1091-azure #96~18.04.1-Ubuntu SMP Tue Aug 30 19:15:32 UTC 2022
      [ 3266.220653] Call Trace TBD:
      [ 3266.220654] Kernel panic - not syncing: LBUG
      [ 3266.222778] CPU: 8 PID: 11458 Comm: mdt_rdpg02_001 Kdump: loaded Tainted: P           OE     5.4.0-1091-azure #96~18.04.1-Ubuntu
      [ 3266.224582] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
      [ 3266.224582] Call Trace:
      [ 3266.224582]  dump_stack+0x57/0x6d
      [ 3266.224582]  panic+0xf8/0x2d4
      [ 3266.224582]  lbug_with_loc+0x89/0x2c0 [libcfs]
      [ 3266.224582]  mdt_big_xattr_get+0x398/0x8b0 [mdt]
      [ 3266.224582]  ? mdd_read_unlock+0x2d/0xc0 [mdd]
      [ 3266.224582]  ? mdd_readpage+0x1919/0x1ed0 [mdd]
      [ 3266.224582]  __mdt_stripe_get+0x1d4/0x430 [mdt]
      [ 3266.224582]  mdt_attr_get_complex+0x56e/0x1af0 [mdt]
      [ 3266.224582]  mdt_mfd_close+0x2062/0x41c0 [mdt]
      [ 3266.224582]  ? lustre_msg_buf+0x17/0x50 [ptlrpc]
      [ 3266.224582]  ? __req_capsule_offset+0x5ae/0x6e0 [ptlrpc]
      [ 3266.224582]  mdt_close_internal+0x1f0/0x250 [mdt]
      [ 3266.259003]  mdt_close+0x483/0x13f0 [mdt]
      [ 3266.259003]  tgt_request_handle+0xc9a/0x1950 [ptlrpc]
      [ 3266.259003]  ? lustre_msg_get_transno+0x22/0xe0 [ptlrpc]
      [ 3266.259003]  ptlrpc_register_service+0x25e6/0x4610 [ptlrpc]
      [ 3266.259003]  ? __switch_to_asm+0x34/0x70
      [ 3266.259003]  kthread+0x121/0x140
      [ 3266.259003]  ? ptlrpc_register_service+0x1590/0x4610 [ptlrpc]
      [ 3266.259003]  ? kthread_park+0x90/0x90
      [ 3266.259003]  ret_from_fork+0x35/0x40
      [ 3266.259003] Kernel Offset: 0x1be00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
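
      If more data would help with triage, something like the following, run on the MDS before
      reproducing, should keep the node up after the LBUG and capture a Lustre debug log that can
      be attached here.  This is only a suggestion; the debug mask, buffer size, and dump path are
      arbitrary choices, and exact parameter names may vary slightly between releases.

      # keep the MDS up after the LBUG instead of panicking (dmesg above shows the default panic)
      mds-node:~# lctl set_param panic_on_lbug=0
      # enable a full debug mask with a larger trace buffer
      mds-node:~# lctl set_param debug=-1 debug_mb=512
      # ... reproduce the hang from the clients ...
      # dump the kernel debug buffer to a file for attachment
      mds-node:~# lctl dk /tmp/lustre-mds-debug.log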

    People

        Assignee: WC Triage (wc-triage)
        Reporter: Ellis Wilson (elliswilson)
