
qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.17.0
    • Severity: 3

    Description

      There is a valid case in which an OST index cannot be mapped to the corresponding index in the lqe global array (lqe_gblb_array). This can happen when newly added OSTs have not yet connected to the QMT, so there are no corresponding index files in the quota_master/dt-0x0 directory. If these OSTs already belong to an OST pool at that point, this can cause the following panic:

      Apr  5 12:32:12 vm01 kernel: LustreError: Skipped 2 previous similar messages
      Apr  5 12:32:24 vm01 kernel: LustreError: 28185:0:(qmt_entry.c:1145:qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 00000000b8271fd4
      Apr  5 12:32:24 vm01 kernel: LustreError: 28182:0:(qmt_entry.c:1145:qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 000000000505fcbe
      Apr  5 12:32:24 vm01 kernel: LustreError: 28185:0:(qmt_entry.c:1145:qmt_map_lge_idx()) LBUG
      Apr  5 12:32:24 vm01 kernel: LustreError: 28182:0:(qmt_entry.c:1145:qmt_map_lge_idx()) LBUG
      Apr  5 12:32:24 vm01 kernel: Pid: 28185, comm: mdt_rdpg08_004 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 SMP Wed Dec 20 09:59:57 UTC 2023
      Apr  5 12:32:24 vm01 kernel: Call Trace TBD:
      Apr  5 12:32:24 vm01 kernel: [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
      Apr  5 12:32:24 vm01 kernel: [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
      Apr  5 12:32:24 vm01 kernel: [<0>] qmt_map_lge_idx+0x7f/0x90 [lquota]
      Apr  5 12:32:24 vm01 kernel: [<0>] qmt_seed_glbe_all+0x17f/0x770 [lquota]
      Apr  5 12:32:24 vm01 kernel: [<0>] qmt_revalidate_lqes+0x213/0x360 [lquota]
      Apr  5 12:32:24 vm01 kernel: [<0>] qmt_dqacq0+0x7d5/0x2320 [lquota]
      Apr  5 12:32:24 vm01 kernel: [<0>] qmt_intent_policy+0x8d2/0xf10 [lquota]
      Apr  5 12:32:24 vm01 kernel: [<0>] mdt_intent_opc+0x9a9/0xa80 [mdt]
      Apr  5 12:32:24 vm01 kernel: [<0>] mdt_intent_policy+0x1fd/0x390 [mdt]
      Apr  5 12:32:24 vm01 kernel: [<0>] ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] tgt_enqueue+0xa4/0x200 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] tgt_request_handle+0xc9c/0x1950 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] ptlrpc_main+0xbf1/0x1510 [ptlrpc]
      Apr  5 12:32:24 vm01 kernel: [<0>] kthread+0x134/0x150
      Apr  5 12:32:24 vm01 kernel: [<0>] ret_from_fork+0x1f/0x40
      Apr  5 12:32:24 vm01 kernel: Kernel panic - not syncing: LBUG 
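
      The failure mode above can be illustrated with a minimal sketch: a lookup that reports a missing mapping to its caller instead of asserting. All names here (glb_slot, glb_data, map_ost_idx, count_mapped) are hypothetical stand-ins, not the real lquota structures, and this ticket does not show whether the merged patch takes exactly this approach.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the real lquota types. */
struct glb_slot {
	int ost_idx;		/* OST index that owns this slot */
};

struct glb_data {
	const struct glb_slot *slots;	/* one slot per OST known to the QMT */
	int num_used;			/* number of valid entries in slots */
};

/*
 * Map an OST index to its slot number.
 * Returns -ENOENT instead of asserting when no slot exists, e.g. for an
 * OST that is in a pool but has not connected to the QMT yet.
 */
static int map_ost_idx(const struct glb_data *gd, int ost_idx)
{
	int k;

	assert(gd != NULL);
	for (k = 0; k < gd->num_used; k++)
		if (gd->slots[k].ost_idx == ost_idx)
			return k;

	return -ENOENT;
}

/* Count the pool members that have a slot; unmapped members are skipped. */
static int count_mapped(const struct glb_data *gd, const int *pool, size_t n)
{
	int mapped = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (map_ost_idx(gd, pool[i]) >= 0)
			mapped++;

	return mapped;
}
```

      With a caller written like count_mapped, a pool member without a slot is simply skipped, so an unconnected OST in a pool no longer takes the server down.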

      Reproducer:

      [root@vm1 tests]# cat test.sh 
      #!/bin/bash
      
      
      OSTCOUNT=4 bash ./llmount.sh
      lctl set_param -P debug=+trace
      lctl set_param -P debug=+quota
      lctl set_param -P debug_mb=200
      lctl set_param -P osd*.*.quota_slave.enabled=u
      lctl pool_new lustre.qpool1
      lctl pool_add lustre.qpool1 OST[0-3]
      lfs setquota -u quota_usr -B50M /mnt/lustre
      chmod 777 /mnt/lustre
      umount /mnt/lustre-mds1
      mount -t ldiskfs -o loop /tmp/lustre-mdt1 /mnt/mds
      ls -l /mnt/mds/quota_master/dt-0x0
      rm -f /mnt/mds/quota_master/dt-0x0/*OST*UUID
      echo "ls after removing"
      ls -l /mnt/mds/quota_master/dt-0x0
      umount /mnt/mds
      mount -t lustre -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
      ./runas -u quota_usr dd if=/dev/zero of=/mnt/lustre/f1 bs=1M count=50
      [root@vm1 tests]#   

        Activity

          [LU-17770] qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used )
          pjones Peter Jones added a comment -

          Merged for 2.17

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55476/
          Subject: LU-17770 quota: don't panic in qmt_map_lge_idx
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 1f9689d0f92e8a8cdfe162ea0e56b9bed2c9f6d2

          adilger Andreas Dilger added a comment - - edited

          "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55476
          Subject: LU-17770 quota: don't panic in qmt_map_lge_idx
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c3e5c84015ac0305cd735fa059107d53374185d2

          gerrit Gerrit Updater added a comment -

          "Sergey Cheremencev <scherementsev@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55245
          Subject: LU-17770 tests: assertion in qmt_map_lge_idx
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 4fe43c47be1b8ea1394ba1e1e8f4d6e534109e37

          scherementsev Sergey Cheremencev added a comment -

          I have a fix for this but haven't had enough time to convert the reproducer into a test case. I'm going to push it as soon as I have a free slot.

          People

            Assignee: scherementsev Sergey Cheremencev
            Reporter: scherementsev Sergey Cheremencev