Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
Description
There is a valid case in which it is impossible to map an OST index to the corresponding index in the lqe global array (lqe_gblb_array). This can happen when newly added OSTs have not yet connected to the QMT, so there are no corresponding index files in the quota_master/dt-0x0 directory. If these OSTs are nevertheless already members of OST pools, the following panic can occur:
Apr 5 12:32:12 vm01 kernel: LustreError: Skipped 2 previous similar messages
Apr 5 12:32:24 vm01 kernel: LustreError: 28185:0:(qmt_entry.c:1145:qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 00000000b8271fd4
Apr 5 12:32:24 vm01 kernel: LustreError: 28182:0:(qmt_entry.c:1145:qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 000000000505fcbe
Apr 5 12:32:24 vm01 kernel: LustreError: 28185:0:(qmt_entry.c:1145:qmt_map_lge_idx()) LBUG
Apr 5 12:32:24 vm01 kernel: LustreError: 28182:0:(qmt_entry.c:1145:qmt_map_lge_idx()) LBUG
Apr 5 12:32:24 vm01 kernel: Pid: 28185, comm: mdt_rdpg08_004 4.18.0-477.27.1.el8_lustre.ddn17.x86_64 #1 SMP Wed Dec 20 09:59:57 UTC 2023
Apr 5 12:32:24 vm01 kernel: Call Trace TBD:
Apr 5 12:32:24 vm01 kernel: [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
Apr 5 12:32:24 vm01 kernel: [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
Apr 5 12:32:24 vm01 kernel: [<0>] qmt_map_lge_idx+0x7f/0x90 [lquota]
Apr 5 12:32:24 vm01 kernel: [<0>] qmt_seed_glbe_all+0x17f/0x770 [lquota]
Apr 5 12:32:24 vm01 kernel: [<0>] qmt_revalidate_lqes+0x213/0x360 [lquota]
Apr 5 12:32:24 vm01 kernel: [<0>] qmt_dqacq0+0x7d5/0x2320 [lquota]
Apr 5 12:32:24 vm01 kernel: [<0>] qmt_intent_policy+0x8d2/0xf10 [lquota]
Apr 5 12:32:24 vm01 kernel: [<0>] mdt_intent_opc+0x9a9/0xa80 [mdt]
Apr 5 12:32:24 vm01 kernel: [<0>] mdt_intent_policy+0x1fd/0x390 [mdt]
Apr 5 12:32:24 vm01 kernel: [<0>] ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] tgt_enqueue+0xa4/0x200 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] tgt_request_handle+0xc9c/0x1950 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] ptlrpc_main+0xbf1/0x1510 [ptlrpc]
Apr 5 12:32:24 vm01 kernel: [<0>] kthread+0x134/0x150
Apr 5 12:32:24 vm01 kernel: [<0>] ret_from_fork+0x1f/0x40
Apr 5 12:32:24 vm01 kernel: Kernel panic - not syncing: LBUG
Reproducer:
[root@vm1 tests]# cat test.sh
#!/bin/bash
OSTCOUNT=4 bash ./llmount.sh
lctl set_param -P debug=+trace
lctl set_param -P debug=+quota
lctl set_param -P debug_mb=200
lctl set_param -P osd*.*.quota_slave.enabled=u
lctl pool_new lustre.qpool1
lctl pool_add lustre.qpool1 OST[0-3]
lfs setquota -u quota_usr -B50M /mnt/lustre
chmod 777 /mnt/lustre
umount /mnt/lustre-mds1
mount -t ldiskfs -o loop /tmp/lustre-mdt1 /mnt/mds
ls -l /mnt/mds/quota_master/dt-0x0
rm -f /mnt/mds/quota_master/dt-0x0/*OST*UUID
echo "ls after removing"
ls -l /mnt/mds/quota_master/dt-0x0
umount /mnt/mds
mount -t lustre -o loop /tmp/lustre-mdt1 /mnt/lustre-mds1
./runas -u quota_usr dd if=/dev/zero of=/mnt/lustre/f1 bs=1M count=50
[root@vm1 tests]#
Merged for 2.17