[LU-8872] sanity-lfsck: no tests run Created: 29/Nov/16  Updated: 16/May/17  Resolved: 26/Mar/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Full - EL7.3 Server/EL7.3 Client - ZFS
b2_9, build# 21


Issue Links:
Duplicate
Severity: 3

 Description   

This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>


This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/414e1f26-b443-11e6-b287-5254006e85c2.

Error message:

/usr/bin/lfs setquota -g quota_2usr -b 7997952 -B 8397849 -i 109015 -I 114465 /mnt/lustre FAILED!

suite_log:

Total allocated inode limit: 0, total allocated block limit: 0
Setting up quota on onyx-32vm1.onyx.hpdd.intel.com:/mnt/lustre for quota_2usr...
+ /usr/bin/lfs setquota -u quota_2usr -b 7997952 -B 8397849 -i 109015 -I 114465 /mnt/lustre
+ /usr/bin/lfs setquota -g quota_2usr -b 7997952 -B 8397849 -i 109015 -I 114465 /mnt/lustre
setquota failed: Transport endpoint is not connected
 sanity-lfsck : @@@@@@ FAIL: /usr/bin/lfs setquota -g quota_2usr -b 7997952 -B 8397849 -i 109015 -I 114465 /mnt/lustre FAILED! 

This might be related to LU-8340.



 Comments   
Comment by James Nunez (Inactive) [ 29/Nov/16 ]

In the test_complete log for vm8, we see:

17:33:59:[  606.188774] Lustre: lustre-OST0003: Connection restored to lustre-MDT0000-mdtlov_UUID (at 10.2.4.117@tcp)
17:33:59:[  606.191655] Lustre: Skipped 6 previous similar messages
17:33:59:[  610.365899] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname lustre-ost4/ost4 2>/dev/null
17:33:59:[  614.462466] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
17:33:59:[  614.746365] Lustre: DEBUG MARKER: Using TIMEOUT=20
17:33:59:[  615.646830] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-zfs.lustre-OST0000.quota_slave.enabled
17:33:59:[  621.474849] LustreError: 15111:0:(qsd_writeback.c:124:qsd_add_deferred()) ASSERTION( tmp->qur_lqe ) failed: 
17:33:59:[  621.477708] LustreError: 15111:0:(qsd_writeback.c:124:qsd_add_deferred()) LBUG
17:33:59:[  621.480296] Pid: 15111, comm: ldlm_cb00_000
17:33:59:[  621.482716] 
17:33:59:[  621.482716] Call Trace:
17:33:59:[  621.486796]  [<ffffffffa09c57d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
17:33:59:[  621.489228]  [<ffffffffa09c5841>] lbug_with_loc+0x41/0xb0 [libcfs]
17:33:59:[  621.491543]  [<ffffffffa0f0b6af>] qsd_upd_schedule+0x6ef/0x760 [lquota]
17:33:59:[  621.493834]  [<ffffffffa0f04178>] qsd_glb_glimpse_ast+0x228/0x3a0 [lquota]
17:33:59:[  621.496292]  [<ffffffffa0d1fa3d>] ldlm_callback_handler.part.24+0x13bd/0x2110 [ptlrpc]
17:33:59:[  621.498685]  [<ffffffffa09d0537>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
17:33:59:[  621.500993]  [<ffffffffa0d207c7>] ldlm_callback_handler+0x37/0xd0 [ptlrpc]
17:33:59:[  621.503350]  [<ffffffffa0d4d1fb>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
17:33:59:[  621.505735]  [<ffffffffa0d4adb8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
17:33:59:[  621.508047]  [<ffffffffa0d512b0>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
17:33:59:[  621.510292]  [<ffffffffa0d50810>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
17:33:59:[  621.512503]  [<ffffffff810b052f>] kthread+0xcf/0xe0
17:33:59:[  621.514565]  [<ffffffff810b0460>] ? kthread+0x0/0xe0
17:33:59:[  621.516605]  [<ffffffff81696658>] ret_from_fork+0x58/0x90
17:33:59:[  621.518642]  [<ffffffff810b0460>] ? kthread+0x0/0xe0
17:33:59:[  621.520596] 
17:33:59:[  621.522237] Kernel panic - not syncing: LBUG
17:33:59:[  621.523228] CPU: 1 PID: 15111 Comm: ldlm_cb00_000 Tainted: P           OE  ------------   3.10.0-514.el7_lustre.x86_64 #1
17:33:59:[  621.523228] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
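
For triage of console logs like the one above, a small script can pull out the failed assertion and where it fired. This is only a sketch: the regular expression is an assumption based on the format of the two LustreError lines in this excerpt, and the names ASSERT_RE and find_assertions are made up for illustration; neither is part of Lustre or its test framework.

```python
import re

# Assumed format, taken from the excerpt above:
#   LustreError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.\-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+?) \) failed"
)


def find_assertions(lines):
    """Return one dict per console line that reports a failed assertion."""
    hits = []
    for text in lines:
        match = ASSERT_RE.search(text)
        if match is None:
            continue  # e.g. the follow-up "LBUG" line carries no expression
        hits.append({
            "pid": int(match["pid"]),
            "file": match["file"],
            "line": int(match["line"]),
            "func": match["func"],
            "expr": match["expr"],
        })
    return hits
```

Fed the lines of this excerpt, the function should report a single hit in qsd_add_deferred() at qsd_writeback.c:124 for the expression tmp->qur_lqe.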
Comment by Joseph Gmitter (Inactive) [ 29/Nov/16 ]

Hi Niu,

There seems to be some relation to quota-related LBUGs. Can you please have a look?

Thanks.
Joe

Comment by Niu Yawei (Inactive) [ 30/Nov/16 ]

That LASSERT is inappropriate: the 'lqe' can be NULL for the global list. I'm going to cook a patch to remove it.
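
The point can be illustrated with a toy model. The sketch below is Python, not the Lustre C code, and it does not reproduce the patch; UpdateRecord, describe_strict, and describe_tolerant are hypothetical names chosen for illustration. It only shows the difference between asserting that every queued record carries an lqe and treating a missing lqe as a legitimate case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateRecord:
    """Hypothetical stand-in for a queued quota update (not the Lustre struct)."""
    version: int
    lqe: Optional[str] = None  # None models a record with no lqe attached


def describe_strict(rec: UpdateRecord) -> str:
    # Same shape as the failing check: insist that every record has an lqe.
    if rec.lqe is None:
        raise AssertionError("ASSERTION( rec.lqe ) failed")
    return f"record ver={rec.version} lqe={rec.lqe}"


def describe_tolerant(rec: UpdateRecord) -> str:
    # Accept a missing lqe and report the record without it.
    if rec.lqe is None:
        return f"record ver={rec.version} (no lqe)"
    return f"record ver={rec.version} lqe={rec.lqe}"
```

In this model, describe_strict(UpdateRecord(version=7)) raises, which is the analogue of the LBUG above, while describe_tolerant handles the same record without failing.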

Comment by Gerrit Updater [ 30/Nov/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/24024
Subject: LU-8872 quota: incorrect LASSERT
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ff93e16c6deeb150258fd9f86c2e455b2591c67b

Comment by Gerrit Updater [ 26/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24024/
Subject: LU-8872 quota: incorrect LASSERT
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 56203820a260e9ddb6e084df842e6697c7a4eca7

Comment by Peter Jones [ 26/Mar/17 ]

Landed for 2.10
