[LU-5331] qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed Created: 11/Jul/14 Updated: 20/Jan/22 Resolved: 30/Jul/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 14873 |
| Description |
|
Had this crash happen on the tip of master as of yesterday running test 900 of sanity.sh:

<3>[109552.952152] LustreError: 12574:0:(client.c:1079:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88001bd78be8 x1473227759636904/t0(0) o13->lustre-OST0000-osc-MDT0000@0@lo:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
<3>[109552.954088] LustreError: 12574:0:(client.c:1079:ptlrpc_import_delay_req()) Skipped 1032 previous similar messages
<4>[109556.157386] Lustre: server umount lustre-MDT0000 complete
<0>[109556.443175] LustreError: 59:0:(qsd_handler.c:1139:qsd_op_adjust()) ASSERTION( qqi ) failed:
<0>[109556.443758] LustreError: 59:0:(qsd_handler.c:1139:qsd_op_adjust()) LBUG
<4>[109556.444090] Pid: 59, comm: kswapd0
<4>[109556.444347]
<4>[109556.444348] Call Trace:
<4>[109556.444843] [<ffffffffa0a9c8a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[109556.445208] [<ffffffffa0a9cea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[109556.445581] [<ffffffffa08d671c>] qsd_op_adjust+0x4cc/0x5a0 [lquota]
<4>[109556.445972] [<ffffffff811a6c7d>] ? generic_drop_inode+0x1d/0x80
<4>[109556.446306] [<ffffffffa09ba8af>] osd_object_delete+0x1ff/0x2d0 [osd_ldiskfs]
<4>[109556.446858] [<ffffffffa0c507b1>] lu_object_free+0x81/0x1a0 [obdclass]
<4>[109556.447211] [<ffffffffa0ab28e2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
<4>[109556.447568] [<ffffffffa0c51827>] lu_site_purge+0x2c7/0x4c0 [obdclass]
<4>[109556.447915] [<ffffffffa0c51ba8>] lu_cache_shrink+0x188/0x310 [obdclass]
<4>[109556.448237] [<ffffffff8113712d>] shrink_slab+0x13d/0x1c0
<4>[109556.448547] [<ffffffff8113a58a>] balance_pgdat+0x5ba/0x830
<4>[109556.448856] [<ffffffff81140676>] ? set_pgdat_percpu_threshold+0xa6/0xd0
<4>[109556.449198] [<ffffffff8113a934>] kswapd+0x134/0x3b0
<4>[109556.449486] [<ffffffff81098f90>] ? autoremove_wake_function+0x0/0x40
<4>[109556.449810] [<ffffffff8113a800>] ? kswapd+0x0/0x3b0
<4>[109556.450097] [<ffffffff81098c06>] kthread+0x96/0xa0
<4>[109556.450423] [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[109556.450736] [<ffffffff81098b70>] ? kthread+0x0/0xa0
<4>[109556.451027] [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[109556.451516]
<0>[109556.452221] LustreError: 32249:0:(ofd_dev.c:2296:ofd_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:

tag in my tree: master-20140710 |
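For context on the trace: LASSERT() in Lustre aborts with an LBUG and a stack dump when its condition is false, and the call chain above shows kswapd's shrinker path (shrink_slab -> lu_cache_shrink -> lu_site_purge -> lu_object_free -> osd_object_delete -> qsd_op_adjust) running after "server umount lustre-MDT0000 complete", i.e. after the quota slave state it needs has been torn down. Below is a minimal userspace sketch of the failing shape; the structures, the qsd_type_array layout and qsd_op_adjust_sketch() are simplified stand-ins for illustration, not the code in qsd_handler.c.

#include <assert.h>
#include <stdio.h>

/* Simplified stand-ins for the Lustre structures; names and layout are
 * illustrative only. */
struct qsd_qtype_info { int qqi_qtype; };

struct qsd_instance {
	int                    qsd_stopping;        /* set once umount starts */
	struct qsd_qtype_info *qsd_type_array[2];   /* per-quota-type info */
};

/* Userspace stand-in for LASSERT(): in the kernel this prints the assertion,
 * dumps the stack and LBUGs instead of aborting the process. */
#define LASSERT(expr) assert(expr)

/* Shape of the failure: the per-type info is looked up from the quota slave
 * instance; once umount has torn that state down the lookup yields NULL and
 * the assertion fires, matching "ASSERTION( qqi ) failed" in the log. */
static void qsd_op_adjust_sketch(struct qsd_instance *qsd, int qtype)
{
	struct qsd_qtype_info *qqi = qsd->qsd_type_array[qtype];

	LASSERT(qqi);
	printf("adjusting quota for type %d\n", qqi->qqi_qtype);
}

int main(void)
{
	/* Quota state already released: the array entries are NULL. */
	struct qsd_instance qsd = { .qsd_stopping = 1 };

	qsd_op_adjust_sketch(&qsd, 0);   /* aborts here, like the LBUG above */
	return 0;
}

Compiled and run, the sketch aborts on the assert, which is the userspace analogue of the LBUG in the log.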
| Comments |
| Comment by Niu Yawei (Inactive) [ 11/Jul/14 ] |
|
This looks like a general race between the umount thread and a slab shrinker thread:

1. A memory reclaim thread (kswapd, for example) calls lu_site_purge(nr) to purge some lu_objects;

It looks to me that ofd_stack_fini() and mdt_stack_fini() need to be improved to address this kind of race. I don't quite understand why we always call lu_site_purge(-1) twice in ofd_stack_fini() & mdt_stack_fini(). |
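To make the scenario concrete, below is a small standalone pthread model of the race, not Lustre code: one thread plays the shrinker, purging cached objects whose teardown consults per-device quota state (as osd_object_delete() -> qsd_op_adjust() does), while the other plays umount and drops that state without excluding an in-flight purge. All names here (quota_state, object_delete(), umount_mdt(), the globals) are invented for the illustration.

#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Toy stand-ins: quota state owned by the device, and a cache of objects
 * whose destructor consults that state.  Deliberately unsynchronized so the
 * race can actually happen. */
struct quota_state { int qs_refs; };

static struct quota_state *g_quota;        /* plays the role of the qsd/qqi */
static int g_cached_objects = 1000;        /* plays the lu_object cache */

/* Analogue of osd_object_delete() -> qsd_op_adjust(): insists the quota
 * state is still there, like LASSERT(qqi). */
static void object_delete(void)
{
	assert(g_quota != NULL);           /* trips if umount won the race */
	g_quota->qs_refs--;
}

/* Shrinker thread: analogue of kswapd -> lu_cache_shrink -> lu_site_purge(nr),
 * releasing cached objects one at a time. */
static void *shrinker(void *arg)
{
	(void)arg;
	while (g_cached_objects > 0) {
		object_delete();
		g_cached_objects--;
		usleep(100);
	}
	return NULL;
}

/* Umount thread: tears the quota slave down without waiting for, or
 * excluding, a purge that is already in flight. */
static void *umount_mdt(void *arg)
{
	(void)arg;
	usleep(5000);              /* let the purge get going first */
	g_quota = NULL;            /* quota slave gone (freeing omitted here) */
	return NULL;
}

int main(void)
{
	pthread_t purge_thr, umount_thr;

	g_quota = calloc(1, sizeof(*g_quota));
	g_quota->qs_refs = g_cached_objects;

	pthread_create(&purge_thr, NULL, shrinker, NULL);
	pthread_create(&umount_thr, NULL, umount_mdt, NULL);
	pthread_join(purge_thr, NULL);
	pthread_join(umount_thr, NULL);
	printf("no crash this run (purge finished before umount)\n");
	return 0;
}

With these timings the umount thread usually wins and the assert trips mid-purge. Making the teardown and the purge mutually exclusive closes the window; per the later comment, the landed patch adds lu_site::ls_purge_mutex to that effect.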
| Comment by Niu Yawei (Inactive) [ 14/Jul/14 ] |
|
Alex/Tappro, why do we have to call lu_site_purge() twice in ofd_stack_fini() & mdt_stack_fini()? Thanks. |
| Comment by Niu Yawei (Inactive) [ 15/Jul/14 ] |
| Comment by Niu Yawei (Inactive) [ 30/Jul/14 ] |
|
patch landed for 2.6 |
| Comment by Isaac Huang (Inactive) [ 16/Oct/14 ] |
|
The lu_site::ls_purge_mutex added by the patch may hurt osd-zfs object creation rate, see |
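The only detail about the fix visible in this ticket is that it adds a per-site ls_purge_mutex, so here is a sketch of that general pattern under that assumption, with invented names (site_model, site_purge(), ls_cached) and a toy cache; it illustrates serializing concurrent purges, not the landed patch.

#include <pthread.h>
#include <stdio.h>

/* Sketch of the serialization idea behind lu_site::ls_purge_mutex: concurrent
 * purges (shrinker vs. umount) take the same per-site mutex, so they can no
 * longer interleave on one site. */
struct site_model {
	pthread_mutex_t ls_purge_mutex;
	int             ls_cached;      /* cached objects on this site */
};

/* Purge up to nr cached objects; nr < 0 means "purge everything", like the
 * lu_site_purge(-1) calls in ofd_stack_fini()/mdt_stack_fini(). */
static int site_purge(struct site_model *s, int nr)
{
	int freed = 0;

	pthread_mutex_lock(&s->ls_purge_mutex);
	while (s->ls_cached > 0 && (nr < 0 || freed < nr)) {
		s->ls_cached--;          /* object destructors would run here */
		freed++;
	}
	pthread_mutex_unlock(&s->ls_purge_mutex);
	return freed;
}

int main(void)
{
	struct site_model s = { .ls_cached = 8 };

	pthread_mutex_init(&s.ls_purge_mutex, NULL);
	printf("shrinker purge freed %d\n", site_purge(&s, 4));    /* 4 */
	printf("umount purge(-1) freed %d\n", site_purge(&s, -1)); /* 4 */
	pthread_mutex_destroy(&s.ls_purge_mutex);
	return 0;
}

The flip side is that purges on a site now run one at a time, so any workload that triggers purging frequently, which heavy object creation on osd-zfs can presumably do when the cache sits near its limit, may see the purge become a serialization point; that appears to be the regression Isaac is referring to.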