[LU-2246] failure on sanity.sh test_132: ASSERTION( env->le_ses != ((void *)0) ) failed Created: 29/Oct/12 Updated: 07/Jan/16 Resolved: 07/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Both server and client are RHEL6 |
||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 5318 | ||||||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/34cdc80e-21c2-11e2-b552-52540035b04c. The sub-test test_132 failed with the following error:
From MDS console log:
00:20:08:Lustre: lustre-MDT0000: haven't heard from client 5be596f3-4c31-36f1-0f15-cf978ff44af7 (at 192.168.4.23@o2ib) in 55 seconds. I think it's dead, and I am evicting it. exp ffff88030ddd2000, cur 1351495207 expire 1351495177 last 1351495152
00:20:08:LustreError: 25221:0:(mdd_device.c:1426:md_ucred()) ASSERTION( env->le_ses != ((void *)0) ) failed:
00:20:08:LustreError: 25221:0:(mdd_device.c:1426:md_ucred()) LBUG
00:20:08:Pid: 25221, comm: ll_evictor
00:20:08:
00:20:08:Call Trace:
00:20:08: [<ffffffffa03f2905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
00:20:08: [<ffffffffa03f2f17>] lbug_with_loc+0x47/0xb0 [libcfs]
00:20:08: [<ffffffffa0be7c2c>] md_ucred+0x5c/0x60 [mdd]
00:20:08: [<ffffffffa0bd03a6>] mdd_xattr_sanity_check+0x36/0x1f0 [mdd]
00:20:08: [<ffffffffa0bd71db>] mdd_xattr_set+0x17b/0x620 [mdd]
00:20:08: [<ffffffffa0bf0426>] ? mdd_read_unlock+0x26/0x30 [mdd]
00:20:08: [<ffffffffa0bd513b>] ? mdd_xattr_get+0x13b/0x340 [mdd]
00:20:08: [<ffffffffa0c4de73>] mdt_som_attr_set+0x1b3/0x440 [mdt]
00:20:08: [<ffffffffa0c4e24c>] mdt_ioepoch_close_on_eviction+0x14c/0x170 [mdt]
00:20:08: [<ffffffffa0f48e89>] ? osd_key_init+0x119/0x680 [osd_ldiskfs]
00:20:08: [<ffffffffa0c4eccb>] mdt_ioepoch_close+0x2ab/0x3d0 [mdt]
00:20:08: [<ffffffffa0c4f272>] mdt_mfd_close+0x482/0x700 [mdt]
00:20:08: [<ffffffffa0c1e01e>] mdt_obd_disconnect+0x3ae/0x4f0 [mdt]
00:20:08: [<ffffffffa056ed88>] class_fail_export+0x248/0x580 [obdclass]
00:20:08: [<ffffffffa0765e69>] ping_evictor_main+0x249/0x640 [ptlrpc]
00:20:08: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffff8100c14a>] child_rip+0xa/0x20
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffff8100c140>] ? child_rip+0x0/0x20
00:20:08:
00:20:08:Kernel panic - not syncing: LBUG
00:20:08:Pid: 25221, comm: ll_evictor Not tainted 2.6.32-279.5.1.el6_lustre.x86_64 #1
00:20:08:Call Trace:
00:20:08: [<ffffffff814fd58a>] ? panic+0xa0/0x168
00:20:08: [<ffffffffa03f2f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
00:20:08: [<ffffffffa0be7c2c>] ? md_ucred+0x5c/0x60 [mdd]
00:20:08: [<ffffffffa0bd03a6>] ? mdd_xattr_sanity_check+0x36/0x1f0 [mdd]
00:20:08: [<ffffffffa0bd71db>] ? mdd_xattr_set+0x17b/0x620 [mdd]
00:20:08: [<ffffffffa0bf0426>] ? mdd_read_unlock+0x26/0x30 [mdd]
00:20:08: [<ffffffffa0bd513b>] ? mdd_xattr_get+0x13b/0x340 [mdd]
00:20:08: [<ffffffffa0c4de73>] ? mdt_som_attr_set+0x1b3/0x440 [mdt]
00:20:08: [<ffffffffa0c4e24c>] ? mdt_ioepoch_close_on_eviction+0x14c/0x170 [mdt]
00:20:08: [<ffffffffa0f48e89>] ? osd_key_init+0x119/0x680 [osd_ldiskfs]
00:20:08: [<ffffffffa0c4eccb>] ? mdt_ioepoch_close+0x2ab/0x3d0 [mdt]
00:20:08: [<ffffffffa0c4f272>] ? mdt_mfd_close+0x482/0x700 [mdt]
00:20:08: [<ffffffffa0c1e01e>] ? mdt_obd_disconnect+0x3ae/0x4f0 [mdt]
00:20:08: [<ffffffffa056ed88>] ? class_fail_export+0x248/0x580 [obdclass]
00:20:08: [<ffffffffa0765e69>] ? ping_evictor_main+0x249/0x640 [ptlrpc]
00:20:08: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffffa0765c20>] ? ping_evictor_main+0x0/0x640 [ptlrpc]
00:20:08: [<ffffffff8100c140>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Hongchao Zhang [ 08/Nov/12 ] |
|
This issue is related to SOM and is easy to reproduce: it occurs if a client is evicted while SOM is enabled. In this case, the client encountered ASSERTION of |
| Comment by Hongchao Zhang [ 09/Nov/12 ] |
|
The initial patch is under test and will be attached soon. |
| Comment by Hongchao Zhang [ 12/Nov/12 ] |
|
The patch is tracked at http://review.whamcloud.com/#change,4512
This problem is related to SOM, which is not a mature feature, and the patch fixes some bugs in SOM:
1, the lu_env may not contain the lu_context tied to a specific request (lu_env->le_ses), for example in the eviction case.
This issue can be dropped as a blocker, for it is caused by blocker
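The failure mode described above can be sketched in isolation. The following is a minimal, self-contained C model, not the code from the patch: the `*_sketch` types and functions are hypothetical stand-ins, and only the `le_ses` field name and the asserted condition come from the trace. It contrasts an accessor that asserts a session context exists with one that reports its absence, so a caller running without a request (such as the `ll_evictor` thread) can take a different path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Lustre structures. */
struct lu_context_sketch { int lc_valid; };

struct lu_env_sketch {
	struct lu_context_sketch  le_ctx;
	struct lu_context_sketch *le_ses; /* NULL when no request is attached */
};

struct ucred_sketch { unsigned int uc_uid; };

/* Strict accessor: mirrors the asserted condition seen in the trace.
 * Calling it with env->le_ses == NULL aborts, like the LBUG. */
struct ucred_sketch *ucred_strict_sketch(const struct lu_env_sketch *env,
					 struct ucred_sketch *stored)
{
	assert(env->le_ses != NULL);
	return stored;
}

/* Tolerant accessor (assumption, not the merged fix): returns NULL
 * when the environment carries no session context. */
struct ucred_sketch *ucred_or_null_sketch(const struct lu_env_sketch *env,
					  struct ucred_sketch *stored)
{
	if (env->le_ses == NULL)
		return NULL;
	return stored;
}

/* Hypothetical caller: with no session there is no user credential to
 * check, so the check is skipped; otherwise only uid 0 passes.
 * Returns 0 when allowed, -1 when denied. */
int xattr_check_sketch(const struct lu_env_sketch *env,
		       struct ucred_sketch *stored)
{
	struct ucred_sketch *uc = ucred_or_null_sketch(env, stored);

	if (uc == NULL)
		return 0; /* internal caller, e.g. the eviction path */
	return uc->uc_uid == 0 ? 0 : -1;
}
```

Whether the real fix skips the check, supplies a session context for the eviction thread, or does something else is not stated in this ticket; the sketch only shows why an unconditional assertion fails on a path that has no request.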
| Comment by Hongchao Zhang [ 09/Dec/12 ] |
|
The patch is updated again, and it tries to fix the following problems related to SOM:
the ASSERTION issue
1, if there is no LSM, ll_som_update does not need to update SOM attributes on the MDS (test_206 in sanity.sh)
2, if the file is not a regular one, ll_setattr_raw should not open the file (MF_EPOCH_OPEN), for it will leave the set_attr
3, mdt_som_au_close could be called during eviction, in which case there is no ptlrpc_request with it.
4, during closing a file, the MDT should not require the client to send AU if the file has been unlinked. |
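The four numbered conditions above can be read as independent guards. The sketch below restates them as a small C decision table, purely as an illustration: the struct and function names are hypothetical, each guard models only the single condition named in the comment, and none of this is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state visible at setattr/close time. */
struct som_state_sketch {
	bool has_lsm;     /* file has striping metadata (LSM) */
	bool is_regular;  /* file is a regular file */
	bool has_request; /* a ptlrpc_request accompanies the close */
	bool unlinked;    /* file has already been unlinked */
};

/* One flag per numbered item in the comment. */
struct som_plan_sketch {
	bool client_updates_som;  /* item 1: only when an LSM exists */
	bool setattr_opens_epoch; /* item 2: only for regular files */
	bool close_uses_request;  /* item 3: absent during eviction */
	bool mdt_asks_for_au;     /* item 4: not for unlinked files */
};

/* Fill in the plan from the state; each line models exactly one guard. */
void som_plan_sketch_fill(const struct som_state_sketch *s,
			  struct som_plan_sketch *p)
{
	assert(s != NULL && p != NULL);
	p->client_updates_som = s->has_lsm;
	p->setattr_opens_epoch = s->is_regular;
	p->close_uses_request = s->has_request;
	p->mdt_asks_for_au = !s->unlinked;
}
```

For the eviction case in this ticket, `has_request` would be false, so a close path following this table would have to proceed without touching the request.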
| Comment by John Fuchs-Chesney (Inactive) [ 07/Jan/16 ] |
|
Patch was merged. |