[LU-6192] seq_client_alloc_fid()) ASSERTION( seq != ((void *)0) ) failed Created: 02/Feb/15  Updated: 02/Feb/15  Resolved: 02/Feb/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: zfs

Issue Links:
Duplicate
duplicates LU-6154 striped directory on ZFS Resolved
Severity: 3
Bugzilla ID: 6,154
Rank (Obsolete): 17315

 Description   

While checking how sanity works on ZFS (I also have DNE enabled, with two MDSes), running a simple sanity.sh in slow mode causes the MDS to crash 100% of the time in test 31p with this assertion:

<4>[ 4377.551909] Lustre: DEBUG MARKER: == sanity test 31p: remove of open striped directory == 22:00:43 (1422846043)
<0>[ 4377.641621] LustreError: 14253:0:(fid_request.c:329:seq_client_alloc_fid()) ASSERTION( seq != ((void *)0) ) failed: 
<0>[ 4377.641922] LustreError: 14253:0:(fid_request.c:329:seq_client_alloc_fid()) LBUG
<4>[ 4377.642333] Pid: 14253, comm: mdt01_009
<4>[ 4377.642459] 
<4>[ 4377.642459] Call Trace:
<4>[ 4377.642681]  [<ffffffffa05e88a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[ 4377.642839]  [<ffffffffa05e8ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[ 4377.642996]  [<ffffffffa0bb005c>] seq_client_alloc_fid+0x46c/0x470 [fid]
<4>[ 4377.643155]  [<ffffffff81175e1b>] ? __kmalloc+0x1bb/0x2a0
<4>[ 4377.643298]  [<ffffffffa12395dc>] osd_fid_alloc+0xbc/0xc0 [osd_zfs]
<4>[ 4377.643453]  [<ffffffffa102916a>] lod_declare_xattr_set_lmv+0xe8a/0x28d0 [lod]
<4>[ 4377.643702]  [<ffffffffa0b277f8>] ? qsd_op_begin+0xb8/0xb70 [lquota]
<4>[ 4377.643855]  [<ffffffffa102ade7>] lod_dir_striping_create_internal+0x237/0x1b40 [lod]
<4>[ 4377.644119]  [<ffffffffa123d331>] ? osd_declare_quota+0x1c1/0x2d0 [osd_zfs]
<4>[ 4377.644293]  [<ffffffffa124314a>] ? osd_declare_object_create+0x33a/0x440 [osd_zfs]
<4>[ 4377.644543]  [<ffffffffa123edb1>] ? osd_object_write_unlock+0x61/0x70 [osd_zfs]
<4>[ 4377.644791]  [<ffffffffa102c951>] lod_declare_object_create+0x261/0x3e0 [lod]
<4>[ 4377.644954]  [<ffffffffa0f1a846>] mdd_declare_object_create_internal+0x116/0x340 [mdd]
<4>[ 4377.645211]  [<ffffffffa0f1605e>] mdd_create+0x68e/0x1730 [mdd]
<4>[ 4377.645366]  [<ffffffffa0f779cc>] ? mdt_version_save+0x8c/0x1a0 [mdt]
<4>[ 4377.645521]  [<ffffffffa0f7bdc1>] mdt_reint_create+0xbf1/0xd40 [mdt]
<4>[ 4377.645689]  [<ffffffffa0795b10>] ? lu_ucred+0x20/0x30 [obdclass]
<4>[ 4377.645854]  [<ffffffffa0f57d15>] ? mdt_ucred+0x15/0x20 [mdt]
<4>[ 4377.646102]  [<ffffffffa0f7282c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
<4>[ 4377.646371]  [<ffffffffa09b98b2>] ? __req_capsule_get+0x162/0x6d0 [ptlrpc]
<4>[ 4377.646533]  [<ffffffffa0f76add>] mdt_reint_rec+0x5d/0x200 [mdt]
<4>[ 4377.646692]  [<ffffffffa0f5c64b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
<4>[ 4377.646847]  [<ffffffffa0f5cdbb>] mdt_reint+0x6b/0x120 [mdt]
<4>[ 4377.647009]  [<ffffffffa09f312e>] tgt_request_handle+0x8be/0x1000 [ptlrpc]
<4>[ 4377.647185]  [<ffffffffa09a4a34>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
<4>[ 4377.647349]  [<ffffffffa09a3c40>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
<4>[ 4377.647500]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
<4>[ 4377.647636]  [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[ 4377.647773]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[ 4377.647909]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[ 4377.648053] 
<0>[ 4377.649299] Kernel panic - not syncing: LBUG

I guess this is a bit similar to LU-2911 and LU-2958, except I do not have any old FS configs or anything of that nature; everything is freshly reformatted before the test starts.



 Comments   
Comment by Oleg Drokin [ 02/Feb/15 ]

I just tested and ldiskfs does not crash in this case, so apparently it is a ZFS-only issue.

Comment by Di Wang [ 02/Feb/15 ]

I believe this can be fixed by the patch in LU-6154 http://review.whamcloud.com/#/c/13518/

Comment by Di Wang [ 02/Feb/15 ]

Duplicate of LU-6154.
