[LU-2828] conf-sanity test_64 test_59: MDS dt_object.h dt_declare_record_write() ASSERTION( dt != NULL ) Created: 18/Feb/13  Updated: 03/Apr/16  Resolved: 03/Mar/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0, Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Keith Mannthey (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: HB
Environment:

A patch pushed via git.


Severity: 3
Rank (Obsolete): 6849

 Description   

From this test run:
https://maloo.whamcloud.com/test_sessions/f62dc660-7943-11e2-9cb9-52540035b04c

The patch being tested does not touch this area of the code.

conf-sanity test_64

Error: 'test failed to respond and timed out'
Failure Rate: 4.00% of last 100 executions [all branches]

In the MDS the following is seen:

09:37:51:Lustre: DEBUG MARKER: == conf-sanity test 64: check lfs df --lazy == 09:37:45 (1361122665)
09:37:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
09:37:51:Lustre: DEBUG MARKER: test -b /dev/lvm-MDS/P1
09:37:51:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl  		                   /dev/lvm-MDS/P1 /mnt/mds1
09:37:51:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
09:37:51:Lustre: lustre-MDT0000: used disk, loading
09:37:51:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
09:37:51:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1 2>/dev/null
09:38:02:Lustre: lustre-OST0000-osc-MDT0000: Connection to lustre-OST0000 (at 10.10.17.34@tcp) was lost; in progress operations using this service will wait for recovery to complete
09:38:14:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
09:38:14:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
09:38:14:LustreError: 7883:0:(client.c:1048:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88006c4a2c00 x1427239946161272/t0(0) o13->lustre-OST0000-osc-MDT0000@10.10.17.34@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
09:38:14:LustreError: 24649:0:(dt_object.h:979:dt_declare_record_write()) ASSERTION( dt != NULL ) failed: dt is NULL when we want to write record
09:38:14:LustreError: 24649:0:(dt_object.h:979:dt_declare_record_write()) LBUG
09:38:14:Pid: 24649, comm: osp-pre-1
09:38:14:
09:38:14:Call Trace:
09:38:14: [<ffffffffa0ee7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
09:38:14: [<ffffffffa0ee7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
09:38:14: [<ffffffffa0704ca5>] osp_write_last_oid_seq_files+0x595/0x6a0 [osp]
09:38:14: [<ffffffffa070918d>] osp_precreate_thread+0x80d/0x1460 [osp]
09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
09:38:14: [<ffffffff8100c0ca>] child_rip+0xa/0x20
09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
09:38:14: [<ffffffffa0708980>] ? osp_precreate_thread+0x0/0x1460 [osp]
09:38:14: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

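It looks like the MDS panicked (LBUG) during unmount: the osp-pre-1 precreate thread hit the assertion in osp_write_last_oid_seq_files() while /mnt/mds1 was being unmounted.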



 Comments   
Comment by Li Wei (Inactive) [ 19/Feb/13 ]

https://maloo.whamcloud.com/test_sets/172a3bae-7adf-11e2-b916-52540035b04c

Comment by Zhenyu Xu [ 20/Feb/13 ]

another hit at https://maloo.whamcloud.com/test_sets/b7840b4c-7b67-11e2-8242-52540035b04c

Comment by Nathaniel Clark [ 20/Feb/13 ]

ZFS is hitting the same issue in conf-sanity test_59.

https://maloo.whamcloud.com/test_sets/a79faa76-7b51-11e2-8242-52540035b04c

Comment by nasf (Inactive) [ 20/Feb/13 ]

another failure instance:

https://maloo.whamcloud.com/test_sets/35e5a63c-7ada-11e2-b916-52540035b04c

Comment by Minh Diep [ 21/Feb/13 ]

another hit: https://maloo.whamcloud.com/test_sets/a7a08a3a-79d1-11e2-ad0e-52540035b04c

Comment by Jodi Levi (Inactive) [ 21/Feb/13 ]

Alex,
This is coming up regularly in Review runs. Do you have any ideas?

Comment by Nathaniel Clark [ 21/Feb/13 ]

I've seen it several times in ZFS testing.

maloo says:

Failure Rate: 36.00% of last 100 executions [all branches]

for failures in test_59

Comment by Sarah Liu [ 25/Feb/13 ]

another instance seen in ldiskfs:
https://maloo.whamcloud.com/test_sets/5368ca28-7e58-11e2-8f4f-52540035b04c

Comment by Zhenyu Xu [ 25/Feb/13 ]

Patch is being tracked at http://review.whamcloud.com/5528

commit message
LU-2828 osp: correct osp device finialize order
    
    Should stop osp precreate thread before releasing its last used
    oid/seq files.

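The fix is a teardown-ordering rule: a background thread that writes to a resource must be stopped and joined before that resource is released. Otherwise the thread can run after the object pointer has been cleared and trip ASSERTION( dt != NULL ), as in the trace in the description. Below is a minimal userspace sketch of that ordering using POSIX threads. It is not the Lustre OSP code; `struct osp_dev_sketch`, `struct record_file`, `precreate_thread()`, `write_last_oid()`, `sketch_start()`, `sketch_fini()` and the `last_oid_file` field are hypothetical stand-ins for the OSP device, its precreate thread and its last-used oid/seq file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the last-used oid/seq file object. */
struct record_file {
    unsigned long writes;
};

/* Hypothetical stand-in for an OSP device that owns a precreate thread. */
struct osp_dev_sketch {
    pthread_mutex_t lock;
    bool stopping;                      /* set to ask the thread to exit */
    struct record_file *last_oid_file;  /* NULL once released */
    pthread_t precreate;
};

/* Mirrors ASSERTION( dt != NULL ): the file must still exist when written. */
static void write_last_oid(struct osp_dev_sketch *d)
{
    assert(d->last_oid_file != NULL);
    d->last_oid_file->writes++;
}

/* Hypothetical precreate loop: keeps updating the file until told to stop. */
static void *precreate_thread(void *arg)
{
    struct osp_dev_sketch *d = arg;

    for (;;) {
        pthread_mutex_lock(&d->lock);
        if (d->stopping) {
            pthread_mutex_unlock(&d->lock);
            break;
        }
        write_last_oid(d);
        pthread_mutex_unlock(&d->lock);
        sched_yield();
    }
    return NULL;
}

int sketch_start(struct osp_dev_sketch *d)
{
    int rc;

    pthread_mutex_init(&d->lock, NULL);
    d->stopping = false;
    d->last_oid_file = calloc(1, sizeof(*d->last_oid_file));
    if (d->last_oid_file == NULL) {
        pthread_mutex_destroy(&d->lock);
        return -1;
    }
    rc = pthread_create(&d->precreate, NULL, precreate_thread, d);
    if (rc != 0) {
        free(d->last_oid_file);
        d->last_oid_file = NULL;
        pthread_mutex_destroy(&d->lock);
    }
    return rc;
}

/* Finalize in the corrected order: stop and join the thread first,
 * then release the file it writes to. Returns the number of writes. */
unsigned long sketch_fini(struct osp_dev_sketch *d)
{
    unsigned long writes;

    pthread_mutex_lock(&d->lock);
    d->stopping = true;
    pthread_mutex_unlock(&d->lock);
    pthread_join(d->precreate, NULL);   /* no writer can remain after this */

    writes = d->last_oid_file->writes;
    free(d->last_oid_file);
    d->last_oid_file = NULL;
    pthread_mutex_destroy(&d->lock);
    return writes;
}
```

Calling sketch_start() and then sketch_fini() cannot trip the assertion, because pthread_join() guarantees the writer has exited before the pointer is cleared. Swapping the two halves of sketch_fini() (free and clear the pointer first, then set `stopping`) reopens the window in which the thread can observe NULL.
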
Comment by Peter Jones [ 26/Feb/13 ]

Landed for 2.4

Comment by Andreas Dilger [ 01/Oct/14 ]

conf-sanity.sh test_59 and test_64 are still being skipped due to this bug.

Comment by Gerrit Updater [ 13/Feb/15 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13757
Subject: LU-2828 test: Remove tests from ALWAYS_EXCEPT list
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 60b129f7f8716937752c5e89e4ad301f29252792

Comment by James Nunez (Inactive) [ 13/Feb/15 ]

Patch to remove tests 59 and 64 from the ALWAYS_EXCEPT list at http://review.whamcloud.com/13757

Comment by Gerrit Updater [ 03/Mar/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13757/
Subject: LU-2828 test: Remove tests from ALWAYS_EXCEPT list
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 26de2a56803ce20f6ae21dac650d598e4335f247

Comment by James Nunez (Inactive) [ 03/Mar/15 ]

Patch removing tests 59 and 64 from ALWAYS_EXCEPT list landed to master (pre-2.8).

Generated at Sat Feb 10 01:28:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.