[LU-2048] Crash in llite_lib.c:1161:ll_clear_inode()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) replay-vbr 7e Created: 28/Sep/12  Updated: 19/Dec/13  Resolved: 08/Dec/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4277

 Description   

Potentially also affects 2.3

I was running replay-vbr in a single-node config and hit this assertion in test 7e:

[27477.960158] Lustre: 12294:0:(import.c:1207:completed_replay_interpret()) lustre-MDT0000-mdc-ffff880293febbf0: version recovery fails, reconnecting
[27480.518466] LustreError: 27732:0:(llite_lib.c:1161:ll_clear_inode()) ASSERTION( lli->u.d.d_sai == ((void *)0) ) failed:
[27480.522498] LustreError: 27732:0:(llite_lib.c:1161:ll_clear_inode()) LBUG
[27480.524558] Pid: 27732, comm: mkdir
[27480.525917]
[27480.525919] Call Trace:
[27480.527297] [<ffffffffa068c915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[27480.529930] [<ffffffffa068cf27>] lbug_with_loc+0x47/0xb0 [libcfs]
[27480.533191] [<ffffffffa0c67a9d>] ll_clear_inode+0xa6d/0xa70 [lustre]
[27480.534860] [<ffffffff81197bda>] clear_inode+0xca/0x160
[27480.536301] [<ffffffffa0c66b3d>] ll_delete_inode+0x19d/0x690 [lustre]
[27480.539906] [<ffffffffa0c669a0>] ? ll_delete_inode+0x0/0x690 [lustre]
[27480.541609] [<ffffffff811982c6>] generic_delete_inode+0xd6/0x1c0
[27480.543149] [<ffffffff81198415>] generic_drop_inode+0x65/0x80
[27480.545208] [<ffffffff81197212>] iput+0x62/0x70
[27480.546138] [<ffffffffa0c417a1>] ll_d_iput+0x1f1/0x950 [lustre]
[27480.547053] [<ffffffff81193c99>] dentry_iput+0x89/0x110
[27480.547860] [<ffffffff81193e11>] d_kill+0x31/0x60
[27480.548588] [<ffffffff811958ac>] dput+0x7c/0x160
[27480.549339] [<ffffffff8118d147>] sys_mkdirat+0xa7/0x120
[27480.551055] [<ffffffff814ff61e>] ? do_page_fault+0x3e/0xa0
[27480.553160] [<ffffffff8118d1d8>] sys_mkdir+0x18/0x20



 Comments   
Comment by Oleg Drokin [ 28/Sep/12 ]

Fan Yong, this is SAI-related so I think it would be useful for you to take a look.

Comment by Oleg Drokin [ 28/Sep/12 ]

I just did another run and it crashed again in the same spot.
I ran "SLOW=yes REFORMAT=yes sh replay-vbr.sh".

Comment by Oleg Drokin [ 08/Dec/12 ]

This appears to have been fixed by one of the recent patches; I can no longer reproduce it.

Comment by Jian Yu [ 04/Sep/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/44/ (2.4.1 RC1)

replay-vbr test 7e hit the same failure:
https://maloo.whamcloud.com/test_sets/a4d6c200-14ff-11e3-9828-52540035b04c

Comment by Jian Yu [ 19/Dec/13 ]

Lustre client: http://build.whamcloud.com/job/lustre-b2_3/41/ (2.3.0)
Lustre server: http://build.whamcloud.com/job/lustre-b2_4/69/ (2.4.2 RC1)

replay-vbr test 7e hit the same failure:
https://maloo.whamcloud.com/test_sets/dfb5ffa8-6860-11e3-a16f-52540035b04c
