[LU-185] LBUG: (cl_page.c:1362:cl_page_completion()) !(pg->cp_flags & CPF_READ_COMPLETED) ASSERTION(0) failed Created: 01/Apr/11  Updated: 17/May/11  Resolved: 04/May/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Bug Priority: Blocker
Reporter: Sebastien Buisson (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Bugzilla ID: 19352
Rank (Obsolete): 5056

 Description   

Hi,

At CEA they have 'special' client nodes dedicated to file exchange between two clusters. These nodes frequently crash with the following messages in the syslog:

LustreError: 8142:0:(osc_request.c:773:osc_announce_cached()) dirty 1807 - 1807 > system dirty_max 8650752
LustreError: 8142:0:(osc_request.c:773:osc_announce_cached()) Skipped 50 previous similar messages
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) page@ffff880e31124140[2 ffff8803e8022548:0 ^(null)_ffff880e311248c0
3 0 1 (null) (null) 0x1]
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) page@ffff880e311248c0[1 ffff8803609a0508:0 ^ffff880e31124140_(null)
3 0 1 (null) (null) 0x0]
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) vvp-page@ffff880f15628960(1:0:0) vm@ffffea00343cb960
3800000000000821 3:0 ffff880e31124140 0 lru
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) lov-page@ffff880f9bcebb88
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) osc-page@ffff881047778db0: 1< 0x845fed 257 0 - - - >
2< 0 0 0x0 0x108 | (null) ffff8808641887c8 ffff8802c39a60c0 ffffffffa0714c20 ffff881047778db0 > 3< - ffff880eca22b0e0 0 0 0 > 4< 0 0 8 2097152 - | - - - - > 5< - - - - | 0 - - | 0 - ->
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) end page@ffff880e31124140
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) !(pg->cp_flags & CPF_READ_COMPLETED)
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) ASSERTION(0) failed
LustreError: 7389:0:(cl_page.c:1362:cl_page_completion()) LBUG
Pid: 7389, comm: ptlrpcd-brw

Analyzing the crash dump we can see the following stack:
crash> bt
PID: 7389 TASK: ffff88087aa22ed0 CPU: 3 COMMAND: "ptlrpcd-brw"
#0 [ffff88087caeb838] machine_kexec at ffffffff8102e66b
#1 [ffff88087caeb898] crash_kexec at ffffffff810a9b08
#2 [ffff88087caeb968] panic at ffffffff8145212d
#3 [ffff88087caeb9e8] lbug_with_loc at ffffffffa03b8eeb
#4 [ffff88087caeba38] libcfs_assertion_failed at ffffffffa03c47d6
#5 [ffff88087caeba88] cl_page_completion at ffffffffa047fc5a
#6 [ffff88087caebb28] osc_completion at ffffffffa070eccf
#7 [ffff88087caebba8] osc_ap_completion at ffffffffa06f79ce
#8 [ffff88087caebc28] brw_interpret at ffffffffa0704759
#9 [ffff88087caebcf8] ptlrpc_check_set at ffffffffa05573fa
#10 [ffff88087caebdd8] ptlrpcd_check at ffffffffa058a840
#11 [ffff88087caebe38] ptlrpcd at ffffffffa058ac93
#12 [ffff88087caebf48] kernel_thread at ffffffff8100d1aa

In the crash dump we also see that the cl_page struct in question has only the CPF_READ_COMPLETED flag set.
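
To make the failing check easier to follow, here is a minimal user-space sketch of what the assertion message implies: a read completion is only expected on a page that does not yet carry CPF_READ_COMPLETED. This is a simplified model and not the Lustre source. The struct model_page, the function model_read_completion() and the flag value 0x1 are assumptions made for illustration, and the model returns an error where the real code hits an LBUG.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag value; the real definition lives in the Lustre headers. */
#define CPF_READ_COMPLETED 0x1u

/* Simplified stand-in for struct cl_page: only the flags field is modeled. */
struct model_page {
    unsigned int cp_flags;
};

/*
 * Model of a read completion. Returns 0 on the first completion and -1 if
 * the page has already been marked as read-completed, which is the state
 * the failed assertion rejects. The real code asserts instead of returning.
 */
int model_read_completion(struct model_page *pg)
{
    assert(pg != NULL);
    if (pg->cp_flags & CPF_READ_COMPLETED) {
        fprintf(stderr, "page %p: read completed twice\n", (void *)pg);
        return -1;
    }
    pg->cp_flags |= CPF_READ_COMPLETED;
    return 0;
}
```

In this model, a page that already has only CPF_READ_COMPLETED set, as seen in the dump, fails the check on its next read completion.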

Searching the Lustre Bugzilla database for similar issues, I found bug 19352. It looks like exactly the same bug; the problem is that a fix for it already landed in 2.0, and I have verified that our sources include that fix.

At CEA, it seems that this problem began to occur when the copy tool running on these nodes was modified to do O_DIRECT IOs.



 Comments   
Comment by Peter Jones [ 01/Apr/11 ]

Jay

Could you look at this one please?

Thanks

Peter

Comment by Jinshan Xiong (Inactive) [ 01/Apr/11 ]

looking into this.

Comment by Jinshan Xiong (Inactive) [ 01/Apr/11 ]

Does the customer do regular reads/writes on the same file while doing direct IO?

Comment by Jinshan Xiong (Inactive) [ 05/Apr/11 ]

The patch is at: http://review.whamcloud.com/#change,404

The root cause of this problem is that the application is doing regular and direct IO on the same file at the same time. As a result, a readahead (ra) page can be submitted for read twice.
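
For readers unfamiliar with the access pattern, the sketch below shows what "regular and direct IO" on one file looks like from user space: one read goes through the page cache and another uses O_DIRECT. It is not the customer's copy tool and is not claimed to reproduce the LBUG, which involved concurrent IO and readahead. The helper names read_buffered() and read_direct() and the 4096-byte alignment are assumptions for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed alignment; the real requirement depends on the file system. */
#define IO_ALIGN 4096

/* Regular (buffered) read from offset 0. Returns bytes read or -errno. */
ssize_t read_buffered(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = pread(fd, buf, len, 0);
    ssize_t rc = n < 0 ? -errno : n;
    close(fd);
    return rc;
}

/* O_DIRECT read from offset 0, bypassing the page cache. Returns bytes read or -errno. */
ssize_t read_direct(const char *path, size_t len)
{
    void *buf = NULL;
    assert(len % IO_ALIGN == 0);
    /* O_DIRECT needs an aligned buffer; posix_memalign returns an errno value. */
    int err = posix_memalign(&buf, IO_ALIGN, len);
    if (err != 0)
        return -err;
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        ssize_t rc = -errno;
        free(buf);
        return rc;
    }
    ssize_t n = pread(fd, buf, len, 0);
    ssize_t rc = n < 0 ? -errno : n;
    close(fd);
    free(buf);
    return rc;
}
```

Calling read_buffered() and read_direct() on the same path, from different threads or processes, gives the mixed pattern described above. Some file systems reject O_DIRECT, in which case read_direct() returns -EINVAL.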

Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/vvp_page.c
  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/vvp_page.c
  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 05/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #89
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 84631daa279bc77599d6cd69eef413b50b64f5e8
Files :

  • lustre/llite/rw26.c
  • lustre/llite/vvp_page.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 06/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 3bab56b54457cbdf287ed89e07314fd988bdea4a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el6-i686 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,ubuntu-x86_64 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el6-i686 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el5-x86_64 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el6-x86_64 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el5-x86_64 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el5-i686 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » server,el6-x86_64 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 07/Apr/11 ]

Integrated in lustre-reviews » client,el5-i686 #117
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Jinshan Xiong : 74029e4c5e8d0b212962a480727d52371260e527
Files :

  • lustre/llite/rw26.c
Comment by Peter Jones [ 21/Apr/11 ]

Patch to be rolled into production at CEA next week

Comment by Peter Jones [ 26/Apr/11 ]

As per Bull, this fix is now in production at CEA

Comment by Peter Jones [ 03/May/11 ]

Update from Bull - no reoccurrences of this issue since it was rolled into production

Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,server,el5,ofa #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Build Master (Inactive) [ 03/May/11 ]

Integrated in lustre-master » i686,client,el5,ofa #103
LU-185 LBUG: (cl_page.c:1362:cl_page_completion()) ...

Oleg Drokin : 119036444bc48381b2d5ca3333438000c409046a
Files :

  • lustre/llite/rw26.c
Comment by Peter Jones [ 04/May/11 ]

Patch landed for 2.1. Please reopen if this issue reoccurs with the patch in place

Comment by Sebastien Buisson (Inactive) [ 17/May/11 ]

The customer has been testing a backport of this patch in 2.0.0.1 for several weeks and now considers the problem fixed.

Generated at Sat Feb 10 01:04:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.