Details
-
Bug
-
Resolution: Won't Fix
-
Major
-
None
-
Lustre 1.8.9
-
Scientific Linux
[walker@fe02 ~]$ uname -r
2.6.18-348.3.1.el5
Patchless client:
lustre-client-modules-1.8.9-wc1_2.6.18_348.3.1.el5
lustre-client-1.8.9-wc1_2.6.18_348.3.1.el5
Servers are all running:
[root@sn20 ~]# rpm -qa | grep ^lustre
lustre-modules-1.8.9-wc1_2.6.18_348.1.1.el5_lustre
lustre-1.8.9-wc1_2.6.18_348.1.1.el5_lustre
lustre-ldiskfs-3.1.53-wc1_2.6.18_348.1.1.el5_lustre
-
3
-
7466
Description
One of our OSSs had problems writing to disk (due to a RAID card problem).
Several clients hit an LBUG and have not recovered, even after the OSS was rebooted.
The error is:
Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) ASSERTION(!(aa->aa_oa->o_valid & OBD_MD_FLHANDLE)) failed
Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) LBUG
I have attached the associated log file; some lines of context from /var/log/messages are reproduced below:
Mar 29 05:57:03 cn492 kernel: Lustre: lustre_0-OST0027-osc-ffff81021c041800: Connection restored to service lustre_0-OST0027 using nid 10.1.4.120@tcp.
Mar 29 05:57:03 cn492 kernel: Lustre: Skipped 1 previous similar message
Mar 29 06:09:39 cn492 kernel: Lustre: 3004:0:(client.c:1529:ptlrpc_expire_one_request()) @@@ Request x1430341259304767 sent from lustre_0-OST0027-osc-ffff81021c041800 to NID 10.1.4.120@tcp 756s ago has timed out (756s prior to deadline).
Mar 29 06:09:39 cn492 kernel: req@ffff8101145e6800 x1430341259304767/t0 o3->lustre_0-OST0027_UUID@10.1.4.120@tcp:6/4 lens 448/592 e 1 to 1 dl 1364537379 ref 2 fl Rpc:/2/0 rc 0/0
Mar 29 06:09:39 cn492 kernel: Lustre: 3004:0:(client.c:1529:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Mar 29 06:09:39 cn492 kernel: Lustre: lustre_0-OST0027-osc-ffff81021c041800: Connection to service lustre_0-OST0027 via nid 10.1.4.120@tcp was lost; in progress operations using this service will wait for recovery to complete.
Mar 29 06:09:39 cn492 kernel: Lustre: Skipped 1 previous similar message
Mar 29 06:09:39 cn492 kernel: Lustre: lustre_0-OST0027-osc-ffff81021c041800: Connection restored to service lustre_0-OST0027 using nid 10.1.4.120@tcp.
Mar 29 06:09:39 cn492 kernel: Lustre: Skipped 1 previous similar message
Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) ASSERTION(!(aa->aa_oa->o_valid & OBD_MD_FLHANDLE)) failed
Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) LBUG
Mar 29 06:20:10 cn492 kernel: Pid: 3004, comm: ptlrpcd
Mar 29 06:20:10 cn492 kernel:
Mar 29 06:20:10 cn492 kernel: Call Trace:
Mar 29 06:20:10 cn492 kernel: [<ffffffff885786a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
Mar 29 06:20:10 cn492 kernel: [<ffffffff88578bda>] lbug_with_loc+0x7a/0xd0 [libcfs]
Mar 29 06:20:10 cn492 kernel: [<ffffffff88580fc0>] tracefile_init+0x0/0x110 [libcfs]
Mar 29 06:20:10 cn492 kernel: [<ffffffff8879c7e8>] brw_interpret+0x8e8/0xdb0 [osc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff886d36ac>] after_reply+0xcac/0xe30 [ptlrpc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff886d4b0b>] ptlrpc_check_set+0x12db/0x15a0 [ptlrpc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff8004b396>] try_to_del_timer_sync+0x7f/0x88
Mar 29 06:20:10 cn492 kernel: [<ffffffff887095ad>] ptlrpcd_check+0xdd/0x1f0 [ptlrpc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff8009a98c>] process_timeout+0x0/0x5
Mar 29 06:20:10 cn492 kernel: [<ffffffff88709ef1>] ptlrpcd+0x1b1/0x259 [ptlrpc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff8008f3ad>] default_wake_function+0x0/0xe
Mar 29 06:20:10 cn492 kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Mar 29 06:20:10 cn492 kernel: [<ffffffff88709d40>] ptlrpcd+0x0/0x259 [ptlrpc]
Mar 29 06:20:10 cn492 kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Mar 29 06:20:10 cn492 kernel:
Mar 29 06:20:10 cn492 kernel: LustreError: dumping log to /tmp/lustre-log.1364538010.3004
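Since several clients are affected, it may help to scan the collected syslogs for the nodes that hit the LBUG. The following is only a minimal sketch, assuming the LustreError line format shown in the excerpt above; the names `LUSTRE_ERROR`, `find_lbugs` and `sample` are illustrative and not part of Lustre.

```python
import re

# Assumed line shape, taken from the excerpt in this ticket:
#   <Mon DD HH:MM:SS> <host> kernel: LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
LUSTRE_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)


def find_lbugs(lines):
    """Return one dict per LustreError line whose message is exactly 'LBUG'."""
    hits = []
    for raw in lines:
        match = LUSTRE_ERROR.match(raw.strip())
        if match and match.group("msg") == "LBUG":
            hits.append(match.groupdict())
    return hits


# Three lines copied from the log excerpt above.
sample = [
    "Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) "
    "ASSERTION(!(aa->aa_oa->o_valid & OBD_MD_FLHANDLE)) failed",
    "Mar 29 06:20:10 cn492 kernel: LustreError: 3004:0:(osc_request.c:2357:brw_interpret()) LBUG",
    "Mar 29 06:20:10 cn492 kernel: LustreError: dumping log to /tmp/lustre-log.1364538010.3004",
]

for hit in find_lbugs(sample):
    # Print the node name and the source location that raised the LBUG.
    print(hit["host"], hit["file"], hit["line"], hit["func"])
```

Run against the concatenated /var/log/messages of all clients, this would list each node and source location once per LBUG. Lines that were wrapped by a terminal or pager would need to be rejoined first.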
Attachments
Issue Links
- is duplicated by: LU-4452 Lustre 1.8.8 client causes kernel panic (Resolved)