[LU-3766] ASSERTION( stripe < lio->lis_stripe_count ) Created: 15/Aug/13 Updated: 09/Oct/21 Resolved: 09/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Rustem Bikboulatov | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux 2.6.32-279.19.1.el6_lustre.x86_64 #1 SMP |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9700 |
| Description |
|
We have a kernel crash on Lustre Client 2.1.5 with the following assertion:
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
It is very similar to:
Has this bug been fixed in 2.4? If so, are there any plans to fix it in 2.1? And is there a way to work around the error (perhaps through configuration) without upgrading?
[root@r03 lustre_2.1.5]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux /var/crash/127.0.0.1-2013-08-13-10\:15\:56/vmcore
crash 6.0.4-2.el6
GNU gdb (GDB) 7.3.1
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
crash> log
LustreError: 31091:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
Call Trace:
Kernel panic - not syncing: LBUG |
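The failed condition is an ordinary bounds check: lov_sub_get() expects the stripe index it receives to be smaller than the stripe count recorded in the lov_io (lis_stripe_count), and because the check is an assertion, a violation ends in LBUG and a kernel panic, as the log shows. The sketch below illustrates only that invariant. The struct demo_lov_io and the function demo_sub_get are hypothetical, heavily simplified stand-ins, not Lustre's real definitions, and the sketch returns NULL on an out-of-range index instead of panicking so the invariant can be exercised safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-file I/O state; the real
 * Lustre struct lov_io has many more fields. */
struct demo_lov_io {
    int    lis_stripe_count;  /* number of stripes in the file layout */
    void **lis_subs;          /* one sub-I/O slot per stripe */
};

/* Illustrates the invariant from lov_io.c:214: a valid stripe index lies in
 * [0, lis_stripe_count). Lustre asserts it (LBUG -> panic); this sketch
 * checks it and returns NULL instead. */
void *demo_sub_get(const struct demo_lov_io *lio, int stripe)
{
    assert(lio != NULL);
    if (stripe < 0 || stripe >= lio->lis_stripe_count)
        return NULL;          /* index out of range: invariant violated */
    return lio->lis_subs[stripe];
}
```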
| Comments |
| Comment by Rustem Bikboulatov [ 19/Aug/13 ] |
|
Today we had another "Lustre Client" crash:
[root@r03 ~]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux /var/crash/127.0.0.1-2013-08-19-01\:19\:55/vmcore
crash 6.0.4-2.el6
GNU gdb (GDB) 7.3.1
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6_lustre.x86_64/vmlinux
crash> log
LustreError: 9099:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
Call Trace:
Kernel panic - not syncing: LBUG
crash> kmem -i
TOTAL SWAP 524286 2 GB ---- |
| Comment by Rustem Bikboulatov [ 29/Aug/13 ] |
|
And another crash:
LustreError: 3447:0:(lov_io.c:214:lov_sub_get()) ASSERTION( stripe < lio->lis_stripe_count ) failed:
Call Trace:
Kernel panic - not syncing: LBUG |
| Comment by Rustem Bikboulatov [ 14/Jan/14 ] |
|
Lustre Cluster Diagram |
| Comment by Rustem Bikboulatov [ 14/Jan/14 ] |
|
Here is the cluster configuration:
Lustre Server MGS/MDS - mmp-2 (refer to the diagram "20140113 - Hardware Diagram v0.1_R3.gif" in the attachment)
Environment:
Mount points:
OSS:
MGS/MDS:
Clients:
Stripe config:
[root@mmp-1 ~]# lfs getstripe /array1/.
kdump config:
core_collector makedumpfile -c --message-level 1 -d 31

Application Software Description (LRVfarm):
LRVfarm is software that processes media files (video + audio) and creates proxy videos (low-resolution video). Each running task has a small task file located on the Lustre file system. LRVfarm runs several processes: 8 on each client (r01, r02, r03, r04), for a total of 32 LRVfarm processes in the cluster. Each process uses file locking when performing tasks: an LRVfarm process locks a task file and then performs the task. Other LRVfarm processes (on the local client and on remote clients) also try to lock that task file, but cannot while the task is running, because the file is already locked by another LRVfarm process.

Periodically, all clients crash (kernel panic) with various errors. Here are the recent crash statistics:
Client r01
Client r02
Client r03
Client r04 |
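The report does not say which locking API LRVfarm uses. Assuming it takes POSIX advisory record locks through fcntl() (an assumption; it could equally use flock()), the claim-a-task pattern described above can be sketched as a non-blocking exclusive lock: the first process to lock the task file runs the task, and every other process gets EAGAIN or EACCES and moves on. The helpers try_claim_task and release_task and the TASK_BUSY/TASK_ERROR codes are hypothetical. On Lustre, whether such locks are coherent across clients also depends on the client mount options (for example `flock` versus `localflock`), which are not shown in this report.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define TASK_BUSY  (-1)   /* another process already holds the task lock */
#define TASK_ERROR (-2)   /* the task file could not be opened or locked */

/* Hypothetical helper: claim a task by taking an exclusive, non-blocking
 * POSIX write lock on the whole task file.
 * Returns an open fd that holds the lock, TASK_BUSY, or TASK_ERROR. */
int try_claim_task(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return TASK_ERROR;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "to end of file": whole file */

    if (fcntl(fd, F_SETLK, &fl) == -1) {   /* F_SETLK never waits */
        int err = errno;
        close(fd);
        return (err == EAGAIN || err == EACCES) ? TASK_BUSY : TASK_ERROR;
    }
    return fd;                /* keep fd open for as long as the task runs */
}

/* Closing the descriptor releases this process's fcntl locks on the file. */
void release_task(int fd)
{
    assert(fd >= 0);
    close(fd);
}
```

Because fcntl() locks are owned by the process, this pattern separates tasks between processes, which matches the 8 separate LRVfarm processes per client described above; threads within one process would not exclude each other this way.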