NWmemoryProtectionRestart
From CoolSolutionsWiki
Summary of Memory Protection Issue
Tracking
- SR 10504113741 GW protected memory faults Sept 21, 2009
- SR 10523137371 GWSOAP-<post office name>-Handler_5 Sept 1, 2009
- https://bugzilla.novell.com/show_bug.cgi?id=535749 Page Fault Processor Exception, GWSOAP-<post office name>-Handler_5; fixed in GW 703 HP4 and GW 801
- Defects
- GW8 515295 USERLIB.NLM abend. Unknown EIP - FIXED in 801
- GW8 491527 GWXIS12.NLM - SOAP Abend - FIXED in 801
- GW7 518385 USERLIB.NLM abend. Unknown EIP - FIXED in 703 HP4
- GW7 534927 Abend on GWSOAP process - FIXED in 703 HP4
- Bug 529767 - Abend in USERLIB.NLM|AllocSleepOK
Action Plan
- enable memory protection debugging so that cores are taken: sys:\system\memdbg.ncf and umemdbg.ncf
- keep at least 25% free space on ALL PO volumes!
- Ensure antivirus is NOT scanning the QUEUES (customer changed this to comply)
- Ensure Purge Immediate is set (customer changed this to comply)
- Fixed memory protection load/unload syntax to match the NW 6.5 SP8 format
- syntax for sp8
- http://www.novell.com/communities/node/6863/consultants-corner-upgrading-groupwise-8-part-3-post-offices
- load restart address space=demopo sys:\system\gw8\gwpoa @demopo.poa
- Tried post-GW 801 code: ftp://ftp.novell.com/outgoing/thomasgw8.0.1HP-88366_full_nlmwin_en.zip
- update to current bsdsock.nlm
- STOP PROCESSORS on the command line, or don't load the ACPIDRV.PSM driver. Either will do.
- Set Maximum Physical Receive Packet Size to 4224 (the default); SWGAS already had it at 4224
- keep sending CORES!!
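The 25% free-space rule above is easy to check routinely. This is a minimal sketch, assuming the PO volumes are reachable as mounted paths from a machine that can run Python (for example, mapped drives on an admin workstation); the path in the example is a placeholder, not a real PO volume.

```python
import shutil

# Threshold from the action plan: keep at least 25% free on every PO volume.
MIN_FREE_PERCENT = 25.0


def percent_free(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def check_volumes(paths):
    """Return (path, percent_free, ok) for each path.

    Assumption: each path is a locally mounted view of a PO volume.
    """
    results = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_free(usage.total, usage.free)
        results.append((path, pct, pct >= MIN_FREE_PERCENT))
    return results


if __name__ == "__main__":
    # Placeholder path; replace with the mounted PO volumes.
    for path, pct, ok in check_volumes(["."]):
        status = "OK" if ok else "BELOW 25% FREE"
        print(f"{path}: {pct:.1f}% free ({status})")
```

Keeping the percentage calculation separate from the disk query makes the threshold logic testable without touching a real volume.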
Aggressive plans
- Let me install/configure SLES 10 (Linux) and put the PO there; 250+ days of uptime
- decouple Memory Protection/NCS
- Replace hardware
- Back-rev
Customer cores
Accessing cores internally: VNC 151.155.135.178
- August 31 LVC-GW3-SWGAS.ZIP (no longer in incoming, but still on the VNC machine)
- EIP: 8FE5166E LIB0.NLM|NCPSendToLocalServerWithReplyLen+1EFA
- Sept 28 ftp://ftp.novell.com/incoming/090927-lvc-gw1.zip
- Sept 29 ftp://ftp.novell.com/incoming/090929-lvc-gw4-retry.zip
- Oct 8 ftp://ftp.novell.com/incoming/swgaszipfiles.zip
- swgaszipfiles/lvc-gw4-091007AA
- Current EIP: 8FE4466E LIB0.NLM|NCPSendToLocalServerWithReplyLen+1EFA
- swgaszipfiles/lvc-gw4-091008AA
- Address in USERLIB.NLM at code start +00002146h
- Oct 10 ftp://ftp.novell.com/incoming/lvc-gw4-10-5-09.zip
- Address in BSDSOCK.NLM at code start +000020E6h
- BSDSOCK Version 6.82.00 November 20, 2007
- Oct 13 ftp://ftp.novell.com/incoming/lvo-02-091013AA-retry.zip
- Address in SERVER.NLM at code start +00167858h
- Oct 28 ftp://ftp.novell.com/incoming/091028AA-lvc-gw1-swgas.zip
- gwdca abend - bug 550824
- Bug 550824 explains how to quarantine/trap the offending file.
- Oct 28 ftp://ftp.novell.com/incoming/sr10569161301-lvc-gw2-091828aB.IMG
- server.nlm SERVER.NLM|TcoNewSystemThreadEntryPoint+40
- Oct 28 ftp://ftp.novell.com/incoming/SR10504113741-LVC-GW3-swgas.zip
- 091021AA.IMG
- Current EIP: F509507D CLIB.NLM|longjmp+2A
- 091021AB.IMG
- Current EIP: F509507D CLIB.NLM|longjmp+2A
- C18C3A80 F419F000 SCCUT.NLM|<Start of DATA segment>
- Oct 28 ftp://ftp.novell.com/incoming/SR10569161301-lvo-02-091028AA.zip
- Break caused by Abend: Deadlock detected waiting for spinlock currently owned by
- Current EIP: 0011367C LOADER.NLM|WaitForSpinLock+6C
- Oct 29 ftp://ftp.novell.com/incoming/sr10569161301-lvc-gw2-091029AA.zip
- Break caused by: Free detected corrupt preceeding redzone for node 0xCB730AE0
- This one abended with the message "Free detected corrupt preceeding redzone for node 0xCB730AE0", which means memory has been corrupted, usually by the application that owns it. The abend does not name the current owner of the memory, but that is only because the memory is in a protected address space; so most likely this is a GroupWise address space faulting.
- Address in SYSCALLS.NLM at code start +00000CF8h
- Oct 29 ftp://ftp.novell.com/incoming/sr10569161301-lvc-gw1-091029AA-retry.IMG
- gwdca
- Oct 29 ftp://ftp.novell.com/incoming/sr10569161301-091028AB-lvc-gw1-swgas.zip
- Current EIP: F3DCF64A EXHTML.NLM|EXHTMLGetLockCount+1E0B5
- gwdca
- Oct 29 ftp://ftp.novell.com/incoming/sr10569161301-091029AA-lvo-02-swgas.zip
- This is a cluster abend: "At least one of the nodes is Alive in the old master's node partition. This node is NOT in the old master's node partition." The server is not busy; in fact, it is not hosting any GroupWise services at the moment. In a 2-node cluster this is most likely a LAN channel issue. If it pops up more, we will need to troubleshoot that.
- Nov 2 ftp://ftp.novell.com/incoming/sr10569161301-091101AA-lvo-02.zip
- Address in VSTIF6.NLM at code start +00012F20h (AKA gwdcaw)
- gwdca was ENABLED, but filtered.
- Nov 2 ftp://ftp.novell.com/incoming/sr10569161301-lvc-gw3-091101AA.zip
- Address in LIB0.NLM at code start +0001366Eh (AKA gwpoa abend)
- We have seen this one before; it looks like the same issue where the CLIB thread control structure is missing. That issue was part of the reason we were given the debug THREADS, and then the debug SERVER.EXE as well. I have forwarded this to the OS developers.
- Nov 5 ftp://ftp.novell.com/incoming/091105AA-lvo-03-swgas.zip
- cluster abend/clusterlib
- Nov 6 ftp://ftp.novell.com/incoming/091106AA-LVO-02-SWGAS.zip
- 6E6928F4 0010AB6F LOADER.NLM|ExceptionVector14+EF
- 6E692988 0000FFFF
- EIP invalid:
- Nov 10 ftp://ftp.novell.com/incoming/sr10569161301-lvc-gw3-091109AA.zip
- P07# sw ## Current EIP: 8FEE066E LIB0.NLM|NCPSendToLocalServerWithReplyLen+1EFA
- AD571E1C 003714B3 SERVER.NLM|NewSystemCall+D3
- ESP invalid!
- Nov 10 ftp://ftp.novell.com/incoming/SR10569161301-lvo-02-091110AA.zip
- Nov 10 ftp://ftp.novell.com/incoming/SR10569161301-lvc-gw4-091109AA.zip
- Nov 10 ftp://ftp.novell.com/incoming/sr10569161301-pxo-02-091110AA.zip
- Nov 13 ftp://ftp.novell.com/incoming/SWG_PXO-03_091113AA.zip
- BSDsock Version 6.81.05 February 12, 2008
- Nov 13 ftp://ftp.novell.com/incoming/SR10569161301-lvc-gw4-091112AA.zip
- P05# ?
- Address in LIB0.NLM at code start +0001366Eh
- Previous: -00001EFA 8F85B774 LIB0.NLM|NCPSendToLocalServerWithReplyLen
- Current: 00000000 8F85D66E
- Next: +000021A2 8F85F810 LIB0.NLM|Debug_malloc
- P05# sw
- Current EIP: 8F85D66E LIB0.NLM|NCPSendToLocalServerWithReplyLen+1EFA
- B26D9F04 0036FD73 SERVER.NLM|NewSystemCall+D3
- ESP invalid!
- Nov 17 ftp://ftp.novell.com/incoming/sr10569161301-car-03-091116AA.zip
- Nov 17 ftp://ftp.novell.com/incoming/SR10569161301-lvc-gw4-091117AA-retry.IMG
- Nov 18 ftp://ftp.novell.com/incoming/sr1059161301-lvc-gw3-091118AA.zip
- Both nodes abended on what looks like the same abend. A core was taken on one node, while the other node rebooted quickly to restore services.
- Running process: GWPOA-GWEvent Reader 1 Process
- The CPU encountered a problem executing code in LIB0.NLM. The problem may be in that module or in data passed to that module by a process owned by GWPOA.NLM.
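The Oct 29 lvc-gw2 core abended with "Free detected corrupt preceeding redzone for node 0xCB730AE0". That message comes from a debug allocator that surrounds each block with guard bytes (redzones) and verifies them when the block is freed. The sketch below illustrates the general technique in Python; it is not NetWare's allocator, and the guard size and fill pattern are arbitrary assumptions.

```python
# Sketch of a redzone-checking debug allocator. This shows the general
# technique only; the guard size and fill pattern are arbitrary choices,
# not NetWare's actual values.
REDZONE_SIZE = 8
REDZONE_BYTE = 0xA5
PATTERN = bytes([REDZONE_BYTE]) * REDZONE_SIZE


class RedzoneCorruption(Exception):
    """Raised when a guard area no longer holds the fill pattern."""


class DebugHeap:
    def __init__(self, size: int = 1024):
        self.memory = bytearray(size)  # simulated heap
        self.next_free = 0             # simple bump allocator, no reuse
        self.blocks = {}               # user offset -> user length

    def alloc(self, length: int) -> int:
        """Lay out [redzone][user bytes][redzone]; return the user offset."""
        start = self.next_free
        end = start + REDZONE_SIZE + length + REDZONE_SIZE
        if end > len(self.memory):
            raise MemoryError("debug heap exhausted")
        self.memory[start:start + REDZONE_SIZE] = PATTERN
        self.memory[end - REDZONE_SIZE:end] = PATTERN
        self.next_free = end
        user = start + REDZONE_SIZE
        self.blocks[user] = length
        return user

    def free(self, user: int) -> None:
        """Verify both redzones before releasing the block."""
        length = self.blocks[user]
        if self.memory[user - REDZONE_SIZE:user] != PATTERN:
            raise RedzoneCorruption(f"corrupt preceding redzone for node {user:#x}")
        if self.memory[user + length:user + length + REDZONE_SIZE] != PATTERN:
            raise RedzoneCorruption(f"corrupt following redzone for node {user:#x}")
        del self.blocks[user]
```

A stray write one byte before a block (heap.memory[block - 1] = 0) goes unnoticed until free() checks the preceding redzone. In the same way, a real redzone abend points at the free, not necessarily at the code that did the damage.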
Brief description of each core
- Oct 9 swgaszipfiles/lvc-gw4-091007AA
- P07# sw
Current EIP: 8FE4466E LIB0.NLM|NCPSendToLocalServerWithReplyLen+1EFA
B3ED5EBC 00370003 SERVER.NLM|NewSystemCall+D3
Emulated 5000 and found no RET instruction
Function may never return.
P07# .m gwpoa.nlm
GWPOA.NLM GroupWise Post Office Agent (Release version)
Loaded from [SYS:\SYSTEM\] [159] PO3EMAIL address space Version 8.00.01 August 25, 2009 Code Address: F463C000h Length: 00215434h Data Address: F45BD000h Length: 0007D989h
P07# .m userlib.nlm
USERLIB.NLM NetWare Operating System Function Library
Loaded from [SYS:SYSTEM\] [169] GWDCAW.LVCPO31 address space Version 5.60.00 September 29, 2008 Code Address: F5353000h Length: 000048DEh Data Address: F5350000h Length: 000011F0h
USERLIB.NLM NetWare Operating System Function Library
Loaded from [SYS:SYSTEM\] [86] PO3EMAIL address space Version 5.60.00 September 29, 2008 Code Address: F5353000h Length: 000048DEh Data Address: F5350000h Length: 000011F0h
USERLIB.NLM NetWare Operating System Function Library
Loaded from [SYS:SYSTEM\] [15A] GWDCA.LVCPO3 address space Version 5.60.00 September 29, 2008 Code Address: F5353000h Length: 000048DEh Data Address: F5350000h Length: 000011F0h
- Oct 9 swgaszipfiles/lvc-gw4-091008AA
P05# ?
Address in USERLIB.NLM at code start +00002146h
Previous: -0000009E F53550A8 USERLIB.NLM|AllocMemoryDefault
Current: 00000000 F5355146
Next: +00000156 F535529C USERLIB.NLM|AllocateSlab
P05#
USERLIB.NLM NetWare Operating System Function Library
Loaded from [SYS:SYSTEM\] [86] PO3EMAIL address space Version 5.60.00 September 29, 2008 Code Address: F5353000h Length: 000048DEh Data Address: F5350000h Length: 000011F0h
P05# .m gwpoa.nlm
GWPOA.NLM GroupWise Post Office Agent (Release version)
Loaded from [SYS:\SYSTEM\] [157] PO3EMAIL address space Version 8.00.01 September 27, 2009 Code Address: F3C16000h Length: 002155B4h Data Address: F3B97000h Length: 0007D989h
P05#
- Oct 10 E:\cores\lvc-gw4-10-5-09
- Address in BSDSOCK.NLM at code start +000020E6h
- P01# .m bsdsock.nlm
BSDSOCK.NLM Novell BSDSOCK Module
Loaded from [SYS:SYSTEM\] [79] OS address space Version 6.82.00 November 20, 2007 Code Address: A1631000h Length: 00011BF9h Data Address: A1643000h Length: 0000C0E0h
P01#
- Oct 13 cores\lvo-02-091013AA-retry
- Address in SERVER.NLM at code start +00167858h
- P00# sw
Current EIP: 00371F18 SERVER.NLM|NewSystemCall+88
ESP invalid!
- USERLIB.NLM NetWare Operating System Function Library
Loaded from [SYS:SYSTEM\] [13C] GWDCAW.LVOPO1 address space Version 5.60.00 September 29, 2008 Code Address: F5353000h Length: 000048DEh Data Address: F5350000h Length: 000011F0h
P00# .m gwpoa.nlm
GWPOA.NLM GroupWise Post Office Agent (Release version)
Loaded from [SYS:\SYSTEM\] [16F] LVOEMAIL address space Version 8.00.01 September 27, 2009 Code Address: F43B4000h Length: 002155B4h Data Address: F4335000h Length: 0007D989h
P00#
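The cores above report a fault address both as an offset from a module's code start ("Address in USERLIB.NLM at code start +00002146h") and as a distance from the nearest public symbols (the Previous/Next lines printed by the debugger's ? command). The arithmetic is plain hex addition and subtraction. This minimal sketch uses a hand-entered symbol table holding only the symbols quoted on this page (an assumption; a real session would use the module's full public symbol list) and reproduces the USERLIB (Oct 9) and LIB0 (Nov 13) results.

```python
import bisect

# Hand-entered from the cores on this page; not a real symbol file.
SYMBOLS = {
    "USERLIB.NLM|AllocMemoryDefault": 0xF53550A8,
    "USERLIB.NLM|AllocateSlab": 0xF535529C,
    "LIB0.NLM|NCPSendToLocalServerWithReplyLen": 0x8F85B774,
    "LIB0.NLM|Debug_malloc": 0x8F85F810,
}


def absolute_address(code_address: int, offset: int, length: int) -> int:
    """Turn 'at code start +offset' into an absolute address, checking bounds."""
    if not 0 <= offset < length:
        raise ValueError("offset lies outside the module's code segment")
    return code_address + offset


def nearest_symbols(address: int, symbols=SYMBOLS):
    """Return ((prev_name, address - prev), (next_name, next - address)).

    Either element is None when no symbol lies on that side.
    """
    ordered = sorted((addr, name) for name, addr in symbols.items())
    addrs = [addr for addr, _ in ordered]
    i = bisect.bisect_right(addrs, address)  # first symbol above the address
    prev = None
    nxt = None
    if i > 0:
        addr, name = ordered[i - 1]
        prev = (name, address - addr)
    if i < len(ordered):
        addr, name = ordered[i]
        nxt = (name, addr - address)
    return prev, nxt
```

For the Oct 9 core, absolute_address(0xF5353000, 0x2146, 0x48DE) gives 0xF5355146, and nearest_symbols on that value yields AllocMemoryDefault at -0x9E and AllocateSlab at +0x156, matching the debugger. Note that USERLIB.NLM shows the same code address (F5353000h) in the PO3EMAIL, GWDCA.LVCPO3 and GWDCAW.LVCPO31 address spaces, so the absolute address does not by itself identify which address space faulted; the running process or address space named in the core is still needed.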
