Recently our application ran into a strange problem.
The application hosts a Win32 window inside a WPF window, and the problem occurs when the WPF window is resized.
StackTrace:
Exception object: 0000000002ab2c78
Exception type: System.OutOfMemoryException
InnerException: <none>
StackTrace (generated):
SP IP Function
0048D94C 689FB82F PresentationCore_ni!System.Windows.Media.Composition.DUCE+Channel.SyncFlush()+0x80323f
0048D98C 681FEE37 PresentationCore_ni!System.Windows.Media.Composition.DUCE+CompositionTarget.UpdateWindowSettings(ResourceHandle, RECT, System.Windows.Media.Color, Single, System.Windows.Media.Composition.MILWindowLayerType, System.Windows.Media.Composition.MILTransparencyFlags, Boolean, Boolean, Boolean, Int32, Channel)+0x127
0048DA38 681FEAD1 PresentationCore_ni!System.Windows.Interop.HwndTarget.UpdateWindowSettings(Boolean, System.Nullable`1<ChannelSet>)+0x301
0048DBC8 6820718F PresentationCore_ni!System.Windows.Interop.HwndTarget.UpdateWindowSettings(Boolean)+0x2f
0048DBDC 68207085 PresentationCore_ni!System.Windows.Interop.HwndTarget.UpdateWindowPos(IntPtr)+0x185
0048DC34 681FFE9F PresentationCore_ni!System.Windows.Interop.HwndTarget.HandleMessage(Int32, IntPtr, IntPtr)+0xff
0048DC64 681FD0BA PresentationCore_ni!System.Windows.Interop.HwndSource.HwndTargetFilterMessage(IntPtr, Int32, IntPtr, IntPtr, Boolean ByRef)+0x3a
0048DC88 68C6668E WindowsBase_ni!MS.Win32.HwndWrapper.WndProc(IntPtr, Int32, IntPtr, IntPtr, Boolean ByRef)+0xbe
0048DCD4 68C665BA WindowsBase_ni!MS.Win32.HwndSubclass.DispatcherCallbackOperation(System.Object)+0x7a
0048DCE4 68C664AA WindowsBase_ni!System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate, System.Object, Boolean)+0x8a
0048DD08 68C6639A WindowsBase_ni!System.Windows.Threading.ExceptionWrapper.TryCatchWhen(System.Object, System.Delegate, System.Object, Boolean, System.Delegate)+0x4a
0048DD50 68C64504 WindowsBase_ni!System.Windows.Threading.Dispatcher.WrappedInvoke(System.Delegate, System.Object, Boolean, System.Delegate)+0x44
0048DD70 68C63661 WindowsBase_ni!System.Windows.Threading.Dispatcher.InvokeImpl(System.Windows.Threading.DispatcherPriority, System.TimeSpan, System.Delegate, System.Object, Boolean)+0x91
0048DDB4 68C635B0 WindowsBase_ni!System.Windows.Threading.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority, System.Delegate, System.Object)+0x40
0048DDD8 68C65CFC WindowsBase_ni!MS.Win32.HwndSubclass.SubclassWndProc(IntPtr, Int32, IntPtr, IntPtr)+0xdc
StackTraceString: <none>
HResult: 8007000e
Also, I found some related links:
Is there any way to avoid or handle this problem?
How can I find the real cause of the problem?
From the call stack, can we determine whether the problem comes from the .NET Framework?
Thank you for your answer or comments!
Your problem is not caused by a managed memory leak. Clearly you are tickling a bug somewhere in unmanaged code.
The SyncFlush() method is called after several MILCore calls, and it appears to cause the changes that have been sent to be processed immediately instead of being left in queue for later processing. Since the call processes everything previously sent, nothing in your visual tree can be ruled out from the call stack you sent.
A call stack that includes unmanaged calls may turn up more useful information. Run the application under VS.NET with native debugging enabled, or under WinDbg or another native code debugger. Set the debugger to break on the exception, and capture the call stack at the point where it breaks.
The call stack will of course descend into MILCore, and from there it may go into the DirectX layer and the DirectX driver. A clue as to which part of your code caused the problem may be found somewhere in this native call stack.
Chances are that MILCore is passing a huge value for some parameter into DirectX based on what you are telling it. Check your application for anything that could trigger a bug that makes DirectX allocate a lot of memory. Examples of things to look for would be:
Another way to attack this problem is to progressively simplify your application until the problem disappears, then look very closely at what you removed last. When convenient, it can be good to do this as a binary search: initially cut out half of the visual complexity. If it works, put back half of what was removed; otherwise remove another half. Repeat until done.
Also note that it is usually unnecessary to actually remove UI components to keep MILCore from seeing them. Any Visual with Visibility.Hidden may be skipped over entirely.
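As a rough illustration of the binary search described above, here is a minimal Python sketch. It assumes a single component is responsible and that you can answer "does the resize still fail with only these components visible?" for any subset. The names `find_culprit` and `crashes` are hypothetical; in practice `crashes` stands for you hiding everything else, running the application, and resizing the window.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_culprit(parts: Sequence[T], crashes: Callable[[List[T]], bool]) -> T:
    """Narrow a list of UI components down to the one that triggers the failure.

    Assumption: exactly one component is responsible, and crashes(subset)
    reports whether the failure reproduces with only `subset` visible.
    """
    candidates = list(parts)
    if not crashes(candidates):
        raise ValueError("the full set does not reproduce the problem")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the first half alone still fails, the culprit is in it;
        # otherwise it must be in the half that was cut out.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical component names, purely for illustration.
    components = ["toolbar", "hosted_hwnd", "chart", "grid", "status_bar"]
    print(find_culprit(components, lambda visible: "hosted_hwnd" in visible))
```

If the failure only appears when two components interact, this simple halving will not isolate it, and you would need to fall back to removing pieces one at a time.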
There is no generalized way to avoid this problem, but the search technique will help you pinpoint what specifically needs to be changed to fix it in the particular case.
It is safe to say from the call stack that you have found a bug in either the .NET Framework or the DirectX drivers for a particular video card.
Regarding the second stack trace you posted
John Knoeller is correct that the transition from RtlFreeHeap to ConvertToUnicode is nonsense, but he draws the wrong conclusion from it. What we are seeing is that your debugger got lost when tracing back the stack. It started correctly from the exception but got lost below the Assembly.ExecuteMainMethod frame because that part of the stack had been overwritten as the exception was handled and the debugger was invoked.
Unfortunately any analysis of this stack trace is useless for your purposes because it was captured too late. What we are seeing is an exception occurring during processing of a WM_LBUTTONDOWN, which is converted to a WM_SYSCOMMAND, which then catches an exception. In other words, you clicked on something that caused a system command (such as a resize), which caused an exception. At the point this stack trace was captured, the exception was already being handled. The reason you are seeing User32 and UxTheme calls is that these are involved in processing the button click. They have nothing to do with the real problem.
You are on the right track, but you will need to capture a stack trace at the moment the allocation fails (or you can use one of the other approaches I suggested above).
You will know you have the correct stack trace when all the managed frames in your first stack trace appear in it and the top of the stack is a failing memory allocation. Note that we are really interested only in the unmanaged frames that appear above the DUCE+Channel.SyncFlush call; everything below that will be .NET Framework and your application code.
How to get a native stack trace at the right time
You want to get a stack trace at the time of the first memory allocation failure within the DUCE+Channel.SyncFlush call shown. This may be tricky. There are three approaches I use (note that in each case you start with a breakpoint inside the SyncFlush call; see the note below for more details):
Set the debugger to break on all exceptions (managed and unmanaged), then keep hitting go (F5, or "g") until it breaks on the memory allocation exception you are interested in. This is the first thing to try because it is quick, but it often fails when working with native code because the native code often returns an error code to the calling native code instead of throwing an exception.
Set the debugger to break on all exceptions and also set breakpoints on common memory allocation routines, then hit F5 (go) repeatedly until the exception occurs, counting how many F5s you hit. Next time you run, use one fewer F5 and you may be on the allocation call that generated the exception. Capture the call stack to Notepad, then F10 (step over) repeatedly from there to see if it really was the allocation that failed.
Set a breakpoint on the first native frame called by SyncFlush (this is wpfgfx_v0300!MilComposition_SyncFlush) to skip over the managed-to-native transition, then F5 to run to it. F10 (step over) through the function until EAX contains one of the error codes E_OUTOFMEMORY (0x8007000E), ERROR_OUTOFMEMORY (0x0000000E), or ERROR_NOT_ENOUGH_MEMORY (0x00000008). Note the most recent "call" instruction. The next time you run the program, run to there and step into it. Repeat this until you are down to the memory allocation call that caused the problem, and dump the stack trace. Note that in many cases you will find yourself looping through a largish data structure, so some intelligence is required to set an appropriate breakpoint to skip over the loop so you can get where you need to be quickly. This technique is very reliable but very labor-intensive.
Note: In each case you don't want to set breakpoints or start single-stepping until your application is inside the failing DUCE+Channel.SyncFlush call. To ensure this, start the application with all breakpoints disabled. When it is running, enable a breakpoint on System.Windows.Media.Composition.DUCE+Channel.SyncFlush and resize the window. The first time around just hit F5 to make sure the exception occurs on the first SyncFlush call (if not, count how many times you have to hit F5 before the exception occurs). Then disable the breakpoint and restart the program. Repeat the procedure, but this time, after you have hit the SyncFlush call the right number of times, set your breakpoints or do your single-stepping as described above.
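When single-stepping as in the third approach, it helps to recognize an out-of-memory value in EAX at a glance. The following Python sketch decodes a 32-bit value using the standard HRESULT layout (failure bit 31, facility in bits 16-26, code in the low 16 bits). The helper name `describe_eax` is hypothetical; the numeric constants are the documented Win32 values, including the 8007000e HResult shown in the question.

```python
from typing import Optional

FACILITY_WIN32 = 7  # facility used when a Win32 error is wrapped in an HRESULT
OOM_WIN32_CODES = {
    0x0E: "ERROR_OUTOFMEMORY",
    0x08: "ERROR_NOT_ENOUGH_MEMORY",
}


def describe_eax(value: int) -> Optional[str]:
    """Return a label if `value` looks like an out-of-memory code, else None."""
    value &= 0xFFFFFFFF  # treat the register as an unsigned 32-bit value
    if value in OOM_WIN32_CODES:
        return OOM_WIN32_CODES[value]  # raw Win32 error code
    failed = bool(value >> 31)          # severity bit
    facility = (value >> 16) & 0x7FF    # 11-bit facility field
    code = value & 0xFFFF               # 16-bit error code
    if failed and facility == FACILITY_WIN32 and code in OOM_WIN32_CODES:
        return "HRESULT wrapping " + OOM_WIN32_CODES[code]
    return None


if __name__ == "__main__":
    for eax in (0x8007000E, 0x0000000E, 0x00000008, 0x00000000):
        print(f"{eax:#010x} -> {describe_eax(eax)}")
```

A raw value of 8 or 14 in EAX can also be an ordinary return value (a count, for example), so treat a match as a hint to inspect the preceding call, not as proof.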
Recommendations
The debugging techniques I describe above are labor-intensive: Plan to spend several hours at least. Because of this, I generally try repeatedly simplifying my application to find out exactly what tickles the bug before jumping into the debugger for something like this. This has two advantages: It will give you a good repro to send the graphics card vendor, and it will make your debugging faster because there will be less displayed and therefore less code to single-step through, fewer allocations, etc.
Because the problem happens only with a specific graphics card, there is no doubt that the problem is a bug either in the graphics card driver or in the MilCore code that calls it. Most likely it is in the graphics card driver, but it is possible that MilCore is passing invalid values that are handled correctly by most graphics cards but not this one. The debugging techniques I describe above will tell you which is the case: for example, if MilCore is telling the graphics card to allocate a 1000000x1000000 pixel area while the graphics card is reporting correct resolution information, the bug is in MilCore. But if MilCore's requests are reasonable, then the bug is in the graphics card driver.
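To put the 1000000x1000000 example in perspective, here is a small Python sketch of the arithmetic. It assumes 4 bytes per pixel (a 32-bit format) and an arbitrary "no more than 4 times the screen in each dimension" plausibility rule; both are illustrative assumptions, and the function names are hypothetical rather than anything MilCore or DirectX exposes.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit pixels


def surface_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Memory needed for an uncompressed width x height surface."""
    return width * height * bytes_per_pixel


def request_is_plausible(width: int, height: int,
                         screen_width: int, screen_height: int,
                         slack: int = 4) -> bool:
    """True if the request is at most `slack` times the screen in each dimension.

    The slack factor is an arbitrary illustrative threshold.
    """
    return (0 < width <= screen_width * slack
            and 0 < height <= screen_height * slack)


if __name__ == "__main__":
    print(surface_bytes(1920, 1080))            # a full-HD surface
    print(surface_bytes(1_000_000, 1_000_000))  # the example from the text
    print(request_is_plausible(1_000_000, 1_000_000, 1920, 1080))
```

Under these assumptions the million-by-million request needs about 4 terabytes, which no graphics card can satisfy, whereas a full-screen 1920x1080 surface needs roughly 8 MB. If the sizes you see in the native call stack look like the latter, suspicion shifts to the driver.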