The use of P/Invoke in Automation
The platform invocation (“P/Invoke”) framework was designed to allow interaction between your .NET application and functions exported by unmanaged DLLs. For UI automation, this framework is very useful, since few products provide a built-in automation system. In this article, I’ll explain different techniques for automating a user interface by using the functionality of the underlying operating system. Subsequent articles will discuss client-side automation for web applications.
Some products use effective inter-process communication techniques to allow interaction with the outside world (e.g., sockets and predefined standards). Others were not designed to allow such interaction; however, UI automation may still be required. Briefly, there are many scenarios where you might want to use P/Invoke for automation:
- Manipulate out-of-process windows
- Communicate with unexposed in-process windows
- Map over predefined graphic areas
- Etc.
Prerequisites
Although these scenarios rely on advanced features of the .NET Framework, only a basic knowledge of the framework is assumed. In fact, for whatever reason, the need to communicate with existing applications seems omnipresent among hobbyist developers. P/Invoke may be less intuitive than DDE (for example), but it gives developers a much deeper view of, and more control over, what is really going on.
You will also need a basic knowledge of the Win32 API; I strongly recommend installing the Platform SDK, as its documentation is very useful. As a side note, if you are not familiar with Windows but wish to read on: every distinct graphic element in Microsoft Windows is a “window” (hence the name of the OS), and windows are identified and manipulated through handles.
To use the platform invocation framework, you need to import the System.Runtime.InteropServices namespace. You also have to declare each external DLL function you call. In this article, I’ll work with only a few Win32 API functions:
- GetDesktopWindow
- GetWindowText
- EnumChildWindows
Besides the Platform SDK, the web site http://www.pinvoke.net is also a very good resource. Finally, Microsoft Spy++ is a great tool for troubleshooting.
Manipulate out-of-process windows
In most cases, you need to hook up to an out-of-process main window. If you launched that process yourself, you can simply use Process.MainWindowHandle to access it. Otherwise, you’ll need to find it among the desktop windows. To do this, we first encapsulate the native API calls in a single class:
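using System;
using System.Runtime.InteropServices;
using System.Text;

public sealed class UnsafeNativeMethods
{
    private UnsafeNativeMethods( ) { }

    [DllImport("User32.dll")]
    public static extern IntPtr GetDesktopWindow( );

    // lpEnumFunc receives the callback delegate, marshaled as a function pointer
    [DllImport("User32.dll")]
    public static extern Boolean EnumChildWindows( IntPtr handle, Delegate lpEnumFunc, IntPtr lParam );

    // CharSet.Auto selects GetWindowTextW (Unicode) instead of the ANSI version
    [DllImport("User32.dll", CharSet = CharSet.Auto)]
    public static extern int GetWindowText( IntPtr handle, StringBuilder s, int MaxCount );
}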
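For the simpler case mentioned above, where you start the process yourself, here is a minimal sketch of retrieving its main window. The class and method names (LaunchedProcess, StartAndGetMainWindow) are hypothetical; the polling interval and timeout are arbitrary assumptions. MainWindowHandle is cached by the Process object, so the loop calls Refresh before each check.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: start a GUI program and wait for its main window handle.
public static class LaunchedProcess
{
    // Returns the main window handle, or IntPtr.Zero if the process exits or
    // timeoutMs elapses first. Throws Win32Exception if fileName cannot be started.
    public static IntPtr StartAndGetMainWindow(string fileName, int timeoutMs)
    {
        Process process = Process.Start(fileName);
        if (process == null)
            return IntPtr.Zero; // no new process resource was started

        try
        {
            // Waits until the GUI process has finished initializing its message loop
            process.WaitForInputIdle(timeoutMs);
        }
        catch (InvalidOperationException)
        {
            // The process has no graphical interface or has already exited
        }

        Stopwatch watch = Stopwatch.StartNew();
        while (watch.ElapsedMilliseconds < timeoutMs)
        {
            try
            {
                process.Refresh(); // discard the cached MainWindowHandle
                if (process.HasExited)
                    return IntPtr.Zero;
                if (process.MainWindowHandle != IntPtr.Zero)
                    return process.MainWindowHandle;
            }
            catch (InvalidOperationException)
            {
                return IntPtr.Zero; // the process exited between the checks
            }
            Thread.Sleep(100);
        }
        return IntPtr.Zero;
    }
}
```

Note that some launchers start another process and exit immediately; in that case this sketch returns IntPtr.Zero and you are back to searching the desktop windows.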
To enumerate the desktop windows, we use the EnumChildWindows native call. This function takes a callback that the system invokes for each child window found; the enumeration continues or stops depending on the value the callback returns. You may wonder how to communicate elegantly with the callback without declaring class-wide variables.
The last parameter of the function can be used to pass a platform-specific pointer (IntPtr) back and forth, and there are several ways to use it. First, we can create a structure and marshal it into unmanaged memory, then read it, modify it, write it back, and so on. The problem with that approach is that we have to marshal the structure every time we want to save a modification. The other option is to use a reference type (an object). However, since we are interacting with the unmanaged world, we can’t pass the object out directly. Instead, we need to pass a stable “cookie” that can later be used to get back to the object. Also, since there may not be any references left to the object, we need to tell the garbage collector that it is still in use. That’s why we allocate a GCHandle: a GCHandle creates a new root in the garbage collector, which prevents your object from being collected. To avoid memory leaks, though, you must free the handle once the job is done. Since we only want to keep the object from being collected, we use the default GCHandle.Alloc overload (GCHandleType.Normal). There is no need to pin the object, because the unmanaged API never uses its address; it only passes the cookie back to our managed callback.
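The two options just described can be compared side by side. The following is a minimal, self-contained sketch (the names CookieDemo, Counter, and CounterBag are hypothetical): the struct version has to read and write back the marshaled copy on every change, while the GCHandle version recovers the very same managed object from the cookie.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CookieDemo
{
    [StructLayout(LayoutKind.Sequential)]
    public struct Counter { public int Value; }

    public sealed class CounterBag { public int Value; }

    // Option 1: the data lives in unmanaged memory, so every modification
    // needs a read (PtrToStructure) and a write-back (StructureToPtr).
    public static int IncrementStruct(IntPtr memory)
    {
        Counter counter = Marshal.PtrToStructure<Counter>(memory);
        counter.Value++;
        Marshal.StructureToPtr(counter, memory, false);
        return counter.Value;
    }

    // Option 2: the IntPtr is a GCHandle cookie; the callback gets back the
    // original object, so changes are visible without any marshaling.
    public static int IncrementBag(IntPtr cookie)
    {
        CounterBag bag = (CounterBag)GCHandle.FromIntPtr(cookie).Target;
        bag.Value++;
        return bag.Value;
    }

    // Runs both options starting from 41 and returns the resulting values.
    public static (int StructValue, int BagValue) Demo()
    {
        int structValue;
        IntPtr memory = Marshal.AllocHGlobal(Marshal.SizeOf<Counter>());
        try
        {
            Marshal.StructureToPtr(new Counter { Value = 41 }, memory, false);
            structValue = IncrementStruct(memory);
        }
        finally
        {
            Marshal.FreeHGlobal(memory);
        }

        CounterBag bag = new CounterBag { Value = 41 };
        // Normal handle: keeps the object alive without pinning it
        GCHandle handle = GCHandle.Alloc(bag);
        try
        {
            IncrementBag(GCHandle.ToIntPtr(handle));
        }
        finally
        {
            handle.Free(); // always release the root, or the object leaks
        }
        return (structValue, bag.Value);
    }
}
```

Both paths end at 42, but only the GCHandle path updates the caller’s object in place.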
To make our “Find” function as generic as possible, we use a regular expression to filter the text of the window we are looking for. Let’s create the state bag (as an inner class) that will carry data to and from the callback:
public sealed class Utilities
{
    private Utilities( ) { }
    …
    private class StateBag
    {
        private Regex expression;
        private IntPtr handle;

        public Regex Expression
        {
            get { return expression; }
        }

        public IntPtr Handle
        {
            get { return handle; }
            set { handle = value; }
        }

        public StateBag(Regex Expression)
        {
            expression = Expression;
        }
    }
}
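Because the filter is a Regex, window titles containing characters such as “(” or “.” must be escaped before being used as patterns. Here is a small sketch of helpers for building caption filters; CaptionFilters, Exact, and EndsWith are hypothetical names, not part of the article’s code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers that build Regex filters for window captions.
public static class CaptionFilters
{
    // Matches the whole caption exactly. Regex.Escape neutralizes
    // metacharacters such as '(' and '.', which are common in titles.
    public static Regex Exact(string caption)
    {
        return new Regex("^" + Regex.Escape(caption) + "$");
    }

    // Matches any caption ending with the given suffix, ignoring case,
    // e.g. " - Notepad" for documents open in a text editor.
    public static Regex EndsWith(string suffix)
    {
        return new Regex(Regex.Escape(suffix) + "$", RegexOptions.IgnoreCase);
    }
}
```

For example, CaptionFilters.Exact("Calculator (x86)") matches only that exact title, whereas the unescaped pattern would treat the parentheses as a group and match "Calculator x86" instead.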
Now let’s create a function to find a particular window based on its caption (its text). Note that I removed most of the error checks to make the code more readable.
public sealed class Utilities
{
    … see above …

    // Called once per child window: return non-zero to continue, 0 to stop
    private delegate int EnumChildProc(IntPtr Handle, IntPtr Parameter);

    public static IntPtr Find(Regex Caption)
    {
        return Find( UnsafeNativeMethods.GetDesktopWindow(), Caption );
    }

    public static IntPtr Find(IntPtr Parent, Regex Caption)
    {
        StateBag bag = new StateBag(Caption);
        GCHandle bagHandle = GCHandle.Alloc(bag);
        try
        {
            EnumChildProc childProc = new EnumChildProc(EnumChild);
            UnsafeNativeMethods.EnumChildWindows(Parent, childProc, GCHandle.ToIntPtr(bagHandle));
            // Keep the delegate reachable until the native call has returned
            GC.KeepAlive(childProc);
        }
        finally
        {
            if( bagHandle.IsAllocated )
                bagHandle.Free();
        }
        return bag.Handle;
    }

    private static int EnumChild(IntPtr Handle, IntPtr Parameter)
    {
        // MaxCount includes the terminating null character
        StringBuilder caption = new StringBuilder(256);
        UnsafeNativeMethods.GetWindowText(Handle, caption, caption.Capacity);
        StateBag bag = (StateBag)GCHandle.FromIntPtr(Parameter).Target;
        if (bag.Expression.IsMatch(caption.ToString()))
        {
            bag.Handle = Handle;
            return 0; // Found it: stop the enumeration
        }
        return 1; // Keep looking
    }
}
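Find stops at the first match. If several windows may share a caption, the same GCHandle technique can collect every match by always returning “continue” from the callback. A minimal self-contained sketch follows; WindowSearch and FindAll are hypothetical names, and the P/Invoke declarations are repeated here (with a typed delegate) so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;

public static class WindowSearch
{
    // Return true to continue the enumeration, false to stop it
    private delegate bool EnumChildProc(IntPtr handle, IntPtr parameter);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumChildWindows(IntPtr parent, EnumChildProc callback, IntPtr parameter);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern int GetWindowText(IntPtr handle, StringBuilder text, int maxCount);

    [DllImport("user32.dll")]
    private static extern IntPtr GetDesktopWindow();

    private sealed class StateBag
    {
        public readonly Regex Expression;
        public readonly List<IntPtr> Matches = new List<IntPtr>();
        public StateBag(Regex expression) { Expression = expression; }
    }

    public static List<IntPtr> FindAll(Regex caption)
    {
        return FindAll(GetDesktopWindow(), caption);
    }

    // Returns every descendant of parent whose caption matches the expression
    public static List<IntPtr> FindAll(IntPtr parent, Regex caption)
    {
        StateBag bag = new StateBag(caption);
        GCHandle bagHandle = GCHandle.Alloc(bag);
        try
        {
            EnumChildProc callback = Collect;
            EnumChildWindows(parent, callback, GCHandle.ToIntPtr(bagHandle));
            GC.KeepAlive(callback); // the delegate must outlive the native call
        }
        finally
        {
            bagHandle.Free();
        }
        return bag.Matches;
    }

    private static bool Collect(IntPtr handle, IntPtr parameter)
    {
        StringBuilder text = new StringBuilder(256);
        GetWindowText(handle, text, text.Capacity);
        StateBag bag = (StateBag)GCHandle.FromIntPtr(parameter).Target;
        if (bag.Expression.IsMatch(text.ToString()))
            bag.Matches.Add(handle);
        return true; // unlike Find, never stop early
    }
}
```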
Once you have found the window, you can do almost anything with it: change its text, resize it, simulate mouse clicks, etc. Be careful, though: if the product relies on its UI elements to maintain its state, you can easily corrupt that state. To send messages, use the following API:
// LRESULT and WPARAM are pointer-sized, hence IntPtr
[DllImport("User32.dll", CharSet=CharSet.Auto)]
public static extern IntPtr SendMessage( IntPtr handle, uint Msg, IntPtr wParam, IntPtr lParam );
// Or, for messages whose lParam is a text buffer (e.g. WM_GETTEXT):
[DllImport("User32.dll", CharSet=CharSet.Auto)]
public static extern IntPtr SendMessage( IntPtr handle, uint Msg, IntPtr wParam, StringBuilder lParam );
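As an illustration of the text overload, here is a minimal sketch that reads and sets a window’s text through messages. This matters because GetWindowText cannot retrieve the text of a control owned by another process, whereas WM_GETTEXT can. The class name WindowTextMessages is hypothetical; the message constants are the standard Win32 values.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class WindowTextMessages
{
    private const uint WM_SETTEXT = 0x000C;
    private const uint WM_GETTEXT = 0x000D;
    private const uint WM_GETTEXTLENGTH = 0x000E;

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr SendMessage(IntPtr handle, uint msg, IntPtr wParam, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr SendMessage(IntPtr handle, uint msg, IntPtr wParam, StringBuilder lParam);

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr SendMessage(IntPtr handle, uint msg, IntPtr wParam, string lParam);

    // Reads a window's text, including controls that belong to other processes
    public static string GetText(IntPtr handle)
    {
        int length = SendMessage(handle, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero).ToInt32();
        StringBuilder buffer = new StringBuilder(length + 1); // room for the null terminator
        // wParam is the buffer size in characters
        SendMessage(handle, WM_GETTEXT, (IntPtr)buffer.Capacity, buffer);
        return buffer.ToString();
    }

    // Replaces a window's text (the caption for top-level windows)
    public static void SetText(IntPtr handle, string text)
    {
        SendMessage(handle, WM_SETTEXT, IntPtr.Zero, text);
    }
}
```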
More information will be provided in the next article, along with code that looks for a window by its class name.
Communicate with unexposed in-process windows
Sometimes, a third-party control won’t give access to a particular underlying control. However, you can always reach it by communicating with the windows inside it. Since you have the current Process reference, you just need to get the handle of the main window, then walk down the window hierarchy and hook up to the appropriate windows using a function similar to the one above; Microsoft Spy++ can be very helpful here. Another interesting API allows you to install a window procedure hook on a particular event for a specific in-process window, which is pretty cool (see this KB).
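The KB article is not reproduced here. As a substitute technique, Windows Forms’ NativeWindow class offers a managed way to subclass an existing window and observe its messages. This sketch assumes a Windows Forms project; MessageSpy and LeftButtonDown are hypothetical names. Subclassing only works for windows that belong to your own process, which is exactly the in-process case discussed above.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical spy that subclasses an in-process window via NativeWindow.
public sealed class MessageSpy : NativeWindow
{
    private const int WM_LBUTTONDOWN = 0x0201;

    public event EventHandler LeftButtonDown;

    public MessageSpy(IntPtr handle)
    {
        // Replaces the window's procedure with this object's WndProc
        AssignHandle(handle);
    }

    protected override void WndProc(ref Message m)
    {
        if (m.Msg == WM_LBUTTONDOWN)
            LeftButtonDown?.Invoke(this, EventArgs.Empty);

        // Always forward to the original window procedure
        base.WndProc(ref m);
    }
}
```

Call ReleaseHandle() on the spy when you are done, so the original window procedure is restored.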
Map over predefined graphic areas
In some cases, the UI elements are drawn directly on a single window, so there is no way to get a handle to them: they don’t really exist as windows. The underlying window manages the clicks and routes them to the appropriate handlers depending on where the clicks actually happened. Web-based card games, for example, may use that technique to draw the table and the cards. One way to work around this problem is to create our own mapping (region) over each area of interest. Such a region consists of the handle of the parent window, which processes all the events, and a rectangle expressing the region’s position and size. A click method, for example, would notify the parent window by sending it the “mouse down/up” messages. Further functionality can be added as desired; for example, the BitBlt native API can be used to take a snapshot (a bitmap) of the underlying window, after which you can parse specific areas of that bitmap and expose properties describing the mapped UI element. Here’s an overview of that kind of functionality:
using System;
using System.Drawing;
using System.Windows.Forms; // SystemInformation

public class Region
{
    protected IntPtr owner;
    private Rectangle area;
    private Bitmap view;
    private readonly object updateSync = new object( );

    public Rectangle Area
    {
        get { return area; }
    }

    public Bitmap View
    {
        get { lock( updateSync ) { return view; } }
    }

    public IntPtr Owner
    {
        get { return owner; }
    }

    public Region( IntPtr Owner, Rectangle Area )
    {
        owner = Owner;
        area = Area;
    }

    /// Override this:
    /// call base.Update() first, then parse the bitmap,
    /// and finally set your own properties based on the parsed bitmap.
    public virtual void Update( )
    {
        lock( updateSync )
        {
            if( view != null )
            {
                view.Dispose( );
                view = null; // never expose a disposed bitmap if the capture fails
            }
            IntPtr srcContext = UnsafeNativeMethods.GetWindowDC( owner );
            IntPtr destContext = UnsafeNativeMethods.CreateCompatibleDC( srcContext );
            IntPtr destBmp = UnsafeNativeMethods.CreateCompatibleBitmap( srcContext, area.Width + 1, area.Height + 1 );
            // Move the bmp into the context, remembering the bitmap it replaces
            IntPtr oldBmp = UnsafeNativeMethods.SelectObject( destContext, destBmp );
            try
            {
                // Make sure the window is in front, so the copied pixels are its own
                UnsafeNativeMethods.SetForegroundWindow( owner );
                // Copy the selected region from the source context to the destination context.
                // The offsets convert client coordinates to window coordinates and assume
                // a fixed-border window with a standard caption bar.
                UnsafeNativeMethods.BitBlt( destContext, 0, 0, area.Width + 1, area.Height + 1,
                    srcContext,
                    area.X + SystemInformation.FixedFrameBorderSize.Width,
                    area.Y + SystemInformation.CaptionHeight + SystemInformation.FixedFrameBorderSize.Height,
                    UnsafeNativeMethods.SRCCOPY );
                // Copies the pixels into a managed Bitmap
                view = Bitmap.FromHbitmap( destBmp );
            }
            catch( Exception ) { } // capture failed: View stays null
            finally
            {
                // Cleanup, even on failure -- make sure nothing leaks
                UnsafeNativeMethods.SelectObject( destContext, oldBmp );
                UnsafeNativeMethods.DeleteObject( destBmp );
                UnsafeNativeMethods.DeleteDC( destContext );
                UnsafeNativeMethods.ReleaseDC( owner, srcContext );
            }
        }
    }
}
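For reference, here is a sketch of declarations matching the native calls Region.Update makes, plus the click method described earlier, which the Region class does not implement. The declarations follow the documented Win32 signatures and could be merged into UnsafeNativeMethods; the class name RegionNativeMethods and the Click and MakeLParam helpers are hypothetical, and Click assumes the area is expressed in the owner’s client coordinates (as the offsets in Update imply).

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;

public static class RegionNativeMethods
{
    public const uint SRCCOPY = 0x00CC0020;
    public const uint WM_LBUTTONDOWN = 0x0201;
    public const uint WM_LBUTTONUP = 0x0202;
    public const int MK_LBUTTON = 0x0001;

    [DllImport("user32.dll")] public static extern IntPtr GetWindowDC(IntPtr hWnd);
    [DllImport("user32.dll")] public static extern int ReleaseDC(IntPtr hWnd, IntPtr hDC);
    [DllImport("user32.dll")] [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetForegroundWindow(IntPtr hWnd);
    [DllImport("user32.dll")] [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool PostMessage(IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);
    [DllImport("gdi32.dll")] public static extern IntPtr CreateCompatibleDC(IntPtr hdc);
    [DllImport("gdi32.dll")] public static extern IntPtr CreateCompatibleBitmap(IntPtr hdc, int width, int height);
    [DllImport("gdi32.dll")] public static extern IntPtr SelectObject(IntPtr hdc, IntPtr obj);
    [DllImport("gdi32.dll")] [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool BitBlt(IntPtr hdcDest, int xDest, int yDest, int width, int height,
                                     IntPtr hdcSrc, int xSrc, int ySrc, uint rop);
    [DllImport("gdi32.dll")] [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeleteDC(IntPtr hdc);
    [DllImport("gdi32.dll")] [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeleteObject(IntPtr obj);

    // Packs coordinates like the MAKELPARAM macro: x in the low word, y in the high word
    public static IntPtr MakeLParam(int x, int y)
    {
        return new IntPtr(unchecked((int)((((uint)y & 0xFFFF) << 16) | ((uint)x & 0xFFFF))));
    }

    // Hypothetical click: posts a left-button down/up pair at the centre of area
    public static void Click(IntPtr owner, Rectangle area)
    {
        IntPtr point = MakeLParam(area.X + area.Width / 2, area.Y + area.Height / 2);
        PostMessage(owner, WM_LBUTTONDOWN, (IntPtr)MK_LBUTTON, point);
        PostMessage(owner, WM_LBUTTONUP, IntPtr.Zero, point);
    }
}
```

PostMessage is used rather than SendMessage so the automation code does not block while the target window processes the click.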
I haven’t provided the definitions of the native methods used here; they can be found on the net or in the SDK. If you have any ideas, questions, or comments about this article, please let me know. In the next article, I will concentrate on automating an Internet Explorer window.
Joe