December 2007 - Posts

What’s Next for Window Clippings

I know a lot of you are wondering what’s going on with Window Clippings. I had hoped to release another update this year, but time has just slipped away. It’s turned into a pretty hectic year, what with moving from Canada to the UK and starting a new job as a technology consultant in London, which is quite a departure from commercial software development.

Unfortunately I cannot work on Window Clippings full time at the moment, as I know many of you would like, but progress is still being made. I have scrapped plans for the minor 2.1 release and am moving ahead with work on a more significant 2.5 release. The release is aimed at addressing a few key themes:

  • Usability and bug fixes
  • Installation and initial experience
  • Feature parity with Vista’s snipping tool
  • Screen capture API
  • More built-in add-ins

Usability and bug fixes

I’ve chatted with many users about how they use the product and have received much invaluable feedback. There are some bugs to fix, but mostly I’m focusing on making the core features easier to use and on making Window Clippings work more naturally and intuitively.

Installation and initial experience

I’m biting the bullet and providing an installer. I’ve always had it in my head that something like this doesn’t absolutely need an installer and should just work. Although I will continue in that spirit I realize that not everybody is a techie or cares to find a good spot to stick Window Clippings and an installer would simply help folks get started more easily.

Feature parity with Vista’s Snipping tool

I’ve also been told by a number of users that they’d love to replace Vista’s Snipping tool with Window Clippings but that the Snipping tool provides freehand selection and highlighting facilities which Window Clippings really should provide. So the next release aims to provide feature parity and in particular supports freehand selection to complement the existing rectangular selection. There’s also a new highlight filter add-in that lets you easily use a “marker” to highlight a captured image or use ellipses and other shapes to draw attention to parts of a screenshot.

Window capture API

I’ve had many requests from ISVs to incorporate the screen capture functionality into their applications. Although I’ve provided this in some cases it was never very easy to separate the functionality from the Window Clippings application. Well this is now fixed and Window Clippings itself is built using the window capture API which will be licensed separately.

More built-in add-ins

I’ve been working on a number of new add-ins that will join the existing built-in add-ins including the highlight filter mentioned above, a more polished watermark filter, a much more sophisticated “Save to disk” add-in with support for thumbnails and a “Send to Amazon S3” add-in to name a few.

So development on Window Clippings continues but it is unfortunately something I can only work on in my spare time at the moment what with Window Clippings only really covering the costs of web hosting and Internet access. Considering the many thousands of regular users I’m suspecting that the “free” version is just too good!  :)

Anyway, I just wanted to confirm that a new version is in the works and give you an idea of what you can look forward to. Let me know what you think!

 

Posted by KennyKerr with 14 comment(s)

Parallel Programming with C++ – Part 3 – Queuing Asynchronous Procedure Calls

In part 1 of the Parallel Programming with C++ series I introduced asynchronous procedure calls (APCs) and how they can be used with alertable I/O to process asynchronous I/O requests without blocking an application’s thread. In part 2, I showed how APC handling can be integrated with a window message loop.

Although that covers the most common use cases for APCs, there are still many more ways in which you can use them. Windows uses APCs extensively and many subsystems use and expose them in various ways. I will be covering some of these in future articles in this series as they relate to other concurrency topics. For now I want to wrap up the discussion of APCs by showing how you can queue your own user-mode APCs. In a future article I’ll show you how you can set APCs to be queued at some point in the future, which can be more useful, but for now let’s take a look at the QueueUserAPC function:
 
if (!::QueueUserAPC(apcFunction,
                    threadHandle,
                    data))
{
    // Call GetLastError for more information.
}

The first parameter is the APC function to be queued. The second is a handle to the thread whose APC queue should receive it. The last parameter is there for you to pass any contextual information to the APC function. Let’s take a look at a more complete example. Here’s a simple thread procedure that queues APCs:

bool m_stopped = false;

void CALLBACK OnEvent(ULONG_PTR context)
{
    std::cout << context << std::endl;
}

void CALLBACK OnStopped(ULONG_PTR context)
{
    ASSERT(0 == context);

    std::cout << "Stopped" << std::endl;

    m_stopped = true;
}

DWORD WINAPI Producer(HANDLE consumerThread)
{
    for (int index = 0; index < 10; ++index)
    {
        ::Sleep(1000);

        VERIFY(::QueueUserAPC(OnEvent,
                              consumerThread,
                              index));
    }

    ::Sleep(1000);

    VERIFY(::QueueUserAPC(OnStopped,
                          consumerThread,
                          0));

    return 0;
}

The OnEvent APC function is queued 10 times at one-second intervals, and then the OnStopped APC function is queued a second later before the producer thread terminates. The thread procedure’s parameter is actually a handle to the thread that the APCs should be queued to. The OnEvent APC function simply prints the context value. The OnStopped APC function sets the m_stopped value to true to signal the end of the operation.
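
The context value does not have to be a plain integer. As a rough sketch, and assuming the pointed-to object outlives the queued APC, a pointer can be carried through the ULONG_PTR parameter and recovered inside the APC function. The EventInfo structure and the ToContext and OnEventInfo functions are hypothetical names introduced for illustration; when not compiling on Windows, a stand-in ULONG_PTR typedef is declared so the sketch still builds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumptions) so the sketch compiles off Windows.
typedef std::uintptr_t ULONG_PTR;
#define CALLBACK
#endif

// Hypothetical payload; not part of the original example.
struct EventInfo
{
    int Sequence;
    std::string Text;
};

// Packs a pointer into the integer-sized context value.
ULONG_PTR ToContext(EventInfo* info)
{
    return reinterpret_cast<ULONG_PTR>(info);
}

// Shaped like an APC function: recovers the pointer from the context.
// The caller must keep *info alive until this has run.
void CALLBACK OnEventInfo(ULONG_PTR context)
{
    EventInfo* info = reinterpret_cast<EventInfo*>(context);
    info->Text = "handled #" + std::to_string(info->Sequence);
}
```

On Windows, OnEventInfo and ToContext(&info) would be passed as the first and last arguments to QueueUserAPC in place of OnEvent and index.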

An application can then make use of this producer thread procedure by creating the thread and then waiting for the APC functions to be queued. The only real trick at this point is figuring out how to get the producer thread the handle to the current thread. The GetCurrentThread function returns a handle to the current thread but it’s really a pseudo handle representing whatever happens to be the current thread. So to pass this handle to another thread you need to use the DuplicateHandle function to create an actual handle that is portable. And here is the result:

int main()
{
    CHandle consumerThread;

    if (!::DuplicateHandle(::GetCurrentProcess(),
                           ::GetCurrentThread(),
                           ::GetCurrentProcess(),
                           &consumerThread.m_h,
                           THREAD_SET_CONTEXT, // only permission required by QueueUserAPC
                           FALSE, // not inheritable
                           0)) // no options
    {
        // Call GetLastError for more information.
    }

    ASSERT(0 != consumerThread);

    CHandle producerThread(::CreateThread(0, // default security
                                          0, // default stack size
                                          Producer, // thread proc
                                          consumerThread, // context
                                          0, // flags
                                          0)); // ignore thread id

    if (0 == producerThread)
    {
        // Call GetLastError for more information.
    }

    while (!m_stopped)
    {
        ::SleepEx(INFINITE, TRUE);
    }
}

Finally I should point out that you need to be conscious of what system calls you make within your APC functions. Specifically you need to avoid calling other functions that might directly or indirectly enter an alertable state. This may not always be obvious so be sure to read the documentation carefully!
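
To see why this matters, here is a toy, single-threaded model of the hazard. It is not the Windows implementation: AlertableThreadModel and RunNestedExample are hypothetical names, and the queue is just a std::deque of callbacks. The first callback carelessly performs an alertable wait, so the second callback runs in the middle of the first one.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only (an assumption for illustration): pending callbacks are
// run whenever the "thread" performs an alertable wait.
struct AlertableThreadModel
{
    std::deque<std::function<void()>> Pending;
    std::vector<std::string> Log;

    // Stands in for any call that enters an alertable state.
    void AlertableWait()
    {
        while (!Pending.empty())
        {
            std::function<void()> apc = std::move(Pending.front());
            Pending.pop_front();
            apc();
        }
    }
};

// Queues two callbacks and returns the order in which their steps ran.
std::vector<std::string> RunNestedExample()
{
    AlertableThreadModel thread;

    thread.Pending.push_back([&thread]
    {
        thread.Log.push_back("first: begin");
        thread.AlertableWait(); // re-enters: the second callback runs here
        thread.Log.push_back("first: end");
    });

    thread.Pending.push_back([&thread]
    {
        thread.Log.push_back("second");
    });

    thread.AlertableWait();
    return thread.Log;
}
```

In this model the second callback finishes before the first one does, which is exactly the kind of surprising re-entrancy that an APC function entering an alertable state can cause.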

That’s it for today.

Read part 4 now: I/O Completion Ports

© 2007 Kenny Kerr

Posted by KennyKerr with 2 comment(s)

Parallel Programming with C++ – Part 2 – Asynchronous Procedure Calls and Window Messages

In part 1 of the Parallel Programming with C++ series I introduced asynchronous procedure calls (APCs) and how they can be used with alertable I/O to process asynchronous I/O requests without blocking an application’s thread.

Of course the example still ended up blocking since the SleepEx function was used to flush the APC queue. Fortunately that’s not the only function that Windows provides to place a thread in an alertable state and in fact Windows provides a number of such functions with different characteristics. One that is particularly useful for client applications is MsgWaitForMultipleObjectsEx as it allows you to integrate APC handling into a thread’s message loop.

Before we can examine this function however I first need to recap how message loops work. Any thread that creates windows directly or indirectly must remove and dispatch messages destined for its windows. Normally you shouldn’t worry about writing your own message loop. Whatever user interface framework you happen to be using, whether it’s MFC, WTL, Windows Forms or even WPF, will provide an implementation tailored for it. At the heart of any message loop however are a few fundamental functions. The GetMessage function (usually) removes a message from a thread’s message queue and copies the message information to the provided MSG structure. If the message queue is empty, GetMessage will wait until a message arrives in the queue before returning. A return value of -1 indicates that an error occurred retrieving the message. Usually this is handled simply by looping again and waiting for the next message. A return value of 0 indicates that the WM_QUIT message was retrieved. This is usually a signal that the message loop should exit and the thread should terminate. Any other return value indicates that a message was successfully retrieved and should be dispatched to the appropriate window procedure for handling. Here is a simple message loop implementation:

int Run()
{
    MSG messageInfo = { 0 };

    while (true)
    {
        // Wait for next message.
        BOOL result = ::GetMessage(&messageInfo,
                                   0, // all windows
                                   0, // all messages
                                   0); // all messages

        if (-1 == result)
        {
            TRACE(L"GetMessage failed (%d)\n", ::GetLastError());
            continue;
        }

        if (0 == result)
        {
            ASSERT(WM_QUIT == messageInfo.message);
            break;
        }

        // Send message to window procedure.
        ::DispatchMessage(&messageInfo);
    }

    // Return the WM_QUIT exit code.
    return static_cast<int>(messageInfo.wParam);
}

Please keep in mind that message loops in practice can be a lot more complicated. You might want to translate keyboard input into character messages, allow a window to “pre-translate” a message for dialog handling, etc. but the example above is sufficient for this discussion.

Now back to APCs. In part 1 I mentioned that you can use SleepEx with an interval of zero to handle all pending APCs and then return immediately. One (flawed) solution is to integrate SleepEx with the message loop above. The problem is that it would only handle APCs when messages arrive since queued APCs will not signal GetMessage to return. What we need is a function that will wait on both a thread’s message queue and its APC queue. Fortunately just such a function exists and as I mentioned before it is called MsgWaitForMultipleObjectsEx. Like SleepEx it places a thread into an alertable state so that APCs can be handled, but unlike GetMessage it does not retrieve messages from a thread’s message queue but simply returns indicating that messages are available. Fortunately that’s all we need to update our message loop to also handle APCs efficiently.

Since MsgWaitForMultipleObjectsEx is doing the waiting for us, we cannot also use GetMessage to retrieve messages from the queue and instead need to use the closely related PeekMessage function. PeekMessage is similar to GetMessage but it does not wait for a message. It will optionally remove a message from the queue but if a message is not available it will return immediately. Fortunately that’s exactly the behaviour we need. We can simply call PeekMessage in a loop to flush the message queue and then call MsgWaitForMultipleObjectsEx to wait for new messages or APCs. Here’s an updated message loop reflecting this:

int Run()
{
    MSG messageInfo = { 0 };

    while (WM_QUIT != messageInfo.message)
    {
        DWORD result = ::MsgWaitForMultipleObjectsEx(0, // no handles
                                                     0, // no handles
                                                     INFINITE,
                                                     QS_ALLINPUT,
                                                     MWMO_ALERTABLE | MWMO_INPUTAVAILABLE);

        if (WAIT_FAILED == result)
        {
            TRACE(L"MsgWaitForMultipleObjectsEx failed (%d)\n", ::GetLastError());
            continue;
        }

        ASSERT(WAIT_IO_COMPLETION == result || WAIT_OBJECT_0 == result);

        if (WAIT_OBJECT_0 == result)
        {
            while (::PeekMessage(&messageInfo,
                                 0, // any window
                                 0, // all messages
                                 0, // all messages
                                 PM_REMOVE))
            {
                if (WM_QUIT == messageInfo.message)
                {
                    break; // WM_QUIT retrieved so stop looping
                }

                ::DispatchMessage(&messageInfo);
            }
        }
    }

    ASSERT(WM_QUIT == messageInfo.message);
    return static_cast<int>(messageInfo.wParam);
}

As you might have guessed, MsgWaitForMultipleObjectsEx is an alertable version of MsgWaitForMultipleObjects and both are similar to WaitForMultipleObjects(Ex) in that they can wait for kernel objects to be signalled. In this case however we don’t need to wait on kernel objects so the first and second parameters are set to zero. The third parameter is the time-out: the maximum number of milliseconds that the call should wait before returning if none of the wake conditions are met. INFINITE indicates that the call should not time out. The second-to-last parameter is a bitmask indicating the types of messages that will force the call to return. QS_ALLINPUT indicates that the call should return as soon as any message is queued. You can tailor this to only wait for certain messages such as mouse input or paint messages. The last parameter indicates that the call should return if APCs are queued and handled (MWMO_ALERTABLE) or if messages were previously queued (MWMO_INPUTAVAILABLE). The latter is needed since MsgWaitForMultipleObjectsEx may not otherwise return if messages are queued prior to calling MsgWaitForMultipleObjectsEx.

MsgWaitForMultipleObjectsEx returns WAIT_FAILED if it fails for some reason. Call the GetLastError function for the actual reason. It returns WAIT_IO_COMPLETION to indicate that one or more APCs were handled. And it returns WAIT_OBJECT_0 if messages are waiting in the queue in which case they are de-queued using PeekMessage with the PM_REMOVE flag and dispatched to the appropriate window procedure using the DispatchMessage function.

With this new alertable message loop you can now safely and efficiently use APCs from your application’s window threads to perform asynchronous I/O without needing to add additional threads to your application and thereby complicate its design and implementation.

Read part 3 now: Queuing Asynchronous Procedure Calls

Posted by KennyKerr with 2 comment(s)

Parallel Programming with C++ – Part 1 – Asynchronous Procedure Calls

Who says you need to add additional threads to your application to keep it from becoming unresponsive? The golden rule for responsive client applications is to avoid blocking calls on window threads. A blocking function call on a window thread prevents the thread’s message loop from dispatching messages promptly and the result is an unresponsive set of windows, since the windows are not able to respond to input from the mouse, the keyboard, other applications or the operating system itself. A common solution is to make blocking calls on worker threads, but threads are costly, they introduce complexity into your application, and such a worker thread would not be doing much of use other than managing some state and waiting for the blocking call to return. One simple and efficient solution to this problem is called alertable I/O. It makes use of asynchronous procedure calls (APCs) and is the topic of this first part of the Parallel Programming with C++ series of articles.

Windows manages a queue of APCs for each thread and this allows user-mode as well as kernel-mode code to queue a function to be called at some point in the future. This feature allows you to build responsive applications with only a single thread, removing the need for background threads in many cases. Although it certainly doesn’t address every scenario, it does fit the bill quite nicely for many client applications. Let’s take a look at how this technique can be used in practice, but first I need to explain briefly how APCs work.

APCs come in kernel-mode and user-mode varieties. Kernel-mode APCs are queued by devices in the kernel and the kernel issues a software interrupt to give the APCs an opportunity to run in the context and address space associated with a thread. The primary reason for kernel-mode APCs is to allow code in the kernel access to the user-mode address space associated with a particular thread, in other words the virtual memory for a particular application.

User-mode APCs are queued in much the same way, but unlike the kernel-mode variety they don’t execute without a thread’s permission. A thread needs to enter an alertable state, at which point the thread automatically handles all APCs in the queue in first-in, first-out order. This becomes especially interesting when you realize that the kernel can queue a user-mode APC given the address of a function in the address space associated with a thread. Why is this interesting? Consider how I/O requests are fulfilled.
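
The queue-then-drain behaviour can be pictured with a small toy model. This is only a sketch under stated assumptions: Windows keeps the real APC queue inside the kernel, and ApcQueueModel, Queue, EnterAlertableState and DrainOrderExample are hypothetical names used purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Toy model of a per-thread user-mode APC queue (not the Windows API).
class ApcQueueModel
{
public:
    using Apc = std::function<void(std::uintptr_t)>;

    // Analogous to queuing a user-mode APC: nothing runs yet.
    void Queue(Apc function, std::uintptr_t context)
    {
        m_pending.push_back({ std::move(function), context });
    }

    // Analogous to the thread entering an alertable state: every pending
    // entry runs in first-in, first-out order. Returns the number handled.
    std::size_t EnterAlertableState()
    {
        std::size_t handled = 0;

        while (!m_pending.empty())
        {
            Entry entry = std::move(m_pending.front());
            m_pending.pop_front();
            entry.function(entry.context);
            ++handled;
        }

        return handled;
    }

private:
    struct Entry
    {
        Apc function;
        std::uintptr_t context;
    };

    std::deque<Entry> m_pending;
};

// Queues three callbacks and returns the order in which they ran.
std::vector<std::uintptr_t> DrainOrderExample()
{
    ApcQueueModel queue;
    std::vector<std::uintptr_t> order;

    for (std::uintptr_t context = 1; context <= 3; ++context)
    {
        queue.Queue([&order](std::uintptr_t value) { order.push_back(value); },
                    context);
    }

    // Nothing has run so far; the callbacks only run once the model
    // "thread" becomes alertable.
    queue.EnterAlertableState();
    return order;
}
```

The point of the model is simply that queuing and running are separate steps, and that the thread itself decides when the second step happens.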

Let’s say you have a file handle that you would like to read from. This file handle might be a file on a local disk. It might be a file on a file server. It might not even be a file at all but rather the client end of a named pipe. The good news is that the I/O manager abstracts away the differences. The bad news is that regardless of what the file handle really represents it’s almost guaranteed to be a lot slower to read from than from a page in your address space. What is useful to realize about this type of latency is that it is not processor-bound. You might need to wait for a disk controller or a network roundtrip but the processor is otherwise free to perform other tasks such as dispatching window messages.

Under the hood an I/O request is shipped off to the kernel’s I/O manager which finds the appropriate device stack and submits an I/O request packet and then moves on. The device eventually finds the necessary data and notifies the I/O manager to complete the I/O request. At this point it needs a way to tell the application that originally made the I/O request that it has been completed and one way that it can do this is by using an APC. Assuming the application had associated an APC with the request, the kernel can easily queue a user-mode APC to the thread that made the request. The APC, also known as a completion routine in this case, is then called the next time the thread enters an alertable state. A thread enters an alertable state when it calls one of a handful of functions that suspend the thread.

The explanation above is a considerable simplification but sufficient for our needs. Now let’s look at an example to make this more concrete. I had considered some more interesting samples such as a market feed or chat client but decided that it would just needlessly complicate the samples. Let’s imagine there’s a server providing a stream of DWORD values over a named pipe.

The first step is to open the client end of the named pipe:

CHandle pipe;

pipe.Attach(::CreateFile(L"\\\\.\\pipe\\TestServer",
                         FILE_READ_DATA,
                         0, // no sharing
                         0, // default security
                         OPEN_EXISTING,
                         FILE_FLAG_OVERLAPPED,
                         0)); // no template

if (INVALID_HANDLE_VALUE == pipe)
{
    // The pipe is not accessible.
}

CHandle is simply a wrapper class provided by ATL that ensures that the underlying HANDLE is automatically closed when the variable goes out of scope. The CreateFile function opens the client end of the named pipe returning a file handle that can be used to access the pipe. The parameters to this function are unimportant to this discussion. Basically they just ensure that the handle can be used to read from the pipe using asynchronous I/O. CreateFile returns INVALID_HANDLE_VALUE if the pipe is not accessible. This may be due to a variety of reasons. For example the client may not have permission to connect to the pipe, the server may not be running, etc. Call the GetLastError function for the actual reason.

The next step is to begin reading the first value from the pipe into a buffer asynchronously:

void CALLBACK ReadFileCompleted(DWORD errorCode,
                                DWORD bytesCopied,
                                OVERLAPPED* overlapped);

DWORD buffer = 0;
OVERLAPPED overlapped = { 0 };

if (!::ReadFileEx(pipe,
                  &buffer,
                  sizeof(buffer),
                  &overlapped,
                  ReadFileCompleted))
{
    // The server may have closed the pipe or the connection was lost.
}

Unlike ReadFile, the ReadFileEx function can only be used to read asynchronously and it notifies the caller that the operation completed by queuing the caller-provided ReadFileCompleted function as an APC. The first parameter indicates the handle to read from. The second and third parameters indicate the address and size of the buffer that should receive any data that is read. The second-to-last parameter is the address of an OVERLAPPED structure. Typically OVERLAPPED is used to specify the file position for the operation since unlike synchronous I/O, an overlapped file handle does not keep track of this. Since we’re reading from a pipe however, we don’t need to specify a position and simply provide the address of a zero-initialized OVERLAPPED structure. It is still quite useful since its address is passed to your completion routine and this can be useful for identifying which I/O operation completed.

So far we’ve connected to a pipe using asynchronous I/O and begun reading the first value from the pipe into the buffer asynchronously. We can’t however get the results from the read operation until the same thread that called ReadFileEx enters an alertable state. The simplest approach is to use the SleepEx function:

const DWORD sleepResult = ::SleepEx(INFINITE,
                                    TRUE); // Alertable

ASSERT(WAIT_IO_COMPLETION == sleepResult);

SleepEx’s first parameter indicates the minimum number of milliseconds that the thread should be suspended. INFINITE indicates that the call should not time out. The second parameter indicates that the thread should enter an alertable state prior to being suspended thus allowing it to handle any APCs in its queue before returning. If the APC queue is not empty when SleepEx is called the thread will not be suspended but will instead handle all of the APCs immediately before returning. If the APC queue is empty the thread will be suspended until such time as an APC is queued, at which point it will again be schedulable and SleepEx will return once the queue is empty. Keep in mind that it is possible for APCs to be queued more rapidly than the thread can handle them, in which case SleepEx may never return. SleepEx returns zero if the interval elapses. Alternatively it returns WAIT_IO_COMPLETION to indicate that one or more APCs were handled.

Of course suspending a thread isn’t a great way to build responsive applications. It is however a useful way to flush the APC queue. Simply use a value of zero for the timeout interval and it will not suspend the thread at all but simply handle any APCs that may have already been queued before returning. The performance-conscious developers out there may not like the sound of that as it can easily lead developers to “poll” the APC queue. We’ll take a look at an alternative approach that solves the problem in a far more efficient manner in part 2 of this series but first I want to wrap up this example. Here’s the complete example in a console application:

void CALLBACK ReadFileCompleted(DWORD errorCode,
                                DWORD bytesCopied,
                                OVERLAPPED* overlapped);

OVERLAPPED overlapped = { 0 };
CHandle pipe;
DWORD buffer = 0;
DWORD status = ERROR_SUCCESS;

int main()
{
    // Open the client end of a named pipe.
    pipe.Attach(::CreateFile(L"\\\\.\\pipe\\TestServer",
                             FILE_READ_DATA,
                             0, // no sharing
                             0, // default security
                             OPEN_EXISTING,
                             FILE_FLAG_OVERLAPPED,
                             0)); // no template

    if (INVALID_HANDLE_VALUE == pipe)
    {
        // The pipe is not accessible.
        status = ::GetLastError();
    }
    else
    {
        // Read the first value from the pipe into the buffer asynchronously.
        if (!::ReadFileEx(pipe,
                          &buffer,
                          sizeof(buffer),
                          &overlapped,
                          ReadFileCompleted))
        {
            // The server may have closed the pipe or the connection was lost.
            status = ::GetLastError();
        }
        else
        {
            while (ERROR_SUCCESS == status)
            {
                // Wait until one or more APCs are queued.
                const DWORD sleepResult = ::SleepEx(INFINITE, // Suspend indefinitely
                                                    TRUE); // Alertable

                // Since the thread is suspended indefinitely it will only return
                // after one or more APCs are called.
                ASSERT(WAIT_IO_COMPLETION == sleepResult);
            }
        }
    }

    return status;
}

void CALLBACK ReadFileCompleted(const DWORD errorCode,
                                const DWORD bytesCopied,
                                OVERLAPPED* overlapped)
{
    // The read request may have failed asynchronously.
    // The server may have closed the pipe or the connection was lost.
    status = errorCode;

    if (ERROR_SUCCESS == status)
    {
        // The read request completed successfully.
        ASSERT(sizeof(buffer) == bytesCopied);

        // The current value is available in the buffer.
        std::cout << buffer << std::endl;

        // Read the next value from the pipe into the buffer asynchronously.
        if (!::ReadFileEx(pipe,
                          &buffer,
                          sizeof(buffer),
                          overlapped,
                          ReadFileCompleted))
        {
            // The server may have closed the pipe or the connection was lost.
            status = ::GetLastError();
        }
    }
}

If you would like to avoid global variables you can take advantage of the fact that the system provides the same address for the OVERLAPPED structure that you specify in a call to ReadFileEx to the completion routine. This means that you can hang some extra state off the end of it. You could for example declare a structure as follows:

struct SampleOverlapped
{
    SampleOverlapped() :
        Overlapped(OVERLAPPED()),
        Buffer(0),
        Status(ERROR_SUCCESS)
    {
        // Do nothing
    }

    OVERLAPPED Overlapped;
    CHandle Pipe;
    DWORD Buffer;
    DWORD Status;
};

If you pass the address of SampleOverlapped::Overlapped to ReadFileEx then, because Overlapped is the first member of the structure, you can simply cast the OVERLAPPED pointer in the completion routine to a SampleOverlapped pointer and thereby gain access to this extra state without having to make the memory global to both functions.
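
The cast itself can be sketched in isolation. This is a minimal illustration rather than the complete sample: ReadState and StateFromOverlapped are hypothetical names, the structure is kept as a plain aggregate, and when not compiling on Windows a stand-in OVERLAPPED (which does not match the real layout) and a stand-in DWORD are declared so the sketch still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumptions) so the sketch compiles off Windows;
// this is NOT the real OVERLAPPED layout.
struct OVERLAPPED { void* Placeholder[4]; };
typedef std::uint32_t DWORD;
#endif

// Hypothetical per-request state with the OVERLAPPED structure first.
struct ReadState
{
    OVERLAPPED Overlapped; // must remain the first member
    DWORD Buffer;
    DWORD Status;
};

// The cast below is only valid while Overlapped sits at offset zero.
static_assert(offsetof(ReadState, Overlapped) == 0,
              "Overlapped must be the first member of ReadState");

// What a completion routine would do with the OVERLAPPED pointer it is given.
ReadState* StateFromOverlapped(OVERLAPPED* overlapped)
{
    return reinterpret_cast<ReadState*>(overlapped);
}
```

A completion routine would call StateFromOverlapped on its OVERLAPPED* argument and then read Buffer or update Status through the returned pointer. The static_assert guards against someone later reordering the members.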

Read part 2 now: Asynchronous Procedure Calls and Window Messages

Posted by KennyKerr with 1 comment(s)

Parallel Programming with C++ – A New Series

Microsoft’s developer division has, at least publicly, been placing a lot of emphasis on making it easier for C# and VB developers to build scalable applications. The Parallel Extensions CTP for .NET 3.5 is clear evidence of their commitment to the C# and VB developer. Why C# and VB and what about C++? Is C++ not getting the attention it needs in the age of parallel programming?

There are many ways to answer that question, but the thing to remember about C++ and specifically Visual C++ on the Windows platform is that it has always been very amenable to parallel programming. After all, incredibly parallel programs like SQL Server and Windows itself are developed using Visual C++. Microsoft is spending a lot of effort at the moment making it easier to develop parallel programs in C# because it just isn’t very easy to do in that environment, but no such problem exists for the C++ developer. In fact, Visual C++ introduced facilities that made it trivial to apply parallel programming techniques to .NET code years ago in the Visual C++ 2005 release, while C# is only now getting similar functionality in the Parallel Extensions CTP. So whether you’re looking for parallel programming techniques for native or managed code, Visual C++ is ready and willing today.

In my latest series of articles to be published here on my blog I’m taking a look at parallel programming with C++. I’ll start with the fundamentals of creating responsive client applications and scalable services and along the way look at various techniques and technologies to make it happen. Many are provided by Windows while others are provided directly by Visual C++. I may even cover some of the more enterprise-oriented techniques such as clustering and grid computing if there is enough interest and I have enough time.

Keep in mind that I’m writing this series in my spare time, in the evenings, of which I have surprisingly little these days. My kids want their dad around after all! We also still don’t have Internet access, so articles in this series will appear whenever I happen to find the time to write and will be posted whenever I find an Internet connection. Thanks for reading and I hope you enjoy this new series!

Part 1: Asynchronous Procedure Calls

Part 2: Asynchronous Procedure Calls and Window Messages

Part 3: Queuing Asynchronous Procedure Calls

Part 4: I/O Completion Ports

Part 5: Coming soon...

If you’re looking for one of my previous articles here is a complete list of them for you to browse through.

Posted by KennyKerr with no comments