WWN Issue 86

This is the 86th issue of Wine's kernel cousin publication. Its main goal is to report widely on what's going on around Wine (the Un*x Windows emulator).

This week, 30 posts consumed 128 K. There were 16 different contributors. 4 (25%) posted more than once. 6 (37%) posted last week too.

The top 5 posters of the week were:

9 posts in 31K by Alexandre Julliard
5 posts in 28K by David Howells
2 posts in 24K by Michael McCormack
2 posts in 7K by Gavriel State

Headlines

Daniel Schwarz' web site ( http://www.winecentric.com/ ) has been completely revamped and now has expanded coverage: in addition to the existing coverage of Lotus Notes, information on Microsoft Excel '97 has been added (how to install, how to use, tips and tricks, etc.).

Enhanced asynchronous I/O

05 Mar 2001 00:00:00 -0800

Archive

Michael McCormack provided a first version of a patch to enhance the I/O operations in Wine: "This patch moves responsibility for asynchronous I/O to the client process. I think this implementation is more efficient, as it makes fewer server calls and duplicates fewer file descriptors, while maintaining correctness."

Mike asked for feedback on his patch.

Alexandre Julliard raised several objections:

first of all, the patch conflicted with work Alexandre had under way: "your approach is not going to work with the latest changes I made to the server. The good news is that the changes I'm making are in part to allow making server calls in signal handlers, so when this works you should be able to use SIGIO to do async IO."
secondly, Alexandre strongly doubted that the patch improved performance.

Mike tried to defend his changes a bit (especially in the area of performance, where Mike thought he had drastically reduced the number of context switches and the number of server calls), but Alexandre remained dubious: "Reducing the number of server calls but making them more expensive is not necessarily a gain. Show me the numbers..."

Later on, Mike provided a second patch (derived from the first one, and making use of Alexandre's latest improvements to the server communication protocol). Here are the final results of a simple test program Mike wrote (against Wine 2001/03/05):

Wine version	average write time for "AT" command	average read time for response
vanilla	675 µsec	634 µsec
Mike's patch	362 µsec	322 µsec

Sounds like a big gain!!!
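
To put the table in perspective, the relative gain can be worked out directly from its four figures. The Python sketch below does only that arithmetic; the function name is illustrative and is not part of Mike's test program.

```python
def gain(before_usec, after_usec):
    """Return (speedup factor, fraction of time saved) for one measurement."""
    speedup = before_usec / after_usec
    saved = (before_usec - after_usec) / before_usec
    return speedup, saved

# Figures taken from the table above, in microseconds.
write_speedup, write_saved = gain(675, 362)
read_speedup, read_saved = gain(634, 322)

print(f"write: {write_speedup:.2f}x faster, {write_saved:.0%} less time")
print(f"read:  {read_speedup:.2f}x faster, {read_saved:.0%} less time")
```

In other words, both operations take a bit less than half as long with the patch as with the vanilla tree, at least for this one test program.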

However, this doesn't address one of Alexandre's favorite topics: getting rid of the service thread (its only use as of today is the handling of asynchronous requests). Using (SIGIO) signals should help to get rid of it.

So, this discussion is likely not finished yet. We'll keep you posted on any follow-up.

Wine's speed up (cont'd)

06 Mar 2001 00:00:00 -0800

Archive

Following <kcref>last week's discussions</kcref>, David Howells, Alexandre Julliard and Gavriel State resumed their exchanges.

David reiterated his main gripe, the slow speed of file access: "Every Read/WriteFile goes to the wineserver to convert the handle into a file descriptor and to check for locking. The FD is then passed back over a UNIX domain socket, used once and then closed."

Alexandre Julliard explained that this had just been improved: the file descriptor is now transferred only once. All subsequent accesses only check whether the file descriptor on the client's side is still valid, which reduces the complexity and the length of the server call (but not the number of calls).
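
The idea of transferring the descriptor once and then only revalidating it can be sketched in a few lines of Python. This is a toy, single-process model under stated assumptions, not Wine's actual protocol: the one-byte replies ("F" for "here is the fd", "K" for "your cached fd is still valid"), the handle value 4 and all function names are invented for illustration. It needs Python 3.9 or later on a Unix system for socket.send_fds and socket.recv_fds.

```python
import os
import socket
import tempfile

# One UNIX domain socket pair stands in for the client/wineserver connection.
client_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Hypothetical server state: handle -> fd, plus the handles already sent out.
tmp = tempfile.TemporaryFile()
tmp.write(b"hello wine")
tmp.flush()
server_handles = {4: tmp.fileno()}
already_sent = set()

def server_step():
    """Serve one request: send the fd the first time, a short 'K' afterwards."""
    handle = int.from_bytes(server_sock.recv(4), "little")
    if handle in already_sent:
        server_sock.sendall(b"K")
    else:
        socket.send_fds(server_sock, [b"F"], [server_handles[handle]])
        already_sent.add(handle)

client_fds = {}                       # client-side cache: handle -> fd
stats = {"calls": 0, "transfers": 0}

def get_fd(handle):
    """Still one server call per access, but the fd itself travels only once."""
    client_sock.sendall(handle.to_bytes(4, "little"))
    server_step()                     # in real life the server is another process
    data, fds, _flags, _addr = socket.recv_fds(client_sock, 1, 1)
    stats["calls"] += 1
    if data == b"F":
        client_fds[handle] = fds[0]
        stats["transfers"] += 1
    return client_fds[handle]

def read_file(handle, size, offset):
    return os.pread(get_fd(handle), size, offset)

print(read_file(4, 5, 0), read_file(4, 4, 6), stats)
```

After two reads the counters show two calls but a single descriptor transfer, which matches the point made above: the calls get cheaper, not fewer.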

The latency of a Wine server call is rather high, as David explained: "Context switching is the main element of it. Going to the wineserver and back again just for a ReadFile() call or a Wait*() function incurs a fairly serious penalty (particularly on an X86, I think). Plus there's no requirement for the kernel to pass the remains of your timeslice to the wineserver and back again." Since a context switch also implies flushing the CPU caches, mucking around with the MMU and running the scheduling algorithms, this can explain some of the latency.

However, Alexandre thinks that it should be possible to improve that with a small kernel hack: "It will never be as fast as doing everything in the kernel of course, but it may just be fast enough to avoid the need to reimplement the whole server." He added: "[...] and that we are doing more than two switches (though I haven't proved it), which is why I think there is a margin for improvement. You'll obviously always have the context switch cost unless everything is in the kernel."

By "a small kernel hack", Alexandre means "having a specialized fifo, a network protocol, an ioctl, etc. Basically any mechanism that ensures that we do the strict minimum number of context switches and schedule() calls for a server call. And probably also a way to transfer chunks of memory from the client address space so that we don't need the shared memory area." David had already suggested a new protocol (AF_WINE) which could fit nicely into this category (and would also allow the internal API to be used on non-Linux platforms, although the kernel module would have to be rewritten).

David also asked Alexandre how he plans to handle locking for Read/WriteFile: "Cache it locally? It is unfortunate, but you can't really make use of UNIX file locking, since this is mostly advisory and as such doesn't actively stop read/write calls." Alexandre quickly replied: "Yes, we'll need to store the locks in the server and check them before each read/write (and probably also release them afterwards if necessary). There may be some optimizations possible, but we should probably do it the easy way first." This would, of course, require some more server calls.
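
A minimal sketch of "the easy way", assuming the server keeps a plain list of byte-range locks and is consulted before every read or write. Everything here is hypothetical (class name, method names, owner identifiers), and the model is deliberately simplified: all locks are treated as exclusive, whereas Windows also has shared locks.

```python
class LockTable:
    """Server-side table of byte-range locks, checked before each I/O."""

    def __init__(self):
        self.locks = []  # list of (owner, start, end), end exclusive

    def _conflicts(self, owner, start, length):
        end = start + length
        # Two ranges overlap when each one starts before the other ends.
        return any(o != owner and s < end and start < e
                   for o, s, e in self.locks)

    def lock(self, owner, start, length):
        """Take a lock unless another owner already holds an overlapping one."""
        if self._conflicts(owner, start, length):
            return False
        self.locks.append((owner, start, start + length))
        return True

    def unlock(self, owner, start, length):
        self.locks.remove((owner, start, start + length))

    def check_io(self, owner, offset, length):
        """Would a read/write of this range by this owner be allowed?"""
        return not self._conflicts(owner, offset, length)


table = LockTable()
table.lock("process A", 0, 100)
print(table.check_io("process B", 50, 10))   # overlaps A's lock
print(table.check_io("process B", 100, 10))  # just past A's lock
```

Because the check lives in the server rather than in the kernel's advisory locks, a conflicting read or write can actually be refused, at the price of the extra server calls mentioned above.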

Later on, Gavriel explained that Alexandre would be unlikely to accept a huge patch all at once, and that he would rather see an incremental approach. Alexandre answered, and also gave some directions for integrating into Wine a kernel module such as the one David is working on: "The kernel module itself may be hard to do incrementally, but you should really consider reusing the existing server API so that your module can be plugged in easily. For instance your module entry points should be the same as the server requests, and use the same request structures."

As a reminder, David used the INT 0x2E trap (as any NT system does) to hook the kernel module up to the Wine code, putting more into the Linux kernel than Wine currently does with its wineserver. However, this introduces another API into Wine, and makes it quite difficult to maintain the two APIs (INT 0x2E and the wineserver's).

Alexandre explained what he had in mind a bit more clearly: "I'm not suggesting keeping the current socket stuff, just reusing the structures. So basically instead of passing the address of the stack arguments (which is really ugly IMO) to your ioctl, you pass one of the server request structures. This allows your changes to be localized to wine_server_call and doesn't require changing any of the routines that make server calls. Obviously you'd need some more changes for a few calls like ReadFile/WriteFile, but most operations could switch to your mechanism without needing any change. You simply cannot require people to recompile all of Wine to use your module."
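
The design Alexandre describes, one request structure and one call site with an interchangeable transport behind it, can be illustrated with a small Python model. This is only a sketch of the layering: the request type, the two backend classes and set_backend are invented names, and both backends are stubs that run in-process rather than talking to a real wineserver or kernel module.

```python
from dataclasses import dataclass

@dataclass
class CloseHandleRequest:      # hypothetical request structure
    handle: int
    status: int = -1           # reply field, filled in by the backend

def serve_close_handle(state, req):
    """Shared request semantics: status 0 on success, 1 for an unknown handle."""
    req.status = 0 if state.pop(req.handle, None) is not None else 1

HANDLERS = {CloseHandleRequest: serve_close_handle}

class SocketBackend:
    """Stand-in for the existing socket path to the wineserver."""
    def __init__(self, state):
        self.state = state
    def submit(self, req):
        HANDLERS[type(req)](self.state, req)

class KernelModuleBackend:
    """Stand-in for an ioctl into a kernel module taking the same structure."""
    def __init__(self, state):
        self.state = state
    def submit(self, req):
        HANDLERS[type(req)](self.state, req)

_backend = None

def set_backend(backend):
    global _backend
    _backend = backend

def server_call(req):
    """The single place that knows which transport is in use."""
    _backend.submit(req)
    return req.status

def close_handle(handle):
    """A caller: identical whichever backend is plugged in."""
    return server_call(CloseHandleRequest(handle)) == 0

set_backend(SocketBackend({4: "file"}))
print(close_handle(4))
set_backend(KernelModuleBackend({8: "event"}))
print(close_handle(8))
```

The point of the layering is that close_handle never changes: swapping the transport touches only server_call and the backend behind it, which is what would let a kernel module be tried without recompiling every caller.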

David also pointed out some strange issues with the Wine loader. After some discussion, it turned out that the alignment required by mmap changed between Linux 2.2 and 2.4. Wine had made the following assumption: "Page alignment is needed for the address in memory, not for the offset inside the file on disk; since section virtual addresses in PE files are always page-aligned the memory address is never a problem. The only problem comes from the alignment of the data inside the PE file, and this is where we only need block-size alignment to make mmap possible." David also proposed some enhancements for the Linux 2.4 kernel.
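
The alignment question comes down to a simple check per PE section, sketched below in Python. The function and its parameters are hypothetical, and the required file-offset alignment is deliberately left as a parameter, since the discussion does not spell out the exact rule each kernel version enforces. Falling back to a plain read when the offset is not suitably aligned is likewise an assumption made for illustration.

```python
def plan_section_load(file_offset, virtual_address,
                      page_size=4096, offset_alignment=4096):
    """Decide how a PE section could be brought into memory.

    Returns "mmap" when the file offset satisfies the required alignment,
    and "read" (copy the data instead of mapping it) otherwise.
    """
    if virtual_address % page_size != 0:
        raise ValueError("section virtual address is not page-aligned")
    return "mmap" if file_offset % offset_alignment == 0 else "read"

# A section stored at file offset 0x400 and mapped at virtual address 0x1000.
print(plan_section_load(0x400, 0x1000, offset_alignment=512))   # block-size rule
print(plan_section_load(0x400, 0x1000, offset_alignment=4096))  # page-size rule
```

The same section is mappable under a block-size rule but not under a page-size rule, which is the kind of difference that can make a loader behave differently from one kernel to the next.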

As a (temporary) conclusion, the optimization of the Wine architecture is still under heavy discussion. Many avenues are open, and the potential benefits are still not 100% clear. On the bright side, there's still lots of room for improvement.

World Wine News