[NDR] Implement NdrClientCall2 and NdrServerCall2
David Laight
david at l8s.co.uk
Thu Dec 1 16:05:04 CST 2005
On Thu, Dec 01, 2005 at 11:09:29AM +0100, Alexandre Julliard wrote:
> Robert Shearman <rob at codeweavers.com> writes:
>
> > + "shrl $2, %ecx\n\t" /* divide by 4 */
> > + "rep movsl\n\t" /* Copy dword blocks */
> > + "movl %eax, %ecx\n\t"
> > + "andl $3, %ecx\n\t" /* modulus 4 */
> > + "rep movsb\n\t" /* Copy remainder */
>
> If the argument size is not a multiple of 4 you are in serious
> trouble...
Not only that, but the code above is not very efficient!
The setup time for a 'rep movsx' instruction is significant on many
modern CPUs, which makes the second 'rep movsb' particularly slow.
I'm not even sure what the break-even length for the first one is!
A sequence like this (give or take assembler syntax):
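    movl -4(%esi,%ecx), %eax    /* load the last 4 bytes of the source */
    movl %eax, -4(%edi,%ecx)    /* store them at the end of the destination */
    shrl $2, %ecx               /* divide by 4 */
    rep movsl                   /* copy dword blocks */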
should be better given a large enough %ecx (it must be at least 4,
since the first two instructions touch the last 4 bytes of the buffers).
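
For anyone who wants to check the idea without reading assembler, here is
a rough C sketch of the same trick: copy the (possibly overlapping) last
dword first, then copy len / 4 dwords from the start, so no byte-sized
remainder loop is needed. The function name copy_with_tail_dword is made
up for illustration and is not anything in Wine, and the sketch assumes
len >= 4 and non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Wine. Copies len bytes (len >= 4,
 * buffers must not overlap) with one 4-byte tail copy plus len / 4
 * dword copies, instead of a dword loop followed by a byte loop. */
static void copy_with_tail_dword(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint32_t word;
    size_t i;

    assert(len >= 4);

    /* Copy the last 4 bytes first. When len is not a multiple of 4,
     * this overlaps the final whole dword, which is harmless. */
    memcpy(&word, s + len - 4, sizeof(word));
    memcpy(d + len - 4, &word, sizeof(word));

    /* Copy len / 4 whole dwords from the start (the 'rep movsl' part).
     * memcpy is used for each dword to avoid unaligned-access issues. */
    for (i = 0; i < len / 4; i++) {
        memcpy(&word, s + 4 * i, sizeof(word));
        memcpy(d + 4 * i, &word, sizeof(word));
    }
}
```

With len = 7, for example, the tail copy covers bytes 3..6 and the single
dword copy covers bytes 0..3, so every byte is written and nothing past
the end of the buffer is touched.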
David
--
David Laight: david at l8s.co.uk