Re: [Mingw-users] [OFF-TOPIC] MinGW with SIMD (SSE, etc)



Dustin McCartney wrote:
>> Dustin,
>>
>> If you really want to take advantage of aligned data, you should 
>> allocate all the data you will use with SSE on the C/C++ side, then just 
>> send a pointer via a FloatBuffer/ByteBuffer to Java. That way, you only 
>> have to work with pointers between the two, and they will always point to 
>> aligned data. Any POD structure that has an intrinsic variable (union, 
>> struct) will be aligned automatically by the compiler even if you use 
>> new/malloc, so I think that's how you will get the most performance. 
>> That's what I do in my C++ code, and I have never had any issue using movaps.
>>
>> I don't know how to send that pointer via FloatBuffer to Java as I have 
>> never done that before but I think it's not that difficult.
>>
>> best,
>>
>> Rafael
>>     
>
> Actually, I am not sharing any of the data that uses SSE with Java via
> JNI.  I am using Java simply as a "controller" that tells the C/C++ code
> when to do what (which the Java controller can invoke in concurrent
> threads).
>
> My current problem is that I have a Vector3 class defined as such (with
> defines for MinGW/GCC):
>
> #define PFX_EXPORT __stdcall
> #define ALIGN16_BEG
> #define ALIGN16_END __attribute__ ((aligned(16)))
> #define ALIGN_STACK __attribute__ ((force_align_arg_pointer))
>
> struct PFX_EXPORT Vector3 ALIGN16_BEG
> {
>   union
>   {
>     ALIGN16_BEG v4sf v ALIGN16_END;
>     struct ALIGN16_BEG 
>     {
>         float x, y, z, w;
>     } ALIGN16_END;
>   } ALIGN16_END;
>
>   PFXAPI Vector3() ALIGN_STACK {}
>
>   PFXAPI Vector3(bool bInit) ALIGN_STACK
>   {
>     if (bInit)
>     {
>       v = _mm_setzero_ps();
>       _mm_empty();
>     }
>   }
>
>   PFXAPI Vector3(const Vector3 & other) ALIGN_STACK
>   {
>     v = other.v;
>     _mm_empty();
>   }
>
>   PFXAPI Vector3(float fX, float fY, float fZ) ALIGN_STACK
>   {
>     v = _mm_set_ps( fX, fY, fZ, 0.0f );
>     _mm_empty();
>   }
>   ...
>   float PFXAPI Length() const ALIGN_STACK
>   {
>     Vector3 tmp;
>     tmp.v = _mm_mul_ps( v, v );
>     _mm_empty();
>     return sqrtf( tmp.x + tmp.y + tmp.z );
>   }
>   ...
> };
>
> I use Vector3 in various other objects such as:
> struct PFX_EXPORT Foo
> {
>   Vector3 m_v3Test;
> };
>
> SegFaults occur when the Vector3 object functions (such as the
> constructors) are called, as the Vector3 member variable "v" is not
> aligned on 16 byte boundaries.  This has been driving me crazy.  :-P
>
> Dustin
>
>
>   

The problem is that a function cannot be __stdcall if you force its 
parameters to be 16-byte aligned (and vice versa). There is no way 
around this, because you are changing the calling convention itself. It 
only works if every binary (including the Java runtime, since it calls 
your functions) is compiled to use the same convention. You would only 
benefit from a 16-byte aligned stack if you passed vector data by value 
to the functions that operate on it. If you use references or pointers, 
there is no need for that. Local data that has the aligned attribute 
will be aligned regardless of the alignment of the function parameters 
(the arg pointer you are aligning).
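
A minimal sketch of the by-reference approach described above, assuming a 
compiler that supports the GCC vector extensions (GCC or Clang). The names 
is_aligned16, add and sum_of_first_lanes are hypothetical and exist only for 
illustration; they are not part of anyone's code in this thread.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang vector extension: four packed floats, 16 bytes wide.
typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Operands arrive by reference, so only addresses cross the call boundary.
// No vector data is copied into the argument area of the stack.
inline void add(const v4sf& a, const v4sf& b, v4sf& out) {
    out = a + b;
}

// Hypothetical demo: the locals take their alignment from their type,
// not from how this function's own arguments were passed.
inline float sum_of_first_lanes() {
    v4sf a = {1.0f, 2.0f, 3.0f, 0.0f};
    v4sf b = {4.0f, 5.0f, 6.0f, 0.0f};
    v4sf r;
    add(a, b, r);
    assert(is_aligned16(&a) && is_aligned16(&b) && is_aligned16(&r));
    return r[0] + r[1] + r[2];
}
```

Whether the locals really end up aligned still depends on the compiler and 
target, so the assert is worth keeping in debug builds.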

This is what I did in my code:

typedef float __attribute__((mode(V4SF))) __attribute__((aligned(16))) 
__m128;

and in my Vector class:

class Vector {
    union {
        __m128 vector;
        struct {
            float x;
            float y;
            float z;
            float w;
        };
        float data[4];
    };

// ... class methods, never taking vectors by value
};
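
To check that a class like this gets its alignment from the vector member, 
something along these lines could be used. It is only a sketch: it uses the 
GCC vector_size extension instead of the mode(V4SF) typedef, makes the union 
public for brevity, and relies on GNU extensions (anonymous structs, reading a 
union member other than the one last written). Foo and is_aligned16 are 
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef float v4sf __attribute__((vector_size(16)));

class Vector {
public:
    union {
        v4sf vector;                    // forces 16-byte alignment of the union
        struct { float x, y, z, w; };   // anonymous struct: GNU extension
        float data[4];
    };

    Vector() { v4sf zero = {0.0f, 0.0f, 0.0f, 0.0f}; vector = zero; }
    Vector(float fx, float fy, float fz) {
        v4sf init = {fx, fy, fz, 0.0f};
        vector = init;
    }

    // Methods take other vectors by reference, never by value.
    Vector& operator+=(const Vector& other) {
        vector += other.vector;
        return *this;
    }
};

// Hypothetical containing type: the compiler pads after `tag` so that
// the Vector member still starts on a 16-byte boundary.
struct Foo {
    char tag;
    Vector m_v3Test;
};

inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

static_assert(alignof(Vector) == 16, "Vector inherits the vector alignment");
static_assert(sizeof(Vector) == 16, "four floats, no extra padding");
static_assert(offsetof(Foo, m_v3Test) % 16 == 0, "member is padded into place");
```

The static_asserts only describe layout. An object placed in memory that is 
itself misaligned (for example a buffer from an allocator with weaker 
guarantees) can still end up off a 16-byte boundary.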

I'm using g++ 3.4.5 and I didn't have any problems. If I need a 
temporary vector only to work with the SSE functions, I just use:

register __m128 temp;

and g++ generates very good code that uses only the SSE registers (the MS 
compiler is dumb: it likes to copy data every time, so I had to write the 
assembly code myself).
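
As an illustration of the temporary-vector idea, here is a small sketch of a 
length computation. It assumes the GCC vector extensions, and length3 is a 
hypothetical name. The register keyword is left out because it was deprecated 
in C++11 and removed in C++17; an optimizing compiler will typically keep such 
a temporary in a register anyway.

```cpp
#include <cassert>
#include <cmath>

typedef float v4sf __attribute__((vector_size(16)));

// Hypothetical helper: Euclidean length of the first three lanes.
// The operand is taken by reference, and the temporary is a plain local.
inline float length3(const v4sf& v) {
    v4sf temp = v * v;   // lane-wise squares; the fourth lane is ignored below
    return std::sqrt(temp[0] + temp[1] + temp[2]);
}
```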

I don't know if that will work with gcc 4, but as you can see you don't 
need to be so worried about alignment. The compiler will do that for you.

Send me a private message if you would like to talk more about that, 
because this is off-topic.

Rafael
