Optimize QLineF::unitVector() on platforms with SSE2

While profiling the QGears2 graphics benchmark, QLineF::unitVector()
was identified as a CPU hotspot. In some cases it consumes up to 5%
of CPU time. The culprit is poorly vectorized code generated by the
GNU C++ compiler on platforms that support SSE2. Using SSE
intrinsics does not remedy this. The inline assembler introduced by
this commit replaces the poorly vectorized code with a more
efficient sequence. When applied to tag v4.7.1, it reduces the cycle
count for this function by approximately 20%, which yields an
approximately 6% increase in FPS for the benchmark.
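
For reference, the arithmetic this function performs can be sketched
in portable scalar C++ as shown below. This is only an illustration
of the computation (two subtractions, a square root, two divisions),
not the Qt source and not the inline assembler added by this commit.
The Line struct and unit_vector() name are hypothetical stand-ins
for QLineF and QLineF::unitVector(), and the assert on non-zero
length is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for QLineF: a segment from (x1, y1) to (x2, y2).
struct Line {
    double x1, y1, x2, y2;
};

// Scalar reference: returns a line with the same start point and
// direction as the input, but with length 1.
// Assumption of this sketch: the input line has non-zero length.
inline Line unit_vector(const Line &l)
{
    const double dx = l.x2 - l.x1;
    const double dy = l.y2 - l.y1;
    const double len = std::sqrt(dx * dx + dy * dy);
    assert(len > 0.0);
    return Line{l.x1, l.y1, l.x1 + dx / len, l.y1 + dy / len};
}
```

The dx/dy pair is the part a compiler may try to pack into a single
SSE2 register, which is where the code generation described above
matters.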