"sub eax, 128" -> "add eax, -128". What's the improvement here?

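It's a space optimization. Shorter code can also be faster code, because it takes up less room in the instruction cache. The trick is that x86 has a short encoding for immediates that fit in a signed byte (-128 to +127), which the CPU sign-extends to the full operand size: -128 fits in a signed byte; +128 does not. Here are some examples, along with their x86 machine code representation: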

sub eax,+128    2D 80 00 00 00
add eax,-128    83 C0 80
sub ebx,+128    81 EB 80 00 00 00
add ebx,-128    83 C3 80
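
The selection rule behind those encodings can be sketched in a few lines of Python. This is only an illustration of the size arithmetic, assuming 32-bit operand size and a register destination; the function names (`fits_imm8`, `encoded_length`, `shortest_form`) are made up for this example and do not come from any assembler or compiler.

```python
def fits_imm8(value: int) -> bool:
    """True if value can be encoded as a sign-extended 8-bit immediate."""
    return -128 <= value <= 127


def encoded_length(value: int, reg_is_eax: bool) -> int:
    """Byte length of a 32-bit `add`/`sub reg, imm` instruction."""
    if fits_imm8(value):
        return 3  # opcode 83 + ModRM byte + imm8
    # imm32 forms: EAX has a one-byte opcode with no ModRM byte (e.g. 2D),
    # other registers need opcode 81 + ModRM byte.
    return 5 if reg_is_eax else 6


def shortest_form(op: str, value: int, reg_is_eax: bool):
    """Choose between `op reg, value` and the opposite op with -value.

    On a tie the original operation is kept.
    """
    flipped = "add" if op == "sub" else "sub"
    candidates = [(op, value), (flipped, -value)]
    return min(candidates, key=lambda c: encoded_length(c[1], reg_is_eax))
```

For example, `shortest_form("sub", 128, reg_is_eax=False)` picks `("add", -128)`, saving three bytes (6 down to 3), while `shortest_form("sub", 5, reg_is_eax=False)` leaves the instruction alone. One caveat the sketch ignores: the two forms produce the same result but leave the carry flag inverted, so the swap is only safe when nothing afterwards reads CF.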