Arc Forumnew | comments | leaders | submitlogin
2 points by binx 5923 days ago | link | parent

Just leave the peephole stuff to gcc, it almost always does better than handcoded optimizer.

The CPS transformed code can be arbitrarily inlined, so a simple inliner without flow analysis can give you much efficiency for free.

2 points by almkglor 5923 days ago | link

And if the target isn't gcc?

For that matter my concern is with the expansion of PUSH and POP:

   PUSH(x); y = POP();


   *sp++ = x; y = *--sp;
Can gcc peephole the above, completely eliminating the assignment to the stack (which is unnecessary in our case after all)?

   y = x; //desired target
Without somehow informing gcc to the contrary, gcc will assume that writing to * sp is significant, even though in our "higher-level" view the assignment to * sp is secondary to transferring the data between two locations.


4 points by sacado 5923 days ago | link

Actually, I tried the above (tuning generated code so as to change something like :

to its semantic but obviously much faster equivalent :

  memcpy (stack, stack + 5, sizeof(obj) * 3); sp = stack + 3; END_JUMP(3);

  PUSH(x); if(POP)

  if (x)
Well, with full optimizations on gcc (-O3), it doesn't change anything (at least in execution time, I didn't compare generated machine codes). Wow, gcc seems really clever. Now that I know how hard it is to implement a compiler, I can only applaud :)


1 point by almkglor 5923 days ago | link

WOW. gcc must be real good then ^^.