Friday, April 25, 2014

A better digitalWrite for Arduino

Although I do a lot of AVR programming directly using avr-gcc, I sometimes use the Arduino IDE.  Some of the code I write could be useful to other people, so I'll try to write it in a way that also works easily in the Arduino IDE.  Sometimes I also want to use a library that is available for Arduino, but not as a standalone AVR library.

One of the common complaints about the Arduino framework is the poor performance of the digitalWrite function.  Jan Dolinay has done an analysis of digitalWrite which shows just how bad it is.  As Jan points out though, if the pin passed to digitalWrite is not declared constant, the big version of digitalWrite is used.  If the pin number is never changed, then shouldn't there be a way to get the short and fast code, even though the pin number is not declared const?  My first idea was that some template tricks might work, but I couldn't come up with anything.

The breakthrough came when I read about link time optimization, and realized it is smart enough to recognize constant variables that are not declared const.  Another hurdle is that the Arduino IDE uses gcc 4.3, which doesn't support LTO.  That should be out of the way with an upcoming release of version 1.5 of the IDE that will include avr-gcc 4.8.1.  Knowing that, I started thinking about how the ideal digitalWrite function will work.

First off, it will need to be different from the technique used in the Wiring framework.  Wiring uses __builtin_constant_p( pin ) to select the fast version when the pin is constant, and calls the slow version when it is not.  That test depends on the parameter being declared const, not on whether LTO figures out that the parameter is constant in practice.  I couldn't tell this from the gcc documentation, but the results of a test program I wrote were clear.  So my goal is to write a simple digitalWrite that is easy for the compiler to optimize.  Here's what I'm starting with:
void digitalWrite(uint8_t pin, uint8_t val)
{
    if (val & 0x1)
        (IOPORT |= (1<<pin));     // bit 0 of val set: drive the pin high
    else
        (IOPORT &= ~(1<<pin));    // bit 0 of val clear: drive the pin low
}

It's a simplified version to start with - later I'll add macros for IOPORT similar to those used in the Wiring framework.  I had already figured out that LTO will optimize a port assignment to a single sbi, so it was no surprise when:
int clkPin = 5;

void main(void)
{
    digitalWrite(clkPin, 1);
}

compiled to a single sbi instruction.  While the pin used in a sketch is almost always known at compile time, the value might not be.  One example would be reading a sensor, then writing the bits to a pin attached to an ASK/OOK transmitter.  To ensure the compiler won't know what the value is, I read it from an IO register.  Here's the code:
int main(void)
{
    uint8_t data = GPIOR0;
    digitalWrite(3, data);    // pin 3, as in the sbi/cbi operands of the listing
    GPIOR1 = data;
}
Which compiles to:
00000030 <main>:
  30:   81 b3           in      r24, 0x11       ; 17
  32:   80 ff           sbrs    r24, 0
  34:   02 c0           rjmp    .+4             ; 0x3a <main+0xa>
  36:   c3 9a           sbi     0x18, 3 ; 24
  38:   01 c0           rjmp    .+2             ; 0x3c <main+0xc>
  3a:   c3 98           cbi     0x18, 3 ; 24
  3c:   82 bb           out     0x12, r24       ; 18
  3e:   08 95           ret

If the optimizer were a bit better, the digitalWrite code could be compiled to 4 instructions instead of 5: sbrs, cbi, sbrc, sbi.  I played with different ways of writing the if condition, but the compiler always generated 5 instructions.  The code is still good (5 or 6 cycles), but if anyone knows how to get it compiled to 4 instructions, please leave a comment.
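
One way to get that four-instruction sequence today is to write it by hand with inline assembly. This sidesteps the optimizer rather than improving it, so treat the following as a rough sketch and not as a fix. The helper name writePB3 is hypothetical, the port (PORTB) and bit (3) are hard-coded to match the listing above, and the non-AVR branch is only a stand-in I added so the logic can be checked on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>

/* Hypothetical helper: copy bit 0 of val to PORTB bit 3 in four instructions. */
static inline void writePB3(uint8_t val)
{
    __asm__ __volatile__(
        "sbrs %[v], 0      \n\t"   /* skip the cbi if bit 0 of val is set   */
        "cbi  %[port], 3   \n\t"   /* bit 0 clear: drive the pin low        */
        "sbrc %[v], 0      \n\t"   /* skip the sbi if bit 0 of val is clear */
        "sbi  %[port], 3   \n\t"   /* bit 0 set: drive the pin high         */
        :
        : [v] "r" (val), [port] "I" (_SFR_IO_ADDR(PORTB))
    );
}
#else
/* Host-side stand-in (an assumption for testing): a plain variable plays PORTB. */
static volatile uint8_t PORTB;

static inline void writePB3(uint8_t val)
{
    if (val & 0x1)
        PORTB |= (uint8_t)(1u << 3);
    else
        PORTB &= (uint8_t)~(1u << 3);
}
#endif
```

On the classic AVR cores, where sbi and cbi take two cycles each, this sequence should take 5 cycles whichever way the bit goes: one skip instruction costs 2 cycles, the other costs 1, and exactly one of cbi or sbi executes. The price is that the pin is baked into the assembly, which is exactly what the C version is trying to avoid.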

In my next post I'll add the macros for mapping the pin to a port and mask, and see what code is generated when pin number is determined at runtime.
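
For comparison, the Wiring-style dispatch described earlier can be sketched roughly as follows. This is not Wiring's actual source: the names digitalWriteDispatch, digitalWriteFast and digitalWriteSlow are hypothetical, IOPORT is a plain variable standing in for a real port register so the sketch runs on a PC, and the two counters exist only to show which path a call took.

```c
#include <stdint.h>

/* Stand-in for a real I/O port register (an assumption for this sketch). */
static volatile uint8_t IOPORT;

/* Counters recording which path each call took (illustration only). */
static unsigned fastCalls, slowCalls;

/* Fast path: with a constant pin the mask folds at compile time, which is
   what lets avr-gcc emit a single sbi or cbi on a real port. */
static inline void digitalWriteFast(uint8_t pin, uint8_t val)
{
    fastCalls++;
    if (val & 0x1)
        IOPORT |= (uint8_t)(1u << pin);
    else
        IOPORT &= (uint8_t)~(1u << pin);
}

/* Slow path: stands in for the larger runtime pin-to-port lookup. */
static void digitalWriteSlow(uint8_t pin, uint8_t val)
{
    slowCalls++;
    if (val & 0x1)
        IOPORT |= (uint8_t)(1u << pin);
    else
        IOPORT &= (uint8_t)~(1u << pin);
}

/* Pick the path from what the compiler knows about pin at the call site. */
#define digitalWriteDispatch(pin, val)                          \
    (__builtin_constant_p(pin) ? digitalWriteFast((pin), (val)) \
                               : digitalWriteSlow((pin), (val)))
```

Calling digitalWriteDispatch(3, 1) with a literal pin takes the fast path, while passing a pin the compiler cannot treat as a constant (a volatile variable, for example) falls through to the slow one. The decision is made where the macro is expanded, from what the compiler can see at that call site.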

Starting next week (May 2014), the beta 4.8.1 nightly build of the Arduino IDE should include LTO in the build.  Even without changes to the digitalWrite function, most code should be smaller and faster.
