The following changes have been adopted from previous discussions:
- Add an `ALTERNATE` mode to allow implementation-specific functions, such as pin multiplexing.
- Change the `SetClr` register to use interleaved set/clear bits (its width is now `2*pin_count`).
- Allow the `input_stages` parameter (default `2`) to be any non-negative integer.
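To illustrate the interleaved `SetClr` layout, here is a minimal software model. It assumes that bit `2*i` is the "set" bit and bit `2*i+1` is the "clear" bit for pin `i`, and that writing both (or neither) leaves the pin unchanged. The bit order and the both-bits behaviour are assumptions, and `setclr_word` and `apply_setclr` are hypothetical helpers, not part of any peripheral API.

```python
def setclr_word(pin_count, set_pins=(), clr_pins=()):
    """Build a SetClr write value (width 2*pin_count).

    Assumption: bit 2*i sets pin i, bit 2*i+1 clears pin i.
    """
    word = 0
    for i in set_pins:
        if not 0 <= i < pin_count:
            raise ValueError(f"pin {i} out of range")
        word |= 1 << (2 * i)
    for i in clr_pins:
        if not 0 <= i < pin_count:
            raise ValueError(f"pin {i} out of range")
        word |= 1 << (2 * i + 1)
    return word


def apply_setclr(output, word, pin_count):
    """Return the new Output register value after a SetClr write."""
    for i in range(pin_count):
        set_bit = (word >> (2 * i)) & 1
        clr_bit = (word >> (2 * i + 1)) & 1
        if set_bit and not clr_bit:
            output |= 1 << i
        elif clr_bit and not set_bit:
            output &= ~(1 << i)
        # Assumption: both or neither bits written leaves pin i unchanged.
    return output


# Set pins 0 and 2, clear pin 1 on a 4-pin peripheral (8-bit SetClr).
word = setclr_word(4, set_pins=[0, 2], clr_pins=[1])
print(bin(word), bin(apply_setclr(0b0010, word, 4)))
```

Interleaving keeps each pin's set and clear bits adjacent, so a single write can set some pins and clear others without a read-modify-write of the whole output register.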
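The relaxed `input_stages` constraint can be pictured with a cycle-level model. The sketch below assumes `input_stages` is the number of flip-flops in the synchronizer between a pin and the input register, so a change on the pin becomes visible after `input_stages` clocks, and `0` means the pin value passes straight through. `InputSynchronizer` is a hypothetical name used only for illustration.

```python
class InputSynchronizer:
    """Hypothetical cycle-level model of an N-stage input synchronizer."""

    def __init__(self, stages: int = 2):
        # The adopted change: any non-negative integer is accepted, including 0.
        if not isinstance(stages, int) or stages < 0:
            raise ValueError("input_stages must be a non-negative integer")
        self.stages = stages
        self._regs = [0] * stages

    def clock(self, pin_value: int) -> int:
        """Advance one clock edge and return the value seen by the input register."""
        if self.stages == 0:
            # Assumption: zero stages bypasses synchronization entirely.
            return pin_value
        # Shift the new sample into the first stage; the last stage is the output.
        self._regs = [pin_value] + self._regs[:-1]
        return self._regs[-1]


sync = InputSynchronizer(stages=2)
print([sync.clock(v) for v in [1, 1, 1]])
```

More stages lower the risk of metastability on asynchronous inputs at the cost of latency; allowing `0` suits inputs already synchronous to the peripheral clock.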