I've got a weird code-generation/optimization issue. Here's the code involved:
```rust
            let post_bytes = rx.message_buf.len() - sync - 1;
            #[cfg(not(feature = "use-copy-within"))]
            for i in 0..post_bytes {
                rx.message_buf[i] = rx.message_buf[sync + 1 + i];
            }
            #[cfg(feature = "use-copy-within")]
            rx.message_buf.copy_within(sync + 1.., 0);
            rx.message_buf.truncate(post_bytes);
```

If I build that code with the `use-copy-within` feature enabled, the `.text` section grows by 863 bytes on thumbv6m, and by a similar amount on thumbv7m. This doesn't make any sense to me: `copy_within` should just be a `memmove`, which is roughly equivalent to the `for` loop.
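
To make the equivalence I'm assuming concrete, here is a minimal standalone sketch of the two variants. It is an illustration, not the real code: the buffer is assumed to be a plain `Vec<u8>` (the actual type of `rx.message_buf` isn't shown above), and the function names `shift_with_loop` and `shift_with_copy_within` are made up for this example.

```rust
/// Loop variant: move every byte after index `sync` to the front of the
/// buffer, one element at a time, then drop the now-stale tail.
/// Assumes `sync < buf.len()`; otherwise the subtraction underflows.
fn shift_with_loop(buf: &mut Vec<u8>, sync: usize) {
    let post_bytes = buf.len() - sync - 1;
    for i in 0..post_bytes {
        buf[i] = buf[sync + 1 + i];
    }
    buf.truncate(post_bytes);
}

/// `copy_within` variant: the same move expressed as a single
/// overlapping-safe slice copy, followed by the same truncate.
/// Assumes `sync < buf.len()`, as above.
fn shift_with_copy_within(buf: &mut Vec<u8>, sync: usize) {
    let post_bytes = buf.len() - sync - 1;
    buf.copy_within(sync + 1.., 0);
    buf.truncate(post_bytes);
}
```

Given the same buffer and the same `sync` index, both functions should leave the buffer holding only the bytes that followed the sync byte, so I'd expect any size difference to come from the generated code rather than from what the two variants do.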

If I build similar code on amd64, `copy_within` instead gives a small `.text` size decrease, which doesn't surprise me.