I've got a weird code-generation/optimization issue. Here's the code involved:

```
let post_bytes = rx.message_buf.len() - sync - 1;

#[cfg(not(feature = "use-copy-within"))]
for i in 0..post_bytes {
    rx.message_buf[i] = rx.message_buf[sync + 1 + i];
}

#[cfg(feature = "use-copy-within")]
rx.message_buf.copy_within(sync + 1.., 0);

rx.message_buf.truncate(post_bytes);
```

If I build that code with the `use-copy-within` feature enabled, the `.text` segment grows by 863 bytes on thumbv6m, and by a similar amount on thumbv7m. This doesn't make any sense to me, as `copy_within` should just be a `memmove`, which is roughly equivalent to the for-loop.

If I build similar code on amd64, then `copy_within` results in a small `.text` size decrease, which doesn't surprise me.
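For anyone who wants to reproduce this outside my project, here is a minimal standalone sketch of the two variants as separate functions. It assumes the buffer behaves like a `Vec<u8>` (the real `rx.message_buf` may be a different type), and the function names `shift_with_loop` and `shift_with_copy_within` are made up for illustration. Both should leave the buffer holding only the bytes after index `sync`.

```rust
/// Hypothetical helper: discard everything up to and including index `sync`
/// by copying the tail forward one byte at a time.
/// Assumes `sync < buf.len()`; otherwise the subtraction underflows.
pub fn shift_with_loop(buf: &mut Vec<u8>, sync: usize) {
    let post_bytes = buf.len() - sync - 1;
    for i in 0..post_bytes {
        buf[i] = buf[sync + 1 + i];
    }
    buf.truncate(post_bytes);
}

/// Hypothetical helper: same effect, but using `copy_within` on the slice.
/// Assumes `sync < buf.len()`, as above.
pub fn shift_with_copy_within(buf: &mut Vec<u8>, sync: usize) {
    let post_bytes = buf.len() - sync - 1;
    // Move the bytes after `sync` to the start of the buffer.
    buf.copy_within(sync + 1.., 0);
    buf.truncate(post_bytes);
}
```

Building each function on its own for the thumb targets and comparing the resulting `.text` sizes should show whether the growth comes from the `copy_within` call itself or from something else in the surrounding code.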