* Optimization question: to the best of my knowledge, the two snippets below do exactly the same thing (get the newest 100 elements from a 256-element ring buffer). However, one is roughly 7x bigger than the other.

  The version below takes 101 lines of assembly:

  ```
  let foo = [0u16; 256];
  let start_index = 10;
  for b in foo.iter().rev().cycle().skip(255-start_index as usize).take(100) {
      let _ = (&raw const b).read_volatile();
  }
  ```

  The version below takes 15 lines of assembly:

  ```
  let foo = [0u16; 256];
  let start_index = 10;
  let end_index = start_index + 257 - 100;
  let start_index = start_index + 257;
  for i in (end_index..start_index).rev() {
      let b = foo[i&0xFF];
      let _ = (&raw const b).read_volatile();
  }
  ```

  Godbolt flags:

  ```
  -C opt-level=z --target=thumbv6m-none-eabi
  ```

  Is this reasonable? I feel like I must be missing something, as if I were implicitly asking for bounds checking or something.
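
As a sanity check on the "both do the exact same thing" claim, here is a minimal sketch that compares the element indices the two loops visit. The helper names `iter_chain_indices` and `masked_indices` are hypothetical (they are not in the snippets above), and the sketch uses `Vec`, so it is meant to run on a host with `std` rather than on `thumbv6m-none-eabi`. It only checks the index arithmetic; it says nothing about why the generated code sizes differ.

```rust
/// Indices visited by the iterator-chain version (hypothetical helper).
/// Mirrors `foo.iter().rev().cycle().skip(255 - start_index).take(100)`,
/// but walks the indices 255, 254, ..., 0 instead of the elements.
fn iter_chain_indices(start_index: usize) -> Vec<usize> {
    (0..256usize)
        .rev()
        .cycle()
        .skip(255 - start_index)
        .take(100)
        .collect()
}

/// Indices visited by the masked-index version (hypothetical helper).
/// Mirrors `for i in (end_index..start_index).rev() { foo[i & 0xFF] }`.
fn masked_indices(start_index: usize) -> Vec<usize> {
    let end = start_index + 257 - 100;
    let start = start_index + 257;
    (end..start).rev().map(|i| i & 0xFF).collect()
}
```

With `start_index = 10`, both helpers yield the indices 10, 9, ..., 0 followed by 255, 254, ..., 167, so the two loops do read the same 100 slots in the same order.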