* Optimization question: as far as I can tell, both snippets below do exactly the same thing, yet one compiles to roughly 7x as much assembly as the other.

(Each one reads the newest 100 elements from a 256-element ring buffer.)

The version below compiles to 101 lines of assembly:

```
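    let foo = [0u16; 256];
    let start_index = 10;
    // Walk backwards from start_index, wrapping around, for 100 elements.
    for b in foo.iter().rev().cycle().skip(255 - start_index as usize).take(100) {
        // read_volatile is an unsafe fn, so the call needs an unsafe block.
        let _ = unsafe { (&raw const b).read_volatile() };
    }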
```

The version below compiles to 15 lines of assembly:

```
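    let foo = [0u16; 256];
    let start_index = 10;
    // Offset by 257: +256 keeps the range from underflowing, +1 because the upper bound is exclusive.
    let end_index = start_index + 257 - 100;
    let start_index = start_index + 257;

    for i in (end_index..start_index).rev() {
        // Masking with 0xFF wraps the index into 0..=255.
        let b = foo[i & 0xFF];
        // read_volatile is an unsafe fn, so the call needs an unsafe block.
        let _ = unsafe { (&raw const b).read_volatile() };
    }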
```
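To back up the "same thing" claim, here is a rough host-side sketch (not the code I put on godbolt) that fills the buffer with its own indices and compares the elements each version visits. The names `indexed_buffer`, `iter_version` and `masked_version` are made up for this check, and it assumes `start_index` is in `0..256`.

```rust
const LEN: usize = 256;
const COUNT: usize = 100;

/// Hypothetical helper: a buffer where each element holds its own index.
fn indexed_buffer() -> [u16; LEN] {
    std::array::from_fn(|i| i as u16)
}

/// Elements visited by the iterator-chain version, in order.
fn iter_version(start_index: usize) -> Vec<u16> {
    let foo = indexed_buffer();
    foo.iter()
        .rev()
        .cycle()
        .skip(255 - start_index)
        .take(COUNT)
        .copied()
        .collect()
}

/// Elements visited by the masked-index version, in order.
fn masked_version(start_index: usize) -> Vec<u16> {
    let foo = indexed_buffer();
    let end_index = start_index + 257 - COUNT;
    let start_index = start_index + 257;
    (end_index..start_index).rev().map(|i| foo[i & 0xFF]).collect()
}

fn main() {
    // Compare the two versions for every valid starting position.
    for start_index in 0..LEN {
        assert_eq!(iter_version(start_index), masked_version(start_index));
    }
    println!("both versions visit the same elements");
}
```

This only confirms the two loops read the same elements in the same order; it says nothing about why the generated code differs in size.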

Godbolt compiler flags:

```
-C opt-level=z --target=thumbv6m-none-eabi
```

Is this reasonable? I feel like I must be missing something, as if I were implicitly asking for bounds checking or something similar.