> my exp so far is that embassy scales quite badly in regards of flash footprint and latency

that's surprising. the design
- Are you comparing Embassy async vs RTIC async, or vs RTIC raw interrupt handlers? 
- Have you followed the steps here? https://embassy.dev/book/#_how_can_i_optimize_the_speed_of_my_embassy_stm32_program