自写的简单memset无法在ARMv7上与-03 eabi gcc一起使用

我在c中编写了一个非常简单的memset,它在-O2之前都可以正常工作,但在-O3下却不能工作...

记忆集:

void * memset(void * blk, int c, size_t n)
{
    unsigned char * dst = blk;

    while (n-- > 0)
        *dst++ = (unsigned char)c;

    return blk;
}

...在使用-O2时编译为该程序集:

20000430 <memset>:
20000430:       e3520000        cmp     r2, #0                  @ compare param 'n' with zero
20000434:       012fff1e        bxeq    lr                      @ if equal return to caller
20000438:       e6ef1071        uxtb    r1, r1                  @ else zero extend (extract byte from) param 'c'
2000043c:       e0802002        add     r2, r0, r2              @ add pointer 'blk' to 'n'
20000440:       e1a03000        mov     r3, r0                  @ move pointer 'blk' to r3
20000444:       e4c31001        strb    r1, [r3], #1            @ store value of 'c' to address of r3, increment r3 for next pass
20000448:       e1530002        cmp     r3, r2                  @ compare current store address to calculated max address
2000044c:       1afffffc        bne     20000444 <memset+0x14>  @ if not equal store next byte
20000450:       e12fff1e        bx      lr                      @ else back to caller

这对我来说很有意义。我注释了这里发生的事情。

当我用-O3编译时,程序崩溃。我的内存集反复调用自己,直到它吞噬了整个堆栈:

200005e4 <memset>:
200005e4:       e3520000        cmp     r2, #0                  @ compare param 'n' with zero
200005e8:       e92d4010        push    {r4, lr}                @ ? (1)
200005ec:       e1a04000        mov     r4, r0                  @ move pointer 'blk' to r4 (temp to hold return value)
200005f0:       0a000001        beq     200005fc <memset+0x18>  @ if equal (first line compare) jump to epilogue
200005f4:       e6ef1071        uxtb    r1, r1                  @ zero extend (extract byte from) param 'c'
200005f8:       ebfffff9        bl      200005e4 <memset>       @ call myself ? (2)
200005fc:       e1a00004        mov     r0, r4                  @ epilogue start. move return value to r0
20000600:       e8bd8010        pop     {r4, pc}                @ restore r4 and back to caller

I can't figure out how this optimised version is supposed to work without any strb or similar. It doesn't matter if I try to set the memory to '0' or something else so the function is not only called on .bss (zero initialised) variables.

(1)这是一个问题。当函数由于n为零而没有提前退出时,此推送会不断重复,而没有匹配的弹出消息,如(2)所称。我用uart打印验证了这一点。同样,r2从未被触及过,那么为什么零比较会变成真实?

请帮助我了解这里的情况。编译器是否假设我可能不满足的先决条件?

背景:我在裸机项目中使用需要memset的外部代码,所以我自己滚动了代码。它仅在启动时使用一次,对性能没有要求。