Why does printf print random values with float and integer format specifiers

I wrote a simple piece of code on a 64-bit machine

#include <stdio.h>

int main() {
    printf("%d", 2.443);
}

As I understand it, this is how the compiler will behave. It will identify the second argument as a double, so it will push 8 bytes onto the stack, or possibly just pass the value in a register. %d expects a 4-byte integer value, so printf prints some garbage value.

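What is interesting is that the value printed changes every time I execute this program. So what is happening? I expected it to print the same garbage value every time, not a different one each time.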

Comments
  • 宇智波~
    宇智波~ replied

    This answer attempts to address some of the sources of the variation. It is a follow-up to Daniel Fischer's answer and some comments on it.

    As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.

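    Address space layout randomization (ASLR) is one: the operating system deliberately rearranges some memory at random to prevent malware from knowing which addresses to use. I do not know whether Linux 3.4.4-2 has this feature.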

    Another is environment variables. Your shell environment variables are copied into processes it spawns (and accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.

    There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library, rather than built into your executable file, then a call to printf likely actually results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (all the directories to get to the shared library and the shared library). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.

    So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.

    Here is an experiment: After the initial printf, insert another printf with this code:

    printf("%d\n", 2.443);
    int a;
    printf("%p\n", (void *) &a);
    

    The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.

    If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to a different part of your address space yields a better result.
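
    The ASLR half of this experiment can also be checked on its own, without the mismatched printf. The following is only a minimal sketch, not part of the experiment as described; the names report_addresses, local_var and static_var are made up for illustration.

    ```c
    #include <stdio.h>

    static int static_var;  /* static storage: data/BSS segment */

    /* Hypothetical helper: prints the address of one stack object and one
       static object to `out`. Returns 0 on success, -1 on output error. */
    int report_addresses(FILE *out)
    {
        int local_var = 0;  /* automatic storage: on the stack */

        if (fprintf(out, "stack:  %p\n", (void *)&local_var) < 0)
            return -1;
        if (fprintf(out, "static: %p\n", (void *)&static_var) < 0)
            return -1;
        return 0;
    }

    int main(void)
    {
        return report_addresses(stdout) == 0 ? 0 : 1;
    }
    ```

    Run it several times. If the stack address differs between runs, the stack is being randomized. The static address normally changes only when the executable is built as position-independent (PIE), which depends on the toolchain defaults.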

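    Naturally, none of this is useful for writing reliable programs. It is merely educational about how program loading and initialization work.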

  • zvelit
    zvelit replied

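    Of course, passing an argument that does not match the format is undefined behaviour, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system.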
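
    For contrast, here is a minimal sketch of the two well-defined alternatives: use a conversion that matches the argument (%f for a double), or convert the argument to the type the conversion expects. format_value is a hypothetical helper introduced only for this illustration.

    ```c
    #include <stdio.h>

    /* Hypothetical helper: formats `value` once as a double and once as an
       explicitly converted int. Returns snprintf's result (the length). */
    int format_value(char *buf, size_t size, double value)
    {
        /* %f matches the double; %d matches the int produced by the cast. */
        return snprintf(buf, size, "%f %d", value, (int)value);
    }

    int main(void)
    {
        char buf[64];

        format_value(buf, sizeof buf, 2.443);
        puts(buf);  /* prints "2.443000 2" */
        return 0;
    }
    ```

    Compiling the original program with warnings enabled (for example gcc -Wall) should also flag the %d/double mismatch at build time.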

    My setup differs from yours,

    Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)

    with gcc-4.6.2. But it is similar enough that it is reasonable to suspect the same mechanisms.

    Looking at the generated assembly (-O3, out of habit), the relevant part (main) is

    .cfi_startproc
    subq    $8, %rsp             # adjust stack pointer
    .cfi_def_cfa_offset 16
    movl    $.LC1, %edi          # move format string to edi
    movl    $1, %eax             # al = upper bound on the number of vector (xmm) registers used by this variadic call
    movsd   .LC0(%rip), %xmm0    # move the double to the floating point register
    call    printf
    xorl    %eax, %eax           # clear eax (return 0)
    addq    $8, %rsp             # adjust stack pointer
    .cfi_def_cfa_offset 8
    ret                          # return
    

    If I pass an int instead of the double, not much changes, but what does change is significant:

    movl    $47, %esi            # move int to esi
    movl    $.LC0, %edi          # format string
    xorl    %eax, %eax           # clear eax
    call    printf
    

    I have looked at the generated code for many variations of the types and number of arguments passed to printf, and consistently, the first double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, and the integer arguments (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d and then on the stack.

    So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.
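
    The reason printf cannot notice the mismatch is that a variadic function learns the argument types only from what the caller announces. The sketch below is not printf's real implementation: sum_args and its type string ('i' for int, 'd' for double) are made up for illustration. It fetches each argument with va_arg using whatever type the string names, so it is well-defined only when the string matches the arguments actually passed.

    ```c
    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical helper: adds up the variadic arguments described by
       `types`, where 'i' announces an int and 'd' announces a double. */
    double sum_args(const char *types, ...)
    {
        va_list ap;
        double total = 0.0;

        va_start(ap, types);
        for (const char *p = types; *p != '\0'; ++p) {
            if (*p == 'i')
                total += va_arg(ap, int);     /* caller must have passed an int */
            else if (*p == 'd')
                total += va_arg(ap, double);  /* caller must have passed a double */
        }
        va_end(ap);
        return total;
    }

    int main(void)
    {
        /* The type string "id" matches the arguments: one int, one double. */
        printf("%f\n", sum_args("id", 47, 2.5));  /* prints "49.500000" */
        return 0;
    }
    ```

    Calling sum_args("i", 2.443) would reproduce the situation in the question: the function would fetch an int that the caller never supplied.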

    Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.
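
    For comparison, if printf had read part of the double's own bytes, as the question assumed, the output would be identical on every run. A well-defined way to look at those bytes is sketched below; double_bits is a hypothetical helper, and the code assumes a 64-bit double.

    ```c
    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: returns the object representation of a double
       as a 64-bit unsigned integer (assumes sizeof(double) == 8). */
    uint64_t double_bits(double value)
    {
        uint64_t bits;

        assert(sizeof value == sizeof bits);
        memcpy(&bits, &value, sizeof bits);  /* copying bytes avoids aliasing problems */
        return bits;
    }

    int main(void)
    {
        uint64_t bits = double_bits(2.443);

        printf("all 64 bits: 0x%016" PRIx64 "\n", bits);
        printf("low 32 bits: 0x%08" PRIx32 "\n", (uint32_t)(bits & 0xffffffffu));
        return 0;
    }
    ```

    Both lines come out the same on every run, which is consistent with the observation that the changing value is not taken from the double at all.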