
The “hello world” breakpoint caper

May 25, 2005, by LinuxDevices Staff, from the LinuxDevices Archive

In this entertaining yet edifying essay, Viosoft Founder/CEO Hieu Tran sets out on a quest to debug a bug-report filed against his company's source-level debugger. The case leads Tran and his trusty Pomeranian toward a surprising, Sherlock Holmes-like epiphany illuminating the semantic assumptions made by compilers, debuggers, and, especially, users. Enjoy . . . !


Trust, but VERIFY

Recently, a user reported what appeared to be a bug in the Arriba Embedded Linux Edition source-level debugger. More exasperating was the fact that this “bug” allegedly occurred while stepping through a variation of a well-used application rumored to have been deployed on every computer in existence: the “HelloWorld” program. Chartered to investigate, I initially fingered GCC as the culprit, but later retracted the accusation. This trivial article shares my experience and implores readers to adopt the prudent Reaganian view of compilers and debuggers: trust, but verify.

Below is the modified version of “Hello World” that our user wished to debug. The program was compiled with a version of GCC for the ARM processor, with debug symbols and no optimization. Couldn't be simpler, or not: the user set a breakpoint at line 5, and expected it to hit ten times. It hit exactly once, and then the program ran to completion. This wasn't good.

1. #include <stdio.h>
2. main()
3. {
4. int i;
5. for (i = 0; i < 10; i++)
6. printf("Hello world (%d)\n", i);
7. }

A partial listing of the program, generated by objdump, is shown below. The first thing to notice is that the compiler does not generate code in the same sequential order as the source, even when optimization is turned off. This can be confusing when the user is stepping over one source line at a time, and sees the program execution “jumping” from one line to a previous line, and then back again for no obvious reason. But we're digressing.

main(): 
/tmp/hello.c:3
849c: e1a0c00d mov ip,sp
84a0: e92dd800 stmdb sp!,{fp, ip, lr, pc}
84a4: e24cb004 sub fp,ip, #4 ; 0x4
84a8: e24dd004 sub sp,sp, #4 ; 0x4
/tmp/hello.c:5
84ac: e3a03000 mov r3,#0 ; 0x0
84b0: e50b3010 str r3,[fp,#-16]
84b4: e51b3010 ldr r3,[fp,#-16]
84b8: e3530009 cmp r3, #9 ; 0x9
84bc: da000000 ble 84c4 <main+0x28>
84c0: ea000006 b 84e0 <main+0x44>
/tmp/hello.c:6
84c4: e59f001c ldr r0,[pc,#28] ; 84e8 <.text+0x14c>
84c8: e51b1010 ldr r1, [fp, #-16]
84cc: ebffffac bl 8384 <.text-0x18>
/tmp/hello.c:5
84d0: e51b3010 ldr r3, [fp, #-16]
84d4: e2833001 add r3, r3, #1 ; 0x1
84d8: e50b3010 str r3, [fp, #-16]
84dc: eafffff4 b 84b4 <main+0x18>
/tmp/hello.c:7

The breakpoint our user set at line 5 corresponds to address 0x84ac. The generated code at this address is explained below. Tread lightly as this could bore you to tears, but I promise a point to be made:

84ac: mov r3, #0 ; 0x0
Load 0 into r3 to initialize the loop index i to 0
84b0: str r3, [fp, #-16]
Store r3 into i's slot on the local stack
84b4: ldr r3, [fp, #-16]
Load i back from the stack; the loop test begins here
84b8: cmp r3, #9 ; 0x9
Compare the index with the terminating condition (i < 10)
84bc: ble 84c4 <main+0x28>
...branch to 0x84c4, or line 6 to do the printf, if i is less than or equal to 9
84c0: b 84e0
...branch beyond line 7 if i is greater than 9, effectively terminating the loop

Line 6 loads the arguments into r0 and r1, and calls printf.


84c4: e59f001c ldr r0, [pc, #28] ; 84e8 <.text+0x14c>
84c8: e51b1010 ldr r1, [fp, #-16]
84cc: ebffffac bl 8384 <.text-0x18>

Here is where things get interesting (and buggy, or so I thought):

84d0: ldr  r3, [fp, #-16]
Load index i from memory in local stack
84d4: add r3, r3, #1 ; 0x1
Increment it by 1
84d8: str r3, [fp, #-16]
Store it back into the local stack
84dc: b 84b4 <main+0x18>
Branch back to the start of the loop

The branch at 0x84dc does not take control back to the very beginning of the for-loop at address 0x84ac, where our breakpoint was set. Rather, it lands two instructions later, at 0x84b4, so the breakpoint hits once instead of the expected ten times. Aha: darn compiler bug. I elatedly emailed the finding to our user with this parting note: “You may want to report this problem to the ARM GCC maintainer. I expect that the correct behavior can be achieved by moving the line number information generated for line 5 two instructions forward to 0x84b4.” Case closed. I do good work.

The epiphany

Something about the previous conclusion bothered me. These guys (GCC developers) don't make mistakes; certainly not these types of trivial mistakes. For kicks, I ran the same program through the MIPS GCC compiler, and an inspection of the listing revealed similar results. Could this seemingly trivial problem be that pervasive? Conan, my Pomeranian, who would make a killer programmer were it not for his lack of opposable thumbs, pointed out that my recommendation to the user would have caused her to be ridiculed, and possibly shunned, by the ARM GCC maintainer and newsgroup. Had the line number information been moved two instructions forward to 0x84b4, the first breakpoint hit would have come after the loop index i was initialized to 0. That was not the intended semantics.

If you think about it, the code at line 5 performs two different L-value assignments, a fancy way of saying that it assigns a value to the variable i in two different places. While the initialization of the loop index occurs only once, its increment occurs each time through the loop, 10 times to be exact. So a breakpoint at line 5 is, at best, semantically confusing. Should the breakpoint be set before the first assignment, or after the first but before the second? The choice, in our case, is dictated by the line number information generated by GCC, which chose the former. With this finding, I promptly retracted my previous conclusion and instead recommended that the user set the breakpoint at line 6 if she wished for it to stop each time through the loop.

Conclusions

A dear ex-colleague of mine, Glen Geiss, once said: “Happiness is reality minus expectation.” Put another way, an informed user with a proper, well-adjusted expectation of his tools is a happy and productive user. The C-language for-loop is so well used that it's easy to overlook its subtleties, as our user and I did when putting the loop through a debugger. By shedding light on a trivial incident, this article hopes to create awareness in the embedded developer of the three-way semantic gap that seemingly exists among himself, his compiler, and his debugger.


About the author: Hieu T. Tran is the founder and chief grunt at Viosoft Corporation. He can be contacted at htran [at] viosoft.com.


 
This article was originally published on LinuxDevices.com and has been donated to the open source community by QuinStreet Inc. Please visit LinuxToday.com for up-to-date news and articles about Linux and open source.


