Sample ARM assembly example for WinCE :: 전산쟁이의 카피질

Source: http://blogs.arm.com/software-enablement/155-how-to-call-a-function-from-arm-assembler/

Translation: 빵빵빵

There may be mistranslations, so please do not trust this 100%, and do point out any errors.

How to Call a Function from ARM Assembler

Calling a Function from Assembler

Posted by ARM_DaveB,

26 February 2010 (ARM_DaveB is the original author.)

Once you move beyond short sequences of optimised ARM assembler, the next likely step will be to manage more complex, optimised routines using macros and functions. Macros are good for short repeated sequences, but often quickly increase the size of your code. As lower power and smaller code sizes are often closely tied, it is not long before you will need to make effective and efficient use of the processor by calling functions from your carefully hand-crafted code.

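Once you have written some short pieces of optimised ARM assembler, you will want to build more complex code using macros and functions. Macros are great for making repeated code look simple, but they can also bloat the code enormously, so take care. (Translator's note: a macro only presents a somewhat complex sequence as a single word. Using it many times looks compact on screen, but the assembler expands the full sequence at every use, so the code size inevitably grows.)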

Leaving, only to Return
Leaving, only to come back (the translator was unsure how to render this heading)

To start, here is a small example in ARM Assembler with one function calling another.

Now let's look at a simple ARM assembler example in which one function calls another.

CODE

.globl main
.extern abs
.extern printf

.text
output_str:
.ascii "The answer is %d\n\0"

@ returns abs(z)+x+y
@ r0 = x, r1 = y, r2 = z
.align 4
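do_something:
push {r4, lr}      @ save r4 (callee-saved) and the return address
add r4, r0, r1     @ r4 = x + y, kept safe across the call to abs
mov r0, r2         @ first argument of abs = z
bl abs             @ overwrites lr; the result comes back in r0
add r0, r4, r0     @ r0 = (x + y) + abs(z)
pop {r4, pc}       @ restore r4 and return by loading pc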

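main:
push {ip, lr}      @ ip is pushed only to keep the stack 64-bit aligned
mov r0, #1         @ x = 1
mov r1, #3         @ y = 3
mov r2, #-4        @ z = -4
bl do_something    @ r0 = abs(-4) + 1 + 3
mov r1, r0         @ second printf argument: the result
ldr r0, =output_str @ first printf argument: the format string
bl printf
mov r0, #0         @ return 0 from main
pop {ip, pc}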

The interesting instructions, at least when we are talking about the link register and the stack, are push, pop and bl. If you are familiar with other assembler languages, then I suspect push and pop are no mystery. They simply take the provided register list and push them onto the stack - or pop them off and into the provided registers. bl, as you may have guessed, is no more than branch with link, where the address of the next instruction after the branch is loaded into the link register lr. Once the routine we are calling has been executed, lr can be copied back to pc, which will enable the CPU to continue from the code after the bl instruction.

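When talking about the link register and the stack, the important instructions are push, pop and bl. If you are familiar with another assembly language (x86, the PC assembler, perhaps?), push and pop should look the same to you. They store the listed registers on the stack (push) or load values from the stack back into the listed registers (pop). As you may have guessed, bl stores the address of the next instruction in the link register (lr) and then branches to the target. When the called routine has finished, lr is copied into pc (the program counter, which holds the address of the instruction being executed), so that the CPU carries on with the instruction after the bl.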

In do_something we push the link register to the stack, so that we can pop it back off again to return, even though the call to abs will have overwritten the original contents of the link register. The program stores r4, because the ARM procedure call standard specifies that r4-r11 must be preserved between function calls and that the called function is responsible for that preservation. This means both that do_something needs to preserve the result of r0 + r1 in a register that will not be destroyed by abs, and that we must also preserve the contents of whichever register we use to hold that result. Of course, in this particular case we could have just used r3 if we knew that abs leaves it untouched, but r0-r3 are scratch registers that any called function is free to overwrite, so it is something that needs to be considered.

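=> In do_something, the link register (lr) is saved on the stack. Even though the call to abs in the middle overwrites lr, popping it on the way out restores the original return address. The program also saves r4, because the ARM procedure call standard says that r4-r11 must be preserved across function calls and that the called function is responsible for preserving them. So do_something has to keep the result of r0 + r1 in a register that abs will not destroy, and it also has to preserve whatever was in that register before. You can use r3 if you want to, but think carefully about whether that is really safe before you do.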

We push and pop the ip register, even though we do not have to preserve it, because the procedure call standard requires that the stack be 64-bit aligned. This gives a performance benefit when using the stack operations as they can take advantage of 64-bit data paths within the CPU.

=> Because the procedure call standard requires the stack to be 64-bit aligned, we push and pop the ip register even though its value does not need to be preserved.

We could just push the value instead; after all, if abs needs the register, then that is how it will preserve it. There is a minor performance case for pushing r4 rather than the value we know we will need, but the strongest argument is probably that just pushing and popping any registers you need at the start and end of the function makes for less error-prone and more readable code.

You will also notice that the 'main' function also pushes and pops the contents of lr. That is because while the main code may be the first thing in my code to be executed, it is not the first thing to be executed when my program is loaded. The compiler will insert calls to some basic setup functions before main is called, and to some final clean-up functions for when we exit.

The Special Case of Windows CE

Windows CE uses a technique known as Structured Exception Handling to unwind the stack when an exception occurs. This requires anyone writing assembler code to take notice of some additional restrictions when implementing for that OS. Coding examples are available on MSDN, and should be consulted, but the general idea is that there should be no changes to the value of sp other than as the very first and very last instructions in your function. If you perform a stack push or pop at any other point the virtual unwinder can cause your application some very non-virtual trouble.

Passing on

It is almost certainly worth your time becoming familiar with the details of the ARM Procedure Call Standard, but apart from the list of registers that need to be preserved, which was covered earlier, it is probably worth quickly covering the passing in of parameters and the returning of results.

The first four 32-bit values are passed in the registers r0-r3. If one of the parameters is 64 bits long, then either r0 and r1 or r2 and r3 will be used - but not r1 and r2. The endianness used is officially defined to be "as if the value had been loaded from memory representation with a single LDM instruction". Rather than looking up what that means, I would suggest simply writing some code to test it. If there are more parameters than will fit in r0-r3, then the remaining values are written to the stack before the function is called.

Results are returned in r0, or in r0 and r1 if the result requires 64 bits. Check the procedure call standard for more detailed information, but that should cover most cases.

Need for Speed?

One important thing to remember when working with the link register is that the latest ARM processors provide Return Stack Prediction in addition to normal branch prediction. If the processor comes across an instruction like pop {...,pc} or bx lr it will try to 'branch predict' the return. This allows the processor to successfully predict return branches when common code is called from many points and normal branch prediction techniques could not be used. On processors with longer pipelines this can be a useful optimisation. To make use of it from your assembler code you need to follow some simple guidelines:

Do

Use instructions like pop {pc} when you are returning normally
Use b instead of bl or blx if you do not expect to return to execute the next instruction
Use blx when calling code indirectly (using a value in a register) rather than loading directly to pc

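Addition: http://recipes.egloos.com/4988629

The conventions for register usage are collectively called

PCS (Procedure Call Standard), and the variants have been named

APCS : ARM Procedure Call Standard (older version)

TPCS : Thumb Procedure Call Standard (older version)

ATPCS : ARM-Thumb Procedure Call Standard (predecessor of AAPCS)

AAPCS : Procedure Call Standard for the ARM Architecture (current version)

So the Procedure Call Standard (register usage convention) in use today is properly called AAPCS.

If you learn these conventions well now, it will be much easier to understand the structure of functions, the use of the stack and so on later, so be sure to learn them.

The usage of each register under AAPCS is as shown in the table.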