CHAPTER 24
Software Porting
CHAPTER OUTLINE
24.1 Overview
24.2 Porting software from 8-bit/16-bit MCUs to Cortex®-M MCUs
  24.2.1 Architectural differences
  24.2.2 Common modifications
  24.2.3 Memory size requirements
  24.2.4 Non-applicable optimizations for 8-bit or 16-bit microcontrollers
  24.2.5 Example: migrate from 8051 to ARM® Cortex®-M
    Vector table
    Data type
    Interrupt
    Sleep mode
24.3 Porting software from ARM7TDMI™ to Cortex®-M3/M4
  24.3.1 Overview of the hardware differences
    Memory map
    Interrupts
    MPU
    System control
    Operation modes
    Differences between FIQ and non-maskable interrupt
  24.3.2 Assembly language files
    Thumb state
    ARM state
  24.3.3 C language files
  24.3.4 Pre-compiled object files and libraries
  24.3.5 Optimization
24.4 Porting software between different Cortex®-M processors
  24.4.1 Differences between different Cortex®-M processors
    Instruction set
    IT instruction block
    Exclusive access instructions
    Programmer's model
    NVIC
    System-level features
The Definitive Guide to ARM® Cortex®-M3 and Cortex-M4 Processors. http://dx.doi.org/10.1016/B978-0-12-408082-9.00024-5 Copyright © 2014 Elsevier Inc. All rights reserved.
    Low power features
    Debug and trace features
  24.4.2 Required software changes
  24.4.3 Embedded OS
  24.4.4 Creating portable program code for Cortex®-M processors
24.1 Overview
Software porting is a common task for many software engineers. Even if the source code of a project is written in C, there can still be a fair amount of work when porting the code due to:
• Different peripherals
• Different memory map
• Different ways of handling interrupts
• Tool chain specific C language extensions
CMSIS-Core and the architecture consistency between various Cortex®-M processors make software migration between different Cortex-M devices much easier. However, very often we also need to port software from other architectures to ARM® Cortex-M, or from classic ARM processors such as ARM7TDMI™ to Cortex-M. In this chapter, we will cover these areas.
24.2 Porting software from 8-bit/16-bit MCUs to Cortex®-M MCUs
24.2.1 Architectural differences
There are many architectural differences between common 8-bit/16-bit architectures and ARM® architectures. For example, the size of data types can be different, as shown in Table 24.1.
Table 24.1 Data Size Comparison between ARM and 8-bit/16-bit Microcontrollers

Data Type    8-bit/16-bit Microcontrollers    ARM Architecture
char         8 bits                           8 bits
short int    16 bits                          16 bits
integer      16 bits                          32 bits
pointers     8/16/24 bits                     32 bits
float        32 bits                          32 bits
double       32 bits                          64 bits
The differences can affect the program code in various ways, such as integer overflow behavior and program size. For example, a program with an integer array might need to change in order to retain the same memory size for the array:
    const int mydata[] = {0x1234, 0x2345, ...};
might change to:
    const short int mydata[] = {0x1234, 0x2345, ...};
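As a minimal sketch of the same idea, assuming a C99 toolchain that provides "stdint.h", a fixed-width type keeps each element at 16 bits on both the old and the new target. The array name, its contents, and the helper functions below are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative table: int16_t is 16 bits wide wherever it is defined,
   whereas plain "int" grows from 16 to 32 bits when moving to ARM. */
static const int16_t mydata16[] = {0x1234, 0x2345, 0x3456, 0x4567};

/* Number of elements, derived with sizeof rather than hard coded. */
size_t mydata16_count(void)
{
    return sizeof(mydata16) / sizeof(mydata16[0]);
}

/* Total storage in bytes taken by the table. */
size_t mydata16_bytes(void)
{
    return sizeof(mydata16);
}
```

With four elements the table occupies 8 bytes; declared as "int" on an ARM target, the same four elements would take 16 bytes.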
For floating point handling, if you want to retain 32-bit precision you might need to change the code to make sure that the floating point operations are all single precision, especially if you want to take advantage of the floating point unit in the Cortex®-M4 processor. For example, the code:
    X=T*atan(T2*sin(X)*cos(X)/(cos(X+Y)+cos(X-Y)-1.0));
should be changed to:
    X=T*atanf(T2*sinf(X)*cosf(X)/(cosf(X+Y)+cosf(X-Y)-1.0F));
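The reason the "F" suffix on the constant matters can be checked with "sizeof". The sketch below relies only on the standard C rule that a float operand combined with a double constant is converted to double; the function names are hypothetical:

```c
#include <stddef.h>

/* Size of the result type when a float is combined with a double
   constant: the float operand is converted, so the arithmetic is
   carried out in double precision. */
size_t mixed_expr_size(void)
{
    float x = 0.5F;
    return sizeof(x - 1.0);      /* 1.0 is a double constant */
}

/* With the F suffix the constant is a float, so the whole expression
   stays in single precision. */
size_t single_expr_size(void)
{
    float x = 0.5F;
    return sizeof(x - 1.0F);
}
```

A single unsuffixed constant is therefore enough to turn part of an otherwise single-precision expression into double-precision arithmetic.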
Alternatively, you can choose to use double-precision calculation if a higher accuracy benefits your application. However, this can increase code size and execution time.
Another area of difference when comparing against 8-bit and 16-bit architectures is how data are stored in memory. The first one is data alignment: In an 8-bit processor, the memory system is 8 bits wide and there is no data alignment concern. However, in ARM Cortex-M microcontroller systems the memory is 32-bit, so a piece of data can be aligned or unaligned (see section 6.6 and Figure 6.6). By default, a C compiler does not generate unaligned data. If a data structure is defined with elements of various sizes, the compiler might need to insert padding space to keep data elements aligned (Figure 24.1).
FIGURE 24.1 Padding space can be present in structure
The padding space in a structure can have several different impacts. For example:
• The total data memory size could increase if you have an array of structures.
• Data structure code with hardcoded address offsets for data elements might fail.
• Memory copy code with hardcoded structure sizes might fail.
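The effects listed above can be observed with "sizeof" and "offsetof". The following is a sketch only: the structure names and members are hypothetical, and the sizes mentioned afterwards assume an ABI in which a 32-bit integer is 4-byte aligned and a 16-bit integer is 2-byte aligned, as is the case for ARM:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with members in an order that forces padding. */
struct sample_padded {
    uint8_t  flag;     /* offset 0, typically followed by 3 padding bytes */
    uint32_t count;    /* 4-byte aligned */
    uint8_t  mode;     /* typically followed by 1 padding byte */
    uint16_t level;    /* 2-byte aligned */
};

/* Same members, largest first, so no padding is needed between them. */
struct sample_reordered {
    uint32_t count;
    uint16_t level;
    uint8_t  flag;
    uint8_t  mode;
};

/* Let the compiler report the layout instead of hard coding it. */
size_t padded_size(void)    { return sizeof(struct sample_padded); }
size_t reordered_size(void) { return sizeof(struct sample_reordered); }
size_t count_offset(void)   { return offsetof(struct sample_padded, count); }
```

Under the stated assumptions the first layout occupies 12 bytes and the reordered one 8 bytes, which adds up quickly in an array of structures.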
In general, program code that is written in a portable way (e.g., using “sizeof()” instead of hard coding the size) can avoid most of the issues. You might also want to rearrange the elements inside the structure to avoid the extra padding space.
The second area related to data storage is about the way that local variables are stored. Some 8-bit architectures place local variables in static memory locations in the SRAM if all the registers are used. In the ARM architecture, local variables are typically placed in the stack memory if all the registers are used. Since each time a function is called the stack could be at a different address, the local variables do not have a static memory location. The advantage of using the stack for local variables is that if the function is not active, its local variables do not take up memory space. However, some of the debugging techniques that rely on local variables having a static location will not work. For those cases, you might need to add the “static” keyword when declaring the local variable, or change it to a global variable.
24.2.2 Common modifications
When porting applications from these microcontrollers to the Cortex®-M, modifications to the software typically involve:
• Startup code and vector table: Different processor architectures have different startup code and interrupt vector tables. Usually the startup code and the vector table will have to be replaced.
• Stack allocation adjustment: With the Cortex-M processors, the stack size requirement can be very different from an 8-bit or 16-bit architecture. In addition, the methods to define stack location and stack size are also different from 8-bit and 16-bit development tools.
• Architecture-specific/toolchain-specific C language extensions: Many of the C compilers for 8-bit and 16-bit microcontrollers support a number of C language extensions. These include special data types like Special Function Registers (SFR) and bit data in 8051, or various “#pragma” statements in various C compilers.
• Interrupt control: In 8-bit and 16-bit microcontroller programming, the interrupt configuration is usually done by writing directly to interrupt control registers. When porting the applications to the ARM® Cortex-M processor family, such code should be converted to use the CMSIS-Core interrupt control functions. For example, enable and disable of interrupts can be converted to “__enable_irq()” and “__disable_irq().” Configuration of individual interrupts can be handled by various NVIC functions in CMSIS-Core.
• Peripheral programming: In 8-bit and 16-bit microcontroller programming, peripheral control is usually handled by programming the registers directly. When using ARM microcontrollers, many microcontroller vendors provide device-driver libraries to make using the microcontroller easier. You can use these library functions to reduce software development time, or write to the hardware registers directly if preferred. If you prefer to program the peripherals by accessing the registers directly, it is still beneficial to use the header files in the device driver library, as these have all the peripheral registers defined and can save you time preparing and validating the code.
• Assembly code and inline assembly: Obviously all the assembly and inline assembly code will need to be rewritten. In most cases, you can rewrite the required function in C when the application is ported to Cortex-M microcontrollers.
• Unaligned data: Some 8-bit or 16-bit microcontrollers might support unaligned data. In normal situations, C compilers do not generate unaligned data unless we use the __packed attribute when declaring data. Unaligned data handling is less efficient than aligned data in Cortex-M3 and Cortex-M4 and is not supported in Cortex-M0/M0+. As a result, some data structure definitions or pointer manipulation code might need to be changed for better portability and efficiency. If necessary, we can still apply the __packed attribute in data structures to support unaligned data elements inside.
• Adjustment of code due to data size differences: As described in section 24.2.1, integers in most 8-bit and 16-bit processors are 16-bit, while in ARM architectures integers are 32-bit. For example, when porting a program file from these processors to the ARM architecture, we might want to change “int” in the code to use “short int” or “int16_t” (in “stdint.h,” introduced in C99) so that the size remains unchanged.
• Floating point: As described in section 24.2.1, a program which uses floating point calculations might need to be modified when porting from 8-bit/16-bit architecture to the ARM architecture.
• Adding fault handlers: In many 8-bit and 16-bit microcontrollers, there are no fault exceptions. While embedded applications can operate without any fault handlers, adding fault handlers can help an embedded system to recover from error conditions (e.g., data corruption caused by voltage drop or electromagnetic interference).
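For the unaligned data item in the list above, one alternative to the __packed attribute is to assemble a value from individual bytes. This is a substituted technique rather than the approach described in the list, but it uses only byte accesses and so should behave the same on every Cortex-M processor, including Cortex-M0/M0+. The function name is hypothetical and the stored data is assumed to be little endian:

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a buffer position that may not
   be word aligned. Only byte accesses are used, so no unaligned transfer
   is ever issued. */
uint32_t read_u32_le(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A typical use is decoding a field that starts at an odd offset in a received message buffer, for example "read_u32_le(&buffer[1])".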
24.2.3 Memory size requirements
One of the areas mentioned in section 24.2.2 is the stack memory. After porting to the ARM® architecture, the required stack size could increase or decrease, depending on the application. The stack size might increase because:
• Each register push takes 4 bytes of memory in ARM, while in 16-bit or 8-bit architectures each register push takes 2 bytes or 1 byte.
• In ARM programming, local variables are often stored in the stack, while in some architectures local variables might be defined in a separate data memory area.
On the other hand, the stack size could decrease because:
• With 8-bit or 16-bit architecture, multiple registers are required to hold large data items and often these architectures have fewer registers compared to ARM, so more stacking would be required.
• More powerful addressing modes in ARM mean address calculations can be carried out on the fly without taking up register space. The reduction of register use for an operation can reduce the stacking requirement.
Overall, the total RAM size required could decrease significantly after porting because most local variables do not take up SRAM space when the function is not active. Also, with more registers available in the ARM processor’s register bank compared to some other architectures, some of the local variables might only need to be stored in the register bank instead of taking up memory space.
Depending on the application types, the program memory requirement in ARM Cortex®-M is often lower than 8-bit microcontrollers and most 16-bit microcontrollers. So when you port your applications from these microcontrollers to an ARM Cortex-M microcontroller, you might be able to use a microcontroller device with smaller flash memory size. The reduction of the program memory size is often caused by:
• More efficient handling of 16-bit and 32-bit data (including integers, pointers)
• More powerful addressing modes
• Some memory access instructions can handle multiple data, including PUSH and POP
There can be exceptions: for applications that contain only a small amount of code, the code size in ARM Cortex-M microcontrollers could be larger compared to 8-bit or 16-bit microcontrollers because:
• A Cortex-M microcontroller might have a much larger vector table due to more interrupts.
• The C startup code for a Cortex-M processor might be larger.
If you are using ARM development tools like Keil™ MDK or Development Suite 5 (DS-5™), switching to the MicroLIB run-time library might help to reduce the code size.
24.2.4 Non-applicable optimizations for 8-bit or 16-bit microcontrollers
Some optimization techniques used in 8-bit/16-bit microcontroller programming are not required on ARM® processors. In some cases, these optimizations might result in extra overhead due to architecture differences. For example, many 8-bit microcontroller programmers use byte variables as loop counters for array accesses:
    unsigned char i; /* use 8-bit data to avoid 16-bit processing */
    char a[10], b[10];
    for (i=0;i<10;i++) a[i] = b[i];
When compiling the same program on ARM processors, the compiler will have to insert a UXTB instruction to replicate the overflow behavior of the array index (“i”). To avoid this extra overhead we should declare “i” as integer “int,” “int32_t” or “uint32_t” for best performance.
Another example is the unnecessary use of casting. For example, the following code uses casting to avoid the generation of a 16x16 multiply operation in an 8-bit processor:
    unsigned int x, y, z;
    z = ((char) x) * ((char) y); /* assumed both x and y must be less than 256 */
Again, such a casting operation will result in extra instructions in the ARM architecture. Since Cortex®-M processors can handle a 32x32 multiply with 32-bit result in a single instruction, the program code can be simplified into:
    unsigned int x, y, z;
    z = x * y;
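The simplification above preserves behavior only under the assumption stated in the original comment, namely that both operands are below 256. A small sketch makes the difference visible; the function names are hypothetical, and "unsigned char" is used in place of the plain "char" in the text so that the result does not depend on whether "char" is signed:

```c
#include <stdint.h>

/* 8-bit style: both operands are truncated to 8 bits before the
   multiplication takes place. */
uint32_t mul_8bit_style(uint32_t x, uint32_t y)
{
    return (uint32_t)((unsigned char)x * (unsigned char)y);
}

/* Simplified version for a 32-bit target: no casts, no truncation. */
uint32_t mul_native(uint32_t x, uint32_t y)
{
    return x * y;
}
```

For operands such as 20 and 10 both versions return 200. For 300 and 2, the 8-bit style version first truncates 300 to 44 and returns 88, while the simplified version returns 600, so it is worth confirming the operand range before removing such casts.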
24.2.5 Example: migrate from 8051 to ARM® Cortex®-M
In general, since most applications can be programmed entirely in C on the Cortex-M microcontrollers, the porting of applications from 8-bit/16-bit microcontrollers is usually straightforward and easy. Here we will see some simple examples of the modifications required.
Vector table
In the 8051, the vector table contains a number of JMP instructions that branch to the start of the interrupt service routines (as shown in the left-hand side of Table 24.2). In some development environments, the compiler might create the vector table for you automatically. In ARM®, the vector table contains the initial value of the main stack pointer and starting addresses of the exception handlers (right-hand side of Table 24.2). The vector table is part of the startup code, which is often provided by the development environment. For example, when creating a new project in the Keil™ MDK project wizard, it will offer to copy and add the default startup code, which contains the vector table.

Data type
In some cases, we need to modify the data type so as to maintain the same program behavior, as shown in Table 24.3.
Table 24.2 Vector Table Porting

8051:
    org 00h
    jmp start
    org 03h ; Ext Int0 vector
    ljmp handle_interrupt0
    org 0Bh ; Timer 0 vector
    ljmp handle_timer0
    org 13h ; Ext Int1 vector
    ljmp handle_interrupt1
    org 1Bh ; Timer 1 vector
    ljmp handle_timer1
    org 23h ; Serial interrupt
    ljmp handle_serial0
    org 2bh ; Timer 2 vector
    ljmp handle_timer2

Cortex-M:
    __Vectors DCD __initial_sp       ; Top of Stack
              DCD Reset_Handler      ; Reset Handler
              DCD NMI_Handler        ; NMI Handler
              DCD HardFault_Handler  ; Hard Fault
              DCD MemManage_Handler  ; MPU Fault
              DCD BusFault_Handler   ; Bus Fault
              DCD UsageFault_Handler ; Usage Fault
              DCD 0,0,0,0            ; Reserved
              DCD SVC_Handler        ; SVCall Handler
              DCD 0,0                ; Reserved
              DCD PendSV_Handler     ; PendSV Handler
              DCD SysTick_Handler    ; SysTick Handler
              ; External Interrupts
              DCD WWDG_IRQHandler    ; Window WatchDog
              ...

Table 24.3 Data Type Change during Software Porting

8051:
    int my_data[20]; // array of 16-bit values

Cortex-M:
    short int my_data[20]; // array of 16-bit values
Some function calls might also need to be changed if we want to ensure only single precision floating point is used, as shown in Table 24.4. Some special data types in 8051 are not available on the Cortex-M: bit, sbit, sfr, sfr16, idata, xdata, bdata. They are compiler-specific and are not supported on the ARM architecture.
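Code that relied on “sfr” or “sbit” variables is commonly rewritten as masked accesses to a volatile register. The sketch below is an illustration only: on a real device the register would be a fixed address taken from the vendor's device header, whereas here an ordinary variable stands in for it so that the code can be compiled off-target. All names and the bit position are hypothetical:

```c
#include <stdint.h>

/* Stand-in for a memory-mapped peripheral register (hypothetical). */
static volatile uint32_t fake_port_reg;
#define PORT_REG  (fake_port_reg)
#define LED_BIT   (1u << 3)   /* bit position chosen for illustration */

/* Equivalent of writing 1 or 0 to an 8051 "sbit" variable. */
void led_set(void)   { PORT_REG |=  LED_BIT; }
void led_clear(void) { PORT_REG &= ~LED_BIT; }

/* Equivalent of reading an "sbit" variable. */
int led_is_set(void) { return (PORT_REG & LED_BIT) != 0u; }
```

Note that a read-modify-write sequence like this is not atomic, unlike a single 8051 bit instruction, so it might need protection if an interrupt handler modifies the same register.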
Interrupt
Interrupt control code in 8051 is normally written using direct accesses to SFRs. They need to be changed to CMSIS-Core functions when porting to ARM Cortex-M microcontrollers, as shown in Table 24.5.
The interrupt service routine will also require minor modifications. Some of the special directives used by an interrupt service routine will need to be removed when the application code is ported to a Cortex-M microcontroller, as shown in Table 24.6.
Sleep mode
Entering sleep mode is different too. In 8051, sleep mode can be entered by setting the IDL (idle) bit in PCON. In Cortex-M, you can use vendor-specific functions provided in the device-driver library, or use the WFI instruction directly as shown in Table 24.7 (but this will not give you the best low power optimization).
24.3 Porting software from ARM7TDMI™ to Cortex®-M3/M4
24.3.1 Overview of the hardware differences
The ARM7TDMI™ is a very successful and popular processor for microcontrollers. Currently it is still shipping in large volumes, and is used by many designers. In some cases, these designers have decided to migrate from ARM7TDMI to Cortex®-M microcontrollers. There are a number of characteristic differences between ARM7-based systems and Cortex-M3/M4-based systems (e.g., memory map, interrupts, Memory Protection Unit [MPU], system control, and operation modes).
Memory map
The most obvious target of modification in porting programs between different microcontrollers is their memory map differences. In the ARM7™, memory and peripherals can be located at almost any address, whereas the Cortex-M3 and Cortex-M4 processors have a predefined memory map. Memory address differences are usually resolved at the compile and link stages. Peripheral code porting could be more time consuming because the programmer’s model for the peripheral could be completely different. In that case, device-driver code might need to be completely rewritten, or alternatively, the code changed to use new device driver library code from the microcontroller vendors.
Table 24.4 Floating Point C Code Change during Software Porting

8051:
    Y=T*atan(T2*sin(Y)*cos(Y)/(cos(X+Y)+cos(X-Y)-1.0));

Cortex-M:
    Y=T*atanf(T2*sinf(Y)*cosf(Y)/(cosf(X+Y)+cosf(X-Y)-1.0F));

Table 24.5 Interrupt Control Change during Software Porting

8051:
    EA = 0;  /* Disable all interrupts */
    EA = 1;  /* Enable all interrupts */
    EX0 = 1; /* Enable Interrupt 0 */
    EX0 = 0; /* Disable Interrupt 0 */
    PX0 = 1; /* Set interrupt 0 to high priority */

Cortex-M:
    __disable_irq(); /* Disable all interrupts */
    __enable_irq();  /* Enable all interrupts */
    NVIC_EnableIRQ(Interrupt0_IRQn);
    NVIC_DisableIRQ(Interrupt0_IRQn);
    NVIC_SetPriority(Interrupt0_IRQn, 0);

Table 24.6 Interrupt Handler Change during Software Porting

8051:
    void timer1_isr(void) interrupt 1 using 2
    { /* Use register bank 2 */
      ...;
      return;
    }

Cortex-M:
    __irq void timer1_isr(void)
    {
      ...;
      return;
    }

Table 24.7 Sleep Mode Control Change during Software Porting

8051:
    PCON = PCON | 1; /* Enter Idle mode */

Cortex-M:
    __WFI(); /* Enter sleep mode */
Table 24.8 Mapping of ARM7TDMI Exceptions and Modes to the Cortex-M3 or Cortex-M4 Processor

Modes and Exceptions in the ARM7    Corresponding Modes and Exceptions in the Cortex-M3
Supervisor (default)                Privileged, Thread
Supervisor (software interrupt)     Privileged, SVC
FIQ                                 Privileged, interrupt
IRQ                                 Privileged, interrupt
Abort (prefetch)                    Privileged, bus fault exception
Abort (data)                        Privileged, bus fault exception
Undefined                           Privileged, usage fault exception
System                              Privileged, Thread
User                                User access (non-privileged), Thread
Many ARM7 products provide a memory remap feature so that the vector table can be remapped to SRAM after boot-up. In the Cortex-M3 or Cortex-M4 microcontrollers, the vector table can be relocated using the VTOR register so that memory remapping is no longer needed. Therefore, the memory remap feature might be unavailable in many Cortex-M3 and Cortex-M4 microcontroller products.
Big endian support in the ARM7 is different from such support in the Cortex-M3 and Cortex-M4. Program files can be recompiled to the new big endian system, but hardcoded lookup tables might need to be converted during the porting process.
In ARM720T, and some later ARM® processors like ARM9™, a feature called “high vectors” (or “Hivecs”) is available, which allows the vector table to be relocated to 0xFFFF0000. Although it is often used for other purposes, this feature was introduced to support Windows CE and is not available in any of the current Cortex-M processors.
Interrupts
The second target is the difference in the interrupt controller being used. Program code for control of the interrupt controller, such as enabling or disabling interrupts, will need to be changed because the NVIC has a different programmer’s model. In addition, new code is required for setting up interrupt priority levels and vector addresses for various interrupts. In most cases you can utilize the NVIC control functions included in CMSIS-Core. This makes your software much more portable.
Interrupt wrapper code for nested interrupt handling can be removed. In the Cortex-M processors, the NVIC has built-in nested interrupt handling.
The interrupt return method is also changed. This requires modification of interrupt return in assembler code. With the Cortex-M processors, C handlers can be normal C functions and do not require special compile directives.
Enabling and disabling of interrupts, previously done by modifying the Current Program Status Register (CPSR), must be replaced by setting up the Interrupt Mask register. In addition, in the ARM7TDMI, it is possible to re-enable interrupts at the same time as returning from an interrupt handler due to the restoration of CPSR from SPSR (Saved Program Status Register). In the Cortex-M processors, if interrupts are disabled during an interrupt handler by setting PRIMASK, FAULTMASK, or BASEPRI, the mask registers should be cleared manually before interrupt return. Otherwise, the mask registers are still set and interrupts will not be re-enabled.
In the Cortex-M3 and Cortex-M4 processors, some registers are automatically saved by the stacking and unstacking mechanisms. Therefore, some of the software stacking operations could be reduced or removed. However, in the case of the Fast Interrupt request (FIQ) handler, traditional ARM cores have separate registers for FIQ (R8-R11). Those registers can be used by the FIQ without the need to push them on to the stack. In the Cortex-M processors, these registers are not stacked automatically, so when an FIQ handler is ported to the Cortex-M processor, either the registers being used by the handler must be changed or a stacking step will be needed.
There are also differences in error handling. The Cortex-M3 and Cortex-M4 processors provide various fault status registers so that the cause of faults can be located. In addition, new fault exception types are defined in the Cortex-M processors (e.g., stacking and unstacking faults, memory management faults, and hard faults). Therefore, the fault handlers will need to be rewritten.
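The point above about clearing the mask registers before an interrupt return can be sketched as a save-and-restore pattern. On a target, the three helper functions below would be the CMSIS-Core calls “__get_PRIMASK()”, “__set_PRIMASK()” and “__disable_irq()”; here they are replaced by hypothetical host-side stand-ins operating on a simulated register, so the sketch shows the sequence only and not real hardware behavior:

```c
#include <stdint.h>

/* Host-side stand-ins (hypothetical) for the CMSIS-Core calls
   __get_PRIMASK(), __set_PRIMASK() and __disable_irq(). */
static uint32_t sim_primask;                   /* simulated PRIMASK */
uint32_t sim_get_primask(void)       { return sim_primask; }
void     sim_set_primask(uint32_t v) { sim_primask = v & 1u; }
void     sim_disable_irq(void)       { sim_primask = 1u; }

static volatile uint32_t shared_counter;       /* data shared between handlers */

/* Handler body: mask interrupts for a short critical region, then put
   PRIMASK back before returning, because the exception return does not
   restore it. */
void example_handler(void)
{
    uint32_t saved = sim_get_primask();  /* remember the entry state */
    sim_disable_irq();                   /* PRIMASK = 1 */
    shared_counter++;                    /* critical region */
    sim_set_primask(saved);              /* restore before return */
}

uint32_t example_counter(void) { return shared_counter; }
```

Restoring the saved value, rather than unconditionally clearing PRIMASK, is a design choice that also keeps the code correct when it is called with interrupts already masked.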
MPU
The Memory Protection Unit (MPU) will also need code to configure and control it. Microcontroller products based on the ARM7TDMI/ARM7TDMI-S do not have MPUs, so moving the application code to the Cortex-M3 or Cortex-M4 microcontrollers should not be a problem. However, products based on the ARM720T have a Memory Management Unit (MMU), which has different functionality to the MPU in Cortex-M3 and Cortex-M4 processors. If an application uses the MMU to support a virtual memory system, it cannot be ported to the Cortex-M3 as the MPU does not support address translation.
System control
System control is another key area to look into when you’re porting applications. The Cortex-M processors have built-in instructions for entering sleep mode. In addition, the device-specific system controller inside Cortex-M3 and Cortex-M4 microcontroller products is likely to be completely different from that of the ARM7 products, so the code that handles system management features will need to be rewritten.
Operation modes
In the ARM7, there are seven operation modes; in the Cortex-M3 and Cortex-M4 processors, these have been changed to a different scheme (see Table 24.8).
A normal Interrupt Request (IRQ) can be used to replace the FIQ in the ARM7 because in Cortex-M processors, we can configure the priority for any particular interrupt to be the highest; thus it will be able to preempt other exceptions, just like the FIQ in the ARM7. However, due to the difference between banked FIQ registers in the ARM7 and the stacked registers in the Cortex-M processors, the registers being used in the FIQ handler must be changed, or the registers used by the handler must be saved to the stack manually.
Differences between FIQ and non-maskable interrupt
Many engineers might expect the FIQ in the ARM7 to be directly mapped to the Non-Maskable Interrupt (NMI) in the Cortex-M processors. In some applications this is possible, but a number of differences between the FIQ and the NMI need special attention when you’re porting applications using the NMI as an FIQ.
First, the NMI cannot be disabled, whereas on the ARM7, the FIQ can be disabled by setting the F-bit in the CPSR. So in a Cortex-M system it is possible for an NMI handler to start right at boot-up time, whereas in the ARM7 the FIQ is disabled at reset.
Second, in the Cortex-M processors you cannot use SVC in an NMI handler, whereas you can use a software interrupt (SWI) in an FIQ handler on the ARM7. During execution of an FIQ handler on the ARM7, it is possible for other exceptions to take place (except FIQ and IRQ, because the I and F bits are set automatically when the FIQ is served). However, on the Cortex-M processors, a fault exception inside the NMI handler can cause the processor to lock up.
24.3.2 Assembly language files
Porting assembly files depends on whether the code is written for ARM® state or Thumb state.
Thumb state
If the code is written for Thumb state, porting is much easier. In most cases, the file can be reused without a problem. However, a few Thumb instructions in the ARM7™ are not supported in the Cortex®-M3 and Cortex-M4 as follows:
• Any code that tries to switch to ARM state.
• The SWI instruction is replaced by SVC (note that the code for parameter passing and result return will need to be updated).
• Finally, make sure that the program accesses the stack only in full descending stack operations. It is possible, though uncommon, to implement a different stacking model (e.g., full ascending) on an ARM7.
ARM state
The situation for ARM code is more complicated. There are several scenarios as follows:
• Vector table: In the ARM7, the vector table starts from address 0x0 and consists of branch instructions. In the Cortex-M processors, the vector table contains the initial value for the stack pointer followed by the reset vector address and then by the addresses of all the other exception handlers. Due to these differences, the vector table will need to be completely rewritten. Normally, the startup code you get from the microcontroller vendors should include the vector table so you do not have to create the vector table yourself.
• Register initialization: In the ARM7, it is often necessary to initialize the banked registers for different modes. For example, there are banked stack pointers (R13), link registers (R14), and SPSRs for each of the exception modes in the ARM7. Since the Cortex-M processor has a different programmer’s model, the register initialization code will have to be changed. In fact, the register initialization code on the Cortex-M processors will be much simpler because there is no need to switch the processor into a different mode. In most simple applications without an OS, you can just use the Main Stack Pointer for the whole project, so you do not have to initialize multiple stack pointers as in ARM7.
• Mode switching and state switching codes: Since the operating mode scheme in the Cortex-M processors is different from that of the ARM7, the code for mode switching needs to be removed or changed. The same applies to ARM/Thumb state switching code.
• Interrupt enabling and disabling: In the ARM7, IRQ interrupts can be enabled or disabled by clearing or setting the I-bit in the CPSR. In the Cortex-M processors, this is done by clearing or setting an Interrupt Mask register, such as PRIMASK or FAULTMASK. Furthermore, there is no F-bit in the Cortex-M processors because there is no FIQ input.
• Coprocessor accesses: There is no coprocessor support on the current range of Cortex-M processors, so this kind of operation cannot be ported.
• Interrupt handler and interrupt return: In the ARM7, the first instruction of the interrupt handler is in the vector table, which normally contains a branch instruction to the actual interrupt handler. In the Cortex-M processors, this step is no longer needed. For interrupt returns, the ARM7 relies on manual adjustment of the return program counter. In the Cortex-M processors, the correctly adjusted program counter is saved to the stack and the interrupt return is triggered by loading the special value EXC_RETURN into the program counter. Instructions such as MOVS and SUBS should not be used as interrupt returns on the Cortex-M processors. Because of these differences, interrupt handlers and interrupt return code need modification during porting. Because you can use normal C functions for interrupt handling, it might be easier to recode the interrupt handlers in C.
• Nested interrupt support code: In the ARM7, when a nested interrupt is needed, usually the IRQ handler will need to switch the processor to system mode or SVC mode before re-enabling interrupts. This is not required in the Cortex-M processors.
• FIQ handler: If an ARM7 FIQ handler is to be ported to a Cortex-M interrupt, you might need to add an extra step to save the contents of R8–R11 to stack memory. In the ARM7, R8–R12 are banked, so the FIQ handler can skip the stack push for these registers. However, on the Cortex-M processors, R0–R3 and R12 are saved on to the stack automatically, but R8–R11 are not.
• SWI handler: The SWI instruction is replaced by SVC. However, when porting a SWI handler to SVC, the code to extract the parameters passed with the SWI instruction needs to be updated. The calling SVC instruction address can be found in the stacked PC, which is different from the SWI in the ARM7, where the program counter address has to be determined from the link register.
• SWP instruction (swap): There is no Swap Instruction (SWP) in the Cortex-M processors. If SWP was used for semaphores, they will need to be recoded using the exclusive access instructions. This requires rewriting the semaphore code. If the instruction was used purely for data transfers, this can be replaced by multiple memory access instructions.
• Access to CPSR and SPSR: The CPSR in the ARM7 is replaced with combined Program Status registers (xPSR) in the Cortex-M processors and the SPSR has been removed. If the application needs to access the current values of processor flags, the program code can be replaced with a read access to the APSR. If an exception handler would like to access the Program Status register (PSR) before the exception takes place, it can find the value on the stack, because the value of xPSR is automatically saved to the stack when an interrupt is accepted. So there is no need for an SPSR in the Cortex-M processors.
• Conditional execution: In the ARM7, conditional execution is supported for many ARM instructions, whereas most Thumb-2 instructions do not have the condition field inside the instruction coding. When porting these instructions to the Cortex-M3 and Cortex-M4 processors, the assembly tool might automatically convert these conditional instructions to use an IF-THEN (IT) instruction block; alternatively, we can manually insert the IT instructions or insert branches to produce conditionally executed sequences. One potential issue with replacing conditional sequences with IT instruction blocks is that this could increase the code size and, as a result, could cause minor problems. For instance, load/store operations in another part of the program might then exceed the access range of the instruction.
• Use of the program counter value: When running ARM code on the ARM7, the read value of the PC during an instruction is the address of the current instruction plus 8. This is because the ARM7 has three pipeline stages and, when reading the PC during the execution stage, the program counter has already been incremented twice, 4 bytes at a time. When porting code that processes the PC value to the Cortex-M processors, since the code will be in Thumb, the offset of the program counter will only be 4.
• Use of the value of R13: In the ARM7, the stack pointer R13 is a full 32-bit value; in the Cortex-M processors, the lowest 2 bits of the stack pointer are always forced to zero. Therefore, in the unlikely case that R13 is used as a data register, the code has to be modified because the lowest 2 bits would be lost.
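As an illustration of the SWI-to-SVC point in the list above, the following is a minimal sketch of how an SVC handler written in C might recover the SVC number from the stacked PC. The function name `svc_number_from_frame` is hypothetical, and the code assumes that the handler has already obtained a pointer to the exception stack frame (for example, from MSP or PSP in a short assembly wrapper). The frame is modeled with `uintptr_t` so that the same source also compiles on a host for testing; on a Cortex-M target each frame entry is a 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of the stacked return address in the exception stack frame.
   Frame layout: R0, R1, R2, R3, R12, LR, PC, xPSR. */
#define FRAME_PC_INDEX 6u

/* Returns the 8-bit immediate of the SVC instruction that raised the
   exception. 'frame' points to the first word (stacked R0) of the frame. */
uint8_t svc_number_from_frame(const uintptr_t *frame)
{
    assert(frame != NULL);

    /* The stacked PC holds the address of the instruction after the SVC. */
    const uint16_t *return_addr = (const uint16_t *)frame[FRAME_PC_INDEX];

    /* The 16-bit Thumb SVC encoding is 0xDF00 | imm8, so the low byte of
       the preceding half-word is the SVC number. */
    return (uint8_t)(return_addr[-1] & 0xFFu);
}
```

In contrast, an ARM7 SWI handler would typically derive the instruction address from the link register and mask a 24-bit immediate, so the extraction code cannot simply be reused.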
For the rest of the ARM program code, we can try to compile it as Thumb/Thumb-2 and see if further modifications are needed. For example, some of the pre-index and post-index addressing modes supported by the ARM7 are not supported by the Cortex-M processors and have to be recoded into multiple instructions. Some of the code might involve long branch ranges or large immediate data values that cannot be compiled as Thumb code and so must be modified to Thumb-2 code manually.
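For the SWP-based semaphores mentioned in the list above, one possible approach is to recode them in C rather than in assembly. The sketch below uses the C11 `<stdatomic.h>` interface; it assumes a C11 toolchain, and the type and function names (`bin_sem_t`, `sem_setup`, `sem_try_take`, `sem_give`) are hypothetical. Compilers targeting the Cortex-M3 and Cortex-M4 typically implement these operations with the exclusive access instructions, but this should be confirmed in the generated code for the toolchain in use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Binary semaphore built on an atomic flag (set = taken, clear = free). */
typedef struct {
    atomic_flag taken;
} bin_sem_t;

/* Puts the semaphore into the free state. */
void sem_setup(bin_sem_t *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->taken);
}

/* Tries to take the semaphore once; returns true on success.
   The test-and-set is atomic, which is the role SWP played on the ARM7. */
bool sem_try_take(bin_sem_t *s)
{
    assert(s != NULL);
    return !atomic_flag_test_and_set_explicit(&s->taken, memory_order_acquire);
}

/* Releases the semaphore so that another context can take it. */
void sem_give(bin_sem_t *s)
{
    assert(s != NULL);
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}
```

A caller would normally loop on `sem_try_take()` or yield to the OS when it returns false. On processors without exclusive access instructions, the toolchain may need a different mechanism for the same source code, so the generated code should be checked when the semaphore is reused on other processors.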
24.3.3 C language files
Porting C program files is much easier than porting assembly files. In most cases, application code in C can be recompiled for the Cortex®-M processors without any problem. However, there are still a few areas that potentially need modification, as follows:
• Inline assemblers: Some C code might contain inline assembly that needs modification. This code can be easily located via the __asm keyword. In some older versions of ARM® C compilers, inline assembler is not supported and this might have to be changed to Embedded Assembler. (See section 20.5 for details.)
• Interrupt handler: In the C program you can use __irq to create interrupt handlers that work with the ARM7™. Due to the difference between the ARM7 and the Cortex-M exception models, such as saved registers and interrupt returns, depending on development tools being used, keywords indicating that a function is an interrupt handler (such as the __irq keyword) might need to be removed. In ARM development tools including Keil™ MDK-ARM or DS-5™, uses of the __irq directive on the Cortex-M processor are allowed, and in general are recommended for reasons of clarity. In some other toolchains, however, you might need to remove some of these compiler-specific keywords.
• Pragma directives: ARM C compiler pragma directives like “#pragma arm” and “#pragma thumb” should be removed.
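To illustrate the interrupt handler item above, the following sketch shows a handler written as an ordinary C function, with its address held in a table of function pointers. It is a simplified host-side model rather than real startup code: the names `Timer0_Handler`, `vector_table_model`, and `dispatch_model` are hypothetical, and a real Cortex-M vector table begins with the initial stack pointer value and the reset vector, and is placed in memory by the startup code and linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* Hypothetical counter updated by the interrupt handler. */
volatile uint32_t timer_ticks;

/* An ordinary C function: no __irq keyword and no special return
   sequence is written by the programmer. */
void Timer0_Handler(void)
{
    timer_ticks++;
}

/* Simplified model of a vector table: each entry holds the address of a
   handler instead of a branch instruction. */
#define NUM_VECTORS 4u
const handler_t vector_table_model[NUM_VECTORS] = {
    NULL, NULL, NULL, Timer0_Handler
};

/* Host-side stand-in for the vector fetch done by hardware on exception
   entry; unused entries are skipped. */
void dispatch_model(uint32_t vector_index)
{
    assert(vector_index < NUM_VECTORS);
    if (vector_table_model[vector_index] != NULL) {
        vector_table_model[vector_index]();
    }
}
```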
24.3.4 Pre-compiled object files and libraries
Most C compilers will provide pre-compiled object files for various function libraries and startup code. Many of those (such as startup code for traditional ARM® processor cores) cannot be used on the Cortex®-M processors due to the difference in operating modes and states. Some of them will have source code available and can be recompiled for Thumb-2. Refer to your tool vendor documentation for details.
24.3.5 Optimization
After getting the program to work with a Cortex®-M microcontroller, you might be able to further improve it to obtain better performance and lower memory use. A number of areas should be explored:
• Use of Thumb-2 instructions: For example, if a 16-bit Thumb instruction transfers data from one register to another and then carries out a data processing operation on it, it might be possible to replace the sequence with a single Thumb-2 instruction. This can reduce the number of clock cycles required for the operation.
• Bit band: If peripherals are located in bit-band regions, access to control register bits can be greatly simplified by accessing the bit via a bit-band alias.
• Multiply and divide: Routines that require divide operations, such as converting values into decimal for display, can be modified to use the divide instructions in the Cortex-M3 and Cortex-M4 processors. For multiplication of larger data, the multiply instructions in the Cortex-M3/M4, such as unsigned multiply long (UMULL), signed multiply long (SMULL), multiply accumulate (MLA), multiply and subtract (MLS), unsigned multiply accumulate long (UMLAL), and signed multiply accumulate long (SMLAL) can be used to reduce the complexity of the code.
• Immediate data: Some of the immediate data that cannot be coded in 16-bit Thumb instructions can be produced using 32-bit Thumb instructions. This means that you might be able to reduce the complexity of some code fragments by reducing the number of steps needed to set up an immediate data value.
• Branches: Some long distance branches that cannot be coded in 16-bit Thumb code (usually ending up with multiple branch steps) can be coded with 32-bit Thumb instructions, reducing code size and branch overhead.
• Boolean data: Multiple boolean data items (either 0 or 1) can be packed into a single byte/half-word/word in bit-band regions to save memory space. They can then be accessed via the bit-band alias.
• Bit-field processing: The Cortex-M3 and Cortex-M4 processors provide a number of instructions for bit-field processing, including Unsigned Bit Field eXtract (UBFX), Signed Bit Field eXtract (SBFX), Bit Field Insert (BFI), Bit Field Clear (BFC), and Reverse Bits (RBIT). They can simplify many code sequences for peripheral programming, data packet formation or extraction, and serial data communications.
• IT instruction block: Some short branches might be replaceable by an IT instruction block. This may avoid wasting clock cycles when the pipeline is flushed during branching.
• ARM/Thumb state switching: In some situations, ARM developers frequently divide code amongst source files so that some of them can be compiled to ARM code and others compiled to Thumb code. This is usually needed to get the right balance between code density and execution speed. With Thumb-2 features in the Cortex-M processors, this step is no longer needed, so some of the state switching overhead can be removed, producing shorter code, less overhead, and possibly fewer program files.
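The bit-band items in the list above rely on a fixed mapping between a bit in the bit-band region and a word in the alias region. The sketch below shows that address calculation for the Cortex-M3/M4 SRAM and peripheral bit-band regions. The function name `bitband_alias` and the macro names are hypothetical, and the code only computes the alias address; it assumes that the device actually implements the optional bit-band feature.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE     0x20000000u  /* start of SRAM bit-band region   */
#define SRAM_BB_ALIAS    0x22000000u  /* start of SRAM bit-band alias    */
#define PERIPH_BB_BASE   0x40000000u  /* start of peripheral bit-band    */
#define PERIPH_BB_ALIAS  0x42000000u  /* start of peripheral alias       */
#define BB_REGION_SIZE   0x00100000u  /* each bit-band region is 1 MB    */

/* Returns the alias address for bit 'bit' of the data at 'addr', or 0 if
   'addr' is outside both bit-band regions. Each bit maps to one word:
   alias = alias_base + (byte_offset * 32) + (bit * 4). */
uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    uint32_t base;
    uint32_t alias;

    assert(bit < 32u);

    if (addr >= SRAM_BB_BASE && addr < SRAM_BB_BASE + BB_REGION_SIZE) {
        base = SRAM_BB_BASE;
        alias = SRAM_BB_ALIAS;
    } else if (addr >= PERIPH_BB_BASE &&
               addr < PERIPH_BB_BASE + BB_REGION_SIZE) {
        base = PERIPH_BB_BASE;
        alias = PERIPH_BB_ALIAS;
    } else {
        return 0u;
    }
    return alias + ((addr - base) * 32u) + (bit * 4u);
}
```

On a target, writing 1 or 0 to the returned address (cast to a `volatile uint32_t *`) sets or clears the corresponding bit without a software read-modify-write sequence.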
24.4 Porting software between different Cortex®-M processors
24.4.1 Differences between different Cortex®-M processors
In most cases, porting software between different Cortex®-M processors is relatively straightforward due to the consistency of the architecture. However, there are some differences between the various Cortex-M processors.
Instruction set
One of the major differences between Cortex-M processors is the instruction set support. The Cortex-M processors are designed to be upward-compatible, so instructions available on Cortex-M0/M0+/M1 (ARMv6-M architecture) can also be used on Cortex-M3 and Cortex-M4 (ARMv7-M architecture) (Figure 24.2). In theory a binary program image compiled for ARMv6-M can run directly on a device with ARMv7-M. However, in practice the memory map and peripherals could be different and, in any case, it is best to recompile the code to take advantage of the additional instructions available. When moving downwards, however, the program code will need to be recompiled. Assembly code (including inline assembly and Embedded Assembly) may also need to be modified. For example, when porting an application from a Cortex-M3 microcontroller to a Cortex-M0 microcontroller, the following instructions are not available:
• IT instruction block
• Compare and branch (compare and branch if zero [CBZ] and compare and branch if non-zero [CBNZ])
• Multiply accumulate instructions (multiply accumulate [MLA], multiply and subtract [MLS], signed multiply accumulate long [SMLAL], and unsigned multiply accumulate long [UMLAL]) and multiply instructions with 64-bit results (unsigned multiply long [UMULL] and signed multiply long [SMULL])
• Hardware divide instructions (unsigned divide [UDIV] and signed divide [SDIV]) and saturation (signed saturate [SSAT] and unsigned saturate [USAT])
• Table branch instruction (Table Branch Half-word [TBH] and Table Branch Byte [TBB])
• Exclusive access instructions
• Bit field processing instructions (unsigned bit field extract [UBFX], signed bit field extract [SBFX], Bit Field Insert [BFI], and Bit Field Clear [BFC])
• Some data processing instructions (count leading zero [CLZ], rotate right extended [RRX], and reverse bit [RBIT])
• Load/store instructions with addressing modes or register combinations that are only supported with 32-bit instruction encoding
• Load/store instructions with translate (load word data from memory to register with unprivileged access [LDRT] and store word to memory with unprivileged access [STRT])

FIGURE 24.2 Instruction set of the Cortex-M processors
When porting software from Cortex-M4 to Cortex-M3, the floating point instructions and the instructions in the DSP extension are also unavailable.
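When code that relies on the bit-field instructions has to run on both ARMv7-M and ARMv6-M, one option is to express the operation in portable C and leave instruction selection to the compiler. The sketch below shows plain C equivalents of an unsigned bit-field extract and a bit-field insert. The function names are hypothetical; whether a given compiler maps them to UBFX or BFI on the Cortex-M3/M4 depends on the toolchain and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a mask with the lowest 'width' bits set (width is 1 to 32). */
uint32_t low_bits_mask(uint32_t width)
{
    return (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extracts 'width' bits of 'value' starting at bit 'lsb', zero extended
   (the operation performed by UBFX). */
uint32_t bitfield_extract(uint32_t value, uint32_t lsb, uint32_t width)
{
    assert(lsb < 32u && width >= 1u && width <= 32u - lsb);
    return (value >> lsb) & low_bits_mask(width);
}

/* Replaces 'width' bits of 'dest' starting at bit 'lsb' with the lowest
   'width' bits of 'src' (the operation performed by BFI). */
uint32_t bitfield_insert(uint32_t dest, uint32_t src,
                         uint32_t lsb, uint32_t width)
{
    assert(lsb < 32u && width >= 1u && width <= 32u - lsb);
    uint32_t mask = low_bits_mask(width) << lsb;
    return (dest & ~mask) | ((src << lsb) & mask);
}
```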
Programmer’s model
There are a number of small differences between the programmer’s model for ARMv7-M (Cortex-M3 and Cortex-M4) and ARMv6-M (Cortex-M0, Cortex-M0+ and Cortex-M1):
• Unprivileged level is not available on the Cortex-M0 and is optional on the Cortex-M0+. This also affects bit 0 of the CONTROL register, which is not available if unprivileged level is not present (Figure 24.3).
• The FAULTMASK and BASEPRI registers (for exception masking) are not available on ARMv6-M.
• Only the Cortex-M4 processor has the optional floating point register bank and FPSCR register.

The Program Status Register (PSR) also has some differences:
• The Application PSR in ARMv6-M does not have the Q bit.
• The GE bits are only available on the Cortex-M4 processor.
• The ICI/IT bits (Interrupt Continuable Instruction/IF-THEN) are not available on ARMv6-M.
• The width of the IPSR is only 6 bits in the Cortex-M0/M0+/M1 processors because they only support up to 32 interrupts.
FIGURE 24.3 Programmer’s model differences (operation states and register bank; the Unprivileged Thread state is marked as not available in the Cortex-M0 and optional in the Cortex-M0+, and the FAULTMASK and BASEPRI registers are marked as not available in the Cortex-M0 and Cortex-M0+)
NVIC
The NVIC feature on the Cortex-M processor is configurable. This means that the number of interrupts supported and the number of programmable interrupt priorities can be decided by the chip manufacturers. Table 24.9 lists the differences in the NVIC in different Cortex-M processors.

System-level features
There are some differences in the system-level features, as can be seen from Table 24.10.

Low power features
The low power support at the processor level is identical, as shown in Table 24.11. However, at chip-design level, different microcontrollers have different low power features and therefore the code for low power optimization usually needs to be changed when porting applications from one microcontroller to another.

Debug and trace features
There are further differences in the debug and trace features, as can be seen in Table 24.12.
24.4.2 Required software changes
For microcontroller applications using CMSIS-compliant device libraries, in most cases you will need to:
• Replace the device driver header files
• Replace device-specific startup code
• Adjust interrupt priority levels if needed
• Adjust compilation options such as processor type and floating point options

To improve software portability, you should use the interrupt control functions provided in CMSIS-Core to set up interrupt configurations in the NVIC. If your application accesses the NVIC registers directly, you may need to adjust the source code during porting, because the NVIC in ARMv6-M does not allow byte or half-word accesses. For example, the definition of the priority level registers in the CMSIS-Core header files is different between ARMv7-M and ARMv6-M.

ARMv6-M does not have the Software Trigger Interrupt Register (NVIC->STIR). Therefore, in Cortex®-M0/M0+ processors, software needs to use the Interrupt Set Pending Register (NVIC->ISPR) to trigger an interrupt in software.

The programmer’s model of the SysTick timer is basically the same. However, the SysTick timer value in the Cortex-M3 and Cortex-M4 is reset to 0, whereas in the Cortex-M0 and Cortex-M0+ the initial timer value can be undefined. As a result, when porting SysTick setup code, you must make sure that the program code initializes the SysTick timer value.
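The restriction on byte and half-word accesses can be handled by updating a priority field with a 32-bit read-modify-write sequence. The following sketch shows the idea on a host-side model in which the priority registers are an array of eight 32-bit words; the function names are hypothetical, and `PRIO_BITS` is an assumption that matches the four priority levels of the Cortex-M0 and Cortex-M0+. In a real project, the CMSIS-Core function `NVIC_SetPriority()` should be preferred over direct register access.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 implemented priority bits (4 levels), held in the top
   bits of each 8-bit priority field. */
#define PRIO_BITS 2u

/* 'ipr' models the priority registers as 32-bit words; each word holds
   the 8-bit priority fields of four interrupts. */
void set_irq_priority_word(volatile uint32_t *ipr, uint32_t irq,
                           uint32_t priority)
{
    assert(irq < 32u);
    assert(priority < (1u << PRIO_BITS));

    uint32_t index = irq >> 2;            /* word that holds this IRQ   */
    uint32_t shift = (irq & 3u) * 8u;     /* bit position of its field  */
    uint32_t field = (priority << (8u - PRIO_BITS)) & 0xFFu;

    /* Word-sized read-modify-write: no byte or half-word access used. */
    uint32_t word = ipr[index];
    word &= ~(0xFFu << shift);
    word |= field << shift;
    ipr[index] = word;
}

/* Reads back the priority level of 'irq' using a word access. */
uint32_t get_irq_priority_word(const volatile uint32_t *ipr, uint32_t irq)
{
    assert(irq < 32u);
    uint32_t shift = (irq & 3u) * 8u;
    return ((ipr[irq >> 2] >> shift) & 0xFFu) >> (8u - PRIO_BITS);
}
```

Because the sequence is a read-modify-write, code that changes priorities from several contexts may need to protect it, for example by masking interrupts around the update.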
Table 24.9 NVIC Feature Comparison

| Features | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4 |
|---|---|---|---|---|
| Number of IRQ | 1 to 32 | 1 to 32 | 1 to 240 | 1 to 240 |
| System Exceptions | 5 (NMI, HardFault, SVC, PendSV, SysTick) | 5 | 9 (ARMv6-M system exceptions + 3 configurable fault handlers + debug monitor) | 9 |
| Programmable Priority levels | 4 | 4 | 8 to 256 | 8 to 256 |
| Priority Grouping | No | No | Yes | Yes |
| Masking registers | PRIMASK | PRIMASK | PRIMASK, FAULTMASK, BASEPRI | PRIMASK, FAULTMASK, BASEPRI |
| Vector Table Offset Register | No | Optional | Yes | Yes |
| Software Trigger Interrupt Register | No | No | Yes | Yes |
| Interrupt Active Status Registers | No | No | Yes | Yes |
| Dynamic priority change support | No | No | Yes | Yes |
| Register accesses | 32-bit | 32-bit | 8/16/32-bit | 8/16/32-bit |
| Double word stack alignment | Always enabled | Always enabled | Programmable | Programmable |
Table 24.10 System Feature Comparison

| Features | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4 |
|---|---|---|---|---|
| Privileged/unprivileged | No | Optional | Yes | Yes |
| SysTick Timer | Optional | Optional | Yes | Yes |
| MPU | No | Optional | Optional | Optional |
| Bit band | Not included in the processor, can be added at the system level | Not included in the processor, can be added at the system level | Optional | Optional |
| Single cycle I/O interface | No | Yes | No | No |
| Bus architecture | von Neumann | von Neumann | Harvard | Harvard |
| Fault Handling | HardFault | HardFault | HardFault + 3 other fault handlers | HardFault + 3 other fault handlers |
| Fault Status Registers | No (Debug FSR for debug only) | No (Debug FSR for debug only) | CFSR, HFSR, DFSR, AFSR | CFSR, HFSR, DFSR, AFSR |
| Self-reset | System Reset Request | System Reset Request | System Reset Request + VECTRESET | System Reset Request + VECTRESET |
| Unaligned accesses support | No | No | Yes | Yes |
| Exclusive access | No | No | Yes | Yes |
Table 24.11 Low Power Feature Comparison

| Features | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4 |
|---|---|---|---|---|
| Sleep modes | Sleep and deep sleep | Sleep and deep sleep | Sleep and deep sleep | Sleep and deep sleep |
| Sleep-on-exit | Yes | Yes | Yes | Yes |
| WIC support | Yes | Yes | Yes | Yes |
| SRPG support | Yes | Yes | Yes | Yes |
| Event support (e.g., SEV) | Yes | Yes | Yes | Yes |
| SEVONPEND | Yes | Yes | Yes | Yes |
Typically you might also need to adjust the clock frequency of the microcontroller and modify the code that utilizes the low power features of the microcontrollers. When moving from one Cortex-M microcontroller to another, there might be differences in terms of program execution speed. For example, when porting an application from a Cortex-M0 microcontroller to a Cortex-M3 microcontroller, you might be able to reduce the clock frequency of the microcontroller to achieve the same performance but with lower power consumption.
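A change of clock frequency also affects SysTick setup code, and this is a convenient place to make sure that the timer value is initialized explicitly, as required for the Cortex-M0 and Cortex-M0+. The sketch below shows one way to write such a setup routine. It runs against a host-side model of the SysTick registers; the structure `systick_model_t`, the function `systick_setup`, and the macro names are hypothetical. In a real project the CMSIS-Core function `SysTick_Config()` can be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the four SysTick registers. */
typedef struct {
    volatile uint32_t CTRL;   /* control and status      */
    volatile uint32_t LOAD;   /* 24-bit reload value     */
    volatile uint32_t VAL;    /* current counter value   */
    volatile uint32_t CALIB;  /* calibration value       */
} systick_model_t;

#define SYSTICK_MAX_RELOAD      0x00FFFFFFu
#define SYSTICK_CTRL_ENABLE     (1u << 0)
#define SYSTICK_CTRL_TICKINT    (1u << 1)
#define SYSTICK_CTRL_CLKSOURCE  (1u << 2)

/* Configures a periodic tick of 'tick_hz' from a core clock of
   'core_clock_hz'. Returns 0 on success, or -1 if the rate cannot be
   produced with a 24-bit reload value. */
int systick_setup(systick_model_t *st, uint32_t core_clock_hz,
                  uint32_t tick_hz)
{
    assert(st != NULL);

    if (tick_hz == 0u || core_clock_hz <= tick_hz) {
        return -1;
    }
    uint32_t reload = (core_clock_hz / tick_hz) - 1u;
    if (reload > SYSTICK_MAX_RELOAD) {
        return -1;
    }

    st->CTRL = 0u;        /* stop the timer while it is reconfigured   */
    st->LOAD = reload;
    st->VAL  = 0u;        /* do not rely on the reset value of VAL     */
    st->CTRL = SYSTICK_CTRL_CLKSOURCE | SYSTICK_CTRL_TICKINT |
               SYSTICK_CTRL_ENABLE;
    return 0;
}
```

For example, a 1 kHz tick needs a reload value of 47999 at 48 MHz, so the same call with a different `core_clock_hz` argument keeps the tick rate unchanged after the clock frequency is adjusted.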
24.4.3 Embedded OS
In an application with an embedded OS, you might need to switch to a different version of the OS in order to allow it to work properly. For example, an embedded OS written for the Cortex®-M3 processor might work on the Cortex-M4 processor as long as the application does not use the floating point unit. But as soon as the floating point unit is used, the OS will have to provide context saving and restoring for the floating point register bank, as well as deal with extra information in the CONTROL register, the EXC_RETURN value, and different stack frame sizes.

A Cortex-M0 application project running with an embedded OS might need some extra adjustment. In the Cortex-M0 processor, there is no unprivileged access level, so all application threads can access the NVIC and the registers in the System Control Space (SCS). When the application is ported to other Cortex-M processors, the OS might run the threads in unprivileged state by default and all the accesses to NVIC and SCS registers would be blocked. So you might need some adjustments in the project to avoid NVIC and SCS accesses in threads, or you could adjust the OS configuration to enable the threads to run in privileged state.

Some embedded OSs might utilize the MPU feature. The programmer’s models of the MPUs in the Cortex-M3/M4 and Cortex-M0+ processors are mostly the same, but there are some minor differences. These are listed in section 11.8, Table 11.12. For example, the address bit field in the MPU Base Address Register for the Cortex-M3/M4 MPU allows you to define an MPU region as small as 32 bytes. In the Cortex-M0+ processor, the smallest supported region size is 256 bytes. However, by using the sub-region disable feature, you can create a 32-byte region with only minor modifications to the MPU setup code.

Table 24.12 Debug and Trace Feature Comparison

| Features | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4 |
|---|---|---|---|---|
| Debug interface | Typically either Serial Wire or JTAG | Typically either Serial Wire or JTAG | Typically both Serial Wire and JTAG | Typically both Serial Wire and JTAG |
| Program run control (halting, resume, single step) | Yes | Yes | Yes | Yes |
| On-the-fly memory accesses | Yes | Yes | Yes | Yes |
| Debug monitor | No | No | Yes | Yes |
| Software breakpoint | Yes | Yes | Yes | Yes |
| Hardware breakpoint comparators | Up to 4 | Up to 4 | Up to 8 (6 instructions and 2 literal data) | Up to 8 (6 instructions and 2 literal data) |
| Hardware watchpoint comparators | Up to 2 | Up to 2 | Up to 4 | Up to 4 |
| Instrumentation trace | No | No | Yes | Yes |
| Data, event and profiling trace | No | No | Yes | Yes |
| Profiling counter | No | No | Yes | Yes |
| PC sampling register via trace connection | No | No | Yes | Yes |
| PC sampling register via debug connection | Yes | Yes | Yes | Yes |
| Instruction Trace | No | Optional Micro Trace Buffer (MTB) | Optional ETM | Optional ETM |
| Trace interface | No | MTB instruction trace via debug connection | Serial Wire Viewer (SWV) or Trace Port interface | Serial Wire Viewer (SWV) or Trace Port interface |
| Debug and Trace registers accesses from software (needed by debug monitor) | No | No | Yes | Yes |
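The sub-region disable approach mentioned in section 24.4.3 can be sketched as follows. A 256-byte MPU region is divided into eight 32-byte sub-regions, each with its own disable bit, so a 32-byte window is obtained by disabling the other seven sub-regions. The code only computes the region base address and the sub-region disable (SRD) mask; the structure and function names are hypothetical, and programming the values into the MPU registers is left to the existing MPU setup code.

```c
#include <assert.h>
#include <stdint.h>

/* Settings for a 256-byte region that exposes a single 32-byte window. */
typedef struct {
    uint32_t region_base;  /* 256-byte aligned region base address       */
    uint8_t  srd_mask;     /* sub-region disable bits, 1 = disabled      */
} small_region_t;

/* Computes the region base and SRD mask so that only the 32-byte block
   starting at 'addr' remains enabled. 'addr' must be 32-byte aligned. */
small_region_t mpu_32byte_region(uint32_t addr)
{
    small_region_t r;

    assert((addr & 31u) == 0u);

    r.region_base = addr & ~0xFFu;               /* align down to 256   */
    uint32_t sub_region = (addr & 0xFFu) >> 5;   /* index 0 to 7        */
    r.srd_mask = (uint8_t)(~(1u << sub_region) & 0xFFu);
    return r;
}
```

For an address of 0x20000460, the enclosing region starts at 0x20000400 and the block is sub-region 3, so every SRD bit except bit 3 is set.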
24.4.4 Creating portable program code for Cortex®-M processors
In some projects, we need to create program code that can be reused on various Cortex®-M processors including Cortex-M0, Cortex-M0+, Cortex-M3, and Cortex-M4. In order to maximize software reusability, there are a few areas that should be considered when developing embedded application code:
• Use CMSIS-Core functions to access processor features instead of directly accessing system registers.
• Avoid using any features that are limited to the Cortex-M3 and Cortex-M4 processors. For example, in CMSIS-Core, the Interrupt Active Status Register is not available in the Cortex-M0 and Cortex-M0+. Other features that are limited to Cortex-M3 and Cortex-M4 include: bit band, unaligned transfers, and dynamic interrupt priority change.
• When developing portable program code, we can enable the UNALIGN_TRP bit in the Configuration Control Register (SCB->CCR) to detect any unaligned data transfers and modify the code to prevent this when unaligned data accesses are found.
• When creating assembly code (e.g., inline assembly, embedded assembly) we also need to make sure that the instructions used are available on ARMv6-M.
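One common source of unaligned transfers is code that casts a byte pointer into a packet buffer to a word pointer. A portable alternative, sketched below, is to assemble the value from individual bytes so that only byte accesses are used and the code behaves the same on ARMv6-M and ARMv7-M. The function names are hypothetical, and the sketch assumes that the data in the buffer is stored in little endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit little endian value from 'p', which may have any
   alignment, using byte accesses only. */
uint32_t read_u32_le(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Writes 'value' to 'p' in little endian order using byte accesses. */
void write_u32_le(uint8_t *p, uint32_t value)
{
    assert(p != NULL);
    p[0] = (uint8_t)(value & 0xFFu);
    p[1] = (uint8_t)((value >> 8) & 0xFFu);
    p[2] = (uint8_t)((value >> 16) & 0xFFu);
    p[3] = (uint8_t)((value >> 24) & 0xFFu);
}
```

With UNALIGN_TRP enabled on a Cortex-M3 or Cortex-M4 during development, any remaining pointer casts that produce unaligned word or half-word accesses will cause a fault and can then be replaced with helpers of this kind.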