Executing code from RAM in STM32

Nixmd · Mar 5, 2017

I recently started programming an STM32F4 Nucleo board. I just found out that the flash can only be programmed a limited number of times (the limit is not small, but this is an evaluation board and it will be reprogrammed over and over while developing different projects). I then read somewhere that it is possible to load the program directly into RAM instead of flash, but I could not find any technical information about it.

Does anybody know how to modify the linker script/makefile so that the program is compiled and linked to execute from the start address of RAM rather than flash?

PS: I use the code generated by STM32CubeMX for System Workbench, and a script to generate the makefile for the project.


old_timer · Mar 5, 2017

If you recently started using it then you have a long time before the flash wears out. You might be getting drive-full errors; if so, just unplug and replug the board. I have had these boards for years and have not worn out the flash yet. That is not to say it can't be done, it can, but you are not likely to get there unless you write a flash-thrashing program that wears it out.

You will need openocd (or some other debugger; maybe your IDE provides one, I don't use those so can't help there). openocd and the GNU tools are trivial to come by, so I am going to walk through that.

From the correct directory, or after copying these files from openocd:

openocd -f stlink-v2-1.cfg -f stm32f4x.cfg

(One or both might depend on other files that they include; pull those in too, or do whatever it takes.)

It should end with something like this and not exit back to the command line:

Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints

In another window

telnet localhost 4444

Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger

In that window you can halt the processor

> halt
stm32f4x.cpu: target state: halted
target halted due to debug-request, current mode: Thread 
xPSR: 0x61000000 pc: 0x080000b2 msp: 0x20000ff0

On full-sized ARM processors the entry point is an instruction and you just start executing there. The Cortex-M uses a vector table, so you cannot just branch to the start of the image.

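.cpu cortex-m0
.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20001000  /* word 0: initial stack pointer */
.word reset                 /* word 1: reset handler */
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang

.thumb_func
reset:
    bl notmain
    b hang

.thumb_func
hang:   b .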

You could in theory branch to the reset handler address, but the linker script is going to want that in flash, and anything position dependent will not work. Your stack pointer also might not be set if you rely on the vector table to do that. So instead something like this would work, as part of a complete example:


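/* sram.s: no vector table, the entry point is the first instruction */
.cpu cortex-m0
.thumb

.thumb_func
.global _start
_start:
    ldr r0,stacktop
    mov sp,r0
    bl notmain
    b .

.balign 4
stacktop: .word 0x20001000

.thumb_func
.globl PUT32
PUT32:
    str r1,[r0]
    bx lr

.thumb_func
.globl GET32
GET32:
    ldr r0,[r0]
    bx lr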


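void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );

int notmain ( void )
{
    unsigned int ra;

    /* read 0x20000400, save it to 0x20000404, */
    /* then write the incremented value back to 0x20000400 */
    ra=GET32(0x20000400);
    PUT32(0x20000404,ra);
    PUT32(0x20000400,ra+1);
    return(0);
}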

MEMORY
{
    rom : ORIGIN = 0x08000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > ram
    .rodata : { *(.rodata*) } > ram
    .bss : { *(.bss*) } > ram
}

Basically, replace the rom references with ram. (Your linker script, if GNU, is likely way more complicated than this one, but this works just fine; you could add a .data section here as needed.)

arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding  -mcpu=cortex-m0 -mthumb -c notmain.c -o notmain.o
arm-none-eabi-ld -o notmain.flash.elf -T flash.ld flash.o notmain.o
arm-none-eabi-objdump -D notmain.flash.elf > notmain.flash.list
arm-none-eabi-objcopy notmain.flash.elf notmain.flash.bin -O binary
arm-none-eabi-as --warn --fatal-warnings -mcpu=cortex-m0 sram.s -o sram.o
arm-none-eabi-ld -o notmain.sram.elf -T sram.ld sram.o notmain.o
arm-none-eabi-objdump -D notmain.sram.elf > notmain.sram.list
arm-none-eabi-objcopy notmain.sram.elf notmain.sram.hex -O ihex
arm-none-eabi-objcopy notmain.sram.elf notmain.sram.bin -O binary

That is my build of both a flash version and an sram version of the program.

So now we have our telnet session into the openocd server and the processor is halted. Let's look at a memory location and change it:

> mdw 0x20000400
0x20000400: 7d7d5889 
> mww 0x20000400 0x12345678
> mdw 0x20000400           
0x20000400: 12345678 

Now load and run our new sram-based program:

> load_image /path/to/notmain.sram.elf
64 bytes written at address 0x20000000
downloaded 64 bytes in 0.008047s (7.767 KiB/s)
> resume 0x20000001

Let it run. The program finishes almost instantly; even a script would probably be slow by comparison, and the time it takes to type the halt command is certainly plenty.

> halt
stm32f4x.cpu: target state: halted
target halted due to debug-request, current mode: Thread 
xPSR: 0x41000000 pc: 0x20000008 msp: 0x20001000
> mdw 0x20000400 10
0x20000400: 12345679 12345678 ce879a24 fc4ba5c7 997e5367 9db9a851 40d5083f fbfbcff8 
0x20000420: 035dce6b 65a7f13c 

So the program ran: it reads 0x20000400, saves that value to 0x20000404, increments it and saves the result to 0x20000400, and it did all of that.
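As a sanity check on those dumps, the same logic can be modelled on a host PC. This is only a sketch: the sram array, SRAM_BASE, SRAM_WORDS and notmain_model are made-up names standing in for the real SRAM and program, and PUT32/GET32 here index an array instead of touching hardware.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for the 4 KB of SRAM at 0x20000000. */
#define SRAM_BASE  0x20000000u
#define SRAM_WORDS 1024u

static uint32_t sram[SRAM_WORDS];

/* Same roles as the assembly PUT32/GET32, but backed by the array above. */
void PUT32(uint32_t addr, uint32_t data)
{
    sram[(addr - SRAM_BASE) / 4u] = data;
}

uint32_t GET32(uint32_t addr)
{
    return sram[(addr - SRAM_BASE) / 4u];
}

/* One run of the test program as described above. */
int notmain_model(void)
{
    uint32_t ra = GET32(0x20000400u);  /* read the counter            */
    PUT32(0x20000404u, ra);            /* keep the previous value     */
    PUT32(0x20000400u, ra + 1u);       /* store the incremented value */
    return 0;
}
```

Seeding 0x20000400 with 0x12345678 and calling notmain_model() once leaves 0x12345679 and 0x12345678 in the two words, the same pair the mdw dump shows on the board.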

> load_image /path/to/notmain.sram.elf
64 bytes written at address 0x20000000
downloaded 64 bytes in 0.008016s (7.797 KiB/s)
> resume 0x20000000
> halt
stm32f4x.cpu: target state: halted
target halted due to debug-request, current mode: Thread 
xPSR: 0x41000000 pc: 0x20000008 msp: 0x20001000
> mdw 0x20000400 10                           
0x20000400: 1234567a 12345679 ce879a24 fc4ba5c7 997e5367 9db9a851 40d5083f fbfbcff8 
0x20000420: 035dce6b 65a7f13c 

So we didn't need to OR the start address with one, as you would for a BX; openocd must just shove the address right into the pc and/or do the right thing for us.

If you were to only modify your linker script to replace the roms with rams, you would end up with the vector table at the start of RAM:

20000000 <_start>:
20000000:   20001000
20000004:   20000041
20000008:   20000047
2000000c:   20000047
20000010:   20000047
20000014:   20000047
20000018:   20000047
2000001c:   20000047
20000020:   20000047
20000024:   20000047
20000028:   20000047
2000002c:   20000047
20000030:   20000047
20000034:   20000047
20000038:   20000047
2000003c:   20000047

20000040 <reset>:
20000040:   f000 f806   bl  20000050 <notmain>
20000044:   e7ff        b.n 20000046 <hang>

You could use the 0x20000041 address as your entry point (resume 0x20000041), but you have to deal with the stack pointer first.

For example, by doing something like this:

> reg sp 0x20001000
sp (/32): 0x20001000
> reg sp
sp (/32): 0x20001000
> resume 0x20000041
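Those two commands do by hand what the core does for itself at reset: it loads the stack pointer from word 0 of the vector table and the program counter from word 1, where bit 0 marks Thumb state and is not part of the instruction address. At reset the core reads the table from address 0, which is not where this RAM image lives, hence the manual setup. A minimal host-side sketch of that decoding follows; struct reset_state, le32 and decode_vectors are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for what the core loads at reset. */
struct reset_state {
    uint32_t sp;     /* word 0 of the table: initial stack pointer     */
    uint32_t pc;     /* word 1 with bit 0 cleared: first instruction   */
    int      thumb;  /* bit 0 of word 1: has to be 1 on a Cortex-M     */
};

/* Assemble one little-endian 32-bit word from four bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first two vector table entries of a raw binary image. */
struct reset_state decode_vectors(const uint8_t *image)
{
    struct reset_state st;
    uint32_t reset_vector = le32(image + 4);

    st.sp    = le32(image);
    st.pc    = reset_vector & ~1u;
    st.thumb = (int)(reset_vector & 1u);
    return st;
}
```

Fed the first eight bytes of the listing above (0x20001000 followed by 0x20000041), this gives sp = 0x20001000, pc = 0x20000040 and thumb = 1, which matches the reg sp value and the location of the reset label.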

Note that the RAM on these parts is faster than the flash and doesn't need wait states as you increase the clock frequency. So if you increase the clock frequency and debug in RAM only, the program may fail when you switch over to flash if you have not remembered to set the flash wait states. Other than that, and having significantly less room for programs, you can develop in RAM all day long if you want.
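For when you do move back to flash, here is a rough sketch of setting the wait states before raising the clock. It assumes the STM32F4 layout, with FLASH_ACR at 0x40023C00 and the LATENCY field in the low bits; check both against the reference manual for your exact part. The function name flash_set_latency is made up, and the right number of wait states depends on clock frequency and supply voltage, so take it from the table in the manual rather than from this example.

```c
#include <stdint.h>

/* Assumed STM32F4 values; verify against your device's reference manual. */
#define FLASH_ACR_ADDR         0x40023C00u
#define FLASH_ACR_LATENCY_MASK 0x0000000Fu

/*
 * Program the wait-state count and wait until it reads back.
 * The register is passed as a pointer so the logic can also be
 * exercised on a host against an ordinary variable.
 * Returns 0 on success, -1 if ws does not fit in the field.
 */
int flash_set_latency(volatile uint32_t *acr, uint32_t ws)
{
    if (ws > FLASH_ACR_LATENCY_MASK)
        return -1;

    /* Change only the latency field, keep the other bits as they are. */
    *acr = (*acr & ~FLASH_ACR_LATENCY_MASK) | ws;

    /* The new latency should be in effect before the clock goes up. */
    while ((*acr & FLASH_ACR_LATENCY_MASK) != ws)
        ;

    return 0;
}
```

On the target the call would look something like flash_set_latency((volatile uint32_t *)FLASH_ACR_ADDR, 2), made before switching to the faster clock.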

One nice feature is that you can keep halting and re-loading. I don't know how it is on this device/debugger, but if you turn on a cache (the Cortex-M4 core itself has none, though a vendor can add one in front of a memory) you have to be careful to make sure it is off when you change programs. Writing to memory is a data operation; fetching instructions is an instruction fetch, which could land in an instruction cache. Say you execute some instruction at 0x20000100 and it gets cached in the I cache. You then halt using the debugger and write a new program that includes the cached addresses (0x20000100). When you run it the I cache has not been flushed, so you would be running a mixture of the prior program from the cache and the new program from memory, which is a disaster at best. So either never turn on the caches when running this way, or come up with a solution to this problem (clear the caches before you stop the program, use the reset button to reset the processor between runs, power cycle, etc.).