ARM Cortex-M4: Running code from external flash

AGM · Jan 14, 2015

Is it possible to separate base FW and application code on the ARM Cortex-M4 architecture (e.g. STM32F4)? What I'd like to do is run applications from external flash and the base FW from internal flash. The applications all implement the same "API" (a single header file), but their functionality differs.

The idea is that the base FW offers drivers, the engine and the UI, and can work standalone. Applications would provide extra functionality to the base FW when needed. The applications cannot all be flashed to internal flash, since their total code size is too big for it. Another reason is that we'd like to update or add applications on the fly without re-flashing the device.

So far I have a few ideas on how to do this, but are any of these feasible, or are there other options?

  1. Load applications on the fly into internal SRAM
    • RAM consumption might be a problem.
    • Not sure whether base FW and application code can execute at the "same time". Can application functions be called from the base FW and vice versa? I have seen this technique used with flash loaders, but does that mean that once you start running code from RAM, code in flash can no longer be executed?
  2. Copy applications from external flash into internal flash.
    • Not sure how long the internal flash will last. What is the maximum number of write cycles for internal flash? The application would need to change 1-20 times per day.
    • Can part of the internal flash be programmed while executing code from it (an application loader)?
  3. Find a Cortex-M4 that supports running code from internal and external flash at the same time.
    • I haven't found any; probably not possible with the Cortex-M4 architecture?

All tips, hints and example codes appreciated!

EDIT: Thanks for the answers; I need some time to digest them.

The main reason for this trial is to allow updating the device functionality without flashing the base FW, not so much to save SRAM/internal flash. It is a kind of plugin architecture, offering a simple interface to extend system functionality without altering the underlying system. If I cannot build a system that executes code from external flash (SD card, NAND), I will first try loading applications on the fly into SRAM/internal flash. But I will also dig deeper into the Emcraft solution.

There is no need to stick with STM chips; I just happen to have their devkits on my desk. The final target is to load applications from an SD card or NAND memory, so at this point I don't want to limit the implementation to work only with NOR flash.

I'll start with a minimal implementation using the STM32F4 devkit. First I need to attach NAND or an SD card to it. I will try both options, loading applications into SRAM and into internal flash, to see how they work and what the performance impact is. As Clifford said, the challenge will lie more in linking, building and toolset settings. Even though I can force the application to always be in the same place in memory, its functions will be at different addresses; I need to figure out how to handle this. Examples/demos would be helpful.

Spec for my minimal implementation.

Project 1: Base FW
    Driver for accessing applications from external flash
    Minimal filesystem to write and read applications to/from external flash
    UART commands:
      -- Write applications to external flash
      -- Load applications from external flash to SRAM/internal flash
      -- Execute application and print result to UART
Interface.h
    int functionWrapper(int functionNumber)
    bool initApplication()
    int executeMathOperation1(int a, int b)
    int executeMathOperation2(int a, int b)
Project 2: Application 1
    MathOp1: Sums up two values
    MathOp2: Multiplies two values
Project 3: Application 2
    MathOp1: Subtracts two values
    MathOp2: Divides two values

I haven't decided on the final OS yet, but it will most probably be FreeRTOS/OpenRTOS.

Answer

Clifford · Jan 14, 2015

The problem is not the processor; executing code in different memory spaces is a matter of building, linking and loading your code appropriately, and that is more a toolchain issue than a matter of chip selection.

The first problem is selecting a device that actually has a memory-mapped external memory interface. This means that devices that are not memory-mapped, such as NAND flash (accessed through a command interface) or mass-storage devices such as an SD card, are not suitable. It must be NOR flash on the system address/data bus.

Second, in most cases the external memory interface must be configured for the correct memory type, bus width, timing, etc. before it is addressable. That means you cannot boot directly into software in external memory, because software must run first to perform the configuration.

Third, your toolchain will typically compile and link your application into a single monolithic image. Separating it into BIOS/OS and application is not trivial, and for bare-metal targets (i.e. not running a full OS such as Linux with built-in load/execute and dynamic linking) there is no standard method; you'll have to roll your own.

In the case of a bootloader that starts up, loads an application and runs it, things are perhaps simpler, because once the application is running the bootloader plays no further part; the bootloader only needs to know the start address of the application. In your case, however, you want to compile and link two separate software entities and have the application access your BIOS/OS code, so the application needs to know the entry-point addresses of the routines in the independently linked BIOS/OS. One way to do this is to generate a link map of the BIOS/OS (the format is toolchain specific), and from that generate an entry-point lookup table (essentially an array of function pointers) that you link with each application. That way your application has the means to call back into the BIOS/OS.

You may not need to do that, however: you can link your application into disjoint memory address regions and program the internal and external memory devices separately (sometimes called "scatter-loading"). That way the linker is responsible for resolving the internal and external addresses, and calls in either direction are possible. You need to ensure that the start-up code that configures the external memory is in internal memory, of course, but it is possible to instruct the linker to place specific code in specific memory, or to let it decide when it does not matter.

The need to re-flash the code does not in itself require the architecture you describe. You could simply implement a bootloader (occupying reserved pages of internal flash) that loads data from some source such as an SD card, USB, serial or NAND flash, and writes it to the appropriate internal or external flash pages. If the data is, for example, an Intel HEX file, it contains address information that tells the bootloader where to write it. In this approach the only permanent code is the bootloader (which must configure the external memory), and your BIOS/OS and application can be monolithic and "scatter-loaded" across internal and external flash.

A word of warning, however: Cortex-M devices are optimised to fetch instructions and load data over separate buses. On STM32 the internal flash and internal SRAM are on separate buses, allowing instruction and data fetches to occur in parallel. When running code from external memory, you have to realise that not only is that bus likely to be slower, but instruction and data fetches from external memory will also be serialised. So performance may take a significant hit.