I thought the many programmers of Arduino/AVR MCUs could share some of their knowledge here.
My specific problem, on an Atmel ATmega128 AVR, was this:
Based on ADC data I was running a loop that did some calculations and wrote the results to the serial console; the ADC data also drove an interrupt.
As soon as the program produced a certain amount of serial output, it suddenly became highly unstable:
random jumps within the code, random unknown interrupts, random variable corruption, and random resets without any reset flag set in the MCU status register.
Adding a buffer at a specific point solved the instability.
Changing the gcc optimization parameters also changed the behaviour; without optimization the code was quite stable.
Check my answer for possible reasons and the real cause.
There are quite a few traps; these are some of the mean ones:
1) Sudden resets can have multiple causes. Your AVR may be running at too low a voltage for its clock: either lower the frequency or increase the voltage. Check the datasheet; it has a graph of frequency versus supply voltage, and they are serious about it :)
2) Another potential issue is an unhandled interrupt. You must handle every interrupt you enable: with avr-libc, an interrupt without a handler ends up in a default handler that jumps to the reset vector, so the program restarts immediately.
I like to add a special "catch all" ISR which catches all unhandled IRQs:
#include <avr/interrupt.h>
ISR(BADISR_vect)
{
    /* never return: keep writing '!' to the UART data register */
    for (;;)
        UDR0 = '!';
}
This snippet floods the UART with '!' characters. Alternatively you can flash an LED, etc. Just make sure not to return from the handler: returning only hides the problem, and you might never find the bug because the program keeps running.
3) In your main() or init code you should read the MCU status register as early as possible and then set it to zero.
In most cases this register holds the reason for the reset.
if(MCUCSR & (1<<PORF )) myprintf0P(PSTR("Power-on reset.\n"));
if(MCUCSR & (1<<EXTRF)) myprintf0P(PSTR("External reset!\n"));
if(MCUCSR & (1<<BORF )) myprintf0P(PSTR("Brownout reset!\n"));
if(MCUCSR & (1<<WDRF )) myprintf0P(PSTR("Watchdog reset!\n"));
if(MCUCSR & (1<<JTRF )) myprintf0P(PSTR("JTAG reset!\n"));
MCUCSR = 0;
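A side effect of clearing the register on every start is worth spelling out: if the program restarts and none of the flags is set, no hardware reset happened, and the code most likely jumped to address 0 (for example through an unhandled interrupt). The following is only a host-compilable sketch of that classification. The RST_ constants are hypothetical names that mirror the ATmega128 MCUCSR bit positions; on the target you would use the names from <avr/io.h> and pass in a saved copy of MCUCSR.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the ATmega128 MCUCSR bit positions. */
enum { RST_PORF = 0, RST_EXTRF = 1, RST_BORF = 2, RST_WDRF = 3, RST_JTRF = 4 };

/* Describes a saved copy of the status register.
   Assumes the register was cleared to zero during the previous start. */
const char *reset_cause(uint8_t flags)
{
    if (flags & (1u << RST_PORF))  return "power-on";
    if (flags & (1u << RST_BORF))  return "brown-out";
    if (flags & (1u << RST_WDRF))  return "watchdog";
    if (flags & (1u << RST_JTRF))  return "JTAG";
    if (flags & (1u << RST_EXTRF)) return "external";
    /* No flag set: the program restarted without a hardware reset. */
    return "no hardware reset (jump to reset vector)";
}
```

Several flags can be set at once, so the order of the checks is a design choice; here power-on wins over everything else.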
4) Another nice source of unexpected behaviour is compiler optimization.
You can choose from quite a few options; in general, the more you optimize, the more your code is compacted. Unused data and functions are removed, and code is rewritten into faster or smaller instruction sequences.
Programmers often disable or reduce optimization while writing code. This helps because the debugger will not jump between lines at random and shows quite accurately what is going on in terms of your own source.
However, if you have a small memory bug (like an off-by-one write), unoptimized code might run without visible issues. As soon as optimization is turned up, variable positions might change, or two variables suddenly end up next to each other on the stack or heap, so the off-by-one write now hits data that was not affected before.
Debugging tools like valgrind are not available for the AVR, so my best hint is to write code with your brain switched on.
If you play with pointers, double-check that you never run out of bounds.
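One cheap way to make an off-by-one write visible instead of silent is to route suspicious buffer writes through a small checking helper while you hunt the bug. This is just a sketch; checked_store and bounds_errors are hypothetical names, not part of avr-libc.

```c
#include <stddef.h>
#include <stdint.h>

static volatile uint8_t bounds_errors; /* hypothetical counter of rejected writes */

/* Stores value at buf[index] only if index lies inside the buffer.
   Returns 1 on success, 0 when the write would have been out of bounds. */
int checked_store(uint8_t *buf, size_t len, size_t index, uint8_t value)
{
    if (index >= len) {
        if (bounds_errors < UINT8_MAX)
            bounds_errors++;   /* saturate instead of wrapping back to 0 */
        return 0;
    }
    buf[index] = value;
    return 1;
}
```

Printing bounds_errors over the serial console (or lighting an LED when it is non-zero) then tells you that a write went wrong, without the neighbouring variable being damaged.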
5) Compiler optimizations can 'destroy' your polling code.
For example, your ISR (UART, ADC, TWI, etc.) writes an 8-bit variable or register, which the AVR can access atomically. Your main loop checks whether this variable has changed, using it as an indicator/flag for new data.
This is a proper way of writing code, but your compiler does not know that the variable is changed in your ISR.
So it is quite possible that the optimizer treats the variable as if it never changes; after all, you run an endless loop and you only READ from it in that loop.
The solution is to declare the variable volatile.
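As a minimal sketch of such a flag: the names adc_ready, adc_isr_body and adc_take_event are made up for illustration, and the ISR is represented by a plain function so the snippet also compiles off-target. On the AVR the body of adc_isr_body would sit inside ISR(ADC_vect).

```c
#include <stdint.h>

/* volatile forces a real memory access every time the flag is used */
static volatile uint8_t adc_ready;

/* Stand-in for the interrupt handler: signals that new data arrived. */
void adc_isr_body(void)
{
    adc_ready = 1;
}

/* Called from the main loop: returns 1 once per event and clears the flag. */
uint8_t adc_take_event(void)
{
    if (!adc_ready)
        return 0;
    adc_ready = 0;
    return 1;
}
```

Without volatile, an optimizing compiler is allowed to read adc_ready once and keep the value in a register for the whole polling loop.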
Here is an example of a FIFO ring buffer with two indexes that are read and written from both normal code and ISR code:
struct fifo
{
    uint8_t size;            /* size of buffer in bytes */
    volatile uint8_t read;   /* read index */
    volatile uint8_t write;  /* write index */
    unsigned char *buffer;   /* fifo ring buffer */
};
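To show how the two volatile indexes are meant to be used, here is one possible put/get pair for this struct. It is a sketch under a few assumptions: one side only calls fifo_put and the other only calls fifo_get (for example ISR and main loop), size is at least 2, and one slot is always left unused to tell "full" from "empty". The function names are hypothetical.

```c
#include <stdint.h>

struct fifo
{
    uint8_t size;            /* size of buffer in bytes */
    volatile uint8_t read;   /* index of the next byte to read */
    volatile uint8_t write;  /* index of the next free slot */
    unsigned char *buffer;   /* fifo ring buffer */
};

void fifo_init(struct fifo *f, unsigned char *storage, uint8_t size)
{
    f->size = size;
    f->read = 0;
    f->write = 0;
    f->buffer = storage;
}

/* Producer side: returns 1 on success, 0 when the buffer is full. */
uint8_t fifo_put(struct fifo *f, unsigned char byte)
{
    uint8_t next = (uint8_t)(f->write + 1);
    if (next >= f->size)
        next = 0;
    if (next == f->read)
        return 0;               /* full: one slot stays unused */
    f->buffer[f->write] = byte;
    f->write = next;            /* publish the index only after storing the data */
    return 1;
}

/* Consumer side: returns 1 and stores the byte in *out, 0 when empty. */
uint8_t fifo_get(struct fifo *f, unsigned char *out)
{
    uint8_t r = f->read;
    if (r == f->write)
        return 0;
    *out = f->buffer[r];
    r++;
    if (r >= f->size)
        r = 0;
    f->read = r;
    return 1;
}
```

Because each index is a single byte and is written by only one side, neither function needs to disable interrupts in this arrangement.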
6) This was my specific problem; it caused all of the above and more.
In my case the whole problem came from my stupidity of running the AVR at 3.3 V and 16 MHz. It needs around 4.5 V to run stably at this frequency.
Early on I ran quite a few tests and the MCU seemed rock stable, but as the code size increased, stability dropped.
It acted as if I had very serious memory corruption, possibly triggered by an ISR,
or as if some libc functions (the progmem-related ones) were faulty.
Running the device at 5 V solved it. That wisdom cost me countless hours of software analysis; I literally searched in depth for every possible cause on the software side.
Lesson: If you program a microcontroller, never treat it as pure software :)
7) For advanced memory corruption analysis you can fill the unused SRAM, including the stack area, with a known pattern at startup. This can greatly aid your debugging, as you can see how far variables and the stack have grown into memory.
Also, if a string loses its null terminator, your pointer runs into known data instead of unknown data.
Just add a C file to your project with code like this:
#include <stdint.h>
extern void *_end, *__stack; /* linker symbols: end of static data, top of stack */
#define __ALD(x) ((uintptr_t)(x) - ((uintptr_t)(x) & 0x03))          /* align down to 4 bytes */
#define __ALU(x) (((uintptr_t)(x) + 0x03) & ~(uintptr_t)0x03)        /* align up to 4 bytes */
void _stackfill(void) __attribute__((naked)) __attribute__((optimize("O3"))) __attribute__((section (".init1")));
void _stackfill(void)
{
    uint32_t* start = (uint32_t*)__ALU(&_end);
    uint32_t* end = (uint32_t*)__ALD(&__stack);
    for (uint32_t *pos = start; pos < end; pos++)
        *pos = 0x41424142; // ends up as endless ascii BABA
}
This code hooks itself into the init section of your program and writes the pattern BABABABABABABA all over the SRAM between the end of your static data and the top of the stack.
This has no bad effect on your program; it just initializes the SRAM with a known pattern.
If you look at the memory during debugging you will see where variables are allocated and where not.
It works fine and can also be placed into .init3.
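If you would rather have a number than stare at a memory dump, you can count how much of the pattern is still intact. The helper below is a hypothetical sketch, not part of the code above: it scans upward from a start address and counts bytes that still match the repeating 0x42 0x41 sequence, which is how the 0x41424142 word is laid out in memory on a little-endian AVR. It assumes the scan starts where the fill started, so the pattern phase lines up.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the leading bytes of [mem, mem + len) that still hold the
   fill pattern 'B' 'A' 'B' 'A' ... (0x42 at even offsets, 0x41 at odd). */
size_t untouched_bytes(const uint8_t *mem, size_t len)
{
    size_t n = 0;
    while (n < len && mem[n] == ((n & 1u) ? 0x41 : 0x42))
        n++;
    return n;
}
```

The result is only an estimate of free memory: a variable that was allocated but never written, or that happens to contain the same bytes, still looks untouched.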
That's it for now. I hope this short roundup will help some programmers solve strange or frustrating behaviour with their AVR.
The code parts are written for the ATmega128 but will run on any 8-bit AVR; just some register names might need a small change.