I have a simple C program:
// It is not important to know what the code does; you may skip the code.
main.c
#include <bsp.h>
unsigned int AppCtr;
unsigned char AppFlag;
int SOME_LARGE_VARIABLE;
static void AppTest (void);
void main (void)
{
    AppCtr = 0;
    AppFlag = 0;
    AppTest();
}

static void Foo(void)
{
    SOME_LARGE_VARIABLE = 15;
}

static void AppTest (void)
{
    unsigned int i;

    i = 0;
    while (i < 200000) {   /* busy-wait delay loop */
        i++;
    }
    BSP_Test();            /* defined in bsp.c */
    SOME_LARGE_VARIABLE = 3;
    Foo();
}
bsp.c
extern int SOME_LARGE_VARIABLE;
extern unsigned char AppFlag;
unsigned int long My_GREAT_COUNTER;
void BSP_Test (void)
{
    SOME_LARGE_VARIABLE = 5;
    My_GREAT_COUNTER = 4;
}
(The program does not do anything useful. My goal is to extract the variable names, the location where each one is declared, and their memory addresses.)
When I compile the program I get the file a.out, which is an ELF file containing debug information.
Someone at the company wrote a program in .NET 5 years ago that gets all this information from the a.out file. Its output is a table with the columns Name, Display Name, Type, Size and Address.
It works well for this small program and also for other, larger projects.
That code is 2000 lines long, has several bugs, and does not support .NET version 4, which is why I am trying to recreate it.
My problem is that I am lost: I don't know which approach to take to solve this. These are the options I have been considering:
Clean up the buggy code of the program I showed in the first image, see what it does and how it parses the a.out file to get that information, and once I fully understand it, figure out why it does not support versions 3 and 4.
I am comfortable writing regular expressions, so maybe I could look for the pattern in the a.out file by doing something like:
So far I have been able to find the pattern when there is just one file (main.c), but when there are several files it gets more complicated. I haven't tried that case yet; maybe it will not be that complicated and it will be possible to find the pattern.
Install Cygwin so that I can use Linux commands such as objdump, nm or readelf on Windows. I haven't played with these commands enough: when I run one of them, such as readelf -w a.out, I get far more information than I need. There are also some cons, which is why I have not spent much time on this approach:
Cons: It takes a while to install Cygwin on Windows, and when we give this application to our customers we don't want them to have to install it. Maybe there is a way to install just the objdump and readelf commands without installing the whole thing.
Pros: If we find the right command, we will not be reinventing the wheel and will save some time. Maybe it is just a matter of parsing the output of a command such as objdump -W a.out.
In case you want to download the a.out file to parse it yourself, here it is.
I would like to be able to get the global variables from the a.out file. For each variable I would like to know its type (int, char, ...), its memory address, and the file in which it is declared (main.c or someOtherFile.c). I would prefer a solution that does not need Cygwin, since that would make it easier to deploy. Since this question asks for a lot, I attempted to split it into several smaller ones:
Perhaps I should delete the other questions; sorry for being redundant.
Here is what I will do. Why reinvent the wheel?
Download Windows builds of the Linux commands we will need from here.
In the bin directory there should be a readelf.exe.
Note that we will not need Cygwin or any other program, so deployment will be simple!
Once we have that file, run in cmd:
cd "path where readelf.exe is"
readelf.exe -s a.out
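If the goal is an application that ships readelf.exe next to it, the program still has to run the command and read what it prints. The following is a minimal sketch of that in C using popen (spelled _popen/_pclose in the Windows C runtime); the name for_each_output_line and its callback parameter are hypothetical, and it assumes readelf.exe can be found in the current directory or on PATH.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* expose popen/pclose in strict ISO C modes */
#endif
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#define popen  _popen             /* the Windows C runtime uses underscored names */
#define pclose _pclose
#endif

/* Runs a command (for example "readelf.exe -s a.out") and passes each line
   it prints to on_line, without the trailing line ending. on_line may be
   NULL, in which case lines are only counted. Lines longer than the buffer
   are delivered in pieces. Returns the number of lines read, or -1 if the
   command could not be started. */
int for_each_output_line(const char *cmd,
                         void (*on_line)(const char *line, void *ctx),
                         void *ctx)
{
    char buf[1024];
    int count = 0;
    FILE *p = popen(cmd, "r");

    if (p == NULL)
        return -1;
    while (fgets(buf, sizeof buf, p) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip "\n" or "\r\n" */
        if (on_line != NULL)
            on_line(buf, ctx);
        count++;
    }
    pclose(p);
    return count;
}
```

For example, for_each_output_line("readelf.exe -s a.out", handle_line, NULL) would hand every row of the symbol table to a handle_line function of your own.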
This is the list that comes out:
Looking at it, we are interested in all the variables whose Type is OBJECT and whose Size is greater than 0.
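Picking out those rows can be done by splitting each line on whitespace. This is a minimal sketch that assumes the column order readelf -s prints (Num, Value, Size, Type, Bind, Vis, Ndx, Name); struct sym_row and parse_object_row are hypothetical names. Depending on the binutils version, long symbol names may be truncated unless -W (--wide) is added to the command, so that is worth checking.

```c
#include <stdio.h>
#include <string.h>

/* One row of `readelf -s` output, e.g.
   "    10: 00000004     4 OBJECT  GLOBAL DEFAULT    3 My_GREAT_COUNTER" */
struct sym_row {
    unsigned long long value;   /* in a linked executable: the address */
    long long size;             /* size in bytes */
    char type[16], bind[16], vis[16], ndx[16];
    char name[256];
};

/* Returns 1 if the line is a symbol row whose Type is OBJECT and whose
   Size is greater than 0; the parsed columns are stored in *r. */
int parse_object_row(const char *line, struct sym_row *r)
{
    /* %*u skips the "Num" column; %lli also accepts sizes printed as 0x... */
    int got = sscanf(line, " %*u: %llx %lli %15s %15s %15s %15s %255s",
                     &r->value, &r->size, r->type, r->bind,
                     r->vis, r->ndx, r->name);

    if (got != 7)
        return 0;   /* header line, blank line, or a symbol with no name */
    return strcmp(r->type, "OBJECT") == 0 && r->size > 0;
}
```

If only globals are wanted, one could additionally require strcmp(r.bind, "GLOBAL") == 0, since file-scope static variables also show up as OBJECT with Bind LOCAL.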
Once we have the variables, we can use the readelf.exe -w a.out command to look at the debug-information (DWARF) tree. It looks like:
Let's start by looking for one of the variables we found in the previous step (My_GREAT_COUNTER). Note that at the top we can see the file where the variable is declared, and we also get more information, such as the line where it was declared and its memory address.
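Each attribute line in the readelf -w dump has the shape <offset> DW_AT_xxx : value, so reading a variable's name, declaration line and address mostly comes down to splitting those lines. A minimal sketch assuming that layout follows; parse_attr_line and parse_op_addr are hypothetical helpers. The DW_OP_addr form only appears for variables with a fixed address (globals and statics), not for locals on the stack. With some compilers DW_AT_name may be printed as "(indirect string, offset: 0x...): name", in which case the name is the text after the last ": ".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits "    <2d>   DW_AT_name        : My_GREAT_COUNTER" into
   attr = "DW_AT_name" and value = "My_GREAT_COUNTER" (trailing whitespace
   and line endings removed, value truncated to 255 chars).
   Returns 0 for anything that is not a DW_AT_ attribute line. */
int parse_attr_line(const char *line, char attr[64], char value[256])
{
    int n = 0;
    const char *v;
    size_t len;

    /* n stays 0 unless the ':' after the attribute name was matched */
    if (sscanf(line, " <%*x> %63s :%n", attr, &n) != 1 || n == 0)
        return 0;
    if (strncmp(attr, "DW_AT_", 6) != 0)
        return 0;

    v = line + n;
    while (*v == ' ' || *v == '\t')         /* skip padding after ':' */
        v++;
    len = strlen(v);
    while (len > 0 && strchr(" \t\r\n", v[len - 1]) != NULL)
        len--;
    if (len > 255)
        len = 255;
    memcpy(value, v, len);
    value[len] = '\0';
    return 1;
}

/* Extracts the hex address from a DW_AT_location value such as
   "5 byte block: 3 40 10 60 0 \t(DW_OP_addr: 601040)". */
int parse_op_addr(const char *value, unsigned long long *addr)
{
    const char *p = strstr(value, "DW_OP_addr: ");
    char *end;

    if (p == NULL)
        return 0;
    p += strlen("DW_OP_addr: ");
    *addr = strtoull(p, &end, 16);
    return end != p;
}
```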
The last thing we are missing is the type. If you take a look, the type is given as <0x522>. That means we have to go to offset 0x522 of the tree to get more information about that type. If we go to that part, this is what we get:
From looking at the tree we know that My_GREAT_COUNTER is of type unsigned long.
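Following the type reference can be automated the same way: take the hex number out of <0x522> on the DW_AT_type line, then find the entry whose header line (<depth><offset>: Abbrev Number: ...) carries the same offset and read its DW_AT_name. The two helpers below are a hypothetical sketch of those steps, assuming readelf's usual layout. Typedef, const and volatile entries carry their own DW_AT_type, so for such variables the lookup may have to be repeated until a DW_TAG_base_type entry is reached.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the referenced offset from an attribute line such as
   "    <34>   DW_AT_type        : <0x522>". Returns 1 on success. */
int parse_type_ref(const char *line, unsigned long *offset)
{
    const char *p = strstr(line, "DW_AT_type");
    char *end;

    if (p == NULL)
        return 0;
    p = strstr(p, "<0x");
    if (p == NULL)
        return 0;
    *offset = strtoul(p + 3, &end, 16);
    return end != p + 3 && *end == '>';
}

/* Recognises an entry header line such as
   " <1><522>: Abbrev Number: 3 (DW_TAG_base_type)" and stores its offset.
   Attribute lines like "    <34>   DW_AT_..." are rejected. */
int parse_die_offset(const char *line, unsigned long *offset)
{
    int n = 0;   /* set only if the closing ':' was matched */

    if (sscanf(line, " <%*u><%lx>:%n", offset, &n) != 1 || n == 0)
        return 0;
    return 1;
}
```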