I want to learn lLinux Kernel programming.
What would be the starting points for that? What could be some of the simpler problems to target?
**TODO** +editPic: Linux Kernel Developer -> (Ring Layer 0)
+addSection: Kernel Virtualization Engine
KERN_WARN_CODING_STYLE: Do not Loop unless you absolutely have to.
Recommended Books for the Uninitialized
void *i
"Men do not understand books until they have a certain amount of life, or at any rate no man understands a deep book, until he has seen and lived at least part of its contents". –Ezra Pound
A journey of a thousand code-miles must begin with a single step. If you are in confusion about which of the following books to start with, don't worry, pick any one of your choice. Not all those who wander are lost. As all roads ultimately connect to highway, you will explore new things in your kernel journey as the pages progress without meeting any dead ends, and ultimately connect to the code-set
. Read with alert mind and remember: Code is not Literature.
What is left is not a thing or an emotion or an image or a mental picture or a memory or even an idea. It is a function. A process of some sort. An aspect of Life that could be described as a function of something "larger". And therefore, it appears that it is not really "separate" from that something else. Like the function of a knife - cutting something - is not, in fact, separate from the knife itself. The function may or may not be in use at the moment, but it is potentially NEVER separate.
Solovay Strassen Derandomized Algorithm for Primality Test:
Read not to contradict and confute; nor to believe and take for granted; nor to find talk and discourse; but to weigh and consider. Some books are to be tasted, others to be swallowed, and some few to be chewed and digested: that is, some books are to be read only in parts, others to be read, but not curiously, and some few to be read wholly, and with diligence and attention.
static void tasklet_hi_action(struct softirq_action *a)
{
struct tasklet_struct *list;
local_irq_disable();
list = __this_cpu_read(tasklet_hi_vec.head);
__this_cpu_write(tasklet_hi_vec.head, NULL);
__this_cpu_write(tasklet_hi_vec.tail, this_cpu_ptr(&tasklet_hi_vec.head));
local_irq_enable();
while (list) {
struct tasklet_struct *t = list;
list = list->next;
if (tasklet_trylock(t)) {
if (!atomic_read(&t->count)) {
if (!test_and_clear_bit(TASKLET_STATE_SCHED,
&t->state))
BUG();
t->func(t->data);
tasklet_unlock(t);
continue;
}
tasklet_unlock(t);
}
local_irq_disable();
t->next = NULL;
*__this_cpu_read(tasklet_hi_vec.tail) = t;
__this_cpu_write(tasklet_hi_vec.tail, &(t->next));
__raise_softirq_irqoff(HI_SOFTIRQ);
local_irq_enable();
}
}
Core Linux ( 5 -> 1 -> 3 -> 2 -> 7 -> 4 -> 6 )
“Nature has neither kernel nor shell; she is everything at once” -- Johann Wolfgang von Goethe
Reader should be well versed with operating system concepts; a fair understanding of long running processes and its differences with processes with short bursts of execution; fault tolerance while meeting soft and hard real time constraints. While reading, it's important to understand and n/ack
the design choices made by the linux kernel source in the core subsystems.
Threads [and] signals [are] a platform-dependent trail of misery, despair, horror and madness (~Anthony Baxte). That being said you should be a self-evaluating C expert, before diving into the kernel. You should also have good experience with Linked Lists, Stacks, Queues, Red Blacks Trees, Hash Functions, et al.
volatile int i;
int main(void)
{
int c;
for (i=0; i<3; i++) {
c = i&&&i;
printf("%d\n", c); /* find c */
}
return 0;
}
The beauty and art of the Linux Kernel source lies in the deliberate code obfuscation used along. This is often necessitated as to convey the computational meaning involving two or more operations in a clean and elegant way. This is especially true when writing code for multi-core architecture.
Video Lectures on Real-Time Systems, Task Scheduling, Memory Compression, Memory Barriers, SMP
#ifdef __compiler_offsetof
#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER)
#else
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif
Linux Device Drivers ( 1 -> 2 -> 4 -> 3 -> 8 -> ... )
"Music does not carry you along. You have to carry it along strictly by your ability to really just focus on that little small kernel of emotion or story". -- Debbie Harry
Your task is basically to establish a high speed communication interface between the hardware device and the software kernel. You should read the hardware reference datasheet/manual to understand the behavior of the device and it's control and data states and provided physical channels. Knowledge of Assembly for your particular architecture and a fair knowledge of VLSI Hardware Description Languages like VHDL or Verilog will help you in the long run.
Q: But, why do I have to read the hardware specs?
A: Because, "There is a chasm of carbon and silicon the software can't bridge" - Rahul Sonnad
However, the above doesn't poses a problem for Computational Algorithms (Driver code - bottom-half processing), as it can be fully simulated on a Universal Turing Machine. If the computed result holds true in the mathematical domain, it's a certainty that it is also true in the physical domain.
Video Lectures on Linux Device Drivers (Lec. 17 & 18), Anatomy of an Embedded KMS Driver, Pin Control and GPIO Update, Common Clock Framework, Write a Real Linux Driver - Greg KH
static irqreturn_t phy_interrupt(int irq, void *phy_dat)
{
struct phy_device *phydev = phy_dat;
if (PHY_HALTED == phydev->state)
return IRQ_NONE; /* It can't be ours. */
/* The MDIO bus is not allowed to be written in interrupt
* context, so we need to disable the irq here. A work
* queue will write the PHY to disable and clear the
* interrupt, and then reenable the irq line.
*/
disable_irq_nosync(irq);
atomic_inc(&phydev->irq_disable);
queue_work(system_power_efficient_wq, &phydev->phy_queue);
return IRQ_HANDLED;
}
Kernel Networking ( 1 -> 2 -> 3 -> ... )
“Call it a clan, call it a network, call it a tribe, call it a family: Whatever you call it, whoever you are, you need one.” - Jane Howard
Understanding a packet walk-through in the kernel is a key to understanding kernel networking. Understanding it is a must if we want to understand Netfilter or IPSec internals, and more. The two most important structures of linux kernel network layer are: struct sk_buff
and struct net_device
static inline int sk_hashed(const struct sock *sk)
{
return !sk_unhashed(sk);
}
Kernel Debugging ( 1 -> 4 -> 9 -> ... )
Unless in communicating with it one says exactly what one means, trouble is bound to result. ~Alan Turing, about computers
Brian W. Kernighan, in the paper Unix for Beginners (1979) said, "The most effective debugging tool is still careful thought, coupled with judiciously placed print statements". Knowing what to collect will help you to get the right data quickly for a fast diagnosis. The great computer scientist Edsger Dijkstra once said that testing can demonstrate the presence of bugs but not their absence. Good investigation practices should balance the need to solve problems quickly, the need to build your skills, and the effective use of subject matter experts.
There are times when you hit rock-bottom, nothing seems to work and you run out of all your options. Its then that the real debugging begins. A bug may provide the break you need to disengage from a fixation on the ineffective solution.
Video Lectures on Kernel Debug and Profiling, Core Dump Analysis, Multicore Debugging with GDB, Controlling Multi-Core Race Conditions, Debugging Electronics
/* Buggy Code -- Stack frame problem
* If you require information, do not free memory containing the information
*/
char *initialize() {
char string[80];
char* ptr = string;
return ptr;
}
int main() {
char *myval = initialize();
do_something_with(myval);
}
/* “When debugging, novices insert corrective code; experts remove defective code.”
* – Richard Pattis
#if DEBUG
printk("The above can be considered as Development and Review in Industrial Practises");
#endif
*/
File Systems ( 1 -> 2 -> 6 -> ... )
"I wanted to have virtual memory, at least as it's coupled with file systems". -- Ken Thompson
On a UNIX system, everything is a file; if something is not a file, it is a process, except for named pipes and sockets. In a file system, a file is represented by an inode
, a kind of serial number containing information about the actual data that makes up the file. The Linux Virtual File System VFS
caches information in memory from each file system as it is mounted and used. A lot of care must be taken to update the file system correctly as data within these caches is modified as files and directories are created, written to and deleted. The most important of these caches is the Buffer Cache, which is integrated into the way that the individual file systems access their underlying block storage devices.
Video Lectures on Storage Systems, Flash Friendly File System
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
struct open_flags op;
int fd = build_open_flags(flags, mode, &op);
struct filename *tmp;
if (fd)
return fd;
tmp = getname(filename);
if (IS_ERR(tmp))
return PTR_ERR(tmp);
fd = get_unused_fd_flags(flags);
if (fd >= 0) {
struct file *f = do_filp_open(dfd, tmp, &op);
if (IS_ERR(f)) {
put_unused_fd(fd);
fd = PTR_ERR(f);
} else {
fsnotify_open(f);
fd_install(fd, f);
}
}
putname(tmp);
return fd;
}
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(AT_FDCWD, filename, flags, mode);
}
Security ( 1 -> 2 -> 8 -> 4 -> 3 -> ... )
"UNIX was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things". — Doug Gwyn
No technique works if it isn't used. Ethics change with technology.
"F × S = k" the product of freedom and security is a constant. - Niven's Laws
Cryptography forms the basis of trust online. Hacking is exploiting security controls either in a technical, physical or a human-based element. Protecting the kernel from other running programs is a first step toward a secure and stable system, but this is obviously not enough: some degree of protection must exist between different user-land applications as well. Exploits can target local or remote services.
“You can't hack your destiny, brute force...you need a back door, a side channel into Life." ― Clyde Dsouza
Computers do not solve problems, they execute solutions. Behind every non-deterministic algorithmic code, there is a determined mind. -- /var/log/dmesg
Video Lectures on Cryptography and Network Security, Namespaces for Security, Protection Against Remote Attacks, Secure Embedded Linux
env x='() { :;}; echo vulnerable' bash -c "echo this is a test for Shellsock"
Kernel Source ( 0.11 -> 2.4 -> 2.6 -> 3.18 )
"Like wine, the mastery of kernel programming matures with time. But, unlike wine, it gets sweeter in the process". --Lawrence Mucheka
You might not think that programmers are artists, but programming is an extremely creative profession. It's logic-based creativity. Computer science education cannot make anybody an expert programmer any more than studying brushes and pigment can make somebody an expert painter. As you already know, there is a difference between knowing the path and walking the path; it is of utmost importance to roll up your sleeves and get your hands dirty with kernel source code. Finally, with your thus gained kernel knowledge, wherever you go, you will shine.
Immature coders imitate; mature coders steal; bad coders deface what they take, and good coders make it into something better, or at least something different. The good coder welds his theft into a whole of feeling which is unique, utterly different from that from which it was torn.
Video Lectures on Kernel Recipes
linux-0.11
├── boot
│ ├── bootsect.s head.s setup.s
├── fs
│ ├── bitmap.c block_dev.c buffer.c char_dev.c exec.c
│ ├── fcntl.c file_dev.c file_table.c inode.c ioctl.c
│ ├── namei.c open.c pipe.c read_write.c
│ ├── stat.c super.c truncate.c
├── include
│ ├── a.out.h const.h ctype.h errno.h fcntl.h
│ ├── signal.h stdarg.h stddef.h string.h termios.h
│ ├── time.h unistd.h utime.h
│ ├── asm
│ │ ├── io.h memory.h segment.h system.h
│ ├── linux
│ │ ├── config.h fdreg.h fs.h hdreg.h head.h
│ │ ├── kernel.h mm.h sched.h sys.h tty.h
│ ├── sys
│ │ ├── stat.h times.h types.h utsname.h wait.h
├── init
│ └── main.c
├── kernel
│ ├── asm.s exit.c fork.c mktime.c panic.c
│ ├── printk.c sched.c signal.c sys.c system_calls.s
│ ├── traps.c vsprintf.c
│ ├── blk_drv
│ │ ├── blk.h floppy.c hd.c ll_rw_blk.c ramdisk.c
│ ├── chr_drv
│ │ ├── console.c keyboard.S rs_io.s
│ │ ├── serial.c tty_io.c tty_ioctl.c
│ ├── math
│ │ ├── math_emulate.c
├── lib
│ ├── close.c ctype.c dup.c errno.c execve.c _exit.c
│ ├── malloc.c open.c setsid.c string.c wait.c write.c
├── Makefile
├── mm
│ ├── memory.c page.s
└── tools
└── build.c
Linux_source_dir/Documentation/*