Every night, uncounted numbers of devices across the globe update their operating systems (OS), and everyday users log on expecting fast, secure connections and services to keep their increasingly online lives moving forward. But as artificial intelligence and other more complex systems come online, the foundation of all of them is teetering.
Every aspect of society — from government and industry to education and entertainment — relies on devices with stable operating systems. And every OS relies on a collection of millions of lines of code that underpin its functions. That large and growing collection of code is collectively called “the kernel,” and like the seed it is named for, it is essential to financial markets, business and industry, education, government, and national security.
“If it stops working, then nothing will work. The world would pretty much shut down,” said Dan Williams, assistant professor of computer science in the College of Engineering.
The problem? The kernel is written in an aging programming language called C, and it constantly grows bigger and more complex to serve more devices and applications. Meanwhile, fewer computer scientists are interested in working on it.
“It’s hard to get students interested in a tangled mess of code written in C when there is so much going on with artificial intelligence and other innovations. But all those innovations rely on the kernel. We need more people working on it,” Williams said.
To further that goal, Williams recently received a five-year, $600,000 National Science Foundation (NSF) Faculty Early Career Development (CAREER) award to explore new ways to stabilize and update the existing Linux kernel — globally one of the most frequently used kernels. The grant also will support educating new generations of computer scientists in the skills needed to work with the kernel.
“Dan Williams’ commitment to innovative work on this critical global challenge is emblematic of how our faculty and students lead the way,” said Cal Ribbens, head of the Department of Computer Science. “I believe the work his team does will encourage current and future students and faculty to tackle deep technical issues, such as a stable and robust OS kernel.”
Freezes and fixes
Part of the operating system of every server, computer, and networked device, the kernel acts like an air traffic controller — making sure software and cloud-based services run without crashing into each other. It sits between the hardware, software, and cloud services that users interact with, managing all communications between them. When a user touches an object on the screen or types on a keypad, it’s the kernel that ensures those commands are followed.
“When a computer freezes, that often points to a mistake in the kernel,” Williams said. “It’s working to fix something that went wrong. If it can’t fix it, the computer crashes — sometimes taking important data with it.”
All those overnight updates your OS does for you? Williams said many of those are fixes to bugs that have cropped up in the kernel. But the updates are bandages. Bugs, viruses, memory shortages, and other issues can still slow the kernel down and leave the system vulnerable to attacks such as ransomware.
Globally, computing relies on just a handful of kernels, Williams said. A vast swath of systems, including those run by Google and Amazon as well as Android devices, runs on the Linux kernel. No matter which OS a system uses, all kernels suffer from similar weaknesses, he said.
According to an analysis conducted in 2021, there are more than 28.8 million lines of code in the Linux OS kernel.
“It is so complex that no one person can understand all its functions,” Williams said. “And in some ways, it is mysterious even to those who study it. It’s a black box.”
Williams spent 10 years at IBM Research testing new ways to fix or replace the Linux kernel. But because so many industries and critical government systems rely on it, changes can cause unintended problems. And the stakes are so high that it’s functionally impossible to replace it wholesale with a new one.
“The kernel is in many ways too big to fail. And this unchecked kernel complexity has become a significant barrier to entry for students and practitioners to learn or innovate at the kernel level,” said Williams. “If this is not addressed, we may lose any hope of improving, securing, or maintaining the essential kernels our society depends on.”
A new approach
Two years ago, Williams came to Virginia Tech with an idea for a new approach that could help stabilize the kernel and could also help more students get interested in studying it. Under this NSF CAREER grant, his research team will experiment with building a new open-source extension framework that would target existing weaknesses in the kernel.
The idea is to build these extensions in the more flexible and robust Rust language and install them as bypasses to problematic sections of the existing kernel. If the approach is successful, adding more extensions could, over time, replace the existing kernel with a better one without disrupting existing applications and users.
Williams said his lab will work on creating a new model of the kernel for undergraduate and graduate students to experiment with. It could be used in new computer science classes and educational “boot camps.” Because anyone can use the open-source code the team develops, it could spur broader interest and experimentation.
The work could also prepare students for industry internships in which they work on existing kernels in real-world situations.
“Without exposure to the real systems in use today, the number of students qualified and interested in operating systems kernels will fall below what it should be for a piece of software as crucial for society as the OS kernel,” Williams said. “Our efforts on this project can help us reach our educational goals, and the open-source nature of our work will allow those already working in this space to engage in innovative thinking and approaches around OS kernels.”
Suzanne Miller for Virginia Tech