Google Is 2 Billion Lines of Code, and It’s All in One Place

How big is Google? We can answer that question in terms of revenue or stock price or customers or, well, metaphysical influence. But that’s not all. Google is, among other things, a vast empire of computer software. We can answer in terms of code.

Google’s Rachel Potvin came pretty close to an answer Monday at an engineering conference in Silicon Valley. She estimates that the software needed to run all of Google’s Internet services, from Google Search to Gmail to Google Maps, spans some 2 billion lines of code. By comparison, Microsoft’s Windows operating system (one of the most complex software tools ever built for a single computer, a project under development since the 1980s) is likely in the realm of 50 million lines.

So, building Google is roughly the equivalent of building the Windows operating system 40 times over.

The comparison is more apt than you might think. Much like the code that underpins Windows, the 2 billion lines that drive Google are one thing. They drive Google Search, Google Maps, Google Docs, Google+, Google Calendar, Gmail, YouTube, and every other Google Internet service, and yet all 2 billion lines sit in a single code repository available to all 25,000 Google engineers. Within the company, Google treats its code like an enormous operating system. “Though I can’t prove it,” Potvin says, “I would guess this is the largest single repository in use anywhere in the world.”

Google is an extreme case. But its example shows how complex our software has grown in the Internet age, and how we’ve changed our coding tools and philosophies to accommodate this added complexity. Google’s enormous repository is available only to coders inside Google. But in a way, it’s analogous to GitHub, the public open source repository where engineers can share enormous amounts of code with the Internet at large. We’re moving toward a world in which we regularly collaborate on code at a massive scale. This is the only way we can keep up with the rapid evolution of modern Internet services.

“Having 25,000 developers, as Google does, means it’s sharing code with a diverse set of people with diverse set of skills,” says Sam Lambert, the director of systems at GitHub. “But, as a small company, you can get some of that same advantage using GitHub and open source. There’s that saying: ‘A rising tide raises all boats.’”

The flip side is that building and running a 2-billion-line monolith is no simple task. “It must be a technical challenge, a huge feat,” Lambert says. “The numbers are absolutely staggering.”

Part of the genius of GitHub is that it lets coders so easily share and collaborate on code. But GitHub doesn’t house a single software project. It spans millions of projects. Google goes a step further, combining many projects into one. Given the difficulty of juggling that much code across that many engineers, this may seem slightly crazy. But according to Potvin, it works.

Listen to the Piper

Basically, Google has built its own “version control system” for juggling all this code. The system is called Piper, and it runs across the vast online infrastructure Google has built to run all its online services. According to Potvin, the system spans 10 different Google data centers.

It’s not just that all 2 billion lines of code sit inside a single system available to just about every engineer inside the company. It’s that this system gives Google engineers an unusual freedom to use and combine code from across myriad projects. “When you start a new project,” Potvin tells WIRED, “you have a wealth of libraries already available to you. Almost everything has already been done.” What’s more, engineers can make a single code change and instantly deploy it across all Google services. In updating one thing, they can update everything.

There are limitations to this system. Potvin says certain highly sensitive code, stuff akin to Google’s PageRank search algorithm, resides in separate repositories available only to specific employees. And because they don’t run on the ’net and are very different things, Google stores code for its two device operating systems, Android and Chrome, on separate version control systems. But for the most part, Google code is a monolith that allows for the free flow of software building blocks, ideas, and solutions.

The Bot Factor

As Lambert points out, building and running such a system requires not only know-how but enormous amounts of computing power. Piper spans about 85 terabytes of data (aka 85,000 gigabytes), and Google’s 25,000 engineers make about 45,000 commits (changes) to the repository each day. That’s some serious activity. While the Linux open source operating system spans 15 million lines of code across 40,000 software files, Google engineers modify 15 million lines of code across 250,000 files each week.
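To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate numbers quoted above; the variable names are illustrative, and the per-engineer rate is a naive average that overstates what humans do, since bots account for a majority of commits.

```python
# Approximate figures quoted in the article.
ENGINEERS = 25_000
COMMITS_PER_DAY = 45_000
REPO_TERABYTES = 85
LINUX_LINES = 15_000_000
LINUX_FILES = 40_000
WEEKLY_LINES_MODIFIED = 15_000_000
WEEKLY_FILES_MODIFIED = 250_000

# Assumption: decimal units (1 TB = 1,000 GB), matching the article's conversion.
repo_gigabytes = REPO_TERABYTES * 1_000

# Naive average over all engineers; bots make many of these commits,
# so the real human rate is lower.
commits_per_engineer_per_day = COMMITS_PER_DAY / ENGINEERS

# Weekly churn at Google relative to the entire Linux figures quoted.
lines_ratio = WEEKLY_LINES_MODIFIED / LINUX_LINES
files_ratio = WEEKLY_FILES_MODIFIED / LINUX_FILES

print(f"Repository size: {repo_gigabytes:,} GB")
print(f"Commits per engineer per day: {commits_per_engineer_per_day:.1f}")
print(f"Weekly lines modified vs. Linux lines: {lines_ratio:.2f}x")
print(f"Weekly files touched vs. Linux files: {files_ratio:.2f}x")
```

In other words, each week Google’s engineers and bots touch as many lines as the quoted size of Linux, spread across roughly six times as many files.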

At the same time, Piper must work to remove much of the burden from human coders. It must ensure that humans can wrap their heads around all that code; that they don’t step on each other’s toes with code changes; that they can readily remove bugs and unused code from the repository. And because all of this is so difficult, it must actually take some of that work away from the humans. Now that Google has switched to Piper from its previous version control system, a tool called Perforce, automated bots handle a majority of the commits.

This doesn’t mean bots are writing code. But they are generating a lot of the data and configuration files needed to run the company’s software. “You need to make a concerted effort to maintain code health,” Potvin says. “And this is not just humans maintaining code health, but robots too.”

Piper for Everyone

Could others benefit from the same kind of system? Certainly. And they do. The main Facebook app spans upwards of 20 million lines of code, and the company treats the whole thing as a single project. Others do the same on a smaller scale. As companies approach the size of a Google or a Facebook, the logistics can get in the way. But Google and Facebook are exploring ways of changing that, for everyone.

The two internet giants are working on an open source version control system that anyone can use to juggle code on a massive scale. It’s based on an existing system called Mercurial. “We’re attempting to see if we can scale Mercurial to the size of the Google repository,” Potvin says, indicating that Google is working hand-in-hand with programming guru Bryan O’Sullivan and others who help oversee coding work at Facebook.

That may seem extreme. After all, few companies juggle as much code as Google or Facebook do today. But in the near future, they will.
