An investigation in Google’s Architecture

How does Google Work

An investigation in Google’s Architecture, System, Web Servers and Software

Abstract

This report is based on SOFTENG 325 material and investigation in Google’s system, architecture and servers.

The report discusses the structure of Google’s architecture and the main components of Google’s Web servers such as Google File System, Bigtable and MapReduce as well as what characteristics make Google’s system scalable, fault tolerance, fast response and highly available. It also introduces AJAX, Google Gears and Chrome Browser and discusses how and why these technologies are important to Google’s Web servers.

The result of this investigation shows that Google use multithreading, replication, load balancing, and distributed system in the architecture models to ensure it could deliver high aggregate performance to a large number of clients.

Introduction

Google’s Web application lets different queries run on different processors and, by partitioning the overall index, also lets a single query use multiple processors. To handle this workload Google uses cluster architecture. It also includes Google File System (GFS) which is a large distributed log structured file system. Bigtable, a distributed hash mechanism built on top of GFS. MapReduce, a distributed execution software that parses and analyzes Google’s distributed storage of data.

AJAX technology is very important in Google’s Web applications’ interface. This technology offers faster response and uses less bandwidth than traditional way. In order to let users access their applications without Web Google invented Gears, which could store data locally so that users could use Web applications like desktop software. Google released Chrome on September, a new Web browser with faster JavaScript engine and Gears which could improve use experience of Web applications built on AJAX.

All these Web servers, technologies and software above are used by Google to build its superior performance Web servers. And keep the system efficiency and reliable.

Google Architecture Overview

Google’s service consists of multiple clusters distributed worldwide, each time Google process a query a DNS-based load-balancing system selects a cluster by accounting for the user’s geographic proximity to each physical cluster so that it could minimizes the round up time for the user’s request. At the mean time the load-balancing system also considering the available capacity at the various clusters. Google use a hardware-based load balancer to monitors the available set of Google Web servers (GWSs) and perform local load balancing of requests across a set of them.

Google execute each search query in two major phases. In the first phase the index servers consult an inverted index that maps each query word to a matching list of documents. The index servers then determine a set of the individual query words, and they compute a relevance score determines the order of results on the output page. The final result of this first phase of query execution is an ordered list of document identifiers (docides). The second phase taking this list of docides and computing the actual tide and uniform resource locator of these documents, along with a query-specific (docservers) handle this job, fetching each document from dish to extract the tide and the keyword-in-context snippet. When all phases are completed, a GWS generates the appropriate HTML for the output page and returns it to the user’s browser.

Google use replication for capacity and fault-tolerance, it can also decrease the response time of each query. For example, a pool of machines serves requests for each index shard, and the overall index cluster contains one pool of each shard. Each query goes to one machine (or a subset of machines) assigned to each shard. If a shard’s replica goes down, the load balancer will avoid using it for queries, and other components of Google’s cluster-management system will try to revive it or eventually replace it with another machine. With this structure the service will not be interrupted, and all parts of the index remain available. Google stores dozens of copies of an online, low-latency of the entire Web across its clusters to offer performance and availability replication required. Google can safely perform updates by diverting queries away from a service replica during an update. They also use very large amounts of inherent parallelism in application. The system transforms a search of matching documents in a large index into many searches for matching documents in a set of smaller indices, followed by merging step. They also divide the query stream into multiple streams, each handled by a cluster. The advantage of this kind of architecture is that it is easy to increase the serving capacity by adding machines to each pool. And the shards could increase with the growth of the index. By parallelizing the search over many machines Google can reduce the average latency necessary to answer a query.

The Google File System

Google File System is a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance and delivers high aggregate performance to a large number of clients. It is the core storage platform of Google.

A GFS cluster consists a single master and multiple chunkservers and is accessed by multiple clients. Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by the master at the time of chunk creation. For reliability Google store three replicas on different computers in a given server cluster so that if a request tries to read a file from one of those computers, and it fails to respond within a certain time, at least two others will be able to fulfill the request. This kind of redundancy is important for Google’s system when it experience all sorts of system failures.

A GFS cluster consists of a master server and hundreds or thousands of chunkservers. The master maintains all file system metadata. This includes the namespace, access control information, the mapping from the files to chunks, and the current location of chunks. The master periodically communicates with each chunkserver in HeartBeat massages to give it instructions and collect its state. When an application requests a given file, the master server provides the addresses of the relevant chunkservers. Neither the client nor the chunkserver caches file data. Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues. (Clients do cache metadata.) Chunkservers need not cache file data because chunks are stored as local files and so Linux’s buffer cache already keeps frequently accessed data in memory.

A new application coming on line can use an existing GFS cluster. So that Google can easily add its new servers into the existing system. GFS’s architecture is designed for easy scaling, fast response and highly available which is suit for Google’s system.

The Bigtable

Bigtable is a large scale, fault tolerant, self managing system. It can handle millions of reads/writes per second. Bigtable stores structured data used by applications such as web indexing, Google Analytics, Google Maps, Google Earth, Google Finace and My Search History. Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterrupted array of bytes. The Bigtable implementation has three major components: a library that is linked into every client, one master server, and many tablet servers. Tablet servers can be dynamically added (or removed) from a cluster to accommodate changes in workloads. The master assigns tablets to tablet servers, detects the addition and expiration of tablet servers, balances tablet-server load, and garbage collection of files in GFS. It also handles schema changes such as table and column family creations. Each tablet server manages a set of tablets. The tablet server handles read and write requests to the tablets that it has loaded, and also splits tablets that have grown too large. Because Bigtable clients do not rely on the master for tablet location information, most clients never communicate with the master. As a result, the master is lightly loaded in practice.

A Bigtable cluster stores a number of tables. Each table consists of a set of tablets, and each tablet contains all data associated with a row range. Initially, each table consists of just one tablet. As a table grows, it is automatically split into multiple tablets. Bigtable could provide performance and high availability. Google can scale the capacity of their clusters by simply adding more machines to the system as their resource demands change over time.

The MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets.

MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. It breaks calculations into two parts—a first stage, which produces a set of intermediate results, and a second, which computes a final answer. The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce. Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function. The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator. This allows Google to handle lists of values that are too large to fit in memory.

MapReduce includes its own middleware—server software that automatically breaks computing jobs apart and puts them back together. Programs incorporating MapReduce load large quantities of data, which are then broken up into pieces of 16 to 64 megabytes. The MapReduce run-time system creates duplicate copies of each map or reduce function, picks idle worker machines to perform them and tracks the results. Worker machines load their assigned piece of input data, process it into a structure of key-value pairs, and notify the master when the mapped data is ready to be sorted and passed to a reduce function. In this way, the map and reduce functions alternate chewing through the data until all of it has been processed. An answer is then returned to the client application. If something goes wrong along the way, and a worker fails to return the results of its map or reduce calculation, the master reassigns it to another computer.

Google Servers and AJAX

As Google have grown, they have added several new services for their users. In most of Google’s servers they use huge amounts of AJAX (asynchronous JavaScript and XML) in the user interface. Google’s web crawlers also execute JavaScript code so that their search engine could index web applications written in AJAX, which most of other search engines could not do currently.

AJAX use XHTML and CSS for presentation, Document Object Model for display of interaction with data, XML and XSLT for the interchange and manipulation of data, XMLHttpRequest object for asynchronous communication and finally use JavaScript to bring these technologies together.

In many cases, the pages on a website consist of much content that is common between them, such as menus, bottoms, tables and frameworks. Content has to be reloaded on every request in traditional way. The advantage of AJAX is a web application can request only the content that needs to be updated, thus reducing bandwidth usage. The use of asynchronous request allows user interface to be more interactive and decrease response time. Even if the application has not changed on the server side the application still will be faster. And since scripts and style sheets only have to be requested once AJAX can reduce connection to the server. But the dynamically created pages do not register themselves with the browser’s history engine, so in lots of AJAX applications the “back” button in the browser is useless, it will not return user to perverse content. Google use invisible Iframe to trigger changes in the browser’s history and changing the anchor portion of the URL when AJAX is run and monitoring for changes.

Google Gears

Gears is an open source project that enables more powerful web applications, by adding new features to user’s web browser. It is software offered by Google. Google’s and other Web applications bring user’s data online and available anywhere there’s an Internet connection. And whenever there is no internet connection Gears could help user keeping using Web applications.

Gears includes a Database module that can store data locally, a WokerPool module that provides parallel execution of JavaScript code, a LocalServer module that caches and serves application resources like HTML, JavaScript and images, a Desktop module that lets web applications interact more naturally with desktop and a Geolocation module that lets web applications detect the geographical location of their users. It supports Google Chrome, Internet Explorer, Mozilla Firefox and Apple Safari now. A number of web applications include Google’s YouTube, Docs, Reader and Picasa use Gears. Other companies like MySpace, Zoho, Remember The Milk, Buxfer and WordPress 2.6 added support for Gears.

With Gears Google and other companies could provide more stable servers. Web applications no longer relay on Internet connection and users could manage their data when the servers have errors or without Web access just like use desktop software.

Chrome Browser

Google recently released their new software, Chrome Browser. Now only Windows version is available.

Chrome Browser has learnt a lot from open source projects like Firefox and WebKit. Chrome uses unique multi-process architecture and V8 JavaScript engine. Web applications cannot be responsive and stable without a fast and reliable JavaScript engine. Since Google mainly use JavaScript in their applications. Chrome Browser has been designed for performance. The benefit of multi-process architecture is that if one of the processes is crashed (caused by network problem, application error or security problem.) other processes will not be affect. This feature provides a stable browse experience for users. V8 is a new JavaScript engine specifically designed for fast execution of large JavaScript applications. There are three key areas to V8’s performance: fast property access, dynamic machine code generation and efficient garbage collection. V8 engine guarantees the speed of Web applications. Chrome Browser also includes Gears with could easily transfer a supported Web application into a desktop shortcut for easy access.

Though Chrome Brower (or Gears) is not a Web application, but this software is designed to improve the user experience, the speed and the stability of Web servers. It allows data to reside on the Web but be accessible offline. It is an important part of Google’s architecture. When the power of Web servers reaches their limitations the desktop software could push it further more.

Conclusions

Google use multithreading and parallelization to guarantee their Web applications are scalability, stability. Google servers divide huge amounts of data into blocks and use special system to store the location of data to ensure the fast access of them. Replication of data makes Google’s system highly available and faster. Load balancing, parallelization and the use of AJAX offer very fast response times.

Google’s architecture, data storage system (GFS and Bigtable), and programming model (MapReduce) all show some common characteristics and a simple structure. They all divide a large object into small pieces so that it is easy to store and access or divide a quest into smaller ones to make the process quicker. They all have replication and fault tolerance features build in to keep data safety and system stable. The implementation of each part is different but the main idea is the same: to make sure the system’s performance.

Google’s architecture makes their system highly scalable and available thus it suits for a company like Google. Because they process huge amounts of requests every second and the number of users of their Web applications increases every day. Reliable and response time are the more important for their system. But Google’s system is complicated and hard to construct, and it is also hard for Google to combine other different systems, such as those small companies they acquired (YouTube, Jaiku and Joyspot). It takes a long time for Google to move those applications to their own servers. The system is also hard for new programmer to get used to and develop new application.

Google not only build a highly reliable Web server, but also use AJAX to improve the user interface and response time. They also use desktop software such as Gears and Chrome to conquer other problems with cannot be solved at servers end such as networking so that user could have a better Web.

Reference

1, Web Search for a Planet: The Google Cluster Architecture. Luiz Barroso, Jeffrey Dean, and Urs Hoelzle. March, 2003 from http://research.google.com/archive/googlecluster.html

2, The Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. October, 2003 from http://labs.google.com/papers/gfs.html

3, Bigtable: A Distributed Storage System for Structured Data. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. November, 2006 from http://labs.google.com/papers/bigtable.html

4, MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. December, 2004 from http://labs.google.com/papers/mapreduce.html

5, How Google Works. David F. Carr. 6th of July, 2006 from http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/How-Google-Works-%5B1%5D/

6, Google Architecture. Todd Hoff . 1st of July, 2008 from http://highscalability.com/google-architecture

7, Google’s Browser Puts the Cloud to Work. Om Malik. 3rd of September, 2008 from http://www.businessweek.com/technology/content/sep2008/tc2008092_805324.htm

8, Google and the Wisdom of Clouds. Stephen Baker. 13th of December, 2007 from http://www.businessweek.com/magazine/content/07_52/b4064048925836.htm

9, How Google Works. Jonathan Strickland from http://computer.howstuffworks.com/google.htm

10, Design Elements - V8 JavaScript Engine – Google Code from http://code.google.com/apis/v8/design.html

11, Google Chrome from http://en.wikipedia.org/wiki/Google_Chrome

12, Ajax (programming) from http://en.wikipedia.org/wiki/AJAX

13, Gears API – Google Code from http://code.google.com/apis/gears/

14, Gears (software) form http://en.wikipedia.org/wiki/Google_Gears

15, The Search: How Google and Its Rivals Rewrote the Rules of Business the Transformed Our Culture. By John Battelle. Published by Nicholas Brealey. 2006. ISBN 1857883624