New Systems and Algorithms for Scalable Fault Tolerance

Sen, Siddhartha

Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01mc87pq32q

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Freedman, Michael J.	en_US
dc.contributor.advisor	Tarjan, Robert E.	en_US
dc.contributor.author	Sen, Siddhartha	en_US
dc.contributor.other	Computer Science Department	en_US
dc.date.accessioned	2013-05-21T13:34:07Z	-
dc.date.available	2013-05-21T13:34:07Z	-
dc.date.issued	2013	en_US
dc.identifier.uri	http://arks.princeton.edu/ark:/88435/dsp01mc87pq32q	-
dc.description.abstract	Users of Internet services are increasingly intolerant of delays and outages, while demanding a consistent online experience. A website that is down or misbehaving is reported within seconds, often with an embarrassing screenshot that spreads through the news like wildfire. Among these failures, the most notorious are the ones that manifest arbitrary behavior, such as returning the wrong content to users or accidentally deleting their data. Unfortunately, protecting against such failures, whether due to misconfigurations, bugs, or even malice, is prohibitively expensive, because most existing solutions do not scale beyond a single server's performance. As a result, these solutions are not used for customer-facing services, where scalability is required to cope with large user populations. This thesis describes new systems and algorithms for tolerating arbitrary failures in Internet services, inspired by real-world debacles. Unlike prior work, our solutions are highly scalable. Our approach integrates theoretical innovations into the later stages of system design, giving robust guarantees that are also practical. We begin with a real failure that occurred in the indexing technique used by a certain database provider, and explain theoretically why the technique failed. We remedy the technique by introducing a new class of tree data structures, called relaxed trees, with provably good properties. Our analysis of relaxed trees makes use of exponential potential functions. Then, we describe a general system for tolerating arbitrary failures, called Prophecy, that delivers scalable performance on read-mostly workloads. With a modest trust assumption, Prophecy is practical for modern Internet services, as our evaluation confirms. Finally, we devise two techniques to scale this fault tolerance to very large-scale systems and general workloads. The first is an algorithm for securely composing many small replica groups, subject to an adversary that can coordinate faulty nodes across the groups dynamically. The second is a technique for improving the fault tolerance within each replica group, by adding small, trusted broadcast channels that mitigate the impact of faulty nodes.	en_US
dc.language.iso	en	en_US
dc.publisher	Princeton, NJ : Princeton University	en_US
dc.relation.isformatof	The Mudd Manuscript Library retains one bound copy of each dissertation. Search for these copies in the library's main catalog (http://catalog.princeton.edu).	en_US
dc.subject	balanced trees	en_US
dc.subject	Byzantine fault tolerance	en_US
dc.subject	database access methods	en_US
dc.subject	expander graphs	en_US
dc.subject	join-leave attacks	en_US
dc.subject	partial broadcast	en_US
dc.subject.classification	Computer science	en_US
dc.subject.classification	Applied mathematics	en_US
dc.title	New Systems and Algorithms for Scalable Fault Tolerance	en_US
dc.type	Academic dissertations (Ph.D.)	en_US
pu.projectgrantnumber	690-2143	en_US
Appears in Collections:	Computer Science

Files in This Item:

File	Description	Size	Format
Sen_princeton_0181D_10607.pdf		1.33 MB	Adobe PDF
