Course info
Nov 12, 2013
1h 26m

This course introduces the academic concepts behind distributed databases, using Riak as the example database. It then moves past the academic into broad coverage of operations and development practices, including a short tour of distributed database patterns and where to start, along with coverage of clients and other tooling for use with distributed databases.

About the author

I’m a jovial, proactive, polyglot, test & code, get-things-done-well software architect, engineer, code monkey, coder and distributed systems advocate. I run the gamut of dev stacks, from Ruby on Rails to Node.js and .NET. A favorite these days is JavaScript, and I’m even diving into some Erlang.

Section Introduction Transcripts

Welcome to the Distributed Databases course. I'm Adron Hall with Basho Technologies. I've broken this course out into five modules. First is an introduction, where I'll go through some academic ideas and concepts about distributed databases and cover what they are, what they do, and how they're actually distributed. Next, I'll dive into the architecture and some tooling around distributed databases themselves. Then comes an installation of the sample database that we'll be using, followed by the operational management of the database cluster that we build, and finally a little development against that cluster. In this first module, we'll take a look at what distributed databases are. I'll introduce Riak, which we'll be using as our preeminent example of an ideal distributed database. Its architecture is a great starting point for learning about distributed databases, and it really focuses on some of the core notions of a distributed database, such as being a masterless system and not having any single point of failure within that system. We'll cover a short distributed architecture overview, then dive into a quick history of Riak, and then move on to the summary and right on to the next module.

Architecture and Tooling
Hello, welcome to module 2 of Distributed Databases. In this module, I'm going to dive a bit into the architecture and tooling around distributed databases. We'll first go through an introduction with a dive into the basic academic concepts behind distributed databases and the cluster technology behind these systems. Then I'll step through a few of the tools out there today for distributed databases and their use in business intelligence; extraction, transformation, and loading (ETL); reporting; and administration and management, to give an idea of what's available and the direction the industry is going. I won't limit this to Riak tools; I'll cover others, such as Cassandra and HBase tools, too.

Welcome to module 3 of the Distributed Databases course. In this section, I'll be stepping through the installation of Riak into a developer console type of setup. This is most commonly used for development and testing against Riak, and it's a good example because it steps you through all of the things you need to do to actually set up a fully distributed multi-node cluster, except that it runs on a single machine, which makes it super easy to walk through each of the steps. First, we'll start off by installing Erlang, which we'll need to build Riak from source. A normal installation onto multiple nodes can use a precompiled package to provide the executable service, but in this particular situation, since we're setting up a developer console, Riak needs to be built from source for the specific operating system that you're running. I'll walk through each of the steps for OS X, Ubuntu, and CentOS. For Ubuntu and CentOS, I'll actually be using instances that are running in Windows Azure. Then we'll step through some other minor features around setup, installation, and ongoing operation of the cluster. Generally, a 4 to 5 node cluster is good for development. Usually I'll set up 4, and if having multiple nodes isn't too important for the development I'm doing, I might even set up 3, but ideally it's best to set up at least a 5 node cluster, even in a development environment.

Welcome to the final chapter of Distributed Databases, on development. In this module, I'm going to step into some of the characteristics of distributed programming. There are quite a few concepts, patterns, guidelines, and designs that make distributed programming very different from the traditional vertical stack development of a client, maybe some middleware, and a server side. I'll talk about accessing and writing data in a distributed system, and we'll talk about some patterns around that. Then we'll dive into some of the client libraries available for the Riak distributed database, and after that I'll summarize, which will complete the Distributed Databases course.