Course info
Sep 17, 2015
4h 3m

Do you want to get the absolute most performance out of your hardware? Want to write code that scales across CPU registers, multi-core, and machine clusters? Then this is the course for you!

About the author

Dmitri is a developer, speaker, podcaster, technical evangelist and wannabe quant.

Section Introduction Transcripts

Open Multi-Processing (OpenMP)
Hi there. In this module of the High-performance Computing course, we're going to talk about OpenMP, which is a technology that allows you to quickly add multithreading to your program. All right, so in this module we'll briefly discuss the differences between imperative and declarative parallelization, we'll discuss what OpenMP actually is, then we'll take a look at OpenMP in action, and then we'll look at the various ways in which OpenMP enables you to share work among several threads, how to control synchronization, and how to share data between the different threads.

Message Passing Interface (MPI)
Hi there. In this module, we're going to talk about the Message Passing Interface, which is another piece of our parallel computing puzzle. So what are we going to talk about? Well, first of all, I'll talk about the different scales on which you can execute your parallel code, and we'll see how MPI fits into the overall picture. Then we'll take a look at MPI, what it actually is, where you can get a distribution of MPI, and what kind of executables and binaries exist within an MPI distribution. Then we're going to write our first MPI application. After that, we're going to look at how to send not just primitive data, but custom objects over the wire. Then we're going to look at Boost.MPI, which is a really useful wrapper library provided by Boost to make working with MPI a lot easier. Then we're going to talk about two forms of communication: point-to-point communication, as well as collective communication. And we'll finish the module by looking at some of the finer points of MPI operations.

C++ Accelerated Massive Parallelism (C++ AMP)
Hi there. In this module, we're going to talk about something called C++ AMP. AMP stands for Accelerated Massive Parallelism, but nowadays it's really mainly about GPU computation. So what are we going to talk about in this module? Well, first of all we're going to discuss the state of GPU computing as it is today, and then we're going to look at what C++ AMP actually is. We're going to write a kind of Hello World for C++ AMP, then we're going to look at how to enumerate devices, the actual graphics cards, and find out what they can actually do. We're going to consider just one of the more complicated topics of C++ AMP, which is called tiling, and then we'll take a look at how you can leverage external libraries, libraries that somebody has already written for C++ AMP.

Generative Art Demo
Hi and welcome to this last module of the course. Now, in this module, we're going to do a demonstration project that brings together all of the things we've talked about in the course: we're going to do some generative art. Now what we're going to do in this module specifically is, first of all, I'll explain the methodology for generating random images. It's not that simple, to be honest, but we'll get through it. After that, we're going to do a plain C++ implementation without any kind of parallelism, just to see what the speed is like for a single-threaded random image generator. Then we'll add some multithreading using OpenMP; that's going to be a lot of fun, and hopefully won't take too much time. Then we're going to look at implementing SIMD, and we'll actually try to figure out whether SIMD produces the same results in our computations or whether it somehow skews the mathematical functions one way or another. And finally, we'll take a look at clustered execution with MPI.