Guided: Foundations of Computing - Implementing Algorithms

In this lab, you’ll explore how to approach coding with a technique used by beginners and experts alike: reasoning about algorithms and problem-solving using pseudocode. You'll see how programmers often think and communicate about the structure of part of a program without getting distracted by the syntax of a particular language. To do so, you'll practice refining, correcting, and defining pseudocode implementations of selected foundational algorithms.

Path Info

Level: Beginner
Duration: 40m
Published: Mar 06, 2025


Table of Contents

  1. Challenge

    Introduction

    This lab will guide you through how to avoid "coder's block" by breaking down problems into structured steps and communicating them clearly via pseudocode. You'll practice working with pseudocode implementations of searching and sorting algorithms while learning their pros and cons.

    One common concept used to classify algorithms is known as big-O notation. It approximates how an algorithm's resource requirements grow as the input dataset increases.

    Whether the resource in question is processing time or memory, some common examples include:

    • O(1) ("constant") — The resource requirement stays the same regardless of dataset size.
    • O(n) ("linear") — The resource requirement doubles when the dataset doubles.
    • O(n²) ("quadratic") — The resource requirement quadruples when the dataset doubles.

    Big-O notation is often assumed to refer to an algorithm's worst-case scenario. However, for some algorithms, the worst-case, average, and/or best-case scenarios can be the same.

    Some notes:

    • You can assume that the algorithms presented in this lab work with integer data only, though similar techniques often apply to strings and other data types. This is just to keep it simple: facts like 11 > 5 are often more intuitive than comparisons like "a" > "Z".
    • You'll write all of your code in the .txt files in the src/ directory.
    • While indentation does have meaning in the tasks of this lab, you're free to use blank lines however you find helpful; they're ignored by the lab.
    • If you get stuck on a task, you can consult the solutions/ folder by enabling the Show filetree setting in the upper-right corner.
  2. Challenge

    Linear Search

    An algorithm is like a recipe — it's simply a set of ingredients and step-by-step instructions to follow in order.

    That remains true even if the instructions themselves can change their ordering. For example, a cake recipe might say, "If that wasn't enough icing to cover the cake, repeat the previous three steps as needed," assuming the previous three steps were for preparing a small batch of icing.

    The linear search algorithm is a recipe for finding the position of a value within an array. There are two ingredients (inputs): the array to search and the value to find. The product of the recipe (expected output) is either the numbered position where the value is found or -1 if the value isn't in the array.

    The linear search algorithm is the same procedure you might use when looking for the cutlery in an unfamiliar kitchen: check each drawer (element) of the array (set of drawers) in order (hence the term "linear") until you find the cutlery.

    ...or until the end of the array is reached (Wrong set of drawers!).

    ...unless the array is empty to begin with (No drawers here!).

    These last two additions show that, though this algorithm is simple, you'll need to write extra steps for a few special cases when breaking it down into pseudocode.

    How to "Run" Pseudocode Since pseudocode isn't written in a particular programming language, there's no compiler or interpreter that can run it. So to "run" pseudocode, you have to act as the computer, interpreting each line of code, tracking the value of each variable, and following the program's control flow.

    As with writing pseudocode, there are no fixed rules when it comes to running pseudocode. Blackboards or whiteboards, if available, can be helpful. Paper and pencils can work equally well. Some may prefer to work in a spreadsheet or a text editor. Use whatever helps you simulate a debugger, tracking variables as you step through each line of code.

    For example, a test run of the main loop of the linear search algorithm (say, searching for a target of 7 among 3, 12, 8) might look like this:

    | Position | Value to Compare | Equals Target? | Return Value |
    |----------|------------------|----------------|--------------|
    | 0        | 3                | no             |              |
    | 1        | 12               | no             |              |
    | 2        | 8                | no             | -1           |

    Or, if you're used to a language where arrays are 1-based:

    | Position | Value to Compare | Equals Target? | Return Value |
    |----------|------------------|----------------|--------------|
    | 1        | 3                | no             |              |
    | 2        | 12               | no             |              |
    | 3        | 8                | no             | -1           |

    Or, it might look like neither of these — maybe you used a chalkboard and erased and rewrote your variables as you executed each line. Maybe you had an extra column for whether the position is at the end of the list already. Do whatever works best for you.

    Pseudocode, as a communication tool, is most useful when it's clear and concise. But as you saw with these special cases, sometimes being too concise can introduce ambiguity or leave out an important detail. For example, omitting "until you find the cutlery" above would mean checking all drawers even after finding some cutlery. This is either inefficient, or a valid algorithm for a different problem: one where you assume there might be multiple cutlery drawers and your goal is to find all of them.

    "Clear" is also contextual. Pseudocode written for "rubber duck debugging" (i.e., mainly for you) can take whatever form is most useful for you. Simply from habit, it might somewhat resemble the programming language you're most familiar with.

    In the first task, the pseudocode is for a presentation that will have a general programming audience. For maximum clarity, you should avoid semantics and syntax that are tied to particular programming languages. You may disagree with the official solution from task 1. It's possible that you found a different selection of steps clearer. Indeed, there's no objective best format for pseudocode; nonetheless, the points made in task 1 should be useful to consider.

    Furthermore, translating between pseudocode and real code isn't necessarily trivial. Consider the step with 'go to' in it: in many languages, this detail is handled implicitly by a for loop and doesn't need to be explicitly stated. The early return for the empty array case may be unnecessary for the same reason.
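
    For example, here's roughly how that pseudocode might translate into Python. This is a comparison sketch only; the function and variable names are illustrative, and it isn't the lab's official solution:

    ```python
    def linear_search(values, target):
        # The for loop handles "go to the next position" implicitly,
        # and it simply never runs if the array is empty.
        for position, value in enumerate(values):
            if value == target:
                return position  # found: return the (0-based) position
        return -1  # reached the end without finding the target

    print(linear_search([3, 12, 8], 7))   # -1
    print(linear_search([3, 12, 8], 12))  # 1
    ```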

    Complexity

    As its name suggests, the linear search algorithm runs in linear time.

    It uses constant space, since regardless of the dataset size, it only needs to keep track of one variable (the current search position within the array).

  3. Challenge

    The Boyer-Moore Majority Vote Algorithm

    The linear search algorithm works fine if you know the value you're searching for.

    But sometimes, you know what you're searching for — in a sense — but not its exact value. For example, imagine you have a collection of votes and you're searching for the candidate with more than half the votes (assuming there is one). This is the case with the Boyer-Moore majority vote algorithm.

    This algorithm is especially space-efficient. Rather than maintain a vote count for each candidate, it uses just two variables in total: the current guess as to who the majority candidate is and a special counter.

    It's also time-efficient because it accomplishes its task after having seen each vote only once.

    In other words, it runs in O(n) time and uses O(1) space (i.e., linear time and constant space).

    So, how is it possible to figure out the majority candidate without counting the votes in the traditional sense? The secret lies within the special counter, which starts at zero. It tracks how many votes ahead the current guess is relative to the others — one vote for the current guess increases the count by one, while a vote for anyone else decreases it by one. If it encounters enough votes for other candidates, the count will reach zero, and the algorithm will set the next vote as the new guess, with the counter indicating that it is now one vote ahead of the other candidates.

    At the end of the data, the current guess is the majority candidate. The intuition is that a majority candidate will have enough votes to finish with a positive relative (or net) vote count, regardless of the order in which the votes are counted.

    Sample Run

    Supposing the input is [2,3,3,1,1,1,2,3,3], here's how a run might look:

    | Current Ballot | Guess | Net Count |
    |----------------|-------|-----------|
    |                |       | 0         |
    | 2              | 2     | 1         |
    | 3              | 2     | 0         |
    | 3              | 3     | 1         |
    | 1              | 3     | 0         |
    | 1              | 1     | 1         |
    | 1              | 1     | 2         |
    | 2              | 1     | 1         |
    | 3              | 1     | 0         |
    | 3              | 3     | 1         |

    Recall that this algorithm assumes that there is a candidate with a majority of votes. Sometimes, this can be determined beforehand. For example, in a voting system where abstaining is not allowed, there are only two candidates, and the number of votes is odd. However, that's a very specific scenario.

    When you can't assume a majority candidate, you have to add a follow-up pass to this algorithm to verify the result. Otherwise, the output of the algorithm can't be trusted. It may not even be the candidate with the most votes (and that's assuming there was no tie for first place).

    Nonetheless, two passes is still considered O(n) time complexity.
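
    A Python sketch of this algorithm, including the optional verification pass, might look like the following. The names are illustrative rather than the lab's pseudocode:

    ```python
    def majority_vote(ballots):
        guess = None
        net_count = 0
        for ballot in ballots:
            if net_count == 0:
                guess = ballot      # adopt the next vote as the new guess
                net_count = 1
            elif ballot == guess:
                net_count += 1      # a vote for the current guess
            else:
                net_count -= 1      # a vote for anyone else
        return guess

    def verified_majority_vote(ballots):
        # Follow-up pass: only trust the guess if it truly has a majority.
        guess = majority_vote(ballots)
        if guess is not None and ballots.count(guess) * 2 > len(ballots):
            return guess
        return None

    print(majority_vote([2, 3, 3, 1, 1, 1, 2, 3, 3]))           # 3 (but not a true majority)
    print(verified_majority_vote([2, 3, 3, 1, 1, 1, 2, 3, 3]))  # None
    ```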

  4. Challenge

    Bubble Sort

    It's time to switch gears and look at sorting algorithms.

    The first and simplest is called bubble sort. It gets its name from the way it causes smaller elements to "bubble up" the list with each pass. Yes, each — sorting an array in one pass is only possible in some very specific scenarios, which you'll see later.

    How many passes the algorithm needs depends on the input data. This means that, unlike the first two algorithms presented in this lab, bubble sort has differing worst-case and best-case running times. Though, like the other two, bubble sort needs only constant (O(1)) space.

    Bubble sort works by going through a given array from start to finish, and swapping an element with its neighbor if the two are out of order. The algorithm continues to make passes until there's one where it doesn't need to swap anything, indicating that the array must be sorted.

    In the best-case scenario, the array is already sorted. In this case, the algorithm makes one pass, and finding that no bubbling was needed, stops. Its running time is thus linear (O(n)).

    In the worst-case scenario, the array is reverse-sorted. In this case, the algorithm will take n - 1 passes to sort it, plus the final swap-less pass to discover the fact that it's sorted. Its running time is thus quadratic (O(n²)). The running time for the average case is quadratic, too.
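
    In Python, a sketch of this version of bubble sort (including the "swapped" flag discussed below) might look like this; it's illustrative, not the lab's official pseudocode:

    ```python
    def bubble_sort(values):
        swapped = True
        while swapped:
            swapped = False  # a full pass with no swaps means the array is sorted
            for i in range(len(values) - 1):
                if values[i] > values[i + 1]:
                    # Neighbors are out of order, so swap them.
                    values[i], values[i + 1] = values[i + 1], values[i]
                    swapped = True

    data = [11, 4, 3, 6]
    bubble_sort(data)
    print(data)  # [3, 4, 6, 11]
    ```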

    Sample Run

    Supposing the input is [11,4,3,6], here's how a run might look:

    | Array    | Swapped? | Element | Element to Its Right | More? |
    |----------|----------|---------|----------------------|-------|
    | 11,4,3,6 | false    | 11      | 4                    | yes   |
    | 4,11,3,6 | true     | 11      | 3                    | yes   |
    | 4,3,11,6 | true     | 11      | 6                    | yes   |
    | 4,3,6,11 | false    | 4       | 3                    | yes   |
    | 3,4,6,11 | true     | 4       | 6                    | no    |
    | 3,4,6,11 | true     | 6       | 11                   | no    |
    | 3,4,6,11 | false    | 3       | 4                    | no    |
    | 3,4,6,11 | false    | 4       | 6                    | no    |
    | 3,4,6,11 | false    | 6       | 11                   | no    |

    You may have noticed that elements moving down the array move much faster than those moving up. For example, the heaviest element starting at the top will sink to the bottom in a single pass, being moved by every swap. In contrast, the lightest element starting at the bottom will only bubble up by one spot during each pass, taking _n - 1_ passes to arrive at the correct spot.

    Variations on the bubble sort algorithm try to address the above property so as to improve performance while maintaining most of the simplicity. They use techniques such as alternating the direction of the pass, comparing elements with a progressively smaller gap rather than only adjacent ones from the start, or shortening each successive pass by one element. As you saw above, the heaviest of the remaining elements is always positioned correctly after a single pass and does not need to be re-examined.

    While the bubble sort is a simple algorithm to reason about and to implement, its running time requirements make it a terrible choice in almost all practical cases. This is despite the fact that the version you implemented just now already has an optimization thanks to the "swapped" flag it uses. Without that, you would need n - 1 passes to be sure the data was sorted, regardless of input. Thus the best-case scenario would be quadratic, too.

    Bubble sorting can still be a competitive option if you know the array is almost sorted to begin with, but that's a pretty specific scenario.

  5. Challenge

    Selection Sort

    Perhaps even more intuitive than the bubble sort algorithm is the selection sort algorithm.

    It's the procedure you might follow with big, heavy objects in a storage container. Look at all the objects to determine which object should go in the first spot, then swap it with what's in the first spot if it isn't there already. Next, look through the remaining objects to find the object that should go in the second spot and swap it with what's in the second spot if it isn't there already. Repeat for the third, fourth, fifth spots, etc. Once you've ensured that the second-to-last spot has the correct object, you're done, as the last object is correctly placed by process of elimination.

    Selection sort's complexity is comparable to that of the bubble sort version you implemented in the previous task. Both are in-place sorting algorithms (i.e., they don't require extra arrays), so they use constant space. Additionally, both require enough repeated passes to run in quadratic time in both the worst and average cases. However, the selection sort algorithm also has a quadratic best case since it requires a pass for each item, regardless of the input.

    That said, selection sort almost always outperforms bubble sort. To see why, that quadratic running time needs to be examined a little more closely. Both algorithms have quadratic worst and average cases when it comes to the number of comparisons made, but not the number of swaps. In terms of swaps, the bubble sort algorithm remains quadratic in both scenarios, whereas the selection sort algorithm is linear, since it requires only one swap per item to place it in its correct final position.
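
    Here's a possible Python rendering of the procedure described above (names are illustrative, not the official solution):

    ```python
    def selection_sort(values):
        for spot in range(len(values) - 1):
            # Find the smallest remaining element...
            index_of_minimum = spot
            for candidate in range(spot + 1, len(values)):
                if values[candidate] < values[index_of_minimum]:
                    index_of_minimum = candidate
            # ...and swap it into place if it isn't there already.
            if index_of_minimum != spot:
                values[spot], values[index_of_minimum] = values[index_of_minimum], values[spot]

    data = [11, 4, 3, 6]
    selection_sort(data)
    print(data)  # [3, 4, 6, 11]
    ```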

    Sample Run

    Supposing the input is [11,4,3,6], here's how a run might look:

    | Array    | Element to Swap | `indexOfMinimum` | Element at `indexOfMinimum` | Index of Swap Candidate | Element at Swap Candidate | Candidate Is Less? |
    |----------|-----------------|------------------|-----------------------------|-------------------------|---------------------------|--------------------|
    | 11,4,3,6 | first           | first            | 11                          | second                  | 4                         | yes                |
    | 11,4,3,6 | first           | second           | 4                           | third                   | 3                         | yes                |
    | 11,4,3,6 | first           | third            | 3                           | fourth                  | 6                         | no                 |
    | 3,4,11,6 | second          | second           | 4                           | third                   | 11                        | no                 |
    | 3,4,11,6 | second          | second           | 4                           | fourth                  | 6                         | no                 |
    | 3,4,11,6 | third           | third            | 11                          | fourth                  | 6                         | yes                |
    | 3,4,11,6 | third           | fourth           | 6                           |                         |                           |                    |
    | 3,4,6,11 |                 |                  |                             |                         |                           |                    |

    You may have noticed that selection sort incorporates something similar to the linear search algorithm in its inner loop.

    As with bubble sort, selection sort can be improved with some minor variations. To extend the storage container example, you might have seen that swapping is itself somewhat wasteful. Why put an object back in a spot from which it will have to be moved again later? For this reason, as each item is displaced by the correct one, you might instead search for its own rightful position by counting how many items are smaller than it. Such a variation is known as the cycle sort algorithm, which is uniquely optimal in terms of the number of writes. In the storage container, this approach might save you some heavy lifting, and in a computing context, might reduce disk wear.

    In another variation, you might find both the minimum and maximum with each pass, reducing the number of comparisons overall.

  6. Challenge

    Insertion Sort

    Where the selection sort algorithm could be useful with heavy objects in a storage container, the insertion sort algorithm is exactly the kind you're likely to use while playing cards. When you're dealt your first card, your hand is already sorted because it has only one card. As you're dealt each new card, you insert it at the correct position among the other already sorted cards in your hand.

    With a small number of cards, you're likely to be able to tell where a card should go with a quick glance. However, to break this down into a structured algorithm, that step will need to be expanded a bit.

    What about the step of inserting a card? Is it as easy as telling the computer something like, "insert the 4 just before the 5"?

    Actually, it depends. If you use a linked list — a data collection where each element contains a pointer to the next element — it might be fairly straightforward. At a higher abstraction, some languages and standard libraries might offer an insert function for arrays, or may not even tell you if the underlying implementation uses an array, a linked list, or some other structure. As you can see, pseudocode, even if it doesn't resemble any particular programming language, can't hide all of the assumptions being made about the language it will eventually be translated into.

    For this lab, you can assume the use of a plain, old-fashioned array — one without a handy insert function. Since you can't simply squeeze a new index in (like you can with a hand of cards), you'll have to account for shifting items over to make room for the element to be inserted (called the key).

    At this level, the insertion sort algorithm goes like this. For each key from the second item until the end of the array, look for an item smaller than the key by going backward through the known-to-be-already-sorted part of the array, shifting each element to the right whenever it's bigger than the key. Then put the key to the right of the element found to be smaller (or at the beginning, if none was found).
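
    A possible Python sketch of that description (again with illustrative names, not the lab's pseudocode):

    ```python
    def insertion_sort(values):
        for key_index in range(1, len(values)):
            key = values[key_index]  # the element to insert into the sorted part
            position = key_index - 1
            # Walk backward through the sorted part, shifting bigger elements right.
            while position >= 0 and values[position] > key:
                values[position + 1] = values[position]
                position -= 1
            values[position + 1] = key  # place the key just right of the smaller element

    data = [11, 4, 3, 6]
    insertion_sort(data)
    print(data)  # [3, 4, 6, 11]
    ```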

    This algorithm's worst and average-case performance is as bad as that of bubble sort (quadratic), and its best-case is as good (linear). Its average case is interesting — it's technically quadratic both in terms of comparisons and write operations, unlike selection sort, which is linear in terms of writes. Yet on average it outperforms selection sort. How can this be?

    It comes down to the fact that, on each iteration of the selection sort algorithm, it must check all the remaining unsorted data to make its next selection — regardless of that data. In contrast, the insertion sort algorithm, when it's lucky enough to find a key that's already bigger than the sorted part of the array, needs only a single comparison to discover this fact and move on to the next key. (Or if its new position is nearby, only a small number of comparisons.)

    Sample Run

    Supposing the input is [11,4,3,6], here's how a run might look:

    | Array     | Key    | `valueCopy` | Shifter | Shifter Is First? | Value to Left of Shifter | `valueCopy` Is More Than Value to Left of Shifter? | Shifter Is Key? |
    |-----------|--------|-------------|---------|-------------------|--------------------------|----------------------------------------------------|-----------------|
    | 11,4,3,6  | second | 4           | second  | no                | 11                       | no                                                 |                 |
    | 11,11,3,6 | second | 4           | first   | yes               |                          |                                                    | no              |
    | 4,11,3,6  | third  | 3           | third   | no                | 11                       | no                                                 |                 |
    | 4,11,11,6 | third  | 3           | second  | no                | 4                        | no                                                 |                 |
    | 4,4,11,6  | third  | 3           | first   | yes               |                          |                                                    | no              |
    | 3,4,11,6  | fourth | 6           | fourth  | no                | 11                       | no                                                 |                 |
    | 3,4,11,11 | fourth | 6           | third   | no                | 4                        | yes                                                | no              |
    | 3,4,6,11  |        |             |         |                   |                          |                                                    |                 |

    With insertion sort outperforming bubble sort in the best-case scenario and outperforming selection sort in the average-case scenario, it may seem like the best selection of the three. In fact, just like the other two, it can be further improved with variations like comparing over distances or using a binary search (which you'll do in a later task) instead of a linear search to find where the key needs to go.
  7. Challenge

    Counting Sort

    Comparison-based sorting algorithms (like bubble, selection, and insertion) are not the only possibility. As you saw with the Boyer-Moore majority vote algorithm, special starting conditions can allow for novel approaches.

    In fact, there's a sorting algorithm that takes a similar type of input to the Boyer-Moore majority vote algorithm, and it's known as the counting sort algorithm. It takes an array with lots of small, duplicate values.

    But that's where the similarity ends. The counting sort algorithm is a specialized sorting algorithm and isn't about votes, so it doesn't, for example, have a majority requirement. A better example context for the counting sort algorithm is a task list, where each task can have a priority rating — a number from 1 through some small number k.

    It works in three stages.

    Its first stage is the "counting" part. Simply count up how many tasks have each priority. Since the values are small, you can use the values themselves as indexes to a count array. This is akin to a whiteboard tally — for example, a task with priority #3 means adding one to the value in box #3.

    The second stage is to convert the tallies to prefix sums telling where the last task with a given priority will go. The first count is already correct as a prefix sum; the rest are simply arrived at by adding the previous prefix sum.

    The third stage is to work backward through the input array and use it to populate an output array. Each task is placed according to the corresponding prefix sum. The prefix sum is then decremented, so that the next task with the same priority will go to its left.
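
    Here's a compact Python sketch of those three stages, assuming priorities run from 1 through k (illustrative, not the official solution):

    ```python
    def counting_sort(priorities, k):
        # Stage 1: tally how many items have each priority (1..k).
        counts = [0] * k
        for priority in priorities:
            counts[priority - 1] += 1

        # Stage 2: turn the tallies into prefix sums.
        for i in range(1, k):
            counts[i] += counts[i - 1]

        # Stage 3: walk the input backward, placing each item at its prefix sum
        # (decremented first here because Python arrays are 0-based).
        output = [None] * len(priorities)
        for priority in reversed(priorities):
            counts[priority - 1] -= 1
            output[counts[priority - 1]] = priority
        return output

    print(counting_sort([1, 4, 4, 4, 3, 4, 1], 4))  # [1, 1, 3, 4, 4, 4, 4]
    ```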

    Since these stages work identically regardless of the input data, the counting sort algorithm has the same best, average, and worst-case scenarios: O(n + k). That is to say, it has a linear running time, unless k gets out of hand. To use an extreme example, suppose you have a dataset where there are no duplicates in what you're sorting. In this case, k could equal n, or could be even bigger (maybe you're sorting messages by their timestamps as represented by the number of milliseconds since the first message — k could be enormous compared to n). Then the counting sort algorithm would perform much worse than a quadratic one. So the algorithm's performance depends not only on the size of the data, but also on the highest value it contains.

    Sample Run

    Supposing the input is [1,4,4,4,3,4,1], here's how a run might look:

    Stage 1

    | Counts  | Element | Value at Element | `whichCount` | Value at `whichCount` |
    |---------|---------|------------------|--------------|-----------------------|
    | 0,0,0,0 | first   | 1                | first        | 0 → 1                 |
    | 1,0,0,0 | second  | 4                | fourth       | 0 → 1                 |
    | 1,0,0,1 | third   | 4                | fourth       | 1 → 2                 |
    | 1,0,0,2 | fourth  | 4                | fourth       | 2 → 3                 |
    | 1,0,0,3 | fifth   | 3                | third        | 0 → 1                 |
    | 1,0,1,3 | sixth   | 4                | fourth       | 3 → 4                 |
    | 1,0,1,4 | seventh | 1                | first        | 1 → 2                 |
    | 2,0,1,4 |         |                  |              |                       |



    Stage 2

    | Counts  | Element | Value to the Left of Element | Value at Element |
    |---------|---------|------------------------------|------------------|
    | 2,0,1,4 | second  | 2                            | 0 → 2            |
    | 2,2,1,4 | third   | 2                            | 1 → 3            |
    | 2,2,3,4 | fourth  | 3                            | 4 → 7            |
    | 2,2,3,7 |         |                              |                  |



    Stage 3

    At this point you have [1,4,4,4,3,4,1] as the input and you've calculated the prefix sums to be [2,2,3,7]:

    | Element | Value at Element | `whichCount` | Value at `whichCount` in the Count Array | `whichOutputPosition` | Output        | Counts  |
    |---------|------------------|--------------|------------------------------------------|-----------------------|---------------|---------|
    | seventh | 1                | first        | 2                                        | second                | _,1,_,_,_,_,_ | 1,2,3,7 |
    | sixth   | 4                | fourth       | 7                                        | seventh               | _,1,_,_,_,_,4 | 1,2,3,6 |
    | fifth   | 3                | third        | 3                                        | third                 | _,1,3,_,_,_,4 | 1,2,2,6 |
    | fourth  | 4                | fourth       | 6                                        | sixth                 | _,1,3,_,_,4,4 | 1,2,2,5 |
    | third   | 4                | fourth       | 5                                        | fifth                 | _,1,3,_,4,4,4 | 1,2,2,4 |
    | second  | 4                | fourth       | 4                                        | fourth                | _,1,3,4,4,4,4 | 1,2,2,3 |
    | first   | 1                | first        | 1                                        | first                 | 1,1,3,4,4,4,4 | 0,2,2,3 |

    It's worth noting the counting sort algorithm is the first one you've seen that doesn't operate in place. Its space requirements are _O(n + k)_, just like its running time complexity, in contrast to the constant (_O(1)_) space requirements of all the other algorithms you've seen so far in this lab.
  8. Challenge

    Merge Sort

    Restricting input conditions isn't the only way to achieve better than quadratic running time in sorting. In fact, there are dozens of general sorting algorithms that are more efficient than bubble, selection, and insertion sort. One of the most efficient and widely used is the merge sort algorithm.

    The merge sort algorithm is more sophisticated than the others you've seen so far. It uses two powerful algorithmic techniques. The first, recursion, is where a function calls itself. The second technique, divide and conquer, is where you break a large problem into smaller and smaller ones until you reach a problem that is trivial to solve.

    With these techniques, the basic merge sort algorithm is simple: divide the input in half, use the merge sort algorithm on each of the halves, and then merge the two sorted halves.

    It's the "merge" part of the algorithm that has to be described in a bit more detail to be able to implement it. To keep this part simple, you can assume the availability of these array methods:

    1. Push: Append to the rightmost end. (E.g., pushing 3 onto [1, 2] gives [1, 2, 3].)
    2. Shift: Remove from the leftmost end. (E.g., shifting from [1, 2, 3] gives [2, 3].)
    3. Subarray: Return the subarray consisting of the specified consecutive elements. (E.g., the subarray of [10, 20, 30, 40, 50, 60] from the start to the 30 is [10, 20, 30].)
    4. Concatenate: Return an array containing one array's sequence followed by another's. (E.g., [1, 4] concatenated with [3, 2] gives [1, 4, 3, 2].)

    Using those functions, the merge subalgorithm takes two arrays (assumed to be sorted) and loops until one array is empty, pushing the lesser of the two arrays' leftmost values to the output array with each iteration. Then it pushes everything in the nonempty array to the output array.
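
    Here's a condensed Python sketch of both the merge step and the overall recursion. The names are illustrative, and built-in Python list operations stand in for the Push, Shift, Subarray, and Concatenate helpers:

    ```python
    def merge(left_sub_array, right_sub_array):
        # Both inputs are assumed to be sorted already.
        output_array = []
        while left_sub_array and right_sub_array:
            if left_sub_array[0] <= right_sub_array[0]:
                output_array.append(left_sub_array.pop(0))   # shift the lesser leftmost value...
            else:
                output_array.append(right_sub_array.pop(0))  # ...and push it onto the output
        return output_array + left_sub_array + right_sub_array  # concatenate the leftovers

    def merge_sort(values):
        if len(values) <= 1:
            return values  # a zero- or one-element array is already sorted
        middle = (len(values) + 1) // 2
        return merge(merge_sort(values[:middle]), merge_sort(values[middle:]))

    print(merge_sort([10, 9, 1, 7, 4]))  # [1, 4, 7, 9, 10]
    ```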

    Thanks to its divide-and-conquer approach, merge sort has a consistent running time — regardless of input — of O(n log n). That means that it's faster than quadratic, but slower than linear. But like the counting sort algorithm, it needs a separate output array, rather than sorting in place (i.e., it has a linear space requirement). It's also less straightforward to implement than the first three sorting algorithms in this lab.

    Sample Run

    Supposing the input is [10,9,1,7,4], here's how a run might look:
    1. middleElement is 1, now sort [10,9,1] and [7,4].
      1. (Sorting [10,9,1]): middleElement is 9, now sort [10,9] and [1].
        1. (Sorting [10,9]): middleElement is 10, now merge [10] and [9].
          1. Follow the Else: outputArray becomes [9], rightSubArray becomes empty, loop ends.
          2. Concatenate leftSubArray onto outputArray: outputArray becomes [9,10].
        2. Now merge [9,10] and [1].
          1. Follow the Else: outputArray becomes [1], rightSubArray becomes empty, loop ends.
          2. Concatenate leftSubArray onto outputArray: outputArray becomes [1,9,10].
      2. (Sorting [7,4]): middleElement is 7, now merge [7] and [4].
        1. Follow the Else: outputArray becomes [4], rightSubArray becomes empty, loop ends.
        2. Concatenate leftSubArray onto outputArray: outputArray becomes [4,7].
      3. Now merge [1,9,10] and [4,7].
        1. Follow the If: outputArray becomes [1], leftSubArray becomes [9,10].
        2. Follow the Else: outputArray becomes [1,4], rightSubArray becomes [7].
        3. Follow the Else: outputArray becomes [1,4,7], rightSubArray becomes empty, loop ends.
        4. Concatenate leftSubArray onto outputArray: outputArray becomes [1,4,7,9,10].
    As with many previous algorithms, there exist variations of merge sort that improve its running time or space complexity at the cost of some additional sophistication. Divide-and-conquer doesn't always require splitting into two halves — it's possible to divide the array just once into multiple subsequences, aiming to identify already sorted parts. Some storage optimizations can be applied to reduce the extra _O(n)_ space that merge sort requires compared to in-place algorithms.
  9. Challenge

    Binary Search

    Now that you know a few classical ways to sort data, it's time to return to search algorithms. So far, you've learned about the linear search algorithm and the much more specific Boyer-Moore majority vote algorithm. In terms of general searching, it's hard to improve much upon the linear search algorithm — if you don't know anything special about your data, how much better can you get than "look at each item until you find what you want"?

    Fortunately, there's quite a common search scenario where you can do better: Namely, when you know your data is sorted.

    Perhaps the easiest algorithm that leverages that assumption is called the binary search algorithm. It also happens to use the divide-and-conquer technique you just saw demonstrated with merge sort.

    Suppose you pick some element from the array and it's not what you're looking for. Because the data is sorted, you can tell whether it's before or after this element that you need to continue searching. As its name implies, the binary search algorithm doesn't pick some element, it picks the middle one, effectively halving the remaining area to be searched if further iterations are still needed.

    In other words: take the middle element. Is it what you're looking for? Then you're done. Is it more than what you're looking for? Then restart the algorithm with the left half of what remains. Is it less? Then restart with the right half instead. If you reach a point where the remaining area to be searched is empty, you know the dataset doesn't contain what you're looking for, which you can indicate with a return value of -1.

    That's all there is to it.

    Like the linear search algorithm, this one has a constant space requirement and a constant running time in the best-case scenario (i.e., when the first element you check is the one you want). But the binary search algorithm has better than linear running time in both the average and worst-case scenarios: It has O(log n) (logarithmic) running time. That means that if the dataset is growing exponentially, the running time is only growing linearly.
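
    As an iterative Python sketch (names illustrative, not the lab's pseudocode):

    ```python
    def binary_search(values, target):
        left_boundary = 0
        right_boundary = len(values) - 1
        while left_boundary <= right_boundary:
            middle_element = (left_boundary + right_boundary) // 2
            if values[middle_element] == target:
                return middle_element                # found it
            if values[middle_element] < target:
                left_boundary = middle_element + 1   # keep searching the right half
            else:
                right_boundary = middle_element - 1  # keep searching the left half
        return -1  # the search range became empty: the target isn't here

    data = [1, 2, 3, 5, 10, 11, 17, 23, 24, 30, 39, 45]
    print(binary_search(data, 30))  # 9
    print(binary_search(data, 6))   # -1
    ```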

    Sample Run

    Supposing the input is [1,2,3,5,10,11,17,23,24,30,39,45] and the target search value is 30, here's how a run might look:

    | `leftBoundary` | `rightBoundary` | Search Range                    | `middleElement` | Value at `middleElement` | Comparison       |
    |----------------|-----------------|---------------------------------|-----------------|--------------------------|------------------|
    | first          | twelfth         | 1,2,3,5,10,11,17,23,24,30,39,45 | sixth           | 11                       | less than target |
    | seventh        | twelfth         | 17,23,24,30,39,45               | ninth           | 24                       | less than target |
    | tenth          | twelfth         | 30,39,45                        | eleventh        | 39                       | more than target |
    | tenth          | tenth           | 30                              | tenth           | 30                       | equals target    |

    As always, there are numerous variations on the binary search algorithm to improve upon its requirements by adding some complexity.

    Aside from the implementation of binary search and its derivatives in standard libraries, one extremely useful application of it is oddly hands-on in the world of software development: the git bisect command.

    When a bug is discovered in the current version of some software codebase, sometimes it's not immediately obvious what area of the codebase contains the bug or how it got there. In such cases, you may be tempted to start working backward through the codebase's revision history until you find a version that is free of its symptoms.

    But that's like performing a linear search by hand. In this context, a read/comparison operation isn't merely some nearly instantaneous internal operation the computer does on your behalf. It's a set of steps you're performing yourself: a git checkout followed by some kind of test for the bug, which may involve a compilation step first. With very large codebases, a single read/comparison could literally take hours. Your choice of algorithm in searching for the commit that introduced the bug can make a massive difference.

    That's where git bisect comes in: you specify the most recent version known to be bug-free (which could be as far back as the start of the repo history if you're unsure), and it guides you through a binary search. It automatically checks out the appropriate 'middle' commit, waits for you to report whether it contains the bug, and continues until it pinpoints the offending commit.

  10. Challenge

    Jump Search

    The final algorithm of this lab is another search algorithm, the jump search. Like the binary search algorithm, the jump search algorithm expects sorted input.

    You can think of the jump search algorithm as taking the input array as if it were one long row of data, which you then break up into rows such that, stacked on top of each other, they're the shape of a square (roughly speaking, in cases where input array length isn't a square number). You then check the values in the first column only, from top to bottom. If the first value is bigger than the target, it isn't in the dataset. If any other of these first-column values is bigger than the target, you know the value must be in the previous row (i.e., the row above) and you can run a linear search on said row. If none of the first-column values is bigger than the target, it may or may not be in the final row — the linear search will tell you for sure.

    If you ignore the square visualization, you can see where the algorithm gets its name: you're effectively jumping forward through the data (up to √n times), then jumping backward and doing a linear search (searching through at most √n elements).

    Put another way, a jump search is a linear search where you first refine the starting and ending points of the linear search. You refine them by previewing a forward jump from the current starting point until you find a value larger than the target. At this point, you keep the current starting point, set the previewed point as the ending point, and stop jumping forward. Whether this happens or not, you finish by running the linear search.

    With fewer than √n jumps and fewer than √n elements in the linear search step, the jump search algorithm runs in O(√n) time. This is still better than the linear search algorithm, but not as good as the binary search algorithm. Its main advantage is in contexts where jumping backward is especially expensive compared to jumping forward, as is the case with hard disk drives thanks to how their platters are accessed. In such cases, the jump search's requirement to jump backward only once can mean it outperforms a binary search, which may need up to log(n) backward jumps.
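
    A Python sketch of the two stages (the forward jumps, then a linear search over a single block), with illustrative names:

    ```python
    import math

    def jump_search(values, target):
        length = len(values)
        if length == 0:
            return -1
        jump_size = int(math.sqrt(length))  # roughly the width of the "square"

        # Stage 1: jump forward while the value at the next jump doesn't exceed the target.
        row_start = 0
        while row_start + jump_size < length and values[row_start + jump_size] <= target:
            row_start += jump_size

        # Stage 2: a plain linear search over that one block.
        for position in range(row_start, min(row_start + jump_size + 1, length)):
            if values[position] == target:
                return position
        return -1

    data = [1, 2, 3, 5, 10, 11, 17, 19, 20, 23, 24, 30, 39, 41, 44, 45]
    print(jump_search(data, 30))  # 11 (the twelfth element)
    ```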

    Sample Run

    Suppose the input is [1,2,3,5,10,11,17,19,20,23,24,30,39,41,44,45] and the target search value is 30. The square would look like this:

    | 1 | 2 | 3 | 5 |

    | 10 | 11 | 17 | 19 |

    | 20 | 23 | 24 | 30 |

    | 39 | 41 | 44 | 45 |

    Your jumpSize would be 4 (the width of the square). You'd end up checking 1, 10, 20, then 39 before running a linear search on the third row.

    Stage 1

    | rowStart | rowStop    | rowStart + jumpSize | rowStart + jumpSize Is within Bounds? | Value at rowStart + jumpSize | Value Is More Than Target? |
    |----------|------------|---------------------|---------------------------------------|------------------------------|----------------------------|
    | first    | sixteenth  | fifth               | yes                                   | 10                           | no                         |
    | fifth    | sixteenth  | ninth               | yes                                   | 20                           | no                         |
    | ninth    | sixteenth  | thirteenth          | yes                                   | 39                           | yes                        |
    | ninth    | thirteenth |                     |                                       |                              |                            |



    Stage 2

    | Element  | Value at Element | Value Is Target? |
    |----------|------------------|------------------|
    | ninth    | 20               | no               |
    | tenth    | 23               | no               |
    | eleventh | 24               | no               |
    | twelfth  | 30               | yes              |

    Yes, even the jump search algorithm can be optimized further by incorporating multiple levels of jumps.

    Now that you understand several foundational searching and sorting algorithms, it's important to note that it's almost always unnecessary — indeed, unwise — to implement these particular ones yourself in real-world applications. Every language has a library that implements them, and probably in a way that employs some clever optimizations. Often enough they'll abstract searching and sorting altogether, so that you may not even know which algorithm is being used under the hood when you call find() or sort(), or perhaps use WHERE or ORDER BY clauses in SQL.

    Nonetheless, knowing how these algorithms work gives you valuable insights into designing higher-level algorithms. If you're writing, say, a product recommendation engine, and find yourself nesting loops two or three levels deep, you'll know at a glance why it's likely to have trouble scaling.


    Congratulations on completing this lab!

Kevin has 25+ years in full-stack development. Now he's focused on PostgreSQL and JavaScript. He's also used Haxe to create indie games, after a long history in desktop apps and Perl back ends.
