1 00:00:00,080 --> 00:00:05,520 We're going to be looking at the very famous old\h algorithm called Quicksort. It's very clever and\h\h 2 00:00:05,520 --> 00:00:10,960 it's very fast but it can also be very simple.\h So what I'm actually going to show you today is\h\h 3 00:00:10,960 --> 00:00:18,800 how you can do Quicksort in just five lines\h of code. Where did the Quicksort algorithm\h\h 4 00:00:18,800 --> 00:00:24,560 actually come from? This is Tony Hoare or\h Sir Tony Hoare to give him his proper name\h\h 5 00:00:24,560 --> 00:00:30,280 and he was a computer scientist from Oxford and\h actually he was Oxford's head of computer science\h\h 6 00:00:30,280 --> 00:00:38,040 for many, many years. And he published in 1962 a\h very famous paper with a lovely one-word title,\h\h 7 00:00:38,040 --> 00:00:42,440 it's just called Quicksort. And you know you've\h done something quite fundamental and important\h\h 8 00:00:42,440 --> 00:00:47,640 when you can have a paper with a one-word title.\h So this was published more than 60 years ago,\h\h 9 00:00:47,640 --> 00:00:52,280 it's one of the older algorithms in computer\h science, but actually Tony invented this algorithm\h\h 10 00:00:52,280 --> 00:00:59,440 a few years before that in 1959, but the actual\h paper which everyone cites was published in 1962.\h\h 11 00:00:59,440 --> 00:01:05,400 What we're actually going to do is see how to do\h quicksort in two different ways. So first of all,\h\h 12 00:01:05,400 --> 00:01:10,800 I'm going to explain on the lovely computer file\h paper how to do quicksort with a simple example.\h\h 13 00:01:10,800 --> 00:01:15,440 And then we're going to move over to the laptop\h and see how to write it in actual code and in\h\h 14 00:01:15,440 --> 00:01:21,840 five lines of code. And what we're going to do is\h put the numbers from one to nine into these boxes\h\h 15 00:01:21,840 --> 00:01:34,160 in random order. So let's put the one here, two,\h three, four, five, six, seven, eight, and nine. So\h\h 16 00:01:34,160 --> 00:01:39,800 we've got nine numbers here in jumbled up order,\h and we want to think how can we actually sort\h\h 17 00:01:39,800 --> 00:01:46,480 these. So the way the quicksort algorithm works is\h you first of all pick what's called a pivot value.\h\h 18 00:01:46,480 --> 00:01:51,080 And for symmetry reasons here, and we'll see why\h in a second, I'm just going to pick the middle\h\h 19 00:01:51,080 --> 00:01:56,840 value in the list, the 5 here, so highlight\h that in red. So we've picked our pivot value,\h\h 20 00:01:56,840 --> 00:02:01,680 and the next step in the algorithm now is that\h we're going to divide up the remaining numbers,\h\h 21 00:02:01,680 --> 00:02:06,240 depending on whether they're less than or\h greater than the pivot value. So we're going\h\h 22 00:02:06,240 --> 00:02:11,400 to send the numbers which are less than the 5\h down the left-hand side, so we'll end up with\h\h 23 00:02:11,400 --> 00:02:17,120 4 of these. And we're going to send the numbers\h which are greater than 5 down the right-hand side,\h\h 24 00:02:17,120 --> 00:02:21,440 and we'll also end up with 4 of these. And this\h is just a really simple process now. We just go\h\h 25 00:02:21,440 --> 00:02:26,560 through the list apart from the pivot, and if it's\h less than five we write it here, greater than five\h\h 26 00:02:26,560 --> 00:02:32,840 we write it here. So two is less than five, so\h that goes down here. Seven is greater than five,\h\h 27 00:02:32,840 --> 00:02:38,960 so that goes here, and we keep going. And again,\h we have to be very careful not to mess it up.\h\h 28 00:02:38,960 --> 00:02:44,000 We've got four numbers here, less than five, and\h we've also got four numbers greater than five.\h\h 29 00:02:44,000 --> 00:02:51,080 The next step in the algorithm is we're going to\h sort in some way these two sub lists. So suppose\h\h 30 00:02:51,080 --> 00:02:58,400 we sorted these numbers here. We're going to get\h four numbers. We're going to get one, two, three,\h\h 31 00:02:58,400 --> 00:03:04,520 four. And suppose we sorted in some way, and again\h we'll come back to this in a second, we sort these\h\h 32 00:03:04,520 --> 00:03:10,720 numbers here and we will get six, seven, eight,\h and nine. And then the final step in the algorithm\h\h 33 00:03:10,720 --> 00:03:15,920 is the kind of obvious one. we've got two sorted\h sub-lists here. We've got 1, 2, 3, 4, 6, 7, 8,\h\h 34 00:03:15,920 --> 00:03:20,920 9, and we've got our original pivot up the top\h here. All we're going to do is bring everything\h\h 35 00:03:20,920 --> 00:03:26,040 back together. So we need nine boxes again. So\h we copy down the 1, 2, 3, 4. We copy down the\h\h 36 00:03:26,040 --> 00:03:32,040 original pivot value, which we use to divide the\h list into two parts. And then we copy down the 6,\h\h 37 00:03:32,040 --> 00:03:38,520 7, 8, and 9. So what we've done in moving from the\h top of the page down to the bottom of the page is\h\h 38 00:03:38,520 --> 00:03:43,360 we've sorted the numbers from 1 to 9. But there's\h a couple of questions remaining, and actually I'm\h\h 39 00:03:43,360 --> 00:03:49,880 going to put Sean on the spot now here. So first\h question for you, Sean. So how do we actually sort\h\h 40 00:03:49,880 --> 00:03:55,000 the two sublists? Because I said we can sort them\h in some way. Any ideas how we could sort the two\h\h 41 00:03:55,000 --> 00:04:00,240 sublists? Well, kind of presumably you've got to\h do the same thing again, right? You've got to pick\h\h 42 00:04:00,240 --> 00:04:05,200 a pivot and do something like that. Yeah, exactly.\h So you do the same thing again. You just apply the\h\h 43 00:04:05,200 --> 00:04:10,480 quick sort algorithm again. So we could take\h the numbers 2, 4, 1, 3, pick a pivot, divide\h\h 44 00:04:10,480 --> 00:04:16,560 up the numbers, and do the sorting and merging.\h And it's a recursive algorithm. So quick sort is\h\h 45 00:04:16,560 --> 00:04:22,880 defined in terms of itself. We divide up into two\h parts based on a pivot value. We recursively sort\h\h 46 00:04:22,880 --> 00:04:28,960 the two parts using the same algorithm. And then\h we combine everything together at the end. But\h\h 47 00:04:28,960 --> 00:04:34,080 when you're writing recursive programs, you always\h have to worry, how do they stop? So can I ask Sean\h\h 48 00:04:34,080 --> 00:04:37,800 another question again. There needs to be an end\h case, is that right? Yeah, exactly. So you have to\h\h 49 00:04:37,800 --> 00:04:43,000 stop. So when does this algorithm actually stop?\h So I'm just thinking about 2, 4, 1, 3. So if you\h\h 50 00:04:43,000 --> 00:04:49,080 had, I don't know, pick 4, you've got 1, 2, 3.\h I don't know, how do you pass? So the algorithm\h\h 51 00:04:49,080 --> 00:04:53,960 stops when you have no numbers left, because\h every time you call this algorithm recursively,\h\h 52 00:04:53,960 --> 00:04:57,840 the sublists are going to be smaller, because you\h have a pivot value. So you might like have a pivot\h\h 53 00:04:57,840 --> 00:05:01,920 of 1, right? As in you've only got one number,\h so you can't pivot. And then there'll be maybe\h\h 54 00:05:01,920 --> 00:05:07,080 nothing left to actually sort. So the kind of\h base case for the algorithm or when the recursion\h\h 55 00:05:07,080 --> 00:05:12,880 stops is when you don't have any numbers left. So\h that's the way QuickSort works in pictures. You\h\h 56 00:05:12,880 --> 00:05:18,040 pick a pivot, divide up based on the two values,\h recursively sort and then merge back together at\h\h 57 00:05:18,040 --> 00:05:22,440 the end, and the algorithm stops when you don't\h have any numbers left. What we're going to do now\h\h 58 00:05:22,440 --> 00:05:28,120 is move over to the laptop and see how to do\h it in code. And as by the title of the video,\h\h 59 00:05:28,120 --> 00:05:32,880 it's going to be five lines of code. So of course\h I'm going to use my favorite programming language\h\h 60 00:05:32,880 --> 00:05:36,880 which is Haskell, which is a functional language,\h it's very concise, but everything I'm going to\h\h 61 00:05:36,880 --> 00:05:42,240 show you today you can do in basically any high\h level language today that's got some kind of nice\h\h 62 00:05:42,240 --> 00:05:48,240 facilities for manipulating lists. But Haskell is\h a good choice because it's so concise. So what I'm\h\h 63 00:05:48,240 --> 00:05:53,480 going to do is start up an editor, I'll just make\h a temporary file here for putting the code in,\h\h 64 00:05:53,480 --> 00:05:59,000 and we're going to write a quick sort function. So\h the quick sort function takes a list as an input,\h\h 65 00:05:59,000 --> 00:06:04,720 and it gives a list as an output. And as we talked\h about a few minutes ago, the base case for the\h\h 66 00:06:04,720 --> 00:06:11,760 definition is if you quicksort a list with nothing\h in it, then that stops the program running and\h\h 67 00:06:11,760 --> 00:06:17,520 you get a list with nothing in it. So this is the\h base case for our definition. And the tiny piece\h\h 68 00:06:17,520 --> 00:06:22,960 of notation here is in the language I'm using,\h the empty list is a square bracket followed by\h\h 69 00:06:22,960 --> 00:06:28,280 a closed square bracket. But in other languages,\h it may be slightly different. So the other case,\h\h 70 00:06:28,280 --> 00:06:33,360 which is the interesting case, is how do you\h quicksort a non-empty list? And this is the\h\h 71 00:06:33,360 --> 00:06:38,280 only other tiny bit of notation, which I need\h to maybe explain here. This notation here means\h\h 72 00:06:38,280 --> 00:06:43,080 a non-empty list where x is the first thing.\h That's actually going to be our pivot value.\h\h 73 00:06:43,080 --> 00:06:47,520 We're going to have the first thing in the list\h being the pivot rather than the middle. And xs\h\h 74 00:06:47,520 --> 00:06:52,280 is everything else in the list. So this is just a\h way of breaking down a list into the first thing,\h\h 75 00:06:52,280 --> 00:06:56,920 which is going to be our pivot value and\h everything else in the list. And then we\h\h 76 00:06:56,920 --> 00:07:02,160 can just go back to the example and think how do\h you actually quick sort a list? Well what we're\h\h 77 00:07:02,160 --> 00:07:09,120 going to do is we're going to have the smaller\h numbers in a list so I'll call that smaller\h\h 78 00:07:09,120 --> 00:07:13,800 and we're going to have the larger numbers in the\h list and we'll call that larger and the way we get\h\h 79 00:07:13,800 --> 00:07:19,160 these is very simple. To get the smaller numbers\h all we're going to do is filter out the numbers\h\h 80 00:07:19,160 --> 00:07:24,200 which are less than or equal to the first value in\h the list, which is here called x, and that's our\h\h 81 00:07:24,200 --> 00:07:29,800 pivot value. That was 5 in our example. So we're\h going to filter out the values smaller than x\h\h 82 00:07:29,800 --> 00:07:36,800 from the remaining numbers xs. Okay, so nice and\h easy. This is a primitive in the language that\h\h 83 00:07:36,800 --> 00:07:41,640 I'm using. It filters things out of a list. Other\h languages, this may be called something different,\h\h 84 00:07:41,640 --> 00:07:46,040 but other languages will have a similar\h kind of feature. To get the larger numbers,\h\h 85 00:07:46,040 --> 00:07:51,880 we can do exactly the same thing. we can filter\h out the numbers which are bigger than x from the\h\h 86 00:07:51,880 --> 00:07:57,360 list of remaining numbers. And now we have the\h kind of second level of the picture which we had.\h\h 87 00:07:57,360 --> 00:08:02,040 We had the numbers 1 to 9 jumbled up. We picked\h a pivot. Now we're picking the first value. We\h\h 88 00:08:02,040 --> 00:08:06,840 divide the list up into two parts, the smaller\h numbers and the larger numbers. And then we can\h\h 89 00:08:06,840 --> 00:08:12,240 think, how do we continue from here? So let me\h write the rest of the code. It's really short,\h\h 90 00:08:12,240 --> 00:08:20,040 all of this. So qsort larger. So this is the\h entire code now. So let me explain the kind\h\h 91 00:08:20,040 --> 00:08:24,640 of bit up here where we kind of put everything\h together. All we're doing here is we're quick\h\h 92 00:08:24,640 --> 00:08:30,040 sorting the smaller numbers. So that was down the\h left hand side of the example which we did. Then\h\h 93 00:08:30,040 --> 00:08:34,880 we're quick sorting the larger numbers. So that\h was down the right hand side where we had the 6,\h\h 94 00:08:34,880 --> 00:08:40,840 7, 8 and 9 jumbled up. And then all we do is kind\h of combine everything together and put the pivot\h\h 95 00:08:40,840 --> 00:08:47,800 value, which here is x, in the middle. So plus\h plus is the operator in this particular language,\h\h 96 00:08:47,800 --> 00:08:53,040 which joins two lists together. Here, we're just\h putting the pivot into a little list on its own,\h\h 97 00:08:53,040 --> 00:08:59,000 and then we just smash everything together.\h And basically this is the quicksort algorithm.\h\h 98 00:08:59,000 --> 00:09:04,400 And what's interesting here is that this is\h just five lines of code. We have a base case,\h\h 99 00:09:04,400 --> 00:09:09,160 which says if you quicksort an empty list, you get\h an empty list. And then we have four lines which\h\h 100 00:09:09,160 --> 00:09:15,400 are dealing with the recursion, which are telling\h us how to quicksort a non-empty list. And for me,\h\h 101 00:09:15,400 --> 00:09:20,280 when you see kind of quicksort in pictures, it's\h kind of intuitively obvious how it's working,\h\h 102 00:09:20,280 --> 00:09:24,040 but you might think, do I really understand\h what's going on? But when you look at the\h\h 103 00:09:24,040 --> 00:09:29,160 code here, that's it. I mean, this is a complete\h implementation of quicksort in just five lines of\h\h 104 00:09:29,160 --> 00:09:34,000 code. And it's kind of hard to think how you could\h express the essence of this quicksort algorithm\h\h 105 00:09:34,000 --> 00:09:38,440 any more concisely than we've got here. There's\h really no redundant symbols here. If you've seen\h\h 106 00:09:38,440 --> 00:09:42,600 quicksort in some other languages, it may be kind\h of quite a few more lines. It could be up to a\h\h 107 00:09:42,600 --> 00:09:47,600 page of code in some other languages. But if you\h use a very high level language, like the one I'm\h\h 108 00:09:47,600 --> 00:09:53,200 using here, you can really express the essence of\h the algorithm extremely concisely. Let's see how\h\h 109 00:09:53,200 --> 00:09:59,480 it actually runs in practice. And let's see what\h kind of performance it's got. So what I've done is\h\h 110 00:09:59,480 --> 00:10:04,840 I've prepared another file called qsort,\h and let me load that into the system. So\h\h 111 00:10:04,840 --> 00:10:10,640 what I've got here is three bits of code now.\h I've got the quicksort algorithm in five lines,\h\h 112 00:10:10,640 --> 00:10:16,440 which we've just developed. I've also got another\h sorting algorithm here called insertion sort. And\h\h 113 00:10:16,440 --> 00:10:20,800 I'm not going to go into the details of how this\h works. But the important point here is quicksort\h\h 114 00:10:20,800 --> 00:10:25,760 is a fast algorithm. Insertion sort is a slow\h algorithm. And we're going to see that in practice\h\h 115 00:10:25,760 --> 00:10:31,320 when we do a little example. And of course,\h we're going to need some random lists to actually\h\h 116 00:10:31,320 --> 00:10:38,080 sort. So I've written a little bit of code here\h that's going to randomize a list of numbers. And\h\h 117 00:10:38,080 --> 00:10:42,000 it doesn't actually properly randomize. I'm just\h using what's called a riffle shuffle here. So if\h\h 118 00:10:42,000 --> 00:10:46,440 you've ever played cards, you know, you can often\h shuffle cards by taking an entire deck, splitting\h\h 119 00:10:46,440 --> 00:10:51,120 it in half, riffling it, which means kind of\h interleaving the odd and even cards. And if you do\h\h 120 00:10:51,120 --> 00:10:56,040 that a bunch of times, that's a kind of reasonable\h approximation to randomizing things. So you can\h\h 121 00:10:56,040 --> 00:11:02,200 see at the top here, if I'm randomizing a list of\h numbers, I'm just going to riffle shuffle it five\h\h 122 00:11:02,200 --> 00:11:09,080 times. But the details of this are not important\h here. So let's make a list of random numbers. So\h\h 123 00:11:09,080 --> 00:11:18,320 let's take randomize one up to 5,000. So we've got\h 5,000 random numbers. Let's have a look at that.\h\h 124 00:11:18,320 --> 00:11:23,520 So, well, it's not very interesting. It's just\h 5,000 random numbers. But now what we can do is we\h\h 125 00:11:23,520 --> 00:11:29,920 can quicksort it. And if we quicksort this list of\h 5,000 random numbers, almost as soon as I hit the\h\h 126 00:11:29,920 --> 00:11:35,800 return key, we get the result. It's sorted it into\h the correct order. And this is not running on a\h\h 127 00:11:35,800 --> 00:11:39,960 compiler here. This is an interpreted programming\h language, which I'm using. So if I actually\h\h 128 00:11:39,960 --> 00:11:45,800 compile this, this would be about 10 times faster.\h But even just on an interpreter, this is a fast\h\h 129 00:11:45,800 --> 00:11:52,640 algorithm. So let's contrast this by seeing how a\h not-so-good sorting algorithm works. So insertion\h\h 130 00:11:52,640 --> 00:12:00,160 sort. Let's try insertion sorting the same list of\h 5,000 numbers. Well, it's not even done 1,000 yet.\h\h 131 00:12:00,160 --> 00:12:07,800 Now it's up to 2000 and it's taking kind of five\h or six seconds to sort 5000 numbers. And actually,\h\h 132 00:12:07,800 --> 00:12:16,400 this gets even worse if we consider a longer list.\h Let's try randomizing 10,000 numbers. If I quick\h\h 133 00:12:16,400 --> 00:12:23,520 sort that, it takes a bit longer this time, maybe\h a second or two. But this time, if I run insertion\h\h 134 00:12:23,520 --> 00:12:33,240 sort, it's not even got to 500, 1000. and this\h is a really slow algorithm. Not only is Quicksort\h\h 135 00:12:33,240 --> 00:12:38,040 kind of a very clever algorithm, it's also a very\h efficient algorithm. And what I've been trying to\h\h 136 00:12:38,040 --> 00:12:43,760 show you here in this short video is that if you\h look at Quicksort in the right way, using a very\h\h 137 00:12:43,760 --> 00:12:48,280 high-level language, like the one we've been\h looking at here, then Quicksort can also be a\h\h 138 00:12:48,280 --> 00:13:01,040 very simple algorithm, which you can write in\h just five lines of code. I'd like to show you\h\h 139 00:13:01,040 --> 00:13:08,120 one extra thing, Sean, and I have to get a bit of\h Haskell magic in here, you know me. So we're going\h\h 140 00:13:08,120 --> 00:13:12,240 to have a look again quickly at the quicksort\h algorithm, just five lines of code. Something\h\h 141 00:13:12,240 --> 00:13:17,521 that's interesting about it is that it doesn't\h just sort numbers, it can sort other numbers.