Sorting and Searching Algorithms

Now that we went through a variety of VHDL design building blocks and we saw the techniques to create both generic, reusable and at the same time efficient (in terms of speed and area) designs, using either behavioral inference or primitive instantiations, it is time to put what we have learned in practice. We have already looked briefly at FIRs, Finite Impulse Response Filters in the context of introducing the DSP48 primitive, there is a lot more to be said and I will get back to FIRs, especially if there are demands for that in the comments.

For now we will look at another fundamental design problem, that of sorting. In mathematical terms the problem is simple - you are given a list of N objects and a function that can compare two objects and tell you which one comes first and you need to produce an output list of the same objects but with them in increasing order. The objects could be numbers, words, books, etc. and the compare function could be the mathematical "<" operator, alphabetical order or ISBN numbers. The operation of sorting would rank athletes running in a stadium based on their race times, would turn a heap of words into a dictionary and a stack of books into a library.

Sorting is important because it makes searching easier. Searching is also simple in mathematical terms, you are given a list of objects and a new object and you are to tell if the new item is in the list or not and if it is, what is its position. Looking for a particular value in unsorted and sorted lists is very different in terms of number of operations required to do the search. With an unsorted list you have to do an exhaustive search, compare with every item in the list until you either find a match or run out of list elements. On average, looking for a match in a list of N elements requires N/2 comparisons, or in mathematical notation O(N). On the other hand, if the list is sorted, with a single comparison with the item in the middle of the list you can rule out half of the list elements, then apply the same thing recursively to the remaining half until you find your item. On average this requires ceil(log2(N)) comparisons or O(log2(N)). If the list is large and you have to search into it many times the difference between searching in sorted and unsorted lists will be huge.

The sorting problem is well known in Computer Science, there are many algorithms, some simple but not so good for large N like bubble sort and insertion sort, which are O(N2) and some more sophisticated ones like quick sort, heap sort and odd-even merge sort which are O(N·log2(N)) or O(N·log2(N)2) . The problem with these algorithms is that they are sequential, while they work on a list of any size it takes many operations to do a sort and their throughput is low, no matter how fast your sequential computer is.

The practical applications for efficient sorting and searching algorithms are numerous and you can probably come up with your own examples but I will give two here that are relevant for hardware implementations. The network routing algorithms that underline the Internet use IP addresses and routing tables to direct incoming data packets towards their destination and this is done by searching for the destination IP of the packet within these routing tables - this has to be done extremely fast and cannot be done in software, it has to be a hardware implementation. The second example is from image processing where an operation called median filtering replaces a pixel in an image with the median of the pixels in its neighborhood. For example, the 5x5 set of pixels centered around our pixel needs to be sorted and the middle element of the sorted list of 25 pixels is the output of the filter and replaces the input pixel. This has to be done for every pixel in the image and if we are talking about high resolution video like HD, or 4K or even 8K the pixel rates are hundreds of MHz so you need to sort sets of 9, 25, 49, etc. pixels every nanosecond or so.

These are examples of sorting tasks that cannot be done in software efficiently and require hardware implementations. We would like to implement very fast sorting in hardware  in parallel not with a sequential algorithm, ideally in one clock, sorting a different list of N elements every clock. If we restrict the generality of our sorting algorithm from sorting a list of any size to sorting a list of fixed size then parallel sorting algorithms with good hardware implementations that can sort one list of items every clock are possible. These are called Sorting Networks and next week we will take a closer look at them and how to implement fast and efficient sorting in hardware.

Back to the top: The Art of FPGA Design