| author | Andreas Kapp Lindquist <andkaplin05@gmail.com> | 2025-10-28 14:13:59 +0100 |
|---|---|---|
| committer | mithe24 <mithe24@student.sdu.dk> | 2025-10-29 13:49:57 +0100 |
| commit | e119499fc68085b5c915091de2432465c24724e0 (patch) | |
| tree | e255610bf1ccf508a0658018efd2eb6bce06c669 /report/report.tex | |
| parent | 731bba8d505e1da4a5b920871655e785c5e90717 (diff) | |
| download | sorter-e119499fc68085b5c915091de2432465c24724e0.tar.gz sorter-e119499fc68085b5c915091de2432465c24724e0.zip | |
report(*): Did some small refactoring, and added pretty plot
Diffstat (limited to 'report/report.tex')
| -rw-r--r-- | report/report.tex | 222 |
1 files changed, 118 insertions, 104 deletions
diff --git a/report/report.tex b/report/report.tex
index d412a4e..4341383 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -32,7 +32,7 @@ \vspace{1cm}
 \large
-    October 20$^\text{th}$ 2025
+    October 31$^\text{st}$ 2025
 \end{center}
 \newpage
@@ -41,86 +41,49 @@ \section{Introduction}
 Sorting is one of the most fundamental operations in programming. Sorting
-algorithms play a crucial role in efficient searching, organization and
+algorithms play a crucial role in efficient searching, organization,
 interpretation of data, and optimization problems. Furthermore, sorting is
 often times the first step towards designing complicated algorithms in computer
-science. This project aims to study efficient sorting of a list of coordinates
-by the second value of each coordinate, in Assembly. Although abstract high
-level languages such as Python and Java allow faster and convenient development,
+science. This project aims to study efficient sorting of a list of coordinates
+by the second coordinate. This will be done solely in
+\textit{x86\_64 Linux Assembly}. Although abstract high level languages such as
+\textit{Python} and \textit{Java} allow faster and more convenient development,
 Assembly provides precise control over program flow, memory usage and
 instruction level execution, making Assembly an excellent environment for
-benchmarking and performance analysis. In this project the two algorithms
-quick-sort and insertion-sort will be implemented and compared against each
-other. Quick-sort is expected to excel on large random datasets, while
-insertion-sort is more suitable for small or nearly sorted cases. Several test
-instances consisting of uniformly random coordinates will be created with the
-goal of comparing the implementations actual runtime to their theoretical
-runtimes and evaluate each implementation.
+benchmarking and performance analysis.
+
+Two sorting algorithms, quicksort and insertionsort, will be implemented and
+compared against each other. Quicksort is expected to excel on large random
+datasets, while insertionsort is more suitable for small or nearly sorted
+cases. Several test instances consisting of uniformly random coordinates will be
+created with the goal of comparing the implementations' actual runtimes to their
+theoretical runtimes and evaluating each implementation.
 
 \section{Design}
-\subsection{How are the input file and output formatted?}
+\subsection{Input and Output format}
 Each line contains one and only one coordinate consisting of two values
 formatted as \texttt{$\langle$x$\rangle$\textbackslash
-t$\langle$y$\rangle$\textbackslash n}, where $x$ and $y$ are non-negative
-integers in the range 0 to 32767. Although limiting, this range is a design
-decision taken in order to avoid unnecessary complexity and keep focus on the
-main objective of the project, efficient sorting. Including negative numbers
-would for example require extra logic for parsing, comparison, converting back
-and forth to ASCII, and allocating memory. Both input and output follow this
-strict format.
-
-
-\subsection{Program Flow}
-The program is structured into four main phases: reading input, parsing data,
-sorting, and generating output. Each phase is designed to handle a specific
-task, which creates modularity and clarity. Within each phase a set of functions
-manage the corresponding operations, allowing the programmer to focus on one
-task at a time. This design also greatly supports debugging by simplifying
-tracking and solving errors. Below is a flowchart visualizing each step,
-followed by a short summary of the program.
-
-\begin{figure}[H]
-    \centering
-    \includegraphics[scale=0.6]{flowchart.drawio.png}
-    \caption{Program flow}
-    \label{fig:flowchart}
-\end{figure}
-
-\textbf{Reading Input:} The first step of the program is to find the input file,
-use an open syscall, calculate the size of the file using a function called
-\textit{getFileSize}, allocating memory, and saving the coordinates.
-
-\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
-only useful data will be extracted. This process omits the tabs, and newlines,
-which in turn significantly simplifies iteration through the data, by ensuring
-that only 8-byte coordinates are left. Lastly an important design choice has
-been creating an array of pointers to each coordinate pair. This design is
-explained in further detail later. All of this logic has been wrapped in a
-function \textit{make\_array\_from\_file}, which utilizes several smaller helper
-functions to perform this parsing process.
+t$\langle$y$\rangle$\textbackslash n}, where \texttt{x} and \texttt{y} are
+non-negative integers in the range 0 to 32767. Although limiting, this range is
+a design decision taken in order to avoid unnecessary complexity and keep focus
+on the main objective of the project: efficient sorting. Including negative
+numbers would for example require extra logic for parsing, comparison,
+converting back and forth to ASCII, and allocating memory. Both input and output
+follow this strict format.
 
-\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
-the coordinates. The currently implemented sorting algorithms are quick-sort and
-insertion-sort. Both algorithms have been designed to allow sorting both on the
-x-coordinate and the y-coordinate. This is easily achievable because of the
-array of pointers. The chosen algorithm can be passed as a command line
-argument, otherwise a default algorithm will be run. This design where an
-algorithm can be passed to the sorting function, allows the programmed to be
-extended with new algorithms if needed.
+\subsection{Selection of different algorithms}
+The ability to choose which algorithm to use can be quite beneficial if you
+know certain aspects about your data. For that reason, and because it makes
+testing different algorithms easier, the program will be able to run with
+different algorithms passed as arguments.
 
-\textbf{Generating output:} A side effect of the
-\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
-integers. This allows the sorting algorithm to make numeric comparisons on the
-data. But this also means that the data should be converted back to ASCII,
-before printing the output.
-
-\subsection{Why an array of pointers?}
+\subsection{Format when sorting}
 In order to be a little more general, and make our program easily expandable to
-different types of data, we wanted to create out own array datatype, that
+different types of data, we wanted to create our own array datatype that
 allows us to sort something more abstract, with respect to a key in that
 abstract object. To do this we settled on an array of memory addresses, that
-point to where the abstract objects are, see \autoref{fig:array}. These object
+point to where the abstract objects are, see \autoref{fig:array}. These objects
 would in practice also be represented as arrays. With this we can specify which
 index (key) in the sub-arrays we want to sort by, and then when we swap elements
 in the array, we simply swap the memory addresses around.
@@ -160,11 +123,54 @@ in the array, we simply swap the memory addresses around.
     \draw[->] (3.5,1.1) -- (3.5,1.9);
 \end{tikzpicture}
 \caption{The format of the array $A$, where $p_n$ is the $n^\text{th}$
-    memory address, pointing to an array where $e_nm$ is the $m^\text{th}$
-    element of the $n^\text{th}$ element in $A$.}
+    memory address, pointing to an array where $e_{nm}$ is the $m^\text{th}$
+    element of the $n^\text{th}$ array in $A$.}
 \label{fig:array}
 \end{figure}
 
+\subsection{Program Flow}
+The program is structured into four main phases: selecting an algorithm,
+converting the input file to the array format, sorting that array, and printing
+the array to stdout. Each phase is designed to handle a specific task, which
+creates modularity and clarity. Within each phase a set of functions manage the
+corresponding operations, allowing the programmer to focus on one task at a
+time. This design also greatly supports debugging by simplifying tracking and
+solving errors. Below is a flowchart visualizing each step, followed by a short
+summary of the design.
+
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=\linewidth]{design.drawio.png}
+    \caption{Program flow}
+    \label{fig:flowchart}
+\end{figure}
+
+\textbf{Select Algorithm:} The chosen algorithm can be passed as a command
+line argument; otherwise a default algorithm will be run.
+
+\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
+only useful data will be extracted. This process omits the tabs, and newlines,
+which in turn significantly simplifies iteration through the data, by ensuring
+that only 8-byte coordinates are left. Lastly an important design choice has
+been creating an array of pointers to each coordinate pair. This design is
+explained in further detail later. All of this logic has been wrapped in a
+function \textit{make\_array\_from\_file}, which utilizes several smaller helper
+functions to perform this parsing process.
+
+\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
+the coordinates. The currently implemented sorting algorithms are quick-sort and
+insertion-sort. Both algorithms have been designed to allow sorting both on the
+x-coordinate and the y-coordinate. This is easily achievable because of the
+array of pointers. This design, where an algorithm can be passed to the sorting
+function, allows the program to be extended with new algorithms if needed.
+
+\textbf{Generating output:} A side effect of the
+\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
+integers. This allows the sorting algorithm to make numeric comparisons on the
+data. But this also means that the data should be converted back to ASCII,
+before printing the output.
+
 \section{Implementation}
 This four phase design has been converted into Assembly code using the x86\_64
 instruction set. Furthermore the design has been split up in modules, where each
@@ -249,41 +255,49 @@ and saves the result in a \textit{csv} file. The results of this \textit{csv}
 file is shown
 \begin{figure}[H]
-%    \centering
-%    \begin{tikzpicture}
-%
-%        \begin{axis}[
-%            xlabel={size},
-%            ylabel={seconds},
-%            ymode=log,
-%            width=\textwidth,
-%            legend pos=north west,
-%            scatter/classes={
-%                isort_random={mark=*,red},
-%                isort_reverse={mark=*,green},
-%                isort_sorted={mark=*,blue},
-%                qsort_random={mark=*,cyan},
-%                qsort_reverse={mark=*,magenta},
-%                qsort_sorted={mark=*,yellow}
-%            }
-%        ]
-%
-%        \addplot[
-%            scatter, only marks,
-%            scatter src=explicit symbolic
-%        ]
-%        table [x=size, y=real, meta=type, col sep=comma]
-%        {../test/benchmark_results.csv};
-%        \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random, qsort\_reverse, qsort\_sorted}
-%
-%        % \addplot[domain=100:100000, samples=10, thick, densely dotted, black]{x/250000};
-%        % \addlegendentry{$\mathcal O(\text{size})$}
-%        %
-%        % \addplot[domain=100:100000, samples=10, thick, dashed, black]{(x*x)/4000000000};
-%        % \addlegendentry{$\mathcal O(\text{size}^2)$}
-%
-%        \end{axis}
-%    \end{tikzpicture}
+    \centering
+    \begin{tikzpicture}
+        \begin{axis}[
+            xlabel={size($n$)},
+            ylabel={seconds},
+            ymode=log,
+            % xmode=log,
+            width=\textwidth,
+            legend pos=north west,
+            scatter/classes={
+                isort_random={mark=*,red},
+                isort_reverse={mark=*,green},
+                isort_sorted={mark=*,blue},
+                qsort_random={mark=*,cyan},
+                qsort_reverse={mark=*,magenta},
+                qsort_sorted={mark=*,yellow}
+            }
+        ]
+
+        % -- Scatter plot --
+        \addplot[scatter, only marks, scatter src=explicit symbolic] table
+            [x=size, y=time, meta=type, col sep=comma]
+            {../test/benchmark_results.csv};
+        \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random,
+            qsort\_reverse, qsort\_sorted}
+
+        % -- O(n)
+        \addplot[domain=1:92709, samples=100, thick, densely dotted, black]
+            {(x * 1/18000000 + 0.0003)};
+        \addlegendentry{$\mathcal O(n)$}
+
+        % -- O(n log n)
+        \addplot[domain=1:92709, samples=100, thick, dotted, black]
+            {(x*ln(x)) * 1/80000000 + 0.0003};
+        \addlegendentry{$\mathcal O(n\log n)$}
+
+        % -- O(n^2)
+        \addplot[domain=1:92709, samples=100, thick, dashed, black]
+            {(x*x) * 1/5000000000 + 0.0003};
+        \addlegendentry{$\mathcal O(n^2)$}
+
+        \end{axis}
+    \end{tikzpicture}
 \caption{Benchmarking data from Insertionsort and Quicksort for both random,
 sorted and reverse sorted data.}\label{fig:benchmark}
 \end{figure}
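The pointer-array design the commit describes (parse `<x>\t<y>\n` lines into pairs, keep an array of references to the pairs, and sort by a chosen key index so that only references move) is language-agnostic. The sketch below illustrates that idea in Python; it is not the repository's assembly code, and the names `make_array_from_text` and `insertion_sort_by_key` are hypothetical stand-ins that merely mirror the routines the report mentions (e.g. `make_array_from_file`).

```python
def make_array_from_text(text):
    """Parse "<x>\t<y>\n" lines into a list of [x, y] integer pairs.

    Mirrors the parsing phase: tabs and newlines are dropped, leaving
    only the numeric values of each coordinate.
    """
    array = []
    for line in text.splitlines():
        x, y = line.split("\t")
        array.append([int(x), int(y)])
    return array


def insertion_sort_by_key(array, key=1):
    """Insertion sort comparing pairs by index `key` (1 = y-coordinate).

    Only the references held in `array` are moved; the pairs themselves
    stay where they are, which is the point of the pointer-array design.
    """
    for i in range(1, len(array)):
        current = array[i]
        j = i - 1
        while j >= 0 and array[j][key] > current[key]:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = current
    return array


coords = make_array_from_text("3\t20\n1\t5\n2\t10\n")
insertion_sort_by_key(coords, key=1)
print(coords)  # → [[1, 5], [2, 10], [3, 20]]
```

Passing `key=0` instead sorts the same array by the x-coordinate, which corresponds to the report's claim that both sort orders fall out of the pointer-array design for free.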