aboutsummaryrefslogtreecommitdiff
path: root/report
diff options
context:
space:
mode:
authorAndreas Kapp Lindquist <andkaplin05@gmail.com>2025-10-28 14:13:59 +0100
committermithe24 <mithe24@student.sdu.dk>2025-10-29 13:49:57 +0100
commite119499fc68085b5c915091de2432465c24724e0 (patch)
treee255610bf1ccf508a0658018efd2eb6bce06c669 /report
parent731bba8d505e1da4a5b920871655e785c5e90717 (diff)
downloadsorter-e119499fc68085b5c915091de2432465c24724e0.tar.gz
sorter-e119499fc68085b5c915091de2432465c24724e0.zip
report(*): Did some small refactoring, and added pretty plot
Diffstat (limited to '')
-rw-r--r--report/report.tex222
1 files changed, 118 insertions, 104 deletions
diff --git a/report/report.tex b/report/report.tex
index d412a4e..4341383 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -32,7 +32,7 @@
\vspace{1cm}
\large
- October 20$^\text{th}$ 2025
+ October 31$^\text{st}$ 2025
\end{center}
\newpage
@@ -41,86 +41,49 @@
\section{Introduction}
Sorting is one of the most fundamental operations in programming. Sorting
-algorithms play a crucial role in efficient searching, organization and
+algorithms play a crucial role in efficient searching, organization,
interpretation of data, and optimization problems. Furthermore, sorting is often
times the first step towards designing complicated algorithms in computer
-science. This project aims to study efficient sorting of a list of coordinates
-by the second value of each coordinate, in Assembly. Although abstract high
-level languages such as Python and Java allow faster and convenient development,
+science. This project aims to study efficient sorting of a list of coordinates,
+when sorting by the second coordinate. This will be done solely in
+\textit{x86\_64 Linux Assembly}. Although abstract high level languages such as
+\textit{Python} and \textit{Java} allow faster and convenient development,
Assembly provides precise control over program flow, memory usage and
instruction level execution, making Assembly an excellent environment for
-benchmarking and performance analysis. In this project the two algorithms
-quick-sort and insertion-sort will be implemented and compared against each
-other. Quick-sort is expected to excel on large random datasets, while
-insertion-sort is more suitable for small or nearly sorted cases. Several test
-instances consisting of uniformly random coordinates will be created with the
-goal of comparing the implementations actual runtime to their theoretical
-runtimes and evaluate each implementation.
+benchmarking and performance analysis.
+
+Two sorting algorithms quicksort and insertionsort will be implemented and
+compared against each other. Quicksort is expected to excel on large random
+datasets, while insertionsort is more suitable for small or nearly sorted
+cases. Several test instances consisting of uniformly random coordinates will be
+created with the goal of comparing the implementations actual runtime to their
+theoretical runtimes and evaluate each implementation.
\section{Design}
-\subsection{How are the input file and output formatted?}
+\subsection{Input and Output format}
Each line contains one and only one coordinate consisting of two values
formatted as \texttt{$\langle$x$\rangle$\textbackslash
-t$\langle$y$\rangle$\textbackslash n}, where $x$ and $y$ are non-negative
-integers in the range 0 to 32767. Although limiting, this range is a design
-decision taken in order to avoid unnecessary complexity and keep focus on the
-main objective of the project, efficient sorting. Including negative numbers
-would for example require extra logic for parsing, comparison, converting back
-and forth to ASCII, and allocating memory. Both input and output follow this
-strict format.
-
-
-\subsection{Program Flow}
-The program is structured into four main phases: reading input, parsing data,
-sorting, and generating output. Each phase is designed to handle a specific
-task, which creates modularity and clarity. Within each phase a set of functions
-manage the corresponding operations, allowing the programmer to focus on one
-task at a time. This design also greatly supports debugging by simplifying
-tracking and solving errors. Below is a flowchart visualizing each step,
-followed by a short summary of the program.
-
-\begin{figure}[H]
- \centering
- \includegraphics[scale=0.6]{flowchart.drawio.png}
- \caption{Program flow}
- \label{fig:flowchart}
-\end{figure}
-
-\textbf{Reading Input:} The first step of the program is to find the input file,
-use an open syscall, calculate the size of the file using a function called
-\textit{getFileSize}, allocating memory, and saving the coordinates.
-
-\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
-only useful data will be extracted. This process omits the tabs, and newlines,
-which in turn significantly simplifies iteration through the data, by ensuring
-that only 8-byte coordinates are left. Lastly an important design choice has
-been creating an array of pointers to each coordinate pair. This design is
-explained in further detail later. All of this logic has been wrapped in a
-function \textit{make\_array\_from\_file}, which utilizes several smaller helper
-functions to perform this parsing process.
+t$\langle$y$\rangle$\textbackslash n}, where \texttt{x} and \texttt{y} are
+non-negative integers in the range 0 to 32767. Although limiting, this range is
+a design decision taken in order to avoid unnecessary complexity and keep focus
+on the main objective of the project, efficient sorting. Including negative
+numbers would for example require extra logic for parsing, comparison,
+converting back and forth to ASCII, and allocating memory. Both input and output
+follow this strict format.
-\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
-the coordinates. The currently implemented sorting algorithms are quick-sort and
-insertion-sort. Both algorithms have been designed to allow sorting both on the
-x-coordinate and the y-coordinate. This is easily achievable because of the
-array of pointers. The chosen algorithm can be passed as a command line
-argument, otherwise a default algorithm will be run. This design where an
-algorithm can be passed to the sorting function, allows the programmed to be
-extended with new algorithms if needed.
+\subsection{Selection of different algorithms}
+The ability to choose which algorithm to use can be quite beneficial, if you
+know certain aspects about your data. For that reason, and beause it makes
+testing different algorithms easiers, the program will be able to run with
+different algorithms passed as arguments.
-\textbf{Generating output:} A side effect of the
-\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
-integers. This allows the sorting algorithm to make numeric comparisons on the
-data. But this also means that the data should be converted back to ASCII,
-before printing the output.
-
-\subsection{Why an array of pointers?}
+\subsection{Format when sorting}
In order to be a little more general, and make our program easily expandable to
-different types of data, we wanted to create out own array datatype, that
+different types of data, we wanted to create our own array datatype, that
allows us to sort something more abstract, with respect to a key in that
abstract object. To do this we settled on an array of memory addresses, that
-point to where the abstract objects are, see \autoref{fig:array}. These object
+point to where the abstract objects are, see \autoref{fig:array}. These objects
would in practice also be represented as arrays. With this we can specify which
index (key) in the sub-arrays we want to sort by, and then when we swap elements
in the array, we simply swap the memory addresses around.
@@ -160,11 +123,54 @@ in the array, we simply swap the memory addresses around.
\draw[->] (3.5,1.1) -- (3.5,1.9);
\end{tikzpicture}
\caption{The format of the array $A$, where $p_n$ is the $n^\text{th}$
- memory address, pointing to an array where $e_nm$ is the $m^\text{th}$
- element of the $n^\text{th}$ element in $A$.}
+ memory address, pointing to an array where $e_{nm}$ is the $m^\text{th}$
+ element of the $n^\text{th}$ array in $A$.}
\label{fig:array}
\end{figure}
+\subsection{Program Flow}
+The program is structured into four main phases: selecting an algorithm,
+converting the inputfile to the array format, sorting that arrau, and printing
+the array to stdout. Each phase is designed to handle a specific task, which
+creates modularity and clarity. Within each phase a set of functions manage the
+corresponding operations, allowing the programmer to focus on one task at a
+time. This design also greatly supports debugging by simplifying tracking and
+solving errors. Below is a flowchart visualizing each step, followed by a short
+summary of the design.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=\linewidth]{design.drawio.png}
+ \caption{Program flow}
+ \label{fig:flowchart}
+\end{figure}
+
+\textbf{Select Algorithm:} Th
+
+\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
+only useful data will be extracted. This process omits the tabs, and newlines,
+which in turn significantly simplifies iteration through the data, by ensuring
+that only 8-byte coordinates are left. Lastly an important design choice has
+been creating an array of pointers to each coordinate pair. This design is
+explained in further detail later. All of this logic has been wrapped in a
+function \textit{make\_array\_from\_file}, which utilizes several smaller helper
+functions to perform this parsing process.
+
+\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
+the coordinates. The currently implemented sorting algorithms are quick-sort and
+insertion-sort. Both algorithms have been designed to allow sorting both on the
+x-coordinate and the y-coordinate. This is easily achievable because of the
+array of pointers. The chosen algorithm can be passed as a command line
+argument, otherwise a default algorithm will be run. This design where an
+algorithm can be passed to the sorting function, allows the programmed to be
+extended with new algorithms if needed.
+
+\textbf{Generating output:} A side effect of the
+\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
+integers. This allows the sorting algorithm to make numeric comparisons on the
+data. But this also means that the data should be converted back to ASCII,
+before printing the output.
+
\section{Implementation}
This four phase design has been converted into Assembly code using the x86\_64
instruction set. Furthermore the design has been split up in modules, where each
@@ -249,41 +255,49 @@ and saves the result in a \textit{csv} file. The results of this \textit{csv}
file is shown
\begin{figure}[H]
-% \centering
-% \begin{tikzpicture}
-%
-% \begin{axis}[
-% xlabel={size},
-% ylabel={seconds},
-% ymode=log,
-% width=\textwidth,
-% legend pos=north west,
-% scatter/classes={
-% isort_random={mark=*,red},
-% isort_reverse={mark=*,green},
-% isort_sorted={mark=*,blue},
-% qsort_random={mark=*,cyan},
-% qsort_reverse={mark=*,magenta},
-% qsort_sorted={mark=*,yellow}
-% }
-% ]
-%
-% \addplot[
-% scatter, only marks,
-% scatter src=explicit symbolic
-% ]
-% table [x=size, y=real, meta=type, col sep=comma]
-% {../test/benchmark_results.csv};
-% \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random, qsort\_reverse, qsort\_sorted}
-%
-% % \addplot[domain=100:100000, samples=10, thick, densely dotted, black]{x/250000};
-% % \addlegendentry{$\mathcal O(\text{size})$}
-% %
-% % \addplot[domain=100:100000, samples=10, thick, dashed, black]{(x*x)/4000000000};
-% % \addlegendentry{$\mathcal O(\text{size}^2)$}
-%
-% \end{axis}
-% \end{tikzpicture}
+ \centering
+ \begin{tikzpicture}
+ \begin{axis}[
+ xlabel={size({$n$})},
+ ylabel={seconds},
+ ymode=log,
+ % xmode=log,
+ width=\textwidth,
+ legend pos=north west,
+ scatter/classes={
+ isort_random={mark=*,red},
+ isort_reverse={mark=*,green},
+ isort_sorted={mark=*,blue},
+ qsort_random={mark=*,cyan},
+ qsort_reverse={mark=*,magenta},
+ qsort_sorted={mark=*,yellow}
+ }
+ ]
+
+ % -- Scatter plot --
+ \addplot[scatter, only marks, scatter src=explicit symbolic] table
+ [x=size, y=time, meta=type, col sep=comma]
+ {../test/benchmark_results.csv};
+ \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random,
+ qsort\_reverse, qsort\_sorted}
+
+ % -- O(n)
+ \addplot[domain=1:92709, samples=100, thick, densely dotted, black]
+ {(x * 1/18000000+0.0003)};
+ \addlegendentry{$\mathcal O(n)$}
+
+ % -- O(nlogn)
+ \addplot[domain=1:92709, samples=100, thick, dotted, black]
+ {(x*ln(x)) * 1/80000000 + 0.0003};
+ \addlegendentry{$\mathcal O(n\log n)$}
+
+ % -- O(n^2)
+ \addplot[domain=1:92709, samples=100, thick, dashed, black]
+ {(x*x) * 1/5000000000+0.0003};
+ \addlegendentry{$\mathcal O(n^2)$}
+
+ \end{axis}
+ \end{tikzpicture}
\caption{Benchmarking data from Insertionsort and Quicksort for both random,
sorted and reverse sorted data.}\label{fig:benchmark}
\end{figure}