report(*): Did some small refactoring, and added pretty plot

author: Andreas Kapp Lindquist <andkaplin05@gmail.com> 2025-10-28 14:13:59 +0100
committer: mithe24 <mithe24@student.sdu.dk> 2025-10-29 13:49:57 +0100
commit: e119499fc68085b5c915091de2432465c24724e0 (patch)
tree: e255610bf1ccf508a0658018efd2eb6bce06c669 /report
parent: 731bba8d505e1da4a5b920871655e785c5e90717 (diff)
download: sorter-e119499fc68085b5c915091de2432465c24724e0.tar.gz
sorter-e119499fc68085b5c915091de2432465c24724e0.zip
1 files changed, 118 insertions, 104 deletions
diff --git a/report/report.tex b/report/report.tex
index d412a4e..4341383 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -32,7 +32,7 @@
 
     \vspace{1cm}
     \large
-    October 20$^\text{th}$ 2025
+    October 31$^\text{st}$ 2025
 \end{center}
 
 \newpage
@@ -41,86 +41,49 @@
 
 \section{Introduction}
 Sorting is one of the most fundamental operations in programming. Sorting
-algorithms play a crucial role in efficient searching, organization and
+algorithms play a crucial role in efficient searching, organization,
 interpretation of data, and optimization problems. Furthermore, sorting is often
 times the first step towards designing complicated algorithms in computer
-science. This project aims to study efficient sorting of a list of coordinates
-by the second value of each coordinate, in Assembly. Although abstract high
-level languages such as Python and Java allow faster and convenient development,
+science. This project aims to study efficient sorting of a list of coordinates,
+when sorting by the second coordinate. This will be done solely in
+\textit{x86\_64 Linux Assembly}. Although abstract high level languages such as
+\textit{Python} and \textit{Java} allow faster and convenient development,
 Assembly provides precise control over program flow, memory usage and
 instruction level execution, making Assembly an excellent environment for
-benchmarking and performance analysis. In this project the two algorithms
-quick-sort and insertion-sort will be implemented and compared against each
-other. Quick-sort is expected to excel on large random datasets, while
-insertion-sort is more suitable for small or nearly sorted cases. Several test
-instances consisting of uniformly random coordinates will be created with the
-goal of comparing the implementations actual runtime to their theoretical
-runtimes and evaluate each implementation.
+benchmarking and performance analysis.
+
+Two sorting algorithms quicksort and insertionsort will be implemented and
+compared against each other. Quicksort is expected to excel on large random
+datasets, while insertionsort is more suitable for small or nearly sorted
+cases. Several test instances consisting of uniformly random coordinates will be
+created with the goal of comparing the implementations actual runtime to their
+theoretical runtimes and evaluate each implementation.
 
 \section{Design}
 
-\subsection{How are the input file and output formatted?}
+\subsection{Input and Output format}
 Each line contains one and only one coordinate consisting of two values
 formatted as \texttt{$\langle$x$\rangle$\textbackslash
-t$\langle$y$\rangle$\textbackslash n}, where $x$ and $y$ are non-negative
-integers in the range 0 to 32767. Although limiting, this range is a design
-decision taken in order to avoid unnecessary complexity and keep focus on the
-main objective of the project, efficient sorting. Including negative numbers
-would for example require extra logic for parsing, comparison, converting back
-and forth to ASCII, and allocating memory. Both input and output follow this
-strict format.
-
-
-\subsection{Program Flow}
-The program is structured into four main phases: reading input, parsing data,
-sorting, and generating output. Each phase is designed to handle a specific
-task, which creates modularity and clarity. Within each phase a set of functions
-manage the corresponding operations, allowing the programmer to focus on one
-task at a time. This design also greatly supports debugging by simplifying
-tracking and solving errors. Below is a flowchart visualizing each step,
-followed by a short summary of the program.
-
-\begin{figure}[H]
-    \centering
-    \includegraphics[scale=0.6]{flowchart.drawio.png}
-    \caption{Program flow}
-    \label{fig:flowchart}
-\end{figure}
-
-\textbf{Reading Input:} The first step of the program is to find the input file,
-use an open syscall, calculate the size of the file using a function called
-\textit{getFileSize}, allocating memory, and saving the coordinates.
-
-\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
-only useful data will be extracted. This process omits the tabs, and newlines,
-which in turn significantly simplifies iteration through the data, by ensuring
-that only 8-byte coordinates are left. Lastly an important design choice has
-been creating an array of pointers to each coordinate pair. This design is
-explained in further detail later. All of this logic has been wrapped in a
-function \textit{make\_array\_from\_file}, which utilizes several smaller helper
-functions to perform this parsing process.
+t$\langle$y$\rangle$\textbackslash n}, where \texttt{x} and \texttt{y} are
+non-negative integers in the range 0 to 32767. Although limiting, this range is
+a design decision taken in order to avoid unnecessary complexity and keep focus
+on the main objective of the project, efficient sorting. Including negative
+numbers would for example require extra logic for parsing, comparison,
+converting back and forth to ASCII, and allocating memory. Both input and output
+follow this strict format.
 
-\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
-the coordinates. The currently implemented sorting algorithms are quick-sort and
-insertion-sort. Both algorithms have been designed to allow sorting both on the
-x-coordinate and the y-coordinate. This is easily achievable because of the
-array of pointers. The chosen algorithm can be passed as a command line
-argument, otherwise a default algorithm will be run. This design where an
-algorithm can be passed to the sorting function, allows the programmed to be
-extended with new algorithms if needed.
+\subsection{Selection of different algorithms}
+The ability to choose which algorithm to use can be quite beneficial, if you
+know certain aspects about your data. For that reason, and beause it makes
+testing different algorithms easiers, the program will be able to run with
+different algorithms passed as arguments.
 
-\textbf{Generating output:} A side effect of the
-\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
-integers. This allows the sorting algorithm to make numeric comparisons on the
-data. But this also means that the data should be converted back to ASCII,
-before printing the output. 
-
-\subsection{Why an array of pointers?}
+\subsection{Format when sorting}
 In order to be a little more general, and make our program easily expandable to
-different types of data, we wanted to create out own array datatype, that
+different types of data, we wanted to create our own array datatype, that
 allows us to sort something more abstract, with respect to a key in that
 abstract object. To do this we settled on an array of memory addresses, that
-point to where the abstract objects are, see \autoref{fig:array}. These object
+point to where the abstract objects are, see \autoref{fig:array}. These objects
 would in practice also be represented as arrays. With this we can specify which
 index (key) in the sub-arrays we want to sort by, and then when we swap elements
 in the array, we simply swap the memory addresses around. 
@@ -160,11 +123,54 @@ in the array, we simply swap the memory addresses around.
         \draw[->] (3.5,1.1) -- (3.5,1.9);
     \end{tikzpicture}
     \caption{The format of the array $A$, where $p_n$ is the $n^\text{th}$
-    memory address, pointing to an array where $e_nm$ is the $m^\text{th}$
-    element of the $n^\text{th}$ element in $A$.}
+    memory address, pointing to an array where $e_{nm}$ is the $m^\text{th}$
+    element of the $n^\text{th}$ array in $A$.}
     \label{fig:array}
 \end{figure}
 
+\subsection{Program Flow}
+The program is structured into four main phases: selecting an algorithm,
+converting the inputfile to the array format, sorting that arrau, and printing
+the array to stdout. Each phase is designed to handle a specific task, which
+creates modularity and clarity. Within each phase a set of functions manage the
+corresponding operations, allowing the programmer to focus on one task at a
+time. This design also greatly supports debugging by simplifying tracking and
+solving errors. Below is a flowchart visualizing each step, followed by a short
+summary of the design.
+
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=\linewidth]{design.drawio.png}
+    \caption{Program flow}
+    \label{fig:flowchart}
+\end{figure}
+
+\textbf{Select Algorithm:} Th
+
+\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
+only useful data will be extracted. This process omits the tabs, and newlines,
+which in turn significantly simplifies iteration through the data, by ensuring
+that only 8-byte coordinates are left. Lastly an important design choice has
+been creating an array of pointers to each coordinate pair. This design is
+explained in further detail later. All of this logic has been wrapped in a
+function \textit{make\_array\_from\_file}, which utilizes several smaller helper
+functions to perform this parsing process.
+
+\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
+the coordinates. The currently implemented sorting algorithms are quick-sort and
+insertion-sort. Both algorithms have been designed to allow sorting both on the
+x-coordinate and the y-coordinate. This is easily achievable because of the
+array of pointers. The chosen algorithm can be passed as a command line
+argument, otherwise a default algorithm will be run. This design where an
+algorithm can be passed to the sorting function, allows the programmed to be
+extended with new algorithms if needed.
+
+\textbf{Generating output:} A side effect of the
+\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
+integers. This allows the sorting algorithm to make numeric comparisons on the
+data. But this also means that the data should be converted back to ASCII,
+before printing the output. 
+
 \section{Implementation}
 This four phase design has been converted into Assembly code using the x86\_64
 instruction set. Furthermore the design has been split up in modules, where each
@@ -249,41 +255,49 @@ and saves the result in a \textit{csv} file. The results of this \textit{csv}
 file is shown 
 
 \begin{figure}[H]
-%     \centering
-%     \begin{tikzpicture}
-%
-%       \begin{axis}[
-%           xlabel={size},
-%           ylabel={seconds},
-%           ymode=log,
-%           width=\textwidth,
-%           legend pos=north west,
-%           scatter/classes={
-%             isort_random={mark=*,red},
-%             isort_reverse={mark=*,green},
-%             isort_sorted={mark=*,blue},
-%             qsort_random={mark=*,cyan},
-%             qsort_reverse={mark=*,magenta},
-%             qsort_sorted={mark=*,yellow}
-%           }
-%         ]
-%
-%         \addplot[
-%           scatter, only marks,
-%           scatter src=explicit symbolic
-%         ]
-%         table [x=size, y=real, meta=type, col sep=comma]
-%         {../test/benchmark_results.csv};
-%         \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random, qsort\_reverse, qsort\_sorted}
-%
-%         % \addplot[domain=100:100000, samples=10, thick, densely dotted, black]{x/250000};
-%         %   \addlegendentry{$\mathcal O(\text{size})$}
-%         %
-%         % \addplot[domain=100:100000, samples=10, thick, dashed, black]{(x*x)/4000000000};
-%         %   \addlegendentry{$\mathcal O(\text{size}^2)$}
-%
-%       \end{axis}
-%     \end{tikzpicture}
+    \centering
+    \begin{tikzpicture}
+        \begin{axis}[
+                xlabel={size({$n$})},
+                ylabel={seconds},
+                ymode=log,
+                % xmode=log,
+                width=\textwidth,
+                legend pos=north west,
+                scatter/classes={
+                    isort_random={mark=*,red},
+                    isort_reverse={mark=*,green},
+                    isort_sorted={mark=*,blue},
+                    qsort_random={mark=*,cyan},
+                    qsort_reverse={mark=*,magenta},
+                    qsort_sorted={mark=*,yellow}
+                }
+            ]
+
+        % -- Scatter plot --
+        \addplot[scatter, only marks, scatter src=explicit symbolic] table 
+            [x=size, y=time, meta=type, col sep=comma]
+            {../test/benchmark_results.csv};
+        \legend{isort\_random, isort\_reverse, isort\_sorted, qsort\_random,
+            qsort\_reverse, qsort\_sorted}
+
+        % -- O(n)
+        \addplot[domain=1:92709, samples=100, thick, densely dotted, black]
+          {(x * 1/18000000+0.0003)};
+        \addlegendentry{$\mathcal O(n)$}
+
+        % -- O(nlogn)
+        \addplot[domain=1:92709, samples=100, thick, dotted, black]
+          {(x*ln(x)) * 1/80000000 + 0.0003};
+        \addlegendentry{$\mathcal O(n\log n)$}
+
+        % -- O(n^2)
+        \addplot[domain=1:92709, samples=100, thick, dashed, black]
+          {(x*x) * 1/5000000000+0.0003};
+        \addlegendentry{$\mathcal O(n^2)$}
+
+      \end{axis}
+    \end{tikzpicture}
     \caption{Benchmarking data from Insertionsort and Quicksort for both random,
     sorted and reverse sorted data.}\label{fig:benchmark}
 \end{figure}
author	Andreas Kapp Lindquist <andkaplin05@gmail.com>	2025-10-28 14:13:59 +0100
committer	mithe24 <mithe24@student.sdu.dk>	2025-10-29 13:49:57 +0100
commit	e119499fc68085b5c915091de2432465c24724e0 (patch)
tree	e255610bf1ccf508a0658018efd2eb6bce06c669 /report
parent	731bba8d505e1da4a5b920871655e785c5e90717 (diff)
download	sorter-e119499fc68085b5c915091de2432465c24724e0.tar.gz sorter-e119499fc68085b5c915091de2432465c24724e0.zip