aboutsummaryrefslogtreecommitdiff
path: root/report
diff options
context:
space:
mode:
authorAndreas Kapp Lindquist <andkaplin05@gmail.com>2025-10-29 12:25:26 +0100
committermithe24 <mithe24@student.sdu.dk>2025-10-29 13:49:57 +0100
commitb1dce8fb94ee951efb6f15d6fd6142b74f39e7e0 (patch)
treeb950abe374db80878df1a8c7275b4a419272ca1f /report
parent0bf10fbb9dd5c26b605321014324a47549cb7b4c (diff)
downloadsorter-b1dce8fb94ee951efb6f15d6fd6142b74f39e7e0.tar.gz
sorter-b1dce8fb94ee951efb6f15d6fd6142b74f39e7e0.zip
report: Moved some stuff
Diffstat (limited to '')
-rw-r--r--report/report.tex134
1 files changed, 68 insertions, 66 deletions
diff --git a/report/report.tex b/report/report.tex
index 0a36f9a..32de401 100644
--- a/report/report.tex
+++ b/report/report.tex
@@ -66,6 +66,39 @@ theoretical runtimes and evaluate each implementation.
\section{Design}
+\subsection{Program Flow}
+The program is structured into four main phases: selecting an algorithm,
+converting the inputfile to the array format, sorting that array, and printing
+the array to stdout. Each phase is designed to handle a specific task, which
+creates modularity and clarity. Within each phase a set of functions manage the
+corresponding operations, allowing the programmer to focus on one task at a
+time. This design also greatly supports debugging by simplifying tracking and
+solving errors. Below is a flowchart visualizing each step, followed by a short
+summary of the design.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=\linewidth]{Design.drawio.png}
+ \caption{Program flow}
+ \label{fig:flowchart}
+\end{figure}
+
+\textbf{Parsing the data:} After reading the input, the received file is to be
+parsed, meaning that only useful data will be extracted. This will in turn
+significantly simplifies iteration through the data, by ensuring that only
+meaningful are left. Lastly an important design choice has been creating an
+array of pointers to each coordinate pair. This design is explained in further
+detail later.
+
+\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
+the coordinates, to sort by a preferred coordinate. The chosen algorithm can be
+passed as a command line argument, otherwise a default algorithm will be run.
+This design where an algorithm can be passed to the sorting function, allows the
+programmed to be extended with new algorithms if needed.
+
+\textbf{Generating output:} The last step of the program is converting the
+parsed data back into a format that can be printed.
+
\subsection{Input and Output format}
Each line contains one and only one coordinate consisting of two values
formatted as \texttt{$\langle$x$\rangle$\textbackslash
@@ -77,15 +110,16 @@ numbers would for example require extra logic for parsing, comparison,
converting back and forth to ASCII, and allocating memory. Both input and output
follow this strict format.
-\subsection{Selection of different algorithms}
+\subsection{Selecting algorithms}
The ability to choose which algorithm to use can be quite beneficial, if you
know certain aspects about your data. For example \textit{Quicksort} is very
good with random data, but bad when the data is sorted, and
-\textit{insertionsort} is good with sorted data, but bad with random data.For
-that reason, and beause it makes testing different algorithms easiers, the
-program will be able to run with different algorithms passed as arguments.
+\textit{insertionsort} is good with sorted data, but bad with random data. For
+that reason, and because it makes testing different algorithms easier, because
+it is more flexible, the program will be able to run with different algorithms
+passed as arguments.
-\subsection{Why Insertion-sort and Quick-sort?}
+\subsection{Why Insertionsort and Quicksort?}
Insertion sort and quick sort have been handpicked to research the trade-offs
between algorithmic simplicity and computational efficiency. Furthermore,
these algorithms are widely different in nature. Insertion sort is an iterative
@@ -96,7 +130,7 @@ datasets, while insertion sort is more suitable for small or nearly sorted
cases, which is another great motivation to understand what is really
happening when these algorithms are executed, and how the dataset affects them.
-\subsection{Format when sorting}
+\subsection{Sorting Format}
In order to be a little more general, and make our program easily expandable to
different types of data, we wanted to create our own array datatype, that
allows us to sort something more abstract, with respect to a key in that
@@ -146,51 +180,6 @@ in the array, we simply swap the memory addresses around.
\label{fig:array}
\end{figure}
-\subsection{Program Flow}
-The program is structured into four main phases: selecting an algorithm,
-converting the inputfile to the array format, sorting that arrau, and printing
-the array to stdout. Each phase is designed to handle a specific task, which
-creates modularity and clarity. Within each phase a set of functions manage the
-corresponding operations, allowing the programmer to focus on one task at a
-time. This design also greatly supports debugging by simplifying tracking and
-solving errors. Below is a flowchart visualizing each step, followed by a short
-summary of the design.
-
-\begin{figure}[H]
- \centering
- \includegraphics[width=\linewidth]{Design.drawio.png}
- \caption{Program flow}
- \label{fig:flowchart}
-\end{figure}
-
-\textbf{Reading Input:} The first step of the program is to find the input file,
-use an open syscall, calculate the size of the file using a function called
-\textit{getFileSize}, allocating memory, and saving the coordinates.
-
-\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that
-only useful data will be extracted. This process omits the tabs, and newlines,
-which in turn significantly simplifies iteration through the data, by ensuring
-that only 8-byte coordinates are left. Lastly an important design choice has
-been creating an array of pointers to each coordinate pair. This design is
-explained in further detail later. All of this logic has been wrapped in a
-function \textit{make\_array\_from\_file}, which utilizes several smaller helper
-functions to perform this parsing process.
-
-\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on
-the coordinates. The currently implemented sorting algorithms are quick-sort and
-insertion-sort. Both algorithms have been designed to allow sorting both on the
-x-coordinate and the y-coordinate. This is easily achievable because of the
-array of pointers. The chosen algorithm can be passed as a command line
-argument, otherwise a default algorithm will be run. This design where an
-algorithm can be passed to the sorting function, allows the programmed to be
-extended with new algorithms if needed.
-
-\textbf{Generating output:} A side effect of the
-\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte
-integers. This allows the sorting algorithm to make numeric comparisons on the
-data. But this also means that the data should be converted back to ASCII,
-before printing the output.
-
\section{Implementation}
This four phase design has been converted into Assembly code using the x86\_64
instruction set. Furthermore the design has been split up in modules, where each
@@ -201,7 +190,26 @@ conventions, with a few exceptions where all registers are preserved. Below is a
detailed explanation of some of the more interesting functions.
\subsection{Array maker}
-\snippet{19}{46}{../src/array_maker.s}
+We implemented a global function \texttt{array\_maker}, that takes the
+\textit{file descriptor} as input, and returns a pointer to the start of the
+array, and the number of elements. With this it then first calls a function
+\texttt{create\_file\_buffer}, that reads the contents of the file and puts it
+in memory. It then passes this address, and the length of the file onto the
+function \texttt{make\_pairwise\_data}
+
+\begin{figure}[H]
+ \snippet{19}{46}{../src/array_maker.s}
+ \caption{Code snippet from \texttt{array\_maker.s}}\label{fig:arraysnippet}
+\end{figure}
+
+Load file in memory
+
+create pairwise data
+
+allocate space for the array
+
+Go through each element in the array and create a pointer that points to a pair
+in the pairwise data.
\subsection{Insertion-sort}
This is a classic implementation of the iterative insertion-sort. The function
@@ -266,11 +274,14 @@ The script that generates test files, generates files of size 0, 5000, 10000,
50000, 100000, 500000 and 1000000. (a size of 10, means the file consists of 10
coordinates). For each $n$ we also create three different kinds of test files.
One where all of the data is random, one where the data is already sorted, and
-one where it is reversely sorted. This way we can see both the average case, and
-worst case of any algorithm we decide to use. Now we can use our
-\textit{test.sh} script, which runs the program with each of the generated test
-files for all sorting algorithms of the program, and compares the output with
-the output from the GNU Core Utilities program \textit{sort}.
+one where it is reversely sorted.\footnote{We also did run it for $n=$ 1 million
+with random, but to get a pretty plot we decided to omit these values, so it was
+not too sparse. Quicksort ran for 0.159s and Insertionsort ran for 386.935s.
+This makes Quicksort roughly 2400 times faster than Insertionsort.} This way we
+can see both the average case, and worst case of any algorithm we decide to use.
+Now we can use our \textit{test.sh} script, which runs the program with each of
+the generated test files for all sorting algorithms of the program, and compares
+the output with the output from the \textit{builtin} function \textit{sort}.
Given all of the tests pass, we can then run \textit{benchmark.sh}, which times
how long it takes for our program to run with each test file for each algorithm,
@@ -334,15 +345,6 @@ random. We also see that Quicksort behaves as expected with a best-case runtime
of $\mathcal O(n\log n)$ when the input is random, and a worst-case runtime of
$\mathcal O(n^2)$ when the input is either sorted or reversely sorted.
-Footnote
-
-isort\_random,1000000,386.935
-qsort\_random,1000000,0.148
-isort\_random,1000000,343.616
-qsort\_random,1000000,0.159
-isort\_random,1000000,384.022
-qsort\_random,1000000,0.155,0.146
-
\section{Conclusion}