diff options
| author | Andreas Kapp Lindquist <andkaplin05@gmail.com> | 2025-10-27 10:38:01 +0100 |
|---|---|---|
| committer | mithe24 <mithe24@student.sdu.dk> | 2025-10-29 13:49:57 +0100 |
| commit | 6e82c40715c3dacec01f2ecc8cf15ac4c2907f66 (patch) | |
| tree | 97a54963ba59da772e714dc71527c3667cc97a3f /report | |
| parent | c4d95bde5fcda18b19766fe8ec2912c1ebede4e4 (diff) | |
| download | sorter-6e82c40715c3dacec01f2ecc8cf15ac4c2907f66.tar.gz sorter-6e82c40715c3dacec01f2ecc8cf15ac4c2907f66.zip | |
report(*): Made lines fit 80 characters, and word correction
Diffstat (limited to '')
| -rw-r--r-- | report/report.tex | 134 |
1 files changed, 77 insertions, 57 deletions
diff --git a/report/report.tex b/report/report.tex index ea6822b..e08cc7f 100644 --- a/report/report.tex +++ b/report/report.tex @@ -15,7 +15,7 @@ \begin{center} \Huge - Sorting in x86\_64 linux Assembly + Sorting in x86\_64 Linux Assembly \vspace{1cm} \Large @@ -37,34 +37,45 @@ \newpage \section{Introduction} -Sorting is one of the most fundamental operations in programming. Sorting algorithms play a crucial role -in efficient searching, organization and interpretation of data, and optimization problems. Furthermore, -sorting is often times the first step towards designing complicated algorithms in computer science. This -project aims to study efficient sorting of a list of coordinates by the second value of each coordinate, -in Assembly. Although abstract high level languages such as Python and Java allow faster and convenient development, -Assembly provides precise control over program flow, memory usage and instruction level execution, making Assembly an excellent -evironment for benchmarking and performance analysis. In this project the two algorithms quick-sort and insertion-sort -will be implemented and compared against each other. Quick-sort is expected to excel on large random datasets, -while insertion-sort is more suitable for small or nearly sorted cases. -Several test -instances consisting of uniformly random coordinates will be created with the goal of comparing the implementations -actual runtime to their theoretical runtimes and evaluate each implementation. +Sorting is one of the most fundamental operations in programming. Sorting +algorithms play a crucial role in efficient searching, organization and +interpretation of data, and optimization problems. Furthermore, sorting is often +times the first step towards designing complicated algorithms in computer +science. This project aims to study efficient sorting of a list of coordinates +by the second value of each coordinate, in Assembly. Although abstract high +level languages such as Python and Java allow faster and convenient development, +Assembly provides precise control over program flow, memory usage and +instruction level execution, making Assembly an excellent environment for +benchmarking and performance analysis. In this project the two algorithms +quick-sort and insertion-sort will be implemented and compared against each +other. Quick-sort is expected to excel on large random datasets, while +insertion-sort is more suitable for small or nearly sorted cases. Several test +instances consisting of uniformly random coordinates will be created with the +goal of comparing the implementations actual runtime to their theoretical +runtimes and evaluate each implementation. \section{Design} \subsection{How are the input file and output formatted?} -Each line contains one and only one coordinate consisting of two values formatted as \texttt{$\langle$x$\rangle$\textbackslash t$\langle$y$\rangle$\textbackslash n}, -where x and y are non-negative integers in the range 0 to 32767. Although limitting, this range is a design desicion taken in order to avoid unnecessary complexity and -keep focus on the main objective of the project, efficient sorting. Including negative numbers would for example require extra logic for parsing, comparision, converting back and forth to ASCII, -and allocating memory. Both input and output follow this strict format. +Each line contains one and only one coordinate consisting of two values +formatted as \texttt{$\langle$x$\rangle$\textbackslash +t$\langle$y$\rangle$\textbackslash n}, where $x$ and $y$ are non-negative +integers in the range 0 to 32767. Although limiting, this range is a design +decision taken in order to avoid unnecessary complexity and keep focus on the +main objective of the project, efficient sorting. Including negative numbers +would for example require extra logic for parsing, comparison, converting back +and forth to ASCII, and allocating memory. Both input and output follow this +strict format. \subsection{Program Flow} -The program is structured into four main phases: reading input, parsing data, sorting, and generating output. Each phase is -designed to handle a specific task, which creates modulariy and clarity. Whitin each phase a set of functions manage the corresponding -operations, allowing the programmer to focus on one task at a time. This design also greatly supports debugging -by simplifying tracking and solving errors. Below is a flowchart visualizing each step, followed by -a short summary of the program. +The program is structured into four main phases: reading input, parsing data, +sorting, and generating output. Each phase is designed to handle a specific +task, which creates modularity and clarity. Within each phase a set of functions +manage the corresponding operations, allowing the programmer to focus on one +task at a time. This design also greatly supports debugging by simplifying +tracking and solving errors. Below is a flowchart visualizing each step, +followed by a short summary of the program. \begin{figure}[H] \centering @@ -73,37 +84,43 @@ a short summary of the program. \label{fig:flowchart} \end{figure} -\textbf{Reading Input:} The first step of the program is to find the input file, use an open syscall, calculate the size of the file using a function -called \textit{getFileSize}, allocating memory, and saving the coordinates. +\textbf{Reading Input:} The first step of the program is to find the input file, +use an open syscall, calculate the size of the file using a function called +\textit{getFileSize}, allocating memory, and saving the coordinates. -\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that only useful data will be extracted. This process omits -the tabs, and newlines, which in turn significantly simplifies iteration through the data, by ensuring that only 8-byte coordinates are left. -Lastly an important design choice has been creating an array of pointers to each coordinatepair. -This design is explained in further detail later. All of this logic has been wrapped in a function \textit{make\_array\_from\_file}, which -utilizes several smaller helper functions to perform this parsing process. +\textbf{Parsing the data:} Next, the received file is to be parsed, meaning that +only useful data will be extracted. This process omits the tabs, and newlines, +which in turn significantly simplifies iteration through the data, by ensuring +that only 8-byte coordinates are left. Lastly an important design choice has +been creating an array of pointers to each coordinate pair. This design is +explained in further detail later. All of this logic has been wrapped in a +function \textit{make\_array\_from\_file}, which utilizes several smaller helper +functions to perform this parsing process. -\textbf{Sorting:} -After parsing, a sorting algorithm of choice can be used on the coordinates. The currently implemented sorting algorithms are -quick-sort and insertion-sort. Both algorithms have been designed to allow sorting both on the x-coordinate and the y-coordinate. This -is easily achievable because of the array of pointers. The chosen algorithm can be passed as a command line argument, otherwise -a default algorithm will be run. This design where an algorithm can be passed to the sorting function, allows the programmed to be +\textbf{Sorting:} After parsing, a sorting algorithm of choice can be used on +the coordinates. The currently implemented sorting algorithms are quick-sort and +insertion-sort. Both algorithms have been designed to allow sorting both on the +x-coordinate and the y-coordinate. This is easily achievable because of the +array of pointers. The chosen algorithm can be passed as a command line +argument, otherwise a default algorithm will be run. This design where an +algorithm can be passed to the sorting function, allows the programmed to be extended with new algorithms if needed. -\textbf{Generating output:} A side effect of the \textit{make\_array\_from\_file} is converting ASCII characters -to 8-byte integers. This allows the sorting algorithm to make numeric comparisons on the data. But this also means that the data should -be converted back to ASCII, before printing the output. +\textbf{Generating output:} A side effect of the +\textit{make\_array\_from\_file} is converting ASCII characters to 8-byte +integers. This allows the sorting algorithm to make numeric comparisons on the +data. But this also means that the data should be converted back to ASCII, +before printing the output. \subsection{Why an array of pointers?} -In order to be a litte more general, and make our program easily expandable to +In order to be a little more general, and make our program easily expandable to different types of data, we wanted to create out own array datatype, that -allowes us to sort something more abstract, with respect to a key in that -abstract object. -To do this we settled on an array of memory addresses, that point to where the -abstract objects are, see \autoref{fig:array}. These object would in practice -also be represented as -arrays. With this we can specify which index (key) in the sub-arrays we want to -sort by, and then when we swap elements in the array, we simply swap the memory -addresses around. +allows us to sort something more abstract, with respect to a key in that +abstract object. To do this we settled on an array of memory addresses, that +point to where the abstract objects are, see \autoref{fig:array}. These object +would in practice also be represented as arrays. With this we can specify which +index (key) in the sub-arrays we want to sort by, and then when we swap elements +in the array, we simply swap the memory addresses around. \begin{figure}[H] \centering @@ -146,22 +163,25 @@ addresses around. \end{figure} \section{Implementation} -This four phase design has been converted into Assembly code using the x86 64 instruction set. -Furthermore the design has been split up in modules, where each module is a collection of closely -related functions. For example \textit{Array\_maker.s} contains functions meant to parse data. -All functions in this program follow the x86 calling conventions, with a few exceptions -where all registers are preserved. Below is a detailed explanation of some of the more +This four phase design has been converted into Assembly code using the x86\_64 +instruction set. Furthermore the design has been split up in modules, where each +module is a collection of closely related functions. For example +\textit{array\_maker.s} contains functions meant to parse data. All functions in +this program follow the x86 calling conventions, with a few exceptions where all +registers are preserved. Below is a detailed explanation of some of the more interesting functions. \subsection{Insertion-sort} -This is a classic implementation of the iterative insertion-sort. The function takes three arguments: -address of the array of pointers to each coordinate, number of coordinates and which coordinate dimension to sort by. -The first step of this algorithm is to check the number of coordinates with a compare statement, if n $\le$ 1, -the array is already sorted and nothing should be done. Otherwise the algorithm initializes two indices i and j, -representing the outer and inner loop in insertion-sort. A snippet of the algorithm is seen below. +This is a classic implementation of the iterative insertion-sort. The function +takes three arguments: address of the array of pointers to each coordinate, +number of coordinates and which coordinate dimension to sort by. The first step +of this algorithm is to check the number of coordinates with a compare +statement, if n $\le$ 1, the array is already sorted and nothing should be done. +Otherwise the algorithm initializes two indices i and j, representing the outer +and inner loop in insertion-sort. A snippet of the algorithm is seen below. \begin{algorithm}[H] -\caption{Isertion-sort inner Loop} +\caption{Insertion-sort inner Loop} \SetAlgoNoLine \SetKw{KwCmp}{cmpq} \SetKw{KwJl}{jl} |