\documentclass{article}

\usepackage{geometry}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{hyperref}
\usepackage{pgfplots}
\usepackage{tikz}
\usepackage{minted}
\usepackage{booktabs}
\usepackage{siunitx}
\usepackage{algorithm2e}
\usepackage{float}
\geometry{a4paper}

\begin{document}

\begin{center}
    \Huge
    Sorting in x86\_64 Linux Assembly
    \vspace{1cm}

    \Large
    Navid S.

    \vspace{0.5cm}
    Andreas K. L.

    \vspace{0.5cm}
    Mikkel T.

    \vspace{1cm}
    \large
    October 20$^\text{th}$ 2025
\end{center}

\newpage
\tableofcontents
\newpage

\section{Introduction}
Sorting is one of the most fundamental operations in programming. Sorting
algorithms play a crucial role in efficient searching, in the organization and
interpretation of data, and in optimization problems. Furthermore, sorting is
often the first step in designing more complicated algorithms in computer
science. This project studies efficient sorting of a list of coordinates by
the second value of each coordinate, in Assembly. Although abstract high-level
languages such as Python and Java allow faster and more convenient
development, Assembly provides precise control over program flow, memory usage
and instruction-level execution, making Assembly an excellent environment for
benchmarking and performance analysis. In this project the two algorithms
quick-sort and insertion-sort are implemented and compared against each other.
Quick-sort is expected to excel on large random datasets, while insertion-sort
is more suitable for small or nearly sorted inputs. Several test instances
consisting of uniformly random coordinates are created with the goal of
comparing the implementations' actual runtimes to their theoretical runtimes
and evaluating each implementation.

\section{Design}

\subsection{How are the input file and output formatted?}
Each line contains exactly one coordinate consisting of two values formatted as
\texttt{$\langle$x$\rangle$\textbackslash{}t$\langle$y$\rangle$\textbackslash{}n},
where $x$ and $y$ are non-negative integers in the range 0 to 32767. Although
limiting, this range is a deliberate design decision made to avoid unnecessary
complexity and to keep the focus on the main objective of the project:
efficient sorting. Including negative numbers would, for example, require extra
logic for parsing, comparison, converting back and forth to ASCII, and
allocating memory. Both the input and the output follow this strict format.
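
As a hypothetical example, an input file with three coordinates could look as
follows, where \texttt{<TAB>} denotes a literal tab character:

\begin{verbatim}
42<TAB>17
305<TAB>9
12<TAB>32767
\end{verbatim}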


\subsection{Program Flow}
The program is structured into four main phases: reading input, parsing data,
sorting, and generating output. Each phase handles one specific task, which
creates modularity and clarity. Within each phase a set of functions manages
the corresponding operations, allowing the programmer to focus on one task at
a time. This design also greatly supports debugging by making errors easier to
track down and fix. Below is a flowchart visualizing each step, followed by a
short summary of the program.

\begin{figure}[H]
    \centering
    \includegraphics[scale=0.6]{flowchart.drawio.png}
    \caption{Program flow}
    \label{fig:flowchart}
\end{figure}

\textbf{Reading Input:} The first step of the program is to locate the input
file, open it with an \texttt{open} syscall, calculate the size of the file
using a function called \textit{getFileSize}, allocate memory, and read the
coordinates into that memory.
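
As a rough sketch of this phase, opening the file and determining its size
with an \texttt{lseek} syscall could look as follows; the actual
\textit{getFileSize} and the surrounding error handling may differ:

\begin{minted}{gas}
    # sketch: open the input file and find its size
    # assumes %rdi already holds the address of the file name string
    movq    $2, %rax            # syscall number for open
    xorq    %rsi, %rsi          # flags = O_RDONLY
    syscall                     # file descriptor is returned in %rax
    movq    %rax, %r12          # keep the fd in a callee-saved register

    movq    $8, %rax            # syscall number for lseek
    movq    %r12, %rdi          # fd
    xorq    %rsi, %rsi          # offset = 0
    movq    $2, %rdx            # whence = SEEK_END
    syscall                     # file size is returned in %rax

    movq    $8, %rax            # lseek back to the beginning before reading
    movq    %r12, %rdi
    xorq    %rsi, %rsi
    xorq    %rdx, %rdx          # whence = SEEK_SET
    syscall
\end{minted}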

\textbf{Parsing the data:} Next, the file contents are parsed, meaning that
only the useful data is extracted. This process omits the tabs and newlines,
which in turn significantly simplifies iteration through the data by ensuring
that only 8-byte coordinate values are left. An important design choice here
is the creation of an array of pointers to each coordinate pair; this design
is explained in further detail later. All of this logic is wrapped in the
function \textit{make\_array\_from\_file}, which uses several smaller helper
functions to perform the parsing.
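
Part of this parsing is converting the ASCII digits of each value into an
8-byte integer. A minimal sketch of such a conversion loop is shown below; the
helper functions in the actual program may be structured differently:

\begin{minted}{gas}
    # sketch: parse one decimal value from ASCII
    # assumes %rsi points at the first digit of the value
    xorq    %rax, %rax          # running result = 0
next_digit:
    movzbq  (%rsi), %rcx        # load one ASCII character
    cmpq    $48, %rcx           # below ASCII '0'?
    jl      value_done          # tab, newline or end of value
    cmpq    $57, %rcx           # above ASCII '9'?
    jg      value_done
    subq    $48, %rcx           # ASCII digit -> numeric value
    imulq   $10, %rax           # result = result * 10 + digit
    addq    %rcx, %rax
    incq    %rsi                # advance to the next character
    jmp     next_digit
value_done:
    # %rax now holds the parsed 8-byte integer, %rsi the delimiter
\end{minted}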

\textbf{Sorting:} After parsing, a sorting algorithm of choice can be applied
to the coordinates. The currently implemented sorting algorithms are
quick-sort and insertion-sort. Both algorithms have been designed to allow
sorting on either the x-coordinate or the y-coordinate, which is easy to
achieve because of the array of pointers. The desired algorithm can be passed
as a command-line argument; otherwise a default algorithm is run. This design,
where an algorithm can be passed to the sorting phase, allows the program to
be extended with new algorithms if needed.
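
One way to realize this dispatch is to load the address of the chosen sorting
function into a register and call through it. The sketch below assumes a flag
in \%r13 and the function names \textit{quick\_sort} and
\textit{insertion\_sort}; the actual mechanism may differ:

\begin{minted}{gas}
    # sketch: choose a sorting algorithm at run time
    # assumes %rdi = array of pointers, %rsi = number of coordinates,
    # %rdx = coordinate dimension to sort by, %r13 = parsed command-line
    # choice (1 = insertion-sort, anything else = default)
    leaq    quick_sort(%rip), %rax      # default algorithm
    cmpq    $1, %r13
    jne     run_sort
    leaq    insertion_sort(%rip), %rax
run_sort:
    call    *%rax                       # indirect call to the chosen sort
\end{minted}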

\textbf{Generating output:} A side effect of \textit{make\_array\_from\_file}
is that the ASCII characters are converted to 8-byte integers. This allows the
sorting algorithms to make numeric comparisons on the data, but it also means
that the data must be converted back to ASCII before the output is printed.
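
A sketch of such a conversion back to ASCII is to repeatedly divide by ten and
store the digits backwards in a buffer; the exact output routine may differ:

\begin{minted}{gas}
    # sketch: convert an 8-byte integer back to ASCII digits
    # assumes %rax holds the value and %rdi points one past the end
    # of a buffer large enough for the digits
    movq    $10, %rcx
emit_digit:
    xorq    %rdx, %rdx          # clear %rdx before the unsigned division
    divq    %rcx                # %rax = quotient, %rdx = remainder
    addq    $48, %rdx           # remainder -> ASCII digit
    decq    %rdi
    movb    %dl, (%rdi)         # store the digit in front of the others
    testq   %rax, %rax
    jnz     emit_digit          # continue until the quotient is zero
    # %rdi now points at the first ASCII digit of the number
\end{minted}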

\subsection{Why an array of pointers?}
In order to be a little more general, and to make our program easily
expandable to different types of data, we wanted to create our own array
datatype that allows us to sort something more abstract with respect to a key
in that abstract object. To do this we settled on an array of memory addresses
that point to where the abstract objects are, see \autoref{fig:array}. These
objects are in practice also represented as arrays. With this we can specify
which index (key) in the sub-arrays we want to sort by, and when we swap
elements in the array, we simply swap the memory addresses around; a short
sketch of such a swap is shown after \autoref{fig:array}.

\begin{figure}[H]
    \centering
    \begin{tikzpicture}
        \draw (0,0) rectangle (4,1); % outer box
        \node at (-0.5,0.5) {$A=$};

        % vertical dividers
        \draw (1,0) -- (1,1);
        \draw (2,0) -- (2,1);
        \draw (3,0) -- (3,1);
        \node at (0.5,0.5) {$p_1$};
        \node at (1.5,0.5) {$p_2$};
        \node at (2.5,0.5) {$\cdots$};
        \node at (3.5,0.5) {$p_n$};

        \draw (0.1,2) rectangle (0.8,5);
        \node at (0.5,2.5) {$e_{11}$};
        \node at (0.45,3.6) {$\vdots$};
        \node at (0.5,4.5) {$e_{1m}$};

        \draw (1.1,2) rectangle (1.8,5);
        \node at (1.5,2.5) {$e_{21}$};
        \node at (1.45,3.6) {$\vdots$};
        \node at (1.5,4.5) {$e_{2m}$};

        \draw (3.1,2) rectangle (3.8,5);
        \node at (3.5,2.5) {$e_{n1}$};
        \node at (3.45,3.6) {$\vdots$};
        \node at (3.5,4.5) {$e_{nm}$};

        \draw[->] (0.5,1.1) -- (0.5,1.9);
        \draw[->] (1.5,1.1) -- (1.5,1.9);
        \draw[->] (3.5,1.1) -- (3.5,1.9);
    \end{tikzpicture}
    \caption{The format of the array $A$, where $p_n$ is the $n^\text{th}$
    memory address, pointing to an array in which $e_{nm}$ is the
    $m^\text{th}$ element of the $n^\text{th}$ element in $A$.}
    \label{fig:array}
\end{figure}
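
As a sketch, swapping two elements of $A$ then only moves two 8-byte
addresses and never the coordinate data itself; the register choices here are
illustrative:

\begin{minted}{gas}
    # sketch: swap A[i] and A[j] in the array of pointers
    # assumes %rdi = address of A, %r14 = i, %r15 = j
    movq    (%rdi, %r14, 8), %r10   # pointer stored at A[i]
    movq    (%rdi, %r15, 8), %r11   # pointer stored at A[j]
    movq    %r11, (%rdi, %r14, 8)   # A[i] = old A[j]
    movq    %r10, (%rdi, %r15, 8)   # A[j] = old A[i]
\end{minted}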

\section{Implementation}
This four-phase design has been converted into Assembly code using the
x86\_64 instruction set. Furthermore, the design has been split into modules,
where each module is a collection of closely related functions; for example,
\textit{array\_maker.s} contains the functions used to parse data. All
functions in this program follow the x86\_64 calling convention, with a few
exceptions where all registers are preserved. Below is a detailed explanation
of some of the more interesting functions.

\subsection{Insertion-sort}
This is a classic implementation of iterative insertion-sort. The function
takes three arguments: the address of the array of pointers to the
coordinates, the number of coordinates, and which coordinate dimension to sort
by. The first step of the algorithm is to check the number of coordinates with
a compare instruction: if $n \le 1$, the array is already sorted and nothing
needs to be done. Otherwise the algorithm initializes two indices, $i$ and
$j$, representing the outer and inner loops of insertion-sort. A snippet of
the algorithm is shown below.

\begin{algorithm}[H]
\caption{Insertion-sort inner Loop}
\SetAlgoNoLine
\SetKw{KwCmp}{cmpq}
\SetKw{KwJl}{jl}
\SetKw{KwMov}{movq}
\SetKw{KwJge}{jge}
\SetKw{KwDec}{decq}
\SetKw{KwJmp}{jmp}

\textbf{inner\_loop:} \\
\KwCmp{$0$, \%r14} \tcp*[r]{if $j < 0$ stop and insert the key}
\KwJl insert\_key \\

\KwMov{$(\%rdi, \%r14, 8)$, \%r10} \tcp*[r]{pointer to element $j$ $\rightarrow$ r10}
\KwMov{$(\%r10, \%rdx, 8)$, \%r11} \tcp*[r]{key of element $j$ $\rightarrow$ r11}

\KwCmp{\%r11, \%r9} \\
\KwJge insert\_key \tcp*[r]{if key of $i$ $\ge$ key of $j$, stop}

\KwMov{\%r10, $8(\%rdi, \%r14, 8)$} \tcp*[r]{A[j+1] = pointer to element $j$}

\KwDec{\%r14} \tcp*[r]{move to j - 1}
\KwJmp inner\_loop \\

\textbf{insert\_key:} \tcp*[r]{Insert the key here}

\end{algorithm}

The registers R8 and R9 hold the pointer to and the key of the current element
$i$, while R10 and R11 are used for element $j$; R15 holds the index $i$
itself. Loading the value of an element takes two steps: first the pointer to
the element is loaded, and then the actual value behind that pointer. This is
a consequence of using an array of pointers for added flexibility instead of
an array of the coordinates themselves. The two \texttt{movq} instructions in
the snippet show how this is done. Each pointer is 8 bytes, so the correct
pointer is found by adding index $\cdot$ 8 to RDI, which holds the head of the
pointer array. Because each value is also 8 bytes, adding RDX $\cdot$ 8 to
that pointer, where RDX is the coordinate dimension the algorithm sorts by,
then retrieves the actual value. After loading the values, compare
instructions determine the correct placement of each coordinate. Like the
standard insertion-sort, this implementation grows a sorted part on the left
side of the array. The first compare therefore checks whether A[i] is greater
than or equal to A[j]; if so, A[i] cannot be smaller than any element at
position $j$ or before, and it is inserted at $j+1$. Otherwise the algorithm
keeps searching for the correct placement of A[i] in the sorted part by
decreasing $j$ and repeating the inner loop. If $j$ drops below 0, the element
A[i] is currently the smallest coordinate and is inserted at the front of the
array. This process continues until R15, the index $i$, is no longer smaller
than the number of coordinates.
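
For completeness, a minimal sketch of the outer loop surrounding this snippet
is shown below. It assumes that R15 holds the index $i$, that R8 and R9 hold
the pointer to and key of element $i$, and that RSI holds the number of
coordinates; the actual implementation may differ in these details:

\begin{minted}{gas}
outer_loop:
    cmpq    %rsi, %r15              # stop once i has reached n
    jge     sort_done
    movq    (%rdi, %r15, 8), %r8    # pointer to element i
    movq    (%r8, %rdx, 8), %r9     # key of element i (dimension %rdx)
    leaq    -1(%r15), %r14          # j = i - 1
    jmp     inner_loop              # inner loop as in the snippet above

insert_key:
    movq    %r8, 8(%rdi, %r14, 8)   # A[j+1] = pointer to element i
    incq    %r15                    # advance i
    jmp     outer_loop
sort_done:
\end{minted}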

\section{Evaluation}
The evaluation consists of the following steps:
\begin{itemize}
    \item Test insertion-sort
    \item Test quick-sort
    \item Benchmark insertion-sort vs. quick-sort on random data
    \item Benchmark insertion-sort vs. quick-sort on sorted data
    \item Benchmark insertion-sort vs. quick-sort on reversely sorted data
\end{itemize}

To evaluate whether our program works, we created three \textit{shell
scripts}: one for generating test data (\textit{generate\_test\_data.sh}), one
for testing the validity of the program's output (\textit{test.sh}), and one
for measuring the execution time of the program (\textit{benchmark.sh}).

The script that generates test files creates files of size $n$ for all $n \in
\{10000 \cdot i \mid 0 \le i \le 10\}$ (a size of 10 means that the file
consists of 10 coordinates). For each $n$ we also create three different kinds
of test files: randomly ordered, reversely sorted, and already sorted. We can
then use our \textit{test.sh} script, which runs the program on each of the
generated test files and compares the output with the output of the standard
\textit{sort} utility.

Given that all the tests pass, we can then run \textit{benchmark.sh}, which
times how long our program takes to run on each test file and saves the
results in a \textit{csv} file.
\begin{figure}[H]
    \centering
    \begin{tikzpicture}
      \begin{axis}[
          xlabel={size},
          ylabel={seconds},
          % xmode=log,
          % ymode=log,
          legend pos=north west,
          scatter/classes={
            random={mark=*,red},
            reverse={mark=*,green},
            sorted={mark=*,blue}
          }
        ]

        \addplot[
          scatter, only marks,
          scatter src=explicit symbolic
        ]
        table [x=size, y=real, meta=type, col sep=comma]
        {../test/benchmark_results.csv};
        \legend{random, reverse, sorted}

        \addplot[domain=100:100000, samples=10, thick, densely dotted, black]{x/250000};
          \addlegendentry{$\mathcal O(\text{size})$}

        \addplot[domain=100:100000, samples=10, thick, dashed, black]{(x*x)/4000000000};

          \addlegendentry{$\mathcal O(\text{size}^2)$}
      \end{axis}
    \end{tikzpicture}
    \caption{Benchmarking of Insertion-sort with random data, reversely sorted
    data, and sorted data.}
\end{figure}

\section{Conclusion}
    
\end{document}