Claude can ace my exams

Alright, buckle up:

The task

Take a minute to think about this. If you’re wondering: Ulf’s is a cafe next to our Uni and they have really cute dogs.

Middle aged man and woman sitting next to a dog — The cafe owners and the dog, source: dagny pogorzalek on google maps

The optimal algo for distributing the hats is pretty obvious, but the difficult part is formally proving why it is exactly correct. Also, the optimization in task c) requires a nasty trick.

Claude’s solution

(a) Formalization
Input: An array $B$ of length $n$ over $\mathbb{N}$, where $B[i]$ is the height of person $i$; as well as an array $H$ of length $m$ over $\mathbb{N}$ with $m < n$, where $H[j]$ is the size of hat $j$. All values in $B$ and $H$ are pairwise distinct.
Output: An injective function $A \colon {0, \ldots, m-1} \to {0, \ldots, n-1}$ that assigns hat $j$ to person $A[j]$.
The total height of person $i$ under assignment $A$ is defined as
$$T_A(i) = \begin{cases} B[i] + H[j] & \text{if } \exists j \colon A[j] = i, \\ B[i] & \text{otherwise.} \end{cases}$$
An assignment $A$ is valid if it is injective (each person receives at most one hat, each hat is assigned exactly once).
A valid assignment $A$ is correct if for all valid $A’$:
$$\max_i\, T_A(i) - \min_i\, T_A(i) \;\leq\; \max_i\, T_{A'}(i) - \min_i\, T_{A'}(i).$$
(b) Algorithm in $\mathcal{O}(n \log n)$
Construction. The algorithm works as follows:
Sort $B$ in ascending order; let the sorted sequence be $b_1 \leq b_2 \leq \cdots \leq b_n$.
Sort $H$ in descending order; let the sorted sequence be $h_1 \geq h_2 \geq \cdots \geq h_m$.
Assign hat $h_i$ to person $i$ (in sorted order) for $i = 1, \ldots, m$. Persons $i = m+1, \ldots, n$ receive no hat.
Validity. Each of the $m$ hats is assigned to exactly one person, and each person receives at most one hat. The assignment is thus injective and valid.
Correctness. We show by an exchange argument that the output is correct. Let $A^*$ be an optimal assignment. We transform $A^*$ step by step into the above solution without increasing the spread.
Step 1: The $m$ smallest persons receive hats.
Suppose $A^*$ assigns a hat $h$ to person $j$ with height $b_j$, while person $i$ with $b_i < b_j$ has no hat. We consider the swap in which person $i$ receives hat $h$ and person $j$ receives no hat.
Before the swap, these two persons contribute total heights $b_i$ and $b_j + h$. After the swap, they contribute $b_i + h$ and $b_j$.
The overall maximum cannot increase: the largest of the two old values is $b_j + h$, and both new values satisfy $b_i + h < b_j + h$ and $b_j < b_j + h$.
The overall minimum cannot decrease: the smallest of the two old values is $b_i$, and both new values satisfy $b_i + h > b_i$ and $b_j > b_i$.
Since the maximum does not increase and the minimum does not decrease, the spread does not grow through this swap. By repeatedly applying such swaps, $A^*$ can be transformed into a solution in which the $m$ smallest persons wear the hats.
Step 2: Largest hat to smallest hat-wearer.
Now fix the $m$ hat-wearers $b_1 \leq \cdots \leq b_m$. Suppose two hat-wearers with heights $b_i < b_j$ wear hats $h_a < h_b$ (smaller hat on shorter person). We swap the hats: person $i$ receives $h_b$, person $j$ receives $h_a$.
The new maximum of these two values is $\max(b_i + h_b,; b_j + h_a)$. Since $b_i + h_b < b_j + h_b$ and $b_j + h_a < b_j + h_b$, we have $\max(b_i + h_b, b_j + h_a) < b_j + h_b$. The overall maximum cannot increase.
The new minimum of these two values is $\min(b_i + h_b,; b_j + h_a)$. Since $b_i + h_b > b_i + h_a$ (because $h_b > h_a$) and $b_j + h_a > b_i + h_a$ (because $b_j > b_i$), the new minimum is $> b_i + h_a$. The overall minimum cannot decrease.
Thus the spread does not increase. By repeated application, we arrive at exactly the assignment described in step 3. Since $A^*$ is optimal, the output of the algorithm is also optimal.
Runtime analysis. Sorting $B$ takes $\mathcal{O}(n \log n)$, sorting $H$ takes $\mathcal{O}(m \log m) \leq \mathcal{O}(n \log n)$, the assignment step takes $\mathcal{O}(m)$. The total runtime is $\mathcal{O}(n \log n)$.
(c) Improvement to $\mathcal{O}(n + m \log m)$
The runtime of $\mathcal{O}(n \log n)$ arises from fully sorting $B$. However, all that is needed is to know which $m$ persons have the smallest heights, and those $m$ persons must be sorted among themselves.
One first determines the $m$-th minimum of $B$ using a linear-time selection algorithm (e.g. median of medians) in $\mathcal{O}(n)$. Then $B$ is partitioned in $\mathcal{O}(n)$ into the $m$ smallest entries and the rest. The $m$ smallest heights are sorted in $\mathcal{O}(m \log m)$, and the hats are also sorted in $\mathcal{O}(m \log m)$. The remainder of the algorithm runs as in (b) in $\mathcal{O}(m)$. The total runtime is therefore $\mathcal{O}(n + m \log m)$.

The intended solution

So now let’s look at what our professor wrote (courtesy of Georg Tennigkeit).

a)
We model the missing hats as hats of size 0. A person without a hat thus corresponds to a person with a hat of size 0.
Input: We receive two 1-indexed arrays $P$ and $H$, both over $\mathbb{N}$ and of length $n$. $P$ contains the person heights in millimeters, and $H$ contains the hat sizes in millimeters, where exactly $n - m$ entries are 0.
Output: We output a list $Z$ of $n$ pairs of natural numbers, i.e. $Z = {(p_1, h_1), (p_2, h_2), \ldots (p_n, h_n)}$. Each pair corresponds to the height of a person and the size of the hat we assign to that person.
$Z = {(p_1, h_1), (p_2, h_2), \ldots (p_n, h_n)}$ is valid if $(p_1, p_2, \ldots p_n)$ is a permutation of $P$ and $(h_1, h_2, \ldots h_n)$ is a permutation of $H$. The size difference of a solution is $d(Z) = \max{ p_i + h_i \mid (p_i, h_i) \in Z } - \min{ p_i + h_i \mid (p_i, h_i) \in Z }$. $Z$ is correct if it is valid and for all valid $Z’$: $d(Z) \leq d(Z’)$.
b)
We want to assign the $i$-th largest hat to the $i$-th smallest person. To do this, we sort $P$ in ascending order and $H$ in descending order and then insert the pair $(P[i], H[i])$ into the output for each $1 \leq i \leq n$.
$Z$ is valid, since the algorithm inserts each element of $P$ and $H$ exactly once into $Z$.
We prove optimality via an exchange argument. Let $Z$ be our solution and $Z^*$ an optimal solution. We consider both solutions sorted in ascending order by the person height of each tuple. By the algorithm, $Z$ is thus also sorted in descending order by hat size. If $Z^*$ is also sorted in descending order by hat size, the solutions are already identical and $Z$ is also optimal. Otherwise there exist $(p_i, h_i)$ and $(p_j, h_j)$ in $Z^*$ such that $p_i < p_j$ and $h_i < h_j$. We note that among these two tuples, $(p_i, h_i)$ has the smaller total height $p_i + h_i$, and $(p_j, h_j)$ has the larger total height $p_j + h_j$. If we now replace these tuples with $(p_i, h_j)$ and $(p_j, h_i)$, it follows from the previous inequalities that $p_i + h_j \geq p_i + h_i$ and $p_j + h_i \geq p_i + h_i$, so the new smallest total height is greater than or equal to the previously smallest total height. Furthermore $p_i + h_j \leq p_j + h_j$ and $p_j + h_i < p_j + h_j$, so the new largest total height is less than or equal to the previously largest total height. Since we have not increased the minimum or decreased the maximum among these two tuples, and all other tuples remain unchanged, $d(Z^*)$ cannot have increased through this swap.
In this way, every inversion can be resolved until $Z^*$ is sorted by hat sizes in descending order, without worsening the value of the solution. In the end, $Z^* = Z$, which means $Z$ is optimal.
Sorting $P$ and $H$ takes $\mathcal{O}(n \log n)$, constructing the solution takes $\mathcal{O}(n)$, so the total runtime is $\mathcal{O}(n \log n)$.
c)
We do not need to sort all person heights: since we know that the $n - m$ smallest hat sizes are all 0 and these are assigned to the $n - m$ tallest persons, we do not need to sort those $n - m$ tallest persons. Instead, we can use Quickselect to find the $m$-th rank height in $P$, then remove all taller persons from $P$ and match them directly with the 0-hats. Afterwards, $m$ persons and $m$ hats remain, which we distribute using the previous algorithm.
The runtime is then $\mathcal{O}(n)$ for Quickselect and filtering, and $\mathcal{O}(m \log m)$ for the remaining algorithm, giving a total of $\mathcal{O}(n + m \log m)$.

Verdict

Full points. Claude wrote a clean proof, pretty much exactly mirroring the professor’s intended solution. It also got the median of medians optimization right. The only complaint would be that it was actually too verbose. Also, it didn’t use the simplification of 0-height hats, making its proof a bit more convoluted.

Back to the article