Chapter 4

Complete Precomputation

One approach to achieving fast query times is complete precomputation of all possible query results. With this method, answers to valid queries can be retrieved efficiently through simple lookups.

Algorithm Description

A significant advantage of this approach is the inherent parallelism in the setup process. Specifically, we can execute a forward Dijkstra search from each listing node in parallel, storing the shortest path distances to all other nodes in a lookup table of size $\mathcal{O}(L \cdot V)$, where $L$ denotes the number of listings and $V$ is the total number of vertices in the graph.

The setup phase of the FullPrecompute approach computes the shortest path distance from each listing to all reachable vertices in the graph. These distances are stored in a lookup table. Once the table is constructed, it is sorted in ascending order of the shortest path distances.
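The setup phase can be sketched as follows. This is a minimal illustration, assuming the graph is given as an adjacency list mapping each vertex to `(neighbor, travel_time)` pairs; the function and variable names are hypothetical, not taken from the actual implementation.

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from `source` to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def full_precompute(graph, listings):
    """Build the lookup table: for every vertex, a list of
    (travel_time, listing) pairs sorted in ascending order."""
    table = {}
    for listing in listings:
        for vertex, d in dijkstra(graph, listing).items():
            table.setdefault(vertex, []).append((d, listing))
    for entries in table.values():
        entries.sort()  # ascending travel time enables early exit at query time
    return table
```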

[Figure: an example road network with vertices A–E, where B and E are listings and the remaining vertices are road intersections; edge weights denote travel times.]

At query time, we scan the lookup table in ascending order of travel time and stop as soon as an entry exceeds the maximum allowed travel time.

Precomputed Results

Vertex | Listing B | Listing E
A      |     5     |     6
B      |     0     |     1
C      |     6     |     7
D      |     8     |     9
E      |     9     |     0
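The query step can be sketched as follows, assuming the table maps each vertex to a list of `(travel_time, listing)` pairs sorted in ascending order (the names here are hypothetical):

```python
def query(table, vertex, max_travel_time):
    """Return all listings reachable from `vertex` within `max_travel_time`.
    Output-sensitive: the scan stops at the first entry that exceeds
    the limit, because the entries are sorted in ascending order."""
    results = []
    for travel_time, listing in table.get(vertex, ()):
        if travel_time > max_travel_time:
            break  # all remaining entries are farther away
        results.append((listing, travel_time))
    return results
```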

Time Complexity

The setup phase requires executing a shortest path algorithm from each listing node. Assuming Dijkstra's algorithm is used with a binary heap, the time complexity for one listing is $\mathcal{O}((V + E) \log V)$, where $V$ is the number of vertices and $E$ is the number of edges. Sorting the results table takes $\mathcal{O}(L \log L)$. For $L$ listings, the total setup time is $\mathcal{O}(L \cdot (V + E) \log V)$ (recall that $L \subseteq V$, so $L \leq V$). Note that this step is highly parallelizable, and its wall-clock time can be reduced by a factor of $\frac{1}{\text{num\_threads}}$.
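Because each listing's search is independent, the setup lends itself to a worker pool. The sketch below is a hypothetical illustration using Python's `concurrent.futures`; note that in CPython, process-based parallelism would be needed to actually speed up this CPU-bound workload, since threads share the GIL.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def dijkstra(graph, source):
    """Shortest travel times from `source` to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def parallel_setup(graph, listings, num_threads=4):
    """Run one Dijkstra search per listing, distributed over a pool."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(lambda src: dijkstra(graph, src), listings)
    return dict(zip(listings, results))
```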

The query phase is output-sensitive, meaning the time complexity depends on the number of listings returned. This is optimal: an algorithm must at the very least output the result it computes. If $R$ denotes the result set, the time complexity is $\mathcal{O}(|R|)$.

Space Complexity

The primary space cost arises from storing the precomputed distances. Each listing requires a table entry for every vertex in the graph, resulting in a total space complexity of $\mathcal{O}(L \cdot V)$. In addition to this, the graph itself must also be stored in memory, which requires $\mathcal{O}(V + E)$ space. Therefore, the overall space requirement is $\mathcal{O}(L \cdot V + E)$.

To illustrate this with a real-world example, consider New York City, where there are approximately 17,000 listings. The graph representing the city occupies approximately 3.6 MB of memory. Storing the shortest path information for each listing yields a total memory requirement of $3.6\,\text{MB} \times 17{,}000 \approx 61{,}200\,\text{MB} \approx 60\,\text{GB}$.
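As a quick sanity check on this estimate (the 3.6 MB per-listing figure is taken from the text above, approximating each listing's distance table by the size of the graph):

```python
per_listing_mb = 3.6   # approximate size of one listing's distance table
num_listings = 17_000  # approximate number of NYC listings

total_mb = per_listing_mb * num_listings
total_gb = total_mb / 1024

print(f"{total_mb:,.0f} MB ≈ {total_gb:.0f} GB")  # → 61,200 MB ≈ 60 GB
```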

Empirical Performance

In practice, the full precomputation setup takes approximately five minutes to complete on our dataset. For benchmarking purposes, this process was repeated 100 times to compute confidence intervals. FullPrecompute was likely throttled during testing, leading to a total benchmark runtime of approximately 24 hours, primarily due to the setup overhead.

Average Setup Time (NYC), mean with confidence interval

Dijkstra:        1.4 ms   [1.4 ms, 1.4 ms]
Dial:            1.3 ms   [1.3 ms, 1.3 ms]
FullPrecompute:  15.3 min [14.8 min, 15.8 min]

Despite its long setup time, FullPrecompute offers unparalleled query performance. In our experiments, it was approximately 1,000 times faster than the Dial algorithm at query time.

Average Query Time (NYC), mean with confidence interval

Dijkstra:        31.2 ms  [31.0 ms, 31.4 ms]
Dial:            12.3 ms  [12.2 ms, 12.4 ms]
FullPrecompute:  11.9 µs  [11.6 µs, 12.3 µs]

Whether such query performance is necessary depends on the application context. In scenarios where queries are invoked frequently and response time is critical, this approach may be justified. However, in our use case, where query responses are transmitted over a network, the benefit of such speed is diminished due to the dominance of network latency in total response time.

Consequently, we seek alternative methods that offer a better balance between setup time, memory usage, and query performance. The next chapter introduces such a method.