You are correct that
- sequentially scanning a linked list or an array takes time O(n), and that
- it’s much faster to scan an array than a linked list due to locality of reference.
How can both be true? The discrepancy comes down to what you're counting with that O(n). Typically, when analyzing an algorithm, we assume that looking up a location in memory takes time O(1); since in both cases you perform O(n) memory lookups, the total time is O(n).
However, the assumption that all memory lookups cost the same amount of time is not a very good one in practice, especially with linked structures. For this reason, you sometimes see the analysis done a different way. Assume that, somewhere in the machine, there's a cache, and that memory is divided into blocks of B consecutive elements. Every time we do a memory lookup, if the element is in cache, the lookup is (essentially) free; if it's not, we do some work to load its entire block – the contents of that address plus the surrounding memory – into the cache. We then count only how many times something has to be loaded into the cache, since that more accurately predicts the runtime.
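To make this model concrete, here's a toy simulator (an illustration of the counting scheme, not a real CPU cache – the LRU eviction policy and the parameter values are assumptions):

```python
from collections import OrderedDict

def count_transfers(addresses, B, cache_blocks=8):
    """Count block loads for a sequence of memory addresses.

    B is the number of elements per cache block; cache_blocks is how
    many blocks fit in cache at once (evicted LRU here, purely for
    illustration). A hit costs nothing; a miss costs one transfer.
    """
    cache = OrderedDict()              # block id -> present
    transfers = 0
    for addr in addresses:
        block = addr // B              # which block this address falls in
        if block in cache:
            cache.move_to_end(block)   # hit: mark block as recently used
        else:
            transfers += 1             # miss: load the whole block
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return transfers
```

For example, touching 64 consecutive addresses with B = 8 costs only 8 transfers, while touching 64 widely scattered addresses costs 64.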
In the case of a linked list, whose cells can be scattered throughout memory, we'd expect Θ(n) memory transfers when scanning, since we essentially never expect a cell to already be in cache. With an array, however, every time we find an element not in cache, we pull in the whole block around it, which means the next B - 1 elements are then guaranteed to be in cache. Only about a 1/B fraction of lookups triggers a cache miss, so we expect Θ(n / B) memory transfers. This matches theoretically what we see empirically – it's much faster to sequentially scan an array than a linked list.
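A quick back-of-the-envelope check of those two bounds (self-contained sketch; it assumes the cache is large enough that nothing is evicted during a single scan, so the transfer count is just the number of distinct blocks touched, and the shuffled addresses stand in for scattered linked-list cells):

```python
import random

B = 8          # elements per cache block (an assumed value)
n = 10_000

# Array scan: elements live at consecutive addresses.
array_addrs = range(n)

# Linked-list scan: cells scattered across memory, simulated by
# sampling n distinct addresses from a much larger address space.
random.seed(0)
list_addrs = random.sample(range(100 * n), n)

def transfers(addrs, B):
    # With no evictions, each distinct block is loaded exactly once.
    return len({a // B for a in addrs})

print(transfers(array_addrs, B))   # exactly n / B = 1250
print(transfers(list_addrs, B))    # close to n: most cells sit in their own block
```

The array scan costs exactly n / B transfers, while the scattered scan costs nearly one transfer per element, mirroring the Θ(n / B) versus Θ(n) analysis above.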
So to summarize, this is really a question of what you're measuring and how you measure it. If you count memory accesses, both scans require O(n) accesses and thus take O(n) time. If instead you count cache misses, a sequential scan of a dynamic array requires Θ(n / B) transfers while a linked list scan requires Θ(n) transfers, which is why the linked list is slower in practice.
This sort of analysis is often used when designing data structures for databases. B-trees (and their relatives, B+-trees), which are used extensively in databases and file systems, are specifically tuned around the size of a disk page to reduce the number of memory transfers. More recent work has been done to design "cache-oblivious" data structures that make asymptotically optimal use of the cache without even knowing its size.
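To see what "tuned around the size of a disk page" means in practice, here is a hypothetical back-of-the-envelope sizing of a B-tree node (the page, key, and pointer sizes below are assumed values, not measurements of any particular system):

```python
PAGE_SIZE = 4096   # bytes per disk page (a common default)
KEY_SIZE = 8       # bytes per key (assumed)
PTR_SIZE = 8       # bytes per child pointer (assumed)

# A node with k keys stores k keys and k + 1 child pointers, and we
# want the whole node to fit in one page:
#   k * KEY_SIZE + (k + 1) * PTR_SIZE <= PAGE_SIZE
max_keys = (PAGE_SIZE - PTR_SIZE) // (KEY_SIZE + PTR_SIZE)
print(max_keys)    # 255 keys, so each node fans out to ~256 children
```

With a fan-out of roughly 256, each node read is one page transfer, and even a tree over billions of keys is only a handful of levels deep, which is exactly the point of matching B to the page size.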