There are three kinds of efficiency you need to consider:
- Memory efficiency; i.e. minimizing RAM utilization
- CPU efficiency
- Programmer efficiency; i.e. minimizing the amount of your valuable time spent writing the code, writing test cases, debugging, and maintaining it.
Note that the above criteria work against each other.
Memory Efficiency
The memory size in bytes of an array of N references in Java is given by
N * reference_size + array_header_size + padding
where:
- reference_size is the size of a reference in bytes (typically 4 or 8),
- array_header_size is typically 12 bytes, and
- padding is greater than or equal to zero, and less than the heap node size granularity.
The array itself also has a unique reference which must be held in memory somewhere.
So, if you split a large array into M smaller arrays, you will be using at least (M - 1) * 16 extra bytes of RAM, and possibly more. On the other hand, we are talking about bytes here, not kilobytes or megabytes, so this is hardly significant.
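To put rough numbers on this, here is a minimal sketch that applies the formula above. The class and method names are made up for illustration, and the parameter values (4-byte references, a 12-byte header, 8-byte granularity) are assumptions taken from the "typical" figures quoted above; real values depend on the JVM and its settings.

```java
final class ArrayFootprint {
    /** Rounds size up to the next multiple of alignment (alignment must be positive). */
    static long roundUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    /** Estimated bytes for one array of n references: n * reference_size + header + padding. */
    static long referenceArrayBytes(long n, long referenceSize, long headerSize, long alignment) {
        return roundUp(n * referenceSize + headerSize, alignment);
    }

    /**
     * Estimated bytes when n references are split over m equally sized arrays,
     * plus one outer array holding the m references to those arrays.
     * If m does not divide n evenly, every chunk is counted at the rounded-up size.
     */
    static long splitArrayBytes(long n, long m, long referenceSize, long headerSize, long alignment) {
        long perChunk = (n + m - 1) / m; // ceiling division
        long inner = m * referenceArrayBytes(perChunk, referenceSize, headerSize, alignment);
        long outer = referenceArrayBytes(m, referenceSize, headerSize, alignment);
        return inner + outer;
    }

    public static void main(String[] args) {
        long single = referenceArrayBytes(1000, 4, 12, 8);
        long split = splitArrayBytes(1000, 10, 4, 12, 8);
        System.out.println("single array: " + single + " bytes");
        System.out.println("10 chunks:    " + split + " bytes");
        System.out.println("overhead:     " + (split - single) + " bytes");
    }
}
```

Under those assumptions, 1000 references come to 4016 bytes in one array and 4216 bytes when split into 10 chunks, a difference of 200 bytes.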
CPU Efficiency
This is harder to predict. The effect on CPU utilization will depend largely on what you do with the arrays, and how you do it.
If you are simply subscripting (indexing) an array, that operation doesn’t depend on the array size. But if you have multiple arrays (e.g. an array of arrays), then there will be additional overhead in determining which array to subscript.
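As an illustration of that extra overhead, here is a small sketch comparing a lookup in one flat array with a lookup in fixed-size chunks. The names (ChunkedIndexing, getChunked, split) and the fixed chunk size are assumptions for the example, not something from the question.

```java
import java.util.Arrays;

final class ChunkedIndexing {
    /** Direct lookup in one flat array: a single array access. */
    static String getFlat(String[] flat, int index) {
        return flat[index];
    }

    /** Lookup in fixed-size chunks: a division, a remainder, and two array accesses. */
    static String getChunked(String[][] chunks, int chunkSize, int index) {
        return chunks[index / chunkSize][index % chunkSize];
    }

    /** Splits a flat array into chunks of chunkSize; the last chunk may be shorter. */
    static String[][] split(String[] flat, int chunkSize) {
        int chunkCount = (flat.length + chunkSize - 1) / chunkSize;
        String[][] chunks = new String[chunkCount][];
        for (int c = 0; c < chunkCount; c++) {
            int from = c * chunkSize;
            int to = Math.min(from + chunkSize, flat.length);
            chunks[c] = Arrays.copyOfRange(flat, from, to);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] flat = new String[10];
        for (int i = 0; i < flat.length; i++) {
            flat[i] = "Q" + i;
        }
        String[][] chunks = split(flat, 4);
        System.out.println(getFlat(flat, 9) + " == " + getChunked(chunks, 4, 9));
    }
}
```

Both lookups take constant time; the chunked version just does a little more work per access.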
If you are searching for something in an array, then the larger the array you have to search, the longer it will take (on average). But splitting a large array into smaller arrays doesn’t necessarily help … unless you know beforehand which of the smaller arrays to search.
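A rough way to see this is to count the elements a linear search has to examine. The sketch below uses hypothetical names and assumes an unsorted int array searched front to back.

```java
final class ChunkedSearch {
    /** Linear search over one array; returns how many elements were examined. */
    static int comparisonsFlat(int[] data, int target) {
        int count = 0;
        for (int value : data) {
            count++;
            if (value == target) {
                break;
            }
        }
        return count;
    }

    /** Linear search over every chunk in order; returns how many elements were examined. */
    static int comparisonsAllChunks(int[][] chunks, int target) {
        int count = 0;
        for (int[] chunk : chunks) {
            for (int value : chunk) {
                count++;
                if (value == target) {
                    return count;
                }
            }
        }
        return count;
    }

    /** Linear search over a single chunk that the caller already knows is the right one. */
    static int comparisonsKnownChunk(int[][] chunks, int chunkIndex, int target) {
        return comparisonsFlat(chunks[chunkIndex], target);
    }

    public static void main(String[] args) {
        int[] flat = new int[1000];
        int[][] chunks = new int[10][100];
        for (int i = 0; i < flat.length; i++) {
            flat[i] = i;
            chunks[i / 100][i % 100] = i;
        }
        System.out.println("flat:        " + comparisonsFlat(flat, 950));
        System.out.println("all chunks:  " + comparisonsAllChunks(chunks, 950));
        System.out.println("known chunk: " + comparisonsKnownChunk(chunks, 9, 950));
    }
}
```

With the values 0 to 999 and a target of 950, both the flat search and the search over all chunks examine 951 elements, while the search restricted to the known chunk examines only 51.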
Programmer Efficiency
It will probably make your code more complicated if you use multiple arrays rather than one. More complicated code means more programmer effort in all phases of the application’s development and maintenance lifecycle. It is hard to quantify how much extra effort is involved. However, programmer effort means cost (paying for salaries) and time (deadlines, time to market, etc.), and this is likely to outweigh any small savings in memory and CPU.
Scalability
You said:
Some people are saying 1000 entries aren’t too much, so I think I’ll stick to a big array. What would be too big? 10k, 100k?
Once again, it depends on the context. In reality, the memory used for an array of 100K instances of X depends largely on the average size of X. You will most likely run out of memory for the X instances themselves long before the array that refers to them becomes a problem.
So, if you want your application to scale up indefinitely, you should probably change the architecture so that it fetches the questions / answers from the database on demand rather than loading them all into memory on start up.
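Here is a minimal sketch of what fetching on demand could look like. Everything in it is hypothetical: the OnDemandQuestions class, the integer question IDs, and the loader function, which stands in for whatever database query you would really use. It keeps only a bounded number of recently used entries in memory by using LinkedHashMap in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class OnDemandQuestions {
    private final IntFunction<String> loader;   // stand-in for a database lookup by ID
    private final Map<Integer, String> cache;   // least-recently-used entries are evicted first

    OnDemandQuestions(IntFunction<String> loader, final int maxCached) {
        this.loader = loader;
        // accessOrder = true makes iteration order run from least to most recently used.
        this.cache = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > maxCached;
            }
        };
    }

    /** Returns the question text, loading it only if it is not already cached. */
    String get(int id) {
        String question = cache.get(id);
        if (question == null) {
            question = loader.apply(id);
            cache.put(id, question);
        }
        return question;
    }

    int cachedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        int[] loads = {0};
        OnDemandQuestions questions = new OnDemandQuestions(id -> {
            loads[0]++;                // a real loader would query the database here
            return "Question #" + id;
        }, 2);
        questions.get(1);
        questions.get(1);              // served from the cache
        questions.get(2);
        questions.get(3);              // evicts question 1
        System.out.println("loads: " + loads[0] + ", cached: " + questions.cachedCount());
    }
}
```

The memory used is then bounded by the cache size rather than by the number of rows in the database, at the cost of a query on each cache miss. This sketch is not thread-safe.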
Premature Optimization
Donald Knuth is often (mis-)quoted [1] as saying:
“Premature optimization is the root of all evil.”
What he is pointing out is that programmers are inclined to optimize things that don’t really need optimizing, or spend their effort optimizing the wrong areas of their code based on incorrect intuitions.
My advice on this is the following:
Don’t do fine-grained optimization too early. (This doesn’t mean that you should ignore efficiency concerns in the design and coding stages, but my advice would be to consider only the major issues; e.g. complexity of algorithms, granularity of APIs and database queries, and so on. Focus especially on things that would take a lot of effort to fix later.)
If and when you do your optimization, do it scientifically:
- Use a benchmark to measure performance.
- Use a profiler to find performance hotspots and focus your efforts on those.
- Use the benchmark to see if the optimization has improved things, and abandon optimizations that don’t help.
Set some realistic goals (or time limits) for your optimization and stop when you reach them.
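For the "measure first" step, a crude timing harness can be as small as the sketch below. The class and method names are made up, and this is only an assumption about what a first measurement might look like; hand-rolled timing on the JVM is easily distorted by JIT compilation and garbage collection, so a dedicated benchmarking harness and a real profiler are better choices for serious work.

```java
final class TinyBenchmark {
    /**
     * Runs the task warmupRuns times without timing it (to give the JIT a chance
     * to compile it), then measuredRuns times, and returns the best time in nanoseconds.
     */
    static long bestNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long[] sink = new long[1]; // keeps the result alive so the loop is not optimized away
        long nanos = bestNanos(() -> {
            long sum = 0;
            for (int value : data) {
                sum += value;
            }
            sink[0] = sum;
        }, 5, 20);
        System.out.println("best of 20 runs: " + nanos + " ns (sum = " + sink[0] + ")");
    }
}
```

Run the same measurement before and after each change, and keep the change only if the numbers actually improve.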
[1] The full quotation is more nuanced. Look it up. And in fact, Knuth is himself quoting Tony Hoare. For a deeper exploration of this, see https://ubiquity.acm.org/article.cfm?id=1513451