
The Factorio Architecture
Part 2

Making the equations

TL;DR

Now that we have learned about ways math could be used to make better software, it is time to actually start making the equations so we can do the math!


Introduction

In part 1 of the Factorio Architecture I went over how Factorio relates to software and, since it is a game about scaling factories, how it also relates to the foundations of chemical engineering, specifically unit operations. Now that we have covered how unit operations relate to software, I can go over how to use them to our advantage and derive equations that predict the performance of our code.

All the code used in this blog post can be found on GitHub.

Estimating Performance

Since there are no published equations that will help predict the performance of unit operations, we will need to derive them. As I mentioned in part 1, if this way of thinking becomes mainstream we, as an industry, will have plenty of published equations at our disposal, so deriving them would only be necessary in unique circumstances.

In order to determine the equations for our unit operations, we first need to come up with experiments that generate reproducible results. Designing experiments is always a challenging task in science, but it is the only way to produce reliable equations.

I have done my best, but performance testing is outside my area of expertise, so it is very likely there are errors in the way I collected data. Please treat the results as illustrative.

Once we conduct our experiments we will simply use trend lines to create equations that describe the performance of our program. While it might be more satisfying to have elegant equations like V = IR, F = ma, or PV = nRT, you have to keep in mind that generalized equations often take decades or even centuries to derive. The ideal gas law, for example, comprises the work of several scientists over a 200-year period. Our industry might end up being luckier and advance faster than other engineering disciplines, but in the meantime we can still use simple empirical results to make equations and know that it is both helpful to us now and helpful in creating more generalized equations in the future.

Something to keep in mind is that our equations do not need to work for all conditions; restricting an equation's range of validity is perfectly acceptable in science. Going back to the ideal gas law, the equation only works on, you guessed it, ideal gases. There are other equations, such as the Van der Waals equation, which can describe a “real” gas. The ideal gas law itself is an empirical equation, meaning an equation based on applying trend lines to experimental data. Since the ideal gas law is empirical, it also has a “magic number”, the gas constant, which shows that not every variable in an equation needs to make sense from a theoretical perspective. Finally, the ideal gas law, while useful, is still just an estimate. Predicting the exact conditions of a gas is not only difficult but, in most cases, completely unnecessary, since the error is small enough not to make a difference in your approach. The only real requirement for a good equation is that it reasonably predicts an outcome based on the inputs you provide.

Hopefully academic researchers will be able to publish results so that for most unit operations you will not need to conduct the experiments yourself.

Methodology

The Algorithm

I wrote code in Go, C, C#, Java, JavaScript, and Python to do the following:

  1. Read a CSV file with fake people’s names and ages (I/O)
  2. Validate each row’s data (Validate)
  3. Perform quick sort on all people by age (Sort)
  4. Print the people to the console (I/O)
  5. Free all the used memory (I/O)

You can see the algorithm visually here. I did my best to keep the code as identical as possible across languages to keep the results fair. As you look at these results your takeaway should not be something like “Go is so much faster than Python! You would have to be an idiot to choose Python!”. You should instead adopt an engineering mindset: “When quick sorting under 1 megabyte of data, Python only takes 10,000 more cycles than Go, which is an acceptable performance hit for our use case. If we were to exceed 1 gigabyte of data, then the cycle difference is great enough that we would need to choose a different language in order to still run on the same processor.”
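For reference, the shape of the pipeline can be sketched in a few lines. This is only an illustrative Python sketch, not the benchmarked code from the repository; the file name, column layout, and validation rules here are my own assumptions.

```python
import csv

def read_people(path):
    # I/O: read rows of (name, age) from a CSV file
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

def validate(rows):
    # Validate: every row needs a non-empty name and a non-negative integer age
    for name, age in rows:
        if not name or not age.lstrip("-").isdigit() or int(age) < 0:
            raise ValueError(f"bad row: {name!r}, {age!r}")
    return [(name, int(age)) for name, age in rows]

def quick_sort(people):
    # Sort: quick sort by age (first element as pivot, not in place)
    if len(people) <= 1:
        return people
    pivot, rest = people[0], people[1:]
    left = [p for p in rest if p[1] < pivot[1]]
    right = [p for p in rest if p[1] >= pivot[1]]
    return quick_sort(left) + [pivot] + quick_sort(right)

if __name__ == "__main__":
    # I/O: print the sorted people to the console ("people.csv" is a stand-in path)
    for name, age in quick_sort(validate(read_people("people.csv"))):
        print(name, age)
```

Python’s garbage collector stands in for the explicit free step that the C version performs.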

The Computers

I ran the algorithm on the following machines

The CSV Files

For all programming languages and computers used, these test CSV files were used.

Measuring CPU Usage

I created a utility command to measure the cycle count at any given moment. All the programs made system calls to print these results to standard output during each run. Only cycles were measured for CPU performance and not time. Any reference to time is derived by dividing the number of cycles by an assumed clock speed.

i9 Cycles

The i9 processor runs were done using rdtsc to measure the cycles.

Raspberry Pi Cycles

The Raspberry Pi runs were done using pmccntr_el0 to measure the cycles.

Running the Tests

I used this script to perform the test. I forced the script to remain on the same core using taskset to help with the accuracy of the cycle measurements.

taskset 0x1 ./run-performance-tests.sh

Results

Whole Program

The most common way to measure performance tends to be by time. While measuring by time is not a bad way to get a sense of performance issues, it does not give us the whole picture. Notably, run time is greatly impacted by the clock speed of the processor. When an individual developer is looking to fix a performance issue, there is nothing wrong with using time alone, since the relative change in run time is often good enough to tell whether the problem has been addressed. With that in mind, let us take a look at the results for our program written in C.

Run time in milliseconds:

| Total Records | Pi 3 Model B@1.2 GHz | i9@5.0 GHz |
| --- | --- | --- |
| 0 | 11.7 | 0.6 |
| 10 | 12.2 | 0.6 |
| 1000 | 19.2 | 0.9 |
| 2500 | 29.8 | 1.2 |
| 5000 | 46.9 | 2.0 |
| 7500 | 66.1 | 2.7 |
| 10000 | 86.6 | 3.5 |
| 25000 | 232.9 | 8.2 |
| 50000 | 560.3 | 17.8 |

So what does this graph tell us? It puts actual numbers on the performance hit of picking the Raspberry Pi. While it is pretty obvious that a processor with a higher clock speed is likely to be faster and that a Raspberry Pi cannot compete, in terms of speed, with an i9, having actual data means we know exactly how large the difference is and whether the cost of the i9 is worth it.

We can make our equations too by simply adding a trend line to our data and generating an equation which describes the line! Excel and its equivalents support this, so you do not even need a fancy Python library!

PI 3 Model B@1.2 GHz:
y = 8.63e-08x^2 + 6.65e-03x + 1.20e+01, R^2 = 1.0000

i9@5.0 GHz:
y = 1.53e-09x^2 + 2.67e-04x + 6.04e-01, R^2 = 0.9999
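If you would rather script it than click through a spreadsheet, the same quadratic trend line and R^2 can be computed with NumPy (assuming it is installed); the data below is the Pi 3 column from the table above.

```python
import numpy as np

# Total records vs. run time in milliseconds (Pi 3 Model B@1.2 GHz)
x = np.array([0, 10, 1000, 2500, 5000, 7500, 10000, 25000, 50000], dtype=float)
y = np.array([11.7, 12.2, 19.2, 29.8, 46.9, 66.1, 86.6, 232.9, 560.3])

# Fit y = a*x^2 + b*x + c, the same form as the spreadsheet trend line
a, b, c = np.polyfit(x, y, 2)

# R^2 = 1 - SS_res / SS_tot
y_hat = a * x**2 + b * x + c
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"y = {a:.2e}x^2 + {b:.2e}x + {c:.2e}, R^2 = {r2:.4f}")
```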

For the rest of this blog post I will mostly skip the trend-line equations when analyzing other graphs, since the process to create them is the same. It is worth emphasizing again that having equations based on experiments is the foundation of engineering. Having a good, objective understanding of the capabilities of your system can save you a lot of pain in the long run.

Let us take a look at another dimension to this problem by comparing different languages on the same processor.

Run time in milliseconds (Pi 3 Model B@1.2 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 11.7 | 404.6 | 17.8 | 242.7 | 108.7 | 38.0 |
| 10 | 12.2 | 400.1 | 18.5 | 482.5 | 112.3 | 37.4 |
| 1000 | 19.2 | 428.2 | 27.6 | 816.7 | 268.0 | 65.5 |
| 2500 | 29.8 | 454.8 | 42.6 | 1108.4 | 561.2 | 115.2 |
| 5000 | 46.9 | 512.4 | 68.5 | 1345.9 | 753.9 | 223.7 |
| 7500 | 66.1 | 597.5 | 96.6 | 1633.3 | 913.7 | 365.9 |
| 10000 | 86.6 | 748.1 | 125.0 | 1852.0 | 1056.3 | 542.0 |
| 25000 | 232.9 | 1318.0 | 303.7 | 2654.9 | 1941.0 | 2172.6 |
| 50000 | 560.3 | 2887.7 | 680.4 | 3890.7 | 3516.4 | 7207.6 |

As we can see, Go and C maintain pretty high performance while the other languages quickly become significantly slower, which is not a surprise. There is an interesting result, however: Python’s run time is near C’s for small data sets, yet it gets slower faster than all the other languages. I do not know enough about Python to say why this happens, but let us assume there is nothing wrong with my code. This is a good example of trading off with hard data. Instead of having a subjective argument with your co-workers about which language you prefer or which language is faster in general, you can now talk about which language will be “good enough” for your specific use case. This is why doing the math is needed. You probably will not find very many surprising results, but you will have concrete data showing the exact trade-off you are making. Let us look at all the tested languages on the i9 now.

Run time in milliseconds (i9@5.0 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.6 | 12.4 | 0.8 | 13.2 | 4.9 | 1.3 |
| 10 | 0.6 | 12.9 | 0.8 | 18.0 | 5.0 | 1.3 |
| 1000 | 0.9 | 13.2 | 1.2 | 30.2 | 8.2 | 2.2 |
| 2500 | 1.2 | 14.2 | 1.8 | 43.3 | 18.1 | 3.4 |
| 5000 | 2.0 | 16.3 | 2.8 | 55.5 | 24.2 | 5.8 |
| 7500 | 2.7 | 17.7 | 3.7 | 66.8 | 28.5 | 9.0 |
| 10000 | 3.5 | 22.9 | 4.4 | 75.9 | 32.1 | 13.1 |
| 25000 | 8.2 | 38.5 | 10.6 | 120.3 | 52.9 | 46.7 |
| 50000 | 17.8 | 77.3 | 22.5 | 169.4 | 88.0 | 147.0 |

Of course the i9 is significantly faster, but we do get another interesting result: the relative performance of the languages is about the same. The only notable exception is Python. Unlike on the Raspberry Pi, Python is not slower than Java at 50,000 records, but it still has the same characteristic of performing near C and then quickly slowing down relative to the other languages. This seemingly uninteresting fact has a useful implication. If the languages have similar performance characteristics across different processors, then you could run performance tests locally and predict the performance on another processor. More research needs to be done to say this for sure, but, if true, it creates a powerful tool for picking which processors would theoretically meet the performance requirements for a given project. I could see this being especially useful in a domain like PC gaming where there is a wide variety of machines your program will run on. Keep in mind that predicting the performance on another machine does not have to be 100% accurate. It only needs to indicate to a developer whether their solution is viable. There is no substitute for running on the real hardware.

As we can see, time does give us some useful information, but let us look at another way to see the same problem by measuring CPU cycles.

CPU cycles (Pi 3 Model B@1.2 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 14049436 | 485547668 | 21379826 | 291228532 | 130455582 | 45654207 |
| 10 | 14673178 | 480137892 | 22151392 | 579027841 | 134769978 | 44894099 |
| 1000 | 23070695 | 513780196 | 33105671 | 980007552 | 321642420 | 78654262 |
| 2500 | 35755022 | 545718792 | 51157895 | 1330116181 | 673467684 | 138203850 |
| 5000 | 56255947 | 614851366 | 82142513 | 1615104160 | 904631965 | 268447346 |
| 7500 | 79357338 | 717013256 | 115972913 | 1959954743 | 1096460380 | 439043576 |
| 10000 | 103882221 | 897774909 | 150006710 | 2222374502 | 1267586764 | 650366166 |
| 25000 | 279428372 | 1581585683 | 364426834 | 3185905300 | 2329240783 | 2607152428 |
| 50000 | 672324581 | 3465236218 | 816512131 | 4668892689 | 4219671377 | 8649140063 |

As you can see, this graph looks nearly identical to the time-based graph seen earlier. We will only look at the Raspberry Pi here; since the time values were derived by dividing cycles by the clock speed, each cycle graph is identical in shape to its time equivalent. So if they are practically the same in these experiments, why look at cycles at all? The cycles give us a proxy for the number of instructions used to execute a program. This is useful for understanding whether your series of unit operations is as efficient as it can be, without needing to know the clock speed of the CPU. If we assume that the code I wrote for each language is as efficient as it can be and that there is no way to beat the performance of C, then we now know how much it costs us to choose a particular language. Other people could implement my series of unit operations on their own and would know how efficient their implementation is in comparison to the “ideal” version. For example, if their program in C# is 99% as efficient as my C# implementation, then there is very little, in terms of “performance tricks”, left to be done. This is great! You do not have to waste time squeezing more performance out of your C# implementation; you now know you need to pick a different series of unit operations, programming language, or processor. This is what engineering is all about: having concrete data to help you make good decisions in the way you invest your resources.

While all this information is useful, it still is not the best for convincing management to side with you. In my experience, the best language for talking to nontechnical people is one everyone understands: money. How do we get there? If we know the number of watts our processor consumes and the cost of electricity in dollars per kilowatt-hour, then we can multiply the two together and divide by 1,000 to get dollars per hour, like so:

W * 1 kW / 1000 W * $/kWh = $/h

You can see the assumed power consumption and cost here. Since we already know the time taken to run our algorithm we simply need to multiply that time by the dollars per hour. As you can see below the results mirror the time and CPU cycle graphs because we multiplied all the results by the same constant to determine the cost.

Cost in dollars per run (Pi 3 Model B@1.2 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.54e-09 | 5.31e-08 | 2.34e-09 | 3.18e-08 | 1.43e-08 | 4.99e-09 |
| 10 | 1.60e-09 | 5.25e-08 | 2.42e-09 | 6.33e-08 | 1.47e-08 | 4.91e-09 |
| 1000 | 2.52e-09 | 5.61e-08 | 3.62e-09 | 1.07e-07 | 3.51e-08 | 8.60e-09 |
| 2500 | 3.91e-09 | 5.96e-08 | 5.59e-09 | 1.45e-07 | 7.36e-08 | 1.51e-08 |
| 5000 | 6.15e-09 | 6.72e-08 | 8.98e-09 | 1.77e-07 | 9.89e-08 | 2.93e-08 |
| 7500 | 8.67e-09 | 7.84e-08 | 1.27e-08 | 2.14e-07 | 1.20e-07 | 4.80e-08 |
| 10000 | 1.14e-08 | 9.81e-08 | 1.64e-08 | 2.43e-07 | 1.39e-07 | 7.11e-08 |
| 25000 | 3.05e-08 | 1.73e-07 | 3.98e-08 | 3.48e-07 | 2.55e-07 | 2.85e-07 |
| 50000 | 7.35e-08 | 3.79e-07 | 8.92e-08 | 5.10e-07 | 4.61e-07 | 9.45e-07 |
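To make the arithmetic concrete, here is the watts-to-dollars chain as a small function. The 5 W and $0.12/kWh figures are stand-in assumptions for illustration, not necessarily the ones behind the table above.

```python
def cost_per_run(run_seconds, watts, dollars_per_kwh):
    # watts -> kilowatts, times $/kWh, gives dollars per hour of runtime
    dollars_per_hour = (watts / 1000) * dollars_per_kwh
    # times the run time in hours gives the cost of a single run
    return dollars_per_hour * (run_seconds / 3600)

# Assumed figures: a ~5 W Raspberry Pi at $0.12 per kWh, one 560.3 ms run
print(cost_per_run(0.5603, watts=5, dollars_per_kwh=0.12))
```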

Another unsurprising result: it turns out it is very cheap to run a computer for a few seconds. It is difficult to really get a sense of what this all means. Specifically, how do we convey to nontechnical people what trade-offs they are making when picking a processor and a language? In chemical engineering, the best way to illustrate cost is often in terms of the product you manufacture. In the world of software, our product is data. We can calculate this by dividing the size of the file processed by the cost of running the whole algorithm, which gives us gigabytes per dollar and the following results.

Gigabytes per dollar (Pi 3 Model B@1.2 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 6.48e+00 | 0.00e+00 | 4.32e+00 | 0.00e+00 | 0.00e+00 | 2.16e+00 |
| 10 | 1.14e+02 | 4.32e+00 | 7.56e+01 | 2.16e+00 | 1.30e+01 | 3.67e+01 |
| 1000 | 6.94e+03 | 3.11e+02 | 4.84e+03 | 1.64e+02 | 4.99e+02 | 2.04e+03 |
| 2500 | 1.12e+04 | 7.34e+02 | 7.82e+03 | 3.00e+02 | 5.94e+02 | 2.90e+03 |
| 5000 | 1.42e+04 | 1.30e+03 | 9.73e+03 | 4.94e+02 | 8.83e+02 | 2.98e+03 |
| 7500 | 1.52e+04 | 1.68e+03 | 1.04e+04 | 6.13e+02 | 1.10e+03 | 2.74e+03 |
| 10000 | 1.54e+04 | 1.79e+03 | 1.07e+04 | 7.21e+02 | 1.27e+03 | 2.47e+03 |
| 25000 | 1.44e+04 | 2.54e+03 | 1.10e+04 | 1.26e+03 | 1.72e+03 | 1.54e+03 |
| 50000 | 1.19e+04 | 2.31e+03 | 9.80e+03 | 1.71e+03 | 1.90e+03 | 9.26e+02 |
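The metric itself is just a division. A sketch, with a hypothetical file size and run cost:

```python
def gigabytes_per_dollar(file_bytes, run_cost_dollars):
    # Product (data processed) per dollar of compute
    return (file_bytes / 1e9) / run_cost_dollars

# Hypothetical: an ~875 kB CSV processed for $7.35e-08 per run
print(gigabytes_per_dollar(875_000, 7.35e-08))
```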

This is a very useful metric for business people to have, especially early in a project, because it gives them an idea of how much money they would need to charge in order to make a significant profit. As you can see in my experiments, Java was generally the least profitable language to use and C was the most profitable. This gives us the ability to give more than a passing thought to the language we pick. If we are not running our computer for very long or have a low volume of data, then picking the “fastest” language does not buy us much, but if the only way to make the project profitable is to use C, then the nontechnical people can easily see that. Something I often hear is a fear of changing programming languages because the new language could be difficult to hire for. If your company stands to make millions more dollars per year by simply hiring an expensive C developer or training its existing team in C programming then, barring not having enough capital to fund the project, it is hard to imagine it would reject such an idea outright.

Let us now take a look at the cost of C running on different processors.

Gigabytes per dollar for C:

| Total Records | Pi 3 Model B@1.2 GHz | i9@4.7 GHz | i9@5.0 GHz |
| --- | --- | --- | --- |
| 0 | 6.48e+00 | 1.29e+00 | 1.12e+00 |
| 10 | 1.14e+02 | 2.58e+01 | 2.25e+01 |
| 1000 | 6.94e+03 | 1.69e+03 | 1.47e+03 |
| 2500 | 1.12e+04 | 2.94e+03 | 2.56e+03 |
| 5000 | 1.42e+04 | 3.68e+03 | 3.21e+03 |
| 7500 | 1.52e+04 | 4.12e+03 | 3.59e+03 |
| 10000 | 1.54e+04 | 4.23e+03 | 3.69e+03 |
| 25000 | 1.44e+04 | 4.52e+03 | 3.94e+03 |
| 50000 | 1.19e+04 | 4.13e+03 | 3.60e+03 |

We finally see a situation where the Raspberry Pi shines: cost to run! If data is low volume and is allowed to be slow to produce, then the Raspberry Pi is a perfect solution to our problem. This is what engineers do! They use experiments and mathematical models to guide them to a solution, avoiding costly and unnecessary trial and error in production.

Unit Operations

For the unit operations section we are just going to look at the cycle count for the sorting section of our program, since the same cost and performance analysis can be done for the other unit operations.

Sort-only CPU cycles (Pi 3 Model B@1.2 GHz):

| Total Records | C | C# | Go | Java | JavaScript | Python |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2792159 | 5114654 | 3688675 | 10687175 | 10681266 | 3956866 |
| 10 | 2832413 | 6012382 | 3514230 | 9621183 | 10935949 | 3872077 |
| 1000 | 3563015 | 9436473 | 4348165 | 41001187 | 45432890 | 20673766 |
| 2500 | 5113914 | 16487233 | 4628367 | 47206292 | 67640476 | 53269002 |
| 5000 | 8396329 | 37741884 | 6353428 | 72961195 | 79474389 | 136594279 |
| 7500 | 13346105 | 64668119 | 9374607 | 131656583 | 97613712 | 260292514 |
| 10000 | 19307134 | 142073658 | 12793925 | 172160720 | 110865507 | 426776896 |
| 25000 | 80935209 | 466851246 | 50487404 | 438295255 | 237565658 | 2103350194 |
| 50000 | 280796513 | 1475048036 | 175304622 | 722500522 | 668473923 | 7676206450 |

As you can see, the languages perform about the same relative to each other. Below are the results of deriving equations for our trend lines. Note that I used an x^2 trend line instead of x * log(x) because I could not get sklearn to produce a good fit for x * log(x). Since quick sort’s average performance is O(n * log(n)), x * log(x) should be a better model, but this will be good enough for our purposes.

c:
y = 9.75e-02x^2 + 6.85e+02x + 2.75e+06, R^2 = 1.0000

c#:
y = 4.31e-01x^2 + 8.01e+03x + -2.28e+06, R^2 = 0.9995

go:
y = 6.26e-02x^2 + 3.06e+02x + 3.56e+06, R^2 = 1.0000

java:
y = -9.11e-02x^2 + 1.90e+04x + 1.73e+06, R^2 = 0.9967

javascript:
y = 1.56e-01x^2 + 4.66e+03x + 4.12e+07, R^2 = 0.9945

python:
y = 2.78e+00x^2 + 1.45e+04x + 3.15e+05, R^2 = 1.0000
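For what it is worth, an x * log(x) trend line does not need sklearn; because the model is linear in its coefficients, it can be fit as ordinary least squares on a hand-built design matrix. A sketch with NumPy, using the C cycle counts from the table above (the zero-record row is dropped because log(0) is undefined):

```python
import numpy as np

# Total records vs. sort-only cycles for C (zero-record row omitted)
x = np.array([10, 1000, 2500, 5000, 7500, 10000, 25000, 50000], dtype=float)
y = np.array([2832413, 3563015, 5113914, 8396329, 13346105,
              19307134, 80935209, 280796513], dtype=float)

# Model: y = a*x*log(x) + b*x + c, solved by ordinary least squares
X = np.column_stack([x * np.log(x), x, np.ones_like(x)])
(a, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ np.array([a, b, c])
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"y = {a:.2e}*x*log(x) + {b:.2e}x + {c:.2e}, R^2 = {r2:.4f}")
```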

In a world where there are no readily available equations to use, focusing on performance-critical unit operations is probably a good compromise. You can use your intuition to identify which unit operations could potentially cause issues, and then test different algorithms, programming languages, and hardware to see which approach is best for your situation. Having equations for individual unit operations also helps with running “what if” scenarios with a tighter feedback loop than experimenting with performance in production. For example, you can check whether a different algorithm would improve the performance of a unit operation (e.g. merge sort instead of quick sort) or try a different sequence of unit operations altogether.
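A “what if” can be as simple as evaluating a fitted trend line at a record count you never measured. Here is a sketch using the Pi 3 C equation from the whole-program section, keeping in mind that extrapolating beyond the measured range is only a rough guide:

```python
def pi3_c_runtime_ms(records):
    # Whole-program trend line for C on the Pi 3 Model B@1.2 GHz
    return 8.63e-08 * records**2 + 6.65e-03 * records + 12.0

# What if we doubled the largest tested file to 100,000 records?
print(pi3_c_runtime_ms(100_000))  # roughly 1.5 seconds, an extrapolated estimate
```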

Conclusion

As you can see, the results generally showed that picking certain languages and hardware has a cost to it. This is unsurprising, as I think most developers would agree that there are trade-offs in decisions about languages and hardware. What is key is that we actually graphed and derived equations describing the behavior of the program. We are no longer making decisions based on gut feelings and hoping they are profitable. We are proving it with equations derived from experimentation. Welcome to the world of engineering!