As you may know, I started working for Scrive while I was on a round-the-world backpacking trip. I had a laptop in my backpack. It was a heavy and powerful laptop. I had very little trouble running cabal build. Compiling our source from a clean build did take a while. But it maybe took a minute or two.
Two things have happened since then: we have way more code now, and my computer broke and I decided to buy a netbook to replace it. I regret not buying more RAM for it. I only got 1GB. Compiling is painful now.
If I have no programs open, a clean build takes 18 minutes. If I have anything open, like a web browser or Skype, it takes way longer. It starts to swap out to disk and it just takes forever. Worse, I can’t use the computer at all while it’s thrashing.
I had to find a solution. I found some tricks that I want to share here. I couldn’t find a page that detailed all of these ideas, so here it is. I’m not an expert. I just did some research and put things together. I did some simple tests of different options with the “time” command.
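A measurement like that can be as simple as prefixing the build with time. As a minimal sketch of the same idea, here is a small helper that reports wall-clock seconds for any command. The name elapsed is made up for this example, and it assumes a date that supports +%s (GNU and BSD date both do).

```shell
# Run a command, report its wall-clock time on stderr, and keep its exit status.
# "elapsed" is a hypothetical helper name; the shell's own "time" does a similar job.
elapsed() {
  _start=$(date +%s)
  "$@"
  _status=$?
  _end=$(date +%s)
  echo "elapsed: $((_end - _start))s" >&2
  return $_status
}

# Example (not run here): elapsed cabal build
```

Because the helper returns the command's own exit status, it can wrap either half of a chained build without hiding a failure.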
I started observing the build. I had top open and I just watched its memory usage.
The first thing I noticed was that ghc took between 20% and 30% of my memory. Then it would call ld, which would take 40-50%. At its max, that’s 80% of my memory for compiling. That’s too much.
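The same observation can be scripted instead of watching top by hand. This is a minimal sketch, assuming a ps that accepts the pmem and comm output keywords (procps on Linux does) and that the processes show up under the names ghc and ld. The function names build_procs and watch_build are made up for this example.

```shell
# Keep only the ghc and ld rows from "percent command" lines on stdin.
# Any leading directory on the command name is stripped before matching.
build_procs() {
  awk '{
    n = split($2, parts, "/")
    name = parts[n]
    if (name == "ghc" || name == "ld") print $1, name
  }'
}

# Print the memory share of ghc and ld once per second until interrupted.
# Assumption: this ps supports "-o pmem=" and "-o comm=".
watch_build() {
  while sleep 1; do
    ps -e -o pmem= -o comm= | build_procs
  done
}
```

Running watch_build in a second terminal during a build prints one line per matching process each second, which makes it easy to see the moment ld starts while ghc is still holding its memory.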
What was happening was that ghc was compiling everything, which consumed lots of memory. Then, without freeing that memory, it forked an ld process (the linker), which also took up its own memory. Was there any way to cause ghc to free its memory before forking ld?
I found an option to ghc which tells it not to link. Bingo! I wrote a shell script:
cabal build --ghc-options="-c" && cabal build
Bam! What this does is separate the compile phase from the link phase. First, ghc compiles all of the source and exits. Then it runs again, sees that the files are already compiled, and starts the linker. In the linking phase, ghc takes about 2% of my memory. ld still takes 40-50%. The total memory used is way less and there’s very little thrashing!
A clean build went from 18 minutes to 16 minutes. Not a huge improvement, but it took significantly less memory. Now I could have open programs during the compile. If I have a lot of software open, there may still be a few seconds of thrashing where my system becomes unresponsive. But it’s noticeably less than before.
I used that trick for a few weeks and it worked great. Then I got greedy and wanted it faster! What else? I’ve probably looked through the man page for ghc 10 times looking for something. I finally found this option: -O0. It turns off optimizations.
cabal build --ghc-options="-c -O0" && cabal build
16 minutes became 7 minutes.
Wow! Well, I was happy. Most of the time while I’m developing, I just want to know that the code compiles. I don’t care how fast the final binary runs. So, for my uses, optimizations are just a waste of time. Bye bye!
That’s all I’ve found for ghc. I’ve tried other options but they don’t seem to do anything to the build time. Actually, ghc is quite fast now. I decided to attack the linker. The biggest problem with the linker is the amount of memory it takes. 50% of my memory does not leave much for the rest of the system.
It turns out that until a short while ago, GNU ld was optimized for linking a few files that had lots of symbols each. The default size of the hash table used to store the symbol table was huge. And it made one for each file. But ghc does some magic that makes lots of little files, each with a few exported symbols. The linker wasted a lot of memory with all of these huge symbol tables with only a few symbols in each.
Recently, ld was given an option to control the default size of the hash table and another one to try to use less memory:
--hash-size=n  Set the default hash size (it will grow).
--reduce-memory-overheads  Do some magic to reduce memory usage.
I’ve tried both of these options, but they do not seem to have an effect on build time or memory usage. My guess is that either the linker does this automatically now or ghc is already passing it those options. ld also has an option for turning off optimizations:
-O0
This one also does not seem to have an effect. Either way, I still put them in my script, just in case.
cabal build --ghc-options="-O0 -c" && \
cabal build --ghc-options="-O0 -optl -O0 -optl --reduce-memory-overheads -optl --hash-size=3"
Each -optl passes the option that follows it to the linker.
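Since every linker option needs its own -optl in front of it, the option string gets long quickly. As a minimal sketch, a small helper can build it from a plain list of linker flags. The name optl_flags is made up for this example and is not part of ghc or cabal.

```shell
# Prefix each linker flag with -optl so that ghc forwards it to the linker.
# "optl_flags" is a hypothetical helper name used only for illustration.
optl_flags() {
  _out=""
  for _flag in "$@"; do
    _out="$_out -optl $_flag"
  done
  # Drop the leading space left over from the first iteration.
  printf '%s\n' "${_out# }"
}

# Example (not run here):
# cabal build --ghc-options="-O0 $(optl_flags -O0 --reduce-memory-overheads --hash-size=3)"
```

Called with the three linker flags from the script, it prints the same -optl sequence that appears in the second cabal build command.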
Well, that’s all I’ve got for now. I reduced my clean build time from 18 minutes to 7 minutes and maximum memory usage from 80% to 52%. Non-clean builds are of course much faster, too. If you use this script, I estimate that you’ll cut build times in half.
Please let me know if you have any other tips for improving the performance of developer builds. And let me know if you find this useful. I’d be delighted to learn that this was helpful.
The next thing I want to tackle is ghci. What a hog!