There’s an interesting article at the Washington Post’s Wonkblog presenting one informed technician’s thoughts on the problems users are running into when they try to access healthcare.gov, the PPACA’s portal. He makes a number of good observations, but the two I found most interesting were these:
SK: The Obama administration has said that all these problems are happening because of overwhelming traffic. How good of an explanation is that?
JB: That seems like not a very good excuse to me. In sites like these there’s a very standard approach to capacity planning. You start with some basic math. Like, in this case, you look at all the federal states and how many uninsured people they have. Out of those you think, maybe 10 percent would log in in the first day. But you model for the worst case, and that’s how you come up with your peak of how many people could try to do the same thing at the same time.
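JB’s “basic math” can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only. The uninsured count is a hypothetical placeholder, not a real figure for the federal-exchange states. The peak-hour share and session length are my own assumed parameters, not anything from the interview. It turns first-day visitors into simultaneous users with Little’s Law (concurrent users = arrival rate × time each user stays), then reruns the same formula with pessimistic inputs to get the worst case.

```python
"""Back-of-envelope peak-load estimate for a sign-up site (illustrative only).

All inputs are hypothetical placeholders, not real healthcare.gov figures.
"""


def peak_concurrent_users(
    uninsured: int,
    first_day_fraction: float,
    peak_hour_share: float,
    session_minutes: float,
) -> float:
    """Estimate simultaneous users during the busiest hour of launch day.

    Uses Little's Law: concurrent users = arrival rate x time each user stays.
    peak_hour_share and session_minutes are assumptions, not interview data.
    """
    first_day_visitors = uninsured * first_day_fraction
    # Visitors arriving during the busiest hour, expressed per minute.
    arrivals_per_minute = first_day_visitors * peak_hour_share / 60
    return arrivals_per_minute * session_minutes


# Hypothetical input: 30 million uninsured people in the states being served.
# Expected case: 10% log in on day one, 15% of them in the peak hour, 20-minute sessions.
expected = peak_concurrent_users(30_000_000, 0.10, 0.15, 20)
# Worst case: more people show up, more of them at once, and sessions run longer.
worst = peak_concurrent_users(30_000_000, 0.20, 0.25, 40)
print(f"expected peak: {expected:,.0f} concurrent users")
print(f"worst-case peak: {worst:,.0f} concurrent users")
```

With these made-up inputs it prints 150,000 expected versus 1,000,000 worst-case concurrent users. The absolute numbers mean nothing. The point is that plausible pessimistic assumptions multiply together into a peak several times the “expected” one, which is why you size for the worst case.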
SK: What would you be doing right now if you were running healthcare.gov?
JB: First I would put some really good instrumentation in place. The problem is if you’re fighting a fire, and it’s dark, you don’t know what’s going on. In other words, you can’t manage what you can’t measure. So first I would put something in place so you can measure what’s happening.
The second thing I’d do is I’d start building a very good load testing environment, so everything could be simulated in a load test, and move faster. Really everything is about speed right now, how quickly can you find problems and fix them. Ninety percent of the effort is really finding what to fix. Making the coding changes is only about 10 percent.
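To show how the two recommendations fit together, here is a minimal load-test harness that also measures what it is testing. It is a sketch, not how anyone actually tested healthcare.gov. `fake_signup`, `timed_call`, and `run_load_test` are hypothetical names. `fake_signup` is a stand-in that sleeps for a few milliseconds and deliberately fails every 50th request, where a real harness would call the actual service. Each request is timed (the instrumentation), many run concurrently (the load), and the results are summarized as an error count and latency percentiles.

```python
"""Minimal load-test harness that also measures what it is testing.

fake_signup() is a hypothetical stand-in for one request to the system under
test; a real harness would call the actual service instead.
"""
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_signup(user_id: int) -> bool:
    """Pretend to handle one sign-up request; returns True on success."""
    time.sleep(random.uniform(0.001, 0.005))  # simulated service time
    return user_id % 50 != 0  # hypothetical: every 50th request fails


def timed_call(user_id: int) -> tuple[float, bool]:
    """Run one request and record how long it took (the instrumentation)."""
    start = time.perf_counter()
    ok = fake_signup(user_id)
    return time.perf_counter() - start, ok


def run_load_test(total_requests: int, concurrency: int) -> dict:
    """Fire total_requests calls with up to `concurrency` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = sorted(t for t, _ in results)
    errors = sum(1 for _, ok in results if not ok)
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": total_requests,
        "errors": errors,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "max_ms": latencies[-1] * 1000,
    }


report = run_load_test(total_requests=500, concurrency=25)
print(report)
```

The report uses percentiles rather than an average because under load the slow tail is usually what users notice first. Raising `concurrency` step by step and watching where p95 and the error count start to climb is one quick way of “finding what to fix.”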
Neither of those two observations is particularly profound; they’re just ordinary good practice.
They’d better hope that the problems can be solved by throwing additional hardware at the site. That’s an easy solution, and inexpensive to implement in both elapsed time and labor. Changing the architecture at this point could be disastrous. It’s probably out of the question.
My experience is that software developers are strongly predisposed to keep doing what they’re accustomed to doing. Getting someone from outside to audit the code is probably a good idea, but it will take time.