Or “How to Lie With Statistics”. As I read James Taranto’s column today on the early reports of successful enrollments at Healthcare.gov, I realized that many people might not understand how sensitive a reported error rate can be to just how you define and count errors. Take optical character recognition (OCR) software, for example. When reading the advertising blurbs for it you frequently encounter extremely high recognition rates, 99% or higher. 99% of what?
What they don’t generally tell you is that the recognition rate is measured per character. Not per page or, worse yet, per document. A recognition rate of 99% of characters means that on average the program will correctly recognize 99 out of 100 characters. That in turn means that on average any page with more than a tweet-full of characters contains at least one error, so for practical purposes the page recognition rate is zero. The document recognition rate is zero. That can make the difference between a particular project making financial sense or not.
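To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes recognition errors are independent from one character to the next, which real OCR errors are not, and the 2,000-character page is an illustrative figure of mine rather than a measurement. The function name is likewise just for illustration.

```python
def all_correct_probability(char_accuracy: float, n_chars: int) -> float:
    """Chance that every one of n_chars characters is recognized correctly,
    assuming each character is an independent trial (a simplification)."""
    return char_accuracy ** n_chars


def expected_errors(char_accuracy: float, n_chars: int) -> float:
    """Average number of misrecognized characters in n_chars characters."""
    return (1.0 - char_accuracy) * n_chars


# 140 characters is a tweet; 2,000 characters per page is an assumed figure.
for label, n_chars in [("tweet", 140), ("page", 2000)]:
    p = all_correct_probability(0.99, n_chars)
    e = expected_errors(0.99, n_chars)
    print(f"{label}: {n_chars} chars, expected errors {e:.1f}, P(no errors) {p:.2e}")
```

Under those assumptions a tweet-length passage comes through clean only about one time in four, and a full page essentially never does.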
In the real world as opposed to in a lab that’s an important distinction. In dealing with a large project of thousands or even millions of documents it determines how many proofreaders you’ll need. It can determine how long the project will take and how much it will cost.
Twenty years ago the company of which I was a principal at the time had a project for the Federal Reserve that required us to scan, store, process, recognize, and index about a million pages’ worth of documents in a very tight timeframe. It was a monumental task, as you might imagine. The niggling little details matter.
That’s what I think of when I read reports of 50,000 or 500,000 enrollments. What’s the enrollment rate and how is it measured?
Let me also remind you that the open enrollment period for the healthcare exchanges ends on March 31, 2014 and that time is passing. What was 200 days is now more like 150. That affects the peak load requirements of the system. If anybody tells you that a system that can accept and process 100 applications a day can definitely accept and process 100,000 applications per day, fire him.
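A back-of-the-envelope sketch shows how the shrinking window raises the bar. The target of one million applications is a purely hypothetical figure chosen for round numbers, not a reported or projected one, and the function name is mine. The result is only an average; peak load will be higher, since applications do not arrive evenly.

```python
def required_daily_rate(applications_remaining: int, days_left: int) -> float:
    """Average applications per day needed to finish within days_left days."""
    if days_left <= 0:
        raise ValueError("days_left must be positive")
    return applications_remaining / days_left


# Hypothetical target, used only to illustrate the effect of lost time.
HYPOTHETICAL_TARGET = 1_000_000

for days_left in (200, 150):
    rate = required_daily_rate(HYPOTHETICAL_TARGET, days_left)
    print(f"{days_left} days left: about {rate:,.0f} applications per day")
```

Losing 50 of 200 days raises the required average by a third, whatever the target turns out to be.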
Can Healthcare.gov meet the challenge? Sure. Will it? Who knows? Pointing to the tremendously different processing requirements of the Massachusetts system as a model is fatuous. They are not comparable. We’re in unknown territory.