Defects in data occur frequently. You may have too few samples, or some of the samples you have may be anomalous. Dealing with these defects may be as much an art form as a science, and the methods for doing so include interpolation, extrapolation, and smoothing. Interpolation is when you construct new data points within the range of known data points. When you draw a line between two data points, you’re interpolating.
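To make the line-drawing idea concrete, here is a minimal Python sketch of linear interpolation between two known points. The function name `lerp` and the sample numbers are hypothetical, chosen only for illustration; real data work would often use a library routine instead.

```python
def lerp(x0, y0, x1, y1, x):
    """Linearly interpolate a value at x between known points (x0, y0) and (x1, y1)."""
    # Interpolation only estimates values inside the known range.
    if not (min(x0, x1) <= x <= max(x0, x1)):
        raise ValueError("x lies outside the known range; that would be extrapolation")
    t = (x - x0) / (x1 - x0)  # fraction of the way from x0 to x1
    return y0 + t * (y1 - y0)

# Hypothetical observations: 10.0 in year 2000 and 14.0 in year 2004.
print(lerp(2000, 10.0, 2004, 14.0, 2002))  # → 12.0
```

The estimate for 2002 is simply the point halfway along the straight line joining the two observations.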
Extrapolation is the process of estimating, based on the data points you actually have, the values that might be observed beyond the observed range. When you extend a curve past the data you actually have, you’re extrapolating.
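As a rough illustration, the sketch below fits a straight line to a few points by ordinary least squares and then reads off a value past the last observation. The choice of a straight-line trend is an assumption made for simplicity, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def extrapolate(xs, ys, x_new):
    """Estimate y at x_new, which must lie outside the observed range of xs."""
    if min(xs) <= x_new <= max(xs):
        raise ValueError("x_new is inside the observed range; that is interpolation")
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept

# Hypothetical yearly observations that happen to rise by 1.0 per year.
years = [2000, 2001, 2002, 2003]
values = [10.0, 11.0, 12.0, 13.0]
print(extrapolate(years, values, 2006))  # → 16.0
```

The answer is only as good as the assumption that the trend keeps going; nothing in the four observations guarantees that it does.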
Smoothing is when you make the data points you have look better. Sort of like when you blur the focus on a photo of an old lady. It’s doing what you think nature would have done if she had a better sense of aesthetics. When a graphic artist uses Photoshop to make a model appear taller or thinner or give her a narrower waist, he or she is actually creating a picture of a model who does not exist in real life.
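One common smoothing technique is a centered moving average, which replaces each point with the mean of itself and its neighbors. It is used here purely as an example; nothing in this post says which smoothing method, if any, appears in the disputed spreadsheets. The function name, window size, and data below are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average; endpoints without a full window are dropped."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Average each point with its `half` neighbors on either side.
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

noisy = [10.0, 14.0, 9.0, 13.0, 11.0, 15.0, 10.0]
print(moving_average(noisy))  # → [11.0, 12.0, 11.0, 13.0, 12.0]
```

Note that none of the smoothed values matches an actual observation, which is the numerical counterpart of the retouched photo of a model who does not exist.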
There’s something of a brouhaha going on now in the econblogosphere. Thomas Piketty, author of Capital in the Twenty-First Century, the book that’s causing a stir in econ circles these days, is being accused of photoshopping his data:
Thomas Piketty’s book, ‘Capital in the Twenty-First Century’, has been the publishing sensation of the year. Its thesis of rising inequality tapped into the zeitgeist and electrified the post-financial crisis public policy debate.
But, according to a Financial Times investigation, the rock-star French economist appears to have got his sums wrong.
The data underpinning Professor Piketty’s 577-page tome, which has dominated best-seller lists in recent weeks, contain a series of errors that skew his findings. The FT found mistakes and unexplained entries in his spreadsheets, similar to those which last year undermined the work on public debt and growth of Carmen Reinhart and Kenneth Rogoff.
The details are in the linked article, but the things found include anomalies, plain errors, and what appears to be smoothing. How damaging are the discoveries?
For example, once the FT cleaned up and simplified the data, the European numbers do not show any tendency towards rising wealth inequality after 1970. An independent specialist in measuring inequality shared the FT’s concerns.
Dr. Piketty, of course, rejects the notion that his data manipulations were deliberate or intended to deceive.
The data manipulations don’t prove that his conclusions are wrong, but neither his retort nor the defense offered by his supporters proves that they’re right. I suspect this is something that will be debated for years.