Error sorting out

An important step in data cleaning is to check all available sources for different types of error. For this purpose we adapt total survey error theory. We call this evaluation of all data sets “error sorting out”.

The SMRE-metadatabase checks for the following potential errors:

  • Sampling errors
  • Non-sampling errors, namely:
    • Specification errors (arising from the questionnaire design, the answer options or the translation)
    • Measurement errors
    • Frame errors
    • Nonresponse errors
    • Data processing errors, such as coding failures or data table errors

Some of the error evaluation is done automatically by the SMRE-metadatabase. For example, the metadatabase checks whether the distribution reported by the original source sums reasonably close to 100 percent and whether the non-response category exceeds certain levels. Other error sorting out procedures cannot be automated. This is especially true for the analysis of specification errors (i.e. questionnaire wording). Here, careful judgement is required as to how well the data fit the way they were collected.
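The two automatic checks mentioned above can be illustrated with a minimal sketch. It is not the SMRE-metadatabase implementation: the function name `automatic_checks`, the category label "no answer", and both threshold values are assumptions chosen for illustration, since the text only speaks of "reasonably close to 100 percent" and "certain levels".

```python
# Hypothetical thresholds, not taken from the SMRE-metadatabase.
SUM_TOLERANCE = 1.0      # allowed deviation from 100, in percentage points
MAX_NONRESPONSE = 10.0   # highest accepted non-response share, in percent


def automatic_checks(shares, nonresponse_key="no answer",
                     sum_tolerance=SUM_TOLERANCE,
                     max_nonresponse=MAX_NONRESPONSE):
    """Return a list of error flags for one reported distribution.

    `shares` maps category labels to the percentages reported by the source.
    An empty list means that neither automatic check raised a flag.
    """
    flags = []

    # Check 1: does the reported distribution sum to roughly 100 percent?
    total = sum(shares.values())
    if abs(total - 100.0) > sum_tolerance:
        flags.append("distribution_sum")

    # Check 2: does the non-response category exceed the accepted level?
    if shares.get(nonresponse_key, 0.0) > max_nonresponse:
        flags.append("nonresponse")

    return flags


if __name__ == "__main__":
    example = {"Catholic": 60.0, "Protestant": 30.0, "no answer": 5.0}
    print(automatic_checks(example))
```

A data set flagged in this way would still need to be reviewed by hand, because specification errors such as questionnaire wording cannot be detected by checks of this kind.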

The “error sorting out” process reduces the number of data sets to those that are valid expressions of the intent to measure religious affiliation in its objective sense and are free of the errors named above. The result of our data cleaning leaves us with a comparable and robust collection of data sets for further analysis.

The appendix of our working paper (Liedhegener / Odermatt 2018) lists all countries, including the data sets that were sorted out and the errors named as the reason.