The indicators made available through the GDL Area Database are created by aggregation from household survey datasets. Aggregation means taking the average of the values of a characteristic of individuals or households in each area.
For instance, the indicator of the educational level of the population aged 20-39 in an area is the mean years of education of the respondents in this age group in the area. Likewise, the area’s vaccination coverage is the percentage of one-year-old children who received a specific vaccination.
In all cases, the sample weights available in the datasets are used, so that the indicators are as representative as possible of the areas to which they apply.
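The weighted aggregation described above can be sketched as follows. This is a minimal illustration, not the actual GDL processing code: the function name, the record layout and the sample values are all hypothetical, and it assumes each respondent carries a single positive sample weight.

```python
from collections import defaultdict

def weighted_area_means(records):
    """Return {area: weighted mean} from (area, value, weight) tuples."""
    weighted_sums = defaultdict(float)  # sum of weight * value per area
    weight_totals = defaultdict(float)  # sum of weights per area
    for area, value, weight in records:
        weighted_sums[area] += weight * value
        weight_totals[area] += weight
    # Areas whose weights sum to zero are skipped to avoid division by zero.
    return {
        area: weighted_sums[area] / weight_totals[area]
        for area in weighted_sums
        if weight_totals[area] > 0
    }

# Hypothetical respondents aged 20-39: (area, years of education, sample weight)
respondents = [
    ("North", 6.0, 1.0),
    ("North", 10.0, 3.0),
    ("South", 8.0, 2.0),
    ("South", 12.0, 2.0),
]
area_means = weighted_area_means(respondents)
```

A percentage indicator such as vaccination coverage can be computed the same way by coding each child as 1 (vaccinated) or 0 (not vaccinated) and multiplying the weighted mean by 100.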
The sub-national areas that are used for aggregation are based on the geographic information that is present in the survey datasets. The available regional coding is often, but not always, based on official administrative subdivisions used in the countries.
Generally, first-level administrative units are used, but there are many deviations. Some datasets use a coding of their own, which may consist of a combination of administrative units or be completely standalone.
Even when official classifications are used, the situation may be complicated by the fact that those classifications change over time. New regions are often created by splitting up or merging existing regions. In those cases, the subdivisions used in earlier or later surveys have to be adjusted to maintain comparability over time. Such adjustments always imply a reduction in the number of areas.
Reductions are sometimes also made for small datasets, to increase the number of cases on which aggregation is based. This means that the subdivisions used in the GDL Area Database often contain somewhat fewer regions than the official ones in a given year. On average, about ten subnational regions are distinguished per country.
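The adjustment of regional codings across surveys can be pictured as a lookup from survey-specific codes to a common, coarser set of areas. The sketch below is only an illustration under assumed inputs: the region codes, the mapping and the two surveys are hypothetical and do not come from the database.

```python
# Hypothetical case: region "C" was split into "C1" and "C2" between an
# earlier and a later survey. To stay comparable over time, both new
# regions are mapped back to the single harmonized area "C".
HARMONIZED_AREA = {
    "C1": "C",
    "C2": "C",
}

def harmonize(region_code):
    """Map a survey-specific region code to its harmonized area.

    Codes without an entry in the mapping are kept unchanged.
    """
    return HARMONIZED_AREA.get(region_code, region_code)

earlier_survey_regions = ["A", "B", "C"]
later_survey_regions = ["A", "B", "C1", "C2"]

areas_earlier = sorted({harmonize(r) for r in earlier_survey_regions})
areas_later = sorted({harmonize(r) for r in later_survey_regions})
```

After harmonization both surveys yield the same three areas, one fewer than the four regions in the later survey, which is why such adjustments reduce the number of areas.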
Quality of indicators
Because the indicators are created on the basis of household surveys, they suffer from bias to a certain extent. Their quality (correctness) depends on the design, size, structure and quality of the household survey on which they are based and on the number of sub-national areas distinguished. Detailed information on survey design and quality can be obtained from the producing organizations mentioned in the data section.
The indicators also suffer to some extent from random aggregation error, as they are based on samples of the total population. Given that no clear criteria exist for the number of cases to be used for aggregation of indicators, we provide, for every indicator in every region, the number of persons or households on which it is based. This allows users to make their own choices in this respect.
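One way a user might act on the reported number of cases is to set a personal minimum and to gauge the sampling error of each estimate. The sketch below is an assumption-laden illustration: the threshold, the rows and the function name are hypothetical, and the standard error formula assumes simple random sampling, so it ignores the clustering and weighting of real household surveys and will tend to understate the true error.

```python
import math

def approx_se_proportion(p, n):
    """Approximate standard error of a proportion p based on n cases.

    Assumes simple random sampling; survey design effects are ignored.
    """
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical rows: (region, vaccination coverage as a proportion, number of cases)
rows = [
    ("Region 1", 0.80, 400),
    ("Region 2", 0.50, 25),
    ("Region 3", 0.90, 100),
]

MIN_CASES = 50  # user-chosen threshold, not a GDL recommendation

# Keep only regions whose indicator rests on at least MIN_CASES cases.
reliable = [row for row in rows if row[2] >= MIN_CASES]

# Approximate standard error for every region, including the small ones.
standard_errors = {region: approx_se_proportion(p, n) for region, p, n in rows}
```

Here the estimate for "Region 2" rests on only 25 cases and has a much larger approximate standard error than the others, so a cautious user might exclude it or report it with a caveat.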
Although all surveys we use are representative at the national level, they are not always designed to be representative of the sub-national areas we distinguish. In those cases, we still present our indicators, as they are often the only available data for the region, which we consider better than no information at all.
The indicators shown on the GDL Area Database website can be downloaded as a CSV or Excel file by clicking on “Download this”. The complete set of indicators can be downloaded by clicking on “Download all”. This leads to a page with separate CSV files that contain, for each region, the indicators and the number of households or persons on which they are based, together with the labels and descriptions of the indicators.
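A downloaded CSV file can be read with standard tools. The sketch below uses an inline stand-in for a downloaded file; the column names (region, indicator, value, n_cases) and the indicator name are hypothetical, so they should be replaced with the headers and labels found in the actual files.

```python
import csv
import io

# Inline stand-in for a downloaded file; column names and values are hypothetical.
sample_csv = """region,indicator,value,n_cases
Region 1,edu_years_20_39,7.5,320
Region 2,edu_years_20_39,9.1,45
"""

def load_indicators(handle):
    """Read indicator rows from an open CSV file and convert the numeric fields."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "region": row["region"],
            "indicator": row["indicator"],
            "value": float(row["value"]),
            "n_cases": int(row["n_cases"]),
        })
    return rows

indicators = load_indicators(io.StringIO(sample_csv))
```

For a real download, the same function can be called on a file opened with `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.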