The GDL indicators are created by aggregating household survey datasets. For each indicator, we take the weighted average of the relevant characteristic across the individuals or households sampled within an area.
For instance, the indicator of the educational level of the population aged 20-39 in an area is the mean years of education of the respondents in this age group in that area. Similarly, an area's vaccination coverage is the percentage of children up to the age of one year who received a specific vaccination.
For all indicators and areas, the sample weights provided in the original datasets are applied, so that the indicators represent the areas to which they apply as accurately as possible.
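The weighted aggregation described above can be sketched as follows. This is a minimal illustration, not the GDL production code: the function name `weighted_area_means`, the record layout `(area, value, weight)` and the sample data are all hypothetical.

```python
from collections import defaultdict


def weighted_area_means(records):
    """Return the weighted mean of `value` per area.

    `records` is an iterable of (area, value, weight) tuples, where
    `weight` is the sample weight taken from the survey dataset.
    """
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for area, value, weight in records:
        weighted_sums[area] += value * weight
        weight_totals[area] += weight
    # Areas whose weights sum to zero are skipped to avoid division by zero.
    return {
        area: weighted_sums[area] / weight_totals[area]
        for area in weighted_sums
        if weight_totals[area] > 0
    }


# Hypothetical respondents aged 20-39: (area, years of education, weight)
respondents = [
    ("North", 8.0, 1.0),
    ("North", 12.0, 3.0),
    ("South", 6.0, 2.0),
    ("South", 10.0, 2.0),
]
means = weighted_area_means(respondents)
```

A percentage indicator such as vaccination coverage fits the same pattern if each child's value is coded as 100 (vaccinated) or 0 (not vaccinated).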
The subnational areas that are used for aggregation are based on the geographic information that is present in the survey datasets. The available regional coding is often, but not always, based on official administrative subdivisions used in the countries.
Whenever possible, first-level administrative units are used, but there are many deviations. Some datasets use a coding of their own, which may consist of a combination of administrative units or be completely standalone. Even when official classifications are used, the situation may be complicated by the fact that those classifications change over time. New regions are often created by splitting or merging existing ones. In those cases, the subdivisions used in earlier and/or later surveys are adjusted to maintain comparability over time. Such adjustments tend to reduce the number of areas.
Reductions are also made for small datasets, to increase the number of cases on which each aggregate is based. As a result, the subdivisions used in the GDL Area Database often contain fewer regions than the official classification in a given year. On average, about ten subnational regions are distinguished within a country.
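One way to picture the adjustment of subdivisions is as a recoding step applied before aggregation. The sketch below assumes a region "B" that was split into "B1" and "B2" in a later survey. The codes, the mapping and the helper `harmonize_area` are hypothetical and only illustrate the idea.

```python
from collections import Counter


def harmonize_area(code, mapping):
    """Map a survey-specific region code to its harmonized area.

    Codes that are absent from `mapping` are kept unchanged.
    """
    return mapping.get(code, code)


# Hypothetical case: region "B" was split into "B1" and "B2" after the
# earlier survey, so the later codes are merged back into "B".
split_to_merged = {"B1": "B", "B2": "B"}

later_survey_codes = ["A", "B1", "B2", "B1", "C"]
harmonized = [harmonize_area(code, split_to_merged) for code in later_survey_codes]

# Number of cases available per harmonized area.
cases_per_area = Counter(harmonized)
```

Merging in this direction yields fewer areas, and each merged area pools the cases of its parts.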
Quality of the indicators
Because the indicators are derived from household surveys, they are likely to suffer from bias, at least to some extent. Their quality (correctness) depends on the design, size, structure and quality of the underlying household survey, on the number of subnational areas distinguished, and on the number of observations at the country level and within the individual subnational areas. Detailed information on survey design and quality can be obtained from the producing organizations mentioned in the data section.
The indicators also suffer to some extent from random aggregation error, as they are based on samples of the total population. Because no clear criteria exist for the minimum number of cases needed to aggregate an indicator, we report, for every indicator in every region, the number of persons or households on which it is based. This allows users to make their own choices in this respect.
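A user who wants to apply a minimum number of cases could do so along the following lines. This is only a sketch: the threshold of 50, the data layout and the function `filter_by_cases` are hypothetical choices, not recommendations from GDL.

```python
def filter_by_cases(indicators, min_cases):
    """Keep only the areas whose indicator rests on at least `min_cases` cases.

    `indicators` maps an area name to a (value, number_of_cases) tuple.
    """
    return {
        area: value
        for area, (value, n_cases) in indicators.items()
        if n_cases >= min_cases
    }


# Hypothetical indicator values with the number of cases behind each one.
education_years = {
    "North": (11.0, 420),
    "South": (8.0, 35),
    "East": (9.5, 50),
}
reliable = filter_by_cases(education_years, min_cases=50)
```

Raising `min_cases` trades coverage for stability: fewer areas remain, but each remaining value is based on more observations.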
Importantly, the original surveys are designed to be representative at the national level and, in most cases, at the regional level. Even when the original surveys are not representative at the subnational level, we still present our indicators, as they are often the only data available for the region, which we consider better than no information at all.