# Quality report — Utrecht

Auto-generated by `synthpop report`. Maps to the challenge's Should/Could criteria.

## 1. Coverage

- **Region:** Utrecht (GM0344)
- **Buurten:** 109
- **Synthetic individuals:** 376770
- **Synthetic households:** 194055
- **Random seed:** 42
- **Reproduce:** `synthpop run --config configs/utrecht.toml`

## 2. Marginal fit vs CBS (WMAPE over buurten)

Weighted MAPE (WMAPE) is total absolute error divided by total target, i.e. the
template's "TAE / total" metric. A plain per-cell MAPE is dominated by near-empty
industrial buurten (for example a business park with ~5 households split four
ways), so the weighted figure is the more representative measure of overall fit.
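
To illustrate why the weighted figure is preferred, here is a minimal sketch comparing the two metrics. The function names and the three-cell counts are hypothetical and chosen only for illustration; this is not the code `synthpop report` runs.

```python
def wmape(synthetic, target):
    """Total absolute error divided by total target, as a fraction."""
    tae = sum(abs(s - t) for s, t in zip(synthetic, target))
    return tae / sum(target)


def cell_mape(synthetic, target):
    """Unweighted mean of per-cell relative errors; zero-target cells are skipped."""
    errors = [abs(s - t) / t for s, t in zip(synthetic, target) if t > 0]
    return sum(errors) / len(errors)


# Hypothetical household counts: two large cells and one near-empty cell.
target = [1000, 2000, 2]
synthetic = [1001, 1999, 3]

print(f"WMAPE: {wmape(synthetic, target):.4%}")
print(f"MAPE:  {cell_mape(synthetic, target):.4%}")
```

Every cell here is off by exactly one household, yet the near-empty cell alone contributes a 50 % relative error to the unweighted mean, while the weighted metric stays near 0.1 %.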

| Variable | Metric | Value | Target |
|---|---|---|---|
| Age band | WMAPE | 0.104 % | < 1 % |
| Household type | WMAPE | 0.000 % | < 1 % |
| Housing type | WMAPE | 0.038 % | < 1 % |

Age is fit from an exact age pool and household type from exact CBS counts.
Housing is assigned by deterministic largest-remainder allocation, so the
per-buurt housing column matches its target to within one household per cell.
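
A largest-remainder allocation can be sketched as follows. This is an illustrative version under stated assumptions, not the `synthpop` implementation: the function name is hypothetical, the weights are integers so that all arithmetic is exact, and ties are broken by cell index to keep the result deterministic.

```python
def largest_remainder(total, weights):
    """Split an integer total across cells in proportion to integer weights.

    Each cell first gets the floor of its exact quota; the leftover units go
    to the cells with the largest remainders (ties broken by index).
    Assumes sum(weights) > 0.
    """
    weight_sum = sum(weights)
    counts = [total * w // weight_sum for w in weights]
    remainders = [total * w % weight_sum for w in weights]
    leftover = total - sum(counts)
    order = sorted(range(len(weights)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical buurt: 7 households over three housing types weighted 5:3:2.
# Exact quotas are 3.5, 2.1 and 1.4.
print(largest_remainder(7, [5, 3, 2]))
```

Because each cell receives either the floor of its quota or the floor plus one, every count stays within one household of its exact proportional share, and the counts always sum to the requested total.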

## 3. Cross-domain consistency (S1)

- **Children (0-14) living in a with-kids household:** 100.00 % (role constraint).

age × household_type (synthetic person counts):

| age \ household | single | no-kids | with-kids |
|---|---|---|---|
| 0-14 | 0 | 0 | 58640 |
| 15-24 | 20549 | 15105 | 22121 |
| 25-44 | 45344 | 38873 | 54548 |
| 45-64 | 21767 | 21542 | 36821 |
| 65+ | 12490 | 11310 | 17660 |
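
As a quick sanity check, the table above can be re-added by hand. The sketch below copies the printed counts into a dictionary (the variable names are illustrative) and confirms both the person total from section 1 and the children constraint.

```python
# Person counts copied from the age x household_type table above.
# Column order: single, no-kids, with-kids.
crosstab = {
    "0-14": (0, 0, 58640),
    "15-24": (20549, 15105, 22121),
    "25-44": (45344, 38873, 54548),
    "45-64": (21767, 21542, 36821),
    "65+": (12490, 11310, 17660),
}

total_persons = sum(sum(row) for row in crosstab.values())
children = crosstab["0-14"]
children_with_kids_share = children[2] / sum(children)

print(total_persons)
print(f"{children_with_kids_share:.2%}")
```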

## 4. Spatial coherence (S2)

- **Moran's I of %65+ across buurten (k=6 nearest):** 0.022
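  Positive values indicate that neighbouring buurten have similar age structure
  (spatial clustering). At 0.022 the statistic is only slightly above zero, which
  points to weak clustering of %65+. Because the synthetic age marginals match
  CBS to within 0.104 % WMAPE, the statistic closely tracks the source's spatial
  autocorrelation by construction.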
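
For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of Moran's I with k-nearest-neighbour weights. It assumes binary, unstandardised weights built from point coordinates (for example buurt centroids); the report does not state which weighting scheme `synthpop` uses, and the function names and toy data are hypothetical.

```python
import math


def knn_weights(points, k):
    """Binary weights: w[i][j] = 1 if j is among the k nearest points to i."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),  # ties broken by index
        )
        for j in others[:k]:
            weights[i][j] = 1.0
    return weights


def morans_i(values, weights):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in z)


# Toy data: six points on a line, two nearest neighbours each.
points = [(float(x), 0.0) for x in range(6)]
weights = knn_weights(points, k=2)
clustered = [0, 0, 0, 1, 1, 1]    # similar values sit next to each other
alternating = [0, 1, 0, 1, 0, 1]  # neighbours tend to differ

print(morans_i(clustered, weights))
print(morans_i(alternating, weights))
```

The clustered pattern yields a positive value and the alternating pattern a negative one, which is the sign convention the interpretation above relies on.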

## 5. Wastewater linkage (S4)

- **Catchments covered:** 33 (GWSW rioleringsgebieden)
- Synthetic population + density per catchment (top 8 by population):

| Catchment | Buurten | Synthetic pop | Area (km²) | Density (/km²) |
|---|---|---|---|---|
| (NL.WBHCODE.14.Rioleringsgebied.5402) Zuilen/Ondiep | 11 | 38235 | 3.93 | 9723 |
| (NL.WBHCODE.14.Rioleringsgebied.5404) Overvecht | 8 | 32400 | 5.91 | 5480 |
| (NL.WBHCODE.14.Rioleringsgebied.5393) Baden Powellweg | 8 | 32295 | 3.16 | 10224 |
| (NL.WBHCODE.14.Rioleringsgebied.5411) Korte Baanstraat | 12 | 26450 | 5.99 | 4413 |
| (NL.WBHCODE.14.Rioleringsgebied.5385) Kanaalweg | 6 | 22940 | 3.21 | 7151 |
| (NL.WBHCODE.14.Rioleringsgebied.5415) Lauwerecht/Tuinwijk | 7 | 18905 | 1.35 | 14004 |
| (NL.WBHCODE.14.Rioleringsgebied.5387) Vleuterweide | 3 | 18555 | 3.65 | 5090 |
| (NL.WBHCODE.14.Rioleringsgebied.5407) Hagelstraat | 5 | 16815 | 1.77 | 9502 |

## 6. Privacy (mandatory)

- **Quasi-identifiers (QI):** buurt_code, age_band, household_type, housing_type. Income, education and migration background are *sensitive attributes*: they are protected by the QI set, not part of it.
- **Raw k (smallest QI cell, before suppression):** 1 — 315 persons (0.08 %) sat in cells below k=5.
- **k-anonymity suppression pass:** local generalisation applied only to the records in sub-k cells, in this order: mask housing, then household, then age, then coarsen buurt to wijk. **Achieved min k = 5**, which meets the k ≥ 5 threshold; **0.08 %** of records had a QI generalised. Released as `population_kanon.csv` with a `qi_level` column.
- **Coordinates:** synthetic, uniform jitter up to 50 m inside the buurt polygon — never a real address.
- **No real micro-data** is used at any stage; all rows are generated from public aggregates, so the suppressed release is a defence-in-depth measure rather than a re-identification fix.
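
The raw-k figure above can be reproduced in principle by counting records per QI cell. The sketch below is an assumption-laden illustration: the column names come from the QI list, but the function name, the buurt code `BU_A` and the housing values `rental` and `owner` are hypothetical, and the real release may be structured differently.

```python
from collections import Counter

QI = ("buurt_code", "age_band", "household_type", "housing_type")


def k_report(records, k=5):
    """Return (smallest QI cell size, number of records in cells smaller than k)."""
    cells = Counter(tuple(r[q] for q in QI) for r in records)
    below = sum(size for size in cells.values() if size < k)
    return min(cells.values()), below


# Hypothetical records: one cell of six persons and one singleton cell.
common = {"buurt_code": "BU_A", "age_band": "25-44",
          "household_type": "single", "housing_type": "rental"}
rare = {"buurt_code": "BU_A", "age_band": "65+",
        "household_type": "single", "housing_type": "owner"}
records = [common] * 6 + [rare]

min_k, below = k_report(records)
print(min_k, below, f"{below / len(records):.2%}")
```

Applied to the report's own numbers, 315 persons out of 376770 is about 0.08 %, which matches the share quoted above.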

## 7. Known limitations

- **Income** is a national prior conditioned on housing type, because CBS 86165NED (2024 vintage) suppresses income at buurt level (the field is null even at gemeente level).
- **Education** is a national prior by age; calibration against CBS 82275NED is the next step.
- **Land use / proximity:** the starter catalogue's table IDs (`85870NED`, `70262NED`) are not buurt-level (`85870NED` is in fact *Bruto investeringen*; `70262NED` is gemeente-level). Wiring corrected sources is pending — see repo notes.
- **Catchments** are GWSW rioleringsgebieden (the canonical open layer); linking each to its downstream RWZI plant capacity (Emissieregistratie) is a documented next step.
- **Household composition** uses CBS's three-way split (single / no-kids / with-kids); finer single-parent/couple splits would need an extra prior.
