Radio ExpressTV
Housing and the Data Problem
According to the latest estimates from the United Nations, 2.8 billion people worldwide lack adequate housing, while 318 million live without shelter. Despite billions of dollars being invested in finding solutions, governments and charities have struggled to make a real impact on this crisis.
A significant factor contributing to this situation is the lack of the necessary infrastructure to track and comprehend fundamental housing-related queries. Major data gaps mean that we often do not know which pieces of public land remain undeveloped, how many units are vacant, or where development proposals are stalling. The absence of common definitions for key terms complicates comparisons between different contexts—what “affordable housing” means in London is different from its meaning in Lagos, and entirely different again in Los Angeles. Worse still, the available data is rarely accessible to policymakers and researchers.
In most cities, there is no single authority responsible for tracking which public entities own which parcels of land. Public transportation agencies, school districts, and planning departments hold fragments of information that are rarely interconnected. Zoning laws can vary widely, not just between countries but also among neighboring municipalities.
This fragmentation results in poor policies. Without a comprehensive view of available resources and the factors affecting housing supply, policymakers cannot reliably identify effective interventions. Consequently, a city might invest heavily in subsidized housing while owning public land that could be developed at a lower cost. Governments set ambitious housing goals, yet they struggle to track progress or remove obstacles, effectively insulating themselves from any real accountability. The result is a patchwork of individual programs and a lack of clarity about whether any of them meet the essential needs related to housing accessibility and affordability.
Many hope that artificial intelligence will eventually help solve the housing problem. Machine learning models can already reconcile disparate databases, identify underutilized land in satellite imagery, and simulate how proposed policy changes would affect housing supply. But these tools require organized, standardized inputs. Unlocking the technology's potential therefore depends on the unglamorous work of data engineering, which makes building this infrastructure all the more urgent.
For instance, a pilot project by the Urban Institute and Cornell University's Legal Constructs Lab to automate the National Zoning Atlas methodology found that machine learning models cannot yet reliably interpret zoning documents, owing to inconsistent formatting, nuanced legal distinctions, and local exceptions. Cities worldwide have experienced what practitioners call the "dashboard of death": expensive visualization tools that fail because the underlying data infrastructure cannot support them.
The contrast with successful scientific infrastructure here is instructive. The Human Genome Project helped transform how scientists diagnose and treat diseases, partly by establishing the Bermuda Principles, which require participating laboratories to publish DNA sequences within 24 hours. This sparked a wave of collaboration that later enabled breakthroughs such as the discovery of CRISPR and the AI program AlphaFold. After researchers exchanged SARS-CoV-2 genomes in early 2020, vaccines could be developed at an unprecedented speed.
Recently, a group of experts in housing policy, data infrastructure, and governance convened as part of the 17 Rooms Initiative to discuss this issue. They agreed that housing needs a similar mechanism: a “Housing Genome Project” to standardize and share housing data and AI models globally.
This mechanism requires, first, common classifications for land parcels, zoning patterns, vacancy definitions, and development stages, designed for interoperability rather than locked to a single vendor. Second, cities should share their models and datasets widely, enabling genuine comparisons of what works across contexts. Third, the standards and tools must come with guidance for building institutional capacity: the data governance, interagency coordination, and analytical skills needed to translate data into decisions.
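To make the first of these pillars concrete, a shared parcel vocabulary could be sketched as a typed record that any city can emit and any other city can read. This is only an illustrative sketch: the `ParcelRecord` class, the field names, and the stage and vacancy labels are assumptions for the example, not an existing standard.

```python
from dataclasses import dataclass
from enum import Enum

class DevelopmentStage(str, Enum):
    # Illustrative stages; a real standard would be negotiated across cities.
    UNDEVELOPED = "undeveloped"
    PROPOSED = "proposed"
    PERMITTED = "permitted"
    UNDER_CONSTRUCTION = "under_construction"
    OCCUPIED = "occupied"

class VacancyStatus(str, Enum):
    OCCUPIED = "occupied"
    VACANT = "vacant"
    UNKNOWN = "unknown"

@dataclass
class ParcelRecord:
    """One interoperable record per land parcel, keyed by city and local ID."""
    city: str
    parcel_id: str
    owner_type: str           # e.g. "municipal", "school_district", "private"
    zoning_code: str          # local code, mapped to a shared taxonomy elsewhere
    stage: DevelopmentStage
    vacancy: VacancyStatus

# Two cities exchanging records like this one mean the same thing by "vacant".
record = ParcelRecord(
    city="Example City",
    parcel_id="EC-0001",
    owner_type="municipal",
    zoning_code="R-2",
    stage=DevelopmentStage.UNDEVELOPED,
    vacancy=VacancyStatus.VACANT,
)
print(record.stage.value)  # serializes to the shared vocabulary
```

The design choice that matters is the controlled vocabulary: free-text fields are where "affordable" in London and "affordable" in Lagos silently diverge, while enumerated values force the definitional debate to happen once, in the standard, instead of in every comparison.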
Housing data undoubtedly presents challenges not faced by genomics. DNA follows universal biological rules; in contrast, housing varies according to regulatory and political environments. While some variation is essential to reflect local conditions, a far greater degree of data standardization is possible and necessary, requiring cooperation rather than top-down mandates. The Built for Zero initiative has helped over 150 communities make tangible progress on homelessness through shared data protocols and coordinated work, demonstrating the potential to construct a collective infrastructure to tackle complex issues.
Philanthropists seeking to strengthen communities, policymakers aiming to achieve housing goals, and tech experts developing sector-specific AI models face a significant barrier: the absence of a data foundation. The truth is that building this infrastructure is not as exciting as funding an app or announcing a new initiative. But without it, it becomes impossible to allocate resources effectively and learn from experience, akin to attempting precision medicine with medieval anatomical charts.
The Human Genome Project was a thirteen-year global commitment that seeded a trillion-dollar industry. A comparable investment in housing data infrastructure would finally let us understand what works, fund solutions that scale, and unlock innovations we have yet to imagine.
