This book explores the use of data to improve conditions in America’s communities, with particular attention to low-income communities and the people who live in them. Using data more intensively and creatively in decision-making at the local, metro, state, or federal level in itself will not eliminate poverty, produce healthier lives, or fully address the other major social problems of our time. But data-driven decision-making can make a tremendous difference in results that communities and their residents care about.

Today, very few of the institutions that work in America’s communities can honestly be characterized as “data-driven.” However, the past two decades have seen truly remarkable advances in the availability of relevant data and the technical ability to use the information inexpensively and in exciting ways. In the coming decade, these new capacities are likely to spur fundamental changes in how community-oriented institutions operate and how policy is made. The essays in this book examine this potential and promising new practices, as well as barriers to be overcome.

Before we begin talking about the data, however, we need to be clear about scope. This book was motivated by the 2012 volume, Investing in What Works for America’s Communities, and the conversations it generated.[1] That volume endorsed an integrated view of community development, going well beyond the narrow “bricks and mortar” vision the field had settled into, to consider issues such as health, education, jobs, and connectivity. Similarly, although we, too, see community development playing a central role, we recognize the need for all institutions that affect our communities to use data more effectively. These include city and town governments as well as neighborhood groups, regional collaborations, social service providers, housing developers, community development financial institutions (CDFIs), public health agencies, private entrepreneurs, and philanthropy.

It is also important to note upfront that although this book talks most about how institutions can use data, we should not forget individuals and families. Important advances have made it easier for them to directly use data to make personal decisions, such as when to arrive at the bus stop or how to choose a doctor.

This first essay offers background information and framing to put the remaining essays in a broader context. We discuss four main themes:

  1. Emergence of the community information field describes how the intersection of new data, technology, and innovative institutions in the early 1990s led to a revolution in community information.
  2. Advances in the availability of data describes the sources of community data and new ways data are being transformed into useable information.
  3. Advances in the use of data opens with a framework on how community data can be used in decision-making, provides successful examples, and discusses roles of the actors in the process.
  4. Challenges—What is holding us back? reviews challenges and barriers to taking advantage of the full potential of community data and offers ideas on how to address them.

This essay is intended as an overview for the reader and cannot do justice to all these themes in this condensed format. For a more in-depth examination of the advances in the community information field during the past 20 years, we refer you to our recent book, Strengthening Communities with Neighborhood Data.[2]

Emergence of the Community Information Field

Those working to improve America’s communities have always wanted factual information on conditions in their neighborhoods, how those conditions were changing, and how they compared with those in other parts of the city, county, or state. As late as the 1980s, however, the only viable source for such information was the U.S. census, which is updated only once a decade. In addition, census data do not cover many issues critical to making communities better: rates of property tax delinquency, crime, teen pregnancy, and housing sales are but a few examples of data the census ignores. Locally funded surveys were theoretically possible but, because they were (and remain) enormously expensive, they were almost never conducted. Transaction-by-transaction data on these topics most often existed on paper, buried somewhere in agency files. But the high cost of pulling the records from filing cabinets and plotting them on a map so that staff could add up thousands of transactions by neighborhood and visualize problems and the impact of efforts to solve them almost always made the task infeasible.

The implications of not having such data, however, were serious. Because they did not know where problems were most severe, those working in neighborhoods had no way to systematically target their services. They also had no viable way to measure how the neighborhoods they worked in were getting better or worse from year to year.

By around 1990, however, technology introduced a solution to this problem. As record-keeping for many local government agencies was automated, transaction records containing a street address or some other geographic identifier were now electronic. In addition, marked improvements in Geographic Information System (GIS) technology meant that data could be linked to a location, summed by neighborhood, and mapped with remarkable speed and efficiency.
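The aggregation step this made possible—rolling address-level transaction records up to neighborhood totals—is conceptually simple once each record carries a geographic identifier. The sketch below uses invented records and neighborhood names purely for illustration; a real GIS workflow would geocode street addresses to neighborhood boundaries first.

```python
from collections import Counter

# Hypothetical transaction records, each already tagged with a
# neighborhood (the step GIS automated: matching an address to a
# neighborhood boundary).
records = [
    {"address": "12 Elm St", "neighborhood": "Hough", "type": "code_violation"},
    {"address": "48 Oak Ave", "neighborhood": "Hough", "type": "code_violation"},
    {"address": "7 Pine Rd", "neighborhood": "Glenville", "type": "code_violation"},
]

# Sum transactions by neighborhood -- once infeasible with paper files.
counts = Counter(r["neighborhood"] for r in records)
print(counts["Hough"])      # 2
print(counts["Glenville"])  # 1
```

With thousands of records, the same one-line aggregation replaces what the essay describes as days of pulling paper files and tallying by hand.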

Government departments found valuable ways to use these new capacities to analyze their own data; police departments mapping crime patterns is a prime example. Some hospitals began to examine data on the neighborhoods of their patients to better plan education campaigns and outpatient service delivery. Some cities began to release their GIS data on individual properties to the public, eventually over the web. The GIS data were an enormous time saver for community development corporations (CDCs) and other housing developers who previously had to spend days in city hall basements searching through musty paper records to collect enough information to make competent decisions about land assembly.

The ability to combine data sets was another major advance. It is difficult to learn much about neighborhood conditions and trends by looking at only one agency data set at a time. As a result, institutions in several cities convinced local agencies to share their administrative data files. They made the commitment to regularly assemble the data across agencies and make the indicators available to a variety of users and the public at large. In these cities, users only had to go to one place to get time series data by neighborhood on a variety of topics (such as health, housing conditions, crime, and public assistance).

The “data intermediaries” who built the cross-agency information systems had different types of institutional homes, including community-oriented university research centers, social service providers, and local foundations. Six of them joined together in 1995 to form the National Neighborhood Indicators Partnership (NNIP) to facilitate peer networking, advance methods, and inform social policy more broadly. The authors of this essay have led the team at the Urban Institute that coordinates the NNIP network, and we have noted several of the network’s contributions in this essay. As of 2014, nearly three dozen cities have adopted the NNIP model. Most began this work because of a strong interest in community building. These new capacities represented a true revolution at the time: moving from no data to very rich ongoing sources of information in just a few years.

The most important stories, of course, are not about information systems or underlying innovations in data availability and GIS technology. Rather, the most important stories are those about the practical applications they have yielded. These too have grown and are the subjects of many of the essays in this book.

The demand for richer data is now being generated by a much broader array of actors than initially. For example, although community developers in the early 1990s understood how vital data on multiple indicators of neighborhood conditions were, leaders in public health now see these data as essential as well. Their new emphasis on the social determinants of health requires them to understand all the interactions in poor neighborhoods that both positively and negatively affect the health of residents.

Advances in the Availability of Data

Since the mid-1990s, the data landscape for communities has continued to evolve. Our current challenges are sifting through too much data and accessing the right data for the decisions at hand. The term “data” has many different meanings, so we first provide an overview of the types of data and recent data trends. The final portion of this section describes ways that researchers, technologists, and policymakers are enhancing the value of the raw data stream for practitioners, through both easier access and integration of data across sources and organizations.

Types of Data

Many types of data can be useful to practitioners. In particular, data can be generated from administrative records, surveys, or qualitative methods. Administrative data are collected as part of an organization’s operations, whether to provide services, collect taxes, run the courts, or maintain property records, but the data can be repurposed for other uses. Government agencies generate a wealth of administrative data in operating programs, such as data on school performance, food stamp enrollment, or reported crimes. They also maintain information on property ownership and characteristics, both as a definitive legal record and to assess and collect taxes. Finally, government agencies collect data for national data systems, such as monitoring births and deaths through the Vital Statistics program.[3]

Other sectors also collect administrative data that could be valuable for designing community initiatives or service programs. Businesses maintain administrative records, such as credit data for businesses or individuals or grocery purchases tied to loyalty cards. Nonprofits create data systems that record client characteristics and social services or property information for housing development. To use administrative data most effectively, community users should understand the original motivation for the data collection, which determines which individuals and properties are included in the data, which fields are likely to be of higher or lower quality, and how often the data are updated.

The second major source of data is surveys. These surveys may be about people and households and can be conducted at many levels, including nationally, such as the American Community Survey from the U.S. Census Bureau, and in a single city, such as the survey conducted to support the Jacksonville Quality of Life study described in Ben Warner’s essay. Surveys may also catalog a mix of community assets, such as parks, schools, and churches, or risk factors, such as graffiti, liquor stores, or fast-food outlets. Finally, property surveys catalog characteristics of residential or commercial properties, documenting vacancy, land use, and building condition. The Detroit Residential Parcel Survey, commissioned by the Detroit Blight Removal Task Force, is one impressive recent example (http://www.timetoendblight.com/). The nonprofit organization Data Driven Detroit partnered with Loveland Technologies, a local technology company, to develop and implement a streamlined data collection and quality control system for collecting information about residential properties. In the resulting effort, called the Motor City Mapping Project, a team of 200 people collected information on almost 380,000 properties in just a few weeks. With this common set of information, government agencies, community developers, and neighborhood residents can better strategize about how to address blight and vacancy. Another innovation is that the project team is exploring how to keep the data updated, so that planners have contemporaneous information and can track progress over time.

The third major source of data comes from qualitative methods. These can be interviews with knowledgeable stakeholders, formal focus groups, or community meetings. These data can provide information about external factors affecting neighborhoods and families and resident perceptions and priorities. Several of the essays in this volume emphasize the importance of including qualitative data to paint a fuller picture of communities. Ira Goldstein, for example, describes the need to “ground-truth”—check the validity of—the neighborhood typology assignments derived from quantitative sources, incorporating data from visual surveys. The essays by Meredith Minkler and by Patricia Bowie and Moira Inkelas recognize how including the voices of community residents and program clients improves data quality and results in a fuller picture of a program’s impacts. Claudia Coulton recommends using interviews and focus groups to understand the nature of mobility in a given neighborhood, including people’s motivations and decision process for moving.

New Trends in Data Sources

Data kept in government computers are of only limited use. The open-government data movement, which in the past few years has grown significantly, is about overcoming that problem, and affects governments at all levels. A central premise of open data is that transparency of processes and information enables citizens to hold governments accountable, and technology can improve transparency. Advocates view government data as a public good that should be available to the taxpayers who funded their creation. Open data can also encourage citizen engagement in government decision-making and spur economic growth through private-sector applications. Emily Shaw’s essay explains that open data encompasses both practice, as in distributing data through a city open data portal, and policy, formal tenets adopted by governments or organizations. Although the origins of the open data movement are rooted in government responsibilities, more nonprofit organizations are integrating the principles of open data in their work.

The emergence of “big data” is often mentioned along with open data. Big data is defined in many ways but can be thought of as data with such high volume, velocity, and variety that traditional computational techniques cannot handle them. Examples include data generated by traffic sensors, which may capture readings every 20 seconds from hundreds of locations around a region. Big data also include new types of data that require new methods of analysis, such as posts to Facebook or Twitter or photos from Google Street View.

More important than the specific definition of big data is the need to connect technologists and data scientists who can visualize, manipulate, and find patterns in this wealth of data with the local nonprofits and governments who could benefit from new insights that data can generate about neighborhood conditions, resident sentiment, and human behavior. One example of this convergence is Data Science for Social Good, a University of Chicago summer program for aspiring data scientists to apply data mining and machine learning to projects proposed by governments and nonprofits on topics such as education, health, energy, and transportation. DataKind, a nonprofit organization based in New York City, has a similar mission to bring new skills to benefit community groups. The organization brings nonprofits working on community problems together with data scientists to improve the quality of, access to, and understanding of data in the social sector. DataKind sponsors weekend “Data Dives,” arranges technical assistance engagements of up to a few months, and facilitates ongoing local relationships through their five local chapters.

Adding Value to Raw Data

Raw data alone are a little like the ingredients of a cake—necessary, but only useful if put together thoughtfully. Integrated data systems (IDS), for example, link individual-level records over time and across data sets from different programs and agencies, such as school performance, child welfare, birth data, and juvenile justice data.[4] IDS allow us to ask new questions about how various public systems overlap and point to a more holistic approach to helping children and families. Rebecca London and Milbrey McLaughlin share lessons from their experience with the Youth Data Archive, an IDS for several counties in California that is governed by a university-community partnership. The authors describe how the IDS process combined with a shared research agenda can support collaboration and improve youth services. John Petrila’s essay discusses the tension between using linked data sets like the Youth Data Archive to inform policy and the privacy and other concerns that emerge from the use of such data. He relates the political, legal, and technical challenges of establishing and using linked data and suggests potential solutions.
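The core operation of an IDS—linking individual-level records across program files—can be sketched in a few lines. Everything in this example is hypothetical (the person IDs, fields, and values are invented), and real systems add de-identification, probabilistic matching, and legal safeguards that a sketch cannot show.

```python
# Two hypothetical program files keyed on a shared (de-identified)
# person ID: school records and child-welfare records.
school = {
    101: {"grade": 5, "attendance": 0.92},
    102: {"grade": 7, "attendance": 0.80},
}
child_welfare = {102: {"open_case": True}}

# Link the files: each child's school record, enriched with a flag
# for an open child-welfare case (False when no record matches).
linked = {
    pid: {**rec, "open_case": child_welfare.get(pid, {}).get("open_case", False)}
    for pid, rec in school.items()
}
print(linked[102])  # {'grade': 7, 'attendance': 0.8, 'open_case': True}
```

The linked view is what lets analysts ask the cross-system questions the essay describes, such as how school attendance differs for children with open child-welfare cases.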

Linking data is an approach that extends beyond government-generated data. Robert Avery, Marsha Courchane, and Peter Zorn document the National Mortgage Database (NMDB), which links data for a sample of mortgage loans with credit and property data to gain a better picture of the housing finance market and borrower behavior. The database will break new ground in many ways, primarily by bringing together extensive private data sets on loan performance and borrower credit activity with public data, including property records, and adding occasional surveys. This will enhance understanding of both the workings of the mortgage market and the impact of housing on the broader economy.

Visualization is another way to add value to raw data. Ren Essene and Michael Byrne’s essay describes an excellent example of this. The Home Mortgage Disclosure Act (HMDA) dataset on mortgage originations has been available to the public for many years and has served as the basis for many valuable studies. The source data set, however, is large and difficult to manipulate. The work described in the essay made critical elements of HMDA quickly accessible as interactive charts and maps on the website of the Consumer Financial Protection Bureau (CFPB). In addition, the CFPB created a series of tools that make it easy for users with minimal computer skills to download understandable exhibits they can use directly. Code for America is a nonprofit organization committed to expanding user-centered design in local government websites. The organization brings advanced technology into the public sector through their Fellows program, which matches teams of designers and technologists with individual local governments for a year to tackle a proposed problem.

Advances in the Use of Data

Although access to raw data is essential, using data for decision-making requires transformation, including excerpting selected data from a larger data set and then arranging the selections so they provide information to decision makers at the right time and in the right form. This section begins by framing how community data can be used to support better decisions. It then offers examples of integrated applications and discusses roles of different actors and broader applications.

Using Data to Make Better Decisions

Both the federal government and philanthropic funders have been urging communities to engage in data-driven decision-making. But what does that mean? There are three distinct functions in which data are critical to making good decisions, although in practice, they are often performed jointly:

  • Situation analysis: learning more about the problems and opportunities you face and their implications;
  • Policy analysis and planning: deciding the course of action to address your situation;
  • Performance management and evaluation: tracking and assessing the progress of your selected course of action, and making mid-course corrections based on what you learn.

Situation Analysis: Ben Warner’s essay discusses “Community Indicators Projects,” in which a group of local stakeholders (in one neighborhood or citywide) get together and select a multi-topic set of indicators they think best reflects their collective well-being. They recurrently collect data on the indicators and look over the results to see where things are getting better or worse, and by how much. Indicator projects are most effective when they lead directly to responsive action. In Warner’s community (Jacksonville, Florida), for example, a citizens’ committee reviews the recent trends and, depending on whether they are positive or negative, assigns either a “gold star” or a “red flag.” Every year, the organization selects one or more of the red flags to “mobilize the community for action through a shared learning engagement and advocacy process.”

Using data in this way is key to setting priorities: it keeps the focus on the most urgent issues and avoids overinvesting in problems that are already improving on their own. For example, suppose that, even though foreclosures have been a hot topic in the last few neighborhood association meetings, new data show the neighborhood actually ranked low on that indicator compared with others and the rate was dropping fast. Alternatively, suppose that although the neighborhood ranked very low on juvenile crime in the prior year, the new data show that it had the second-highest increase in the juvenile crime rate since then. Timely data that allow a community to distinguish between such trends are essential to the ability to allocate resources effectively.

Sometimes, the facts uncovered in a situation analysis are enough in themselves to force a decision. In Providence, Rhode Island, for example, laws regarding sales of tax-foreclosed properties were revised after solid data demonstrated that the share of these properties purchased by slumlords had been shockingly high for several years.[5]

Policy Analysis and Planning: This function involves using data to design an effective course of action to respond to the findings of the situation analysis. It implies conceptualizing alternative ways of doing things and then assessing and comparing the likely advantages and disadvantages of each. The main uses of data in this function revolve around improving the ability to estimate the likely effects of the alternatives: in what ways and by how much they will change desired outcomes, how much they will cost, what side effects and unintended consequences they might produce, and so forth.

Community development practitioners have done this type of analysis for years when developing real estate: creating financial pro formas on alternative designs and schedules for new housing and other real estate projects and comparing the estimated rates of return. The recent innovations have been in use of data-driven analysis to uncover and quantitatively evaluate a more complicated mix of effects for a more complicated mix of programmatic actions.

So far, we know of no attempts to estimate the effects of all activities in a complex (multi-program) community initiative. But there is evidence that many local institutions are devoting considerably more effort to assembling and analyzing data to back up their planning than they did in the past. For example, the essay by Erika Poethig describes the evolution of Chicago’s use of data to enhance the quality of the city’s five-year plans for affordable housing. Nancy Andrews and Dan Rinzler describe how the Low Income Investment Fund’s innovative Social Impact Calculator leverages existing social science research to estimate the dollar value of the social benefits of their investments. Other essays in this volume discussing similar efforts to measure potential and actual program effects include Aaron Wernham’s essay on health impact assessments and Ian Galloway’s essay on making “impact investment” decisions.

Ira Goldstein’s essay describes a tool called Market Value Analysis, which starts by assembling a substantial amount of data on a city’s neighborhoods and then uses a statistical procedure (cluster analysis) to sort neighborhoods into several different “types,” based mostly on their housing market conditions. The contrasts among the market types are vivid and help city officials and community groups understand both the situations different neighborhoods face (situation analysis) and which mixes of programmatic actions are likely to work best in which types of neighborhoods (policy analysis). Interventions may involve cleaning up vacant lots, intensive code enforcement, stimulating market-based rehabilitation, acquiring and demolishing vacant buildings, or a mixture of strategies.
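The cluster-analysis step Goldstein describes can be illustrated in miniature. The example below is purely hypothetical (the neighborhood labels, prices, and two-cluster choice are invented) and groups neighborhoods on a single housing indicator; an actual Market Value Analysis clusters on many indicators at once using a full statistical routine.

```python
# Median sale price (in $1,000s) for five hypothetical neighborhoods.
prices = {"A": 40, "B": 45, "C": 50, "D": 180, "E": 200}

# A minimal 1-D k-means: start centroids at the extremes, then
# alternately assign neighborhoods to the nearest centroid and
# recompute each centroid as its group's mean.
centroids = [min(prices.values()), max(prices.values())]
for _ in range(10):
    groups = {0: [], 1: []}
    for name, price in prices.items():
        nearest = min((0, 1), key=lambda k: abs(price - centroids[k]))
        groups[nearest].append(name)
    centroids = [sum(prices[n] for n in groups[k]) / len(groups[k]) for k in (0, 1)]

print(groups[0])  # ['A', 'B', 'C']  -- the weaker-market "type"
print(groups[1])  # ['D', 'E']       -- the stronger-market "type"
```

The vivid contrast between the two groups is the point: once neighborhoods are sorted into types, officials can match program mixes to market conditions rather than applying one strategy everywhere.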

Foundations are emphasizing another use of data in planning: “evidence-based practice.” Do data from other locations show that a program you are considering is effective or not? If so, how did it work? Groups can use data on similar programs’ inputs, approaches, and effects (including changes in outcomes, costs, side effects, etc.) to estimate probable program effects. Although not all good solutions have been tried and evaluated, and many solutions that work in one situation are not directly replicable in another, programs whose inputs, outcomes, and impact are well documented can definitely provide practitioners with a better place to start.

Data are also key to flexibility—the ability to make mid-course corrections and multiple iterations as results suggest certain strategies are more or less successful, and as new problems develop. For some urban planners in the past, the idea was to make a good plan and try to stick with it. As the world becomes increasingly complex and uncertain, being adaptable may be more effective than consistency. If one can quickly spot an important new trend, it is easier to adjust plans and secure better outcomes. As Patricia Bowie and Moira Inkelas demonstrate, this flexibility enabled by real-time data is important to improving outcomes for individuals as well as communities.

A new data-driven approach will not completely replace the way practitioners design strategies and programs. It will always be important to weigh the probable positives and negatives of different ways of doing things, based on experience, “how the world works,” and the facts at hand. Good judgment in this process will always be essential. But data will be used more often as a basis for estimating the effects of alternative strategies, replacing more of the guesswork and intuitive choices, and enhancing the predictability of outcomes.

Performance Management and Evaluation: Pressure to use data to expand the accountability of social programs has grown markedly over the past decade.[6] Whatever the intervention (whether it is a single youth employment program or a full comprehensive community initiative), the implication is that the managers need to select a set of indicators of the results they are trying to achieve. Then, as the work is underway, they regularly collect data on those indicators, hold meetings to review what has happened (good and bad), and design mid-course corrections to program plans as the data may suggest.

Performance management uses data to improve program performance in the short to medium term. In contrast, program evaluation attempts to determine whether a program has met its goals over the long term. Performance management is conducted by the program’s management team while the program is underway, whereas evaluation is most often conducted by outsiders after the fact. Over the past two decades, funders have supported evaluation much more often than performance management.

Victor Rubin and Michael McAfee’s essay explains the requirements for effective performance management in the federal Promise Neighborhood initiative. They describe how a standard data and technology infrastructure can facilitate performance management. They acknowledge the challenges in cultivating an organizational culture that views data as essential to getting results, but share examples in Hayward and Nashville where data-driven approaches are taking hold. They also demonstrate how a national intermediary can support better practice locally. The essay by Cory Fleming and Randall Reid describes a similar process, “Performance Stat,” that has been adopted by a sizable number of state and local government agencies in the past few years.[7] Features seen as key to the success of this approach are insistence on the involvement of high-level officials in the management reviews and holding those reviews frequently and regularly, as well as careful thinking to select the right metrics up front. Review meetings work best, the authors suggest, when they are not mainly about celebrating success or addressing failure, but when they focus on figuring out what worked, what did not, and why, and then revising plans accordingly.

The essays by Susana Vasquez and Patrick Barry, and by Alaina Harkness, consider the application of performance management in nonprofit-managed community development. After-the-fact evaluations will still need to be supported, but Harkness argues that funders should place higher priority on building the data capacities of their grantees so the grantees can better manage their own programs in the short term.

Some communities are using “collective impact” strategies to make improvements. Collective impact is an expanded form of performance management that recognizes that most fundamental societal objectives (such as improving education) cannot be achieved by individual institutions working in a field one-by-one.[8] Rather than each institution employing performance management to improve results in its own narrowly defined silo, collective impact joins all of the relevant actors together in one performance management process, committing to the same overarching goals and using an ongoing system of “shared measurement” to track performance against the goals. To date, the collective impact approach has been applied most often to citywide or regional objectives. The most notable example is the “Strive” initiative, which focuses on education objectives in the Cincinnati area and other cities. However, the approach has now been applied successfully to many other problems and opportunities, including finding jobs for public housing residents (Chicago), reducing violent crime (Memphis), and addressing childhood obesity (Somerville, MA).[9]

It is important to recognize the differences between performance management and program evaluation. Ideally, program evaluations determine the extent to which the program caused the final outcomes that are observed. The only sure way to do that is to construct a plausible counterfactual. For example, if program participants are randomly assigned to either an experimental group (which receives the program treatment) or a control group (which does not), and the context for each group is the same or very similar, one can typically say that the program caused the differences in outcomes.[10] These randomized controlled trials (RCTs) are considered the gold standard but are extremely difficult to construct for complex efforts such as multi-program community initiatives. A variety of alternative approaches have been proposed that, even though they cannot meet the RCT standard in full, can provide useful information to guide decisions about future investments.[11] In their essay, David Fleming, Hilary Karasz, and Kirsten Wysen wrestle with these issues in evaluating programs that attempt to address the social determinants of health. Raphael Bostic explains why insisting on RCTs as the only standard of evidence may hinder, rather than promote, evidence-based policymaking.

Putting It All Together

So far, we have reviewed the three basic elements of data in community decision-making separately. In reality, however, they are often combined and span one or more institutional environments. And the process of decision-making normally does not occur in an orderly sequence; there is a considerable amount of back and forth among the elements. As one example, the essay by Alex Karner and his colleagues at the University of California–Davis Center for Regional Change describes their work to help diverse stakeholders in California’s San Joaquin Valley prepare sustainable communities strategies that incorporated equity values, in a region characterized by significant inequality. This involved examining and presenting new data on multiple dimensions and using those data to devise collaborative regional planning strategies to advance social equity. Rather than following a pre-determined linear process, the personal relationships, staff capacity, and political climate shaped the ways in which the local advocates and planners incorporated data into regional planning. The next three sections illustrate other approaches used to expand data-driven decision-making by local players.[12]

Sharing Data Within a Place: The Camden Coalition of Healthcare Providers (CCHP) has developed an integrated (shared) data system that includes demographic, diagnosis, and financial information for all admissions and emergency room visits made by city residents to the city’s three main hospitals. Analysis showed that just 1 percent of the 100,000 people who used Camden’s medical facilities accounted for 30 percent of all costs. Under the leadership of Jeffrey Brenner, a young physician, the coalition’s new approach focused on identifying and developing trusting relationships with many of these “super-utilizers.” Care is provided in home visits or over the phone and consists of services that emphasize prevention, such as helping patients find a stable residence, ensuring they take their medications on schedule, and addressing their smoking and other substance abuse problems. The data system provided substantial information on each patient, allowing providers to target services sensitively. Results for the first 36 patients were impressive: the average number of hospital and E.R. visits for this group dropped from 62 per month before joining the program to 37, and the group’s average hospital bills declined from $1.2 million per month to just over $500,000. The data were also used to target community-based interventions for diabetes. A New Yorker article featured this approach, and the coalition is now assisting other communities trying to build similar systems.[13]
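The core of the Camden analysis, ranking patients by cumulative cost and isolating the most expensive slice, can be illustrated with a minimal sketch. The record layout and function name below are hypothetical stand-ins for illustration, not CCHP’s actual system:

```python
from collections import defaultdict

def find_super_utilizers(visits, share=0.01):
    """Identify the top `share` of patients by total billed cost.

    `visits` is an iterable of (patient_id, cost) records, a hypothetical
    stand-in for hospital admission and E.R. billing data.
    Returns (list of top patient ids, their share of total cost).
    """
    totals = defaultdict(float)
    for patient_id, cost in visits:
        totals[patient_id] += cost

    # Rank patients by cumulative cost, highest first
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top_n = max(1, int(len(ranked) * share))
    top = ranked[:top_n]

    top_cost = sum(cost for _, cost in top)
    all_cost = sum(totals.values())
    return [pid for pid, _ in top], top_cost / all_cost
```

Applied to records like Camden’s, this is the kind of computation that reveals a small fraction of patients accounting for a disproportionate share of costs.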

NEO CANDO (the Northeast Ohio Community and Neighborhood Data for Organizing), developed and maintained by the Center on Urban Poverty and Community Development at Case Western Reserve University, is a property-based information system that illustrates key advances in the field.[14] The system incorporates vast amounts of data from many sources, including property-level data on topics typically found in the files of local property tax assessors and recorders of deeds, such as ownership, physical characteristics, tax amounts and arrearage status, sales transactions, and sales prices. It also integrates, and makes available on a real-time basis, records of other city departments (e.g., housing code violations, building and demolition permits) and other data that are normally either unavailable or not integrated with other property records in a usable manner (e.g., vacancy status, foreclosure filings, sheriff’s sales, REO status).

The data are used for many purposes. The most notable is to support decisions about what to do with individual properties within neighborhoods. Groups of Cleveland stakeholders (CDCs, other nonprofits, and city officials, with support from NEO CANDO staff) meet jointly to examine recently updated parcel-level maps, tables, and analyses, paying attention to the spatial clustering of conditions as well as the circumstances of individual properties (situation analysis). Fairly sophisticated analyses have been used to support decisions by the Land Bank, CDCs, and others about which buildings warrant demolition or rehabilitation. The data also help the city’s code enforcement staff and other special purpose agencies and nonprofits to prioritize their activities. Because the data on individual properties are regularly updated, they show changes in status that can directly serve as a basis for performance management, answering questions such as: What types and how many properties did we address? What happened as a result of our efforts? And how rapidly?
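A simplified sketch of the kind of parcel triage such a system supports follows. The scoring rule and field names are invented for illustration; a real system like NEO CANDO draws on far richer assessor, code-enforcement, and court records than this:

```python
def prioritize_parcels(parcels):
    """Rank parcels for attention using simple, hypothetical criteria.

    Each parcel is a dict with illustrative fields (vacant, code_violations,
    tax_delinquent, foreclosure_filed). Higher scores suggest a parcel
    deserves earlier review for demolition, rehab, or enforcement.
    """
    def score(parcel):
        # Weight vacancy most heavily; booleans count as 0/1 in Python
        return (2 * parcel.get("vacant", False)
                + parcel.get("code_violations", 0)
                + parcel.get("tax_delinquent", False)
                + parcel.get("foreclosure_filed", False))

    return sorted(parcels, key=score, reverse=True)
```

In practice the weights themselves would be a subject of the stakeholder meetings described above, not fixed in code.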

Participants have said that the data and process of using the information in this way have been an important boost for collaboration and influence. That the organizations were all operating from the same data, and that the data were themselves broadly available, promoted broader inclusiveness and diminished controversy. Participants were less likely to disagree because they knew the reasoning and facts behind the choices that had been made.

Sharing Data Across Places: Shared measurement can also be helpful when individual organizations in different cities that belong to a network or industry agree to collect data so that selected indicators can be brought together centrally and the results shared with all participants. The essay by Maggie Grieve offers a useful framing of this approach and describes several examples, including a multi-year joint effort by NeighborWorks America and the Citi Foundation to capture and assess outcome measures for 30 Citi grantees operating financial coaching programs. This essay recognizes that shared measurement systems have special data quality and data consistency challenges, requiring common data standards (or crosswalks) to make them work.

Another valuable example is the emerging outcomes initiative described in the essay by Bill Kelly and Fred Karnas. Stewards of Affordable Housing for the Future (SAHF) members will collect consistent data on the outcomes of efforts to holistically improve the well-being of residents of affordable housing developments. The same underlying concept is behind the CoMetrics and HomeKeeper data sets, described in the essay by Annie Donovan and Rick Jacobus, which facilitate decisions by cooperative businesses and community land trusts, respectively. The Aeris Cloud, which Paige Chapel discusses, allows CDFIs and investors to track a variety of financial and performance metrics against peers. This helps meet the information needs of capital markets while enabling both CDFIs and investors to realize efficiencies through standardized reporting.

The main motivation for these systems is to help improve decision-making by the individual participants. With access to the central system, participants can see how their own characteristics and performance compare with similar organizations on any number of selected metrics, and then adjust their own strategies accordingly. Comparing differences in program approaches against differences in results can yield the greatest benefit as managers think through the factors behind successful performance in a way they never could with internal data alone. An added benefit is that because data can be aggregated across multiple entities in a network, the resulting information informs users of the health and impact of the entire network.
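The peer comparison at the heart of shared measurement can be sketched in a few lines. The function below, with assumed inputs, shows how a participant might locate itself against network peers on a single metric:

```python
from statistics import median

def benchmark(own_value, peer_values):
    """Compare one organization's metric to its network peers.

    Returns the peer median and this organization's percentile rank,
    the kind of comparison a shared measurement system makes possible.
    """
    below = sum(1 for v in peer_values if v < own_value)
    percentile = 100.0 * below / len(peer_values)
    return median(peer_values), percentile
```

A manager seeing, say, a 50th-percentile result on one metric and a 10th-percentile result on another knows where to look for program differences, which is exactly the reflection internal data alone cannot prompt.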

Dashboards: Too much data can be almost as dysfunctional as too little. Thus, higher priority is now being given to “boiling down” the data to focus on the most important and informative metrics—that is, a collection that will be manageable—and displaying results in ways that enable users to quickly grasp the main messages. One such tool is the “dashboard,” normally a one-page summary of key results presented in an easy-to-read (and remember) display. The dashboard directs attention to a comparatively small number of indicators, but this focus does not mean discarding the rest of the data set. In today’s best practice, organizations maintain much more data than they put on their dashboards and use the information in supplementary analyses to shed light on the forces at work behind the key results. Bridget Catlin’s essay on county health rankings explains both the allure and perils of dashboards and “indices” and offers a broader assessment of visualizing and communicating information through design and display. Ben Warner’s essay also offers useful guidance on dashboards.
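As a rough illustration of this “boiling down,” the sketch below selects a handful of named metrics from a larger indicator store and renders a plain-text dashboard line for each. The metric names and the value-versus-target status rule are assumptions for illustration:

```python
def build_dashboard(indicators, key_metrics):
    """Boil a large indicator store down to a one-page dashboard.

    `indicators` maps metric names to (value, target) pairs; `key_metrics`
    names the handful chosen for display. The full store remains
    available for supplementary analysis behind the dashboard.
    """
    rows = []
    for name in key_metrics:
        value, target = indicators[name]
        status = "on track" if value >= target else "needs attention"
        rows.append(f"{name:<24} {value:>8}  (target {target})  {status}")
    return "\n".join(rows)
```

The design choice worth noting is that the dashboard is a view over the data, not a replacement for it: nothing is discarded when a metric is left off the page.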

Actors, Roles, and Broader Uses

Using data effectively depends on more than the data; it also depends on who is involved, what roles they play, and how they use the data beyond its initial purpose.

Involving the Community: As in other endeavors, many decisions affecting a community can legitimately be made solely by the staff and managers of one “professional” institution, like a Head Start provider deciding how many staff to hire for next year, or a community health center deciding which of two alternative pieces of equipment to buy. In many cases, however, a community’s residents and their institutions should be involved in the process, whether they are directly involved in making the decisions or are consulted during the process. This is particularly true for more comprehensive community development initiatives, but it is also true for some decisions about individual programs, including, for example, the overall strategies of the Head Start provider or community health center.

Since 1995, “democratizing information” has been the theme of the local data intermediaries in NNIP. This means in part that, rather than conducting the analysis and writing the report themselves, they take the data to the community. They help the residents and community groups probe the data and structure them to support decisions that will benefit the community. The intention is that at the end of the day, the residents recognize that the decisions that have been made are their decisions. They “own” the process and its results, and the data intermediary is only their coach and facilitator. Some of the most valuable experiences have come when a neighborhood group understood a community problem (for example, that crime in the neighborhood is most prevalent near vacant buildings) and saw a solution (focus police resources near the vacant buildings), but the powers in the city paid no attention. Yet once the community presented maps showing the overlay of crimes and vacant buildings at a public forum with the press in attendance, policies began to change.

The essay by Meredith Minkler on community-based participatory research takes the concept a step further, explaining that involving the community in data collection generates more relevant data as well as more effective actions in response—albeit not without potential inconsistency with academic or programmatic standards. The essay by Patricia Bowie and Moira Inkelas discusses the development and use of data in real time by community residents and service providers to enhance health outcomes. In addition to quantitative data, these strategies encourage collection of qualitative data, which can convey understanding of critical topics that are normally impractical (often impossible) to quantify, such as community perspectives, social networks, and community assets. The greatest payoff comes when stakeholders can use qualitative and quantitative data in mutually reinforcing ways.

Empowerment, Education, and Advocacy: The data that are generated for all phases of community decision-making can be extremely powerful in engaging diverse audiences and changing mindsets. Linking data to a plausible argument can convince neighborhood residents to get involved in an initiative, local philanthropies to provide needed funding, city councils to revise unproductive legislation, and the public to change their vote on an issue of community concern.[15]

Raphael Bostic’s essay makes this point in stark terms at the federal level, contrasting the relative budgetary success of programs to counter homelessness and the difficulties housing counseling programs have had getting and retaining funding. The data provide the credibility that is essential to both advocacy and longer-term education. But Bostic argues the data alone are not enough. Successful data-driven decision-making also requires a narrative, a clear story that makes a case that the public and relevant policymakers can understand, and an effective communication vehicle, such as publications, meetings, websites, and other strategies that will reach the relevant audience and convince them to act.

Enhancing Individuals’ Decisions: The information revolution is not just about institutions. It is also enabling individuals, including the residents of low-income communities, to access and use data directly. Many of these applications are quite straightforward, such as apps that show city buses’ arrival times or street snowplowing in real time, but many are more complex. These include online tools that help individuals use personal data that may not be easily accessible to them to improve their lives, such as when applying for public benefits, preparing their tax returns, or developing new job skills. For example, expunge.io, a Code for America app built in collaboration with the Mikva Juvenile Justice Council in Chicago, helps people who were arrested when they were under 18 determine whether they are eligible to have their records erased (expunged), access those records, and apply for the expungement process.

Amias Gerety and Sophie Raseman’s essay introduces My Data, an emerging strategy that allows individuals to access and bring together the electronic records that institutions keep about them (e.g., the records of doctors, schools, employers, utility companies) and use the data themselves for a variety of purposes (such as credibly verifying their situations to third parties such as mortgage lenders or prospective employers). They also discuss “Smart Disclosure,” the release of multiple data sets that allow developers to build applications that, for example, help individuals to compare potential college choices, taking into account interest, cost, and likely return.

Supporting Neighborhood Research: More and better use of data in research is essential to the future of low-income communities. City neighborhoods are varied and complex. Thus far, we have little capacity to predict how they will change, or to understand the interaction of forces that produce change and the implications of the changes that do occur. Recent work by Claudia Coulton has documented advances in neighborhood research and trends in work on key questions that remain unanswered.[16] A better understanding of the dynamics of neighborhood change will benefit all institutions engaged in community improvement. And better data will enhance that research.


The last 25 years have seen impressive advances in the capacity to use data to improve conditions in low-income communities. But much remains to be done before these communities can take full advantage of this potential. We must overcome challenges in the availability of good data, tackle privacy and confidentiality issues, improve our ability to use the data, and implement today’s best practices in many more places. We urge you to keep these challenges in mind as you read the remaining essays. The concluding essay will return to these themes and suggest ways for the field to move forward.

The Availability of Relevant Data

Although progress is being made, we need forceful efforts to address five data-availability problems:

Access. Many relevant government data files are still not released to the public. Local data intermediaries (such as NNIP partners) have succeeded in convincing local agencies to share data broadly in several locales, and the open data movement has motivated sizable online data releases in many localities. Although this still represents a very small share of what should be released, the trend is in the right direction and accelerating. Focused efforts to get more program managers to share their nonconfidential data externally are still needed, but their willingness to do so appears to be expanding.

The challenge is different of course with systems that contain confidential information on individuals and households. The highest standards must be met to ensure that confidential information will not be released to the public. Even in these cases, however, some types of data (often summarized) are becoming more available for use in the public interest. The work discussed earlier on integrated data systems demonstrates that professionals are finding ways to use selected data in such systems for legitimate purposes while rigorously protecting the confidentiality of individual records.[17]

Another serious concern is public data becoming proprietary; that is, governments either sell public files directly or license the data to private firms that then charge often prohibitively high rates to would-be users. The public has already paid for the creation of the original data and should not have to pay a second time to access them. A strong national effort should be mounted to develop effective policy for these situations.

Quality and Timeliness. Many administrative records, especially at the local level, are still replete with errors. One of the most useful steps to reduce errors is to make the commitment to release files to the public. That commitment creates strong pressures on managers, and thereby staff, to improve (or create) strong quality control procedures. Ideally, such procedures would include systematic feedback loops that: (1) share data collected by the line staff back with them, ideally embedded in a process that encourages them to understand data issues and improve day-to-day operations; and (2) make it easy for nongovernment users to share identified errors with the agencies that own the data and have the authority to correct the source files. Timeliness is also critical to the value of data. For some types of decisions, data that are a year old, or even a few weeks old, are useless. Fortunately, this is an area seeing rapid progress. We are moving toward a time when a considerable amount of administrative data will be available to users on a real-time basis. The essay by Patricia Bowie and Moira Inkelas explains how important this is, as does our discussion of Cleveland’s property information system (NEO CANDO).

Usability. The open data movement has resulted in a growing number of raw administrative data sets released to the public over the web. Many are complex and can be used directly only by experts. More effort is needed by the originating agencies or intermediaries to transform these data sets into more accessible sources for a wider range of community stakeholders. The HMDA visualization and query tools described earlier are excellent examples of more accessible data.

Fragmentation. Most administrative data sets released by cities are the products of individual agencies and, because of different standards and protocols, the data sets cannot be used together. Yet, as we have noted, some of the most valuable community applications require the joint use of data from different sources. Data intermediaries (such as the NNIP partners and those developing integrated data systems) have solved this problem at the local level by developing data-sharing agreements across agencies, excerpting relevant data from various files and integrating them to create consistently defined indicators for common geographies. The problem for the field at this point is that such integration is not yet underway in nearly enough places.
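The mechanics behind such integration can be sketched simply. Assuming a hypothetical crosswalk from census tracts to police districts (the geographies and field names are illustrative), indicators from two agencies can be placed side by side:

```python
def merge_with_crosswalk(health_by_tract, crime_by_district, tract_to_district):
    """Combine indicators from two agencies that use different geographies.

    A crosswalk mapping census tracts to police districts lets tract-level
    health data sit beside district-level crime data, the kind of
    integration that local data intermediaries perform.
    """
    merged = {}
    for tract, health in health_by_tract.items():
        district = tract_to_district.get(tract)
        merged[tract] = {
            "health_index": health,
            # None where no crosswalk entry or no crime data exists
            "crime_rate": crime_by_district.get(district),
        }
    return merged
```

The hard work in practice is not the merge itself but negotiating the data-sharing agreements and building the crosswalks, which is precisely what is missing in most places.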

Topical Coverage. Administrative data sets are compiled for operational purposes. It is not surprising that they do not contain information on a number of topics that are important to understanding and addressing neighborhood change. Claudia Coulton’s essay offers one example. She points out that it is impossible to understand shifts in some key neighborhood conditions without data on residential mobility, but that these data are hardly ever available in neighborhood indicator systems. However, as more managers of social programs recognize the importance of data about mobility, some programs are considering collecting data on moves of program participants. Similarly, data on characteristics of neighborhood social networks are not available in administrative data sets. There is hope here too, however. It has been suggested that it may ultimately be possible to obtain information on such networks by creative analysis of data from Facebook and/or other social media sites.

However, some concepts important to neighborhoods may never be captured in administrative data sets. In these cases, priority should be given to studies (involving limited surveys, analyzed in conjunction with administrative data) that scientifically identify administrative indicators that serve as good “proxies” for the missing concepts. Where this does not work, surveys and qualitative research remain the only possibilities. We hope that expanding the coverage of administrative data into new areas and aggressively searching for proxies will provide information on more community topics we now know little about, freeing up resources for better-focused surveys and qualitative work on key topics still not covered.

Tackling Privacy and Confidentiality

Open, big, and linked data raise the stakes for privacy issues, as it becomes easier to combine sources to identify individuals. One challenge is disclosure practices and norms related to the collection and use of data. Whether sensors are installed in neighborhoods to track pedestrian traffic or health records are consolidated to improve health care delivery, people may question the monitoring and lack of advance notification. In another example, employers may use proprietary data compiled from numerous sources, such as credit agencies, online commerce websites, or social media, to determine eligibility for employment. The applicant may or may not have a chance to review this data or even know that the information is being used in the process.

The nonprofit sector also needs to be concerned with these issues. The sector can play a role in asking governments and private companies to be more transparent in how data are collected, shared, and used in decision-making, but nonprofits should also be concerned about improving their own practices. As we encourage service and community groups to collect and use data about their clients, more training in obtaining permissions and sharing confidential data responsibly will be needed.

Nevertheless, as the successful implementation of integrated data systems in a number of communities and contexts demonstrates, personally identifiable data can be shared in a way that simultaneously protects people’s information and creates new understanding to benefit both the individual and the community. Advanced technology can assist in controlling permissions and structuring queries and in implementing analytic approaches like masking and synthetic data. The Data Quality Campaign has developed materials on communicating with the public on privacy issues related to education data, but more work is essential in all topic areas.[18]

Capacity to Use Data Productively

Although the barriers to making more relevant data accessible to community actors remain serious, the more formidable challenges lie in making productive use of data that are already available. We see a chain of reasons for these challenges. The underlying problem is that many of the institutions that work in low-income communities are not yet committed to regularly conducting the systematic management processes that create the demand for good data: situation analyses, policy analysis and planning, and performance management.

Why is that? One issue is the lack of education and training about data and practical approaches to using them for staff in community organizations. Another reason is the comparatively slow pace in the development of automated “decision-support” processes and tools. These aids could dramatically simplify the task of manipulating, structuring, and presenting the right data at the right time in any decision-making process. These tools also encourage the use of standardized data, which enhance effectiveness both across and within organizations.

But even if the relevant practitioners were adequately trained and motivated to be strong advocates for more systematic and data-driven decision-making in their organizations, the pace would still be very slow. Assembling and applying data to the complex processes that make up a community would still be too much work for most practitioners to handle on their own. Practitioners’ energies need to be focused on the already challenging work of community development. They should be responsible for improving their own internal data systems and using them more effectively. Although they will need to learn more about using data if they are going to move into a truly data-driven world, they will need help in the process. Therefore, perhaps the most important barrier is the lack of adequate institutional infrastructure to simplify the work of the front-line organizations in assembling and using data. New intermediary services are needed to help them build or transform their own internal information systems so they work with greater efficiency and are structured to support better decision-making. Intermediaries are also needed to help practitioners acquire and take advantage of data from other sources.

Many of the intermediaries should be local. NNIP partners, who consider their primary mission to be assembling local data and ensuring community institutions use the data, are good examples. But, as noted earlier, this type of organization does not yet exist in many places. Supporting infrastructure for data also needs to be strengthened substantially at the national level.

Notwithstanding these challenges, our overall conclusion about the state of the field is considerably more positive than negative. There is now substantial momentum behind expanding the availability of relevant data to help communities function better for the benefit of their residents. While lagging, efforts to develop tools and processes to help local practitioners put the data to productive use are accelerating as well. We are nearing an important inflection point. The coming decade could well see these new capacities spur fundamental changes in how America’s communities function.

[1]   Nancy O. Andrews and David J. Erickson, eds., Investing in What Works for America’s Communities: Essays on People, Places and Purpose (San Francisco: Federal Reserve Bank of San Francisco and Low Income Investment Fund, 2012). Available at www.whatworksforamerica.org.

[2]   G. Thomas Kingsley, Claudia J. Coulton, and Kathryn L.S. Pettit, eds., Strengthening Communities with Neighborhood Data (Washington, DC: Urban Institute, 2014), available at www.urban.org/strengtheningcommunities.

[3]   See Claudia J. Coulton, “Catalog of Administrative Data Sources for Neighborhood Indicators” (Washington, DC: Urban Institute, 2008).

[4]   For more information about integrated data systems, see the Actionable Intelligence for Social Policy website at http://www.aisp.upenn.edu/

[5]   Jake Cowan, Stories: Using Information in Community Building and Local Policy (Washington, DC: Urban Institute, 2007) provides accounts of several other cases where bringing community data together in innovative but often quite simple ways has, in itself, led to important changes in laws and policies.

[6]   See, e.g., Mario Marino, Leap of Reason: Managing to Outcomes in an Era of Scarcity (Washington DC: Venture Philanthropy Partners, 2011), and Mark Friedman, Trying Hard Is Not Good Enough: How to Produce Measurable Improvements for Customers and Communities (Victoria, BC: Trafford Publishing, 2005).

[7]   For a useful review of these processes, see Robert D. Behn, “The Seven Big Errors of PerformanceStat.” Rappaport Institute/Taubman Center Policy Brief. (Cambridge, MA: Harvard University, John F. Kennedy School of Government, February 2008).

[8]   John Kania and Mark Kramer, “Collective Impact,” Stanford Social Innovation Review (Winter 2011); Fay Hanleybrown, John Kania, and Mark Kramer, “Channeling Change: Making Collective Impact Work,” Stanford Social Innovation Review (Winter 2012).

[9]   Hanleybrown et al., “Channeling Change: Making Collective Impact Work.”

[10]  There is a sizable literature on methods for evaluating social programs in varying circumstances, summarized by Adele V. Harrell et al., Evaluation Strategies for Human Services Programs: A Guide for Policymakers and Providers. (Washington, DC: Urban Institute, 1996).

[11]  See James P. Connel et al., eds., New Approaches to Evaluating Community Initiatives: Concepts, Methods and Context (Washington, DC: Aspen Institute, 1995).

[12]  Just a few examples are noted here. For more, see Jake Cowan, Stories: Using Information in Community Building and Local Policy (Washington, DC: Urban Institute, 2007); Federal Reserve Board of Governors, Putting Data to Work: Data-Driven Approaches to Strengthening Neighborhoods (Washington, DC: Federal Reserve Board of Governors, December 2011); and Chapters 5 and 6 of G. Thomas Kingsley, Claudia J. Coulton, and Kathryn L. S. Pettit, eds., Strengthening Communities with Neighborhood Data (Washington, DC: Urban Institute Press, 2014).

[13]  Atul Gawande, “The Hot Spotters: Can We Lower Medical Costs by Giving the Neediest Patients Better Care?” The New Yorker, January 24, 2011. The CCHP database contains 600,000 records. It was developed initially for CamConnect, the NNIP partner in Camden, but it is now operated by CCHP.

[14]  This account is based on Lisa Nelson, “Cutting Through the Fog: Helping Communities See a Clearer Path to Stabilization.” In Strengthening Communities with Neighborhood Data, edited by G. Thomas Kingsley, Claudia J. Coulton, and Kathryn L. S. Pettit (Washington, DC: Urban Institute Press, 2014).

[15]  Many relevant stories can be found on the NNIP website, www.neighborhoodindicators.org.

[16]  See Chapter 7 of G. Thomas Kingsley, Claudia J. Coulton, and Kathryn L. S. Pettit, eds., Strengthening Communities with Neighborhood Data (Washington, DC: Urban Institute Press, 2014).

[17]  Also see Dennis P. Culhane et al., “Connecting the Dots: The Promise of Integrated Data Systems for Policy Analysis and Systems Reform,” Intelligence for Social Policy 1 (3) (2010): 1–9.

[18]  Data Quality Campaign’s materials on communicating about privacy issues to different types of stakeholders are available at http://www.dataqualitycampaign.org/action-issues/privacy-security-confidentiality/.