In the past six years, we’ve seen significant growth in state and local governments’ approaches to open data as they realize they can achieve many goals by openly publishing the data they collect. They’re recognizing the importance of publishing data in formats easily accessed by “civic hackers” and app developers. They’re also discovering cost savings by publishing data online rather than waiting for individual citizens’ requests. Finally, they’re finding that their departments can cooperate better when they share data using a common platform.
The aim of open government data—to open the storehouses of government data to the world in order for the data to have maximum use and effectiveness—is a bold one. Open data initiatives reposition governments as suppliers of data and also anticipate the participation of additional parties in using those data. Those parties, whether other members of government, members of the public, or individuals with a specific professional or technical interest in the data, are the “demand” side of the equation, and are vital partners in using the data to solve key problems or make progress toward goals.
This essay describes how governments may choose to open data, the benefits of doing so, and how nongovernmental actors can interact with the process. It also argues that governments should recognize that the power and value of their data holdings can be realized only by engaging data users and taking stock of their interests—and that it is critical that actors outside of government recognize and make use of the opportunity to shape open data policy and practice.
What Is “Open Data”?
The meaning of open data is rooted in the principles outlined in the 2005 Open Definition, which states that data are “open” when they are available to everyone, free for use and reuse, and when data sets have a minimalist form of licensing that, at its most stringent, requires author attribution and the obligation to “make subsequent derivative works similarly open.” The Open Knowledge Foundation, which uses advocacy, technology, and training to open data, offers three criteria for “open”: 1) legal openness and freedom from restrictive licensing terms; 2) social openness (making information easily available for collaboration); and 3) technological openness (making information available in machine-readable and nonproprietary data formats). The concept has evolved to encompass the idea that information should be available online, from a primary source, timely and complete, and published in machine-readable formats that promote maximal use and reuse.
Open “data” can be more than numeric data. Information such as the text of legislative bills or laws (e.g., data sets used by the Sunlight Foundation’s Open States project or the OpenGov Foundation’s America Decoded project) can fall under a more expansive definition of open data. Text-based data become open data when data-holders eliminate legal restrictions on their use, publish the data online for public access, and structure the data in a way that improves search and text analysis, such as by implementing a consistent data format.
Arriving at a common understanding of open data matters for individual open data initiatives: A narrow definition of open data limits what the government is expected to produce and what people are expected to use the data for, whereas a more comprehensive definition may entail higher costs for government and higher expectations that the data-using community will produce real public benefits from the data. The public has a compelling interest in open government data because the data were created and gathered under public authority.
Open Data Examples
Governments have taken two main approaches to opening government data. They either publish data on a website or they develop a legal structure to undergird a more complex and ongoing process of open data release. The first method is by far the more common. The next section describes the second method: establishing an open data policy. This method is more difficult and less common, but it has the advantage of laying the groundwork for improving continuity of access to data, creating a better understanding of government data holdings, and ensuring access to specific types of data negotiated through the policy.
Simply publishing open data on a website—for example, by creating a new open data portal—is the more familiar approach for governmental data managers. If we understand all information to be a form of data, government websites have always been synonymous with data release. Even early descriptions of the goals for government websites anticipated that they would allow citizens to perform a wide variety of tasks and that information systems within and between governments would be effectively integrated online. The introduction of open data as a separate concept to specifically enable greater public use and reuse of government data is built on the foundation that governments serve their citizens not only in the traditional sense but also through an online presence.
The concept of making data available and reusable also has roots in existing government practice. Most notably, the U.S. Census makes “published census statistics… available to anyone who needs them.” The U.S. Government Printing Office has published the daily activities of Congress online in the Congressional Record since 1994, and the U.S. Congress created the THOMAS website to open legislative data to public oversight the following year. At the state level, Florida and Hawaii were early adopters of online campaign finance disclosures, maintaining publicly accessible campaign finance websites by 1997. In general, making records available by computer tape, disk, or dial-up servers provided an intermediate step between paper copies and the advent of government websites to provide the public with free access to government data.
Although aspects of government actions during the 1990s and early 2000s resembled ideals of open data, governments began using the term open data to describe such efforts only later in the process. Nongovernment groups such as the Open House Project and the Open Government Working Group advocated for aligning the established principles of open government with open, machine-readable, license-free, and online access to government data. In January 2009, the U.S. government provided the most visible legitimation of these principles. President Obama’s “Memorandum on Transparency and Open Government” identified online publication of government data as a method to improve government transparency and also implied that technology could be used to improve civic participation and collaboration. Later that year, the White House and executive agencies used the term open data to refer to methods of releasing the data in their open government plans.
In the course of these developments, local government began sharing many data sets, and private companies began to provide platforms for open formats, later called “open data portals.” In 2006, Washington, DC, was among the first local governments to create a website featuring a wide range of government data with the specific purpose of “streaming data that the agencies gather through normal operations” to the public. Soon after, Seattle-based Socrata opened its doors with the intention of providing “open government data, readily accessible over the internet, in a form that maximizes comprehension, interactivity, participation, and sharing.” The earliest commercial provider of open data platforms, Socrata was selected to power a number of new federal, state, and local open data sites by 2010. (Since this time, a number of other commercial and free open data platforms, e.g., CKAN, have also come into routine use.)
Many U.S. state and local governments have continued to build on this trend. Governments currently post a wide range of data sets from both internal and external sources. Unfortunately, because of the sheer number of state and local government sites and the varying methods and naming conventions, it has proved challenging to aggregate a comprehensive list of sites. Nonetheless, efforts such as the federal government’s Data.gov, informal collaborative efforts such as Datacatalogs.org, or simple iterative web searches demonstrate that US federal agencies, states, and localities share government data using hundreds of sites.
Developing Open Data Policy
Opening data can be a relatively straightforward process. Open data advocate Waldo Jaquith has pointed out that governments can create an open data site by running a search for CSV, XLS, and XML files that already exist on their network of public sites and posting them on the same page. However, providing lasting access to the same data requires the additional step of legally codifying public access to data.
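The search-and-post step Jaquith describes can be sketched in a few lines of Python. This is only a minimal illustration, assuming a government already has a list of URLs from its public sites (for example, from a crawl or sitemap). The function names, the `city.example` URLs, and the page title are hypothetical and do not come from any particular portal product.

```python
from html import escape
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions named in the text; matching is case-insensitive.
DATA_EXTENSIONS = {".csv", ".xls", ".xml"}


def find_data_files(urls):
    """Return the URLs whose path ends in a data-file extension, deduplicated and sorted."""
    found = set()
    for url in urls:
        # Look only at the path so query strings such as "?year=2013" are ignored.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in DATA_EXTENSIONS:
            found.add(url)
    return sorted(found)


def build_index_page(urls, title="Open data files"):
    """Build a bare-bones HTML fragment that links to every data file on one page."""
    items = "\n".join(
        f'  <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>"


# Hypothetical URLs standing in for the result of a site search.
example_urls = [
    "https://city.example/budget/2014.csv",
    "https://city.example/about.html",
    "https://city.example/permits/Permits.XLS?year=2013",
    "https://city.example/budget/2014.csv",  # duplicate link found on another page
]

data_files = find_data_files(example_urls)
index_page = build_index_page(data_files)
```

A page built this way is only a snapshot: nothing in it keeps the links current or guarantees the files stay online, which is the gap a formal policy is meant to close.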
As was the case with the development of open data portals, the development of open data policies began without the open data label. For example, the 2006 memorandum from a Washington, DC, city administrator contains many of the elements of later open data policies. It mandates a rationale for streaming data online, describes a timeline for specific data release, mandates specific agency responsibilities, and describes the need to maintain quality and review. After the 2009 White House memorandum, additional local governments began to develop policies that explicitly used the language of “open data.” The city council of Portland, Oregon, for example, passed a resolution to “mobilize and expand the regional technology community…by promoting open and transparent government, open data, and partnership opportunities.” The mayor of San Francisco issued an executive directive to “enhance open government, transparency, and accountability by improving access to City data that adheres to privacy and security policies.”
The Portland and San Francisco examples also demonstrate the two primary paths to developing state and local open data policy: the legislative and executive approaches to mandating government data release. Since 2009, several states and cities have chosen to follow one of these paths. The Sunlight Foundation has documented more than 30 formal state and local open data policies; approximately two-thirds of these policies were established through legislative means (by law, resolution, or ordinance) and approximately one-third were created through executive means (by memo or directive). In at least one case (San Francisco), an open data policy originally established by the executive was later superseded by legislative policy.
The pace of development of open data policies on a local level has only increased. The Sunlight Foundation identified eight new policies in 2010−2011 and six new policies in 2012. Fifteen were enacted in 2013 and eight were enacted in the first four months of 2014. Open data policies have been enacted in the most populous American municipalities (Los Angeles, New York City, and Chicago), in several midsized cities, and in places as small as Williamsville, New York, a village of 5,277 residents. As of mid-2014, eight states (Texas, Illinois, Utah, New York, Hawaii, New Hampshire, Connecticut, and Maryland) had passed open data policies, while an additional seven state-level open data bills had been introduced for consideration.
Because it involves political processes, achieving an open data policy is more complicated than publishing government data on a website. However, the value of the open data policy is that it offers the public a far better guarantee of access to specific data sets in specific formats and of specific quality. Without a formal policy, the public may lose access to posted data if the government website is revised or if department staff change. Individuals who posted those data sets may or may not choose to keep them current. They may or may not choose to make the data sets available in formats that are easy to use and reuse. Moreover, open data policies usually describe a specific rationale for making government data available to the public, and this formal statement provides people who interact with government data an opportunity to make a case for access to existing or new data.
The two methods for implementing open data—releasing open data and creating an open data policy—are not mutually exclusive. In many cases, governments begin by publishing open data online and then develop a formal policy about their practice. Individuals who are interested in enjoying more access to government data may gain the necessary backing for a broader-scale policy by building gradually toward it, by both building political support and demonstrating the value of existing government data releases.
It is unclear whether the pace of open data policy enactment will continue—or whether a majority of governments will adopt open data as a formal policy—but if the practice continues to spread, we may achieve the same outcome even without an official policy.
Benefits and Stakeholders of Open Data
Regardless of how governments choose to open data, the success of the data release relies on people connecting with those data. To capture the attention of additional users, governments will need to understand the various motivations for using the data and stakeholder preferences about which data are most important to achieve their goals. Recognizing the range of potential benefits and stakeholders allows governments to craft an open data initiative tailored to the interests of their local actors.
From community activists, to small businesses, to civic technologists, open data can help groups of people achieve their goals, and those goals can be quite different.
The Sunlight Foundation collected examples of how open data were used to accomplish several objectives. In reviewing the collection, the foundation found:
- Governments and nongovernmental actors can use open data to increase transparency by linking government revenue and expenditure details to a publicly accessible tool for exploring government finances. The New York City comptroller office’s Checkbook NYC 2.0 does this with open-source software that other cities can mimic.
- Open campaign finance and government spending data—including contracts, grants, and subsidies—have allowed watchdogs and journalists to improve political accountability by highlighting improper and publicly-discoverable behavior. For example, WAMU, the community radio station in Washington, DC, used the district’s open spending and campaign finance records to document the connection between real estate developers’ campaign contributions and their receipt of public development subsidies.
- Some local governments, such as New York City and Chicago, are using open data to increase interdepartmental cooperation and increase efficiencies within government. Several nongovernmental organizations are also using open government data to identify potential areas for cost savings, including DataKind’s project to identify optimal public tree maintenance schedules and Data Science for Social Good’s documentation of the relationship between crime and extended streetlight outages.
- Open data can point to issues of service quality and enable advocacy for improvement by increasing transparency about existing services. For example, The Los Angeles Times used public municipal emergency response data to map neighborhood experiences of emergency response time. SeeClickFix deployments in several municipalities allowed citizens to register 311 complaints about nonemergency municipal service concerns.
- Finally, open data can enhance citizens’ participation by providing new opportunities to communicate with governments. For example, 596 Acres, a group of advocates fighting blight, used data to map vacant lots and facilitate public-private agreements on temporary land use solutions, creating new avenues for government-public interaction. Others, such as Philadelphia’s OpenDataRace (a contest where individuals and organizations were encouraged to nominate new datasets for public release), provide a format for government outreach to increase local citizens’ interest in and use of government data.
Individuals and organizations outside government are using open data with a variety of motivations. For example, organizations that focus on using open data to find technological solutions to governance problems, typified by Code for America brigades and other civic hackers, are seeking improved trust and collaboration between governments and citizens. Other organizations are seeking business opportunities. The new US node of the UK-based Open Data Institute, for example, plans “to identify valuable, unreleased data sets, identify the audience for those data sets, and then identify the business proposition that makes that data set valuable and its use sustainable.”
Other organizations promote more community-focused benefits. Data intermediaries, such as the members of the National Neighborhood Indicators Partnership, are using open data in targeted ways to tackle social issues. Bob Gradeck of the University of Pittsburgh’s Center for Social and Urban Research describes the variety of data “consumers” who benefit from the work his organization does to collect, clean, prepare, and present open data on property and community conditions. Beneficiaries include students, university faculty, community-based organizations, social service organizations, local journalists, residents, home buyers, and government agencies, as well as civic hackers. The Baltimore Neighborhood Indicators Alliance aims “to show how using city and state data can help communities reduce crime rates, improve public transit, and help students perform better in school” and “to strengthen Baltimore neighborhoods by providing meaningful, accurate, and open data at the community level.”
Rufus Pollock of the Open Knowledge Foundation described open data initiatives as functioning like an ecosystem, with “data cycles” in which data come from a government source and pass through “infomediaries” who work with, clean, and “wrangle” the data for improving their utility and quality before returning them to the original data source. The governments that provide the source data, the players that analyze and repackage the data, and the ultimate users of the derived products are all part of the “open data ecosystem.” This perspective acknowledges the interdependence between government data and nongovernment data producers and users in achieving the goals associated with open data. It highlights the need for robust feedback between data producers and data users and anticipates data flowing not only in one direction from a government to a nongovernment user, but in complex cycles that feed back into government data use and production.
The multidirectional nature of current open data flows is well illustrated by the growing set of government websites hosting open data that have been collected by specialized community actors. The federal open data site Data.gov, for example, now allows community organizations to post information on the site. At the municipal level, Baltimore.gov similarly hosts community data gathered by the Baltimore Neighborhood Indicators Alliance. Nongovernment actors can benefit by using the government data site as a platform for sharing their information, and governments benefit by maintaining a broader open data collection for their constituents.
Creating a Successful Open Data Initiative
The Sunlight Foundation recommends that governments and open data advocates starting or improving an open data initiative engage with a broad range of stakeholders to identify core values and goals that their community (and community-based data users) collectively supports. The goals and values that result from these discussions should help shape decisions as the system develops. This process also offers an opportunity for potential individual and organizational users to articulate their own goals and build relationships with others who have similar aims.
The Sunlight Foundation’s Open Data Guidelines help communities answer the next question: what data should be made public? Open data initiatives are about gaining access to more quality government data for free use and reuse, but these initiatives are not intrinsically connected to access to any particular data set. To know what data are available, the Sunlight Foundation recommends that governments provide a public list of all data sets they maintain, including descriptions of those they believe cannot yet be released because of privacy concerns. Governments and open data advocates should then work together to explore which data sets should receive priority attention for release. The data sets selected depend on existing public records laws, explicit stakeholder goals, and the public interest. The Sunlight Foundation advocates that governments consider public input in a number of ways when deciding what to release first, including through review of past Freedom of Information requests, other inquiries from internal or external actors, or data used in public hearings and public law-making. The guidelines also advise how to make data public, which includes specifying formatting, documentation, and technology platforms. Although this is generally the government’s purview, decisions should be made in light of the values and goals outlined for the initiative.
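One way to picture the public data inventory and the use of past Freedom of Information requests as a prioritization signal is the short Python sketch below. It is an illustration only, not part of the Sunlight Foundation’s guidelines: the `DatasetEntry` fields, the function names, and the sample entries and counts are all hypothetical, and a real prioritization would also weigh public records law, stakeholder goals, and the public interest.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    name: str
    description: str
    releasable: bool   # False if withheld for now, e.g. because of privacy concerns
    foi_requests: int  # past Freedom of Information requests touching this data set


def public_listing(inventory):
    """List every data set, including withheld ones, with its release status."""
    return [
        (d.name, d.description, "available" if d.releasable else "withheld for now")
        for d in sorted(inventory, key=lambda d: d.name)
    ]


def release_priorities(inventory):
    """Releasable data sets, most-requested first; ties are broken by name."""
    candidates = [d for d in inventory if d.releasable]
    return sorted(candidates, key=lambda d: (-d.foi_requests, d.name))


# Hypothetical inventory; the names and request counts are invented for illustration.
inventory = [
    DatasetEntry("Building permits", "Permits issued by address", True, 12),
    DatasetEntry("Case files", "Individual social service records", False, 30),
    DatasetEntry("Street tree census", "Location and species of public trees", True, 3),
    DatasetEntry("Budget line items", "Adopted budget by department", True, 12),
]

priority_names = [d.name for d in release_priorities(inventory)]
listing = public_listing(inventory)
```

Keeping withheld data sets in the public listing, rather than dropping them, mirrors the recommendation that the inventory describe data the government believes cannot yet be released.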
Finally, communities will need to grapple with how to implement the open data policy. Sunlight’s guidelines recommend providing regular opportunities for public feedback, both formally and informally. Informally, governments can collect comments through the open data portal or Twitter, or through interactions at community events. Nongovernment actors may also participate in open data policy implementation in a formal oversight role. Governments may choose to create open data working groups that include seats for nongovernment members to oversee the implementation of an open data initiative. Maryland’s development of an Open Data Working Group through state statute provides one example of this. New York City officials regularly meet with the local Transparency Working Group, a less formal relationship but one that nonetheless plays an important consultative role and illuminates additional information or datasets needed by stakeholders.
It makes a difference who is involved in advocating for open data because different actors are motivated by different goals, and the mix of goals pursued will produce different outcomes for the initiative, at least in the short term. What motivates governments to create an open data initiative may differ from the goals that citizens’ groups, community service organizations, journalists, academics, or civic hackers may have in using open data. These different motivations will influence which data sets are made available and maintained. Therefore, wide participation in the early stages of an initiative will help to shape it to the community’s preferences. Being aware of the range of options, methods, and roles in open data initiatives eases the process of figuring out how different user groups can get what they need from open data. All stakeholders may begin by identifying their own role within the open data ecosystem.
For government agencies, beginning an open data initiative can be as simple as discovering which data sets are publicly available online, collecting them together on a single page, and conducting outreach to increase awareness and gather feedback about the holdings. They can inform the public that the data can be used without restriction, and they can make the data available in formats that facilitate that use. If governments are ready to consider a more comprehensive process, they can begin by identifying their primary goals for the open data initiative, meet with relevant stakeholders, and explore the Sunlight Foundation’s resources for open data policy development.
Nongovernment organizations—whether community-serving groups, research institutions, associations of journalists, or citizens’ groups—can help lead change by opening up their own data. They can also participate by advocating for their local governments to develop or improve an open data initiative. Where collaboration with government is more challenging, it is possible to create a community open data portal with partner organizations. Asking for input from local civic technologists, such as local Code for America brigades or civic hacker MeetUp groups, may help groups more quickly use open data to achieve organizational goals.
Open data initiatives are still relatively new, but through persistent and positive interaction between government agencies and citizens, this new government function can achieve many important public interest goals. Resources developed by the Sunlight Foundation and other national groups, as well as highlights and lessons from emerging initiatives, can motivate new localities to launch their own open data practice and promote continuous improvement of existing efforts. By sharing our experiences, we can advance the state of practice and achieve the maximum social and economic benefits from opening up government data.
 Open Knowledge Foundation, “The Open Knowledge Definition” (Cambridge, UK: Open Knowledge Foundation, 2005), available at https://web.archive.org/web/20060819043123/http://www.okfn.org/okd/definition.html. The Open Knowledge Foundation credits the open source coding movement, and specifically the 1997 “Debian Free Software Guidelines,” for providing roots for the broader “open definition.” See Open Knowledge Foundation, “The Open Source Definition” (Cambridge, UK: Open Knowledge Foundation, 2005), available at https://web.archive.org/web/20060924131931/http://www.opensource.org/docs/definition.php. See also opendefinition.org.
 Open Knowledge Foundation, “The Three Meanings of Open” (Cambridge, UK: Open Knowledge Foundation, 2005), available at https://web.archive.org/web/20060113133743/http://www.okfn.org/three_meanings_of_open.html.
 These additional qualities were first described in a document called “The Eight Principles of Open Government Data,” created by a collection of open government advocates in 2007, available at http://opengovdata.org/.
 K. Layne and J. Lee, “Developing Fully Functional E-Government: A Four Stage Model,” Government Information Quarterly 18 (2001): 122−136.
 For an example of the widespread nature of this intermediate step to online data sharing, see the variation in public access to campaign finance disclosure in Elizabeth Hedlund and Lisa Rosenberg, Plugging In the Public: A Model for Campaign Finance Disclosure (Washington, DC: Center for Responsive Politics, 1996).
 Executive Office of the President of the United States, “Memorandum for Heads of Department and Agencies” (Washington, DC: Executive Office of the President of the United States, 2009). http://www.whitehouse.gov/sites/default/files/omb/assets/memoranda_fy2009/m09-12.pdf.
 See, e.g., the Open Energy Information plan at www.whitehouse.gov/open/innovations/OpenEnergyInformation or the Presidential Open Government Report at www.whitehouse.gov/sites/default/files/microsites/ogi-progress-report-american-people.pdf.
 Robert Bobb, “Streaming of DCStat Data to www.dc.gov.” Memorandum (Washington, DC: Executive Office of the Mayor, 2006). http://www.scribd.com/fullscreen/26442622?access_key=key-20rfsh26eu0ob66xlbmu.
 Socrata, “Opening Government One Dataset at a Time” (Seattle, WA: Socrata, 2010), available at https://web.archive.org/web/20100208173200/http://www.socrata.com/about.
 Waldo Jaquith, public presentation at Open Data NJ Summit, May 16, 2014, Montclair, NJ.
 Robert Bobb, “Streaming of DCStat Data.”
 City of Portland, Resolution No. 36735 (City of Portland, 2009), p1, available at www.portlandonline.com/shared/cfm/image.cfm?id=275696.
 City and County of San Francisco, Office of the Mayor, Executive Directive 09-06 (San Francisco, CA: Office of the Mayor, 2009), available at http://sfmayor.org/ftp/archive/18.104.22.168/executive-directive-09-06-open-data/index.html.
 Several locations developed multiple policies during these years; San Francisco, e.g., developed four increasingly ambitious policy approaches to open data between 2009 and 2013.
 Bob Sofman, “Here Are Our Values” (San Francisco, CA: Code for America, March 27, 2014), available at www.codeforamerica.org/blog/2014/03/27/here-are-our-values/.
 Rufus Pollock, “The Present: A One-Way Street” (Cambridge, UK: Open Knowledge Foundation, March 31, 2011), available at http://blog.okfn.org/2011/03/31/building-the-open-data-ecosystem/.