Open Datasets
As a market researcher, app producer and software entrepreneur, I use a lot of different data sets, either for research or to tell stories. Here are a few great repositories I use regularly:
Market Research
- Consumer Complaints Database describes consumer complaints about financial products and services.
- Product Safety Recall
- Franchise Failures by Brand
- Top 30 earnings website
- Car sales data
- Global Entrepreneurship Monitor
- General Social Survey from the National Opinion Research Center offers the most widely used survey data on happiness in the U.S., collected since 1972.
- Gallup Poll and Gallup Analytics provide a variety of data sets focusing on:
  - Economic confidence
  - Employment
  - Entrepreneurial energy
  - Confidence in leadership
  - Confidence in military and police
  - Religion
  - Food access
  - Corruption
  - Freedom of media
  - Life evaluations
Countries / Continents
United States
- Data.gov contains over 200k U.S. data sets on topics ranging from education to agriculture.
- Open Government Select Datasets contains all names from Social Security card applications for births that occurred in the United States after 1879.
- USA Healthcare contains U.S. data covering a wide range of health-related topics.
- Non Profit 990 Tax Filings is great for salary data and more.
- American Community Survey covers age, race, income, commute time to work, home value, and veteran status.
- American Time Use Survey comes from the Bureau of Labor Statistics.
- Wage statistics by area and occupation.
- USA tapestry data 2015 is great for cross-referencing where people live with the psychographic lifestyle they exhibit.
- USA Demographic data 2015 covers the foundation of marketing: age, sex, location, and ethnicity.
United Kingdom
- Data.gov.uk is the UK government’s open data portal. It includes the British National Bibliography, which holds metadata on all UK books and publications since 1950.
- NHS Digital (formerly Health and Social Care Information Centre) contains datasets from the UK National Health Service.
Europe
- European Union Data Portal contains thousands of datasets about a broad range of topics in the European Union.
Other
- Socrata works with governments to provide open data to the public.
- Dataportals collects datasets from all around the world in one place.
- World Factbook contains information prepared by the CIA about all the countries of the world.
- UNICEF Data contains statistics about the situation of children and women around the world.
- World Health Organization contains statistics concerning nutrition, disease and health.
- Amazon Web Services hosts a large repository of datasets including the Human Genome Project, NASA’s database and an index of 5 billion web pages.
Data sources for civic engagement
- City data has collected and analyzed data from numerous sources to create complete and interesting profiles of all U.S. cities.
- Envirofacts provides a single point of access to U.S. EPA environmental data contained in U.S. EPA databases. State and local governments, EPA or other federal agencies, and individuals can search for information about environmental activities that may affect air, water, and land anywhere in the United States. Searches can use an address, ZIP Code, city, county, water body, or other geographic designation, and can cover all sources or specific environmental subject areas such as Waste, Water, Toxics, Air, Radiation, and Land. Experienced users can use more sophisticated capabilities such as maps or customized reporting.
- U.S. Census: the American Community Survey 5-Year Data covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population.
Weather
- Dark Sky API lets you query for short-term precipitation forecast data at geographical points inside the United States.
- Weather API covers alerts, almanac, astronomy, conditions, currenthurricane, forecast, forecast10day, geolookup, history, hourly, hourly10day, planner, rawtide, satellite, tide, webcams, and yesterday.
- National Oceanic and Atmospheric Administration GSOD contains global data obtained from the USAF Climatology Center. The dataset covers GSOD data between 1929 and 2016.
Images
- Open Images Data contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
Maps
Here's my curated list of map APIs, which also covers geocoding and GIS.
Music
- Million Song Dataset on EC2
- Million Song Dataset from LabROSA
- Check out my other post on music industry APIs
Machine Learning
- UC Irvine Machine Learning Repository
- Kaggle
- Machine learning data set repository
- LIBSVM offers different regression, binary, and multilabel classification datasets stored in the LIBSVM format.
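
The LIBSVM format mentioned in the last item is a sparse text layout: each line starts with a label (or a comma-separated list of labels for multilabel data), followed by `index:value` pairs for the non-zero features. Here is a minimal sketch of a reader in Python, assuming well-formed input. The function names `parse_libsvm_line` and `parse_libsvm` and the sample rows are made up for illustration and are not part of any LIBSVM tooling.

```python
from typing import Dict, List, Tuple

Row = Tuple[List[float], Dict[int, float]]


def parse_libsvm_line(line: str) -> Row:
    """Parse one LIBSVM-format line into (labels, sparse features)."""
    # Drop an optional trailing "# comment" and surrounding whitespace.
    tokens = line.split("#", 1)[0].split()
    if not tokens:
        raise ValueError("empty line")
    if ":" in tokens[0]:
        # Multilabel files may contain rows with no labels at all.
        labels, feature_tokens = [], tokens
    else:
        # Multilabel rows separate labels with commas, e.g. "1,3 4:0.75".
        labels = [float(x) for x in tokens[0].split(",")]
        feature_tokens = tokens[1:]
    features = {}
    for token in feature_tokens:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


def parse_libsvm(text: str) -> List[Row]:
    """Parse a whole file's text, skipping blank and comment-only lines."""
    return [
        parse_libsvm_line(line)
        for line in text.splitlines()
        if line.split("#", 1)[0].strip()
    ]


# Hypothetical sample: two binary rows and one multilabel row.
sample = """\
+1 1:0.5 3:1.25
-1 2:2.0
1,3 4:0.75
"""
rows = parse_libsvm(sample)
```

Each row comes back as a list of labels plus a dictionary keyed by feature index, so features that are absent from a line are simply missing from the dictionary. For real work, a library loader is probably a better choice than a hand-rolled parser.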
Even More
I found this post on Stack Exchange and it's very good.