The DataBoost Nexus #7
Big Data Resources
Previously, we discussed a proper definition for big data, and we considered how data sets can be used in myriad ways to accomplish and complement a wide variety of goals.
The next question to be answered is where to find large volumes of data that are applicable to particular enterprises or industries.
Internal Data
The first sources to consider, and some of the most applicable and readily available, are sources inside your enterprise. A complete listing of client data, including addresses, email information, and any available demographic metrics, could be considered a form of big data – especially for larger enterprises with client lists that number in the thousands.
Purchase and transaction histories can also be considered big data, especially order histories that go back years or decades.
External Data
External sources of big data can be broken down into public and private repositories. While private repositories are often confidential or require significant expenditures to acquire, there are a number of publicly available data sets that are both massive and highly useful to a broad range of industries.
Public Data Sets
This article from LinkedIn provides an excellent starting point for finding public data repositories, including data.gov – perhaps the largest public source of data on the planet:
A more recent article from BigData-MadeSimple.com provides an expanded list of public sources that includes many of the sets listed above as well as a number of internationally available big data repositories:
http://bigdata-madesimple.com/70-websites-to-get-large-data-repositories-for-free/
Finally, this most recent list from Forbes points data hunters to the top 30+ sources of big data that can be acquired at no cost:
Big Data Strategies
Of course, acquiring a data repository is only the first step. To utilize the information contained in the data set, you’ll need an enterprise goal that can benefit from the use of large data volumes, and a technique for extracting, analyzing, and outputting the information contained in one or more of these repositories to fulfill that goal.
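As a rough illustration of the extract, analyze, and output steps described above, here is a minimal Python sketch. It is only one possible shape for such a technique, not a recommended implementation. The sample data, the column names (region, year, orders), and the function names extract, analyze, and output are all hypothetical, chosen for illustration; a real repository would be read from a downloaded file rather than an in-memory string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a downloaded data set.
# The columns (region, year, orders) are assumptions for illustration.
SAMPLE_CSV = """region,year,orders
North,2014,120
South,2014,95
North,2015,150
South,2015,110
"""


def extract(source):
    """Read CSV rows from a file-like object into a list of dictionaries."""
    return list(csv.DictReader(source))


def analyze(rows):
    """Total the orders column for each region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["orders"])
    return dict(totals)


def output(totals):
    """Format the totals as report lines, largest total first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{region}: {count}" for region, count in ranked]


rows = extract(io.StringIO(SAMPLE_CSV))
totals = analyze(rows)
report = output(totals)
for line in report:
    print(line)
```

Keeping the three steps as separate functions means the same analysis could be pointed at a different repository by changing only the extract step, for example by passing an opened file instead of the in-memory sample.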
Next week, we will discuss strategies for implementing big data.