About This Technology

Big data refers to a collection of data sets whose volume, velocity, and variety overwhelm conventional relational databases. Big-data sets are large, change quickly, and contain many data types; for example, a single big-data set may mix text, audio, video, and location traces. Big data has no specific size threshold, because some organizations can handle greater amounts of data than others can before they need to invest in new data approaches. For example, large retailers and major financial institutions have worked with very large data sets for many years. Also, because of improvements in data storage, processing power, and communications, the "normal" size of a data set grows from one year to the next. In addition to exceeding the capacity of conventional systems, big data may enable novel solutions to previously inaccessible or difficult-to-solve problems. For example, in recent years, data-driven approaches have enabled rapid progress in machine translation, far outpacing researchers' earlier attempts to build purely rule-based machine-translation systems.

The digitization of many areas of life and the proliferation of digital devices are accelerating data growth. Estimates vary, but a reasonable consensus is that the amount of data in the world is now growing by 40% to 50% every year. Current sources of big data include usage records from bank cards, loyalty cards, and gift cards; cell-phone call records; medical records; social-network updates; GPS traces; surveillance images; web logs; television channel-surfing records; and energy-use telemetry. Other very large sources of data include search indexes, genomic sequences, geophysical profiles, and astronomical observations. The topic of big data generates interest because stakeholders see opportunities to monetize flows of information, gain insights that are hidden in large data sets, build predictive models, combat information overload, and present highly relevant information to consumers, businesses, and governments. Challenges relating to big data include protecting individuals' privacy, ensuring information quality, sharing data between organizations, finding people with the skills necessary to analyze big data, and crafting technology architectures that can scale up.
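To give a sense of what annual growth of 40% to 50% implies, the following minimal Python sketch works through the compound-growth arithmetic. It assumes, purely for illustration, that the growth rate stays constant from year to year; the function names are hypothetical and are not drawn from any particular library.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication of data volume after `years` of compound growth.

    Assumes a constant annual rate, which is a simplification.
    """
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for data volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # The two ends of the 40%-50% range quoted in the text.
    for rate in (0.40, 0.50):
        print(
            f"{rate:.0%} per year: doubles in about "
            f"{doubling_time_years(rate):.1f} years, grows about "
            f"{growth_factor(rate, 10):.0f}x in 10 years"
        )
```

Under this constant-rate assumption, the world's data would double roughly every two years and grow somewhere between about 29-fold and 58-fold over a decade.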

Although big data is unlikely to live up to the hype in the short term, the growth of data and the accompanying increases in the power of statistical algorithms are very real phenomena that will transform many areas of business and society in the next ten years and beyond. For example, big data could help drug companies create personalized medicines, teach computers to recognize details in images, and transform advertising from an intrusion into welcome advice. In the long term, big-data technologies also have the potential to automate decision making; analytics software already automates some online pricing, financial trading, and other processes. Whether big data will yield all these benefits is uncertain, but growth in data volume, velocity, and variety in the next ten years is almost inevitable. The key question is whether companies, governments, and individuals will be able to harness the opportunities that these data create.