Analysis via Social and Search-Engine Data November 2013
An increasing range of applications use search-engine and social data. The applications of search-engine and social data have moved far beyond marketing and advertising and now include financial trading, flu analysis, national security, public-opinion research, and cinema-box-office forecasting. More applications will emerge in the future. Despite some successes, challenges include hacking, fraudulent data, unpredictable behavior, and underrepresentation of some sectors of society. In addition, the increasing societal value of social and search-engine data raises questions about the ownership and archiving of data and data-access rights. Organizations should continue to exploit search-engine and social data to their full potential but also be aware of their limitations.
Researchers at Warwick Business School (Coventry, England) and Boston University's (Boston, Massachusetts) Department of Physics found that analyzing certain search terms (for example, stocks, happy, and war) entered into Google's (Mountain View, California) search engine over the course of a week can predict whether the Dow Jones Industrial Average will rise or fall in the following week. In 2010, researchers at Indiana University Bloomington (Bloomington, Indiana) found that the use of emotional terms on Twitter's (San Francisco, California) social network can predict stock market trends.
Search-engine and social data are making their way into trading decisions, though results are mixed. In April 2013, a false Twitter posting about an attack on the White House briefly sent the Dow Jones Industrial Average down 145 points. A hedge fund that investors from Derwent Capital Markets (London, England) set up in 2011 to trade on the basis of Twitter data lasted only a month. Despite these problems, algorithmic-trading companies are increasing their use of social-data sources, according to Cornell University (Ithaca, New York) professors Maureen O'Hara and David Easley. Matthew O'Donnell of financial-services-technology firm IPC (Jersey City, New Jersey) says that every trading company will make use of social media and analytics within five years.
Health-care applications that use social and search-engine data are developing. The American Journal of Preventive Medicine recently published research indicating that Google searches for information about a wide range of mental illnesses and other mental problems are seasonal. Google Flu Trends has used search-engine data to track flu activity since 2008; however, during the 2012–13 flu season, the US system became unreliable, perhaps because the widespread media coverage of the severe flu outbreak drove healthy people to search for flu information.
Edward Snowden's recent leaking of information about the US intelligence program PRISM has highlighted the role of communications data in intelligence and security applications. Such technology continues to progress. A previous Scan™ article mentions Raytheon's (Waltham, Massachusetts) RIOT (Rapid Information Overlay Technology) software, which is a large-scale analytics system capable of tracking trillions of entities in social-networking systems and performing detailed analysis. Social data also have a developing role in predicting future security problems. Scientists from Bristol University (Bristol, England) found that Twitter data indicated high levels of anger and fear before riots broke out in the United Kingdom in 2011.
Organizations need to guard against erroneous results when developing applications of social and search-engine data for security, health, and finance. In addition to the fairly obvious problem of hacking, more subtle problems can arise because of the characteristics and behaviors of search-engine and social-media users. For example, media coverage of an issue changes search-engine and social-media behavior (as happened with Google Flu Trends). Problems also arise from social-media demographics. For example, a study of the Twitter data created during Hurricane Sandy (which hit the East Coast of the United States in fall 2012) showed the greatest volume of hurricane-related tweets coming from Manhattan in New York and very few coming from more severely affected areas outside the city, where smartphone ownership is lower and where damage affected connectivity. Microsoft Research (Microsoft; Redmond, Washington) principal researcher Kate Crawford points out that the raw data would likely have incorrectly highlighted Manhattan as the worst-hit area.
Demographic problems will likely also arise in analysis that assumes that all age groups are fairly represented in social-media data. A recent study by the Pew Research Center (Washington, DC) found that although 86% of internet users ages 18 to 29 used social media, only 52% of users ages 50 to 64 did so; that figure dropped to just 32% for users older than age 65. The same study also noted that more female internet users than male internet users use social media (71% versus 62%) and that rural internet users are less likely to use social media than are urban users.
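A rough illustration of how this skew arises, and of one common way analysts try to offset it, is the sketch below. It uses only the three adoption rates cited above (the study's other age groups are omitted here), and it assumes a hypothetical population with equal numbers of internet users in each age group. The function names are illustrative, and inverse-rate weighting is one simple correction rather than a method the cited study describes.

```python
# Adoption rates from the Pew figures cited above: the share of internet
# users in each age group who use social media.
ADOPTION = {"18-29": 0.86, "50-64": 0.52, "65+": 0.32}


def raw_shares(internet_users):
    """Share of each age group among social-media users (the raw sample)."""
    active = {g: internet_users[g] * ADOPTION[g] for g in ADOPTION}
    total = sum(active.values())
    return {g: active[g] / total for g in active}


def reweighted_shares(sample_counts):
    """Weight each group's sample count by the inverse of its adoption rate."""
    weighted = {g: sample_counts[g] / ADOPTION[g] for g in ADOPTION}
    total = sum(weighted.values())
    return {g: weighted[g] / total for g in weighted}


# Hypothetical assumption: 1,000 internet users in each age group.
population = {g: 1000 for g in ADOPTION}

# What an analyst sees in raw social data: younger users dominate.
skewed = raw_shares(population)

# Counts of social-media users per group, then the corrected shares.
sample = {g: population[g] * ADOPTION[g] for g in ADOPTION}
corrected = reweighted_shares(sample)

for group in ADOPTION:
    print(f"{group}: raw {skewed[group]:.1%}, reweighted {corrected[group]:.1%}")
```

Under these assumptions, users ages 18 to 29 make up roughly half of the raw sample although they are only one-third of the underlying population. Reweighting restores the group proportions, but it cannot correct for differences in how each group behaves on social media.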
Another challenge to the integrity of social data is the rise of click farms—groups that create fake "likes" for comments posted on Facebook's (Menlo Park, California) social network, views of videos on YouTube's (a subsidiary of Google) website, and followers of posters on Twitter's social network. An online casino that sublicensed the Monopoly brand from Hasbro (Pawtucket, Rhode Island) was among the companies that bought Facebook "likes."
Data ownership and the concentration of data among a small number of companies are also growing issues for users of search-engine and social data. As the value of search-engine and social data grows, these companies could gain data monopolies that give them an unfair advantage over competitors. Such companies may even require governments to prop them up in the event of a crisis. Jason Healey, director of the Atlantic Council's (Washington, DC) Cyber Statecraft Initiative, recently argued that the strategic importance of some IT companies could lead to a situation in which the companies become "too big to fail," as decision makers judged certain financial institutions to be in 2008.
Governments are starting to see a need to help manage and maintain internet data. An ambitious British Library (London, England) project aims to archive copies of all 4.8 million websites on the United Kingdom's web domain. At Strategic Business Insights' recent Explorer workshop on big data in Washington, DC, participants from the US government discussed the need to preserve both commercially owned and public data (though legal and practical challenges exist in doing so).
Social and search-engine data are valuable resources, enabling an increasingly wide variety of applications. Although opportunities to create new products and services are plentiful, clear challenges also exist. Aside from hacking and fraud, the characteristics and behaviors of search-engine and social-media users can create spurious analysis. And the potential for data monopolies could damage free markets. Organizations should continue to exploit search-engine and social data but actively manage the risks that such exploitation creates.