Abstract
Researchers can now access millions of Online Social Network (OSN) interactions, available at little or no cost through Application Programming Interfaces (APIs) or data custodians such as DataSift and GNIP. Records held in Extensible Markup Language (XML) or JavaScript Object Notation (JSON) are well structured but often inconveniently formatted for use in popular Relational Database Management Systems (RDBMS) or Geographic Information Systems (GIS) software. In contrast, emerging NoSQL (Not-only Structured Query Language) technologies are specifically designed to ‘ingest’ unstructured data. This paper examines Extract/Transform/Load (ETL) procedures for storing and subsequently analysing two OSN datasets in SQL and NoSQL databases. The fixed data model of the relational approach may prove problematic when loading the unpredictable document-based structures that arise from extended periods of data collection. Although relational databases are far from obsolete, the spatial analysis community seems likely to benefit from experimentation with new software explicitly designed for handling spatio-temporal Big Data.
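The contrast between a fixed relational model and document-style storage can be illustrated with a minimal sketch. It is not the ETL procedure used in the paper: the two records, their field names, the `post` table and the helper functions are all hypothetical, and Python's built-in `sqlite3` and a plain dictionary stand in for a full RDBMS and a NoSQL document store respectively.

```python
import json
import sqlite3

# Two hypothetical OSN records (assumed field names). The second carries a
# nested "geo" field that was not anticipated when the table was designed.
RAW = [
    '{"id": 1, "text": "hello", "created_at": "2012-11-06T10:00:00Z"}',
    '{"id": 2, "text": "voting", "created_at": "2012-11-06T10:05:00Z",'
    ' "geo": {"lat": 51.5, "lon": -0.12}}',
]

# The fixed data model: only these columns exist in the relational table.
FIXED_COLUMNS = ("id", "text", "created_at")


def load_relational(lines):
    """Load records into a fixed-schema table; unknown fields are dropped."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, text TEXT, created_at TEXT)"
    )
    dropped = set()
    for line in lines:
        record = json.loads(line)
        # Remember any field the schema has no column for.
        dropped.update(set(record) - set(FIXED_COLUMNS))
        conn.execute(
            "INSERT INTO post (id, text, created_at) VALUES (?, ?, ?)",
            tuple(record.get(col) for col in FIXED_COLUMNS),
        )
    conn.commit()
    return conn, dropped


def load_documents(lines):
    """Keep each record whole, keyed by id, as a document store would."""
    return {doc["id"]: doc for doc in map(json.loads, lines)}


conn, dropped = load_relational(RAW)
docs = load_documents(RAW)
row_count = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
```

In this sketch both records reach the table, but the unanticipated `geo` field is recorded in `dropped` and lost unless the schema is altered, whereas the document-style load keeps it without any change to the loader.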
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tear, A. (2014). SQL or NoSQL? Contrasting Approaches to the Storage, Manipulation and Analysis of Spatio-temporal Online Social Network Data. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8579. Springer, Cham. https://doi.org/10.1007/978-3-319-09144-0_16
DOI: https://doi.org/10.1007/978-3-319-09144-0_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09143-3
Online ISBN: 978-3-319-09144-0
eBook Packages: Computer Science, Computer Science (R0)