UC leads cloud computing use for big data analytics
The University of Canterbury is the first university in Australasia to develop cutting-edge training in solving
big-data analysis problems through unique access to cloud infrastructure, and to give staff and students free access to
that infrastructure.
Dr Raazesh Sainudiin, a Senior Lecturer in UC's School of Mathematics and Statistics, secured grants from the Databricks
Academic Partners Program and Amazon Web Services Educate, which give all UC faculty, staff and students free and
ongoing access to the two providers' enormous cloud-computing infrastructure for academic teaching and research.
This provides UC with huge potential to emerge as a leader in big data analytics in this region of the globe, says Dr
Sainudiin, who is part of UC’s Big Data Working Group. He is giving a presentation about UC’s capabilities in industrial
research and big data analytics to members of the local tech industry on 3 May.
“In today's digital world, data about every conceivable aspect of life is being collected and amassed at an
unprecedented scale. To give you some idea of how much data we are talking about, IBM estimated that a whopping 2.5
exabytes (2,500,000,000,000,000,000 bytes) of data was generated every single day, and that was back in 2012. This
massive data could potentially hold answers for many critical questions and problems facing our world today. But to be
able to get at these important answers, the first step is to be able to explore and analyse this gargantuan volume of
data in a meaningful way,” he says.
“Cloud computing allows you to instantly scale up access to over 10,000 off-site computers, as required by the scale of
the real-world big data problem at hand, and complete the data analyses in the least amount of time needed - usually a
matter of hours.
“What if all past and present recorded and real-time data of earthquakes on the planet could be analysed simultaneously?
Or consider the live analysis of every tweet on Earth. There are on average 6,000 tweets per second. The scale of such
volumes of data is such that they can't be stored, let alone analysed, by one computer or even 100 computers in any
sort of reasonable timeframe.”
UC has already established a research cluster (at www.math.canterbury.ac.nz/databricks-projects) with thousands of
computer nodes running Apache Spark, a lightning-fast cluster computing engine for large-scale data processing. This
locally set-up resource taps into the infrastructure provided by these grants and is being used by UC students in a new
course STAT478: Special Topics in Scalable Data Science, including several students who are full-time employees in the
local tech industry.
Students are trained to run their own big-data projects as part of their course requirements. This cutting-edge training
using cloud infrastructure to solve big-data problems will generate globally competitive graduates for the data
industry, with key skills in the top-paying technologies listed in the 2016 Developer Survey, Dr Sainudiin says.
With a curriculum created in consultation with the tech industry, the innovative course has been praised by Wynyard
Group’s Chief Technical Officer Roger Jarquin.
“We hope that such industry-academia collaborations will continue to be a dynamic training ground for future employees
in our growing data industry,” says Jarquin, also an Adjunct Fellow of UC's School of Mathematics and Statistics.
Professor James Smithies, Director of King's Digital Lab, Department of Digital Humanities, King's College London, and
former Senior Lecturer in History at UC, says the course in Scalable Data Science is an excellent resource for the
digital humanities, and sits very nicely beside activities occurring at King’s Digital Lab (KDL).
“The combination of AWS and Databricks is broadly in line with what we think digital humanities students and researchers
will need, and benefits from excellent levels of usability and scalability. This kind of approach is of crucial
importance to the future of digital humanities, as researchers move into big data analysis and we seek to provide our
students with the tools and experiences they need to succeed in their careers both inside and outside university,” Prof
Smithies says.
UC Associate Professor Rick Beatson, recipient of the 2015 UC Innovation Medal, and Dr Sainudiin are giving a technical
presentation this month about UC’s capabilities in industrial research and big data analytics to Canterbury Tech
(formerly Canterbury Software Cluster), a non-profit organisation of local tech insiders and entrepreneurs
(http://canterburytech.nz/events/May-2016-event/). The presentation, held at UC's Centre for Entrepreneurship, is an
industrial outreach activity by UC's Big Data Working Group.
Dr Sainudiin is keen to introduce this freely available on-demand scalable computing resource to interested postgraduate
students, faculty and staff across the University.
“While this Canterbury Tech outreach event is targeted at local industry, it is important to bring more awareness of the
grants' infrastructure, potential benefits and utility to the UC community.”
Dr Sainudiin completed his PhD at Cornell University and a postdoctoral research fellowship at Oxford before joining UC
in 2007.
ends