IBM/Google Academic Cloud Computing Initiative (ACCI)


The IBM/Google Academic Cloud Computing Initiative (ACCI) is a joint university initiative to help computer science students gain the skills they need to build cloud infrastructures and applications.

The IBM/Google initiative aims to provide computer science students with a complete suite of open-source-based development tools so they can gain the advanced programming skills needed to innovate and address the challenges of the cloud computing model, which uses many computers networked together through open standards, and thereby drive the Internet's next phase of growth.

The companies will provide hardware, software and services to augment university curricula and expand research horizons while lowering the financial and logistical barriers that keep the academic community from exploring Internet-scale computing. IBM and Google make the following resources available to universities for their projects:
  • A cluster of processors running an open source implementation of Google's published computing infrastructure (MapReduce and GFS from Apache's Hadoop project)

  • A Creative Commons licensed university curriculum developed by Google and the University of Washington focusing on massively parallel computing techniques

  • Open source software designed by IBM to help students develop programs for clusters running Hadoop. The software works with Eclipse, an open source development platform.

  • Management, monitoring and dynamic resource provisioning by IBM using IBM Tivoli systems management software

Using this virtual IT lab, students will learn how to develop systems and write massively parallel applications that take full advantage of the distributed computing paradigm rather than the conventional one-server, one-application model. The first pilot phase of the ACCI granted several prominent US universities access to this large infrastructure. The University of Washington was the first to join the initiative, and a short list of other universities was added to pilot the program.
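To make the contrast with the one-server, one-application model concrete, here is a minimal sketch of the MapReduce programming style in plain Python. It only simulates the map, shuffle, and reduce steps inside a single process and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration, not taken from the ACCI curriculum.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud scales out", "the cluster scales"]
    print(word_count(docs))
```

On a real cluster, each map_phase call could run on a different node against a different block of input, and the framework would handle the grouping step and any node failures. The functions here have no shared state, which is what allows that kind of distribution.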

National Science Foundation (NSF)


In 2008, the ACCI partnered with the National Science Foundation to provide grant funding to academic researchers interested in exploring large-data applications that could take advantage of this infrastructure. This resulted in the creation of the Cluster Exploratory (CLuE) program, led by Dr. Jim French, which currently funds 14 university projects.


Research Projects

A Comparative Study of Approaches to Cluster-Based Large Scale Data Analysis
This is a collaborative study being conducted by MIT, the University of Wisconsin, and Yale University. These three universities are using a National Science Foundation CLuE grant for a comparative study of approaches to cluster-based, large-scale data analysis. Both MapReduce and parallel database systems provide scalable data processing over hundreds to thousands of nodes, yet researchers need to understand how the two approaches differ in performance and scalability in order to choose the more suitable one when designing new data-intensive computing applications. This project is engaged in systems research, much of which requires the ability to change the operating environment. Since this is not possible on the IBM/Google cluster, the project is also hosted on the Cloud Computi ....
A Hadoop Toolkit for Distributed Text Retrieval
Text search is a technology that is vital for modern information-based societies. Today's systems face the daunting challenge of handling quantities of text previously unimaginable. Cluster computing is the only practical solution for addressing the issue of scale. This project leverages the MapReduce framework (via the open-source Hadoop implementation) to tackle issues of robustness and scalability in processing large amounts of data for information retrieval applications.
A Unified Reinforcement Learning Approach for Autonomic Cloud Management
Cloud computing, unlocked by virtualization, is emerging as an increasingly important service-oriented computing paradigm. The goal of this project is to develop a unified reinforcement learning approach, namely URL, to automate the configuration of virtual machines and the applications running on them, and to adapt the system configuration to the dynamics of the cloud.
Commodity Computing in Genomic Research
This NSF CLuE project focuses on developing parallel algorithms for analyzing the next generation of sequencing data. Scientists can now generate the rough equivalent of an entire human genome in just a few days with a single sequencing instrument. The analysis of this data is complicated by its size: a single run of a sequencing instrument yields terabytes of information, often requiring a significant scale-up of the existing computational infrastructure needed for analysis.
Data-Intensive Text Processing
The NSF CLuE initiative is funding a machine translation project that promises to bridge the language divide in today's multi-cultural and multi-faceted society. Systems capable of converting text from one language into another have the potential to transform how diverse individuals and organizations communicate.
Feedback-Controlled Management of Virtualized Resources for Predictable eScience
This project pursues a novel unified framework to ensure predictable eScience based on two dominant emerging uses of virtualized resources. The foundation of the approach is to wrap an eScience application in a performance container framework and dynamically regulate the application's performance through the application of formal feedback control theory.
Hierarchically-Redundant, Decoupled Storage Project (HaRD)
The Wisconsin Hierarchically-Redundant, Decoupled storage project (HaRD) investigates the next generation of storage software for hybrid Flash/disk storage clusters. The main objective of the project is to improve the performance of storage in a variety of diverse scenarios, including new application environments such as photo storage as found in Facebook and Flickr, high-end scientific processing as found in government labs, and large-scale data processing such as that found in Google and Microsoft.
Hybrid Opportunistic Computing for Green Clouds
Abstract: On-demand, service-oriented cloud computing infrastructures continue to increase in popularity with organizations. Three observations motivate us to investigate running high-throughput, data-intensive tasks as background workloads on these cloud infrastructures. First, the rapid growth in hardware parallelism leaves more residue resources to be exploited. Second, the "incremental power usage" of piggybacking a secondary background workload onto the foreground workload to utilize those residue resources is relatively low. Third, the advances in GPGPU (General-Purpose GPU) processing enable a novel coupling of concurrent workloads. This project will explore a new computing model of offering cloud services on active nodes that are serving on-demand utility computing users. We pla ....
Image Super-Resolution Using Trillions of Examples
Imagine continuously zooming into an image from your personal photo collection. Unlike in current image processing software, however, this zoom operation would reveal details missing from the original image. For example, zooming into someone's shirt would eventually show a high-resolution image of the threads that compose it. A research team at the Department of Computer Science at the University of Virginia plans to develop techniques for intelligently enlarging a digital image that use a database of millions of on-line images to find examples of what its components look like at a higher spatial resolution.
Learning Word Relationship Using TupleFlow
This project focuses on how researchers at the Center for Intelligent Information Retrieval (CIIR) are using the CLuE infrastructure to learn more about word relationships. These relationships are not labeled explicitly in text and are quite varied; by exploiting them, this project will help lead to more effective ranking of web-retrieval results.
One Thousand Points of Light
A large class of distributed data-rich applications, including distributed data mining, distributed workflows, and Web 2.0 Mashups, are increasingly relying on cloud services to meet their data storage and computing demands. This project proposes a cloud proxy network that allows optimized and reliable data-centric operations to be performed at strategic network locations.
Scaling the Sky with MapReduce/Hadoop
Astrophysics is addressing many fundamental questions about the nature of the universe through a series of ambitious wide-field optical and infrared imaging surveys. New methodologies for analyzing and understanding petascale data sets are required to answer these questions. This research project is focused on developing new algorithms for indexing, accessing and analyzing astronomical images. This work is expected to have a broad range of applications to other data-intensive fields.
Trustworthy Virtual Cloud Computing
Abstract: Virtual cloud computing is emerging as a promising solution to IT management to both ease the provisioning and administration of complex hardware and software systems and reduce the operational costs. With the industry’s continuous investment (e.g., Amazon Elastic Compute Cloud, IBM Blue Cloud), virtual cloud computing is likely to be a major component of the future IT solution, which will have significant impact on almost all sectors of society. The trustworthiness of virtual cloud computing is thus critical to the well-being of all organizations or individuals that will rely on virtual cloud computing for their IT solutions. This project envisions trustworthy virtual cloud computing and investigates fundamental research issues leading to this vision. Central to this visi ....
Where the Ocean Meets the Cloud
This project is building a new infrastructure for computational oceanography that uses the CLuE platform to allow ad hoc, longitudinal query and visualization of massive ocean simulation results at interactive speeds. This infrastructure leverages and extends two existing systems: GridFields, a library for general and efficient manipulation of simulation results, and VisTrails, a comprehensive platform for scientific workflow, collaboration, virtualization, and provenance.

Resources

Video: IBM/Google Academic Cloud Computing Initiative (ACCI)