Anastasia Ailamaki: Information Virtualization
The excellence of the research performed at EPFL has once again been recognized at an international level. Anastasia Ailamaki, head of the Data-Intensive Applications and Systems Laboratory (DIAS), has been awarded a 2013 Consolidator Grant from the European Research Council (ERC).
Transforming Raw Data into Information through Virtualization
The rapid evolution of computing power, combined with the decreasing costs of computation and storage infrastructure, has revolutionized all scientific fields and enterprises; at the heart of this revolution is the ability to collect unprecedented amounts of data. Real progress, however, depends on how efficiently we can ‘extract value from chaos’, i.e., process the collected data and transform it into useful information. New insights in the sciences and ground-breaking advances in industry now depend on our ability to analyze massive and complex datasets, in what is called the “fourth paradigm” of scientific discovery through data-driven computing.
Database products are key to every business infrastructure and scientific discovery. Although data management technologies have grown impressively over the past forty years, retrieving actionable information and deriving new knowledge from data remains a complex and time-consuming process for all but the simplest analyses. The typical business intelligence architecture consists of a series of databases, each specialized for one type of data processing, chained together into complex and rigid infrastructures. Any change must be evaluated and implemented carefully, since it may affect the entire business. This complexity systematically obstructs users as they try to analyze their data because, despite tremendous progress, data management tools are still based on legacy designs whose assumptions no longer match today's requirements. A novel approach is urgently needed, or users risk losing the ability to leverage their hard-earned data.
In this proposal, we revisit long-standing data management assumptions that do not scale with today’s massive data explosion, and design a comprehensive end-to-end solution that simplifies data analysis by virtualizing the data, i.e., abstracting it away from its form and manipulating it regardless of how it is stored or structured. Our insight is that, when we acquire data, we do not know which format or schema it should be in, because we do not know what kind of queries will be asked. Therefore, we develop processing algorithms that operate directly over raw data, in its original form and location, and virtualize it seamlessly for processing by different systems. Our objective is to remove long-standing operational and scalability bottlenecks, thereby allowing users to leverage their data quickly and efficiently.
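To make the idea of querying raw data "in its original form and location" more concrete, here is a minimal illustrative sketch, assuming the raw data is a CSV file with a header row. It is not the project's actual design: the function names `scan_raw_csv` and `sum_where` are hypothetical. The query is answered by parsing the raw text on the fly and extracting only the columns it needs, with no prior loading step into a database.

```python
import csv
import io
from typing import Callable, Iterable, Iterator


def scan_raw_csv(lines: Iterable[str], columns: list[str]) -> Iterator[dict[str, str]]:
    """Hypothetical helper: yield only the requested columns, parsed directly from raw CSV text.

    No data is loaded into a database first; each row is parsed as it is read.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Resolve the positions of the requested columns once, from the header row.
    positions = [header.index(name) for name in columns]
    for row in reader:
        yield {name: row[pos] for name, pos in zip(columns, positions)}


def sum_where(
    lines: Iterable[str],
    value_col: str,
    filter_col: str,
    predicate: Callable[[str], bool],
) -> float:
    """Hypothetical query: SUM(value_col) WHERE predicate(filter_col), run over raw text."""
    total = 0.0
    for record in scan_raw_csv(lines, [value_col, filter_col]):
        if predicate(record[filter_col]):
            # Values are converted only when the query actually needs them.
            total += float(record[value_col])
    return total


if __name__ == "__main__":
    raw = "sensor,site,reading\ns1,geneva,20.5\ns2,lausanne,18.0\ns3,geneva,22.0\n"
    print(sum_where(io.StringIO(raw), "reading", "site", lambda s: s == "geneva"))  # 42.5
```

The design choice being illustrated is that the schema and the work done are dictated by the query rather than fixed at load time: a different query over the same file would simply touch different columns.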
The long-term goal of this proposal is to remove barriers to data handling and maximize efficiency for science, businesses, and their users, by enabling new forms of data computation that are unrestrained by how data is collected or stored. This project will construct the building blocks for data-driven computing and the fourth paradigm of scientific discovery, thereby advancing our ability to explore massive and complex datasets automatically and immediately, and to shape new developments in science and industry.
Max ERC funding: 1.97 million Euros
Duration: 60 months
Host institution: EPFL
Project acronym: ViDa