The DMS Lab conducts research in emerging areas that pose new challenges in data management. The lab is part of the APL Section, which reflects our interest in bringing together software engineering and data management. Projects range from the design of spatial databases, including map visualizations and generalizations, to main memory transactional databases, games and simulations, multidimensional indexing, information retrieval and data integration. We are keen on validating our work experimentally -- we love writing code, which is not to say that our love for the blackboard is in any way diminished. :-)
When conducting our work we usually resort to one or more of the following:
- Abstractions & Languages
- Combinatorial Optimization
- Data layout, Indexing and Data structures
- System Implementation & Design
- Statistics & Prediction
- Parallelism & Distribution
Online Transaction Processing Clouds
With changing architectural and application trends, we are revisiting the design of online transaction processing (OLTP) databases. In this project, we are investigating the design tradeoffs of OLTP databases in light of the demands of evolving online workloads. We are studying novel approaches for main memory transaction processing clouds that allow ease of administration, high resource utilization and a flexible programming model. We are currently working to build a prototype system that demonstrates these properties.
Open Geodata Serving
In a collaboration with the Danish Geodata Agency, we have explored new approaches to cook and serve geodata to the public on the Web. A main challenge in cartography is that producing high-quality maps over complex shapes requires the craft of human expertise. However, given the explosion in geospatial data, the pressure for high-productivity cartography tools is increasing at a fast pace. Our work has explored how to create a new class of declarative cartography tools. Our language CVL, the Cartographic Visualization Language, can be processed entirely within a spatial DBMS, opening up exciting opportunities for automatic optimization and scalability. In a separate line of work, we have also analyzed production logs for map-serving web services. These production logs reveal strong spatial and temporal concentration patterns that can be exploited for more efficient caching.
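To give a feel for how spatial concentration in request logs can guide caching, here is a minimal sketch. It is not the TileHeat framework; it assumes a hypothetical log of requested (x, y) points, a fixed square tile size, and a simple policy of caching the most frequently requested tiles. The names `tile_of` and `hottest_tiles` are made up for illustration.

```python
from collections import Counter

def tile_of(x, y, tile_size):
    """Map a point to the (column, row) of the square tile that contains it."""
    return (int(x // tile_size), int(y // tile_size))

def hottest_tiles(requests, tile_size, budget):
    """Return the `budget` most frequently requested tiles, hottest first."""
    heat = Counter(tile_of(x, y, tile_size) for x, y in requests)
    return [tile for tile, _ in heat.most_common(budget)]

# Hypothetical request log: most requests cluster around one location.
log = [(10.5, 20.1), (10.9, 20.7), (11.2, 20.3), (95.0, 3.0), (10.1, 20.9)]
hot = hottest_tiles(log, tile_size=10.0, budget=1)
```

A real policy would also weigh temporal patterns, for example by counting requests per time window; this sketch uses request counts alone.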
Behavioral Simulations and Computer Games
In collaboration with the Cornell Database Group, we have worked on a new scripting platform for games and agent-based simulations. Our recent work in this project has focused on iterated spatial join techniques optimized for main memory, as well as on communication optimizations, especially for latency, in cloud environments. We have also explored techniques for automatic parallelization of large-scale behavioral simulations, as well as efficient checkpoint-recovery techniques for Massively Multiplayer Online Games (MMOs).
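As an illustration of what an iterated spatial join looks like, the sketch below recomputes, at every simulation tick, all pairs of agents within a given radius of each other. It assumes a uniform grid that is rebuilt from scratch each tick, which is only one of several possible join strategies and is not claimed to be the one our papers recommend. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product

def build_grid(points, cell):
    """Hash every point index into the square grid cell that contains it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def join_within(points, radius):
    """Return sorted index pairs (i, j), i < j, at distance <= radius."""
    grid = build_grid(points, radius)  # cell side = radius, so 3x3 cells suffice
    pairs = []
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // radius), int(y // radius)
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                if i < j:
                    qx, qy = points[j]
                    if (x - qx) ** 2 + (y - qy) ** 2 <= radius ** 2:
                        pairs.append((i, j))
    return sorted(pairs)

def step(points, velocities, dt):
    """Advance every agent by its velocity for one tick."""
    return [(x + vx * dt, y + vy * dt)
            for (x, y), (vx, vy) in zip(points, velocities)]

# Two agents approach each other; a third stays far away.
points = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
velocities = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]
results = []
for _ in range(2):
    results.append(join_within(points, radius=1.5))  # join is repeated every tick
    points = step(points, velocities, dt=1.0)
```

Because every agent may move between ticks, the join is re-evaluated over the whole dataset each time, which is why its per-tick cost matters so much in main memory.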
Multidimensional Indexing and Large Main Memories
We have also studied index structures for either read-intensive or write-intensive workloads. For the first class of workloads, we have studied experimentally, together with collaborators from Saarland University and ETH Zurich, the performance of one specific index structure, the Dwarf index. For the second class of workloads, we have studied how to answer queries over collections of moving objects, e.g., for vehicle tracking or spatial agent-based simulations. The problem is challenging because these applications have very high update rates that result from continuous movement. Our technique, MOVIES, is based on frequently rebuilding index snapshots in main memory. Using data partitioning over multiple nodes in a small cluster, we have scaled MOVIES up to 100 million moving objects over the road network of Germany, while keeping snapshot latencies below a few seconds.
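The idea of answering queries from frequently rebuilt snapshots can be sketched as follows. This is not the MOVIES implementation: it assumes a single node, a hypothetical `SnapshotIndex` class, and a deliberately simple snapshot (a list sorted by x coordinate). Updates are buffered cheaply, and queries see only the most recently built snapshot.

```python
from bisect import bisect_left, bisect_right

class SnapshotIndex:
    """Read-only index over object positions, rebuilt from scratch on demand."""

    def __init__(self):
        self.pending = {}   # object id -> latest reported (x, y)
        self.snapshot = []  # (x, y, object id) tuples sorted by x
        self.xs = []        # x coordinates of the snapshot, for binary search

    def update(self, oid, x, y):
        """Record a position update; it stays invisible until the next rebuild."""
        self.pending[oid] = (x, y)

    def rebuild(self):
        """Throw away the old snapshot and build a new one from the latest positions."""
        self.snapshot = sorted((x, y, oid) for oid, (x, y) in self.pending.items())
        self.xs = [x for x, _, _ in self.snapshot]

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        """Return sorted ids of objects inside the rectangle, as of the last rebuild."""
        lo = bisect_left(self.xs, x_lo)
        hi = bisect_right(self.xs, x_hi)
        return sorted(oid for _, y, oid in self.snapshot[lo:hi] if y_lo <= y <= y_hi)

index = SnapshotIndex()
index.update("car1", 1.0, 1.0)
index.update("car2", 5.0, 5.0)
index.rebuild()
index.update("car1", 6.0, 5.0)  # car1 moves after the snapshot was taken
before = index.range_query(4.0, 7.0, 4.0, 6.0)
index.rebuild()
after = index.range_query(4.0, 7.0, 4.0, 6.0)
```

The design trades query freshness for update throughput: an update costs one dictionary write, and the staleness of query results is bounded by how often `rebuild` runs.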
Dataspaces and Personal Information Management
In early work at the ETH Zurich Systems Group, we co-designed the iMeMex Dataspace Management System, a hybrid information integration architecture that allows users to transition from search to data integration in a pay-as-you-go fashion. Unlike a traditional relational DBMS, iMeMex does not take full control of the data, but offers services over one's complex personal dataspace. We have explored several interesting themes in the design of iMeMex, such as the definition of a unified data model for personal information, a novel technique based on mapping hints (called trails) to increase the level of integration of personal information over time, and search over graphs of user data created by view definitions.
We list below our international peer-reviewed publications in journals and conferences (excludes workshops).
Matheus Ataíde, Cid de Souza, Pedro de Rezende, Marcos Vaz Salles.
The Longest Link Node Deployment Problem in Cloud Computing: a Heuristic Approach.
CLAIO 2014, Santiago, Chile.
Tao Zou, Ronan Le Bras, Marcos Vaz Salles, Alan Demers, Johannes
ClouDiA: A Deployment Advisor for Public Clouds.
The VLDB Journal 24(5): 633-653 (2015).
Special Issue on the Best Papers of VLDB 2013.
Extended version of paper that appeared at PVLDB 6(2) (2012) / VLDB 2013, Riva del Garda, Italy.
Transactional Partitioning: A New Abstraction for Main-Memory Databases.
VLDB 2014 PhD Workshop, Hangzhou, China.
Awarded best paper runner-up.
Noy Rotbart, Marcos Vaz Salles, Iasonas Zotos.
An Evaluation of Dynamic Labeling Schemes for Tree Networks.
SEA 2014, Copenhagen, Denmark.
Pimin Konstantin Kefaloukos, Marcos Vaz Salles, Martin Zachariasen.
Declarative Cartography: In-Database Map Generalization of Geospatial Datasets.
ICDE 2014, Chicago, USA.
Benjamin Sowell, Marcos Vaz Salles, Tuan Cao, Alan Demers, Johannes
An Experimental Analysis of Iterated Spatial Joins in Main Memory.
PVLDB 6(14): 1882-1893 (2013) / VLDB 2014, Hangzhou, China.
Pimin Konstantin Kefaloukos, Marcos Vaz Salles, Martin Zachariasen.
TileHeat: A Framework for Tile Selection.
ACM SIGSPATIAL GIS 2012, Redondo Beach, USA.
Tao Zou, Guozhang Wang, Marcos Vaz Salles, David Bindel, Alan
Demers, Johannes Gehrke, Walker
Making Time-stepped Applications Tick in the Cloud.
SOCC 2011, Cascais, Portugal.
Tuan Cao, Marcos Vaz Salles, Benjamin Sowell, Yao Yue, Alan Demers, Johannes Gehrke,
Fast Checkpoint Recovery Algorithms for Frequently Consistent Applications.
SIGMOD 2011, Athens, Greece.
At the conference, we also presented the following demo on our recovery library.
Tuan Cao, Benjamin Sowell, Marcos Vaz Salles, Alan Demers,
BRRL: A Recovery Library for Main-Memory Applications in the Cloud (Demo Paper).
SIGMOD 2011, Athens, Greece.
Jens Dittrich, Lukas Blunschi, Marcos Vaz Salles.
MOVIES: Indexing Moving Objects by Shooting Index Images.
GeoInformatica 15(4): 727-767 (2011).
This paper is an extended version of the SSTD 2009 conference paper.
Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers,
Johannes Gehrke, Walker White.
Behavioral Simulations in MapReduce.
PVLDB 3(1): 952-963 (2010) / VLDB 2010, Singapore.
Marcos Antonio Vaz Salles, Jens Dittrich, Lukas Blunschi.
Intensional Associations in Dataspaces (Short Paper) [Full Version].
ICDE 2010, Long Beach, USA.
Marcos Vaz Salles, Tuan Cao, Benjamin Sowell, Alan Demers, Johannes Gehrke, Christoph Koch, Walker
An Evaluation of Checkpoint Recovery for Massively Multiplayer Online Games.
PVLDB 2(1): 1258-1269 (2009) / VLDB 2009, Lyon, France.
Jens Dittrich, Lukas Blunschi, Marcos Antonio Vaz Salles.
Dwarfs in the Rearview Mirror: How Big are they really?.
PVLDB 1(2): 1586-1597 (2008) / VLDB 2008, Auckland, New Zealand.
Marcos Antonio Vaz Salles, Jens-Peter Dittrich, Shant
Karakashian, Olivier René Girard, Lukas
iTrails: Pay-as-you-go Information Integration in Dataspaces [Video].
VLDB 2007, Vienna, Austria.
Lukas Blunschi, Jens-Peter Dittrich, Olivier René
Girard, Shant Kirakos
Karakashian, Marcos Antonio Vaz
A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper).
CIDR 2007, Asilomar, USA.
Jens Dittrich, Cristian Duda, Björn Jarisch, Donald Kossmann, Marcos Vaz Salles.
Bringing Precision to Desktop Search: A Predicate-based Desktop Search Architecture (Short Paper).
ICDE 2007, Istanbul, Turkey.
Jens Dittrich, Lukas Blunschi, Markus Färber, Olivier René Girard, Shant Kirakos Karakashian,
Marcos Vaz Salles.
From Personal Desktops to Personal Dataspaces: A Report on Building the iMeMex Personal Dataspace Management System (Short Paper).
BTW 2007, Aachen, Germany.
Jens-Peter Dittrich, Marcos Antonio Vaz Salles.
iDM: A Unified and Versatile Data Model for Personal Dataspace Management.
VLDB 2006, Seoul, South Korea.
Jens-Peter Dittrich, Marcos Antonio Vaz Salles, Donald
Kossmann, Lukas Blunschi.
iMeMex: Escapes from the Personal Information Jungle (Demo Paper) [Poster].
VLDB 2005, Trondheim, Norway.
Sérgio Lifschitz, Marcos Vaz Salles.
Autonomic Index Management (Short Paper).
ICAC 2005, Seattle, Washington.
Rogério Costa, Sérgio Lifschitz, Marcos Vaz Salles.
Index Self-tuning with Agent-based Databases.
CLEI Electronic Journal 6(1), 2003.
Special Issue of Best Papers presented at CLEI’2002.
- Databases and Web Programming and Databases and Data Mining (Spring 2014 - Spring 2015 - Block 3)
- Computer Networks (Datanet) (Spring 2012 - Spring 2013 - Block 4)
- Advanced Computer Systems (every Fall - Block 2; started Fall 2011, formerly Principles of Computer Systems Design)
Research seminars and reading groups
- Systems seminar (Fall 2013 - Spring 2015)