Gunter Saake, David Broneske, Jacob Krüger
|Credits:||6 Credit Points|
|Module:||Course for the module "Schlüssel- und Methodenkompetenzen" of the Master's programs; DKE (Applications); DE (Fachliche Spezialisierung)|
|Language:||The full course will be held in English. Papers and presentations have to be in English as well.|
Schedule and Deadlines updated on 21.03.2018
Remember to start working on your paper early: deadlines are strict! The first submitted draft of your paper must be complete (including all sections) so that you can receive useful feedback.
In this Master's-level course, we expect everyone to be familiar with the basic rules of scientific referencing and citation, and we will not accept plagiarism of any kind in your work. In particular, copying sentences from third-party work, fully or in part, or "rephrasing" sentences to hide plagiarism will be treated as plagiarism.
Student Conference, TBA
The course is intended for graduate students who plan to pursue an academic career, primarily Master's students and PhD students (in the first year of their PhD). We especially recommend this course for Master's students who are considering continuing with a PhD or who want to practice academic writing in English for their thesis.
All participants should be interested in academic research and in practicing academic writing.
Although PhD students will not receive a grade or certificate (Schein), we encourage them to participate in this course to practice academic writing and to prepare a paper for their own research project (to be submitted to another conference or workshop, or to be published as a technical report).
The participating students will simulate a scientific conference to acquire skills required for ...
- ... writing academic papers
- ... presenting scientific results
- ... participating in a conference
- ... reviewing academic papers of others
- ... organizing a conference
- ... using web-based paper submission and review systems
In summary, you will have to write a paper (with two chances to improve it after the initial version), write reviews (two review rounds with 3 reviews each), and present your work (a short initial presentation, a practice presentation, and a final presentation).
The course consists of lectures that introduce topics such as academic writing, research ethics, organizing a conference, and presenting. The main focus is on writing your own academic paper, reviewing papers by others, and presenting your results to the group.
- Every participant writes an academic paper, typically giving an overview of the current state and future challenges of a selected research area from software engineering or database systems. We are open to a wide range of topics, which can align with other research projects (e.g., part of PhD research or preparation for a Diploma or Master's thesis).
- Every participant presents their topic (and relevant literature) in a short presentation to receive first feedback.
- After about six weeks, every participant submits a first version of their paper, which is subsequently reviewed by at least 3 other participants.
- Another two or three days later, an improved version of the paper based on the reviewers' comments is submitted and subsequently reviewed again.
- The paper can be improved again. A final version is submitted at the end of the semester.
- All papers are presented at a conference, which takes place on a single full day near the end of the semester (July or August; the date will be fixed in the first lectures). Before the conference, every participant practices their presentation and gets feedback from the others for possible improvements.
Although there are relatively few lectures, we expect participants to focus on reading, writing, reviewing, and presenting, and we recommend reserving at least one day per week for the course (6 CP = 180 h, i.e., about 12 hours per week).
The course will be graded based on the draft versions of the paper, the final version, the reviews, and the presentation.
This schedule is a first sketch to provide an overview of the lectures and deliverables! The actual dates will be coordinated and fixed with the participants of the course during the first two lectures, usually resulting in a less strict schedule that allows more time to write the papers (e.g., final presentations can also be held after the official examination phase). The only fixed date is the submission deadline for SPLC challenge solutions (which may be extended), which applies only to the corresponding topics.
Lectures and Presentations
The slides are older versions and the actual content of the lectures can change!
- 2018-04-04 Lecture: Introduction and topic selection
- 2018-04-11 Lecture: Academic writing I (structure, getting started)
- 2018-04-18 Short Presentations of topic and relevant literature (5 min, strict)
- 2018-04-25 Lecture: Academic writing II (style basics, references)
- TBD Lecture: Publication process (conferences, journals, how to select a venue, how to write a review, ...)
- TBD Lecture: Academic writing III (clarity, cohesion, patterns, typical problems; safeguarding good scientific practice, scientific misconduct, plagiarism, ...)
- TBD Lecture: Research ethics
- TBD Lecture: Scientific presentations + practice presentations
- TBD Final presentations
Deadlines and Deliverables
All deadlines are strict; we cannot accept any late submissions or reviews.
- 2018-04-10: Send your title/topic (to Jacob Krüger via the provided link, and to your advisor)
- 2018-04-18 Short Presentations of topic and relevant literature (5 min, strict)
- 2018-05-25 SPLC solutions submission deadline (for those topics)
- TBD: Submission of abstract due, via email
- TBD: Submission of first draft due, online submission via EasyChair (if you do not have an account, please register first on the linked page)
- TBD: First review due
- TBD: Submission of second draft due, online submission via EasyChair
- TBD: Second review due
- TBD: Send your slides for the practice presentation
- TBD: Submission of final version due, online submission via EasyChair
Format Instructions: 2017 ACM Master Template (use the sigconf option for LaTeX)
The paper must be between 4 and 8 pages long.
PhD students who plan to submit their paper to a real workshop or conference may also use other templates. In this case, the paper should still have a comparable length if it were formatted with the ACM template.
Topics of interest
You can write about your own research topic or results from a bachelor's or master's thesis. If you do not bring your own topic, you can choose one of the following topics in software engineering or databases (mainly for Master's students):
Database Operation Tuning (David Broneske)
A current trend in database systems is to tune algorithms at a very fine granularity. Current code optimizations are discussed controversially, and a clear picture of when they are applicable is missing. Consequently, discuss the applicability of a subset of available code optimizations to selected database algorithms.
- Bogdan Raducanu, Peter Boncz, Marcin Zukowski: Micro Adaptivity in Vectorwise
- Jingren Zhou, Kenneth A. Ross: Implementing Database Operations Using SIMD Instructions
- John L. Hennessy, David A. Patterson: Computer Architecture: A Quantitative Approach
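One commonly discussed example of such a fine-grained code optimization is branch-free selection (predication): instead of a data-dependent `if` per tuple, the scan always writes the candidate position and advances the output cursor by the 0/1 result of the predicate. The sketch below only illustrates the logic in Python under that assumption; the function names are hypothetical, and the performance effect (avoiding branch mispredictions) only appears in compiled code, not in the Python interpreter.

```python
from typing import List


def select_branching(values: List[int], threshold: int) -> List[int]:
    """Return the positions i with values[i] < threshold, using one branch per tuple."""
    result = []
    for i, v in enumerate(values):
        if v < threshold:  # data-dependent branch
            result.append(i)
    return result


def select_predicated(values: List[int], threshold: int) -> List[int]:
    """Same result without a data-dependent branch (predication).

    Every position is written unconditionally; the output cursor only
    advances when the predicate holds (a bool counts as 0 or 1).
    """
    out = [0] * len(values)  # pre-allocated output buffer
    n = 0                    # number of qualifying positions found so far
    for i, v in enumerate(values):
        out[n] = i               # always write the candidate position
        n += (v < threshold)     # advance by 1 only if the predicate is true
    return out[:n]
```

Both functions return `[0, 1, 3]` for `values=[5, 1, 7, 3, 9]` and `threshold=6`; which variant is faster in a real engine depends on the selectivity and the hardware, which is the kind of applicability question this topic asks about.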
Database Operations on Modern Processing Devices (David Broneske)
Tuning database operations to the underlying hardware is a hot topic with the increasing usage of co-processors. There are numerous publications involving different algorithms and processing devices. Create a survey of database operations on different processing devices.
- Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, Dinesh Manocha: Fast Computation of Database Operations Using Graphics Processors
- Rene Müller, Jens Teubner, Gustavo Alonso: Data Processing on FPGAs
- Thomas Willhalm, Yazan Boshmaf, Hasso Plattner, Nicolae Popovici, Alexander Zeier, Jan Schaffner: SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units
Hardware Sensitive Primitives of DBMS Operations (Bala Gurumurthy)
DBMS operations are often tuned to the specific underlying hardware. Although this improves efficiency, it makes porting an operation to other hardware difficult. Most of these operations share common functionality, and the same fine-grained functions are combined in different orders to exhibit various behaviors. For example, sorting is used in sort-merge joins, in ordering, and in the aggregation of values. Similarly, many other such granular functions are involved in processing an operation. In this seminar topic, we survey different optimized DBMS operations (for both relational and graph DBMSs) and extract their primitives. These extracted primitives are then analyzed further to identify the different optimizations applied to them. The analysis should yield two outcomes: 1) the different primitives of the operations (RDBMS and graph) and 2) the various options used to optimize these primitives.
- He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N. K., Luo, Q., & Sander, P. V. (2009). Relational query coprocessing on graphics processors. ACM Transactions on Database Systems (TODS), 34(4), 21.
- Pirk, H., Moll, O., Zaharia, M., & Madden, S. (2016). Voodoo-a vector algebra for portable database performance on modern hardware. Proceedings of the VLDB Endowment, 9(14), 1707-1718.
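The sorting example in this topic can be illustrated by building both a sort-merge join and a sort-based aggregation from the same two primitives. In the Python sketch below, `sort_primitive`, `merge_primitive`, and the `(key, payload)` row layout are hypothetical; the sketch only shows the decomposition into primitives and deliberately ignores the hardware-specific tuning that the surveyed papers add on top.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # hypothetical row layout: (join/group key, payload)


def sort_primitive(rows: List[Row]) -> List[Row]:
    """Primitive 1: order rows by key (reusable for ORDER BY, grouping, joins)."""
    return sorted(rows, key=lambda r: r[0])


def merge_primitive(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Primitive 2: merge two key-sorted inputs and emit all matching pairs."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key with the whole right run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """The join operator is a composition of the two primitives."""
    return merge_primitive(sort_primitive(left), sort_primitive(right))


def sort_based_count(rows: List[Row]) -> List[Tuple[int, int]]:
    """Aggregation (COUNT per key) reusing the same sort primitive."""
    counts: List[Tuple[int, int]] = []
    for key, _ in sort_primitive(rows):
        if counts and counts[-1][0] == key:
            counts[-1] = (key, counts[-1][1] + 1)
        else:
            counts.append((key, 1))
    return counts
```

A hardware-sensitive engine could then swap in, for example, a GPU or SIMD implementation of `sort_primitive` without changing the operators built on top of it.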
Network Analysis for Scholarly Data (Gabriel Campero Durand)
The increasing number of scientific publications and improvements in tools for scalable data analysis enable researchers to study how science works: How do collaboration networks evolve? How can scientific impact be quantified? How does knowledge spread? In this topic, we consider the literature in this domain, grouping the research into two areas: first, the real-world phenomena and properties studied (e.g., first-mover advantages), and second, the popular algorithms proposed.
- Yan, Erjia, and Ying Ding. "Scholarly Networks Analysis." In Encyclopedia of Social Network Analysis and Mining, pp. 1643-1651. Springer New York, 2014
- Zeng, An, Zhesi Shen, Jianlin Zhou, Jinshan Wu, Ying Fan, Yougui Wang, and H. Eugene Stanley. "The science of science: From the perspective of complex systems." Physics Reports (2017)
- Yan, Erjia, and Ying Ding. "Scholarly network similarities: How bibliographic coupling networks, citation networks, cocitation networks, topical networks, coauthorship networks, and coword networks relate to each other." Journal of the Association for Information Science and Technology 63, no. 7 (2012): 1313-1326
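Two of the network types compared by Yan and Ding can be derived from the same citation data: in a bibliographic coupling network, two papers are linked by the number of references they share; in a co-citation network, two papers are linked by the number of papers that cite both. The following Python sketch assumes a hypothetical toy input (a mapping from each paper to the set of papers it cites) and hypothetical function names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical input format: each paper maps to the set of papers it cites.
CitationMap = Dict[str, Set[str]]


def bibliographic_coupling(cites: CitationMap) -> Dict[Tuple[str, str], int]:
    """Coupling strength of (A, B) = number of references A and B share."""
    strength = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(cites[a] & cites[b])
        if shared:  # keep only pairs that are actually coupled
            strength[(a, b)] = shared
    return strength


def cocitation(cites: CitationMap) -> Dict[Tuple[str, str], int]:
    """Co-citation count of (X, Y) = number of papers that cite both X and Y."""
    counts: Counter = Counter()
    for refs in cites.values():
        for x, y in combinations(sorted(refs), 2):
            counts[(x, y)] += 1
    return dict(counts)
```

For `{"P1": {"A", "B", "C"}, "P2": {"B", "C"}, "P3": {"C", "D"}}`, P1 and P2 have a coupling strength of 2, while B and C are co-cited twice; the resulting weighted edge lists could then be fed into standard network measures.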
The Internal Workings of Property Graph DBMSs (Gabriel Campero Durand)
Graph DBMSs are a novel solution for consistently managing network data in multi-user environments, with the property graph model being one of the most popular models implemented, for instance, in Neo4j, OrientDB, and JanusGraph. In this seminar topic, we attempt to provide a comprehensive overview of the internal workings of these DBMSs, covering examples of query languages, physical storage, data layouts, and indexes.
- Sahu, Siddhartha, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M. Tamer Özsu. "The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing." Proceedings of the VLDB Endowment 11, no. 4 (2017)
- Paradies, Marcus, and Hannes Voigt. "Big Graph Data Analytics on Single Machines–An Overview." Datenbank-Spektrum 17, no. 2 (2017): 101-112
- Fletcher, George HL, Hannes Voigt, and Nikolay Yakovets. "Declarative Graph Querying in Practice and Theory." In EDBT, pp. 598-601. 2017
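To make the storage and index aspects concrete, a toy in-memory property graph can keep a node table, a per-node list of outgoing edges (adjacency lists), and a secondary index from labels to node ids. This Python sketch is an illustrative assumption with hypothetical class and method names; it is not the physical layout of Neo4j, OrientDB, or JanusGraph, which persist data with their own record formats.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Node:
    node_id: int
    labels: Set[str]
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: int
    dst: int
    rel_type: str
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: node table, adjacency lists, and a label index."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        # Adjacency list: node id -> outgoing edges, so traversals avoid a global scan.
        self.out_edges: Dict[int, List[Edge]] = defaultdict(list)
        # Secondary index: label -> ids of nodes carrying that label.
        self.label_index: Dict[str, Set[int]] = defaultdict(set)

    def add_node(self, node_id: int, labels: List[str], **props: Any) -> None:
        self.nodes[node_id] = Node(node_id, set(labels), props)
        for label in labels:
            self.label_index[label].add(node_id)

    def add_edge(self, src: int, dst: int, rel_type: str, **props: Any) -> None:
        self.out_edges[src].append(Edge(src, dst, rel_type, props))

    def nodes_with_label(self, label: str) -> List[int]:
        """Index lookup instead of scanning all nodes."""
        return sorted(self.label_index.get(label, set()))

    def neighbors(self, node_id: int, rel_type: Optional[str] = None) -> List[int]:
        """Follow outgoing edges, optionally filtered by relationship type."""
        return [e.dst for e in self.out_edges.get(node_id, [])
                if rel_type is None or e.rel_type == rel_type]
```

Comparing such a naive design with the actual record layouts, adjacency representations, and index structures of real systems is one possible way to structure the overview.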
Towards Defining Fraud Detection as an HTAP Use Case (Gabriel Campero Durand)
Hybrid transactional/analytical processing (HTAP) DBMSs support combinations of OLTP and OLAP workloads. Although this functionality is plausibly useful, there is no established use case for this technology. Researchers informally suggest that fraud detection could be a representative case, but, to our knowledge, no study has evaluated this scenario. In this exploratory research topic, we aim to establish what could constitute a fraud detection use case for HTAP technologies, selecting an example application from a specific domain.
- Özcan, Fatma, Yuanyuan Tian, and Pinar Tözün. "Hybrid Transactional/Analytical Processing: A Survey." In Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1771-1775. ACM, 2017
- Abdallah, Aisha, Mohd Aizaini Maarof, and Anazida Zainal. "Fraud detection system: A survey." Journal of Network and Computer Applications 68 (2016): 90-113
Typing in (Property) Graph Databases (Marcus Pinnecke)
- Our graph database system is powered by a type and tagging system
- Typing goes beyond "labeling" as in current commercial systems (e.g., Neo4j)
- We’ve got
  - GeckoDB, incl. an in-depth description of the typing and tagging subsystem
  - Some papers as seeds for your literature research
- Your Tasks
  - Literature search for typed graph database systems from industry and (more importantly) research (at least 5 systems are required)
  - Classify or compare these systems with respect to their type systems
Native temporal support in (Property) Graph Databases (Marcus Pinnecke)
- Our graph database system supports temporal data, which is a trending topic
- Temporal support is implemented natively rather than “built on top”
- We’ve got
  - GeckoDB, incl. an in-depth concept of the temporal support
  - Some papers as seeds for your literature research
- Your Tasks
  - Literature search for graph database systems from industry and (more importantly) research that natively support temporal data (at least 5 systems are required)
  - Classify or compare these systems with respect to their temporal subsystems
Cross-Language Search and Grouping of Legal Texts (Sabine Wehnert)
Corporations are affected by multiple jurisdictions and need to comply with country-specific laws as well as regulations valid for multiple countries (e.g., EU legislation). In this topic, methods for cross-lingual search and grouping (by topic or country) of legal texts shall be surveyed. (You may consider languages other than German and English.)
- Patentscope (https://patentscope.wipo.int/search/en/clir/clir.jsf)
- Research on cross-lingual search (e.g., CLEF http://clef2016.clef-initiative.eu/)
- Cássia Trojahn, Bo Fu, Ondřej Zamazal, Dominique Ritze, State-of-the-Art in Multilingual and Cross-Lingual Ontology Matching, Towards the Multilingual Semantic Web, Springer, pp. 119-135, 2014
- Wass, Clemens. Openlaws.eu - Building Your Personal Legal Network. J. Open Access L., p. 1, 2017
- Gupta, Amarnath, et al. Toward Building a Legal Knowledge-Base of Chinese Judicial Documents for Large-Scale Analytics. Legal Knowledge and Information Systems, pp. 135-144, 2017
SPLC Challenge Solutions (Jacob Krüger)
- The following topics differ from the previous ones: they are practice-relevant topics proposed at an actual conference
- Designed for contributions by students (with detailed descriptions)
- Not a survey, but doing data analysis (e.g., using tools and scripts)
- They are (most likely) a bit more challenging!
- Additional requirement: Submitting a 4-page paper (+1 page for references) to SPLC by 25.05.2018 (the advisor[s] will provide more help with this)
- Opportunity to get a scientific paper accepted at a conference
- (The advisor is unfortunately in Sweden, so there will be regular Skype calls)
Full descriptions of the cases are available on the SPLC website, including information on the data to be analyzed, the expected results, and the evaluation criteria used by the reviewers.
Interoperability of Software Product Line Variants
This case mainly asks three questions:
- How can we manage differently customized variants of an SPL in a single system?
- How do these variants interact?
- How can name clashes be resolved at runtime?
The case does not require a concrete implementation; a well-defined description is sufficient.
Feature Location Benchmark with ArgoUML SPL
Locating the features of a system is essential for most activities but is still a human-centric task. Here, automation shall be used and benchmarked to locate features according to well-defined criteria. To this end, several information sources can be used (e.g., source code, version history, …).
Localizing Configurations in Highly-Configurable Systems
“A valid solution will, for any line number or file name, be able to produce at least one concrete configuration that, when used to configure and build the target system, includes the given program location.” You can extend existing work, e.g., FeatureIDE, or build your own parser. Correctness and performance are the criteria used to evaluate your solution.
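One possible starting point for C code with preprocessor variability is to compute the presence condition of a line from its enclosing `#ifdef`/`#ifndef`/`#else` blocks and read a (partial) configuration off that condition. The Python sketch below rests on strong assumptions: it supports only these simple directives (it rejects `#if`/`#elif` expressions), ignores build-system constraints such as feature-model or Kconfig dependencies as well as file-level inclusion, and only returns the macros relevant to the given line. `presence_condition` and `configuration_for` are hypothetical names, not part of FeatureIDE or the challenge material.

```python
import re
from typing import Dict, List, Optional, Tuple

# Preprocessor directives recognized by this sketch.
DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b\s*(\w+)?")


def presence_condition(source: str, line_no: int) -> List[Tuple[str, bool]]:
    """Return the (macro, must_be_defined) literals whose conjunction
    guards the 1-based line `line_no` of `source`."""
    stack: List[Tuple[str, bool]] = []
    for current, text in enumerate(source.splitlines(), start=1):
        if current == line_no:
            return list(stack)
        match = DIRECTIVE.match(text)
        if not match:
            continue  # ordinary code line: the condition does not change
        kind, macro = match.group(1), match.group(2)
        if kind in ("if", "elif"):
            raise NotImplementedError(f"#{kind} expressions are not handled here")
        if kind == "ifdef":
            stack.append((macro, True))
        elif kind == "ifndef":
            stack.append((macro, False))
        elif kind == "else":
            name, value = stack.pop()
            stack.append((name, not value))  # #else negates the innermost guard
        else:  # #endif closes the innermost block
            stack.pop()
    raise ValueError(f"line {line_no} does not exist")


def configuration_for(source: str, line_no: int) -> Optional[Dict[str, bool]]:
    """Derive a partial configuration (macro -> defined?) that includes the
    line, or None if its guards contradict each other."""
    config: Dict[str, bool] = {}
    for macro, defined in presence_condition(source, line_no):
        if config.get(macro, defined) != defined:
            return None  # e.g., an #ifndef A nested inside an #ifdef A
        config[macro] = defined
    return config
```

For example, a line in the `#else` branch of an `#ifndef B` block that is itself nested in `#ifdef A` yields `{"A": True, "B": True}`. A real solution would still have to complete this partial assignment into a full configuration that the target system's build accepts.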