Thesis/Jobs

 

 

If you have questions about the topics offered here, please contact the respective supervisors. Further topics can usually be requested directly from the members of the working group.

Thesis Topics

We currently offer the following topics for Bachelor's, Master's, and Diploma theses.

  • Elf: An efficient index structure for multi-column selection predicates
Supervisor:   David Broneske
Abstract: As analytical queries get more and more complex, the number of evaluated selection predicates per query rises as well, often involving several different columns. Our idea is to address this requirement with Elf, a multi-dimensional index and storage structure. Elf exploits the relationship between the data of several columns to accelerate multi-column selection predicate evaluation. Elf features cache sensitivity, an optimized storage layout, fixed search paths, and slight data compression. However, there are still many points to be researched. These include, but are not limited to, efficient insert/update/delete algorithms and a mechanism for merging two Elfs. Furthermore, we are interested in how far SIMD can be used to accelerate multi-column predicate evaluation in Elf (see the sketch below the goals).
Goals and results: 
  • Implementation of efficient build, update, or search algorithms for Elf
  • Critical evaluation against state-of-the-art competitors
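To illustrate the kind of SIMD-accelerated predicate evaluation mentioned above, the following minimal sketch evaluates a conjunctive two-column predicate with SSE intrinsics over plain column arrays. It is not Elf's actual storage layout or code; the column names, predicate constants, and function name are purely illustrative assumptions.

    #include <immintrin.h>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Evaluate "a < 10 AND b < 20" on four rows at a time using SSE intrinsics.
    // The hypothetical column arrays colA and colB stand in for two indexed columns.
    std::vector<uint32_t> simd_two_column_scan(const std::vector<int32_t>& colA,
                                               const std::vector<int32_t>& colB) {
        std::vector<uint32_t> result;
        const __m128i boundA = _mm_set1_epi32(10);
        const __m128i boundB = _mm_set1_epi32(20);
        size_t i = 0;
        for (; i + 4 <= colA.size(); i += 4) {
            __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&colA[i]));
            __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&colB[i]));
            // Compare four values per column in parallel and AND the two lane masks.
            __m128i mask = _mm_and_si128(_mm_cmplt_epi32(a, boundA),
                                         _mm_cmplt_epi32(b, boundB));
            int bits = _mm_movemask_ps(_mm_castsi128_ps(mask));
            for (int j = 0; j < 4; ++j)
                if (bits & (1 << j)) result.push_back(static_cast<uint32_t>(i + j));
        }
        // Scalar tail for the remaining rows.
        for (; i < colA.size(); ++i)
            if (colA[i] < 10 && colB[i] < 20) result.push_back(static_cast<uint32_t>(i));
        return result;
    }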
  • Search-Based Sampling for Software Product-Line Testing
Supervisor:   Mustafa Al-Hajjaji
Abstract: Software product line engineering is an approach for systematically reusing a common set of features across a very large number of similar products. SPL engineering addresses well-known needs of software engineering, such as reducing the cost of development and maintenance, increasing quality, and decreasing time to market. Evaluating the reliability of an SPL is important, because some of its features are constantly reused. Hence, testing becomes necessary to avoid fault propagation to the derived products. Testing an SPL is a difficult task due to the explosion of possible products that need to be tested as a result of the high number of possible feature combinations. The main concern is that it is not possible to test all possible products, because the resources for testing are usually limited. To tackle this problem, several approaches have been proposed to reduce the number of products to test, e.g., combinatorial interaction testing (pairwise and t-wise testing). The challenge is that computing all t-wise interactions from a feature model in the presence of all constraints is still a problem, especially for large feature models such as the Linux kernel (over 11,000 features). Hence, we have to select a subset of the possible products. Random selection is a very poor technique for finding solutions (in our case, products) when those solutions occupy a very small part of the overall search space (in our case, the feature model). The products may be found faster and more reliably if the search is given some guidance. Meta-heuristic searches can provide this guidance in the form of a fitness function, whose information can be utilized by optimization algorithms.
Goals and results: 
  • Implementation of search-based algorithms to achieve sampling
  • Evaluation against state-of-the-art algorithms using different feature model sizes
  • Join-Order Optimization
Supervisor:   Andreas Meister
Abstract: Within database management systems, users provide queries via SQL. The efficiency of these declarative queries is highly dependent on the order of join operators. Within join-order optimization, databases try to determine efficient join orders. Based on almost 40 years of research, plenty of different, mainly sequential approaches are available, such as genetic algorithms, top-down enumeration, or dynamic programming. Within this context, several possible thesis topics are available, such as the comparison or parallelization of existing approaches.
Goals and results: 
  • Implementation or parallelization of existing approaches
  • Evaluation of implementation
  • Parallel Sorting
Supervisor:   Andreas Meister
Abstract: Sorting algorithms are basic building blocks for plenty of different complex optimization problems. In the past, sequential algorithms provided enough performance for practical usage. Nowadays, these sequential algorithms cannot provide further performance improvements, given the parallel hardware architecture of current systems. Hence, parallel sorting algorithms were proposed to adapt sorting to the current hardware. Based on a literature review, suitable parallel sorting algorithms should be identified. Furthermore, selected algorithms should be implemented and evaluated against existing algorithms (e.g., Boost Compute); a minimal CPU baseline is sketched below the goals.
Goals and results: 
  • Overview of existing parallel sorting algorithms
  • Implementation and evaluation of selected sorting algorithms
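As a hedged starting point, a CPU-side baseline using the C++17 parallel STL could look as follows. It assumes a standard library with parallel algorithm support (e.g., GCC linked against TBB) and is meant only as a reference against which surveyed parallel sorting algorithms or GPU libraries such as Boost Compute can be compared.

    #include <algorithm>
    #include <cstdint>
    #include <execution>
    #include <vector>

    // Sort a column of keys in parallel on the CPU (C++17 execution policies).
    void parallel_sort_keys(std::vector<uint64_t>& keys) {
        std::sort(std::execution::par_unseq, keys.begin(), keys.end());
    }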
  • Parallel Multi Reduction
Supervisor:   Andreas Meister
Abstract: Aggregations (e.g., sum, max, min, count) are common operations within database systems. Although the parallel aggregation of a single group is well studied, how to efficiently aggregate multiple groups still seems to be an open question. For example, one group could be evaluated in parallel, or all groups could be processed at the same time (see the sketch below the goals). In this thesis, an overview of existing parallel aggregation/reduction algorithms should be created. Furthermore, selected parallel algorithms should be implemented and evaluated against existing algorithms (e.g., Boost Compute).
Goals and results: 
  • Overview of existing parallel aggregation/reduction algorithms
  • Implementation and evaluation of selected aggregation/reduction algorithms
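A minimal CPU sketch of the two strategies named in the abstract, assuming pre-partitioned groups and the C++17 parallel STL. The function names and the use of std::reduce/std::transform are illustrative choices, not part of an existing code base.

    #include <algorithm>
    #include <cstdint>
    #include <execution>
    #include <numeric>
    #include <vector>

    // Strategy 1: reduce a single group in parallel (data-parallel sum).
    int64_t parallel_sum_one_group(const std::vector<int32_t>& group) {
        return std::reduce(std::execution::par, group.begin(), group.end(), int64_t{0});
    }

    // Strategy 2: process all groups at the same time, each group reduced sequentially.
    std::vector<int64_t> parallel_sum_all_groups(const std::vector<std::vector<int32_t>>& groups) {
        std::vector<int64_t> sums(groups.size());
        std::transform(std::execution::par, groups.begin(), groups.end(), sums.begin(),
                       [](const std::vector<int32_t>& g) {
                           return std::reduce(g.begin(), g.end(), int64_t{0});
                       });
        return sums;
    }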
  • Statistical analysis of run-time experiments (Bachelor)
Supervisor:   Andreas Meister
Abstract: Experiments can be affected by several internal and external factors. Additional analyses, e.g., significance or error analysis, help to generalize measurements and improve the quality of experiments and results. In this thesis, the state of the art of analyzing run-time experiments should be reviewed. Suitable analyses should be realized with existing tool suites (e.g., R); a minimal significance-test sketch is given below the goals.
Goals and results: 
  • Overview of state of the art of analysis of run-time experiments
  • Prototypical implementations of analysis with existing frameworks
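To make the intended kind of significance analysis concrete, the sketch below computes Welch's t-statistic and degrees of freedom for two run-time series in C++. This is purely illustrative; in practice, an existing tool suite such as R (t.test) would provide this, including the p-value, out of the box.

    #include <cmath>
    #include <utility>
    #include <vector>

    // Sample mean and unbiased sample variance of one measurement series.
    static std::pair<double, double> mean_and_variance(const std::vector<double>& x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= static_cast<double>(x.size());
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        var /= static_cast<double>(x.size() - 1);
        return {mean, var};
    }

    // Welch's t-statistic and Welch-Satterthwaite degrees of freedom
    // for two independent run-time series a and b.
    std::pair<double, double> welch_t(const std::vector<double>& a, const std::vector<double>& b) {
        auto [ma, va] = mean_and_variance(a);
        auto [mb, vb] = mean_and_variance(b);
        const double na = static_cast<double>(a.size()), nb = static_cast<double>(b.size());
        const double se2 = va / na + vb / nb;              // squared standard error
        const double t = (ma - mb) / std::sqrt(se2);
        const double df = se2 * se2 /
            ((va / na) * (va / na) / (na - 1) + (vb / nb) * (vb / nb) / (nb - 1));
        return {t, df};
    }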
  • Using Feature Diagrams to Configure Systems (Bachelor)
Supervisor:   Jacob Krüger, Sebastian Krieter
Abstract: Software product lines (SPLs) are an approach to systematically reuse and customize systems based on user requirements. To achieve this, an SPL is divided into single features that can be selected or deselected for each product. Thus, to generate a specific product from the SPL, the user has to provide a configuration for all features. However, configuring an SPL is often difficult, as most features depend on one another in a complex manner, which is determined by the feature model of the SPL. Current configuration tools (e.g., FeatureIDE) rely on a list-based representation of all features, which can be confusing to the user as it omits dependencies defined in the feature model. An alternative approach is to use a configuration editor based on a feature diagram, which is a tree-based representation of a feature model. In this thesis, you describe and implement the concept of a configuration editor that uses a feature diagram. You evaluate the benefits and downsides of this concept in a small user study. The implementation should be an extension of the tool FeatureIDE.
Goals and results: 
  • Implementing a new configuration view in FeatureIDE using the feature diagram
  • Assessing its usability, benefits, and problems in a small user study
  • An automation tool to specify and generate benchmark datasets
Supervisor:   Marcus Pinnecke
Abstract: Data management poses a wide range of tasks, needs, and constraints on database management systems and tools. Since system development is about making the right design choices in trade-off situations, today's database system market offers a multitude of tools, each having its niche. To find the best-matching solution for cross-niche tasks (such as mixed OLTP and OLAP workload processing), benchmarking is an essential method for industry and research. As there is no database system that "fits all", there is also no benchmark specification that "fits all". Tailoring a benchmark is cumbersome for a third party if this was not intended in the benchmark specification, and it contradicts the idea of standardization and transparency. Clearly, this limits the evolution of benchmarks following trends in modern data management, and might open the door for non-standard, custom-built benchmarks that are hard to share with the community. A promising solution to this challenge is a flexible benchmark automation tool for database systems that provides a machine-readable specification language and automates the generation of data and meta information in industry-standard output formats. The purpose of this thesis is to lay the foundations of such a benchmark automation tool, CONTRABASS. The vision of CONTRABASS is as follows: using CONTRABASS, an arbitrary benchmark specification document can be formulated such that CONTRABASS knows how to generate the data and workflow. An important feature of CONTRABASS is mixing and tweaking specifications at the specification-language level to tailor benchmarks without the need to rewrite the data generation tool or to bind to the system under evaluation. Further, evaluations based on CONTRABASS benchmarks will be transparent, repeatable, and easily shareable, since they only depend on the statements formulated in the specification language.
Details, goals and results:  In previous work, we surveyed important state-of-the-art database system benchmarks w.r.t. transactional and analytical processing (e.g., the TPC suite), a combination of both processing types (e.g., the CH-Benchmark), and several custom-made "micro"-benchmarks. This thesis is intended as a proof of concept towards an automation tool to specify and generate benchmark datasets.
This includes:
  • Generalization of the specification parameters of a set of important database benchmarks (e.g., TPC-C, CH-Benchmark, custom workloads) based on preliminary work
  • Conceptual work regarding an extensible framework (using several design patterns) that covers the most important generic specification parameters
  • Conceptual work regarding a data generation module for this framework that is capable of generating benchmark workloads depending on the specification parameters
  • Prototypical implementation and evaluation of this framework as a proof of concept
Although a parser is needed to actually accept a formal language that covers the specification in the long term, this parser is not a required part of this thesis. In fact, we expect an architectural prototype that could be extended with several functionalities (such as the parser) in future work. The actual system to be developed has to provide a set of access points at the system-internal level that are used to specify a certain benchmark. In addition, the data generator accepts this specification and generates data according to it. The data is output to a freely definable output implementation (e.g., a CSV printer). A strong interest in building flexible systems using industry standards (e.g., established design patterns), in simplifying otherwise cumbersome processes, and in contributing the solution to both the research and the open-source community describes a student profile that will perfectly match this thesis. We are willing to shift the weight of several tasks depending on whether this thesis is a Bachelor's or Master's thesis; that is, the focus can lie more on the implementation part or more on the conceptual part, depending on the type of thesis and the student's skills and interests. However, interested students need strong self-motivation and must be able to work autonomously on several parts of the project. It is up to the student to choose a platform, a (mix of) programming language(s), and possible libraries to implement the prototype, provided she/he can argue for that choice.
  • Efficient Mutation Testing of Variable Source Code
Supervisor:   Jacob Krüger, Mustafa Al-Hajjaji
Abstract: Mutation testing is an approach to evaluate test cases for a software system. Mutants are modified versions of the system that are generated automatically and contain faults. These faults shall be similar to developer mistakes, for instance due to typos in source code. In the context of variability, e.g., software product lines, it is difficult to generate and test all necessary mutants. Instead of testing a single system, several variants with different configurations and, thus, features can be derived. Basically, several configurations may have to be tested for each mutant, which is hardly possible in complex and highly variable systems. To face this problem, cost reduction techniques can be used. However, these currently do not consider variable systems. Hence, it is an open issue how to reduce the number of mutants and also the number of configurations that must be tested. Currently, utilizing dependency analysis and T-wise testing are potential solutions to achieve this goal.
Goals and results: 
  • Implementing a cost reduction technique based on variability analysis and T-wise testing
  • Evaluation on variable systems (for instance implemented with the C preprocessor)
  • Robust Parallel Prefix Sum
Supervisor:   Andreas Meister
Abstract: Prefix sums are basic building blocks for plenty of different complex optimization problems. In the past, sequential algorithms provided enough performance for practical usage. Nowadays, these sequential algorithms cannot provide further performance improvements, given the parallel hardware architecture of current systems. Hence, parallel algorithms for calculating the prefix sum were proposed to adapt existing algorithms to the current hardware. However, existing algorithms typically impose certain constraints, e.g., that the number of entries must be a power of two. Based on a literature review, suitable parallel execution strategies and their constraints for the prefix sum calculation should be identified. Furthermore, selected variants should be adapted to work without the given constraints (a simple constraint-free sketch is given below the goals). Since these adaptations will influence the performance, a comparison between the different variants should be conducted.
Goals and results: 
  • Overview of existing prefix sum calculation strategies
  • Implementation of robust prefix sum calculation strategy
  • Evaluation of implemented variants
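A minimal sketch of a constraint-free variant, assuming a CPU implementation with std::thread (the GPU strategies from the literature will differ): a two-phase blocked inclusive scan that works for arbitrary, non-power-of-two input sizes.

    #include <algorithm>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Two-phase blocked inclusive prefix sum for arbitrary input sizes:
    // each thread scans its own block, then block offsets are combined and added back.
    void parallel_inclusive_scan(std::vector<int64_t>& data, unsigned num_threads = 4) {
        const size_t n = data.size();
        if (n == 0) return;
        num_threads = static_cast<unsigned>(std::min<size_t>(num_threads, n));
        const size_t block = (n + num_threads - 1) / num_threads;
        std::vector<int64_t> block_sums(num_threads, 0);

        // Phase 1: local inclusive scan of each block.
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < num_threads; ++t)
            workers.emplace_back([&, t] {
                const size_t begin = t * block, end = std::min(n, begin + block);
                for (size_t i = begin + 1; i < end; ++i) data[i] += data[i - 1];
                if (begin < end) block_sums[t] = data[end - 1];
            });
        for (auto& w : workers) w.join();

        // Sequential exclusive scan over the (few) block sums.
        std::vector<int64_t> offsets(num_threads, 0);
        for (unsigned t = 1; t < num_threads; ++t) offsets[t] = offsets[t - 1] + block_sums[t - 1];

        // Phase 2: add each block's offset to all of its elements.
        workers.clear();
        for (unsigned t = 0; t < num_threads; ++t)
            workers.emplace_back([&, t] {
                const size_t begin = t * block, end = std::min(n, begin + block);
                for (size_t i = begin; i < end; ++i) data[i] += offsets[t];
            });
        for (auto& w : workers) w.join();
    }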
  • Managing control logic of GPU kernels
Supervisor:   Andreas Meister
Abstract: When complex algorithms are executed, control logic is involved to ensure correctness. Due to their architecture, this poses a challenge for GPUs. Hence, the question is whether control logic should be included within the GPU kernels, or whether (most of) the control logic should be managed by the host system (CPU). In this thesis, the influence of control logic on the performance of GPU kernels should be evaluated for a suitable algorithm.
Goals and results: 
  • Identification of a suitable algorithm for evaluation
  • Implementation of different algorithm variants considering the control flow
  • Feature detection from textual requirements
Supervisor:   Yang Li
Abstract: Feature-oriented software development (FOSD) is a paradigm for the construction, customization, and synthesis of large-scale software systems. The key idea of FOSD is to emphasize the similarities of a family of software systems for a given application domain (e.g., database systems, banking software, text processing systems) with the goal of reusing software artifacts among the family members. Features distinguish different members of the family. Feature models, as an important means of capturing domain requirements, are widely accepted in mainstream domain engineering today. However, constructing a feature model from the requirements or textual descriptions of products can often be tedious and ineffective. To tackle this problem, Natural Language Processing (NLP) techniques can be used to detect features during domain analysis. In spite of the diversity of existing techniques, the challenge is to achieve high accuracy: automatically finding a high number of relevant elements (high recall) while maintaining a low number of false-positive results (high precision). Since there is no optimal technique and each approach is applicable in a particular context, it is an open issue how to increase the accuracy of specific approaches for feature detection from textual requirements.
Goals and results: 
  • Implementing an improved algorithm to detect features based on NLP techniques
  • Evaluation against state-of-the-art algorithms in this area
  • An Advanced Configurator View for Extended Feature Models
Supervisor:   Juliana Alves Pereira
Abstract: Product Line (PL) configuration practices have been employed by industries as a mass customization process. In this context, application engineers widely use extended feature models as the most accepted formalism for selecting features that are in accordance with stakeholders' requirements. Extended feature models describe product functional and non-functional requirements and their interdependencies. In industrial scenarios, application engineers often deal with large extended feature models with complex relationships. Consequently, the management of the configuration space becomes challenging. This thesis has the goal of proposing an interactive visualization mechanism to support application engineers to manage the challenges of configuring extended feature models.
Goals and results: 
  • Implement a new configurator view in FeatureIDE to deal with the manual configuration of extended feature models
  • Conduct an empirical user study of the proposed visualization mechanism by investigating its benefits
  • Variability-Encoding for Abstract State Machines
Supervisor:   Fabian Benduhn
Abstract: In Feature-Oriented Software Development (FOSD), individual products of a product line can be automatically generated for a given feature selection, either by composing feature modules or by extracting the relevant parts of an annotated code base. Due to the potentially massive number of possible products, it is not feasible to analyse each product separately. Variability encoding is a technique in which compile-time variability is transformed into runtime variability, i.e., a meta-product is created that simulates the variable behaviour of the complete product line. This meta-product can be analysed efficiently to draw conclusions about all products. In our research, we have developed techniques for feature-oriented development of formal specifications based on Abstract State Machines. The goal of this thesis is to develop a concept for variability encoding of Abstract State Machines, implement a prototype, and evaluate it by performing a case study.
Goals and results: 
  • Develop and implement a concept for variability encoding of Abstract State Machines
  • Implement a prototype, and evaluate it by performing a case study
  • Evaluation of Skip vectors within traditional dynamic programming for Join-Order Optimization (Bachelor)
Supervisor:   Andreas Meister
Abstract: Han et al. proposed Skip Vectors to improve the efficiency of traditional dynamic programming. Within this thesis, Skip Vectors should be implemented in a sequential variant of dynamic programming. In the evaluation, the effects of Skip Vectors on sequential and parallel variants should be assessed.
Goals and results: 
  • Integration of Skip Vectors into dynamic programming
  • Optimization of Implementation
  • Evaluation of effects of Skip Vectors
  • Lightweight, Variability-Aware Change Impact Analysis
Supervisor:   Sandro Schulze
Abstract: Change Impact Analysis (CIA) has been proposed as a powerful means to identify the impact of source code changes, i.e., which parts of a software system may be influenced by changes. To this end, data- and control-flow dependencies are employed. For variable software systems, such a technique has to take variability (in terms of features) into account to answer questions such as "Which feature(s) are impacted by a change to feature X?". So far, no solution exists for common variability mechanisms such as the C preprocessor. In this MSc thesis, the task is to implement a lightweight CIA based on the tool srcML, which provides an abstract program representation by means of XML annotations. Based on this representation, the necessary information should be extracted and used for computing the set of impacted statements, given a particular change. The technique should be evaluated using mid- and large-scale open-source systems.
Goals and results: 
  • Concept for CIA, including envisioned workflow and tools to be used
  • Implementation of variability-aware CIA for the C preprocessor
  • A critical evaluation of the implemented technique
  • Migrating Cloned Software Products: A Better Mousetrap for Finding Similar Code
Supervisor: Wolfram Fenske
Context:

Instead of developing families of related software products as a software product line, commercial developers often create additional product variants by copying an existing variant and adapting it as needed — the clone-and-own approach. While initially cheap and easy, a growing number of such clones quickly leads to maintenance and evolution problems. Switching to software product line engineering becomes more and more appealing. This switch requires migrating the existing variants into a software product line, which has its own set of challenges, for instance

  • Which variants implement which features?
  • Which code, exactly, implements each feature in each variant?
  • How can the variants' code be refactored into the common code base of the target software product line?

In our ongoing research project EXPLANT (EXtracting Product Lines from vAriaNTs) (funded by the Deutsche Forschungsgemeinschaft (DFG)), we try to answer these and other questions. There are already two related theses, a Bachelor's thesis, as well as a Master's thesis, which resulted in a paper accepted at a renowned international conference. You can be part of our project by choosing this topic as your Bachelor's or Master's thesis.

Your Task:

Our current tool support uses clone detection to identify code that is similar across different product variants. However, this may not have been the ideal choice since the clone detector is purely text-based and does not understand where elements of the programming language (e.g., method definitions, field declarations, loops) begin or end. The Technische Universität Braunschweig has developed a sophisticated, extensible framework for finding similarities in cloned product variants by taking such structural information into account. However, their framework can only do the analysis, whereas we also provide refactorings to actually restructure the code. The idea of this thesis is to integrate their work and ours and see whether the result is more effective and/or efficient than before.

Goals and results:
  • Design a concept to integrate Braunschweig's variant similarity analysis into our approach in a modular, extensible fashion.
  • Implement your concept by modifying our existing tool support, which is based on the Eclipse plugin FeatureIDE, developed by our working group.
  • Measure the effectiveness of your enhancements.
  • Refactorings for Migrating Cloned Products Into a Product Line
Supervisor: Wolfram Fenske
Context:

Instead of developing families of related software products as a software product line, commercial developers often create additional product variants by copying an existing variant and adapting it as needed — the clone-and-own approach. While initially cheap and easy, a growing number of such clones quickly leads to maintenance and evolution problems. Switching to software product line engineering becomes more and more appealing. This switch requires migrating the existing variants into a software product line, which has its own set of challenges, for instance

  • Which variants implement which features?
  • Which code, exactly, implements each feature in each variant?
  • How can the variants' code be refactored into the common code base of the target software product line?

In our ongoing research project EXPLANT (EXtracting Product Lines from vAriaNTs) (funded by the Deutsche Forschungsgemeinschaft (DFG)), we try to answer these and other questions. There are already two related theses, a Bachelor's thesis, as well as a Master's thesis, which resulted in a paper accepted at a renowned international conference. You can be part of our project by choosing this topic as your Bachelor's or Master's thesis.

Your Task:

Migrating variant code into the target software product line requires a special kind of refactoring, called a variant-preserving refactoring. Variant-preserving refactorings differ from standard object-oriented refactorings in slight but very important ways. We already have two such refactorings, but they do not suffice. Much more code could be migrated if we had additional refactorings, such as Extract Method or Extract Constant.

Goals and results:
  • Design variant-preserving Extract Method and Extract Constant refactorings.
  • Find opportunities to migrate more code and design appropriate refactorings.
  • Implement your refactorings by enhancing our existing tool support, which is based on the Eclipse plugin FeatureIDE, developed by our working group.
  • Prove that your refactorings are actually semantics-preserving.
  • Measure the effectiveness of your enhancements.
  • Search State Dependency Graph
Supervisor:   Andreas Meister
Abstract: The dynamic programming approach for join-order optimization is one state-of-the-art approach for optimizing queries in relational databases. In order to ensure the efficiency of the optimization, different enumeration strategies were proposed. In this thesis, the Search State Dependency Graph proposed by Waas et al. should be evaluated against other enumeration strategies.
Goals and results: 
  • Implementation of the Search State Dependency Graph for dynamic programming
  • Evaluation of the Search State Dependency Graph
  • Hints to the CPU's Branch Prediction Unit and Hardware Prefetcher for Vectorized Processing (Bachelor/Master)
Supervisor:   Marcus Pinnecke
Abstract: GCC/Clang provide built-in functions to give hints to the data cache and the branch prediction unit. We use these hints to express the intention to minimize data cache misses and to maximize the correct prediction of conditional branches (if/else constructs) inside our push-based vectorized execution engine. The task is to verify whether and where applying these data cache and branch prediction hints actually brings benefits or lowers performance, and where further applications might be reasonable (a sketch of such hints is given below the notes). For this task, a cleaned code branch that is ready to run and to measure performance is available and can be used out of the box. For the purpose of measuring, we provide data of two columns of the TPC-H lineitem table. For quick development, we recommend using the TPC-H dataset at scale factor 1 (SF1). For measurements, scale factor 10 (SF10) is required, so that the data actually exceeds the data cache capacity of the CPU. Both datasets are available.
Goals and results: 
  • Evaluation of the impact of these hints
  • Statements on the code "as-is" + recommendations for improvements (supported by evaluation) for "modified code"
Notes
  • Basic C++ skills are required (for self-studying purposes of existing code).
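The following hypothetical filter kernel sketches where the two built-ins (__builtin_expect and __builtin_prefetch) could be placed. The column name, prefetch distance, and selectivity assumption are illustrative and are not taken from the existing code branch.

    #include <cstddef>
    #include <cstdint>

    // Hypothetical filter over one TPC-H lineitem column, illustrating hint placement.
    size_t filter_quantity(const int32_t* l_quantity, size_t n,
                           int32_t threshold, uint32_t* out_ids) {
        size_t out = 0;
        for (size_t i = 0; i < n; ++i) {
            // Prefetch data well ahead of the current position to hide memory
            // latency (read access, low temporal locality); this is only a hint.
            __builtin_prefetch(&l_quantity[i + 64], 0, 0);
            // Tell the branch predictor that the predicate rarely matches,
            // assuming a selective filter.
            if (__builtin_expect(l_quantity[i] < threshold, 0)) {
                out_ids[out++] = static_cast<uint32_t>(i);
            }
        }
        return out;
    }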
  • Efficient memory layout of complex data types for parallel optimization (Bachelor)
Supervisor:   Andreas Meister
Abstract: Complex data types contain multiple simple data types, such as integers, floats, etc. For most tasks, not only a single item is considered, but a collection of multiple items. Hence, from a logical view, a two-dimensional memory space is created. Unfortunately, memory is only one-dimensional. Therefore, the logical view of the memory must be mapped to the physical view. For this mapping, two approaches exist. First, the complete struct (all members of one item) can be stored in neighboring memory (array of structs). Second, one member across all items can be stored in neighboring memory (struct of arrays). Both layouts are sketched below the goals. In this thesis, both approaches should be evaluated for an optimization task in query optimization.
Goals and results: 
  • Adaptation of an existing optimization approach
  • Evaluation of the two approaches for the mapping to memory
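A minimal sketch of the two layouts for a hypothetical plan-table structure from query optimization (the member names are illustrative assumptions): scanning a single member touches every cache line in the AoS layout, but only one contiguous array in the SoA layout.

    #include <cstdint>
    #include <vector>

    // Array of structs (AoS): all members of one item are stored contiguously.
    struct PlanNodeAoS {
        uint64_t relations;   // bitset of joined relations
        double   cost;
        double   cardinality;
    };
    using PlanTableAoS = std::vector<PlanNodeAoS>;

    // Struct of arrays (SoA): one array per member; the i-th item is spread
    // across the arrays at index i.
    struct PlanTableSoA {
        std::vector<uint64_t> relations;
        std::vector<double>   cost;
        std::vector<double>   cardinality;
    };

    // Scanning only the cost member: AoS loads whole nodes, SoA streams one array.
    double min_cost_aos(const PlanTableAoS& t) {
        double best = 1e300;
        for (const auto& n : t) best = n.cost < best ? n.cost : best;
        return best;
    }

    double min_cost_soa(const PlanTableSoA& t) {
        double best = 1e300;
        for (double c : t.cost) best = c < best ? c : best;
        return best;
    }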
  • GPU implementations for Vertical Partitioning Optimization Algorithms (Master) (already taken)
Supervisor:   Gabriel Campero Durand
Abstract: Jindal et al. have compared different vertical partitioning algorithms in terms of their performance and efficiency at finding the optimal vertical partitioning of a database table to support a set of queries (i.e., the optimal grouping of columns). Specifically they considered algorithms such as AutoPart, HillClimb, HYRISE, Navathe, O2P, Trojan and the brute force approach. Within this thesis, we will consider the implementation of some of these algorithms using GPUs. Baseline and optimized variants of 1 or 2 algorithms should be evaluated.
Goals and results: 
  • An overview of existing vertical partitioning algorithms.
  • GPU-based implementation (using OpenCL or CUDA) of 2 vertical partitioning algorithms.
  • Suggestion for optimizations over the baseline implementations
  • Evaluation of the run time for the implementations.
  • Comprehensive discussion on supporting vertical partitioning with GPUs.
  • Graph PROSE: Graph processing with a search engine (Master) (already taken)
Supervisor:   Gabriel Campero Durand
Abstract: Graph databases have been developed to help users store and analyze graph-shaped data. Workloads on these systems usually require global access (e.g., asking how many vertices with a given property can be found in the graph) or local access (e.g., asking for all vertices connected to a given vertex). Some databases, such as Neo4j and JanusGraph, also offer full-text queries over attributes of the graph. To support these queries, they index parts of the graph using an external search engine (such as Elasticsearch, Lucene, or Solr). These indexes act as a materialized view over the graph, where the data is represented as documents rather than vertices and edges. Currently, the role of search engines is limited to answering specific types of full-text queries. In our research, we have found that search engines can speed up graph workloads beyond the queries that they currently support. In this thesis, we aim to study the implementation of two network analysis algorithms, both using a native graph database and using the connected search engine. We will provide a prototype to carry out this evaluation, based on Elasticsearch (as the search engine) and Neo4j or JanusGraph (as the graph database).
Goals and results: 
  • Implementation of 2 network analysis algorithms using a graph query language (Cypher and/or Gremlin) and a search engine (ElasticSearch)
  • Discussion on the differences in entities and processing between the systems.
  • Evaluation of the run time for the implementations using different graph datasets.
  • Discussion of possible future directions in supporting search engine processing through a graph query language, or supporting graph processing through a search engine query API.
  • Analyzing the Birth, Life, and Death of Bug Reports
Supervisor:   Dr.-Ing. Sandro Schulze
Context: Nowadays, open-source systems (OSS) play a pivotal role in software development, as they are used even in commercial software. To cope with the increasing demand for software quality, not only version control systems (e.g., Git) and continuous integration (CI) are commonly used; bug tracking systems are also maintained to get rid of as many failures as possible, reported by a vast number of different stakeholders (developers, users, testers). Over time, these bug databases may become confusing or contain too many bug reports, thus leaving many of them open.
Task: In this thesis, the student has to investigate the reasons for bug reports being open (or closed, respectively). In particular, the student conducts an empirical analysis of a large number of bug reports from Mozilla and develops a technique that allows reasoning about the birth, life, and death of bug reports (i.e., why they remain open or get closed). To this end, it might be necessary to dig into machine learning or NLP techniques, but in general, a considerable degree of freedom in how to solve the task is given.
Goals and results: 
  • Concept for analyzing bug databases, including techniques for reasoning about (and prediction of) bug reports
  • Implementation of this concept (for mining the bug reports, 3rd party libraries may be used)
  • A critical evaluation of the implemented technique with existing bug databases
Requirements:
  • good programming skills
  • quick grasp of the subject matter, strong work ethic, ability to work on your own initiative (with guidance by the supervisor)
  • background in machine learning or data mining is a plus, but not required (can be obtained during MSc thesis)
  • You should be eager, creative, and open-minded to search for smart solutions (that may be not so easy to find)

 

Student Assistants and Open Positions

Currently, there are no open positions available.

Scientific Team Projects

For scientific team projects, we offer a dedicated course: 

At the beginning of this course, various topics are presented that can be worked on during the semester.

Software Projects

For software projects, we also offer a dedicated course: 

At the beginning of this course, various topics are presented that can be worked on during the course.

In addition, the following topics are available for a software project.

  • Automatic Generation of Visualizations (Bachelor)
Contact:  Andreas Meister
Description: To analyze results, it is important to visualize measurement data. Within this project, an existing evaluation framework should be extended with the functionality to derive suitable visualizations from the measurement data. The task is to determine suitable visualizations and to implement their automatic generation prototypically.
Goals and results: 
  • Determination of suitable visualizations for measurement results
  • Implementation of the automatic generation of these visualizations
  • Determining the Resource Consumption of UNIX Processes (Bachelor)
Contact:  Andreas Meister
Description: When comparing different variants of algorithms, the comparison is very often restricted to run time. In real systems, however, processes are not executed in isolation, but in parallel with other processes. Accordingly, it is not only important that processes are fast, but also that they use the available resources (e.g., main memory) efficiently. Within this project, a concept for measuring resource consumption should be developed (a minimal sketch follows the goals below).
Goals and results: 
  • Concept for measuring resource consumption
  • Evaluation of the implemented concept
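One possible building block for such a concept, shown here only as a hedged sketch: a small wrapper that runs the process under test and reads its peak memory usage and CPU times via the POSIX getrusage() interface (on Linux, ru_maxrss is reported in kilobytes).

    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    // Run a child process and report its peak resident set size and CPU times.
    int main(int argc, char** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
            return 1;
        }
        pid_t pid = fork();
        if (pid == 0) {                       // child: run the command under test
            execvp(argv[1], &argv[1]);
            _exit(127);
        }
        int status = 0;
        waitpid(pid, &status, 0);
        struct rusage usage{};
        getrusage(RUSAGE_CHILDREN, &usage);   // resources of terminated children
        std::printf("peak RSS: %ld kB\n", usage.ru_maxrss);
        std::printf("user CPU: %ld.%06ld s, system CPU: %ld.%06ld s\n",
                    (long)usage.ru_utime.tv_sec, (long)usage.ru_utime.tv_usec,
                    (long)usage.ru_stime.tv_sec, (long)usage.ru_stime.tv_usec);
        return WEXITSTATUS(status);
    }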
  • Quality of Implementations (Bachelor)
Contact:  Andreas Meister
Description: Implementing complex algorithms is error-prone. To ensure the correctness and quality of an implementation, various concepts can be applied, e.g., unit tests, continuous integration, etc. Within this project, suitable methods for ensuring software quality should be identified, and selected methods should be integrated into an existing evaluation framework.
Goals and results: 
  • Identification of methods for improving implementation quality
  • Integration of selected methods into an existing framework
  • Database Processing Engine (Bachelor)
Contact:  Andreas Meister
Description: Within this software project, an existing framework for join-order optimization should be extended with a processing engine. To this end, an existing query graph must be translated into a concrete execution plan and executed. At least the basic operators must be implemented. To guarantee correctness, the data of a suitable benchmark (e.g., IMDb) as well as unit tests should be used.
Goals and results: 
  • Processing engine for the existing optimization framework
  • Improvements for the SQLValidator (Bachelor)
Contact:  David Broneske
Description: Within this software project, the existing tool SQLValidator should be extended with additional functionality. The functionality to be implemented is to be agreed upon with the supervisor and can be extended or narrowed as needed. Possible tasks are:
  • user statistics on completed exercises
  • user account management
  • support for multiple year groups
  • duplication of exercises
  • checking the correctness of exercises upon their creation
  • submission of ER exercises
Goals and results: 
  • Implementation of additional functions in the SQLValidator
  • Data Quality in the Data Center: The Next Level Is Within Reach (Bachelor)
Contact:  David Broneske, Marcus Pöhls
Description: Implementation of an application to improve the data quality of CPU data from a data center. First, the data quality is analyzed and then improved with the help of the Intel and AMD APIs. The data of the hardware infrastructure is provided for the project. A concrete example: for a machine for which the number of CPU cores is missing, the core gap can be closed via the processor model (e.g., Intel Xeon Processor E5-2650 v3). The software project also includes duplicate detection, i.e., for machines that appear multiple times in the dataset, possibly even with different CPU data, the "best" (most plausible) data record is used.
Goals and results: 
  • Research and implementation of algorithms for data quality analysis
  • API integration with Intel/AMD
  • Finding and cleaning up duplicates

 
