MAQAO (Modular Assembly Quality Analyzer and Optimizer) is a tool for analyzing and optimizing binary code. Its design is modular and intended to support a large number of target architectures. MAQAO currently supports the Intel64 and Xeon Phi architectures; support for others is planned.
MAQAO provides the following features:
- Binary disassembler: it reads binaries in ELF format and uses DWARF debugging information, if present, to relate instructions to source files. MAQAO provides the list of functions and instruction sequences found by a linear sweep analysis of the binary code.
- Control flow reconstruction: it computes the call graph, control flow graph, dominator tree, and loop nest information.
- Instrumentation API: it provides an API to insert user code at any point of the binary. User functions can monitor any register, and instrumentation is thread-safe and thread-aware.
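As an illustration of what control flow reconstruction involves, here is a minimal sketch, not MAQAO's implementation. It assumes a toy control flow graph given as a Python dictionary (the block names `entry`, `header`, `body`, and `exit` are hypothetical), computes dominator sets with the classic iterative algorithm, and reports back edges, each of which identifies a loop.

```python
def dominators(cfg, entry):
    """Return, for each block, the set of blocks that dominate it.

    cfg maps every block name to the list of its successors.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from the most pessimistic answer and refine to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable block: leave it untouched
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(cfg, dom):
    """An edge src -> dst is a back edge when dst dominates src."""
    return sorted(
        (src, dst)
        for src, succs in cfg.items()
        for dst in succs
        if dst in dom[src]
    )


# Hypothetical CFG of a function containing a single loop.
cfg = {
    "entry": ["header"],
    "header": ["body", "exit"],
    "body": ["header"],
    "exit": [],
}
dom = dominators(cfg, "entry")
loops = back_edges(cfg, dom)
```

In this toy graph the only back edge is `body -> header`, so `header` is the head of the single loop. Nesting several such loops by containment of their bodies would give the loop nest information mentioned above.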
Built on top of these features, a plugin mechanism adds further functionality:
- Static Performance Model of the architecture: using a performance model for the predecoder, the decoder, the reorder buffer (ROB) and the different functional units, MAQAO estimates the number of cycles taken by a loop. It provides upper bounds on the performance that can be reached, depending on which cache level holds the data. This performance model applies only to innermost loops.
- Performance tuning hints: based on the static performance model and on debugging information, MAQAO provides hints on how to improve the performance of the code, explaining what its limiting factors are and whether there are optimization opportunities.
- Memory-based value profiling: using the instrumentation module to monitor all types of memory accesses, MAQAO provides loop-based summaries of the different accesses to the memory hierarchy and interactions between threads. This analysis provides hints on thread affinity for OpenMP programs and on the number of cores to use in order to achieve best performance.
- DECAN: DECremental ANalysis is a method to quickly identify the instructions responsible for performance degradation (delinquent instructions).
- Navigation in the control flow graph: a clustered view makes it possible to navigate large graphs. This user interface only requires a recent web browser.
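The idea behind a static performance bound for an innermost loop can be sketched as follows. This is a simplified illustration under stated assumptions, not MAQAO's model: each pipeline resource is reduced to a throughput in operations per cycle, and all resource names and numbers below are hypothetical. The loop cannot run faster than its most heavily loaded resource allows, which gives a lower bound on cycles per iteration and therefore an upper bound on performance.

```python
def cycles_lower_bound(demand, capacity):
    """Return (cycles, bottleneck) for one loop iteration.

    demand[r]   : operations per iteration that need resource r
    capacity[r] : operations resource r can handle per cycle
    The bound is the largest demand/capacity ratio over all resources.
    """
    per_resource = {r: demand[r] / capacity[r] for r in demand}
    bottleneck = max(per_resource, key=per_resource.get)
    return per_resource[bottleneck], bottleneck


# Hypothetical loop body: 8 instructions, 6 loads, 1 store, 2 adds, 2 multiplies.
demand = {"decode": 8, "load": 6, "store": 1, "fp_add": 2, "fp_mul": 2}
# Hypothetical machine: throughput of each resource in operations per cycle.
capacity = {"decode": 4, "load": 2, "store": 1, "fp_add": 1, "fp_mul": 1}

cycles, bottleneck = cycles_lower_bound(demand, capacity)
```

With these made-up numbers the load units are the limiting factor at 3 cycles per iteration. A fuller model would also account for dependency chains and for the latency of the cache level holding the data.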
Strengths of MAQAO
- Uses a combination of static/dynamic analyses, giving a different perspective to performance analysis from hardware counter approaches
- Provides hints to the user, in terms of source code transformations, compiler flags, pragmas, and so on
- Thread-aware analyses: the memory analysis is able to detect performance issues due to thread interactions, such as false sharing, cache thrashing, memory bank conflicts, or affinity relations.
- Generic platform for performance analysis on binaries: MAQAO is modular and adding a new module is easy thanks to its plugin mechanism
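To make the false sharing case concrete, here is a small sketch of how it could be flagged from a memory access trace. It is an illustration only and does not reflect how MAQAO's memory analysis works: the trace format (thread id, address, write flag), the 64-byte cache line size, and the function name are all assumptions. A cache line is reported when several threads touch it, at least one of them writes, and no single address is shared between threads.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes


def false_sharing_candidates(accesses, line_size=LINE_SIZE):
    """Return the sorted indices of cache lines that look falsely shared.

    accesses is an iterable of (thread_id, address, is_write) tuples.
    """
    by_line = defaultdict(list)
    for thread, addr, is_write in accesses:
        by_line[addr // line_size].append((thread, addr, is_write))

    flagged = []
    for line, records in by_line.items():
        addrs_by_thread = defaultdict(set)
        has_write = False
        for thread, addr, is_write in records:
            addrs_by_thread[thread].add(addr)
            has_write = has_write or is_write

        # Threads use disjoint addresses if no address appears twice
        # once each thread's address set is flattened into one list.
        all_addrs = [a for s in addrs_by_thread.values() for a in s]
        disjoint = len(all_addrs) == len(set(all_addrs))

        if len(addrs_by_thread) > 1 and has_write and disjoint:
            flagged.append(line)
    return sorted(flagged)


# Hypothetical trace from two threads.
trace = [
    (0, 0x1000, True), (1, 0x1008, True),    # same line, different words, writes
    (0, 0x2000, True), (1, 0x2000, False),   # same address: true sharing
    (0, 0x3000, False), (1, 0x3010, False),  # same line, read-only
]
candidates = false_sharing_candidates(trace)
```

Only the first pair is reported: the second is genuine data sharing, and the third is read-only, so it causes no invalidation traffic between cores.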
Audience for MAQAO
MAQAO can be used in many different high performance computing projects. You might be interested in MAQAO if you are:
- A compiler researcher who wants to develop or test ideas for analyses and transformations at the binary level
- An architecture researcher who wants to analyze code for new machines or build performance models
- A runtime researcher who wants to monitor how threads interact under the optimizations performed by the runtime
- An end user who is an expert in an application field and wants better performance from their code. Some familiarity with performance tuning techniques is needed.