TCS Alignment Toolbox Sequence Wrappers · This module contains some wrappers to make usage of the TCSAlignmentToolbox easier. Most important for beginners is the StringEditDistance, which provides convenience functions for simple string comparisons. For sequences of vectors we provide the VectorialSequences wrapper. The RandomSequenceGenerator enables you to generate random multi-modal sequences for testing purposes.
Group: de.cit-ec.tcs.alignment - All Dependencies
TCS Alignment Toolbox Visualization · This module contains means to visualize Alignments. The most important interface is the Visualizer class. A trivial implementation (essentially just using toString()) is the StringVisualizer. A more sophisticated example that is recommended for outside use is the HTMLVisualizer. All other classes are helper classes for said HTMLVisualizer.
TCS Alignment Toolbox Comparators Library · This module contains a library of standard implementations of Comparators and Normalizers as defined in the comparators module.
TCS Alignment Toolbox Algorithms Library · This module containts standard implementations of AlignmentAlgorithms. In contrast to the adp module these implementations are hand-tailored for some specific algorithms and thus achieve somewhat faster runtime (a constant factor of maybe 30-50 percent).
TCS Alignment Toolbox Algebraic Dynamic Programming · This module contains a more general approach to construct AlignmentAlgorithms by relying on the theoretical concept of Algebraic Dynamic Programming (ADP) as developed by Giegerich et al. ADP defines four ingredients for an alignment algorithm: 1.) A signature that defines the permitted alignment operations. Operations are just function templates with an associated arity, meaning the number of arguments it takes from the left sequence and from the right sequence. In the TCSAlignmentToolbox we have a fixed signature with the following operations: REPLACEMENT(1, 1), DELETION(1, 0), INSERTION(0, 1), SKIPDELETION(1, 0) and SKIPINSERTION(0, 1) 2.) A regular tree grammar that produces alignments, that is: sequences of operations, in a restricted fashion. 3.) An algebra that can translate such trees to a cost. In the TCSAlignmentToolbox this is a Comparator. 4.) A choice function, in case of the TCSAlignmentToolbox: the strict minimum or the soft minimum. An alignment algorithm in the TCSAlignmentToolbox sense of the word then is the combination of choice function and grammar. While we provide hardcoded versions of these combinations in the main package, the adp package allows you to create your own grammars. You can combine them with a choice function by instantiating one of the Algorithm classes provided in this package with a grammar of your choice. For example: AlignmentAlgorithm algo = new SoftADPScoreAlgorithm(my_grammar, comparator); creates an alignment algorithm that implicitly produces all possible alignments your grammar can construct with the given input, translates them to a cost using the algebra/comparator you provided and applies the soft minimum to return the score. This all gets efficient by dynamic programming. Note that there is runtime overhead when using this method in comparison with the hardcoded algorithms. But for complicated grammars this is a much easier way to go. For more information on the theory, please refer to my master's thesis: "Adaptive Affine Sequence Alignment using Algebraic Dynamic Programming"
TCS Alignment Toolbox Metric Learning · This module is a custom implementation of the Large Margin Nearest Neighbor classification scheme of Weinberger, Saul, et al. (2009). It contains an implementation of the k-nearest neighbor and LMNN classifier as well as (most importantly) gradient calculation schemes on the LMNN cost function given a sequential data set and a user-choice of alignment algorithm. This enables users to learn parameters of the alignment distance in question using a gradient descent on the LMNN cost function. More information on this approach can be found in the Masters Thesis "Adaptive Affine Sequence Alignment Using Algebraic Dynamic Programming"
TCS Alignment Toolbox Sequences · This module contains the sequence datastructure of the TCS Alignment Toolbox. It defines the possible value sets in the ValueType enum as well as the different KeywordSpecification classes, namely: 1.) StringKeywordSpecification for string type values. 2.) SymbolicKeywordSpecification for values from a discrete alphabet (also refer to the Alphabet class) 3.) VectorialKeywordSpecification for vectors of some length (or for scalars) A NodeSpecification is a vector of such KeywordSpecifications and defines the order of value sets. A node, then, is defined as a vector of values from these value sets (also refer to the Value interface as well as the StringValue, SymbolicValue and VectorialValue classes). Finally a sequence is defined as a list of such nodes.
TCS Alignment Toolbox Parallel Computing · This module provides a very basic support for the parallel computing of tasks (Engine class) and entries of a matrix (MatrixEngine). This is basically just a wrapper around the java standard functionality for parallel computing (mainly the Standard Thread Pool and the Future interface). Additional functionality is provided by the ProgressReporter interface, which can be used as a hook to provide information on the current state of the parallel computing task to other modules or the user.
TCS Alignment Toolbox Comparators · This module defines the interfaces for Comparators in the TCS Alignment Toolbox. A Comparator has the purpose of defining the dissimilarity between elements in the input sequences of an Alignment. More specific information on Comparators can be found in the 'Comparator' interface. You can find a lot of helpful standard implementations of Comparators in the comparators-lib module. In the TCS Alignment Toolbox we require the output values of Comparators to lie in the range [0,1]. Many natural dissimilarities on value sets do not meet this criterion, such that additional normalization has to be applied. To that end this package also contains a Normalizer interface for functions that map real values from the range [0, infinity) to the range [0,1]. This package also provides a few convenience implementations of the Comparator interface to make the implementation of custom Comparators simpler, namely: SkipExtendedComparator, ParameterLessSkipExtendedComparator, ComparisonBasedSkipExtendedComparator, and ParameterLessComparisonBasedSkipExtendedComparator. Finally the TCS Alignment Toolbox also provides the means to learn parameters of Comparators. To enable that Comparators must implement the DerivableComparator interface to properly define the parameters that can be learned and the gradient of the dissimilarity with respect to these parameters. Gradients are stored using the Gradient interface as well as some convenience implementations of said interface, namely EmptyGradient, SingletonGradient, ArrayGradient and ListGradient.
TCS Alignment Toolbox Algorithms · This module defines the interface for AlignmentAlgorithms as well as some helper classes. An AlignmentAlgorithm computes an Alignment of two given input sequences, given a Comparator that works in these sequences. More details on the AlignmentAlgorithm can be found in the respective interface. More information on Comparators can be found in the comparators module. The resulting 'Alignment' may be just a real-valued dissimilarity between the input sequence or may incorporate additional information, such as a full Alignment, a PathList, a PathMap or a CooptimalModel. If those results support the calculation of a Gradient, they implement the DerivableAlignmentDistance interface. In more detail, the Alignment class represents the result of a backtracing scheme, listing all Operations that have been applied in one co-optimal Alignment. A classic AlignmentAlgorithm does not result in a differentiable dissimilarity, because the minimum function is not differentiable. Therefore, this package also contains utility functions for a soft approximation of the minimum function, namely Softmin. For faster (parallel) computation of many different alignments or gradients we also provide the ParallelProcessingEngine, the SquareParallelProcessingEngine and the ParallelGradientEngine.
TCS Alignment Toolbox Sequence CSV Export and Import · This module permits exporting and importing of Sequence objects to CSV files. The underlying NodeSpecification is stored via a JSON file. The module aims at human- readable and compatible storage. For storage efficiency, further compression is advised.
TCS Alignment Toolbox Sequence Visualization · This module provides convenience implementations for visualization of the sequence datastructure.
TCS Alignment Toolbox Sequence Comparators · This module provides specializations of the Comparator interface for the Sequence data structure.
TCS Alignment Toolbox Primitive Sequences · This module contains convenience functions to interface with primitive datatypes more comfortably.
TCS Alignment Toolbox Trees · This module provides two packages, 'trees' and 'forests', which provide algorithms to compute edit distances on trees and forests (that is, unordered or ordered lists of trees) respectively. The edit distance is computed according to the tree edit distance algorithm of Zhang and Shasha (1989). The basic tree data structure is defined by the Tree interface in the trees module. Please refer to the javadoc for more detailed information.
TCS Alignment Toolbox Sets · This module provides algorithms to compare sets, that is, order-invariant lists. These algorithms are implementations of the AlignmentAlgorithm interface defined in the algorithms module. In particular, this module contains the StrictSetAlignmentScoreAlgorithm for computing the cost of the optimal unordered alignment of two sets, the StrictSetAlignmentFullAlgorithm which provides the Alignment itself as well, and the GreedySetAlignmentScoreAlgorithm as well as the GreedySetAlignmentFullAlgorithm for computing a potentially sub-optimal but faster alignment of two sets. The optimal alignments rely on the HungarianAlgorithm for solving the assignment problem in bipartite graphs. Here, we rely on the implementation provided by Kevin L. Stern which is provided within this distribution.