weka-dev · The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This version represents the developer version, the "bleeding edge" of development, you could say. New functionality gets added to this version.

# Group: nz.ac.waikato.cms.weka - All Dependencies

weka-stable · The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This is the stable version. Apart from bugfixes, this version does not receive any other breaking updates.

LibSVM · A wrapper class for the libsvm tools (the libsvm classes, typically the jar file, need to be in the classpath to use this classifier). LibSVM runs faster than SMO since it uses LibSVM to build the SVM classifier. LibSVM allows users to experiment with One-class SVM, Regressing SVM, and nu-SVM supported by LibSVM tool. LibSVM reports many useful statistics about LibSVM classifier (e.g., confusion matrix,precision, recall, ROC score, etc.)

distributedWekaBase · This package provides generic configuration class and distributed map/reduce style tasks for Weka

rotationForest · An ensemble learning method inspired by bagging and random sub-spaces. Trains an ensemble of decision trees on random subspaces of the data, where each subspace has been transformed using principal components analysis.

multiInstanceFilters · A collection of filters for manipulating multi-instance data. Includes PropositionalToMultiInstance, MultiInstanceToPropositional, MILESFilter and RELAGGS. For more information see: M.-A. Krogel, S. Wrobel: Facets of Aggregation Approaches to Propositionalization. In: Work-in-Progress Track at the Thirteenth International Conference on Inductive Logic Programming (ILP), 2003. Y. Chen, J. Bi, J.Z. Wang (2006). MILES: Multiple-instance learning via embedded instance selection. IEEE PAMI. 28(12):1931-1947. James Foulds, Eibe Frank: Revisiting multiple-instance learning via embedded instance selection. In: 21st Australasian Joint Conference on Artificial Intelligence, 300-310, 2008.

multiInstanceLearning · A collection of multi-instance learning classifiers. Includes the Citation KNN method, several variants of the diverse density method, support vector machines for multi-instance learning, simple wrappers for applying standard propositional learners to multi-instance data, decision tree and rule learners, and some other methods.

classifierBasedAttributeSelection · This package provides two classes - one for evaluating the merit of individual attributes using a classifier (ClassifierAttributeEval), and second for evaluating the merit of subsets of attributes using a classifier (ClassifierSubsetEval). Both invoke a user-specified classifier to perform the evaluation, either under cross-validation or on the training data.

partialLeastSquares · This package contains a filter for computing partial least squares and transforming the input data into the PLS space. It also contains a classifier for performing PLS regression.

discriminantAnalysis · Currently only contains Fisher's Linear Discriminant Analysis.

prefuseGraph · A visualization component for displaying tree structures from those schemes that can output graphs (e.g. bayes nets). This component is available from the popup menu in the Explorer's classify. The component uses the prefuse visualization library.

XMeans · Cluster data using the X-means algorithm. X-Means is K-Means extended by an Improve-Structure part In this part of the algorithm the centers are attempted to be split in its region. The decision between the children of each center and itself is done comparing the BIC-values of the two structures. For more information see: Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.

optics_dbScan · The OPTICS and DBScan clustering algorithms. Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Second International Conference on Knowledge Discovery and Data Mining, 226-231, 1996; Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander: OPTICS: Ordering Points To Identify the Clustering Structure. In: ACM SIGMOD International Conference on Management of Data, 49-60, 1999.

predictiveApriori · Class implementing the predictive apriori algorithm for mining association rules. It searches with an increasing support threshold for the best 'n' rules concerning a support-based corrected confidence value. For more information see: Tobias Scheffer: Finding Association Rules That Trade Support Optimally against Confidence. In: 5th European Conference on Principles of Data Mining and Knowledge Discovery, 424-435, 2001.

SMOTE · Resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The amount of SMOTE and number of nearest neighbors may be specified. For more information, see Nitesh V. Chawla et. al. (2002). Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.

normalize · An instance filter that normalize instances considering only numeric attributes and ignoring class index

prefuseTree · A visualization component for displaying tree structures from those schemes that can output trees (e.g. decision tree learners, Cobweb clusterer etc.). This component is available from the popup menu in the Explorer's classify and cluster panels. The component uses the prefuse visualization library.

decorate · DECORATE is a meta-learner for building diverse ensembles of classifiers by using specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than the base classifier, Bagging and Random Forests. Decorate also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. For more details see: P. Melville, R. J. Mooney: Constructing Diverse Classifier Ensembles Using Artificial Training Examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510, 2003; P. Melville, R. J. Mooney (2004). Creating Diversity in Ensembles Using Artificial Data. Information Fusion: Special Issue on Diversity in Multiclassifier Systems.

fastCorrBasedFS · Feature selection method based on correlation measureand relevance and redundancy analysis. Use in conjunction with an attribute set evaluator (SymmetricalUncertAttributeEval). For more information see: Lei Yu, Huan Liu: Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In: Proceedings of the Twentieth International Conference on Machine Learning, 856-863, 2003.

latentSemanticAnalysis · Performs latent semantic analysis and transformation of the data. Use in conjunction with a Ranker search. A low-rank approximation of the full data is found by specifying the number of singular values to use. The dataset may be transformed to give the relation of either the attributes or the instances (default) to the concept space created by the transformation.

ordinalClassClassifier · Meta classifier that allows standard classification algorithms to be applied to ordinal class problems. For more information see: Eibe Frank, Mark Hall: A Simple Approach to Ordinal Classification. In: 12th European Conference on Machine Learning, 145-156, 2001. Robert E. Schapire, Peter Stone, David A. McAllester, Michael L. Littman, Janos A. Csirik: Modeling Auction Price Uncertainty Using Boosting-based Conditional Density Estimation. In: Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), 546-553, 2002.

multiLayerPerceptrons · This package currently contains classes for training multilayer perceptrons with one hidden layer, where the number of hidden units is user specified. MLPClassifier can be used for classification problems and MLPRegressor is the corresponding class for numeric prediction tasks. The former has as many output units as there are classes, the latter only one output unit. Both minimise a penalised squared error with a quadratic penalty on the (non-bias) weights, i.e., they implement "weight decay", where this penalised error is averaged over all training instances. The size of the penalty can be determined by the user by modifying the "ridge" parameter to control overfitting. The sum of squared weights is multiplied by this parameter before added to the squared error. Both classes use BFGS optimisation by default to find parameters that correspond to a local minimum of the error function. but optionally conjugated gradient descent is available, which can be faster for problems with many parameters. Logistic functions are used as the activation functions for all units apart from the output unit in MLPRegressor, which employs the identity function. Input attributes are standardised to zero mean and unit variance. MLPRegressor also rescales the target attribute (i.e., "class") using standardisation. All network parameters are initialised with small normally distributed random values.

LibLINEAR · A wrapper class for the liblinear tools (the liblinear classes, typically the jar file, need to be in the classpath to use this classifier). Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin (2008). LIBLINEAR - A Library for Large Linear Classification.

elasticNet · An implementation of the elastic net method for linear regression