percentageErrorMetrics · Provides root mean square percentage error and mean absolute percentage error for evaluating regression schemes.
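The two metrics can be sketched with their usual definitions, where each prediction's relative error is (actual - predicted) / actual and results are reported in percent. The scaling by 100 and the rejection of zero actual values are assumptions of this sketch; the package's own handling of these cases is not described here.

```python
import math


def percentage_errors(actual, predicted):
    """Return (RMSPE, MAPE), both in percent.

    Assumption: each relative error is (actual - predicted) / actual, so a zero
    actual value is rejected rather than silently skipped.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    rel = []
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("percentage error is undefined when the actual value is 0")
        rel.append((y - y_hat) / y)
    n = len(rel)
    rmspe = 100.0 * math.sqrt(sum(r * r for r in rel) / n)  # root mean square
    mape = 100.0 * sum(abs(r) for r in rel) / n             # mean absolute
    return rmspe, mape
```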
Group: nz.ac.waikato.cms.weka - All Dependencies
niftiLoader · Package for loading a directory containing MRI data in NIfTI format. The directory to be loaded must contain as many subdirectories as there are classes of MRI data. Each subdirectory name will be used as the class label for the corresponding .nii files in that subdirectory. (This is the same strategy as the one used by WEKA's TextDirectoryLoader.) Currently, the package only reads volume information for the first time slot from each .nii file. The readDoubleVol(short ttt) method from the Nifti1Dataset class (http://niftilib.sourceforge.net/java_api_html/Nifti1Dataset.html) is used to read the data for each volume into a sparse WEKA instance (with ttt=0). For an LxMxN volume (the dimensions must be the same for each .nii file in the directory!), the order of values in the generated instance is [(z_1, y_1, x_1), ..., (z_1, y_1, x_L), (z_1, y_2, x_1), ..., (z_1, y_M, x_L), (z_2, y_1, x_1), ..., (z_N, y_M, x_L)]. If the volume is an image, then only x and y coordinates are used.
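The value ordering above (x varies fastest, then y, then z) can be illustrated with a short sketch. It assumes, purely for illustration, that a volume is held as nested lists indexed vol[z][y][x]; the helpers flatten_index, flatten_volume and to_sparse are hypothetical and not part of the package.

```python
def flatten_index(x, y, z, L, M):
    """0-based position of voxel (x, y, z) in the flattened instance.

    L is the x extent and M the y extent; x varies fastest, then y, then z.
    """
    return (z * M + y) * L + x


def flatten_volume(vol):
    """Flatten a hypothetical vol[z][y][x] nested list in the documented order."""
    return [v for plane in vol for row in plane for v in row]


def to_sparse(values):
    """Keep only non-zero entries as {index: value}, as a sparse instance would."""
    return {i: v for i, v in enumerate(values) if v != 0}
```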
gridSearch · Performs a grid search of parameter pairs for a classifier (Y-axis, default is LinearRegression with the "Ridge" parameter) and the PLSFilter (X-axis, "# of Components") and chooses the best pair found for the actual prediction. The initial grid is evaluated with 2-fold CV to determine the values of the parameter pairs for the selected type of evaluation (e.g., accuracy). The best point in the grid is then taken and a 10-fold CV is performed with the adjacent parameter pairs. If a better pair is found, it acts as the new center and another 10-fold CV is performed (a form of hill-climbing). This process is repeated until no better pair is found or the best pair is on the border of the grid. If the best pair is on the border, GridSearch can be set to extend the grid automatically and continue the search; see the properties 'gridIsExtendable' (option '-extend-grid') and 'maxGridExtensions' (option '-max-grid-extensions <num>'). GridSearch can handle doubles, integers (values are just cast to int) and booleans (0 is false, otherwise true). The types float, char and long are supported as well. The best filter/classifier setup can be accessed after the buildClassifier call via the getBestFilter/getBestClassifier methods. Note on the implementation: after the data has been passed through the filter, a default NumericCleaner filter is applied to the data to avoid values that become too small and might produce NaNs in other schemes.
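The search strategy described above can be sketched as a hill-climb over grid indices. This is a simplified illustration, not GridSearch's code: coarse_score and fine_score are hypothetical stand-ins for the 2-fold and 10-fold CV evaluations (higher is better), the eight surrounding pairs are assumed to be the "adjacent" ones, and grid extension is omitted.

```python
def grid_hill_climb(xs, ys, coarse_score, fine_score):
    """Return (x, y, score, stopped_on_border).

    coarse_score / fine_score are hypothetical callables standing in for the
    cheap (2-fold CV) and expensive (10-fold CV) evaluations; higher is better.
    """
    # Coarse pass over the whole initial grid.
    i, j = max(((a, b) for a in range(len(xs)) for b in range(len(ys))),
               key=lambda ab: coarse_score(xs[ab[0]], ys[ab[1]]))
    best = fine_score(xs[i], ys[j])
    cache = {(i, j): best}  # avoid re-evaluating pairs already scored
    while True:
        # Stop on the border (GridSearch could instead extend the grid here).
        if i in (0, len(xs) - 1) or j in (0, len(ys) - 1):
            return xs[i], ys[j], best, True
        improved = False
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if (a, b) not in cache:
                    cache[(a, b)] = fine_score(xs[a], ys[b])
                if cache[(a, b)] > best:
                    best, ni, nj, improved = cache[(a, b)], a, b, True
        if not improved:
            return xs[i], ys[j], best, False
        i, j = ni, nj  # the better pair becomes the new center
```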
cascadeKMeans · k-means clustering with automatic selection of k. Restarts k-means and selects the best k using the Calinski and Harabasz criterion, without cross-validation.
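The Calinski and Harabasz criterion is the ratio of between-cluster dispersion B to within-cluster dispersion W, each divided by its degrees of freedom: CH = [B/(k-1)] / [W/(n-k)]. A minimal sketch of the criterion itself follows; how cascadeKMeans restarts k-means and compares candidate values of k is not shown.

```python
def calinski_harabasz(points, labels):
    """CH index for a clustering of points (tuples of numbers) given cluster labels."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    dim = len(points[0])

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    overall = mean(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = mean(members)
        between += len(members) * sqdist(centroid, overall)  # B
        within += sum(sqdist(p, centroid) for p in members)  # W
    # Note: a clustering with W == 0 (all points on their centroids) divides by zero.
    return (between / (k - 1)) / (within / (n - k))
```

Larger values indicate more compact, better-separated clusters, so a selection loop would keep the k whose clustering scores highest.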
isolationForest · Class for building and using a classifier built on the Isolation Forest anomaly detection algorithm. For more information, see Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou. 2008. Isolation Forest. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 413-422.
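The anomaly score defined in the cited paper is s(x, n) = 2^(-E[h(x)]/c(n)), where E[h(x)] is the average path length of x over the trees and c(n) = 2H(n-1) - 2(n-1)/n normalizes by the average path length for a subsample of n instances. The sketch below computes H exactly rather than with the paper's ln(i) + 0.5772156649 approximation; whether the Weka package reports exactly this score is not stated here.

```python
def c_factor(n):
    """Average path length of an unsuccessful BST search over n instances."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # exact H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n): near 1 suggests an anomaly, well below 0.5 suggests a normal point."""
    return 2.0 ** (-mean_path_length / c_factor(n))
```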
iterativeAbsoluteErrorRegression · Provides a regression scheme that uses Schlossmacher's iteratively reweighted least squares method to fit a model that minimizes absolute error. The scheme can be used with any base learner in WEKA that performs least-squares regression.
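Schlossmacher's method can be sketched as repeated weighted least squares with weights 1/|residual|, so instances with large residuals lose influence and the fit approaches the least-absolute-error solution. In this sketch a weighted straight-line fit stands in for the base learner; the floor eps (to avoid dividing by a zero residual) and the fixed iteration count are assumptions, not the package's settings.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def lad_irls(xs, ys, iterations=100, eps=1e-6):
    """Approximate least-absolute-deviation line by iteratively reweighted LS."""
    ws = [1.0] * len(xs)  # first pass is ordinary least squares
    for _ in range(iterations):
        a, b = weighted_line_fit(xs, ys, ws)
        # Reweight: weight = 1 / |residual|, floored at eps to stay finite.
        ws = [1.0 / max(abs(y - (a + b * x)), eps) for x, y in zip(xs, ys)]
    return a, b
```

With one gross outlier added to otherwise collinear data, the returned line stays close to the collinear points, whereas the ordinary least-squares fit from the first pass is pulled towards the outlier.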
largeScaleKernelLearning · This package provides filters to enable kernel-based learning from large datasets. It currently only contains the Nystroem method.
bayesianLogisticRegression · Implements Bayesian Logistic Regression for both Gaussian and Laplace priors. For more information, see Alexander Genkin, David D. Lewis, David Madigan (2004). Large-scale Bayesian logistic regression for text categorization.
distributedWekaHadoopCore · This package provides loaders and savers for HDFS, plus Hadoop jobs and tasks that wrap the tasks provided in distributedWekaBase.
kfGroovy · Knowledge Flow plugin that provides a Knowledge Flow step that wraps around a Groovy script. The plugin generates a fully compilable template Groovy script that implements various Knowledge Flow interfaces. The user can fill in the methods that are necessary to accomplish the desired logic. The script is compiled at runtime and the Groovy component passes incoming events to the script and collects and passes on generated events.
ensemblesOfNestedDichotomies · A meta classifier for handling multi-class datasets with 2-class classifiers by building an ensemble of nested dichotomies. For more information, see Lin Dong, Eibe Frank, Stefan Kramer: Ensembles of Balanced Nested Dichotomies for Multi-class Problems. In: PKDD, 84-95, 2005. Eibe Frank, Stefan Kramer: Ensembles of nested dichotomies for multi-class problems. In: Twenty-first International Conference on Machine Learning, 2004.
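A nested dichotomy is a binary tree that recursively splits the set of classes in two; a 2-class model at each internal node estimates which subset an instance belongs to, and a class's probability is the product of the node probabilities along its path. The sketch below uses a hypothetical p_left function in place of trained 2-class classifiers and draws splits with a simple shuffle-and-cut scheme, which is neither uniform sampling over all dichotomies nor the class-balanced sampling of Dong et al.

```python
import random


def random_dichotomy(classes, rng):
    """Random binary tree over class labels: leaves are labels, nodes are (left, right)."""
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    rng.shuffle(classes)
    cut = rng.randint(1, len(classes) - 1)  # both subsets non-empty
    return (random_dichotomy(classes[:cut], rng), random_dichotomy(classes[cut:], rng))


def leaves(tree):
    """Set of class labels below a node."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def class_probabilities(tree, p_left):
    """p_left(left_classes, right_classes) is a hypothetical stand-in for a trained
    2-class model returning P(class is in the left subset)."""
    if not isinstance(tree, tuple):
        return {tree: 1.0}
    left, right = tree
    p = p_left(leaves(left), leaves(right))
    probs = {c: p * q for c, q in class_probabilities(left, p_left).items()}
    probs.update({c: (1 - p) * q for c, q in class_probabilities(right, p_left).items()})
    return probs


def ensemble_probabilities(classes, p_left, size=10, seed=1):
    """Average the class distributions of `size` randomly drawn dichotomies."""
    rng = random.Random(seed)
    total = {c: 0.0 for c in classes}
    for _ in range(size):
        for c, p in class_probabilities(random_dichotomy(classes, rng), p_left).items():
            total[c] += p / size
    return total
```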
timeseriesForecasting · Provides a time series forecasting environment for Weka. Includes a wrapper for Weka regression schemes that automates the process of creating lagged variables and date-derived periodic variables and provides the ability to do closed-loop forecasting. New evaluation routines are provided by a special evaluation module, and graphing of predictions/forecasts is provided via the JFreeChart library. Includes both command-line and GUI user interfaces. Sample time series data can be found in ${WEKA_HOME}/packages/timeseriesForecasting/sample-data.
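Lagged variables and closed-loop forecasting can be sketched as below: each training row uses the previous max_lag values as inputs, and during forecasting each prediction is fed back as the newest lag. The function names and the predict callback are hypothetical; date-derived periodic variables are omitted.

```python
def make_lagged(series, max_lag):
    """(lags, target) rows with lags ordered [t-1, t-2, ..., t-max_lag].

    The first max_lag points lack a full history and are dropped.
    """
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows


def closed_loop_forecast(history, predict, max_lag, steps):
    """Forecast `steps` values, feeding each prediction back in as a lag."""
    window = list(history[-max_lag:])
    out = []
    for _ in range(steps):
        lags = [window[-k] for k in range(1, max_lag + 1)]
        y_hat = predict(lags)
        out.append(y_hat)
        window.append(y_hat)  # the forecast becomes input for the next step
    return out
```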
prefuseGraphViewer · Knowledge Flow visualization component for displaying tree and graph structures from those schemes that can output them. This component is an alternative to the Knowledge Flow's built-in GraphViewer and uses the PrefuseTree and PrefuseGraph packages which, in turn, use the prefuse visualization library.
streamingUnivariateStats · This package provides a Knowledge Flow step to compute summary statistics incrementally.
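Incremental summary statistics are commonly computed with Welford's online algorithm, which updates the mean and variance one value at a time without storing the data. The sketch below shows that approach; the exact set of statistics the Knowledge Flow step reports is not listed here, so the choice of count, mean, sample variance, min and max is an assumption.

```python
import math


class RunningStats:
    """Incremental count, mean, sample variance, min and max (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses the updated mean
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```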
kfKettle · Knowledge Flow step that provides an entry point for data coming from the Kettle ETL tool.
stackingC · Implements StackingC (a more efficient version of stacking). For more information, see A.K. Seewald: How to Make Stacking Better and Faster While Also Taking Care of an Unknown Weakness. In: Nineteenth International Conference on Machine Learning, 554-561, 2002. Note: the meta classifier must be a numeric prediction scheme.
logarithmicErrorMetrics · Provides root mean square logarithmic error and mean absolute logarithmic error for evaluating regression schemes.
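The two metrics can be sketched with the common definitions based on log(1 + value). The package's exact definitions (log base, offset, handling of values at or below -1) are assumptions here.

```python
import math


def log_errors(actual, predicted):
    """Return (RMSLE, MALE) using log(1 + v); every value must be greater than -1."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # math.log1p raises ValueError for values <= -1.
    diffs = [math.log1p(p) - math.log1p(a) for a, p in zip(actual, predicted)]
    n = len(diffs)
    rmsle = math.sqrt(sum(d * d for d in diffs) / n)
    male = sum(abs(d) for d in diffs) / n
    return rmsle, male
```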
classificationViaClustering · A simple meta-classifier that uses a clusterer for classification. For cluster algorithms that use a fixed number of clusters, like SimpleKMeans, the user has to make sure that the number of clusters to generate is the same as the number of class labels in the dataset in order to obtain a useful model. Note: at prediction time, a missing value is returned if no cluster is found for the instance. The code is based on the 'clusters to classes' functionality of the weka.clusterers.ClusterEvaluation class by Mark Hall.
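The "clusters to classes" idea can be sketched as choosing an assignment of classes to clusters, with each class used at most once, that maximizes the number of training instances whose cluster maps to their own class; a cluster left without a class predicts a missing value. This brute-force version is an illustration under those assumptions, not the ClusterEvaluation code.

```python
from collections import Counter
from itertools import permutations


def map_clusters_to_classes(cluster_ids, class_labels):
    """Return {cluster: class or None} maximizing correctly mapped instances.

    Brute force over assignments, so only practical for a handful of clusters.
    """
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    counts = Counter(zip(cluster_ids, class_labels))
    # Pad with None so that a cluster may remain without a class.
    candidates = classes + [None] * len(clusters)
    best, best_hits = None, -1
    for perm in permutations(candidates, len(clusters)):
        hits = sum(counts[(c, k)] for c, k in zip(clusters, perm) if k is not None)
        if hits > best_hits:  # first best assignment wins ties
            best, best_hits = dict(zip(clusters, perm)), hits
    return best
```

At prediction time, an instance would get the class mapped to its cluster, or a missing value if that entry is None.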
functionalTrees · Functional trees (decision trees with oblique splits and functions at the leaves).
probabilityCalibrationTrees · Provides probability calibration trees (PCTs) for local calibration of class probability estimates. To achieve calibration of a base learner, the PCT class must be used as the meta learner in the CascadeGeneralization class, which is also included in this package. The classifier to be calibrated must be used as the base learner in the CascadeGeneralization class. The CascadeGeneralization class can also be used independently to perform CascadeGeneralization for ensemble learning. The code for PCTs is largely the same as the LMT code for growing logistic model trees. For more details, see the ACML paper on probability calibration trees.
extraTrees · Package for generating a single Extra-Tree. Use with the RandomCommittee meta classifier to generate an Extra-Trees forest for classification or regression. This classifier requires all predictors to be numeric. Missing values are not allowed. Instance weights are taken into account. For more information, see Pierre Geurts, Damien Ernst, Louis Wehenkel (2006). Extremely randomized trees. Machine Learning. 63(1):3-42.
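Following the cited paper, an Extra-Tree picks its splits largely at random: at each node, k attributes are drawn, one cut-point is drawn uniformly between each attribute's minimum and maximum in the node, and the best of these candidates is kept. The sketch below scores candidates by variance reduction (a regression setting) and ignores instance weights; the value of k and the scoring function used by the package are assumptions.

```python
import random


def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pick_random_split(X, y, k, rng):
    """Return (attribute_index, threshold) or None if no attribute varies in X."""
    n_attr = len(X[0])
    usable = [a for a in range(n_attr) if min(r[a] for r in X) < max(r[a] for r in X)]
    if not usable:
        return None
    best, best_gain = None, -1.0
    for a in rng.sample(usable, min(k, len(usable))):
        lo = min(r[a] for r in X)
        hi = max(r[a] for r in X)
        t = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = [yi for r, yi in zip(X, y) if r[a] < t]
        right = [yi for r, yi in zip(X, y) if r[a] >= t]
        gain = variance(y) - (len(left) * variance(left) + len(right) * variance(right)) / len(y)
        if gain > best_gain:
            best, best_gain = (a, t), gain
    return best
```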
WekaExcel · WekaExcel adds support for reading from and writing to spreadsheets in Microsoft Excel 97-2007 format. It uses Apache POI (http://poi.apache.org/), specifically POI-HSSF and POI-XSSF (http://poi.apache.org/spreadsheet/), to read/write Excel spreadsheets.
dualPerturbAndCombine · Class for building and using classification and regression trees based on the closed-form dual perturb and combine algorithm described in Pierre Geurts, Louis Wehenkel: Closed-form dual perturb and combine for tree-based models. In: Proceedings of the 22nd International Conference on Machine Learning, 233-240, 2005.
supervisedAttributeScaling · Package containing a class that rescales the attributes in a classification problem based on their discriminative power. This is useful as a pre-processing step for learning algorithms such as the k-nearest-neighbour method, to replace simple normalization. Each attribute is rescaled by multiplying it with a learned weight. All attributes excluding the class are assumed to be numeric and missing values are not permitted. To achieve the rescaling, this package also contains an implementation of non-negative logistic regression, which produces a logistic regression model with non-negative weights.
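Non-negative logistic regression and the subsequent rescaling can be sketched with projected gradient descent: take an ordinary gradient step on the mean log-loss, then clip any negative attribute weight to zero. This is a two-class illustration under our own assumptions (unconstrained intercept, fixed learning rate and epoch count); the package's optimizer and multi-class handling are not described here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nonneg_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(b + w.x) subject to w >= 0; returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # intercept is left unconstrained
        # Gradient step, then project back onto the feasible set w >= 0.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad_w)]
    return w, b


def rescale(X, w):
    """Multiply each attribute by its learned non-negative weight."""
    return [[xj * wj for xj, wj in zip(xi, w)] for xi in X]
```

Attributes whose weight is clipped to zero contribute nothing to distances after rescaling, which is how an attribute that only points the "wrong" way would be ignored by a nearest-neighbour learner.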