Group: - All Dependencies


naiveBayesTree · Class for generating a decision tree with naive Bayes classifiers at the leaves. For more information, see Ron Kohavi: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: Second International Conference on Knowledge Discovery and Data Mining, 202-207, 1996.

Apr 26, 2012

multilayerPerceptronCS · An extension of the standard MultilayerPerceptron classifier in Weka that adds context-sensitive Multiple Task Learning (csMTL)

Apr 26, 2012
multiBoostAB 1.0.2

multiBoostAB · Class for boosting a classifier using the MultiBoosting method. MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, Multi-boosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution. For more information, see Geoffrey I. Webb (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning. Vol.40(No.2).

Apr 26, 2012
metaCost 1.0.3

metaCost · This metaclassifier makes its base classifier cost-sensitive using the method specified in Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999. This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).

Feb 06, 2013
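
The relabeling step that metaCost relies on, choosing the class with minimum expected cost, can be sketched in Python. This is an illustration only, not the package's code: the function name is hypothetical, the class probabilities are assumed to come from the bagged ensemble's votes, and the cost matrix is assumed to be indexed as `cost_matrix[actual][predicted]`.

```python
def min_expected_cost_label(probabilities, cost_matrix):
    """Return the class index whose prediction has the lowest expected cost.

    probabilities[k]   -- estimated probability that the true class is k
    cost_matrix[k][j]  -- assumed cost of predicting j when the true class is k
    """
    num_classes = len(probabilities)
    expected_cost = [
        sum(probabilities[actual] * cost_matrix[actual][predicted]
            for actual in range(num_classes))
        for predicted in range(num_classes)
    ]
    return min(range(num_classes), key=expected_cost.__getitem__)


# Class 0 is more probable, but missing class 1 is ten times as costly.
label = min_expected_cost_label([0.7, 0.3], [[0, 1], [10, 0]])
```

With symmetric costs the function simply returns the most probable class; the asymmetric matrix above shifts the decision to class 1.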

linearForwardSelection · Extension of BestFirst. Takes a restricted number of k attributes into account. Fixed-set selects a fixed number k of attributes, whereas k is increased in each step when fixed-width is selected. The search uses either the initial ordering to select the top k attributes, or performs a ranking (with the same evaluator the search uses later on). The search direction can be forward, or floating forward selection (with optional backward search steps). For more information see: Martin Guetlein (2006). Large Scale Attribute Selection Using Wrappers. Freiburg, Germany.

Apr 26, 2012

levenshteinEditDistance · Computes the Levenshtein edit distance between two strings.

Apr 26, 2012
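
For reference, the Levenshtein edit distance named in the levenshteinEditDistance entry can be computed with the standard dynamic-programming recurrence. The Python sketch below is a generic illustration with a hypothetical function name; it does not reproduce the package's own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Only two rows of the distance table are kept at a time, so memory use grows with the length of the shorter string.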

leastMedSquared · Implements a least median squared linear regression utilizing the existing Weka LinearRegression class to form predictions. Least squared regression functions are generated from random subsamples of the data. The least squared regression with the lowest median squared error is chosen as the final model. The basis of the algorithm is Peter J. Rousseeuw, Annick M. Leroy (1987). Robust regression and outlier detection.

Apr 26, 2012
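
The subsampling idea behind leastMedSquared can be sketched in Python for a single predictor. Everything here is an assumption made for illustration (function names, sample size, number of trials, seed); the package itself delegates the fits to Weka's LinearRegression and may differ in every detail.

```python
import random
from statistics import median


def fit_least_squares(points):
    """Ordinary least squares line through (x, y) points; None if x is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def least_median_squares(points, sample_size=4, trials=200, seed=1):
    """Fit lines on random subsamples and keep the one whose median squared
    residual over ALL points is lowest."""
    rng = random.Random(seed)
    best_model, best_score = None, float("inf")
    for _ in range(trials):
        model = fit_least_squares(rng.sample(points, sample_size))
        if model is None:
            continue
        slope, intercept = model
        score = median((y - (slope * x + intercept)) ** 2 for x, y in points)
        if score < best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Ten points on y = 2x + 1 plus two gross outliers.
data = [(x, 2 * x + 1) for x in range(10)] + [(10, 100), (11, -50)]
(slope, intercept), score = least_median_squares(data)
```

Because the score is a median rather than a mean, a line fitted to an outlier-free subsample wins even though the two outliers have very large residuals.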

kfPMMLClassifierScoring · A Knowledge Flow plugin that provides a Knowledge Flow step for scoring test sets or instance streams using a PMML classifier.

Apr 26, 2012

kernelLogisticRegression · This package contains a classifier that can be used to train a two-class kernel logistic regression model with the kernel functions that are available in WEKA. It optimises the negative log-likelihood with a quadratic penalty. Both BFGS and conjugate gradient descent are available as optimisation methods, but the former is normally faster. It is possible to use multiple threads, but the speed-up is generally marginal when used with BFGS optimisation. With conjugate gradient descent optimisation, greater speed-ups can be achieved when using multiple threads. With the default kernel, the dot product kernel, this method produces results that are close to identical to those obtained using standard logistic regression in WEKA, provided a sufficiently large value for the parameter determining the size of the quadratic penalty is used in both cases.

Jun 26, 2013
J48graft 1.0.3

J48graft · Class for generating a grafted (pruned or unpruned) C4.5 decision tree. For more information, see Geoff Webb: Decision Tree Grafting From the All-Tests-But-One Partition.

Apr 25, 2014
hyperPipes 1.0.2

hyperPipes · Class implementing a HyperPipe classifier. For each category a HyperPipe is constructed that contains all points of that category (essentially records the attribute bounds observed for each category). Test instances are classified according to the category that "most contains the instance". Does not handle numeric class, or missing values in test cases. Extremely simple algorithm, but has the advantage of being extremely fast, and works quite well when you have "smegloads" of attributes.

Apr 26, 2012
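
The HyperPipe idea described above is small enough to sketch in a few lines of Python. This version handles numeric attributes only, and the class name, method names and tie-breaking behaviour are assumptions for illustration rather than the package's API.

```python
class HyperPipes:
    """One [min, max] bound per attribute and per class; a test instance goes
    to the class whose bounds contain the largest share of its values."""

    def fit(self, rows, labels):
        self.bounds = {}
        for row, label in zip(rows, labels):
            # Create the pipe from the first instance of each class.
            pipe = self.bounds.setdefault(label, [[v, v] for v in row])
            for bound, value in zip(pipe, row):
                bound[0] = min(bound[0], value)
                bound[1] = max(bound[1], value)
        return self

    def predict(self, row):
        def containment(pipe):
            inside = sum(low <= value <= high
                         for (low, high), value in zip(pipe, row))
            return inside / len(row)

        # Ties go to the class that was seen first during training.
        return max(self.bounds, key=lambda label: containment(self.bounds[label]))


model = HyperPipes().fit([(1, 1), (2, 2), (8, 9), (9, 8)], ["a", "a", "b", "b"])
prediction = model.predict((1.5, 1.5))
```

Training is a single pass that updates bounds and prediction is a handful of comparisons per class, which is where the speed mentioned in the description comes from.
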
grading 1.0.2

grading · Implements Grading. The base classifiers are "graded". For more information, see A.K. Seewald, J. Fuernkranz: An Evaluation of Grading Classifiers. In: Advances in Intelligent Data Analysis: 4th International Conference, Berlin/Heidelberg/New York/Tokyo, 115-124, 2001.

Apr 26, 2012

fuzzyUnorderedRuleInduction · FURIA: Fuzzy Unordered Rule Induction Algorithm. For details please see: Jens Christian Huehn, Eyke Huellermeier (2009). FURIA: An Algorithm for Unordered Fuzzy Rule Induction. Data Mining and Knowledge Discovery.

Jul 29, 2014

fuzzyLaticeReasoning · The Fuzzy Lattice Reasoning Classifier uses the notion of Fuzzy Lattices for creating a Reasoning Environment. The current version can be used for classification using numeric predictors. For more information see: I. N. Athanasiadis, V. G. Kaburlasos, P. A. Mitkas, V. Petridis: Applying Machine Learning Techniques on Air Quality Data for Real-Time Decision Support. In: 1st Intl. NAISO Symposium on Information Technologies in Environmental Engineering (ITEE-2003), Gdansk, Poland, 2003; V. G. Kaburlasos, I. N. Athanasiadis, P. A. Mitkas, V. Petridis (2003). Fuzzy Lattice Reasoning (FLR) Classifier and its Application on Improved Estimation of Ambient Ozone Concentration.

Apr 26, 2012

filteredAttributeSelection · This package provides two meta attribute selection evaluators that can apply an arbitrary filter to the input data before executing the actual attribute selection scheme. One filters data and then passes it to an attribute evaluator (FilteredAttributeEval), and the other filters data and then passes it to a subset evaluator (FilteredSubsetEval).

Apr 26, 2012
EMImputation 1.0.2

EMImputation · Replaces missing numeric values using Expectation Maximization with a multivariate normal model. Described in "Schafer, J.L. Analysis of Incomplete Multivariate Data, New York: Chapman and Hall, 1997."

Apr 26, 2012
DTNB 1.0.3

DTNB · Class for building and using a decision table/naive bayes hybrid classifier. At each point in the search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets: one for the decision table, the other for naive Bayes. A forward selection search is used, where at each step, selected attributes are modeled by naive Bayes and the remainder by the decision table, and all attributes are modelled by the decision table initially. At each step, the algorithm also considers dropping an attribute entirely from the model. For more information, see: Mark Hall, Eibe Frank: Combining Naive Bayes and Decision Tables. In: Proceedings of the 21st Florida Artificial Intelligence Society Conference (FLAIRS), 318-319, 2008.

Apr 30, 2014
DMNBtext 1.0.2

DMNBtext · Class for building and using a Discriminative Multinomial Naive Bayes classifier. For more information see: Jiang Su, Harry Zhang, Charles X. Ling, Stan Matwin: Discriminative Parameter Learning for Bayesian Networks. In: ICML 2008, 2008.

Apr 26, 2012

distributedWekaHadoop · This package provides loaders and savers for HDFS, plus Hadoop jobs and tasks that wrap the tasks provided in distributedWekaBase. Includes libraries for Hadoop 1.1.2.

Apr 16, 2015
denormalize 1.0.3

denormalize · An instance filter that collapses instances with a common grouping ID value into a single instance. Useful for converting transactional data into a format that Weka's association rule learners can handle. IMPORTANT: assumes that the incoming batch of instances has been sorted on the grouping attribute. The values of nominal attributes are converted to indicator attributes. These can be either binary (with f and t values) or unary with missing values used to indicate absence. The latter is Weka's old market basket format, which is useful for Apriori. Numeric attributes can be aggregated within groups by computing the average, sum, minimum or maximum.

Apr 29, 2014
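
To make the denormalize transformation concrete, the Python sketch below collapses (grouping ID, item) pairs into one row of binary f/t indicators per group. The function name and data layout are hypothetical, and only the binary indicator case for a single nominal attribute is shown. Like the filter, it assumes the input is already sorted on the grouping ID.

```python
from itertools import groupby


def denormalize(transactions, items):
    """Collapse (group_id, item) pairs, sorted by group_id, into one row per
    group with a 't'/'f' indicator for each known item."""
    result = []
    # groupby only merges ADJACENT equal keys, hence the sorting requirement.
    for group_id, rows in groupby(transactions, key=lambda pair: pair[0]):
        present = {item for _, item in rows}
        indicators = ["t" if item in present else "f" for item in items]
        result.append((group_id, indicators))
    return result


baskets = denormalize(
    [(1, "bread"), (1, "milk"), (2, "milk"), (3, "bread"), (3, "eggs")],
    items=["bread", "eggs", "milk"],
)
```

If the input were not sorted, the same grouping ID would appear as several separate rows, which is why the description flags the sorting assumption as important.
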
dagging 1.0.3

dagging · This meta classifier creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier. Predictions are made via majority vote, since all the generated base classifiers are put into the Vote meta classifier. Useful for base classifiers that are quadratic or worse in time behavior, regarding number of instances in the training data. For more information, see: Ting, K. M., Witten, I. H.: Stacking Bagged and Dagged Models. In: Fourteenth International Conference on Machine Learning, San Francisco, CA, 367-375, 1997.

Apr 29, 2014
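
The two mechanics that dagging combines, disjoint stratified folds and a majority vote, can be sketched in Python as follows. The function names and the round-robin assignment are assumptions chosen for brevity; they are not taken from the package.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, num_folds):
    """Split instance indices into disjoint folds with roughly equal class
    proportions by dealing each class out round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(num_folds)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % num_folds].append(index)
            position += 1
    return folds


def majority_vote(predictions):
    """Return the most frequent prediction among the base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


folds = stratified_folds(["a"] * 6 + ["b"] * 3, num_folds=3)
winner = majority_vote(["x", "y", "x"])
```

Each base classifier would be trained on one fold only, so a learner with quadratic training time sees a fraction of the data per copy instead of the full training set.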

costSensitiveAttributeSelection · This package provides two meta attribute selection evaluators - one for performing cost-sensitive attribute evaluation (CostSensitiveAttributeEval) and a second for performing cost-sensitive subset evaluation (CostSensitiveSubsetEval). Both methods take a cost matrix and a base evaluator. If the base evaluator can handle instance weights, then the training data is weighted according to the cost matrix, otherwise the training data is sampled according to the cost matrix.

Apr 29, 2014

consistencySubsetEval · Evaluates the worth of a subset of attributes by the level of consistency in the class values when the training instances are projected onto the subset of attributes. The consistency of any subset can never be lower than that of the full set of attributes, hence the usual practice is to use this subset evaluator in conjunction with a Random or Exhaustive search which looks for the smallest subset with consistency equal to that of the full set of attributes. See: H. Liu, R. Setiono: A probabilistic approach to feature selection - A filter solution. In: 13th International Conference on Machine Learning, 319-327, 1996.

Oct 16, 2014
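
A common way to compute the consistency measure described for consistencySubsetEval is one minus the inconsistency rate: project the instances onto the subset, and within each distinct projected pattern count every instance outside the majority class as inconsistent. The Python sketch below follows that definition; the function name and data layout are hypothetical, and the package's exact computation may differ.

```python
from collections import Counter, defaultdict


def consistency(rows, labels, subset):
    """Consistency of the class labels when rows are projected onto the
    attribute indices in subset (1.0 means no conflicting patterns)."""
    class_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        class_counts[pattern][label] += 1
    inconsistent = sum(sum(counts.values()) - max(counts.values())
                       for counts in class_counts.values())
    return 1 - inconsistent / len(rows)


rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["yes", "no", "no", "no"]
full = consistency(rows, labels, [0, 1])
first_only = consistency(rows, labels, [0])
```

In this toy data the full attribute set is perfectly consistent while the first attribute alone is not, which illustrates why a subset can never score higher than the full set.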

conjunctiveRule · This class implements a single conjunctive rule learner that can predict for numeric and nominal class labels. A rule consists of antecedents "AND"ed together and the consequent (class value) for the classification/regression. In this case, the consequent is the distribution of the available classes (or mean for a numeric value) in the dataset. If the test instance is not covered by this rule, then it is predicted using the default class distributions/value of the data not covered by the rule in the training data. This learner selects an antecedent by computing the Information Gain of each antecedent and prunes the generated rule using Reduced Error Pruning (REP) or simple pre-pruning based on the number of antecedents. For classification, the Information of one antecedent is the weighted average of the entropies of both the data covered and not covered by the rule. For regression, the Information is the weighted average of the mean-squared errors of both the data covered and not covered by the rule. In pruning, the weighted average of the accuracy rates on the pruning data is used for classification, while the weighted average of the mean-squared errors on the pruning data is used for regression.

Apr 29, 2014
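
For the classification case, the "Information" of an antecedent described for conjunctiveRule is a size-weighted average of two entropies. The Python sketch below implements that formula directly; the function names are hypothetical, and instance weights are assumed to be 1.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def antecedent_information(covered, uncovered):
    """Weighted average of the entropies of the class labels covered and
    not covered by a candidate antecedent (lower is better)."""
    total = len(covered) + len(uncovered)
    return (len(covered) * entropy(covered)
            + len(uncovered) * entropy(uncovered)) / total


perfect_split = antecedent_information(["yes", "yes"], ["no", "no"])
useless_split = antecedent_information(["yes", "no"], ["yes", "no"])
```

An antecedent that separates the classes cleanly scores 0 bits, while one that leaves both sides evenly mixed scores 1 bit for a two-class problem.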

