Thursday, December 5, 2019

Weka: A Machine Learning Workbench

Questions:

Task 1: In WEKA, load the data set diabetes.arff. Perform rule classification using the following methods: JRip and Ridor. For each method, produce a summary of the rules produced and comment on the accuracy of the method.

Task 2: In WEKA, load the data set supermarket.arff. Perform association rule learning using the following methods: Apriori and FPGrowth. For each method, produce a summary of the rules produced and comment on the accuracy of the method.

Task 3: In WEKA, load the data set breast-cancer.arff. Perform Bayesian classification using the following methods: AODE and BayesNet. For each method, produce a summary of the classification produced and comment on the accuracy of the method.

Answers:

Task 1

diabetes.arff contains the Pima Indians Diabetes Database, gathered by the National Institute of Diabetes and Digestive and Kidney Diseases. All patients here are females at least 21 years old of Pima Indian heritage. There are 768 instances with 9 attributes:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

A class value of 1 means the patient tested positive for diabetes and 0 means negative. This is a two-class problem, with class value 1 interpreted as "tested positive for diabetes". There are 500 examples of the negative class and 268 of the positive class.

WEKA's JRip classifier uses a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. It is based on reduced error pruning (REP), a very common and effective technique found in decision tree algorithms.
In REP for rule algorithms, the training data is split into a growing set and a pruning set. First, an initial rule set is formed that overfits the growing set, using some heuristic method. This overlarge rule set is then repeatedly simplified by applying one of a set of pruning operators; typical pruning operators would delete any single condition or any single rule. At each stage of simplification, the pruning operator chosen is the one that yields the greatest reduction of error on the pruning set. Simplification ends when applying any pruning operator would increase error on the pruning set.

Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is one of the fundamental and most popular rule-learning algorithms. Classes are examined in increasing size, and an initial set of rules for each class is generated using incremental reduced error pruning. In this study, we evaluated RIPPER through JRip, an implementation of RIPPER in WEKA, with the parameters: folds = 10; minNo = 2; optimizations = 2; seed = 1; usePruning = true. JRip is a moderate classifier with 96.5% accuracy.

Ridor: It produces a default rule first, and then the exceptions to the default rule with the least (weighted) error rate. It then generates the "best" exceptions for each exception and iterates until the rules are pure. In this way it performs a tree-like expansion of exceptions. The exceptions are a set of rules that predict classes other than the default. IREP is used to generate the exceptions. Ripple-down rules produce models that are easier to maintain and update than the alternatives.

Task 2

Apriori association rule: Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases.
It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets, as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can then be used to derive association rules, which highlight general patterns in the database.

supermarket.arff: this data set describes the shopping habits of supermarket customers. Most of the attributes stand for one particular item group. The attribute value is 't' if the customer bought an item out of that item group, and missing otherwise. There is one instance per customer. The data set contains no class attribute, as this is not needed for learning association rules.

Load the data set supermarket.arff and change to the Associate panel. Select "Apriori" as the associator. After pressing Start, Apriori begins to build its model and writes its output into the output field. The first part of the output ('Run information') describes the options that have been set and the data set used. Association rules are primarily intended to support exploratory data analysis. We use Apriori to generate rules and then use them to say something about the shopping habits of supermarket customers.

The data contains 4,627 instances and 217 attributes. The data is denormalized. Each attribute is binary and either has a value ("t" for true) or no value ("?" for missing). There is a nominal class attribute called "total" that indicates whether the transaction was less than $100 (low) or greater than $100 (high).
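To make support and confidence concrete before looking at the WEKA output, here is a minimal pure-Python sketch of level-wise frequent item set mining. The toy baskets are invented for illustration, and this is a simplified sketch of the idea, not WEKA's Apriori implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A ==> B) = support(A and B) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

def apriori(transactions, min_support):
    """Level-wise search: only frequent k-item sets are extended to
    (k+1)-item sets, because every superset of an infrequent set is
    itself infrequent (the Apriori property)."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = list(level)
    while level:
        # Candidate (k+1)-item sets are unions of frequent k-item sets.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if support(c, transactions) >= min_support]
        frequent.extend(level)
    return frequent

# Invented toy baskets, one set of item groups per customer:
baskets = [{"bread", "milk"}, {"bread", "biscuits", "milk"},
           {"bread", "biscuits"}, {"milk"}, {"bread", "biscuits", "milk"}]
frequent_sets = apriori(baskets, min_support=0.6)
```

A WEKA rule such as "biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)" reads the same way: the antecedent covers 954 of the 4,627 instances, 866 of those also match the consequent, and 866/954 ≈ 0.91 is the confidence.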
Output of Apriori

The output of the Apriori association rule learner on the supermarket data set includes the following discovered rules:

biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723   conf:(0.92)
baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696   conf:(0.92)
baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705   conf:(0.92)
biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746   conf:(0.92)
party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779   conf:(0.91)
biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725   conf:(0.91)
baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701   conf:(0.91)
biscuits=t fruit=t total=high 954 ==> bread and cake=t 866   conf:(0.91)
frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757   conf:(0.91)
frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877   conf:(0.91)

Rules are presented in antecedent ==> consequent form. The number attached to the antecedent is its absolute coverage in the data set (in this case a count out of a possible total of 4,627). The number after the consequent is the absolute number of instances that match both the antecedent and the consequent. The number in parentheses at the end is the confidence for the rule (the number of instances matching both antecedent and consequent, divided by the antecedent coverage). You can see that a cutoff of 91% was used in selecting rules, as specified in the "Associator output" window and shown by the fact that no rule has confidence under 0.91.

A few key observations: all presented rules have a consequent of "bread and cake"; all presented rules involve a high total transaction amount; and "biscuits" and "frozen foods" appear in many of the antecedents.

FP-Growth

In basic terms, this algorithm works as follows: first it compresses the input database, creating an FP-tree (frequent pattern tree) to represent the frequent items.
After the first step, it divides the compressed database into a set of conditional databases, each one associated with one frequent pattern. Finally, each such database is mined separately. Using this strategy, FP-Growth reduces the cost of repeated database scans: Apriori must visit every transaction each time it generates a new candidate set, while FP-Growth avoids this by reusing the data stored in the tree. Apriori generates candidate sets; FP-Growth instead uses more complicated data structures and a more involved mining method.

Task 3

Breast cancer data set: the breast cancer data set has 699 instances with 10 attributes. The class distribution is divided into benign and malignant. There is 1 dependent variable and 9 independent variables. The values of the independent variables range from 1 to 10, and the class variable is 2 for benign and 4 for malignant tumors. The minimum likelihood of an individual having a breast tumor is represented by the value 1 and the maximum by the value 10.

AODE

Averaged one-dependence estimators (AODE) is a probabilistic classification learning technique. It was developed to address the attribute-independence problem of the popular naive Bayes classifier. It frequently produces considerably more accurate classifiers than naive Bayes, at the cost of a modest increase in the amount of computation.
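For context, a count-based naive Bayes classifier (the model whose attribute-independence assumption AODE relaxes by averaging over one-dependence estimators) can be sketched in a few lines. This is an illustrative toy with invented data, not WEKA's AODE or NaiveBayes code.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Collect the counts naive Bayes needs: class frequencies and
    per-class counts of each (attribute index, value) pair."""
    class_counts = Counter(labels)
    cond_counts = Counter()
    domain = defaultdict(set)  # distinct values seen per attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(i, value, label)] += 1
            domain[i].add(value)
    return class_counts, cond_counts, domain

def predict(row, class_counts, cond_counts, domain):
    """argmax over classes of P(class) * prod_i P(attr_i | class),
    assuming attribute independence (the assumption AODE relaxes),
    with add-one (Laplace) smoothing."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total
        for i, value in enumerate(row):
            score *= (cond_counts[(i, value, label)] + 1) / (n + len(domain[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

AODE replaces this single independence-based product with an average over models that each additionally condition on one "super-parent" attribute, which is where its accuracy gain over naive Bayes comes from.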
Bayes Net

Results for: Naive Bayes

=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: breast
Instances: 683
Attributes: 10
Test mode: 10-fold cross-validation

Time taken to build model: 0.08 seconds

=== Summary ===
Correctly Classified Instances       659        96.4861 %
Incorrectly Classified Instances      24         3.5139 %
Kappa statistic                        0.9238
KB Relative Info Score             62650.9331 %
KB Information Score                 585.4063 bits    0.8571 bits/instance
Class complexity | order 0           637.9242 bits    0.934  bits/instance
Class complexity | scheme           1877.4218 bits    2.7488 bits/instance
Complexity improvement (Sf)        -1239.4976 bits   -1.8148 bits/instance
Mean absolute error                    0.0362
Root mean squared error                0.1869
Relative absolute error                7.9508 %
Root relative squared error           39.192  %
Total Number of Instances            683

The kappa statistic is used to assess the accuracy of a particular measurement; in measurement settings it is common to distinguish between the reliability of the data collected and their validity. The average kappa score from the Bayes Net algorithm is around 0.6-0.7.
