One-Way ANOVA Test in R - Easy Guides - Wiki - STHDA


What is one-way ANOVA test?
The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-samples t-test for comparing means in a situation where there are more than two groups. In one-way ANOVA, the data are organized into several groups based on one single grouping variable (also called factor variable). This tutorial describes the basic principle of the one-way ANOVA test and provides practical ANOVA test examples in R software.

ANOVA test hypotheses:

- Null hypothesis: the means of the different groups are the same.
- Alternative hypothesis: at least one sample mean is not equal to the others.

Note that if you have only two groups, you can use a t-test. In this case the F-test and the t-test are equivalent.

Related Book: Practical Statistics in R for Comparing Groups: Numerical Variables

Assumptions of ANOVA test

Here we describe the requirements for the ANOVA test. The ANOVA test can be applied only when:

- The observations are obtained independently and randomly from the population defined by the factor levels.
- The data of each factor level are normally distributed.
- These normal populations have a common variance. (Levene's test can be used to check this.)

How one-way ANOVA test works?

Assume that we have 3 groups (A, B, C) to compare:

1. Compute the common variance, which is called variance within samples (\(S^2_{within}\)) or residual variance.
2. Compute the variance between sample means as follows:
   - Compute the mean of each group.
   - Compute the variance between sample means (\(S^2_{between}\)).
3. Produce the F-statistic as the ratio \(S^2_{between}/S^2_{within}\).

Note that a low ratio (ratio < 1) indicates that there is no significant difference between the means of the samples being compared. A high ratio, in contrast, suggests that the variation among group means is larger than expected from the variation within groups; whether it is significant is judged from the p-value of the F-test.

Visualize your data and compute one-way ANOVA in R

Import your data into R

1. Prepare your data as specified here: Best practices for preparing your data set for R
2. Save your data in an external .txt tab or .csv file
3. Import your data into R as follows:

    # If .txt tab file, use this
    my_data <- read.delim(file.choose())
    # Or, if .csv file, use this
    my_data <- read.csv(file.choose())

Here, we'll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.
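To make the recipe in "How one-way ANOVA test works" concrete, here is a minimal sketch that computes the F-statistic by hand on the built-in PlantGrowth data. It reads \(S^2_{between}\) and \(S^2_{within}\) as the usual between-group and within-group mean squares, which is an assumption about the notation above; all variable names (dat, ms_between, f_stat, and so on) are ours, for illustration only.

```r
# Hand computation of the one-way ANOVA F-statistic (illustrative sketch)
dat <- PlantGrowth                      # built-in data: weight, group
k <- nlevels(dat$group)                 # number of groups
n <- nrow(dat)                          # total number of observations

group_means <- tapply(dat$weight, dat$group, mean)
group_sizes <- tapply(dat$weight, dat$group, length)
grand_mean  <- mean(dat$weight)

# Between-group variation: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group (residual) variation: spread of observations around their group mean
ss_within  <- sum((dat$weight - ave(dat$weight, dat$group))^2)

ms_between <- ss_between / (k - 1)      # assumed to correspond to S^2_between
ms_within  <- ss_within / (n - k)       # assumed to correspond to S^2_within

f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The resulting F value and p-value should agree with what aov() reports for the same data, which is a handy cross-check when reading an ANOVA table for the first time.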
Store the PlantGrowth data in my_data:

    my_data <- PlantGrowth

Check your data

To have an idea of what the data look like, we use the function sample_n() [in dplyr package]. The sample_n() function randomly picks a few of the observations in the data frame to print out:

    # Show a random sample
    set.seed(1234)
    dplyr::sample_n(my_data, 10)

       weight group
    19   4.32  trt1
    18   4.89  trt1
    29   5.80  trt2
    24   5.50  trt2
    17   6.03  trt1
    1    4.17  ctrl
    6    4.61  ctrl
    16   3.83  trt1
    12   4.17  trt1
    15   5.87  trt1

In R terminology, the column "group" is called a factor and the different categories ("ctrl", "trt1", "trt2") are named factor levels. The levels are ordered alphabetically.

    # Show the levels
    levels(my_data$group)

    [1] "ctrl" "trt1" "trt2"

If the levels are not automatically in the correct order, re-order them as follows:

    my_data$group <- factor(my_data$group, levels = c("ctrl", "trt1", "trt2"))

It's possible to compute summary statistics (mean and sd) by groups using the dplyr package.

Compute summary statistics by groups - count, mean, sd:

    library(dplyr)
    group_by(my_data, group) %>%
      summarise(
        count = n(),
        mean = mean(weight, na.rm = TRUE),
        sd = sd(weight, na.rm = TRUE)
      )

    Source: local data frame [3 x 4]

       group count  mean        sd
      (fctr) (int) (dbl)     (dbl)
    1   ctrl    10 5.032 0.5830914
    2   trt1    10 4.661 0.7936757
    3   trt2    10 5.526 0.4425733

Visualize your data

To use R base graphs read this: R base graphs. Here, we'll use the ggpubr R package for an easy ggplot2-based data visualization.

Install the latest version of ggpubr from GitHub as follows (recommended):

    # Install
    if(!require(devtools)) install.packages("devtools")
    devtools::install_github("kassambara/ggpubr")

Or, install from CRAN as follows:

    install.packages("ggpubr")

Visualize your data with ggpubr:

    # Box plots
    # ++++++++++++++++++++
    # Plot weight by group and color by group
    library("ggpubr")
    ggboxplot(my_data, x = "group", y = "weight",
              color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
              order = c("ctrl", "trt1", "trt2"),
              ylab = "Weight", xlab = "Treatment")

[Figure: box plots of weight by treatment group]

    # Mean plots
    # ++++++++++++++++++++
    # Plot weight by group
    # Add error bars: mean_se
    # (other values include: mean_sd, mean_ci, median_iqr, ....)
    library("ggpubr")
    ggline(my_data, x = "group", y = "weight",
           add = c("mean_se", "jitter"),
           order = c("ctrl", "trt1", "trt2"),
           ylab = "Weight", xlab = "Treatment")

[Figure: mean plot of weight by treatment group, with error bars]

If you still want to use R base graphs, type the following scripts:

    # Box plot
    boxplot(weight ~ group, data = my_data,
            xlab = "Treatment", ylab = "Weight",
            frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))
    # plotmeans
    library("gplots")
    plotmeans(weight ~ group, data = my_data, frame = FALSE,
              xlab = "Treatment", ylab = "Weight",
              main = "Mean Plot with 95% CI")

Compute one-way ANOVA test

We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions.

The R function aov() can be used to answer this question. The function summary.aov() is used to summarize the analysis of variance model.

    # Compute the analysis of variance
    res.aov <- aov(weight ~ group, data = my_data)
    # Summary of the analysis
    summary(res.aov)

                Df Sum Sq Mean Sq F value Pr(>F)
    group        2  3.766  1.8832   4.846 0.0159 *
    Residuals   27 10.492  0.3886
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output includes the columns F value and Pr(>F), the latter corresponding to the p-value of the test.

Interpret the result of one-way ANOVA tests

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups, highlighted with "*" in the model summary.

Multiple pairwise-comparison between the means of groups

In a one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don't know which pairs of groups are different.

It's possible to perform multiple pairwise-comparisons to determine if the mean difference between specific pairs of groups is statistically significant.

Tukey multiple pairwise-comparisons

As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise-comparisons between the means of groups.

The function TukeyHSD() takes the fitted ANOVA as an argument.
    TukeyHSD(res.aov)

      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = weight ~ group, data = my_data)

    $group
                diff        lwr       upr     p adj
    trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
    trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
    trt2-trt1  0.865  0.1737839 1.5562161 0.0120064

- diff: difference between means of the two groups
- lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
- p adj: p-value after adjustment for the multiple comparisons.

It can be seen from the output that only the difference between trt2 and trt1 is significant, with an adjusted p-value of 0.012.

Multiple comparisons using multcomp package

It's possible to use the function glht() [in multcomp package] to perform multiple comparison procedures for an ANOVA. glht stands for general linear hypothesis tests. The simplified format is as follows:

    glht(model, linfct)

- model: a fitted model, for example an object returned by aov().
- linfct: a specification of the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned from the function mcp().

Use glht() to perform multiple pairwise-comparisons for a one-way ANOVA:

    library(multcomp)
    summary(glht(res.aov, linfct = mcp(group = "Tukey")))

         Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: aov(formula = weight ~ group, data = my_data)

    Linear Hypotheses:
                     Estimate Std. Error t value Pr(>|t|)
    trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.391
    trt2 - ctrl == 0   0.4940     0.2788   1.772    0.198
    trt2 - trt1 == 0   0.8650     0.2788   3.103    0.012 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Adjusted p values reported -- single-step method)

Pairwise t-test

The function pairwise.t.test() can also be used to calculate pairwise comparisons between group levels with corrections for multiple testing.

    pairwise.t.test(my_data$weight, my_data$group,
                    p.adjust.method = "BH")

        Pairwise comparisons using t tests with pooled SD

    data:  my_data$weight and my_data$group

         ctrl  trt1
    trt1 0.194 -
    trt2 0.132 0.013

    P value adjustment method: BH

The result is a table of p-values for the pairwise comparisons. Here, the p-values have been adjusted by the Benjamini-Hochberg method.

Check ANOVA assumptions: test validity?
The ANOVA test assumes that the data are normally distributed and that the variance across groups is homogeneous. We can check that with some diagnostic plots.

Check the homogeneity of variance assumption

The residuals versus fits plot can be used to check the homogeneity of variances.

In the plot below, there is no evident relationship between residuals and fitted values (the mean of each group), which is good. So, we can assume the homogeneity of variances.

    # 1. Homogeneity of variances
    plot(res.aov, 1)

[Figure: residuals versus fitted values]

Points 17, 15, 4 are detected as outliers, which can severely affect normality and homogeneity of variance. It can be useful to remove outliers to meet the test assumptions.

It's also possible to use Bartlett's test or Levene's test to check the homogeneity of variances.

We recommend Levene's test, which is less sensitive to departures from normal distribution. The function leveneTest() [in car package] will be used:

    library(car)
    leveneTest(weight ~ group, data = my_data)

    Levene's Test for Homogeneity of Variance (center = median)
          Df F value Pr(>F)
    group  2  1.1192 0.3412
          27

From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.

Relaxing the homogeneity of variance assumption

The classical one-way ANOVA test requires an assumption of equal variances for all groups. In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.

How do we save our ANOVA test in a situation where the homogeneity of variance assumption is violated?

An alternative procedure (the Welch one-way test) that does not require that assumption has been implemented in the function oneway.test().
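As a rough illustration of what the Welch procedure changes, the sketch below runs oneway.test() twice on the built-in PlantGrowth data: once with var.equal = TRUE, which gives the classical equal-variance F-test, and once with the default var.equal = FALSE, which gives the Welch test. It uses PlantGrowth directly so that it runs on its own; the object names classic and welch are ours.

```r
# Classical one-way ANOVA F-test (assumes equal variances across groups)
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

# Welch one-way test (default: var.equal = FALSE, no equal-variance assumption)
welch <- oneway.test(weight ~ group, data = PlantGrowth)

# Side-by-side view of F statistic, degrees of freedom and p-value
rbind(
  classic = c(F = unname(classic$statistic), classic$parameter, p = classic$p.value),
  welch   = c(F = unname(welch$statistic),   welch$parameter,   p = welch$p.value)
)
```

With var.equal = TRUE the F statistic should match the one from aov(); the Welch version keeps the same numerator degrees of freedom but adjusts the denominator degrees of freedom to account for unequal group variances.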
ANOVA test with no assumption of equal variances:

    oneway.test(weight ~ group, data = my_data)

Pairwise t-tests with no assumption of equal variances:

    pairwise.t.test(my_data$weight, my_data$group,
                    p.adjust.method = "BH", pool.sd = FALSE)

Check the normality assumption

Normality plot of residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.

The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.

    # 2. Normality
    plot(res.aov, 2)

[Figure: normal Q-Q plot of the residuals]

As all the points fall approximately along this reference line, we can assume normality.

The conclusion above is supported by the Shapiro-Wilk test on the ANOVA residuals (W = 0.966, p = 0.44), which finds no indication that normality is violated.

    # Extract the residuals
    aov_residuals <- residuals(object = res.aov)
    # Run Shapiro-Wilk test
    shapiro.test(x = aov_residuals)

        Shapiro-Wilk normality test

    data:  aov_residuals
    W = 0.96607, p-value = 0.4379

Non-parametric alternative to one-way ANOVA test

Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when ANOVA assumptions are not met.

    kruskal.test(weight ~ group, data = my_data)

        Kruskal-Wallis rank sum test

    data:  weight by group
    Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

Summary

1. Import your data from a .txt tab file: my_data <- read.delim(file.choose()). Here, we used my_data <- PlantGrowth.
2. Visualize your data: ggpubr::ggboxplot(my_data, x = "group", y = "weight", color = "group")
3. Compute one-way ANOVA test: summary(aov(weight ~ group, data = my_data))
4. Tukey multiple pairwise-comparisons: TukeyHSD(res.aov)

See also

Analysis of variance (ANOVA, parametric):

- One-Way ANOVA Test in R
- Two-Way ANOVA Test in R
- MANOVA Test in R: Multivariate Analysis of Variance

Kruskal-Wallis Test in R (non parametric alternative to one-way ANOVA)

Read more

- Quick-R: ANOVA/MANOVA (http://www.statmethods.net/stats/anova.html)
- Quick-R: (M)ANOVA Assumptions (http://www.statmethods.net/stats/anovaAssumptions.html)
- R and Analysis of Variance (http://personality-project.org/r/r.guide/r.anova.html)

Infos

This analysis has been performed using R software (ver. 3.2.4).
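For readers who want the steps of the Summary in one place, here is a minimal end-to-end sketch. It assumes only base R (boxplot() stands in for ggpubr) and the built-in PlantGrowth data; the names p_value, tukey and sig_pairs are ours, for illustration, and the 0.05 threshold is the significance level used throughout this tutorial.

```r
# End-to-end sketch: load, plot, fit, and follow up with Tukey HSD
my_data <- PlantGrowth

# Visualize: base R box plot of weight by group
boxplot(weight ~ group, data = my_data, xlab = "Treatment", ylab = "Weight")

# Fit the one-way ANOVA and extract the overall p-value from the summary table
res.aov <- aov(weight ~ group, data = my_data)
p_value <- summary(res.aov)[[1]][["Pr(>F)"]][1]

# Run Tukey HSD only when the overall test is significant at the 5% level
sig_pairs <- character(0)
if (p_value < 0.05) {
  tukey <- TukeyHSD(res.aov)$group
  # Keep the names of the pairs whose adjusted p-value is below 0.05
  sig_pairs <- rownames(tukey)[tukey[, "p adj"] < 0.05]
}
print(sig_pairs)
```

Gating the pairwise comparisons on the overall test mirrors the order followed in this tutorial; the assumption checks (residual plots, Levene's test, Shapiro-Wilk test) are left out of the sketch for brevity and should still be run.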


