Basic tests for continuous data: Mann-Whitney U and Wilcoxon signed rank sum tests

Jenny V Freeman, Michael J Campbell

The most recent tutorial examined how the process of setting and testing a hypothesis could be implemented in practice (Freeman & Julious 2006). It focussed on some elementary methods for analysing continuous data: the paired and unpaired t-tests. However, these tests make particular assumptions about the distribution of the data, most importantly that the standard deviations are similar (for the independent groups t-test) and that the data to be analysed are approximately Normally distributed (both tests). This tutorial will discuss some alternative methods that can be used when these assumptions are violated. They are part of a group of statistical tests known as non-parametric or distribution-free tests; distribution-free tests do not involve making any assumptions about how the data are distributed (for example that the data are Normally distributed). An important point to note is that it is the test that is parametric or non-parametric, not the data.

Mann-Whitney U test

When the assumptions underlying the independent samples t-test are not met, then the non-parametric equivalent, the Mann-Whitney U test, may be used. Whilst the independent samples t-test is specifically a test of the null hypothesis that the groups have the same mean value, the Mann-Whitney U test is not a test for a difference in medians, as is commonly thought.
It is a more general test of the null hypothesis that the distribution of the outcome variable in the two groups is the same; it is possible for the outcome data in the two groups to have similar measures of central tendency or location, such as means and medians, but different distributions. Consider for example two groups of size 50; group A has 48 observations with value 0 and 2 with value 1 whilst group B has 26 observations with value 0 and 24 with a value of 2. Both groups have a median value of 0 but the P-value from the Mann-Whitney U test is < 0.001, indicating that the distribution of data in the two groups is different.

The Mann-Whitney U test requires all the observations (for both groups combined) to be ranked as if they were from a single sample. From this the test statistic U is calculated; it is the number of all possible pairs of observations comprising one observation from each sample for which the rank of the value in the first group precedes the rank of the value in the second group. This test statistic is then used to obtain a P-value.

The principle is best illustrated with a simple example. Consider the following two samples of size six: X=(0,6,5,1,1,6) and Y=(9,4,7,8,3,5). These are then ranked in order as if they were from the same sample (the sample each value comes from is shown in the first row):

Sample  X  X    X    Y  Y  X    Y    X    X    Y   Y   Y
Values  0  1    1    3  4  5    5    6    6    7   8   9
Ranks   1  2.5  2.5  4  5  6.5  6.5  8.5  8.5  10  11  12

Having ranked the values all together, these ranks are then added up separately for each sample to get two separate totals (U statistics), Ux=29.5 and Uy=48.5. A useful check is that the two totals should add to n(n+1)/2, where n is the combined number of observations. In this case n(n+1)/2 = 12(12+1)/2 = 78. The smaller of the two U statistics is used to obtain a P-value; thus the value of U used for this example is 29.5. As with the t statistic, this value is compared to tabulated critical values under the null hypothesis (table 1) to obtain a P-value. Rank totals greater than the tabulated critical values are not significant.
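The ranking and rank totals for samples X and Y described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `average_ranks` and `rank_totals` are hypothetical and not part of the original tutorial.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_totals(x, y):
    """Rank both samples as one combined sample and total the ranks for each."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)]), sum(ranks[len(x):])


x = [0, 6, 5, 1, 1, 6]
y = [9, 4, 7, 8, 3, 5]
total_x, total_y = rank_totals(x, y)
n = len(x) + len(y)

print("Rank total for X:", total_x)
print("Rank total for Y:", total_y)
# Check from the text: the two totals should add to n(n+1)/2.
print("Totals add to n(n+1)/2:", total_x + total_y == n * (n + 1) / 2)
```

For the two samples in the example this reproduces the rank totals of 29.5 and 48.5, which add to 78.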
In this case n1 and n2 are both 6 and the tabulated critical value is 26. As the value of 29.5 is greater than this, the results do not reach statistical significance at the 5% level, and there is insufficient evidence to reject the null hypothesis that the distribution of the data is the same in the two groups.

Table 1: Mann-Whitney test on unpaired samples: 5% levels of P (taken from Swinscow & Campbell 2002). Columns give n1, rows give n2; a dash marks a combination with no tabulated value.

 n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  4   -   -  10
  5   -   6  11  17
  6   -   7  12  18  26
  7   -   7  13  20  27  36
  8   3   8  14  21  29  38  49
  9   3   8  15  22  31  40  51  63
 10   3   9  15  23  32  42  53  65  78
 11   4   9  16  24  34  44  55  68  81  96
 12   4  10  17  26  35  46  58  71  85  99 115
 13   4  10  18  27  37  48  60  73  88 103 119 137
 14   4  11  19  28  38  50  63  76  91 106 123 141 160
 15   4  11  20  29  40  52  65  79  94 110 127 145 164 185
 16   4  12  21  31  42  54  67  82  97 114 131 150 169
 17   5  12  21  32  43  56  70  84 100 117 135 154
 18   5  13  22  33  45  58  72  87 103 121 139
 19   5  13  23  34  46  60  74  90 107 124
 20   5  14  24  35  48  62  77  93 110
 21   6  14  25  37  50  64  79  95
 22   6  15  26  38  51  66  82
 23   6  15  27  39  53  68
 24   6  16  28  40  55
 25   6  16  28  42
 26   7  17  29
 27   7  17
 28   7

The previous tutorial illustrated the use of the independent samples t-test with some data taken from a community leg ulcer trial (Morrell et al. 1998). For the leg ulcer data, there were 120 patients in the clinic group and their mean number of ulcer free weeks was 20.1. There were 113 patients in the control group and they had a mean number of ulcer free weeks of 14.2.
It was demonstrated that there was a statistically significant difference in the number of ulcer free weeks between the two groups (P=0.014). However, if the number of ulcer free weeks in each group is plotted it can be seen that the data are highly skewed and are not Normally distributed (Figure 1a and 1b).

Figure 1a: Ulcer-free time for clinic group

Figure 1b: Ulcer-free time for home group

If, instead of the independent samples t-test, a Mann-Whitney U test were carried out on these data the P-value obtained would be 0.017, a value that is remarkably similar to that obtained from the t-test. (In fact, the t-test and the Mann-Whitney U test will tend to give similar P-values when the samples are large and approximately equal in size.) As this is less than the nominal level usually set for statistical significance of 0.05 we can reject the null hypothesis (that the distribution of the data in the two groups is the same). We conclude that the result is statistically significant and there is evidence that the distribution of ulcer free weeks is different between the two groups. However, if we consider only the P-value, we can state only that there is a difference, not what that difference might be.

Two groups of paired observations

When there is more than one group of observations it is vital to distinguish the case where the data are paired from that where the groups are independent. Paired data may arise when the same individuals are studied more than once, usually in different circumstances, or when individuals are paired as in a case-control study. For example, as part of the leg ulcer trial, data were collected on health related quality of life (HRQoL) at baseline, 3 months and 12 months follow-up. The previous tutorial described a method for analysing paired continuous data, the paired t-test.
If the assumptions underlying the use of the paired t-test are not met a non-parametric alternative, the Wilcoxon signed rank sum test, can be used. This test is based upon the ranks of the paired differences and the null hypothesis is that there is no tendency for the outcome in one group (or under one condition) to be higher or lower than in the other group (or condition). It assumes that (a) the paired differences are independent of each other and (b) the differences come from a symmetrical distribution (this can be checked by eye).

As with the Mann-Whitney U test outlined above, the Wilcoxon signed rank sum test is most easily illustrated using an example. Swinscow and Campbell (Swinscow & Campbell 2002) give details of a study of foetal movements before and after chorionic villus sampling. The data are shown in table 2:

Table 2: Wilcoxon test on percentage of time foetus spent moving before and after chorionic villus sampling for ten pregnant women (Boogert, Manhigh, & Visser 1987)

Patient no  Before sampling  After sampling  Difference (before-after)  Rank  Signed rank
   (1)           (2)              (3)                  (4)              (5)       (6)
    1            25               18                    7                9         9
    2            24               27                   -3                5.5      -5.5
    3            28               25                    3                5.5       5.5
    4            15               20                   -5                8        -8
    5            20               17                    3                5.5       5.5
    6            23               24                   -1                1.5      -1.5
    7            21               24                   -3                5.5      -5.5
    8            20               22                   -2                3        -3
    9            20               19                    1                1.5       1.5
   10            27               19                    8               10        10

The differences between before and after sampling are calculated (column 4) and these are then ranked by size irrespective of sign (column 5; zero values omitted).
When two or more differences are identical each is allotted the point half way between the ranks they would fill if distinct, irrespective of the plus or minus sign. For instance, the differences of -1 (patient 6) and +1 (patient 9) fill ranks 1 and 2. As (1 + 2)/2 = 1.5, they are allotted rank 1.5. In column (6) the ranks are repeated from column (5), but to each is attached the sign of the difference from column (4). A useful check is that the sum of the ranks must add to n(n + 1)/2. In this case 10(10 + 1)/2 = 55.

The numbers representing the positive ranks and the negative ranks in column (6) are added up separately and only the smaller of the two totals (irrespective of its sign) is used to obtain a P-value from tabulated critical values under the null hypothesis (Table 3). As with the Mann-Whitney U test, rank totals greater than the tabulated critical value are non-significant at the 5% level. In this case the smaller of the two rank totals is 23.5 and as this is larger than the number given for ten pairs in table 3 the result is not statistically significant. There is insufficient evidence to reject the null hypothesis that the median difference in foetal movements before and after sampling is zero. We can conclude that we have little evidence that chorionic villus sampling alters the movement of the foetus.

Table 3: Wilcoxon test on paired samples: 5% and 1% levels of P (taken from Swinscow & Campbell 2002)

Number of pairs  5% level  1% level
       7             2         0
       8             2         0
       9             6         2
      10             8         3
      11            11         5
      12            14         7
      13            17        10
      14            21        13
      15            25        16
      16            30        19

Note, perhaps contrary to intuition, that the Wilcoxon test, although a test based on the ranks of the data values, may give a different value if the data are transformed, say by taking logarithms.
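A minimal Python sketch of the signed rank calculation for the Table 2 data follows. It also repeats the calculation on log-transformed values to illustrate the point about transformations. It uses only the standard library; the function names `average_ranks` and `signed_rank_totals` are hypothetical, and the choice of natural logarithms is an assumption made for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def signed_rank_totals(before, after):
    """Return (total of positive ranks, total of negative ranks); zero differences are omitted."""
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])  # rank by size, ignoring sign
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


# Percentage of time the foetus spent moving (Table 2).
before = [25, 24, 28, 15, 20, 23, 21, 20, 20, 27]
after = [18, 27, 25, 20, 17, 24, 24, 22, 19, 19]

raw = signed_rank_totals(before, after)
logged = signed_rank_totals([math.log(v) for v in before],
                            [math.log(v) for v in after])

print("Raw data, smaller rank total:", min(raw))
print("Log scale, smaller rank total:", min(logged))
```

On the raw data the positive and negative rank totals are 31.5 and 23.5, as in the worked example. On the log scale the tied differences separate, so the smaller total changes slightly, although in this instance the conclusion from Table 3 is unaffected.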
Thus it may be worth plotting the distribution of the differences for a number of transformations to see if they make the distribution appear more symmetrical.

Summary

Outlined above are some non-parametric methods for comparing two groups of continuous data when the assumptions underlying the t-test (paired and unpaired) are not met. However, as stated in the previous tutorial, statistical significance does not necessarily mean the result obtained is clinically significant or of any practical importance. A P-value will only indicate how likely the results obtained are when the null hypothesis is true. Much more information, such as whether the result is likely to be of clinical importance, can be gained by calculating a confidence interval, as this gives a range of plausible values for the estimated quantity. Details of how to do this can be found in Statistics with Confidence (Altman et al. 2000).

References

Altman, D. G., Machin, D., Bryant, T., & Gardner, M. J. 2000, Statistics with Confidence, 2nd edn, BMJ Books, London.

Boogert, A., Manhigh, A., & Visser, G. H. A. 1987, "The immediate effects of chorionic villus sampling on fetal movements", American Journal of Obstetrics and Gynaecology, vol. 157, pp. 137-139.

Freeman, J. V. & Julious, S. A. 2006, "Basic tests for continuous Normally distributed data", Scope, vol. 15, no. 3.

Morrell, C. J., Walters, S. J., Dixon, S., Collins, K., Brereton, L. M. L., Peters, J., & Brooker, C. G. D. 1998, "Cost effectiveness of community leg ulcer clinic: randomised controlled trial", British Medical Journal, vol. 316, pp. 1487-1491.

Swinscow, T. D. V. & Campbell, M. J. 2002, Statistics at Square One, 10th edn, BMJ Books, London.