Categorise
Menu location: Data_Cleaning and Encoding_Categorise.
This function enables you to categorise any set of data into groups that you specify, for example ages into age groups.
Typically, a continuous variable is divided into categories or groups. Take the IgM variable in the parametric sheet of the test workbook, for example: it has 298 observations, which you might want to summarise in ranges of values. To do this, select the Data_Grouping_Categorise menu item and then select the IgM column of data. You are presented with different ways to group your data into bins (intervals), each reported with a count of the observations it contains:
- Quartiles: 4 bins (< lower quartile, lower quartile to median, median to upper quartile, >= upper quartile)
- Quintiles: 5 bins (< first quintile… >= fourth quintile)
- Deciles: 10 bins (< first decile… >= ninth decile)
- Age groups: one of four common groupings (<15, 15-19… five yearly bands to 85+; <15, 15-24… ten yearly bands to 85+; <1, 1-4… five yearly bands to 85+; <1, 1-4… ten yearly bands to 75+)
- User-defined: starting from a minimum value min, k intervals of equal size step (< min + 1 * step, >= min + 1 * step to < min + 2 * step… up to the k-th interval, >= min + (k - 1) * step)
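The binning rule shared by all of the options above (each bin includes its lower cut point and excludes its upper one) can be sketched in Python as follows. This is only an illustration, not the program's own code: the function names `bin_labels` and `categorise` are hypothetical, and the sample values and cut points are made up rather than taken from the IgM data.

```python
from bisect import bisect_right
from collections import Counter


def bin_labels(cuts):
    """Build labels such as '< 0.5', '>= 0.5; < 0.7', '>= 1' from sorted cut points."""
    labels = [f"< {cuts[0]:g}"]
    labels += [f">= {lo:g}; < {hi:g}" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f">= {cuts[-1]:g}")
    return labels


def categorise(values, cuts):
    """Count values per bin; a value equal to a cut point goes into the upper bin."""
    labels = bin_labels(cuts)
    # bisect_right returns the number of cut points <= v, which is the bin index.
    counts = Counter(bisect_right(cuts, v) for v in values)
    return [(label, counts.get(i, 0)) for i, label in enumerate(labels)]


# Illustrative values only (not the IgM data); cut points chosen by hand.
sample = [0.3, 0.5, 0.6, 0.7, 0.9, 1.0, 1.2, 4.5]
for label, count in categorise(sample, [0.5, 0.7, 1.0]):
    print(f"{label} | {count}")
```

With three cut points this produces four bins, in the same way that the quartile option does; how the cut points themselves are calculated is a separate question that this sketch leaves open.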
Using the IgM example in quartiles:
category | count |
< 0.5 | 56 |
>= 0.5; < 0.7 | 67 |
>= 0.7; < 1 | 98 |
>= 1 | 77 |
Using the IgM example in 10 intervals of 0.5 from 0:
category | count |
< 0.5 | 56 |
>= 0.5; < 1 | 165 |
>= 1; < 1.5 | 54 |
>= 1.5; < 2 | 14 |
>= 2; < 2.5 | 6 |
>= 2.5; < 3 | 2 |
>= 3; < 3.5 | 0 |
>= 3.5; < 4 | 0 |
>= 4; < 4.5 | 0 |
>= 4.5 | 1 |
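The cut points behind the table above can be reproduced with a short Python sketch. It assumes a minimum of 0, which is inferred from the first row (< 0.5) and the step of 0.5; the helper name `equal_width_cuts` is hypothetical and is not part of the program.

```python
def equal_width_cuts(minimum, step, k):
    """Return the k - 1 cut points that separate k bins of width `step`."""
    return [minimum + i * step for i in range(1, k)]


# Assumed inputs: minimum 0, step 0.5, 10 bins, matching the table above.
cuts = equal_width_cuts(0, 0.5, 10)
print(cuts)
```

Ten bins need only nine cut points, because the first bin is open below (< 0.5) and the last bin is open above (>= 4.5).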
A quick look at the counts above shows a picture similar to the one you would see in a histogram: the data are not spread evenly across the ranges of values, i.e. they are skewed. The text-based histogram will also give you counts, but note that the bin values in a histogram are the mid-points of the bins and not the cut-off values between bins, i.e. each is equal to the corresponding user-defined bin cut-off value minus half of the step size.
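As a small illustration of the relationship between cut-offs and histogram mid-points, the Python sketch below converts the cut-offs of the 0.5-step example into mid-points. The helper name `bin_midpoints` is hypothetical, and the sketch covers only the bins that lie below each cut-off; how the open-ended last bin is labelled in the histogram is not stated here.

```python
def bin_midpoints(cuts, step):
    """Mid-point of the bin ending at each cut-off: cut-off minus half the step."""
    return [c - step / 2 for c in cuts]


# Cut-offs 0.5, 1.0, ..., 4.5 from the 0.5-step example above.
cuts = [0.5 * i for i in range(1, 10)]
mids = bin_midpoints(cuts, 0.5)
print(mids)
```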
Technical note
Two different options are presented for calculating the quantiles used as cut points in this categorisation function. The methods are described on the Quantiles page. Method 1 (default) corresponds to the default method used in Stata, and Method 2 is equivalent to the alternative definition used in Stata.