Categorise

Menu location: Data_Cleaning and Encoding_Categorise.

This function categorises any set of data into groups that you specify, for example, ages into age groups.
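
As a rough illustration of what categorising by user-specified cut-offs involves, the Python sketch below assigns each value to a group and labels the groups in the same style as the tables that follow ("< a", ">= a; < b", ">= b"). The function names `categorise` and `category_labels`, the age data, and the rule that a value equal to a cut-off falls into the upper group (suggested by the ">=" labels) are assumptions for illustration, not StatsDirect's implementation.

```python
from bisect import bisect_right


def category_labels(cutoffs):
    """Labels in the style '< a', '>= a; < b', ..., '>= z' for sorted cut-offs."""
    labels = [f"< {cutoffs[0]:g}"]
    for lo, hi in zip(cutoffs, cutoffs[1:]):
        labels.append(f">= {lo:g}; < {hi:g}")
    labels.append(f">= {cutoffs[-1]:g}")
    return labels


def categorise(values, cutoffs):
    """Return the category label for each value (assumed rule: a value equal
    to a cut-off belongs to the upper group, matching the '>=' labels)."""
    cutoffs = sorted(cutoffs)
    labels = category_labels(cutoffs)
    # bisect_right gives the number of cut-offs <= x, i.e. the category index
    return [labels[bisect_right(cutoffs, x)] for x in values]


# Hypothetical ages grouped at 18, 35 and 65
ages = [3, 17, 18, 34, 35, 64, 65, 90]
groups = categorise(ages, [18, 35, 65])
for age, group in zip(ages, groups):
    print(age, group)
```

Using `bisect_right` keeps the lookup correct for unequal group widths, such as typical age bands.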

 

A continuous variable is often divided into categories or groups. Take, for example, the IgM variable in the parametric sheet of the test workbook: it has 298 observations, which you might want to summarise as ranges of values. To do this, select the Data_Cleaning and Encoding_Categorise menu item, then select the IgM column of data. You are then offered different ways to group your data into bins (intervals) and count the observations in each.

The IgM example grouped by quartiles:

category count
< 0.5 56
>= 0.5; < 0.7 67
>= 0.7; < 1 98
>= 1 77
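
The quartile grouping can be approximated in a few lines: compute the three quartiles as cut-offs, then count the observations in each of the four resulting categories. This is a hedged sketch using Python's `statistics.quantiles`, whose default ("exclusive") method is not necessarily either of the quantile methods described in the technical note below; the data and the helper `count_by_cutoffs` are hypothetical.

```python
import statistics
from bisect import bisect_right


def count_by_cutoffs(values, cutoffs):
    """Counts for '< c0', '>= c0; < c1', ..., '>= c_last'."""
    counts = [0] * (len(cutoffs) + 1)
    for x in values:
        # values equal to a cut-off are counted in the upper category
        counts[bisect_right(cutoffs, x)] += 1
    return counts


# Hypothetical right-skewed sample (not the IgM data from the test workbook)
data = [0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 2.1, 3.4]

# Three quartile cut-points; Python's "exclusive" method is an assumption here,
# not necessarily the quantile definition used by the Categorise function
cutoffs = statistics.quantiles(data, n=4)
counts = count_by_cutoffs(data, cutoffs)
print(cutoffs)  # [0.5, 0.7, 1.2]
print(counts)   # [3, 3, 5, 4]
```

Tied values at a cut-off all land in the same group, which is one likely reason quartile groups, like those in the table above, need not hold equal counts.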

 

The IgM example grouped into 10 intervals of width 0.5 (cut-offs 0.5 to 4.5):

category count
< 0.5 56
>= 0.5; < 1 165
>= 1; < 1.5 54
>= 1.5; < 2 14
>= 2; < 2.5 6
>= 2.5; < 3 2
>= 3; < 3.5 0
>= 3.5; < 4 0
>= 4; < 4.5 0
>= 4.5 1
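
Equal-width intervals can be sketched in the same way: generate the cut-offs from a first cut-off and a step size, then count. The hypothetical `interval_cutoffs` helper takes the first cut-off, the step and the number of categories; with 0.5, 0.5 and 10 it yields the cut-offs 0.5 to 4.5 used in the table above. The data values are made up, and this parameterisation is an assumption rather than the exact set of options in the Categorise dialog.

```python
from bisect import bisect_right


def interval_cutoffs(first_cutoff, step, n_categories):
    """Equally spaced cut-offs; n_categories - 1 cut-offs give n_categories groups."""
    # Compute each cut-off directly (not by repeated addition) to limit rounding drift
    return [round(first_cutoff + k * step, 10) for k in range(n_categories - 1)]


def count_by_cutoffs(values, cutoffs):
    """Counts for '< c0', '>= c0; < c1', ..., '>= c_last'."""
    counts = [0] * (len(cutoffs) + 1)
    for x in values:
        counts[bisect_right(cutoffs, x)] += 1
    return counts


data = [0.3, 0.45, 0.6, 0.8, 0.95, 1.1, 1.6, 2.2, 4.7]  # hypothetical values
cutoffs = interval_cutoffs(0.5, 0.5, 10)
counts = count_by_cutoffs(data, cutoffs)
print(cutoffs)  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
print(counts)   # [2, 3, 1, 1, 1, 0, 0, 0, 0, 1]
```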

 

A quick look at the counts above gives a picture similar to a histogram: most observations fall at low values, with a long tail towards higher values, i.e. the data are positively (right) skewed. The text-based histogram also gives counts, but note that its bin values are the mid-points of the bins, not the cut-off values between bins; for equal-width bins, each mid-point equals the bin's upper cut-off minus half the step size.
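
To line up text-based histogram output with Categorise output, the conversion between bin mid-points and cut-offs is simple arithmetic for equal-width bins. The helper names below are hypothetical, and open-ended categories such as "< 0.5" or ">= 4.5" have no mid-point, so only finite bins are handled.

```python
def bin_midpoints(cutoffs):
    """Mid-points of the finite bins [c_k, c_k+1) between consecutive cut-offs."""
    return [(lo + hi) / 2 for lo, hi in zip(cutoffs, cutoffs[1:])]


def upper_cutoffs_from_midpoints(midpoints, step):
    """For equal-width bins, each upper cut-off is the mid-point plus half the step."""
    return [m + step / 2 for m in midpoints]


cutoffs = [0.5, 1.0, 1.5, 2.0, 2.5]
mids = bin_midpoints(cutoffs)
print(mids)                                     # [0.75, 1.25, 1.75, 2.25]
print(upper_cutoffs_from_midpoints(mids, 0.5))  # [1.0, 1.5, 2.0, 2.5]
```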

 

Technical note

Two options are offered for calculating the quantiles used as cut-points in this categorisation function; both methods are described on the Quantiles page. Method 1 (the default) corresponds to the default method used in Stata, and Method 2 is equivalent to the alternative definition used in Stata.
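
For readers who want to check cut-points by hand, here is a sketch of the two Stata definitions as we understand them from Stata's documentation for `pctile`/`_pctile`. The default takes an order statistic, or averages two adjacent ones when n·p is a whole number; the alternative (`altdef`) definition interpolates at position (n+1)·p. Treat this as an approximation of Methods 1 and 2, not as StatsDirect's code: the function names are hypothetical and edge-case handling may differ. Quantiles are given as k of m (e.g. k = 1, m = 4 for the lower quartile) so that the whole-number test uses exact integer arithmetic.

```python
from fractions import Fraction


def stata_default(sorted_x, k, m):
    """k-th of m quantiles (0 < k < m), Stata's default definition (as assumed here)."""
    n = len(sorted_x)
    if (n * k) % m == 0:
        i = n * k // m                  # P = n*k/m is a whole number: average x(P), x(P+1)
        return (sorted_x[i - 1] + sorted_x[i]) / 2
    i = -(-n * k // m)                  # otherwise take x(ceil(P))
    return sorted_x[i - 1]


def stata_altdef(sorted_x, k, m):
    """Interpolate at position (n+1)*k/m, clamping to the sample minimum/maximum."""
    n = len(sorted_x)
    i = (n + 1) * k // m                # whole part of the position
    h = Fraction((n + 1) * k % m, m)    # exact fractional part
    if i < 1:
        return sorted_x[0]
    if i >= n:
        return sorted_x[-1]
    lo, hi = sorted_x[i - 1], sorted_x[i]
    return lo + float(h) * (hi - lo)


x = list(range(1, 9))  # hypothetical sorted sample 1..8
print([stata_default(x, k, 4) for k in (1, 2, 3)])  # [2.5, 4.5, 6.5]
print([stata_altdef(x, k, 4) for k in (1, 2, 3)])   # [2.25, 4.5, 6.75]
```

The two definitions agree on the median here but give different quartile cut-points, which can move observations between categories when values lie close to a cut-off.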