March 16, 2022  Presenter: ソン
When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts
Source:
BMC Med Res Methodol. 2017; 17: 162
Authors: Janus Christian Jakobsen, Christian Gluud, Jørn Wetterslev, and Per Winkel
- <Paper summary>
【Missing data mechanisms】
There are three missing data mechanisms, and each requires a different approach.
• Missing completely at random (MCAR): the reason for missingness is purely random and is related to neither observed nor unobserved data.
• Missing at random (MAR): the reason for missingness is related to observed data but not to unobserved data.
• Missing not at random (MNAR): the reason for missingness depends on the unobserved values themselves (e.g., the values of the missing variable).
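To make the three mechanisms concrete, the sketch below simulates a hypothetical dataset in which income rises with age and then deletes income values under each mechanism. The probabilities, thresholds, and the helper names `simulate`, `welch_t`, and `age_gap_t` are illustrative assumptions, not taken from the paper. It also computes a Welch t statistic comparing age between rows with missing and observed income, one of the simple checks often used to probe the mechanism.

```python
import math
import random
import statistics


def simulate(mechanism, n=2000, seed=1):
    """Hypothetical data: income rises with age; income is then set to None under a chosen mechanism."""
    rng = random.Random(seed)
    age, income = [], []
    for _ in range(n):
        a = rng.uniform(20, 70)
        inc = 1000 + 40 * a + rng.gauss(0, 300)
        if mechanism == "MCAR":
            p = 0.3                          # unrelated to any data value
        elif mechanism == "MAR":
            p = 0.6 if a > 45 else 0.1       # depends only on the observed age
        elif mechanism == "MNAR":
            p = 0.6 if inc > 2800 else 0.1   # depends on the (unseen) income itself
        else:
            raise ValueError(mechanism)
        age.append(a)
        income.append(None if rng.random() < p else inc)
    return age, income


def welch_t(x, y):
    """Welch's t statistic for the difference in means of two samples."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def age_gap_t(mechanism):
    """Compare age between rows with missing income and rows with observed income."""
    age, income = simulate(mechanism)
    missing = [a for a, i in zip(age, income) if i is None]
    observed = [a for a, i in zip(age, income) if i is not None]
    return welch_t(missing, observed)


for mech in ("MCAR", "MAR", "MNAR"):
    print(mech, round(age_gap_t(mech), 1))
```

Under MCAR the statistic should stay near zero, while under MAR it is large. Here it is typically large under MNAR as well, because income is correlated with age: a check of this kind can suggest that data are not MCAR, but it cannot by itself separate MAR from MNAR.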
【The planning stage of a randomized clinical trial】
It is crucial to prevent missing data from occurring. The authors suggest doing the following before randomization:
• Specifying all statistical analyses and publishing the statistical analysis plan.
• Defining key data items, and recording and tracking their missingness during the trial.
• Outlining procedures for preventing missing key data items.
【The analysis stage of a randomized clinical trial】
• Breaking down the analysis into a set of regression analyses.
• For the primary regression analysis, including only an intervention indicator (e.g., experimental drug versus placebo), the stratification variables (e.g., center, sex, age), and the baseline value of the dependent variable (if it is continuous) as covariates.
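A minimal sketch of such a primary analysis, assuming a continuous outcome: ordinary least squares of the outcome on an intervention indicator, one stratification variable (sex), and the baseline value of the outcome. The data are simulated with a true treatment effect of -5, simple (unstratified) randomization is used for brevity, and the small Gauss-Jordan solver and the names `ols` and `simulate_trial` exist only to keep the example free of third-party dependencies; in practice a statistics package would be used.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def simulate_trial(n=400, seed=7):
    """Hypothetical two-arm trial with a true treatment effect of -5 on the outcome."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        treat = rng.randint(0, 1)        # intervention indicator (1 = experimental)
        sex = rng.randint(0, 1)          # stratification variable
        baseline = rng.gauss(50, 10)     # baseline value of the continuous outcome
        y.append(10 + 0.6 * baseline + 2 * sex - 5 * treat + rng.gauss(0, 4))
        rows.append([1.0, treat, sex, baseline])   # intercept, then covariates
    return rows, y


beta = ols(*simulate_trial())
print("estimated treatment effect:", round(beta[1], 2))
```

Adjusting for the baseline value mainly improves precision; randomization is what makes the treatment coefficient an unbiased estimate of the effect.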
【Methods to handle missing data】
(1) Deletion: two kinds
• List-wise deletion (complete-case analysis): removes the entire row if any of its values are missing.
• Pair-wise deletion (available-case analysis): removes a row only from the analyses that involve the variable(s) missing in that row.
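A toy illustration of the two deletion strategies, using a small hypothetical dataset in which `None` marks a missing value. The helper names `listwise` and `pairwise` are illustrative.

```python
import statistics

# Hypothetical toy dataset; None marks a missing value.
data = [
    {"age": 34, "income": 2100, "score": 7},
    {"age": 51, "income": None, "score": 5},
    {"age": None, "income": 1800, "score": 6},
    {"age": 45, "income": 2600, "score": None},
    {"age": 29, "income": 1500, "score": 8},
]


def listwise(rows):
    """Complete-case analysis: keep only rows with no missing value at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, variables):
    """Available-case analysis: keep rows complete on the variables being analysed."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


complete = listwise(data)
print(len(complete), statistics.mean(r["age"] for r in complete))      # 2 31.5
available = pairwise(data, ["age"])
print(len(available), statistics.mean(r["age"] for r in available))    # 4 39.75
print(len(pairwise(data, ["age", "income"])))                          # 3
```

Complete-case analysis keeps only two of the five rows here, whereas the available-case estimate of mean age uses the four rows in which age is observed. The price of pair-wise deletion is that different analyses rest on different subsets of participants.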
(2) Single imputation: each missing value is replaced with one value.
• Imputation using the mean, median, or mode
• Imputation using k-nearest neighbors methods
• Imputation using a linear model or classification and regression trees
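The sketch below shows two single-imputation strategies on hypothetical data: filling gaps with a summary statistic (mean or median; `statistics.mode` would work the same way), and a simple k-nearest-neighbors rule that uses one fully observed variable (age) to find donors. The function names `impute_with` and `knn_impute` and the absolute-difference distance are assumptions for illustration. Because each gap receives exactly one value, any later analysis treats the filled-in values as if they had been observed, which understates uncertainty; this is the main motivation for multiple imputation.

```python
import statistics


def impute_with(values, summary):
    """Replace every None in `values` with one summary statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = summary(observed)
    return [fill if v is None else v for v in values]


def knn_impute(target, predictor, k=2):
    """Fill missing `target` values with the mean target of the k rows closest on `predictor`.

    Assumes `predictor` is fully observed; distance is the absolute difference.
    """
    donors = [(p, t) for p, t in zip(predictor, target) if t is not None]
    out = []
    for p, t in zip(predictor, target):
        if t is None:
            nearest = sorted(donors, key=lambda d: abs(d[0] - p))[:k]
            t = statistics.mean(d[1] for d in nearest)
        out.append(t)
    return out


income = [2100, None, 1800, 2600, 1500]   # hypothetical incomes, one missing
age = [34, 51, 30, 45, 29]                # fully observed predictor
print(impute_with(income, statistics.mean))    # [2100, 2000, 1800, 2600, 1500]
print(impute_with(income, statistics.median))  # [2100, 1950.0, 1800, 2600, 1500]
print(knn_impute(income, age))                 # [2100, 2350, 1800, 2600, 1500]
```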
(3) Multiple imputation: each missing value is replaced with several plausible values, creating multiple completed datasets whose analysis results are then pooled. Five questions to ask when deciding whether multiple imputation should be used:
• Is it valid to ignore the missing data? (rule of thumb: proportion of missing data below about 5%)
• Is the proportion of missing data too large? (rule of thumb: above about 40%)
• Are data missing only on the outcome?
• Is the MCAR assumption plausible?
• Is the MNAR assumption plausible? If in doubt, best-worst and worst-best case sensitivity analyses should be used.
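For a binary harmful outcome (e.g., death), the best-worst and worst-best case sensitivity analyses can be sketched as follows. In the best-worst case, all participants with missing outcomes in the experimental group are assumed to have had no event and all those in the control group an event; the worst-best case reverses this. The counts and the function names `risk` and `sensitivity` are hypothetical, and the risk difference is used as the effect measure only for simplicity.

```python
from fractions import Fraction


def risk(events, observed, missing, missing_had_event):
    """Event proportion after giving every participant with a missing outcome the same assumed outcome."""
    assumed_events = missing if missing_had_event else 0
    return Fraction(events + assumed_events, observed + missing)


def sensitivity(exp, ctl):
    """Risk differences (experimental minus control) under the two extreme scenarios.

    `exp` and `ctl` are (events, participants with observed outcome, participants with missing outcome).
    """
    best_worst = risk(*exp, False) - risk(*ctl, True)    # missing: no event if experimental, event if control
    worst_best = risk(*exp, True) - risk(*ctl, False)    # the reverse assumption
    return best_worst, worst_best


bw, wb = sensitivity(exp=(20, 90, 10), ctl=(30, 92, 8))
print(float(bw), float(wb))  # -0.18 0.0
```

If the two extreme scenarios lead to different conclusions, as in this made-up example (a clear benefit in one, no difference in the other), the missing data could plausibly change the trial's conclusion.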
- <Journal club discussion>
■Analyze examples of MCAR, MAR, and MNAR.
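Many imputation methods assume MAR. Statistical tests (e.g., a t-test comparing other variables between rows with and without missing values) and/or visualizations (e.g., aggregation plots, mosaic plots) can be used to check this; strictly, such checks can provide evidence against MCAR, but MAR cannot be distinguished from MNAR using the observed data alone. Multiple imputation of a simple dataset with three variables (age, income, and gender) by chained equations proceeds in five steps: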
• Step 1: impute each variable with a simple placeholder value (e.g., its mean).
• Step 2: set the imputed values of age back to missing.
• Step 3: regress age on income and gender (e.g., linear regression) using the rows where age is observed.
• Step 4: use the model from Step 3 to predict the missing values of age.
• Step 5: repeat Steps 2-4 for income and gender, separately.
• Repeat Steps 2-5 for several cycles (usually 5 is enough). Running the whole procedure several times yields the multiple imputed datasets.
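The steps above can be sketched in pure Python as a chained-equations (MICE-style) procedure for the three-variable example. Several simplifications are assumptions of this sketch rather than the paper's method. Linear regression is used for every variable, including a linear probability model for the binary `gender`. Imputations are drawn as the prediction plus normally distributed residual noise, whereas full implementations (e.g., the R package mice) also reflect uncertainty in the regression coefficients or use predictive mean matching. The data are simulated with values deleted completely at random. Running the procedure with different random seeds gives the multiple imputed datasets, whose estimates are then combined with Rubin's rules. All function names are hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_predict(x_obs, y_obs, x_mis):
    """OLS of y on x plus an intercept; returns predictions for x_mis and the residual SD."""
    X = [[1.0, *row] for row in x_obs]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, y_obs)) for i in range(k)]
    beta = solve(xtx, xty)

    def predict(row):
        return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

    sd = statistics.pstdev([y - predict(row) for row, y in zip(x_obs, y_obs)])
    return [predict(row) for row in x_mis], sd


def impute_once(data, cycles=5, seed=0):
    """One chained-equations run; `data` maps variable name -> list with None for missing."""
    rng = random.Random(seed)
    names = list(data)
    n = len(data[names[0]])
    miss = {v: [i for i in range(n) if data[v][i] is None] for v in names}
    # Step 1: placeholder imputation with the observed mean (rounded to 0/1 for "gender").
    filled = {}
    for v in names:
        mean = statistics.mean(x for x in data[v] if x is not None)
        fill = round(mean) if v == "gender" else mean
        filled[v] = [fill if x is None else x for x in data[v]]
    for _ in range(cycles):                       # repeat Steps 2-5 for several cycles
        for v in names:                           # Step 5: visit each variable in turn
            if not miss[v]:
                continue
            others = [o for o in names if o != v]
            rows = [[filled[o][i] for o in others] for i in range(n)]
            # Steps 2-3: ignore v's current imputations and fit on rows where v is observed.
            obs = [i for i in range(n) if data[v][i] is not None]
            preds, sd = fit_predict([rows[i] for i in obs], [data[v][i] for i in obs],
                                    [rows[i] for i in miss[v]])
            # Step 4: replace the missing values of v with random draws around the predictions.
            for i, mu in zip(miss[v], preds):
                if v == "gender":                 # linear probability model, clamped to [0, 1]
                    filled[v][i] = int(rng.random() < min(max(mu, 0.0), 1.0))
                else:
                    filled[v][i] = rng.gauss(mu, sd)
    return filled


def simulate(n=300, seed=42):
    """Hypothetical data: income depends on age and gender; values deleted completely at random."""
    rng = random.Random(seed)
    data = {"age": [], "income": [], "gender": []}
    for _ in range(n):
        g = rng.randint(0, 1)
        a = rng.uniform(20, 70)
        inc = 1000 + 40 * a + 300 * g + rng.gauss(0, 300)
        data["age"].append(None if rng.random() < 0.2 else a)
        data["income"].append(None if rng.random() < 0.2 else inc)
        data["gender"].append(None if rng.random() < 0.1 else g)
    return data


data = simulate()
m = 5
imputed = [impute_once(data, seed=s) for s in range(m)]      # m completed datasets
# Rubin's rules for the mean age: average the estimates; total variance = W + (1 + 1/m) * B.
estimates = [statistics.mean(d["age"]) for d in imputed]
within = statistics.mean(statistics.variance(d["age"]) / len(d["age"]) for d in imputed)
between = statistics.variance(estimates)
total_var = within + (1 + 1 / m) * between
print(round(statistics.mean(estimates), 1), round(total_var ** 0.5, 2))
```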