Introduction
Uncertainty is a parameter associated with the result of a measurement that characterizes the dispersion of the values that could reasonably be attributed to the measurand (1). By quantifying the variation in the results, both the clinical laboratory performing the measurements and the physician receiving the results can have an objective estimate of the quality of the results (2).
Measurement uncertainty matters in laboratory medicine to define test suitability, to verify the quality of in vitro diagnostic products, to provide evidence of unpredictable bias and to demonstrate the clinical suitability of a test (3). In addition, clinical laboratories seeking accreditation under the International Organisation for Standardisation (ISO) 15189 standard shall determine the measurement uncertainty (MU) for each measurement procedure, and define and regularly check their performance requirements concerning uncertainty (4). ISO 15189 does not suggest any particular approach for determining MU (5). It states, “The laboratory shall determine measurement uncertainty for each measurement procedure…”, thus allowing significant flexibility in how to determine it (4, 6).
The traditional methods for estimating the MU are described in the Guide to the expression of uncertainty in measurement (GUM). The main problems with the GUM approach for medical laboratory personnel are its reliance on complex statistical procedures and the fact that some error sources require derivative functions, which are not always estimable (5).
The “bottom-up” approach aims to estimate the individual contribution of every step of the process to the overall uncertainty (7). Briefly, it is based on all conceivable sources of uncertainty that must be systematically evaluated, and demands a clear description of what is being measured, including the relationship between the quantity and the parameters upon which it depends. Then the identified uncertainties are combined to generate a combined uncertainty of the result using statistical propagation rules (8). This exhaustive approach can be time-consuming to apply to laboratory medicine tests in terms of designing and performing experiments to provide additional data for the estimation. It does, however, enable the analyst to identify critical stages in the method and is useful for method optimization or troubleshooting during development (7).
The “top-down” approach directly estimates the measurement uncertainty, typically by evaluating quality control (QC) data or method verification experiment data (9). The “top-down” approach is more practical and cost-effective, and it can be updated as further data become available through results from routine internal quality control (IQC) and proficiency tests (PT). More importantly, no statistically significant differences have been found between the uncertainty values obtained by either approach (7, 9). However, studies with practical and detailed examples of both approaches are hard to find (9).
With the “top-down” approach, the MU should include both the imprecision and the bias component if the latter is considered significant. Uncertainties arising from random and systematic effects are treated alike. Through the application of the uncertainty propagation principles, the uncertainty contributions are then summed up to yield the so-called combined standard uncertainty (10-13). Two approaches can be used to assess bias: it can be based on certified reference materials or on the results from quality control material procedures, such as PT (14). Recently, another kind of inter-laboratory scheme data was proposed to estimate the uncertainty related to the bias: the inter-laboratory internal quality control scheme (IQCS) (8).
The principal guidelines from various bodies (Nordtest, Eurolab and Cofrac) propose different approaches for calculating measurement uncertainty. The Nordtest Handbook for calculation of measurement uncertainty in environmental laboratories calculates measurement uncertainty on the basis of the within-laboratory reproducibility and the uncertainty of laboratory bias, which is estimated from certified reference material, inter-laboratory comparisons, or recoveries (15, 16). The European Federation of National Associations of Measurement, Testing, and Analytical Laboratories (Eurolab) bases the measurement uncertainty calculation on the dispersion of the relative difference of the results given by a laboratory in different PT schemes (11, 16). The French accreditation body (Cofrac) suggests a different method based on combined data from IQC and calibration uncertainty (17, 18).
Despite the large amount of data available on MU, information about the practicality of the formulas in laboratory routines is scarce. In this context, it is important to assess their feasibility with the aim of being able to select one that will be reliable and adequate for each laboratory method.
As the uncertainties of the preanalytical phase are not established enough for laboratory medicine tests to apply the “bottom-up” approach, we selected some clinical chemistry tests to serve as examples to compare different practical “top-down” approaches for estimating MU, considering the imprecision and bias of the methods as components with similar statistical properties. The purpose of this study is to compare three different top-down approaches for the estimation of the MU and to suggest which of these approaches could be the most suitable choice for routine use in clinical laboratories.
Materials and methods
This retrospective diagnostic accuracy study was conducted at the Institute of Clinical Chemistry and Biochemistry of the University Medical Centre Ljubljana, Ljubljana, Slovenia.
The MU was established for the following four laboratory tests to evaluate and compare different manners of estimation: creatinine, alkaline phosphatase (ALP) (Advia 1800, Siemens, Tarrytown, USA), testosterone (Cobas e 411, Roche Diagnostics, Mannheim, Germany), and cancer antigen (CA) 19-9 (Architect i1000, Abbott Diagnostics, Abbott Park, USA). These tests are representative of the general biochemistry, enzyme, hormone and tumour marker test groups, respectively.
The allowable total error and the permissible uncertainty were considered to represent the uncertainty target (5, 19).
The permissible uncertainty estimate was based on a non-linear relationship between biological and analytical variation, as proposed by Haeckel et al. (19). The calculation steps were performed using the reference limits (RL) for the adult males in our laboratory (creatinine 44-97 µmol/L, ALP < 128 U/L, CA 19-9 < 37 kU/L, testosterone 8.8-30.6 nmol/L). In cases where a lower RL was unknown, it was set at 15% of the upper RL (19).
The well-established allowable total error was estimated according to Westgard, considering the desirable goals based on biological variation with a 95% confidence level (20-22).
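As an illustration of this step, the sketch below uses the widely cited desirable specifications derived from biological variation: allowable imprecision of 0.5·CVI, allowable bias of 0.25·√(CVI² + CVG²), and an allowable total error equal to 1.65 times the allowable imprecision plus the allowable bias, where 1.65 is the one-sided 95% z-value. The function name and the CVI and CVG inputs are hypothetical and are not the values used in this study.

```python
import math

def desirable_allowable_total_error(cv_i: float, cv_g: float) -> float:
    """Desirable allowable total error (%) from biological variation.

    cv_i: within-subject biological variation (%)
    cv_g: between-subject biological variation (%)
    """
    allowable_imprecision = 0.5 * cv_i
    allowable_bias = 0.25 * math.sqrt(cv_i ** 2 + cv_g ** 2)
    # 1.65 is the one-sided z-value for a 95% confidence level
    return 1.65 * allowable_imprecision + allowable_bias

# Hypothetical inputs, for illustration only (not data from this study)
tea = desirable_allowable_total_error(cv_i=6.0, cv_g=15.0)
print(f"Allowable total error: {tea:.2f} %")
```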
Imprecision and bias uncertainty were considered components of MU, and they are represented as the square roots of the variances of their respective estimators. They were then combined to produce a combined uncertainty estimate. The uncertainty estimates were calculated and presented as relative uncertainties (%), which permits their comparison and application over a range of values.
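The combination of the two components can be sketched as follows. This assumes the usual root-sum-of-squares propagation for independent relative uncertainties and a coverage factor k = 2 for the expanded uncertainty (approximately 95% coverage); the function names and input values are hypothetical.

```python
import math

def combined_uncertainty(cv_wl: float, u_bias: float) -> float:
    """Combine imprecision and bias uncertainty (both in %) in quadrature."""
    return math.sqrt(cv_wl ** 2 + u_bias ** 2)

def expanded_uncertainty(u_c: float, k: float = 2.0) -> float:
    """Expanded uncertainty; k = 2 is an assumed coverage factor."""
    return k * u_c

# Hypothetical inputs, for illustration only
u_c = combined_uncertainty(cv_wl=3.0, u_bias=4.0)
U = expanded_uncertainty(u_c)
print(f"u_c = {u_c:.1f} %, U (k = 2) = {U:.1f} %")
```

Working in relative units (%) throughout keeps the two components directly comparable before they are combined.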
Imprecision (CVWL)
Internal quality control data were collected from January 2016 to July 2018. The control materials for ALP and creatinine were from Biorad (Hercules, USA), for CA 19-9 from Technopath (Tipperary, Ireland) and for testosterone from Roche Diagnostics (Mannheim, Germany). The intermediate precision was determined as the long term, within-laboratory coefficient of variation for each concentration level for at least 235 IQC results. The arithmetic average of the within-laboratory coefficient of variation found for each level was taken as the imprecision (CVWL).
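A minimal sketch of this calculation is given below, assuming that the CV of each level is the sample standard deviation divided by the mean. The level names and control results are hypothetical toy data; the study itself used at least 235 IQC results per level.

```python
from statistics import mean, stdev

def cv_percent(results):
    """Coefficient of variation (%) of one control level."""
    return stdev(results) / mean(results) * 100.0

def within_lab_cv(results_by_level):
    """Arithmetic average of the per-level CVs, taken as CVWL."""
    return mean([cv_percent(r) for r in results_by_level.values()])

# Hypothetical IQC results for two control levels
iqc = {
    "level_1": [98.0, 100.0, 102.0],
    "level_2": [190.0, 200.0, 210.0],
}
print(f"CVWL = {within_lab_cv(iqc):.2f} %")
```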
Bias
The bias calculation was performed from three data sources: certified reference calibrators (CRC), PT, and inter-laboratory internal quality control scheme (IQCS) (8, 23).
Bias from certified reference calibrators
Commercial calibrators different from those used for calibration of the test, were used as CRCs. According to the manufacturers, calibrator’s assigned values and respective expanded (k = 2) uncertainties were: ALP_2c calibrator traceable to IFCC (530 ± 4 U/L) and CrRE_2c calibrator traceable to IDMS Reference Method/NIST SRM 967 (724.9 ± 11.6 µmol/L) from Siemens (USA); CA 19-9 calibrators traceable to International Reference Standard (30 ± 0.51 kU/L, 250 ± 2.04 kU/L, 1200 ± 19.86 kU/L) from Abbott (USA), testosterone calibrators traceable to ID-GC/MS (1.32 ± 0.04 nmol/L, 43.03 ± 1.31 nmol/L) from Roche (Germany).
Measurements of one (creatinine and ALP), two (testosterone), or three (CA 19-9) CRCs were performed in 5 to 17 different analytical series, and the results were used to estimate the bias through equation (Eq.) 1 (15).
When more than one CRC was used, the root mean square of the individual bias values (RMSbias) was calculated according to Eq. 2 (15), where n is the number of CRCs used.
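Eqs. 1 and 2 are not reproduced in this text. The sketch below assumes the definitions given in the Nordtest handbook: the relative bias is the difference between the mean measured value and the assigned value, expressed as a percentage of the assigned value, and RMSbias is the square root of the mean of the squared individual biases. The function names and numbers are hypothetical.

```python
import math
from statistics import mean

def relative_bias(measured, assigned):
    """Relative bias (%) of the mean measured value against the assigned value."""
    return (mean(measured) - assigned) / assigned * 100.0

def rms_bias(biases):
    """Root mean square of n individual bias values (%)."""
    return math.sqrt(sum(b ** 2 for b in biases) / len(biases))

# Hypothetical example: one calibrator measured in three series
print(relative_bias([101.0, 103.0, 102.0], assigned=100.0))
# Hypothetical biases (%) from two calibrators, PT rounds or IQCS rounds
print(rms_bias([3.0, -4.0]))
```

Squaring each bias before averaging prevents positive and negative biases from cancelling out, which is why RMSbias rather than the mean bias is used when several materials or rounds are available.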
Bias from proficiency tests
The external quality control data were collected from February 2013 to May 2018. Data for ALP, creatinine and CA 19-9 were obtained from the Instand (Düsseldorf, Germany) and data for testosterone from the Labquality (Helsinki, Finland) proficiency testing schemes.
The bias from PT was calculated using only satisfactory PT results, and results not complying with established PT criteria (i.e., results with z-scores >2 or <−2) were discarded. For each analyte, the results from 13 PT rounds were used for the calculation, with each one involving at least 10 participating laboratories. The bias from each round was calculated by considering the target value obtained from the peer group as the expected value (via Eq. 1), while the RMSbias was calculated using Eq. 2, where n is the number of PT rounds in this case.
Bias from IQCS (Unity program)
The bias was calculated using six IQCS peer-comparison rounds for the creatinine and ALP analytes, each one including at least 49 participating laboratories. The bias from each peer-comparison round was calculated by considering the average value of the peer group as the expected value (again with Eq. 1), while the RMSbias was calculated according to Eq. 2, where n is now the number of IQCS peer-comparison rounds.
Bias uncertainty (b)
Estimating the bias uncertainty, u(Bias), was achieved through three formulas:
The Nordtest approach using the bias from PT, CRC, and IQCS (15).
The Eurolab approach using the bias from CRC and PT (11). Bias uncertainty was not calculated using IQCS data because the required duplicate results were not available.
The Cofrac approach using the bias from PT and IQCS (17). Bias uncertainty was not calculated using CRC data because the Cofrac MU formula does not include it in the uncertainty estimate.
Nordtest approach
The uncertainty of proficiency test, u(PT), for each analyte was calculated through Eq. 3.
where CVPT is the CV of each PT round and nlab is the number of participating labs in each round, while n is the number of PT rounds. The u(BiasPT) was calculated as shown in Eq. 4.
The uncertainty of CRC, u(CRC), for each analyte was calculated as shown in Eq. 5:
where CVCRC is the CV among the CRC measurements for each analytical series, nm is the number of CRC measurements, and n is the number of CRCs used. The u(BiasCRC) was calculated as shown in Eq. 6:
where u(Calman) is the calibrator expanded uncertainty provided by the manufacturer divided by the coverage factor k, which was considered to be 2 in our study (11, 15). The uncertainty of IQCS, u(IQCS), was calculated as shown in Eq. 7:
where CVPCIQCS is the CV of each monthly peer-comparison bias from the IQCS report, nlab is the number of participating laboratories in the IQCS peer-comparison reports, and n is the number of IQCS peer-comparisons. The u(BiasPCIQCS) was calculated as shown in Eq. 8.
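Eqs. 3 to 8 are not reproduced in this text. The sketch below assumes the forms used in the Nordtest handbook: the uncertainty of the assigned value is the average over rounds of CV/√n, and the bias uncertainty is the quadrature sum of RMSbias and that uncertainty. Treating the manufacturer's calibrator uncertainty as a further quadrature term for CRC data is an assumption based on the variable definitions above. All names and values are hypothetical.

```python
import math

def mean_reference_uncertainty(cvs, counts):
    """Average of CV / sqrt(count) over rounds or materials (%).

    For PT and IQCS data, counts are the numbers of participating laboratories;
    for CRC data, they are the numbers of CRC measurements.
    """
    terms = [cv / math.sqrt(n) for cv, n in zip(cvs, counts)]
    return sum(terms) / len(terms)

def nordtest_u_bias(rms_bias, u_ref, u_cal=0.0):
    """Bias uncertainty (%) as a quadrature sum.

    u_cal is the manufacturer's expanded calibrator uncertainty divided by k;
    it is left at 0 for PT and IQCS data (assumed reading of Eq. 6).
    """
    return math.sqrt(rms_bias ** 2 + u_ref ** 2 + u_cal ** 2)

# Hypothetical PT data: CV (%) and number of laboratories in two rounds
u_pt = mean_reference_uncertainty(cvs=[4.0, 6.0], counts=[16, 36])
print(nordtest_u_bias(rms_bias=3.0, u_ref=u_pt))
```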
Eurolab approach
According to the Eurolab Technical Report, the bias contribution to measurement uncertainty is obtained from the mean deviation, the uncertainty of the target value, and the imprecision of the mean value of the replicated measurements performed in the bias investigation, as shown in Eq. 9 (5):
where CVrepPT is the CV among the replicated PT measurements and nrepPT is the number of replications. This formula can also be applied using u(PCIQCS), the CV among the replications of IQCS measurements (CVrepIQCS), and the number of replications for the IQC (nrepIQCS) instead of u(PT), CVrepPT, and nrepPT, respectively. Eurolab and Nordtest have the same approach for calculating the uncertainty of the CRC bias (11, 15).
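Eq. 9 is not reproduced in this text. One plausible reading of the description above, assuming the three contributions are independent and combined in quadrature, is sketched below. The function name, the use of a single bias value for the mean deviation, and the input values are assumptions made for illustration.

```python
import math

def eurolab_u_bias(bias, u_target, cv_rep, n_rep):
    """Bias uncertainty (%) from the mean deviation, the target value
    uncertainty and the imprecision of the mean of n_rep replicates."""
    u_mean = cv_rep / math.sqrt(n_rep)  # imprecision of the replicate mean
    return math.sqrt(bias ** 2 + u_target ** 2 + u_mean ** 2)

# Hypothetical inputs, for illustration only
print(eurolab_u_bias(bias=2.0, u_target=1.0, cv_rep=4.0, n_rep=4))
```

The third term is what makes replicated measurements necessary: without replicates of the PT or IQCS material, CVrep cannot be estimated.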
Cofrac approach
According to the Cofrac SH GTA 14 document, the uncertainty can be evaluated based on the external evaluations’ uncertainty, which is obtained from the deviation of the uniform distribution law (divide the half-range by the square root of 3) and the coefficient of variation for the bias, as shown in Eq. 10 (17).
where CVBias is the CV of the averaged biases from different PT rounds, IQCS comparisons, or CRC materials. The estimate of uncertainty using CRCs considers just the calibrator uncertainty provided by the manufacturer. As the bias component is not considered in this uncertainty estimate, this approach was not applied in our study.
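Eq. 10 is not reproduced in this text, so only the uniform distribution step mentioned above is sketched here: a quantity assumed to lie anywhere within ± a half-range with equal probability has a standard uncertainty equal to that half-range divided by √3. The function name and the input value are hypothetical.

```python
import math

def rectangular_standard_uncertainty(half_range: float) -> float:
    """Standard uncertainty of a uniform (rectangular) distribution."""
    return half_range / math.sqrt(3)

# Hypothetical half-range of 3 %, for illustration only
print(f"{rectangular_standard_uncertainty(3.0):.2f} %")
```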
Results
The results for the bias, bias uncertainty, and imprecision for the selected parameters are presented in Table 1.
Table 1
Table 2 presents the MUs obtained through Nordtest, Eurolab, and Cofrac formulas, which incorporate biases achieved through different means. The same table also shows the permissible uncertainty, the analytical total error, and the allowable total error for each test.
Table 2
Discussion
There is no agreement on how to estimate the MU in the clinical laboratory; therefore, laboratory professionals have to decide what data and which formula should be used. Our study showed that calculations using PT data generally gave higher values of bias, bias uncertainty, and expanded uncertainty than those using IQCS or CRC data. Based on our results, it was also clear that the choice of formula greatly influenced the bias uncertainty and, consequently, the expanded uncertainty. The Cofrac formula gave higher MU results than the Eurolab and Nordtest formulas with the use of either PT or IQCS data, as previously reported (18). The reason for such a result might be the use of the CV of the averaged biases of different rounds of PT or IQCS rather than their uncertainty.
Herein, the MU was not calculated by the IQCS Eurolab approach because it requires a coefficient of variation for replicated control material results, and it has not been common laboratory practice to analyse control material in duplicate. We therefore consider this a disadvantage of the Eurolab approach. Additionally, the MU was not calculated by the CRC Cofrac approach, since the formula does not include the bias uncertainty when using CRC data. In this context, all results of the present study were based on the imprecision and the bias components of the analytical methods.
Our results showed that bias and its uncertainty contributed similarly to the overall MU, partly corroborating the study of Padoan et al. (24). However, it has been proposed that the imprecision component of uncertainty includes some bias effects and exerts a greater influence on the expanded uncertainty than the bias (19, 23). Other authors even suggest that harmonization of methods could minimize the bias and, essentially, make both total error and MU a result of imprecision (5, 6). For clinical purposes, it has been suggested that the appropriate choice for estimating the MU of laboratory results is influenced by the intended use (25). For example, when comparing two consecutive results of the same patient over a short time interval, the imprecision of the analytical method is considered the most relevant MU component, whereas when comparing results with a decision limit or a reference interval, the bias predominates (23, 25). However, it should be noted that it is not suitable to have different uncertainties for a single parameter used in two situations, which strengthens the importance of using both components, bias and imprecision, to calculate the MU.
How to calculate the bias component of uncertainty is still a matter of debate. The observed differences between the bias results depended on the source of data (PT, IQCS or CRC) used for calculation. However, calculations using the PT and IQCS data do not take into account the uncertainty accumulated in the upper levels of the metrological traceability chain (23). Moreover, using data from PT to calculate the bias incurs the risk of overestimating the bias component, probably because some random variation is present in the PT results, since tests are usually performed only a few times per year (23). In our study, the highest bias was obtained using PT data, which is in agreement with the results reported by Rigo-Bonnin et al. This may partly reflect that the CRC values were measured values, while the PT values were assigned from the participants’ results (8). On the other hand, Ceriotti reported higher bias values using IQCS data (23).
However, PT and IQCS data are considered advantageous because they use control materials similar to routine samples (11, 18). Additionally, the IQCS management programs, which perform peer-comparisons of IQC results, can provide the calculations automatically and efficiently, which is why this was considered here the most practical method. Taking into account that IQCS analyses are performed more frequently, the bias calculated using IQCS data can also be considered more reliable than the bias obtained from different rounds of a PT. Furthermore, compared to CRC data, the use of IQCS more closely represents the variability of analytical conditions in the laboratory.
Regarding the bias from CRCs, Theodorou et al. considered an uncertainty of 2.7% for a certified reference material sufficiently small to be ignored in the expanded uncertainty (10). In our study, the uncertainty of the CRCs was lower than 2.7% and, consequently, produced the lowest MUs. The Eurolab and Nordtest approaches gave identical results, which was not surprising as they use the same formula to calculate uncertainty using CRC data. It is also important to note that the calibrators’ uncertainties have not been published in their package inserts. Therefore, we had to obtain these values from the manufacturers, which is not always easy (23).
In addition to different calculations, the interpretation of MU can also be performed using different specifications of analytical quality, e.g., permissible uncertainty, allowable total error (TE), or the limits set by CLIA, RiliBäk, and other PT providers (26). The permissible uncertainty limits related to biological variation have often been preferred because of their scientific basis; they can be applied to all methods and seem to be stricter than the allowable TE (5, 19, 27). According to Qin et al., a relatively high percentage of laboratories may not be able to remain within the permissible limits for immunoassays (28). This prediction held in our study: our results indicated that, for immunoassays, it is not easy to meet both the permissible uncertainty and the allowable TE.
In summary, the Cofrac approach tended to overestimate the MU, while the Eurolab approach required additional measurements (duplicates) to obtain uncertainty data. Based on our study, the Nordtest approach using bias from the IQCS can be considered the most practical approach for estimating the MU.