## Introduction

The famous scientist Galileo Galilei, who is known as the father of modern physics, once said “Doubt is the father of invention” (*1*). He emphasized that doubt was the most important factor driving the development of science. Any subject that is doubted, and therefore investigated, is more open to development and progress. The word uncertainty means doubt. In its broadest definition, measurement uncertainty means doubt about the validity of the measurement results (*2*).

Today, the concepts of doubt and uncertainty are becoming increasingly prominent in economics, physics, chemistry and many other fields, because every measurement result carries an “uncertainty” arising from its own nature (*3*-*5*).

Measurements made in clinical laboratories play an important role in the diagnosis, treatment and follow-up of diseases. However, it is known that when any test is repeated, even if all conditions are optimized, the probability of obtaining exactly the same result is very low. In other words, the results of chemical reactions are actually members of a distribution. It is therefore appropriate to consider the measurement uncertainty of results.

According to the definition of the International Vocabulary of Metrology, measurement uncertainty (MU) is a non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used (*6*). In simple terms, it is a quantitative indicator of the range in which the “real” result may lie.

In laboratory measurements, there are many variables related to sampling, transport and storage, preparation of reagents, maintenance of devices, measurement method, *etc.* In order to obtain reliable results, the sources of variation should be identified and minimized (*7*). The International Organization for Standardization (ISO) 15189 standard, which was prepared for the standardization of medical laboratories, requires laboratories to have a defined method for establishing the MU of tests, determining performance criteria, and reviewing MU data regularly (*8*).

Two general approaches to MU calculation can be distinguished: bottom-up and top-down (*9*). The top-down approach prioritizes the use of quality control results, which requires random and systematic errors to be quantified. In the mathematical expression of these errors, internal quality control data are used for random errors, and external quality assessment (EQA) data can be used for systematic errors. This makes the approach very easy to apply in clinical laboratories. In the bottom-up approach, all uncertainty components are rigorously identified: each of the above-mentioned variables is examined individually and included in the calculation in proportion to its contribution to uncertainty. That process is considered laborious for clinical laboratories. Moreover, such complex calculations may not be necessary, given that the effects of all uncertainty components are reflected in the results of the internal control sample, which is analysed daily just like a patient sample. Because a study that calculated uncertainty with both approaches found similar results, the top-down approach was recommended as the simpler method (*10*).

There is no restriction on which method to use. Unfortunately, there is also no internationally recognized guide against which to judge whether the results are appropriate. Therefore, it is important to publish the calculated MU of tests so that laboratories can compare their results.

A literature search found no study that used the same or a similar method for drug concentration tests. In this respect, we believe that our study will shed light on future research. The purpose of this study is to present an example of calculating MU that is practical to use in clinical routine and to discuss how the results can be evaluated.

## Materials and methods

In our study, we planned to calculate the MU of immunoassay tests such as fertility hormones, drug concentration tests, cardiac markers, thyroid function tests and tumour markers using the internal quality control results and EQA data of our laboratory between January 01, 2018 and December 31, 2018. Permission was obtained from the Ethics Committee of Aydın Adnan Menderes University Faculty of Medicine Non-Invasive Clinical Research with protocol number 2020/09. Table 1 shows the tests included in the study and their measurement methods with ranges.

##### Table 1

Drug concentration tests were performed on the Abbott Architect c8000 (Abbott Diagnostics, Abbott Park, USA), and the remaining hormones and biomarkers were measured on the Abbott Architect i2000 SR hormone analyser using the company’s original kits and calibrators. Internal quality control was run daily and external quality control once a month. The EQA goal of our laboratory is to prevent the z-score from rising above 3 and to keep it below 2 as far as possible. The z-score is a measure of the laboratory’s bias relative to the comparator group. A z-score ≤ 2 is acceptable; a z-score between 2 and 3 prompts a warning and a recommended investigation; a z-score ≥ 3 is unacceptable and usually requires remedial action. The EQA data of every parameter were compared with its peer group, except for drug concentration tests. Because the number of laboratories in the equivalent group was low (< 20), the calculations for drugs were made according to the mode group or overall results.
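The z-score bands described above can be expressed as a small classifier. This is a minimal sketch rather than part of the laboratory's EQA software: the function name `classify_z_score` is hypothetical, and applying the bands to the absolute value of the z-score is an assumption, since the text does not state how negative z-scores are treated.

```python
def classify_z_score(z: float) -> str:
    """Classify an EQA z-score using the bands given in the text."""
    # Assumption: the bands apply to the magnitude of the z-score,
    # so a negative bias is judged the same way as a positive one.
    magnitude = abs(z)
    if magnitude <= 2:
        return "acceptable"
    if magnitude < 3:
        return "warning"  # warning and investigation recommended
    return "unacceptable"  # remedial action usually required


# Hypothetical z-scores, chosen only to exercise each band.
for z in (0.8, -2.4, 2.99, 3.0):
    print(z, classify_z_score(z))
```

Note that a value such as 2.99 still falls in the warning band, even though it sits just below the unacceptable threshold.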

## Calculation steps

The top-down approach was chosen. For this purpose the steps of the Nordtest guide were followed (*11*):

Specify measurand, range, and target MU.

Quantify standard uncertainty component for the within-laboratory reproducibility (u(Rw)).

Quantify the uncertainty component associated with method and laboratory bias (u(bias)).

Convert components to standard uncertainty (u(x)).

Calculate combined standard uncertainty (u_{c}).

Calculate expanded uncertainty (U = k x u_{c}, where k = 2 (95% confidence interval)).

The standard uncertainty component for the within-laboratory reproducibility (u(Rw)) is derived from internal control sample results that are collected over a period long enough to cover all worst-case scenarios and that are analysed just like a patient sample. The number of results should ideally be more than 60 and cover a period of at least one year, so as to reflect all variations such as different stock solutions, new batches of critical reagents and recalibrations of equipment (*11*).

Internal quality control results were examined separately according to the lots of each level. The coefficient of variation (CV%) is the standard deviation (SD) expressed as a percentage of the mean of the internal control sample results (CV% = (SD/Mean) x 100). The CV is useful because it is independent of concentration. The standard uncertainty component for the within-laboratory reproducibility is then calculated as u(Rw) = Rw/2, where Rw = (Σ(CV%)^{2}/N)^{1/2} and N is the number of calculated CV% values.
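The u(Rw) calculation can be sketched in a few lines of Python. The function names and the control values below are hypothetical and serve only as an illustration. The use of the sample SD (n - 1 denominator) is an assumption, because the text does not say which SD estimator was used. The division by 2 follows the formula as it is stated in the text.

```python
import math
import statistics


def cv_percent(results):
    """CV% = (SD / Mean) x 100 for one control level and lot."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(results) / statistics.mean(results) * 100


def within_lab_reproducibility(cv_values):
    """Rw = (sum(CV%^2) / N)^(1/2), where N is the number of CV% values."""
    return math.sqrt(sum(cv ** 2 for cv in cv_values) / len(cv_values))


def u_rw(cv_values):
    """u(Rw) = Rw / 2, as given in the text."""
    return within_lab_reproducibility(cv_values) / 2


# Hypothetical control results for two lots of one control level.
lot_a = [9.6, 10.1, 10.4, 9.9, 10.0]
lot_b = [19.2, 20.5, 20.1, 19.8, 20.4]

cvs = [cv_percent(lot_a), cv_percent(lot_b)]
print("CV% per lot:", [round(cv, 2) for cv in cvs])
print("Rw:", round(within_lab_reproducibility(cvs), 2))
print("u(Rw):", round(u_rw(cvs), 2))
```

Pooling the CV% values as a root mean square, rather than averaging them, gives more weight to the lots with the largest variation.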

For method and laboratory bias (u(bias)), two components have to be estimated: 1) the root mean square of the individual bias values (RMS_{bias}), and 2) the mean of the standard uncertainty of the assigned values (u(Cref)). There are three ways to estimate the u(bias) component: use of a certified reference material, participation in EQA, and recovery tests. Use of EQA was the most practical and cost-effective way. However, for reference materials a mean value over time is used, whereas for each EQA round a single laboratory result is used. Therefore, the RMS_{bias} estimated from EQA will usually be higher. In order to have a reasonably clear picture of the bias from EQA results, a laboratory should participate at least 6 times within a reasonable time interval (*11*). The calculations are as follows: RMS_{bias} = (Σ(bias)^{2}/N)^{1/2}, where N = number of EQA rounds; u(Cref) = Σ(CV_{EQA}/N_{Lab}^{1/2})^{2}/N, where N_{Lab} = number of participating laboratories in each round and CV_{EQA} is the CV% of each EQA round.
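As an illustration of the RMS_{bias} formula, the following sketch computes the root mean square of the bias values from a series of EQA rounds. The function name, the `min_rounds` parameter and the bias values are hypothetical. Raising an error below six rounds is a design choice made here to reflect the recommendation quoted above; it is not a rule taken from the guide.

```python
import math


def rms_bias(bias_values, min_rounds=6):
    """RMS_bias = (sum(bias^2) / N)^(1/2), where N is the number of EQA rounds."""
    n = len(bias_values)
    if n < min_rounds:
        # Assumption: treat too few rounds as an error instead of
        # silently returning an unreliable estimate.
        raise ValueError(f"need at least {min_rounds} EQA rounds, got {n}")
    return math.sqrt(sum(b ** 2 for b in bias_values) / n)


# Hypothetical percentage bias values from six EQA rounds.
biases = [2.0, -3.0, 6.0, -1.0, 4.0, -2.0]
print("RMS_bias:", round(rms_bias(biases), 2))
```

Because each bias is squared, positive and negative deviations do not cancel, and a single large deviation has a disproportionate effect on the result.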

The standard uncertainty (u(x)) of the bias component is calculated as follows: u(bias) = (RMS_{bias}^{2} + u(Cref)^{2})^{1/2}. The combined standard uncertainty is u_{c} = (u(Rw)^{2} + u(bias)^{2})^{1/2}, and the expanded uncertainty is U = 2 x u_{c}.
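The combination of the components can be sketched as follows. The function names and the input values are hypothetical; the inputs were chosen so that the arithmetic is easy to check by hand and do not come from the study data.

```python
import math


def u_bias(rms_bias, u_cref):
    """u(bias) = (RMS_bias^2 + u(Cref)^2)^(1/2)."""
    return math.hypot(rms_bias, u_cref)


def combined_uncertainty(u_rw, u_bias_value):
    """u_c = (u(Rw)^2 + u(bias)^2)^(1/2)."""
    return math.hypot(u_rw, u_bias_value)


def expanded_uncertainty(u_c, k=2.0):
    """U = k x u_c, with k = 2 for an approximately 95% confidence interval."""
    return k * u_c


# Hypothetical component values, all in percent.
ub = u_bias(rms_bias=4.0, u_cref=3.0)
uc = combined_uncertainty(u_rw=12.0, u_bias_value=ub)
print("u(bias):", ub)
print("u_c:", uc)
print("U:", expanded_uncertainty(uc))
```

With these inputs, u(bias) is 5, u_{c} is 13 and U is 26 (all in percent). Because the components are added in quadrature, the largest component dominates the combined value.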

Since there are no universally accepted limits for the measurement uncertainty of tests, the results were compared with allowable total error and biological variation data, as in many other studies (*12*-*14*).

## Results

To illustrate these calculations, a worked example for one parameter is presented. Table 2 shows the calculations for the u(Rw) component of phenytoin (PHNY). Table 3 presents the one-year EQA results for PHNY. The CV_{EQA} and bias values in that table are provided by the EQA program server.

##### Table 2

##### Table 3

The calculation steps for all parameters according to the Nordtest approach are shown in Table 4. For each test, the u(bias) value was higher than the u(Rw) value. The u(Cref) component contributed the least to the MU. Prolactin had the lowest MU (8.3%) and CA 19-9 had the highest MU (28%).

##### Table 4

Table 5 presents the calculated MU results of the tests and the allowable performance limits of various guidelines (*15*-*20*). According to that table, 12 of the 26 parameters (follicle stimulating hormone (FSH), luteinizing hormone (LH), progesterone (PROG), prolactin (PROL), testosterone (TEST), creatine kinase myocardial band (CK-MB), myoglobin (MYG), troponin-I (TnI), thyroid stimulating hormone (TSH), alpha-fetoprotein (AFP), cancer antigen 125 (CA 125), free prostate specific antigen (fPSA)) have uncertainty values below the limits recommended by all of the mentioned guidelines. Excluding the RCPA recommendations, which seem to offer a narrower range than other guidelines, human chorionic gonadotropin (βHCG), digoxin (DIG), phenobarbital (PHNO), valproic acid (VPA), cancer antigen 15-3 (CA 15-3), cancer antigen 19-9 (CA 19-9), carcinoembryonic antigen (CEA) and total prostate specific antigen (tPSA) also met all of the recommendations. Two parameters (dehydroepiandrosterone sulfate (DHEAS) and PHNY) are labelled as ‘Failed’ according to all lists.

##### Table 5

## Discussion

Determining MU from EQA data is practical and cost-effective but can greatly increase the MU. In our data, the RMS_{bias} values, which were calculated from the EQA results, were the component with the greatest influence on the MU, as Table 4 clearly demonstrates. For PHNY, for example, if the July EQA evaluation, which has the highest bias, were ignored, the MU would decrease from 27% to 24% and fall below the 25% recommendation set by CLIA. From this point of view, a single mistake made during the proficiency testing (PT) programmes can affect the overall result. Detailed arrangements are clearly needed in this regard. It can also be recommended that, if the RMS_{bias} or u(Cref) values are higher than a certain value, these components may not be suitable for such calculations.
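The influence of a single EQA round can be examined with a leave-one-out calculation. This is a minimal sketch under stated assumptions: the function names and all numerical values are hypothetical and are not the PHNY data from Table 3, and u(Cref) and u(Rw) are held fixed when a round is omitted, which is a simplification.

```python
import math


def expanded_mu(bias_values, u_cref, u_rw, k=2.0):
    """Expanded uncertainty U = k x u_c from bias values and fixed components."""
    rms = math.sqrt(sum(b ** 2 for b in bias_values) / len(bias_values))
    u_bias = math.hypot(rms, u_cref)
    return k * math.hypot(u_rw, u_bias)


def leave_one_out(bias_by_round, u_cref, u_rw):
    """Return the full MU and the change in MU when each round is omitted."""
    full = expanded_mu(list(bias_by_round.values()), u_cref, u_rw)
    change = {}
    for omitted in bias_by_round:
        rest = [b for name, b in bias_by_round.items() if name != omitted]
        # Simplification: u_cref and u_rw are not recalculated.
        change[omitted] = expanded_mu(rest, u_cref, u_rw) - full
    return full, change


# Hypothetical percentage bias values; "Jul" is a deliberate outlier.
rounds = {"Jan": 5.0, "Feb": -4.0, "Mar": 6.0, "Apr": -3.0,
          "May": 5.0, "Jun": -6.0, "Jul": 20.0}
full, change = leave_one_out(rounds, u_cref=1.0, u_rw=3.0)
print("MU with all rounds:", round(full, 1))
for name, delta in change.items():
    print(name, round(delta, 1))
```

The round whose omission produces the largest drop in MU is the one that most deserves investigation.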

The ‘Failed’ test, PHNY, was evaluated closely. No published study was found with which to compare the result. The u(bias) value accounted for the largest share of the uncertainty. However, according to Table 3, even in July, the EQA round with the highest bias for PHNY, the z-score was 2.99. Thus, even when the internal and external quality control results of a parameter appear to be within limits, the MU of that test may be higher than desired. For this reason, MU results should be followed closely, just like internal and external quality control results.

When our other ‘Failed’ test, DHEAS, was examined, the uncertainty value was found to be not far beyond the limits specified by the guidelines. The MU of the parameter was 14% in our laboratory, whereas the recommendations of Westgard, Ricos and RCPA are 13.8%, 11.5% and 12%, respectively. To improve the parameter, all PT data were re-examined for u(bias), the dominant factor in the increase of the MU, and it was noticed that none of the 12 EQA participations had a z-score above 2. At this point, it was thought that MU calculations could guide us in recognizing overlooked problems.
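The comparison of an MU value with several guideline limits can be sketched as a simple lookup. The function name `evaluate_mu` is hypothetical, and treating an MU equal to the limit as a pass is an assumption. The numbers are the DHEAS and CA 19-9 values quoted in this article.

```python
def evaluate_mu(mu, limits):
    """Label the MU as 'Passed' or 'Failed' against each guideline limit."""
    # Assumption: an MU exactly equal to the limit counts as a pass.
    return {name: ("Passed" if mu <= limit else "Failed")
            for name, limit in limits.items()}


# Values in percent, as quoted in the text.
dheas = evaluate_mu(14.0, {"Westgard": 13.8, "Ricos": 11.5, "RCPA": 12.0})
ca199 = evaluate_mu(28.0, {"Westgard": 46.3, "Ricos": 52.6, "EFLM": 57.4})

print("DHEAS:", dheas)
print("CA 19-9:", ca199)
```

For DHEAS, every listed limit is exceeded, although the Westgard limit is missed by only 0.2 percentage points. For CA 19-9, the MU is well within all three listed limits despite being the highest MU in the study.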

In a study conducted by Ayyıldız, the MU for DHEAS was reported as 15.5% (*21*). That study used the same brand and model of device as ours. Similar results for DHEAS on the same device may suggest that the method or device performance should be improved. It would be useful to compare method performance by conducting similar studies with devices of other brands. Similarly, if high MU values are obtained for DHEAS, the target value may tend to increase.

For the cardiac markers CK-MB, MYG and TnI, the limits of all lists were met and no corrective action is required. The same is true for TSH. However, the uncertainties of the free triiodothyronine (fT3, 18%) and free thyroxine (fT4, 13%) tests exceeded the limits of some guidelines, although they were acceptable according to others. In such cases, the laboratory specialist should interpret results on the basis of his or her experience. In a study by Çubukçu et al., the MU of fT3 and fT4 was calculated with a similar approach and reported as 15.92% and 24.04%, respectively (*22*).

Three tumour markers (AFP, CA 125 and fPSA) met the limits of all lists, and the remaining tumour markers did not exceed the limits of any guideline except the RCPA recommendations.

The high permissible quality limits recommended by the guidelines for CA 19-9 are noteworthy: Westgard, Ricos and EFLM recommend 46.3%, 52.6% and 57.4%, respectively. A permitted variation of up to half of the reported result should make laboratory experts more attentive to the results of this parameter. Particular attention should be paid to the comparison of results produced by different laboratories, and clinicians should be warned about this when necessary.

There is no standardized formula for clinical laboratories to estimate the MU of tests (*12*). Furthermore, there are no standardized limits against which to compare results. Thus, every laboratory should define its own formulation system and evaluation method, so that errors can be noticed by repeating the calculations at regular intervals.

Measurement uncertainty is included in the standards as a criterion in quality assessment. Nevertheless, such calculations will not become widespread unless they can be performed practically in the daily routine. Hence, MU models that can be calculated from the available data without the need for a separate budget are important and valuable.

The limitation of this study is that the EQA data for drug concentration tests could not be evaluated against our peer group. Although we expect that this increases the uncertainty values calculated from the EQA data, we think that special attention should be paid to the MU calculations of these tests, especially since these analytes must be kept within a narrow range in plasma and must be precisely adjusted.

In conclusion, our study showed that the recommendations for a single parameter can vary widely, which makes the choice of guideline important. A test that is appropriate according to one guideline may give very poor results according to another.

It is recommended that measurement uncertainty, which plays an important role in the total quality performance of the laboratory, be followed up by clinical laboratory experts at regular intervals and that the awareness of clinicians about the subject be increased.