Introduction
The main objective of the clinical laboratory is to provide clear, reliable and useful information for clinical decision-making. In current healthcare systems laboratory tests are performed in different locations, so standardization among laboratories has become one of the cornerstones of quality patient care. Standardization can be defined as the ability to obtain interchangeable results (within a certain analytical uncertainty) in order to achieve the same medical decision, regardless of the analytical procedure (method, traceability and instrument), measurement units and reference intervals.
Standardization should be based on six basic pillars: in vitro diagnostic companies, reference materials, reference methods, reference laboratories, medical laboratories and external quality assessment (EQA) organizations (1). Recently, Greaves noted that EQA is not just a pillar but the central support for on-going harmonization (2). Discordant results between laboratories and methods should no longer be accepted.
It is widely accepted that the best strategy to organize an EQA scheme is to use fresh frozen commutable control samples with values assigned by reference laboratories using reference methods, which can be found on www.harmonization.net (3, 4).
The Spanish Society of Laboratory Medicine (SEQCML) is a non-profit scientific organization that has been providing EQA schemes in Spain since 1980 using stabilized control materials. Since 2013 a category 1 program has been organized for basic biochemistry analytes. According to Miller et al., this kind of program distributes commutable control materials with values assigned by a reference measurement procedure (RMP), and replicate samples are tested in each survey (3). The accuracy of individual laboratories is assessed by comparison with the RMP, reproducibility is checked both intra- and inter-laboratory, and standardization is assessed by comparing the calibration traceability of each measurement procedure with the RMP. Two initial surveys were performed in 2013 and 2014 as preliminary experiences, and regular annual surveys have been organized since 2015. For a proper assessment of bias, adequate information on measurement traceability is therefore crucial (5, 6).
Another important aspect to consider is the analytical performance specification (APS), or acceptability limit, selected for the evaluation of the derived results. When APS are based on biological variation (BV), it is highly recommended to use the graded classification of APS according to their strictness: optimal, desirable and minimal (7). The APS grade can be selected according to the limitations of the current state of the art, defined as the performance achieved by about 80% of laboratories. According to this criterion, the minimal BV-based APS grade was selected in this study for the evaluation of electrolytes, while the desirable BV-based APS were chosen for enzymes and substrates.
In this regard, a performance worse than the minimum APS should alert the laboratory that its results could be at risk and that clinical decision-making might be detrimentally affected. Likewise, a performance that only reaches the minimal grade suggests that further improvement may be beneficial for patients (8, 9).
The aim of this work is to evaluate the results obtained from two years of a category 1 EQA program (the 2015 and 2016 surveys) performed in our country and to assess the impact of this kind of EQA program on analytical standardization. The evaluation is based on the inter-laboratory imprecision and on the bias of the peer group means compared with the reference method values.
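As an illustration of how the three BV-based grades relate to one another, the following sketch assumes the commonly used multipliers of 0.125, 0.25 and 0.375 applied to the combined within- and between-subject biological variation. The multipliers, the function name and the example CVs are assumptions for illustration only and are not taken from this study.

```python
import math

# Multipliers commonly used for graded BV-based bias limits.
# Assumption: not stated in this study; shown only to illustrate the grading.
BIAS_FACTORS = {"optimal": 0.125, "desirable": 0.25, "minimal": 0.375}


def allowable_bias(cv_within, cv_between, grade="desirable"):
    """Return the allowable bias (%) for a given APS grade.

    cv_within  -- within-subject biological variation (%)
    cv_between -- between-subject biological variation (%)
    """
    combined = math.sqrt(cv_within ** 2 + cv_between ** 2)
    return BIAS_FACTORS[grade] * combined


# Hypothetical analyte with CVw = 3 % and CVb = 4 % (not real BV data).
for grade in ("optimal", "desirable", "minimal"):
    print(grade, allowable_bias(3.0, 4.0, grade))
```

Under these assumptions the minimal grade is three times as permissive as the optimal grade, which is why it suits analytes with very narrow biological variation.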
Materials and methods
Commutable control materials were purchased from the MCA laboratory (Queen Beatrix Hospital, Winterswijk, The Netherlands) through the Stichting Kwaliteitsbewaking Medische Laboratorium Diagnostiek (SKML). According to Cobbaert et al., the controls had been prepared from fresh anonymized left-over sera from the routine laboratory, excluding lipemic and icteric samples and samples positive for hepatitis B surface antigen (HBsAg), human immunodeficiency virus (HIV) or hepatitis C virus (HCV), and stored frozen at −84 °C in aliquots. Pathological concentration ranges were created by adequately mixing pools and by spiking with minerals, recombinant human enzymes and human albumin (10). Commutability had been verified by SKML, as explained by Baadenhuijsen et al. and Jansen et al. (11, 12). Throughout the years commutability has been monitored by including a native, single-donation spy sample (10, 12).
Six vials of fresh frozen human serum pools at different concentrations were distributed once per year in a single express shipment at −80 °C and delivered within 24 hours to laboratories all over Spain. Different lots at different concentrations were provided for each of the two surveys. Participating laboratories were requested to keep the samples at −20 °C until analysis, which had to be performed within the following 14 days. Each vial had to be analysed in duplicate, one vial per day, on 6 consecutive days whenever possible. Results were registered on the SEQCML-EQA website so that they could be evaluated both individually and globally.
A preliminary survey was carried out in 2013 in 19 laboratories to ascertain whether the logistics of managing a set of non-stabilized control materials was workable in our country. No temperature-maintenance incidents were observed between dispatch of the control materials by the provider and analysis in the laboratory.
Another point of interest of this preliminary survey was to explore whether laboratories could adequately report the traceability of their measurements to standards. Considerable difficulties were perceived, which prompted a meeting between the EQA organization and providers to request clear and complete information on calibrator traceability.
In 2014 a first survey was performed as part of a pilot European study (INPUTs) (Italy, The Netherlands, Portugal, Spain and The United Kingdom), with a total of 20 participating laboratories, whose results have already been published (12, 13). Only about 45% of participants were able to report their traceability correctly, so those results are not shown in this study. This survey was therefore considered a pilot to identify the problems that could affect EQA participation and the subsequent interpretation of results. The same sample management protocol was applied in both of these surveys and in those performed in 2015 and 2016.
The 2015 and 2016 surveys were exclusively run in Spain and included 17 analytes. The number of registered participants was 93 and 105, respectively. The target values of the distributed control materials were assigned by the reference methods and laboratories listed in Table 1.
Table 1
Analytes | Reference method | Reference laboratory |
---|---|---|
Electrolytes | ||
Calcium | Atomic Absorption Spectrometry | INSTAND eV. Düsseldorf, Germany |
Chloride | ICP-IDMS | |
Magnesium | ||
Potassium | ||
Sodium | ||
Substrates | ||
Bilirubin | Doumas method | DGKL, Hannover, Germany |
Creatinine | IDMS | DGKL, Bonn, Germany |
Glucose | GC-IDMS | INSTAND eV. Düsseldorf, Germany |
Protein | Modified Biuret | |
Urate | HPLC | Erasmus Medical Centre, Rotterdam, Netherlands |
Enzymes | ||
ALP | IFCC | Unknown |
α-Amylase | | Haga Hospital, The Netherlands |
AST | ||
ALT | ||
CK | ||
GGT | ||
LD | ||
The Doumas method according to Rainer et al. (14). ICP-IDMS - Inductively Coupled Plasma-Isotope Dilution Mass Spectrometry. DGKL - German Society for Clinical Chemistry and Laboratory Medicine. IDMS - Isotope Dilution Mass Spectrometry. GC-IDMS - Gas Chromatography-Isotope Dilution Mass Spectrometry. HPLC - High Performance Liquid Chromatography. ALP - alkaline phosphatase. ALT - alanine aminotransferase. AST - aspartate aminotransferase. CK - creatine kinase. GGT - gamma glutamyl transferase. LD - lactate dehydrogenase. IFCC - International Federation of Clinical Chemistry.
Results were categorized by measurement procedure, traceability and instrument. The standard materials used by participants for calibration traceability are described in Table 2. Participating laboratories using the same combination of these three elements were considered a peer group. The peer groups and the number of laboratories included for each analyte are shown in Figures 1-17.
Table 2
Compared with 2015, a new instrument (Bio-systems BA 400) was incorporated in the 2016 survey, with only 6 participating laboratories. The overall evaluation of the 2015 survey was published on the SEQCML website and presented at the 2016 EQALM annual meeting (13, 15). Only groups that finally comprised 5 or more laboratories were considered in this study.
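The peer-group definition and the minimum group size described above can be sketched as follows. The record layout, field names and example entries are hypothetical and serve only to show the grouping logic; they do not reproduce the SEQCML data model.

```python
from collections import defaultdict

MIN_LABS = 5  # minimum number of laboratories per peer group, as stated in the text


def peer_groups(records, min_labs=MIN_LABS):
    """Group laboratories by (method, traceability, instrument).

    records -- iterable of dicts with the hypothetical keys
               'lab', 'method', 'traceability' and 'instrument'.
    Returns only the groups with at least `min_labs` distinct laboratories.
    """
    groups = defaultdict(set)
    for rec in records:
        key = (rec["method"], rec["traceability"], rec["instrument"])
        groups[key].add(rec["lab"])
    return {key: sorted(labs) for key, labs in groups.items() if len(labs) >= min_labs}


# Hypothetical example: six laboratories in one group, two in another.
records = [
    {"lab": f"L{i}", "method": "HK", "traceability": "SRM 965", "instrument": "A"}
    for i in range(6)
] + [
    {"lab": f"M{i}", "method": "GOD", "traceability": "SRM 965", "instrument": "B"}
    for i in range(2)
]
print(peer_groups(records))
```

Using a set per group means a laboratory that reports several results is counted only once when the group size is checked.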
Inter-laboratory imprecision was calculated by averaging the coefficients of variation (CV) obtained for the six controls distributed in each of the 2015 and 2016 surveys, and was compared with the best (Dutch) inter-laboratory CV derived from the 2014 pilot study, which used six similar commutable control materials (16).
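A minimal sketch of this averaging step is given below. It assumes that each laboratory contributes one value per control (for example the mean of its duplicates) and that the sample standard deviation is used; neither detail is specified in the text, and the numbers are invented.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) of one control across laboratories."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def mean_interlab_cv(results_by_control):
    """Average the per-control CVs of one peer group.

    results_by_control -- dict mapping a control identifier to the list of
                          laboratory values reported for that control.
    """
    cvs = [cv_percent(values) for values in results_by_control.values()]
    return statistics.mean(cvs)


# Hypothetical peer group with two controls (the surveys used six).
example = {
    "control_1": [9.0, 10.0, 11.0],
    "control_2": [18.0, 20.0, 22.0],
}
print(mean_interlab_cv(example))
```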
Bias was calculated as the percent difference between the peer group mean (same measurement procedure, traceability and instrument) and the reference value. The analytical performance specification applied for bias evaluation was based on the BV data collected in the 2014 online database, which had been compiled as detailed by Ricós et al., applying the minimum level of requirement for electrolytes and the desirable level for substrates and enzymes (17-19).
The results of this study were examined with particular focus on the most common analytical procedures used in Spain and their repercussions on the non-comparable results detected through participation in category 1 EQA schemes.
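The bias calculation can be written out as a short sketch. The function name and the example values are hypothetical; only the formula (percent difference of the peer group mean from the reference value) comes from the text.

```python
def percent_bias(peer_group_mean, reference_value):
    """Percent difference between a peer group mean and the reference value."""
    return 100 * (peer_group_mean - reference_value) / reference_value


# Hypothetical control with a reference value of 100 units.
print(percent_bias(102.0, 100.0))  # positive bias: group reads high
print(percent_bias(95.0, 100.0))   # negative bias: group reads low
```

The sign is kept so that systematically low results (negative bias) can be told apart from systematically high ones.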
Standardization is defined by the attainment of inter-laboratory imprecision within the predefined APS and peer group bias (% mean deviation to the reference value) below the allowed bias derived from BV.
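This two-part criterion can be expressed as a simple check. The sketch treats both limits as inclusive, which is an assumption, and the bias figures in the example are invented; the CV and APSIL figures are the glucose values for the Roche Cobas group in Table 5.

```python
def is_standardized(mean_cv, aps_il, bias_percent, allowable_bias):
    """Return True when both imprecision and bias meet their limits.

    mean_cv        -- average inter-laboratory CV of the peer group (%)
    aps_il         -- APS for inter-laboratory imprecision (%)
    bias_percent   -- signed peer group bias versus the reference value (%)
    allowable_bias -- BV-derived allowable bias (%)
    Assumption: limits are inclusive (<=).
    """
    return mean_cv <= aps_il and abs(bias_percent) <= allowable_bias


# Glucose, Roche Cobas group: CV 8.1 % (2015) and 0.8 % (2016), APSIL 5.9 %.
# The bias (1.0 %) and allowable bias (2.0 %) are hypothetical placeholders.
print(is_standardized(8.1, 5.9, 1.0, 2.0))
print(is_standardized(0.8, 5.9, 1.0, 2.0))
```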
Results
All results exceeding the mean ± 3 standard deviations of each group were rejected as outliers. The number of rejected participant laboratories was 5 for the 2015 survey and 10 for the 2016 survey. Moreover, 30 results for lactate dehydrogenase (LD), which were 100% higher than the others because a different substrate was used (pyruvate instead of lactate), were also excluded from the study. Results for bias are presented in Figures 1-17. Results for the inter-laboratory imprecision of each peer group for electrolytes, enzymes and substrates are presented in Tables 3-5 and compared with the APS for inter-laboratory imprecision (APSIL) from the 2014 pilot survey (16). An overview of the standardization achieved in our setting, according to the bias and the imprecision calculated for instruments, is presented in Table 6.
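The ± 3 standard deviation rejection rule can be sketched as below. A single pass with the sample standard deviation is assumed; the text does not say whether the rule was applied once or iteratively, and the data are invented.

```python
import statistics


def reject_outliers(values, k=3.0):
    """Split values into (kept, rejected) using a mean +/- k*SD rule.

    Assumption: one pass, with mean and sample SD computed on all values.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    kept = [v for v in values if abs(v - mean) <= k * sd]
    rejected = [v for v in values if abs(v - mean) > k * sd]
    return kept, rejected


# Hypothetical peer group: 20 plausible results and one gross error.
values = [99.0] * 10 + [101.0] * 10 + [150.0]
kept, rejected = reject_outliers(values)
print(len(kept), rejected)
```

With a sample standard deviation, a single value can only exceed 3 SD when the group has at least 11 results, so in small peer groups this rule rarely rejects anything.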
Table 3
Table 4
Table 5
Bilirubin | 2015, CV (%) | 2016, CV (%) | APSIL |
---|---|---|---|
DPD, SRM 916-Abbott Architect | 3.8 | 4.7 | 9.6 |
DPD, SRM 916-Beckman Coulter AU | 2.3 | 4.8 | |
DPD, SRM 916-Roche Cobas 6000, 8000 | 1.8 | 15.7* | |
Vanadate, SRM 916- Siemens Advia | 5.1 | 1.1 | |
Sulfanilic, SRM 916- Siemens Dimension, Vista | 5.3 | 2.5 | |
Sulfanilic, SRM 916-Biosystems BA | / | 6.5 | |
Creatinine | |||
Jaf nc, SRM 967-Abbott Architect | 1.4 | 2.0 | 7.0 |
Jaf nc, SRM 967-Beckman-Coulter AU | 7.7 | 5.8 | |
Jaf c, IDMS – Roche Cobas6000, 8000 | 2.4 | 3.6 | |
Jaf c, SRM 967-Roche Cobas 6000, 8000 | 4.0 | / | |
Jaf c, SRM 967-Siemens Advia | 3.0 | 1.2 | |
Jaf c, NIST SRM 914a – Dimension | / | 1.4 | |
Enz, NIST SRM 967a–Coulter AU | / | 2.9 |
Enz, NIST 967a –Bio-systems | / | 4.0 |
Enz, IDMS-Cobas 8000 | / | 3.1 |
Glucose | |||
HK, SRM 965-Abbott Architect | 5.4 | 4.5 | 5.9 |
HK, SRM 965-Beckman Coulter AU | 2.4 | 3.4 | |
HK, IDMS-Roche Cobas 6000,8000 | 8.1* | 0.8 | |
HK, SRM 965-Siemens Advia | 3.8 | 2.5 | |
HK, SRM 917-Siemens Dimension, Vista | 7.2* | 2.0 | |
GOD, SRM 965- Bio-systems BA 400 (6) | / | 2.0 | |
Total protein | |||
B, SRM 927 –Abbott Architect | 3.2 | 3.2 | 3.2 |
B, SRM 927-Beckman Coulter AU | 4.9 | 2.3 | |
B, SRM 927-Roche Cobas 6000,8000 | 4.6 | 6.4* |
B, SRM 927-Siemens Advia | 8.8* | 2.0 | |
B, SRM 927-Siemens Vista | 4.2 | 1.6 | |
B, SRM 927 - Bio-systems BA 400 | / | 2.0 | |
Urate | |||
Uricase-POD, SRM 913-Abbott Architect | 3.0 | 3.1 | 5.2 |
Uricase-POD, IDMS- Beckman Coulter AU | 3.5 | 3.2 | |
Uricase-POD, IDMS - Roche Cobas 6000,8000 | 3.5 | 1.2 | |
Uricase-POD, SRM 909 - Siemens Advia | 2.2 | 2.0 | |
Uricase-POD, SRM 913 - Siemens Dimension, Vista | 1.1 | 4.1 | |
Uricase-POD, SRM 909c - Bio-systems BA400 | / | 3.5 | |
*exceeding APSIL. The coefficient of variation (CV) is presented as the group’s average for six controls. Only instruments with more than 5 participating laboratories are shown in this table. APSIL - analytical performance specifications for inter-laboratory imprecision. B - Biuret. DPD - 3,5-dichlorophenyl-diazonium tetrafluoroborate. Enz - enzymatic. GOD - glucose oxidase. HK - hexokinase. Jaf - Jaffe. Jaf c - Jaffe compensated. Jaf nc - Jaffe non compensated. POD - peroxidase.
Table 6
Discussion
The percentage of laboratories excluded was higher in 2016 than in 2015 because better knowledge of the traceability and instrument used made the groups more specific in 2016. This cannot be considered a disadvantage. The results of this study are discussed in light of their impact on the proposed aims. The impacts are grouped as positive, negative and requiring dialogue with providers.
The main positive impacts, which imply adequate standardization with no need for further improvement, apply to potassium and creatine kinase (CK). Potassium shows inter-laboratory imprecision and bias (Figure 4) within the allowable limits for almost all peer groups. For the remaining electrolytes good inter-laboratory imprecision can also be seen, in good agreement with the 2014 survey (performed in collaboration with other European countries), in which all participating laboratories and manufacturers fulfilled the APS for total analytical error at the minimum performance level (20). Creatine kinase shows good inter-laboratory imprecision and bias (Figure 10), except for the new group enrolled in the 2016 survey (BA400), so well-standardized measurements may be expected soon.
Negative impacts may be due to several reasons. The aqueous matrix of SRM 915 and 918, used for calcium and sodium respectively (Figures 1 and 5), produces low results. Panteghini and Armbruster identified the lack of commutability of calibration traceability materials as a crucial factor affecting standardization in medical laboratories (21, 22).
Instrument-dependent problems can be seen in this study for alkaline phosphatase (ALP), with low results for Roche users (Figure 6) even though all participants use the same method and traceability; this causes an important lack of standardization in our country because Roche users form the largest group. Similar findings were reported by Braga et al. and Aloisio et al., who observed discrepancies among Abbott Architect users related to an “experimental” calibration factor provided by the manufacturer (23, 24). Non-standardized ALP results could have a great impact in some clinical scenarios such as hypophosphatemia diagnosis, so improving the traceability of results becomes a crucial objective (25). Method-dependent problems are seen in four cases.
Firstly, for amylase, all groups using the malto-heptaoside (G7) substrate, as well as the Abbott Architect group using malto-trioside (G3), show harmonized results, whereas the remaining G3 groups have an unacceptable negative bias (Figure 7). This lack of standardization affects one third of the participants in this study and thus has a considerable impact on healthcare in our country. Secondly, alanine aminotransferase (ALT) and aspartate aminotransferase (AST) testing shows unacceptable inter-laboratory imprecision and bias (low results) (Figures 8 and 9) for laboratories that did not add pyridoxal-5-phosphate (P5P) in their measurement procedure. Infusino et al. and Jansen et al. reported that when the reagent is supplemented with P5P the ratio of preformed holoenzyme to apoenzyme differs among specimens (12, 26). Thirdly, for gamma glutamyl transferase (GGT), all groups using the substrate γ-glutamyl-3-carboxy-4-nitroanilide at > 4 mmol/L have good precision and bias; however, the Siemens Dimension Vista group, which uses a different substrate concentration (< 4 mmol/L), produces unacceptably high results (Figure 11). Lastly, creatinine shows good inter-laboratory CV. However, only enzymatic methods have good bias over the entire concentration range studied, whereas most of the Jaffe-based measurements produce unacceptably high results at low-normal concentrations (≤ 50 μmol/L) and some of them show inconsistent bias across the two surveys evaluated (Figure 14). Part of the 2015 results had been previously published and is in accordance with the 2016 survey, as well as with Jassam et al., who observed that the Abbott compensated and Jaffe methods were the most affected by glucose interference, resulting in under- or overestimation of GFR, which may also lead to errors in the classification of chronic kidney disease (20, 27, 28). Likewise, data reported by Panteghini showed an 18 μmol/L positive bias derived from the Jaffe-based method on a Beckman AU 2710 instrument (29).
These results are especially relevant for the paediatric population. Our results show that, for consecutive years, the Jaffe method has produced falsely high results at low-normal concentrations in all the instruments used in our country. Consequently, creatinine is not standardized in our setting and, considering the associated clinical implications, the Jaffe method should be abandoned.
Dialogue with providers is of the utmost necessity in several cases. The main negative issue is the lack of adequate information about the calibration traceability of the measurement procedure, which affected 55% of participating laboratories in 2015. To address and minimize this issue, the SEQCML Analytical Quality Commission promoted regular and specific meetings with providers and held educational communications and workshops at national laboratory congresses (5, 6). This effort seems to have been worthwhile, as the percentage of wrongly coded traceability decreased from 55% to 20% in 2016.
Some in vitro diagnostic medical device providers reported their methods for ALT and AST as “IFCC traceable” when no P5P was added; this led to a high incidence of wrong coding by laboratory workers, which was resolved and recorded by SEQCML after providers and users were informed of this circumstance.
Lactate dehydrogenase measurements gave good inter-laboratory CV in the 2015 survey but not in 2016; the reason for this remains unknown and should be discussed with providers. Bias showed an interesting improvement, resulting in satisfactory results for all users of the lactate to pyruvate based measurement in the 2016 survey (Figure 12).
Our findings for bilirubin, chloride, glucose, magnesium (irregular inter-laboratory CV and bias), as well as total protein and urate (good inter-laboratory imprecision, but irregular bias) led us to the opinion that a dialogue with providers would be necessary for improving standardization in our country.
A limitation of this study is the small number of participants in certain groups, because this program is still poorly known by many Spanish laboratories. Consequently, one symposium, various workshops at the national congress and specific meetings were organized in 2017, a book was written in 2018 and other educational activities are planned for the future to overcome this limitation.
Another drawback might be that there is a single exercise per year, which may not be enough to guarantee trueness for the rest of the year. Because of the economic difficulty of making more distributions of these control materials during the year, laboratories in Spain could use our regular EQA schemes (stabilized materials, peer group evaluation, one sample per month) to verify whether their analytical performance is maintained throughout the year.
Conclusions
The two years of experience with the category 1 EQA program in our country have revealed a lack of standardization of the 17 most frequent general biochemistry tests used in our laboratories. This kind of EQA program allows the bias of each measurement procedure-traceability-instrument combination to be estimated in a way that can be extrapolated to real patient samples. Based on the information obtained from the category 1 EQA program on the lack of standardization, we recommend abandoning methods such as ALT and AST without exogenous pyridoxal phosphate, the Jaffe method for creatinine and the pyruvate-lactate method for LD, and avoiding non-commutable calibrators, such as aqueous solutions for calcium and sodium.