The scope of External Quality Assessment (EQA) in laboratory medicine has evolved considerably since Belk and Sunderman performed the first EQA scheme in the late 1940s (1). Today, EQA schemes are an essential component of a laboratory’s quality management system, and in many countries, EQA is a component of laboratory accreditation requirements (2, 3). EQA should verify, on a recurring basis, that laboratory results conform to the quality expectations required for patient care.
A typical EQA scheme (EQAS) consists of the following events: a set of samples is received by the laboratory from an external EQA organization for measurement of one or more components present in the samples. The laboratories do not know the concentrations of the components in the samples and perform the measurements in the same manner as for patient samples. The results are returned to the EQA organizer for evaluation, and after some time the laboratory receives a report stating the deviation of its results relative to a “true” value (assigned value). Reports may also include an evaluation of whether the individual laboratory’s results met the analytical performance specifications and an evaluation of the performance of the various measurement procedures used by the participants.
Important objectives of EQA are, besides monitoring and documenting the analytical quality, to identify poor performance, detect analytical errors, and take corrective actions. Participation in EQA provides an evaluation of the performance of the individual laboratory and of the different methods and instruments (3, 4). Therefore, proper and timely evaluation of EQA survey reports is essential and indeed a requirement for accreditation (see ISO 15189). In this opinion paper, we focus on the knowledge required to interpret an EQA result and present a structured approach on how to handle an EQA error. The paper is limited to EQA for evaluation of quantitative measurement procedures.
Knowledge required to interpret EQA results
The value of participating in EQAS for the laboratory depends on proper evaluation and interpretation of the EQA result. Key factors for interpreting EQA results are knowledge of the EQA material used, the process used for target value assignment, the number of replicate measurements of the EQA sample, the range chosen for acceptable values around the target (acceptance limits), and the impact of between-lot variation in the reagents used in measurement procedures (4-6).
The most important property of the EQA sample is commutability (7-9), the significance of which has become increasingly apparent in recent years. A commutable EQA sample behaves as a native patient sample and has the same numeric relationship between measurement procedures as is observed for a panel of patient samples. A non-commutable EQA sample includes matrix-related bias that occurs only in the EQA sample and not in authentic clinical patient samples, and therefore does not give meaningful information about method differences. Matrix-related bias is an unwanted distortion of the test result attributed to physical and chemical differences between the sample and the patient material the measurement procedures are intended for. In a recently published article concerning method differences for immunoassays, non-commutability of EQA materials was observed on 13 out of 50 occasions (5 components, 5 methods and 2 EQA samples) (9). On five of these occasions, the bias demonstrated by the EQA samples was in the opposite direction compared with the native serum samples. EQA materials should therefore be tested for commutability; if evaluation of method differences is intended, such testing is mandatory. Additionally, the sample should be stable during the survey period, homogeneous, available in sufficient volume and have clinically relevant concentrations (10, 11). Higher concentrations of components can be achieved by adding components (spiking) to pooled unaltered samples, but this may induce non-commutability (12, 13). In practice, the EQA sample is very often a compromise between ideal behaviour in accordance with native samples and stability of the material and therefore may not be commutable, which limits the opportunities in EQA result evaluation (4).
Assignment of target values
If the EQA sample is commutable, target value assignment can be made using a reference measurement procedure or a high-specificity comparative method that is traceable to a reference measurement procedure (14, 15). In this case, all participants are compared to the same assigned value and trueness can be assessed. Target assignment by value transfer based on results from certified reference materials is possible if the commutability of the reference materials has been verified (16-18). An example is Labquality’s EQAS 2050 Serum B and C (2-level), which uses values transferred from NFKK Reference Serum X (Ref. NORIP home site (http://nyenga.net/norip/index.htm) – Traceability) as assigned values for 16 components. Serum X has certified values from IMEP 17 Material or Reference Serum CAL (19). For many components, a reference method or certified reference material is not available. In that case, an overall mean or median can be used as the assigned value, after removal of outliers or by the use of robust statistical methods (20). All measurement procedures are expected to give the same results for a commutable sample, which makes it possible to compare the result with other methods. However, the measurement procedure with the most participants will have the greatest influence on the overall mean or median, and the true value remains unknown. An alternative is to use the mean (or median) of the peer-group (see below) means (or medians) in order to give the same weight to each peer-group (21). A common reference assigned value should not be used if the commutability of the EQA sample is unknown, because it is then not possible to determine whether a deviation from the assigned value is due to matrix-related bias, calibration bias, or the laboratory not conforming to the manufacturer’s recommended operating procedure.
The most common procedure for assigning a target value when the commutability of the EQA sample is unknown is to categorize participant methods into peer-groups representing similar technology and, after removal of outlier values, calculate the mean or median of the peer-group and use this as the assigned value. A peer-group consists of methods expected to have the same matrix-related bias for the EQA sample, and it is then possible to assess quality, i.e. to verify that a laboratory is using a measurement procedure in conformance with the manufacturer’s specification and in agreement with other laboratories using the same method. A limitation is the number of participants in each group: the uncertainty of the calculated assigned value will be larger in a peer-group with few participants than in a group with many participants. The variability of results in the group will also influence the uncertainty of the assigned value. High variability combined with few participants gives the greatest uncertainty of the assigned value.
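As an illustration of how a peer-group assigned value and its uncertainty might be computed, the sketch below takes the median after a simple MAD-based outlier screen. The function name, the 3-SD outlier cut-off and the 1.25·SD/√n approximation for the standard uncertainty of a median are illustrative choices, not the procedure of any particular EQA organizer.

```python
import statistics

def assigned_value(results, outlier_z=3.0):
    """Peer-group assigned value: median after simple outlier removal.

    Results are flagged as outliers when they deviate from the initial
    median by more than `outlier_z` robust standard deviations (derived
    from the median absolute deviation, MAD). Returns the assigned value
    and an approximate standard uncertainty of the median.
    """
    med = statistics.median(results)
    mad = statistics.median(abs(r - med) for r in results)
    robust_sd = 1.4826 * mad  # MAD -> SD under a normal distribution
    # If robust_sd is zero (more than half the results identical), keep all.
    kept = [r for r in results
            if robust_sd == 0 or abs(r - med) <= outlier_z * robust_sd]
    av = statistics.median(kept)
    # Uncertainty of a median is ~1.25 x that of a mean (normal data).
    u = 1.25 * statistics.stdev(kept) / len(kept) ** 0.5
    return av, u
```

With five results, one of which is a gross outlier, the outlier is excluded before the median is taken, and the reported uncertainty reflects only the retained results.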
To assess if the EQA result is acceptable, acceptance limits (i.e. analytical performance specifications) around the target value must be established (22-24). The acceptance limits can be considered regulatory, statistical or clinical.
Regulatory limits are intended to identify laboratories whose performance is so poor that they should not be allowed to continue to practice. These limits tend to be wide and are often based on “fixed state-of-the-art”. The German RiliBÄK and the USA Clinical Laboratory Improvement Amendments (CLIA) have defined such regulatory limits (25, 26).
Statistical limits are based on “state-of-the-art” and the assumption that a measurement procedure is acceptable if it is in concordance with others using the same method. The assessment of the individual laboratory is given as a z-score, i.e. the number of standard deviations (SD) between the EQA result and the assigned value. Assessment of z-scores is based on the following criteria: −2.0 ≤ z ≤ 2.0 is regarded as satisfactory; −3.0 < z < −2.0 or 2.0 < z < 3.0 is regarded as questionable (‘warning signal’); z ≤ −3.0 or z ≥ 3.0 is regarded as unsatisfactory (‘action signal’). These criteria are stated in ISO/IEC standard 17043:2010 (27). The performance of the individual laboratory is compared against the dispersion of results obtained by the participants in the peer-group in each survey. A disadvantage is that these limits are variable and may change with time as methods and instruments evolve. Another disadvantage of statistically based criteria is that the limits may vary between peer-groups measuring the same component: imprecise-method peer-groups will have a large acceptance interval, whereas precise-method peer-groups will have a small interval for acceptable results, independent of what is required for clinical needs. Several EQA organizations use z-scores in the feedback reports to the participants.
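The z-score criteria above translate directly into a short classification routine. This is a minimal sketch of the ISO/IEC 17043:2010 bands; the function name and return format are assumptions for illustration.

```python
def classify_z(result, assigned_value, sd):
    """Classify an EQA result by z-score per ISO/IEC 17043:2010.

    |z| <= 2       -> satisfactory
    2 < |z| < 3    -> questionable (warning signal)
    |z| >= 3       -> unsatisfactory (action signal)
    """
    z = (result - assigned_value) / sd
    if abs(z) <= 2.0:
        verdict = "satisfactory"
    elif abs(z) < 3.0:
        verdict = "questionable (warning signal)"
    else:
        verdict = "unsatisfactory (action signal)"
    return z, verdict
```

Note that the verdict depends entirely on the peer-group SD: the same absolute deviation may be satisfactory in an imprecise-method group and unsatisfactory in a precise-method group.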
Clinical limits can be based on a difference that might affect clinical decisions in a specific clinical situation (28). These limits are desirable but may be difficult to implement because very few clinical decisions are based solely on one particular test. More common are clinically established limits derived from biological variation in general (29, 30). A challenge is the fact that the existing database on biological variation is based on few studies or studies of rather poor quality. However, at the strategic conference held in Milan in 2014 to arrive at a consensus on how to define analytical performance goals, a working group for revising the current biological variation database was established (31-33).
Both regulatory and clinical limits are fixed limits, and the uncertainty of the assigned value will be a fraction of the acceptance interval. To account for the uncertainty of the assigned value, the Norwegian Quality Improvement of Laboratory Examinations (Noklus) has added a fixed interval around the target value in its acceptance limits (34). When the acceptance interval is expressed as a percentage, it might also be necessary to include a fixed unit interval below the concentration at which a percentage is not reasonably achievable, because the concentration-independent variability of a measurement procedure becomes a larger fraction of the acceptance interval.
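Combining a percentage limit with a fixed absolute interval at low concentrations can be sketched as follows; the function name and the example numbers are hypothetical, not Noklus’s actual specifications.

```python
def acceptance_limit(target, pct_limit, fixed_floor):
    """Acceptance limit (half-width) around the target value.

    Applies the percentage limit, but never less than a fixed absolute
    interval, which dominates at low concentrations where a percentage
    alone is not reasonably achievable.
    """
    return max(abs(target) * pct_limit / 100.0, fixed_floor)
```

For a 5% limit with a fixed floor of 0.3 units, a target of 10.0 gives a limit of 0.5 (the percentage governs), while a target of 2.0 gives 0.3 (the fixed floor governs).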
EQA results are meant to reflect results of patient samples, and in most schemes the participant is asked to perform a single measurement of the EQA sample. The acceptance limits are often given in percent and are established according to a total allowable error (TEa) concept (35, 36). Total error is assessed because bias, imprecision, and analytical non-specificity can all contribute to variation in a single result. If replicate measurements of the samples are included, it may be appropriate to have different limits to assess bias and imprecision separately.
Between-lot variation
Between-lot variation in the reagents used in measurement procedures may influence participant assessment in EQA (5, 37). The percentage of participants with a “poor” quality assessment declined from 38% with a common target value to 10% and 4% with a method-specific and a lot-specific target value, respectively (5). Between-lot variation has been described in several publications for glucose strips (38-41). Ideally, the use of lot-specific target values in EQAS would allow assessment of the individual participant’s performance, but such assessments are not feasible in routine EQAS due to the large number of lots on the market. EQA organizers should, however, register lot numbers when relevant and in some instances comment on lot variation in feedback reports (37). Additionally, between-lot variation found when using control materials may not mirror results when using native blood (5, 37). To evaluate the clinical importance of between-lot variation discovered in routine EQAS, the actual lot should therefore be examined using native blood.
A structured approach for handling unacceptable EQA results
An unacceptable EQA result should be investigated by the participant (the person in charge of EQA in the laboratory) to find the cause of the deviation and take corrective actions. According to ISO 15189, an accredited laboratory shall participate in EQAS, monitor and document EQA results, and implement corrective actions when predetermined performance criteria are not fulfilled (3). In spite of the extensive use of EQAS in evaluating the quality of the analytical work done in medical laboratories, remarkably little guidance exists for finding the sources of errors when they appear. Therefore, the Norwegian Clinical Chemistry EQA Program (NKK) has developed a tool for handling deviating EQA results.
All the key factors mentioned above that must be taken into consideration when interpreting an EQA result also apply to handling an EQA error. The ideal EQA sample has two important properties: it behaves as a native patient sample toward all methods (is commutable) and has an assigned value established with a reference method with small uncertainty. If either of these two criteria is not entirely fulfilled, errors NOT related to the quality of the laboratory may arise. Therefore, the EQA provider should take steps in the scheme design to avoid or ameliorate adverse consequences, for example by using peer-group assigned values for a non-commutable material. It is important to distinguish between external errors, which generate cost without benefit for the laboratory, and those caused by the laboratory itself (internal). Internal errors are of primary interest to the laboratory. However, errors made by manufacturers and/or EQA organizers (external) may also affect the quality of laboratory performance and can therefore have a major impact.
A simple relation has to be fulfilled if a deviation is to be further investigated:
|R – AV| > L, where R is the laboratory result, AV is the assigned value and L is the maximum acceptable deviation, i.e. the acceptance limit. Many EQA organizers have suggested acceptance limits for their EQAS. The laboratories should be aware of these limits, and in countries where participation in EQAS is not mandatory/regulatory, it is the laboratories’ responsibility to define which limits are relevant for their use. In reports from EQA organizers, the laboratory’s performance history is often shown graphically together with the EQA organizer’s acceptance limits. Of the three variables in the above equation, only one, R, is the immediate responsibility of the laboratory. Errors in AV have an external source, while an error in L is fundamentally internal, as commented above, even if most laboratories tend to adopt the limits proposed by the EQA organizer. To understand the complexity of finding the cause of an EQA error, all sources of deviation in an EQA result are included in a flowchart and have to be considered. In those EQAS using z-scores as an individual performance index, R should be within the range −2 ≤ z ≤ 2, which indicates that the laboratory result is within the 95% range of the distribution of all results. Results with a z-score below −3 or above 3 are identified as unsatisfactory, while results with a z-score between −3 and −2 or between 2 and 3 are questionable (a warning signal). This means the laboratory should investigate whether there is a reason why the result tends to become an outlier.
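The relation |R – AV| > L is simple enough to express as a one-line check; a minimal sketch with assumed names:

```python
def needs_investigation(result, assigned_value, limit):
    """True when the absolute deviation |R - AV| exceeds the
    acceptance limit L, i.e. the result warrants further investigation."""
    return abs(result - assigned_value) > limit
```

For example, with an assigned value of 5.0 and an acceptance limit of 0.5, a result of 5.6 triggers an investigation while 5.3 does not.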
The history of developing a flowchart
In 2008 and 2009, the topic for group work at NKK’s annual meetings was “How to handle a deviating EQA result”. The results of this work were further processed by the NKK expert group and resulted in a flowchart with additional comments that could be used by the laboratories, e.g. in their quality systems, to document actions against deviations in EQA.
In 2012-2013, NKK carried out a follow-up and an evaluation of the flowchart. Deviating EQA results from Labquality’s EQA scheme 2050 Serum B and C (2-level, surveys 4 and 6, 2012) were selected, and the laboratories were asked to use the flowchart to assess the EQA error and state the cause of the error. They were also asked whether they used the flowchart regularly and, if not, why. Finally, they were asked if they had any suggestions for improvement of the flowchart. Fifty-six percent of the invited laboratories replied (39/69). The results showed that most errors (81%) were the laboratory’s responsibility (internal causes), 15% the EQA provider’s responsibility (external causes), whereas 4% were a mix (internal/external causes). The most common errors were transcription errors (72%), with respect to both internal and external causes. For 4% of the deviating EQA results, the participants did not reach any conclusion. Fifty-eight percent of the laboratories that responded used the flowchart regularly. Of these, 37% commented that they found the flowchart comprehensive and a bit complicated, but very useful in training/educational situations. They suggested changing the order of the items in the flowchart to start with transcription errors, the most common cause of a deviating EQA result (unpublished data).
The recommendations from the evaluation have been taken into account, and a new version of the flowchart has been developed in cooperation with the External quality Control for Assays and Tests (ECAT) Foundation in the Netherlands (Figure 1). The content of the original flowchart has been kept and, where necessary, expanded and restructured.
Description of the flowchart
The flowchart starts with the most frequent errors, followed by the logical steps in the flow of an EQA survey (from pre-survey issues to report and interpretation – see Figure 1). Four different aspects elucidate each item in the flowchart: Observation – what the potential error is; Responsibility – who is responsible for the error; Comment – a short comment on the action to undertake; Note – a more detailed description of actions (see Figure 2). The responsible party may be the participant, the EQA provider (EQAP), and/or the manufacturer, each marked with a different colour.
Before starting to evaluate the potential cause of a deviating result, the report and/or comment letter should be read carefully for a possible explanation. If no explanation is given, the flowchart (Figure 1 and Figure 2) should be used to reveal the potential cause(s).
The flowchart starts with the most probable causes of error, “Transcription errors” (items 1-6). The EQA provider may enter the data wrongly, or the laboratory may record or report a wrong result. In the evaluation of the first version of the flowchart, transcription errors were the most common cause of a deviating result.
Next is “Pre-survey issues” (items 7-13). Obviously, a lot may go wrong before the sample reaches the laboratory, such as errors in sample selection, inappropriate stability or homogeneity, a mistake in labelling or an error in packaging. These errors are the EQA organizer’s responsibility and should have been commented on in the comment letter (see above). Unfortunately, this is often not the case, even though these errors are hard, and often impossible, for the laboratory to detect. Examples of more subtle origin are related to the stability of the samples. A good procedure is always to store the EQA sample under stable conditions at least until the report is received – a reanalysis of the sample may eliminate many sources of error. If you do not have any sample material left, you should ask the EQA organizer for a new sample.
The next section, “Sample receipt/handling” (items 14-17), is solely the laboratory’s responsibility. The laboratory should carefully check that the EQA provider has the correct address details and that all the EQA organizer’s instructions for handling the sample have been followed. The visual appearance of the specimen should be checked at reception for an immediate assessment of sample quality and physical integrity, and to confirm that the sample identifiers match the documentation.
“Test performance” (items 18-22) is next. The laboratory or the instrument or kit manufacturer is responsible for errors in this section. Local documentation of the measurement is important: who/when/how. You may locally have changed the procedure of measurement (e.g. factorized results: internal source), or the producer of the method may have changed the calibrator/reagents/procedure (external) without informing you (e.g. creatinine, ALP). The problem may be related to the equipment, the reagents or the test performance. Is the problem new to your laboratory, or is it an old one? Has the error occurred before? It is important to evaluate results in relation to previous surveys; in other words, evaluation of the results of a single survey may be insufficient to reveal the cause of the problem. If the problem is new, look at your internal quality control (IQC) data. First, look for systematic deviations (bias/trends) that may explain the EQA result. If this is the cause of the error, are your IQC rules not stringent enough, or is L too narrow? In any case, is there a need for reanalysis of any of the patient samples in the relevant analytical run? If no hints can be found in the IQC, you should proceed in the flowchart.
Errors in “Data handling” (items 23-25) are external and usually not the responsibility of the laboratory. The problem could be related to the statistical procedure used in handling the data, e.g. parametric methods used when the data are not normally distributed, a consensus value based on few participants causing a large deviation, or uncertainty caused by a mix of factorized and original results. The establishment of the assigned value (AV) is a challenge. All participants, regardless of instrument or method, should be evaluated against the AV established by a reference method when this is available and a commutable material is used. A deviation that is representative of one particular instrument or method is caused either by the EQA provider (non-commutable material) or by the instrument or method used (e.g. a problem with a certain lot of reagents). An evaluation based on a reference value for a non-commutable EQA sample is a mistake by the EQA provider. Another example applies to a deviation between a particular instrument or method and a peer-group AV based on results from a large number of instruments or methods. If the deviation is similar for all participants with the particular instrument or method, the instrument or method may be linked to the wrong peer-group. It is important that both the EQA provider and the participant check that the grouping of the instrument or method is correct. This is a frequent cause of error unless the method is stated and adjusted at each survey. One should also be aware that in a peer-group consisting of several instruments or methods, the instrument or method with the most participants will have a greater influence on the assigned value. Errors in this section may be difficult for the participant to detect and should have been commented on in the feedback report.
The last section is “Report and interpretation” (items 26-29). Is the deviation clinically important? If not, the acceptance limits should be reconsidered and may be expanded. Limits expressed in percent are probably not suitable for the lowest concentrations of a component, because the measurement uncertainty may exceed the acceptance limits at low concentrations. Very high concentrations are often of less clinical interest, and so are deviations at such concentrations; however, from an analytical point of view it might still be worth reducing the error. This does not apply to limits based on state-of-the-art. A deviation in accordance with previous results has probably been handled earlier. The error may be the responsibility of the participant, the EQA provider or the manufacturer. The mean of all results for one particular method, however, may always be used to distinguish between errors general for the method (external) and errors in the laboratory (internal), even if the commutability of the sample is unknown. It may be that the error is already recognized as a general problem or is specific for your method. An unusually large variation for a particular method may be caused by poor EQA material (external/EQA provider) or by between-lot variation in reagents for that specific method when several lots are present (external/manufacturer). It could also be due to a change in the method by the manufacturer. A suspected internal error requires review of the IQC and the patient results in the period when the EQA sample was analysed. A similar deviation observed in several samples with different concentrations may suggest that a systematic error is present. In that case, it may be wise to check previous EQA results to look for a trend. For more details, see Figure 1 and Figure 2.
Sometimes there is no explanation for the EQA error. It may have been a transient error in the system at the time of measurement. The error should be followed up in later EQA surveys.
It should be realised that an error made by the EQA provider or the manufacturer may cause a deviating result for a participant in an EQAS. The participant should therefore also consider this possibility when evaluating deviating EQA results. Errors caused by the EQA provider should have been commented on in the comment letter. These errors are often hard, and sometimes impossible, for the participant to discover and handle. In order to improve their schemes, EQA providers should create a checklist based on this flowchart as a tool in their work to make ongoing EQA schemes more useful for the participants.
The flowchart presented in this paper is limited to mistakes that occur in the analytical phase of the total testing process. Transcription errors, which accounted for about three quarters of the mistakes or errors, could be classified as post-analytical errors, i.e. not part of the analytical process, and may therefore “falsely” affect the evaluation of the analytical performance. Today, writing down patient results is not part of the daily routine, as laboratories are highly automated. The fact that a laboratory professional does not check written results might reflect a lack of attention to delivering correct results and hence indicate a lack of quality. Another limitation is the limited use of the flowchart so far.
The flowchart itemizes the steps taken by many EQA providers when working with participants to understand and correct adverse performance and can be used in the format of Corrective and Preventive Action (CAPA) documentation or Root Cause Analysis (RCA) tools. The flowchart is a useful addition to these as it summarizes these processes for participants. To our knowledge, this is the first time such a structured approach to handling deviating EQA results has been published. So far, the flowchart has had very limited use. However, it will soon become available in the public domain, i.e. on the website of the European organisation for External Quality Assurance Providers in Laboratory Medicine (EQALM). This flowchart can be the basis for modified versions for specific EQA areas and be further improved based on the experience of users.