Diagnosis codes underestimate chronic kidney disease incidence compared with eGFR-based evidence: a retrospective observational study of patients with type 2 diabetes in UK primary care

Background Type two diabetes (T2D) is a leading cause of both chronic kidney disease (CKD) and onward progression to end-stage renal disease. Timely diagnosis coding of CKD in patients with T2D could lead to improvements in quality of care and patient outcomes. Aim To assess the consistency between estimated glomerular filtration rate (eGFR)-based evidence of CKD and CKD diagnosis coding in UK primary care. Design & setting A retrospective analysis of electronic health record data in a cohort of people with T2D from 60 primary care centres within England between 2012 and 2022. Method We estimated the incidence rate of CKD per 100 person–years using eGFR-based CKD and diagnosis codes. Logistic regression was applied to establish which attributes were associated with diagnosis coding. Time from eGFR-based CKD to entry of a diagnosis code was summarised using the median and interquartile range. Results The overall incidence of CKD was 2.32 (95% confidence interval [CI] = 2.24 to 2.41) and significantly higher for eGFR-based criteria than diagnosis codes: 1.98 (95% CI = 1.90 to 2.05) versus 1.06 (95% CI = 1.00 to 1.11), respectively; P<0.001. Only 45.4% of CKD incidences identified using eGFR-based criteria had a corresponding diagnosis code. Patients who were younger, had a higher CKD stage (G4), had an observed urine albumin-to-creatinine ratio (A1), or no observed HbA1c in the past year were more likely to have a diagnosis code. Conclusion Diagnosis coding of patients with eGFR-based evidence of CKD in UK primary care is poor within patients with T2D, despite CKD being a well-known complication of diabetes.

the general population, with diabetes considered as a subgroup (16,17). However, people with T2D engage with primary care services differently from the general population. It is therefore important to quantify how well CKD is coded in a T2D population, within the context of the clinical guidelines relevant to this group. This study aims to establish: the incidence of CKD estimated using eGFR-based evidence and/or diagnosis codes, the proportion of eGFR-based CKD incidences that have a diagnosis code, and the timeliness of CKD diagnosis coding in a cohort of patients with T2D.

Methods
This is a retrospective cohort study using routinely collected UK primary care data from 60 general practitioner (GP) practices across England between February 2012 and December 2022.

Cohort
All patients with T2D were considered for inclusion in the study cohort and considered at risk of incident CKD, regardless of whether their serum creatinine was measured throughout follow-up, providing an intention-to-treat population estimate. CKD incidence estimates were therefore not impacted by non-adherence to clinical guidelines for the monitoring and diagnosis of kidney disease.
A burn-in period was defined from the start of data collection (February 2012) to 5th April 2015 to exclude patients with pre-existing CKD (either eGFR-based or coded) prior to the study index date (6th April 2015); any patients with evidence of CKD within the burn-in period or prior to their diagnosis of T2D were excluded. The study period ran from the 2015/16 to 2020/21 fiscal years.
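As an illustration of how events might be assigned to fiscal years, the sketch below assumes that each fiscal year runs from 6th April to 5th April, which is inferred from the index date above rather than stated explicitly. The function name `fiscal_year` is hypothetical.

```python
from datetime import date


def fiscal_year(d: date) -> str:
    """Label a date with its fiscal year, e.g. '2015/16'.

    Assumption: fiscal years start on 6 April, matching the study
    index date of 6 April 2015.
    """
    start_year = d.year if (d.month, d.day) >= (4, 6) else d.year - 1
    return f"{start_year}/{(start_year + 1) % 100:02d}"


# The last day of the burn-in period and the index date fall in
# different fiscal years under this assumption.
burn_in_end = fiscal_year(date(2015, 4, 5))
index_day = fiscal_year(date(2015, 4, 6))
```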

Definitions
eGFR-based CKD was ascertained using repeated serum creatinine or eGFR measurements, with eGFR calculated from serum creatinine using the 2021 CKD-epi formula (19) without racial adjustment. If both an eGFR and serum creatinine measurement existed on the same day for a patient, the serum creatinine measurement was retained and used to calculate eGFR. Patients were classed as having eGFR-based CKD if they had 2 or more eGFR measurements below 60 mL/min/1.73 m² at least 90 days apart, up to a maximum of 15 months apart. Any eGFR measurements occurring between the dates of those observations were required to have a median below 60 mL/min/1.73 m². An upper limit on the time between qualifying eGFR measurements was imposed to distinguish between a sustained drop in kidney function and two acute episodes of kidney dysfunction. This upper limit was set to 15 months between measures to allow identification of eGFR-based CKD using measurements obtained at two consecutive diabetic annual reviews.
Diagnosis codes were identified using Read and SNOMED terminologies. CKD diagnoses of stages G3-5 (eGFR < 60) were classed as CKD within this analysis. Codelists are provided in Supplementary Tables 4, 5 & 6.
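The eGFR-based definition can be sketched as a search for the first pair of qualifying measurements. This is a minimal illustration rather than the study code: the function name, the conversion of 15 months to 456 days, and the choice to return the pair with the earliest second measurement are all assumptions.

```python
from datetime import date
from statistics import median

THRESHOLD = 60.0     # mL/min/1.73 m^2
MIN_GAP_DAYS = 90
MAX_GAP_DAYS = 456   # assumption: 15 months taken as 15 * 365.25 / 12 days


def first_qualifying_pair(measurements):
    """Return (first_date, second_date) of the earliest qualifying pair, or None.

    `measurements` is a list of (date, egfr) tuples for one patient.
    A pair qualifies if both values are below the threshold, the gap is
    between 90 days and 15 months, and any measurements strictly between
    the two dates have a median below the threshold.
    """
    obs = sorted(measurements)
    for j in range(len(obs)):          # candidate second measurement
        if obs[j][1] >= THRESHOLD:
            continue
        for i in range(j):             # candidate first measurement
            if obs[i][1] >= THRESHOLD:
                continue
            gap = (obs[j][0] - obs[i][0]).days
            if not MIN_GAP_DAYS <= gap <= MAX_GAP_DAYS:
                continue
            between = [v for d, v in obs if obs[i][0] < d < obs[j][0]]
            if between and median(between) >= THRESHOLD:
                continue
            return obs[i][0], obs[j][0]
    return None
```

Under this sketch, two low results 121 days apart qualify, whereas the same two results separated by a single intervening result of 75 mL/min/1.73 m² do not.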

Statistical analysis
The number of patients with a diagnosis code and/or meeting the eGFR-based criteria for CKD were estimated for each included fiscal year, to align with the financial incentives and audits within the National Health Service. We further present the number (and percentage) of patients that have at least one observation of eGFR (or serum creatinine) during that year, and how many of these have at least one eGFR below 60 mL/min/1.73 m². To estimate the annual incidence of CKD, three "at-risk" cohorts of patients were established (one for each definition of CKD) for each fiscal year from 2015/16 to 2020/21. Patients are considered at-risk if they are registered with a participating GP practice with a diagnosis of type 2 diabetes before the end of the fiscal year, and no prior evidence of CKD. Evidence of CKD was defined as having: Incidence rates are presented per 100 person-years along with Poisson-based 95% confidence intervals. Follow-up began at the latest of the index date, or date of type 2 diabetes diagnosis. Follow-up ended at the earliest of death, CKD incidence, GP practice deregistration, or the end of the study period.
We performed a sensitivity analysis on the eGFR formula, using the CKD-epi 2009 (20) and Modification of Diet in Renal Disease (MDRD) (21) formulae. We described the changes in the eGFR estimates and their impact on CKD incidence rates.
For eGFR-based CKD, we estimated the median time interval between the two qualifying eGFR measurements. We quantified the number and proportion of patients with eGFR-based CKD that also received a CKD diagnosis code, and summarised the attributes of patients with and without CKD diagnosis codes descriptively using the median and interquartile range for continuous measures, and count and percentage for categorical measures: age, gender, deprivation, duration of diabetes, eGFR-stage, time between qualifying eGFRs, UACR, HbA1c control, cardiovascular disease and indicators of medication prescribing. UACR, HbA1c, and prescribing indicators were extracted from the year prior to first qualifying eGFR. Logistic regression was used to identify which patient attributes were associated with entry of a CKD diagnosis code. Gender, eGFR, UACR, deprivation decile, HbA1c control, cardiovascular disease and medication indicators were included as categorical predictors, and age, duration of diabetes, and time between qualifying eGFRs were included as continuous predictors. Deprivation decile is assigned using the postcode of a patient's GP practice, from 1 = most deprived to 10 = least deprived.
For patients that met the eGFR-based criteria and had a CKD diagnosis code during follow-up, we quantified the number and proportion of patients that received a diagnosis code before and after the second qualifying eGFR observation. When a patient met the eGFR-based criteria and had a CKD diagnosis code after their second qualifying measurement, we defined this as clinician-verified CKD. The proportion of patients with eGFR-based CKD that had clinician-verified CKD was estimated and the time to verification was summarised using the median and interquartile range.
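The follow-up rules and the incidence rate calculation can be sketched as follows. The study end date (5th April 2021) is inferred from the fiscal years stated above, and the confidence interval uses a log-scale normal approximation to the Poisson distribution; the exact interval method used in the analysis is not stated, so both are assumptions, as are the function names.

```python
from datetime import date
from math import exp, sqrt

INDEX_DATE = date(2015, 4, 6)
STUDY_END = date(2021, 4, 5)   # assumption: last day of the 2020/21 fiscal year


def person_years(t2d_diagnosis, end_events):
    """Follow-up in years for one patient.

    Starts at the latest of the index date and the T2D diagnosis date.
    Ends at the earliest of the supplied event dates (death, CKD
    incidence, deregistration; None if not observed) and the study end.
    """
    start = max(INDEX_DATE, t2d_diagnosis)
    end = min([d for d in end_events if d is not None] + [STUDY_END])
    return max((end - start).days, 0) / 365.25


def incidence_per_100py(events, total_person_years, z=1.96):
    """Rate per 100 person-years with an approximate Poisson 95% CI.

    Uses rate * exp(+/- z / sqrt(events)), a common large-sample
    approximation; requires at least one event.
    """
    rate = 100 * events / total_person_years
    factor = exp(z / sqrt(events))
    return rate, rate / factor, rate * factor
```

For example, with hypothetical inputs of 200 events over 10,000 person-years, `incidence_per_100py(200, 10_000)` returns a rate of 2.0 per 100 person-years together with its lower and upper bounds.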
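To show how the choice of formula can move a patient across the 60 mL/min/1.73 m² threshold, the sketch below implements the published creatinine-based CKD-EPI 2021, CKD-EPI 2009 and IDMS-traceable MDRD equations. Race coefficients are omitted from all three; the text above confirms this only for the 2021 formula, so it is an assumption for the other two. Serum creatinine is taken in mg/dL, with a helper for conversion from µmol/L; the function names are illustrative.

```python
def umol_to_mg_dl(scr_umol_l):
    """Convert serum creatinine from umol/L to mg/dL."""
    return scr_umol_l / 88.4


def ckd_epi_2021(scr_mg_dl, age, female):
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    r = scr_mg_dl / kappa
    egfr = 142 * min(r, 1) ** alpha * max(r, 1) ** -1.200 * 0.9938 ** age
    return egfr * 1.012 if female else egfr


def ckd_epi_2009(scr_mg_dl, age, female):
    # Race coefficient omitted (assumption for this sketch).
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    r = scr_mg_dl / kappa
    egfr = 141 * min(r, 1) ** alpha * max(r, 1) ** -1.209 * 0.993 ** age
    return egfr * 1.018 if female else egfr


def mdrd(scr_mg_dl, age, female):
    # IDMS-traceable four-variable form; race coefficient omitted (assumption).
    egfr = 175 * scr_mg_dl ** -1.154 * age ** -0.203
    return egfr * 0.742 if female else egfr


# Hypothetical patient: 65-year-old man, serum creatinine 1.3 mg/dL.
estimates = {
    "CKD-epi 2021": ckd_epi_2021(1.3, 65, False),
    "CKD-epi 2009": ckd_epi_2009(1.3, 65, False),
    "MDRD": mdrd(1.3, 65, False),
}
```

For this hypothetical patient the 2021 estimate sits just above 60 mL/min/1.73 m² while the other two fall below it, which illustrates why the older formulae identify more cases.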
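The timeliness summary can be sketched as below. The sketch treats a code entered on or after the second qualifying eGFR as clinician-verified, following the wording of the Table 3 caption, and converts days to months using 365.25 / 12 days per month; both choices, and the function name, are assumptions.

```python
from datetime import date
from statistics import median, quantiles

DAYS_PER_MONTH = 365.25 / 12   # assumption used to express delays in months


def summarise_time_to_code(pairs, windows=(30, 90)):
    """Summarise delays from the second qualifying eGFR to the diagnosis code.

    `pairs` is a list of (second_qualifying_date, code_date) tuples for
    patients with both eGFR-based CKD and a diagnosis code. At least two
    clinician-verified patients are needed to compute quartiles.
    """
    delays = [(code - second).days for second, code in pairs]
    verified = [d for d in delays if d >= 0]   # coded on or after second eGFR
    q1, _, q3 = quantiles(verified, n=4)
    return {
        "n_before": len(delays) - len(verified),
        "n_verified": len(verified),
        "median_months": median(verified) / DAYS_PER_MONTH,
        "iqr_months": (q1 / DAYS_PER_MONTH, q3 / DAYS_PER_MONTH),
        "share_within": {
            w: sum(d <= w for d in verified) / len(verified) for w in windows
        },
    }
```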
All analyses were conducted in R version 4.2.3.

Figure 1: Flowchart of patient inclusions and exclusions
A total of 32,276 patients had T2D and were at risk of CKD (Figure 1). Of these, 3.007 × 10⁴ (93.2%) had at least one eGFR or serum creatinine measurement recorded during follow-up. 13,945 patients (43%) were receiving care from a GP practice in a highly deprived area (IMD = 1 or 2) and the majority of patients had been first diagnosed with T2D in the 2 years prior to study entry (median 1.8 years; IQR: 0, 7 years), with an average follow-up of 5 years (IQR: 2.31, 6.00) (Table 1).

Diagnostic coding of patients meeting eGFR-based criteria
2,667 patients (8.3%) met the eGFR-based criteria for CKD during the study period. Of these, 54.6% of patients did not have a corresponding diagnostic code during follow-up (Supplementary Table 2).
Patients that had eGFR-based evidence of CKD were more likely to have a diagnosis code if they were younger, had a higher G-stage of CKD at the first qualifying eGFR observation, an observed UACR (stage A1) or no observed HbA1c in the past year (Table 2).

Timeliness of diagnosis coding
The majority of patients (55.2%) with both eGFR-based CKD and a CKD diagnosis code received their diagnosis code after their second qualifying eGFR measurement (Table 3), and the median time from the second qualifying eGFR to entry of a diagnosis code was 9.79 months (IQR: 1.18, 24.34). 23.8% of patients with clinician-verified CKD received a diagnosis code within 30 days of meeting the eGFR-based criteria, 31.6% within 90 days, and 40.1% and 56.1% within 6 and 12 months respectively.
469 patients had at least one additional eGFR measurement between their CKD-qualifying eGFR less than 60 mL/min/1.73 m² and entry of a CKD diagnosis code. At least one measurement of an eGFR less than 60 mL/min/1.73 m² was observed in 4,351 patients. However, only 3,000 (68.9%) had a follow-up eGFR (or serum creatinine) within 6 months of their first abnormal result (Supplementary Table 3, Supplementary Figure 1). 94 patients had an eGFR reported on the same date as their diagnosis code, and these were on average 2.36 mL/min/1.73 m² lower than their CKD-qualifying eGFR (Supplementary Figure 3), p = 0.019.

Summary
We have provided concerning evidence of a lack of diagnosis coding of CKD in patients with T2D in UK primary care, with more than half of patients meeting the eGFR-based CKD criteria never receiving a diagnosis code. Of those patients that have a CKD diagnosis code after meeting the eGFR-based criteria, diagnosis coding occurs with a median delay of more than 9 months. CKD incidence was severely underestimated when using CKD diagnosis codes alone, which prevents reliable quantification of the epidemiology and burden of CKD.

Strengths and Limitations
To the best of our knowledge, this is the first study to quantify the rates of CKD diagnosis coding for new incidences of CKD in a type 2 diabetic population in UK primary care. We provide an analysis of recent data that covers the onset of the COVID-19 pandemic and therefore any associated impact on the management and identification of CKD. We hypothesise that the observed drop in eGFR measurement rates can be attributed to disruptions to the provision of routine care caused by the COVID-19 pandemic (22). A limitation to this work is that our data only covers primary care. Patients may be referred to and managed within specialist secondary care centres upon exhibition of kidney function impairment. However, these referrals should be evidenced within the patient's primary care record. Future work should explore how patients diagnosed with CKD in primary care are managed across the care pathway.
A further limitation is that our lab-based CKD definition used only eGFR. Albuminuria and/or an elevated UACR is often an earlier signal for kidney damage than reduced eGFR and is common in patients with T2D. Therefore, further work should also capture patients that present with persistent albuminuria. However, adherence to UACR measurement guidelines is poor, so incidence estimates based on UACR derived using routinely collected data are likely to be unreliable.
A final limitation is that there is considerable variability in how eGFR is estimated by each studied formula (eGFR CKD-epi 2009 and 2021, and MDRD). This raises doubt concerning the reliability and precision of such formulae; however, we have used CKD-epi 2021, which provides the most conservative (i.e. highest) estimates of eGFR. We have therefore mitigated the risk of overdiagnosis of CKD based on eGFR by applying the formula that comparatively detects fewer cases of CKD.

Comparison with existing literature
Similarly to González-Pérez et al (23), we have presented crude estimates of CKD incidence per 100 person-years by ascertainment criteria (eGFR-based CKD, diagnosis coded CKD or either). However, our combined estimates were considerably lower, likely due to several differences in our criteria definitions; our study does not identify cases using UACR or albuminuria measurements, which accounted for 49% of González-Pérez et al's included incidences. Furthermore, their work did not impose an upper limit on the time between qualifying eGFR measurements.
Others have reported higher estimates of the proportion of lab-based CKD cases with a corresponding diagnosis code, ranging from 57.5% to 77.7% for diabetic patients in UK primary care (16,17). However, these studies covered an earlier timeframe, focused on prevalence rather than incidence, and defined eGFR-based CKD differently to the present study.

Implications for research and practice
A lack of, or delay in, identification and diagnosis coding of CKD could lead to improper management of the condition. CKD progression is strongly associated with poor clinical outcomes and has a significant economic burden. CKD awareness remains profoundly low, in part because CKD is usually silent until its late stages. Physician awareness of CKD is critical in the early identification of the condition as well as the early implementation of evidence-based therapies that can slow progression of kidney dysfunction, prevent metabolic complications, and reduce cardiovascular-related outcomes. Tools including automated CKD patient registry/diagnostic coding within electronic health records to identify and prioritise patients for early intensive management can help to overcome the clinical inertia we see in CKD management (24).

Conclusion
This study has provided evidence that CKD is poorly coded in primary care for people with type 2 diabetes, despite annual kidney function testing, which could lead to improper care and delayed intervention. The reasons for poor coding require further investigation, and emphasis should be placed on examining historic test results for CKD to improve diagnosis coding.

Funding
This study was funded by Gendius Ltd.

Competing interests
The authors have no competing interests to report.

Ethical approval
No ethical approval was required for this study as there was no patient contact and all study data was anonymised.

Table 1: Characteristics of patient cohort at the time of cohort entry

Incidence estimates using only CKD diagnosis codes significantly underestimated the overall incidence, as estimated using the composite criteria, across all annual estimates. Incidence estimates using only the eGFR-based criteria were not significantly different from the composite criteria in fiscal years 2018/19, 2019/20 and 2020/21 (Figure 2A, Supplementary Table 1). Approximately 1-in-4 diagnosis code cases did not have evidence of eGFR-based CKD (N=411, 28.1%). By contrast, more than 1-in-2 eGFR-based cases did not have a corresponding diagnosis code (N=1,457, 54.6%). These findings were consistent over time (Supplementary Table 1). Using either the CKD-epi 2009 or MDRD formulae to estimate eGFR resulted in a higher proportion of eGFR-based cases without a corresponding diagnosis code (59.2% and 61.7% respectively). There was a statistically significant drop (p<0.001) in the proportion of patients receiving at least one eGFR (serum creatinine) measurement in the 2020/21 fiscal year to 77.0% (95% CI: 76.5%, 77.5%), from 85.8% (95% CI: 85.4%, 86.2%) in 2019/20. Prior to 2019/20, the proportion of eligible patients with at least one valid eGFR or serum creatinine measurement remained over 80% (Figure diagnosis code. The CKD incidence estimates using eGFR-based criteria only and diagnosis codes only were 1.98 (95% CI: 1.90, 2.05) and 1.06 (95% CI: 1.00, 1.11) respectively. The combined incidence estimate across all fiscal years was significantly higher in the composite criteria than either the eGFR-based criteria (p<0.001) or diagnosis code criteria (p<0.001).

Table 2: Odds ratios (OR) and 95% confidence intervals from a logistic regression model to identify factors associated with diagnostic coding of eGFR-based cases

Table 3: Number of patients with eGFR-based CKD and a CKD diagnosis code, the proportion that occurred on or after their second qualifying eGFR measurement and the time between the second qualifying eGFR measurement and entry of a diagnostic code.