CMS Medicare Part D  ·  2023  ·  500,000 Claims

Detecting anomalous prescriber billing patterns

An end-to-end data engineering pipeline that loads real Medicare Part D claims data into Snowflake, transforms it with dbt across three layers, and applies an Isolation Forest model to surface the most statistically anomalous 0.5% of 20,935 prescribers nationwide.
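To make the isolation idea concrete, here is a minimal from-scratch sketch in plain Python. It is not the pipeline's code: a real pipeline would more likely call a library implementation such as scikit-learn's `IsolationForest`. The two features (total claims, total drug cost), the synthetic prescribers, and every function name below are assumptions for illustration. Scores are negated so that lower means more isolated.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature cannot be split
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score per row in [-1, 0); lower means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))  # assumes at least 2 rows
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for row in rows:
        mean_depth = sum(path_length(row, t) for t in trees) / n_trees
        scores.append(-(2.0 ** (-mean_depth / norm)))
    return scores


# Synthetic data: 200 ordinary prescribers plus one extreme biller (all values made up).
rng = random.Random(42)
rows = []
for _ in range(200):
    claims = max(1.0, rng.gauss(500, 100))
    rows.append((claims, claims * max(1.0, rng.gauss(80, 10))))
rows.append((4000.0, 2_000_000.0))

scores = anomaly_scores(rows)
most_isolated = min(range(len(rows)), key=lambda i: scores[i])
print(rows[most_isolated], round(scores[most_isolated], 3))
```

The extreme biller is separated from everyone else by almost any random split, so its average path is short and its score is the lowest. Ordinary prescribers need many more splits before they stand alone.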

Snowflake  ·  dbt  ·  Isolation Forest  ·  Great Expectations  ·  Apache Airflow  ·  Python
Prescribers analyzed  ·  2023 dataset
Anomalies detected  ·  flagged for review
Anomaly rate  ·  within 0.6% threshold
Medicare spend  ·  total analyzed

Every dot is a doctor.

Each of the 20,935 prescribers is plotted by total claim volume against total drug cost. Normal prescribers cluster together. Anomalous ones, shown in red, sit far from the group, isolated by the algorithm. Use the filters to explore by state or specialty.

Chart: Total claims vs total drug cost (legend: normal prescriber, flagged anomaly)

Where anomalies concentrate.

Anomalies are not evenly distributed across states or cost ranges. High-population states produce more flagged cases, and anomalous prescribers cluster at the extreme high end of drug cost, billing far beyond their peers.

Chart: Anomalies by state (top states by flagged prescriber count)
Chart: Drug cost distribution (normal vs anomalous prescribers)

Specialty context matters.

An oncologist billing $10,000 per claim may be entirely normal — cancer drugs are expensive. The same cost from a family practice doctor is a red flag. Anomaly rates vary significantly across specialties.
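One way to encode that context is to compare each prescriber's average cost per claim with the median for the same specialty. The sketch below is an illustration only; the page does not say the pipeline computes this ratio, and the record fields, names, and dollar figures are all made up.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: average drug cost per claim, by prescriber.
prescribers = [
    {"name": "onc_1", "specialty": "Oncology", "avg_cost_per_claim": 8000.0},
    {"name": "onc_2", "specialty": "Oncology", "avg_cost_per_claim": 10000.0},
    {"name": "onc_3", "specialty": "Oncology", "avg_cost_per_claim": 12000.0},
    {"name": "fp_1", "specialty": "Family Practice", "avg_cost_per_claim": 80.0},
    {"name": "fp_2", "specialty": "Family Practice", "avg_cost_per_claim": 100.0},
    {"name": "fp_3", "specialty": "Family Practice", "avg_cost_per_claim": 120.0},
    {"name": "fp_4", "specialty": "Family Practice", "avg_cost_per_claim": 10000.0},
]


def cost_vs_peers(rows):
    """Map each prescriber name to its cost per claim divided by the specialty median."""
    by_specialty = defaultdict(list)
    for r in rows:
        by_specialty[r["specialty"]].append(r["avg_cost_per_claim"])
    medians = {s: median(costs) for s, costs in by_specialty.items()}
    return {r["name"]: r["avg_cost_per_claim"] / medians[r["specialty"]] for r in rows}


ratios = cost_vs_peers(prescribers)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x specialty median")
```

In this toy data the same $10,000 per claim sits exactly at the oncology median but is roughly ninety times the family practice median, which is the contrast the paragraph above describes.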

Chart: Anomaly rate by specialty (specialties with 50+ prescribers, sorted by anomaly rate)
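A rate like this can be computed with a plain group-and-count, as in the sketch below. The minimum of 50 prescribers comes from the chart label; the function name, the input shape, and the specialty counts are assumptions made up for illustration.

```python
from collections import defaultdict


def anomaly_rate_by_specialty(rows, min_prescribers=50):
    """Return (specialty, prescriber_count, anomaly_rate), highest rate first.

    `rows` is an iterable of (specialty, is_anomaly) pairs. Specialties with
    fewer than `min_prescribers` prescribers are dropped.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for specialty, is_anomaly in rows:
        totals[specialty] += 1
        flagged[specialty] += int(is_anomaly)
    rates = [
        (specialty, n, flagged[specialty] / n)
        for specialty, n in totals.items()
        if n >= min_prescribers
    ]
    return sorted(rates, key=lambda item: item[2], reverse=True)


# Made-up counts: 2 of 100, 3 of 60, and 5 of 10 prescribers flagged.
rows = (
    [("Family Practice", i < 2) for i in range(100)]
    + [("Oncology", i < 3) for i in range(60)]
    + [("Sleep Medicine", i < 5) for i in range(10)]
)

for specialty, n, rate in anomaly_rate_by_specialty(rows):
    print(f"{specialty}: {rate:.1%} of {n} prescribers")
```

The size filter matters because small groups give noisy rates: the ten-prescriber specialty here would show a 50% anomaly rate and dominate the chart if it were kept.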

105 prescribers flagged.

The table is sorted by anomaly severity, with the most statistically isolated prescribers first. A lower anomaly score indicates greater isolation from the normal cluster. These cases warrant human investigation before any conclusions are drawn.
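The figures on this page are consistent with a simple lowest-score cut: 0.5% of 20,935 is 104.675, which rounds to 105. Whether the pipeline selects its flags exactly this way is an assumption. The sketch below uses random placeholder scores and a hypothetical `prescriber_id` field to show the ordering and the count.

```python
import random


def flag_most_isolated(scored, fraction=0.005):
    """Return the `fraction` of records with the lowest scores, most isolated first."""
    k = round(len(scored) * fraction)
    return sorted(scored, key=lambda r: r["score"])[:k]


# Placeholder scores, not real model output.
rng = random.Random(0)
scored = [{"prescriber_id": i, "score": -rng.random()} for i in range(20_935)]

flagged = flag_most_isolated(scored)
print(len(flagged), f"{len(flagged) / len(scored):.2%}")  # 105 0.50%
```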

Columns: Name, Specialty, State, Claims, Drug cost, Avg / claim, Drugs, Score