This end-to-end data engineering pipeline loads real Medicare Part D claims data into Snowflake, transforms it with dbt across three layers, and applies an Isolation Forest model to flag the most statistically suspicious 0.5% of 20,935 prescribers nationwide.
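As a rough illustration of the detection step, here is a minimal sketch using scikit-learn's `IsolationForest` with `contamination=0.005`, which matches the 0.5% flag rate. The two features, the synthetic data, and the hyperparameters are assumptions made for the example, not the project's actual feature set or configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for per-prescriber features (NOT the real Part D data):
# column 0 = log of total claims, column 1 = log of total drug cost.
normal = rng.normal(loc=[6.0, 11.0], scale=[0.8, 1.0], size=(1995, 2))
extreme = rng.normal(loc=[6.0, 17.0], scale=[0.5, 0.5], size=(5, 2))
X = np.vstack([normal, extreme])

# contamination=0.005 asks the model to flag roughly 0.5% of the rows.
model = IsolationForest(n_estimators=200, contamination=0.005, random_state=0)
labels = model.fit_predict(X)          # -1 = anomaly, 1 = normal
scores = model.decision_function(X)    # lower = more isolated

flagged = np.flatnonzero(labels == -1)
print(f"flagged {flagged.size} of {X.shape[0]} rows")
```

Rows with a negative `decision_function` value are the ones labelled `-1`, so sorting by `scores` in ascending order puts the most isolated rows first.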
Each of the 20,935 prescribers is plotted by total claim volume against total drug cost. Normal prescribers cluster together. Anomalous ones, shown in red, sit far from the group, where the algorithm isolates them. Use the filters to explore by state or specialty.
Anomalies are not evenly distributed across states or cost ranges. High-population states produce more flagged cases. Anomalous prescribers cluster at the extreme high end of drug cost — billing far beyond their peers.
An oncologist billing $10,000 per claim may be entirely normal — cancer drugs are expensive. The same cost from a family practice doctor is a red flag. Anomaly rates vary significantly across specialties.
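One way to make the specialty context concrete is to compare each prescriber's average cost per claim with the median for the same specialty. The sketch below uses hypothetical rows and a simple ratio to the median. It illustrates the idea only and is not necessarily how this project accounts for specialty.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (prescriber, specialty, average drug cost per claim in USD).
rows = [
    ("A", "Oncology", 10_000.0),
    ("B", "Oncology", 9_000.0),
    ("C", "Oncology", 11_500.0),
    ("D", "Family Practice", 80.0),
    ("E", "Family Practice", 120.0),
    ("F", "Family Practice", 10_000.0),
]

# Group costs by specialty and take the median of each group.
by_specialty = defaultdict(list)
for _, specialty, cost in rows:
    by_specialty[specialty].append(cost)
medians = {spec: median(costs) for spec, costs in by_specialty.items()}

# Ratio of each prescriber's cost to the median of their own specialty.
ratios = {name: cost / medians[spec] for name, spec, cost in rows}

for name, spec, cost in rows:
    print(f"{name} ({spec}): {ratios[name]:.1f}x specialty median")
```

In this toy data, prescribers A and F bill the same $10,000 per claim, yet A sits at the oncology median while F is far above the family practice median.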
The table is sorted by anomaly severity, so the most statistically isolated prescribers appear first. A lower anomaly score indicates greater isolation from the normal cluster. These cases warrant human investigation before any conclusions are drawn.
| Name | Specialty | State | Claims | Drug cost | Avg / claim | Drugs | Score |
|---|---|---|---|---|---|---|---|
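The default ordering of this table, lowest score first, can be sketched in a few lines. The names and scores below are hypothetical. The `score < 0` cutoff follows the convention of scikit-learn's `decision_function` and is an assumption about how the scores here were produced.

```python
# Hypothetical scored rows: (prescriber, anomaly score). Lower = more isolated.
scored = [("P1", 0.12), ("P2", -0.21), ("P3", -0.05), ("P4", 0.03)]

# Ascending sort puts the most isolated prescribers at the top.
ranked = sorted(scored, key=lambda row: row[1])

# Assumed cutoff: negative scores are treated as flagged.
flagged = [name for name, score in ranked if score < 0]

for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```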