7 Easy Techniques to Detect Anomalies in Pandas for Data Analysis
Data analysis is an exciting journey, but it comes with its challenges. One of the biggest hurdles is identifying anomalies—unexpected results that can distort our conclusions and predictions.
Claude Paugh
May 14 · 4 min read
20 views


Apache Iceberg and Pandas Analytics: Part II
As I indicated in Part I, I built some basic examples with PyIceberg and Python to learn more and to exercise some of the functionality it offers. I started with data that I collect from time to time for securities, mostly common stocks, along with various twelve-month key metrics and analyst forecasts. This extends my SEC filings collection, which I cover in a running series of articles. I use this particular data to build out details for securities in my Neo4j g
Claude Paugh
May 9 · 13 min read
334 views


Gathering Data Statistics Using PySpark: A Comparative Analysis with Scala
Data processing and statistics gathering are essential tasks in today's data-driven world. Engineers frequently find themselves choosing between tools like PySpark and Scala when embarking on these tasks.
Claude Paugh
Apr 15 · 5 min read
11 views


Harnessing the Dask Python Library for Parallel Computing
Dask is a flexible library for parallel computing in Python. It is designed to scale from a single machine to a cluster of machines seamlessly. By using Dask, you can manage and manipulate large datasets that are too big to fit into memory on a single machine.
Claude Paugh
Apr 15 · 5 min read
6 views