
ABOUT
Hello,
My name is Claude Paugh, and I have more than 25 years of experience in the technology industry. My career started in technology infrastructure and networking, then moved into software engineering. I moved from Canada to the United States during the dot-com boom, and for the last 18+ years I have concentrated on Data Architecture and Engineering.
Some of my career and project highlights are below. You can find me on LinkedIn and on our company page. You can also ask our AI chatbot if you would like more details.
Best,
Claude
Competency Areas
Data Architecture
- Architecture and implementation of a data lake on AWS S3 and Redshift Spectrum, sourcing data from Salesforce, Five9, the Bing API, the Google Analytics API, Pardot, structured files (JSON, CSV, XML), and PostgreSQL relational databases.
- Informatica MDM metadata management and infrastructure deployment, covering ETL, data analysis, and the capture of business data elements and lineage.
- Created development methodology improvements that improved data quality and data provisioning during development and testing cycles: data quality went from poor to superior, and delivery times dropped from 3 days to 2 hours.
- Near real-time data integration using Python with Salesforce CRM, plus dimensional modeling, requirements capture, and database design for an analytics data warehouse on AWS Redshift.
- Created solution reference architectures and implementations for data integration services and event-based ETL on AWS (Talend, Redshift, S3, JMS, Apache ServiceMix).
- Data lake design for PB-scale ingestion of streaming (Kinesis) data for a worldwide streaming service, including a minute-level partitioning strategy and data-change handling for Parquet (see the illustrative sketch after this list).
- Processing optimizations and architecture enhancements to ensure scalability and preserve time-series values during ML model changes.
- Constructed a proof-of-concept web services prototype for data services using Java and Python.
- Development of policies, practices, and contracts for consumer engagement with data interfaces.
- Developed data modeling conventions and design pattern guidelines for relational and multi-dimensional databases.
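For illustration, below is a minimal sketch of what a minute-level partitioning strategy for streaming data written to Parquet can look like in PySpark. The bucket names, paths, and column names are hypothetical stand-ins rather than the production layout, and the real pipeline ingested from Kinesis rather than pre-landed JSON.

# Illustrative sketch: minute-level partitioning of streaming events into Parquet on S3.
# Bucket names, paths, and columns are hypothetical; the production source was Kinesis.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("minute-partitioned-lake").getOrCreate()

# Assume raw events have landed as JSON with an ISO-8601 event_time field.
events = spark.read.json("s3://example-raw-zone/events/")

partitioned = (
    events
    .withColumn("event_ts", F.to_timestamp("event_time"))
    .withColumn("dt", F.date_format("event_ts", "yyyy-MM-dd"))
    .withColumn("hh", F.date_format("event_ts", "HH"))
    .withColumn("mi", F.date_format("event_ts", "mm"))
)

# Partitioning down to the minute keeps late-arriving data changes confined to small
# Parquet partitions that can be rewritten without touching the rest of the table.
(
    partitioned.write
    .mode("overwrite")
    .partitionBy("dt", "hh", "mi")
    .parquet("s3://example-curated-zone/events/")
)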
Problem Solving
- Targeted difficult business process re-engineering analyses and application performance issues. Managed triage and resolution of performance challenges, delivering performance gains of several orders of magnitude.
- Project management of performance testing for a $50 million project covering business operations for pricing of asset management products at a 1T+ AUM firm. Managed a combined onshore and offshore team specializing in performance testing.
- Led the business intelligence governance team at a large financial institution and set direction for corporate roadmaps. Responsible for updates to company SDLC methodologies to include development deliverables for data (Agile and Waterfall). Led governance of BI tools and best-practice adoption, including commercial and open source products.
Modeling and Analytics
- Performed conceptual, logical, and physical modeling for multiple projects over my career.
- Master data schema design for securities, holdings/positions, and application development to calculate derived risk exposure analytics across nested levels of portfolios. The analytics calculations covered all firm investment portfolios for a large (200B AUM) asset manager.
- Developed a prototype analytics engine using Python and the Dask libraries for a large multi-national financial institution: a proof-of-concept architecture for a distributed Python analytics environment, including integration with Azure (see the illustrative sketch after this list).
- Analytics development for dataset customization, preparation, and aggregation using Python.
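As an illustration of the Dask proof of concept, here is a minimal sketch of a distributed aggregation over portfolio positions. The cluster setup, file paths, and column names are hypothetical examples, not the institution's actual data.

# Illustrative sketch: distributed dataset preparation and aggregation with Dask.
# Paths and column names are hypothetical stand-ins for the actual portfolio data.
import dask.dataframe as dd
from dask.distributed import Client

client = Client()  # starts a local cluster; a remote scheduler address could be passed instead

positions = dd.read_parquet("data/positions/*.parquet")

# Aggregate market value by portfolio and security type across partitions,
# then materialize the (small) result on the client.
exposure = (
    positions
    .groupby(["portfolio_id", "security_type"])["market_value"]
    .sum()
    .compute()
)
print(exposure.head())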
Engineering
- Database design, including SQL performance tuning, physical database design, and development for highly critical databases that delivered market-critical data within short time windows.
- Developed customized data pipelines with Apache Kafka for analytics and machine learning (ML) development with Python (see the first sketch after this list). Development on Apache Spark clusters for exceptionally large (50TB) data sets for PII encryption; Spark cluster sizes reached 62 nodes, 1,950 CPUs, and 10TB of RAM.
- Schema design and implementation on Redshift clusters for a multi-PB database optimized to handle trillions of rows growing at 25 percent annually.
- Implemented predictive machine learning credit settlement models in Python using pandas, NumPy, and scikit-learn; these models supported core business revenue generation (see the second sketch after this list).
- Google Cloud Platform (GCP) end-to-end pipelines constructed using Python, Kubernetes (GKE), GCP Cloud Functions, Storage Transfer Service (STS), Google Cloud Storage (GCS), and Weka storage appliances.
- Many years of Oracle and DB2 physical design and tuning for highly available, performant applications (thousands of TPS).
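First, a minimal sketch of the Kafka-to-Python consumption step behind pipelines like the ones above. The broker address, topic, group id, and the choice of the kafka-python client are assumptions for illustration only.

# Illustrative sketch: consuming events from Kafka into Python for downstream analytics.
# Broker, topic, and group id are hypothetical; kafka-python is one of several workable clients.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trade-events",                        # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    group_id="analytics-feature-builder",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Hand each event to downstream feature engineering / ML preparation here.
    print(event.get("event_type"), event.get("amount"))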
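Second, a minimal sketch of the pandas and scikit-learn workflow behind a predictive settlement model. The input file, feature names, target column, and model choice are illustrative assumptions, not the production model.

# Illustrative sketch: training a predictive settlement model with pandas and scikit-learn.
# The input file, feature names, target column, and model choice are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("settlements.csv")  # hypothetical training extract

features = ["exposure_amount", "days_past_due", "prior_defaults", "credit_score"]
X = df[features]
y = df["settled"]  # 1 if the credit was settled, 0 otherwise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

probabilities = model.predict_proba(X_test)[:, 1]
print("hold-out AUC:", roc_auc_score(y_test, probabilities))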