David Wolf · Project Use Case
AI SECURITY · PRODUCT SECURITY · INTERNAL PRODUCT
Internal Product
Browser-Native Contact Intelligence & Email Discovery Engine
A Hunter.io-style contact intelligence platform using browser-local lookup, company/domain mining, Tranco-scale datasets, H1B/public data, GitHub/arXiv...
Built a browser-native contact intelligence and email discovery engine designed to compete with workflows in the style of Hunter.io, ContactOut, Skrapp, RocketReach, Apollo, and Crunchbase. The system mined and compressed large...
Client
Internal Product / Sales AI Platform
Engagement Type
Internal product data platform
Period
2023–2026
Role
Principal Architect / Data Platform Architect / Sales Intelligence Engineer
Focus Areas
Contact Intelligence, Email Discovery, Company Lookup, Domain Intelligence
The Research Narrative
Strategic Problem
The challenge was building useful contact intelligence without relying on a single proprietary database. Company names, domains, people names, email formats, country-specific naming conventions, industry...
What David Did
Mined and normalized company/domain datasets from Tranco-scale web data and public sources.
What Became Clearer
Created a data-infrastructure case study separate from the outreach/product UX layer.
Consulting Proof
This is evidence of turning messy security telemetry into explainable dashboards, alert-quality improvements, and executive-ready operating views.
The Context
This project is the data moat beneath the Piper Sales AI, recruiting intelligence, staffing automation, and browser-extension workflow systems. It is distinct from Piper: Piper is the outreach and workflow product, while this engine is the contact/company/email intelligence infrastructure. Sources include Tranco top 1M mining, public company datasets, H1B hiring data, Kaggle and open datasets, GitHub Archive mining since 2012, arXiv author/publication mining, and LinkedIn sitemap/profile URL mining. Capabilities built on those sources include global first/last-name inference, email naming convention prediction, a compressed index of 200,000+ domains/companies, in-browser lookup, and Hunter.io-compatible API patterns.
The Challenge
The challenge was building useful contact intelligence without relying on a single proprietary database. Company names, domains, people names, email formats, country-specific naming conventions, industry labels, hiring data, GitHub identities, publication records, and social/profile URLs all need normalization. The system needed to compress the right data for browser use while preserving useful matching, inference, and enrichment behavior.
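As an illustration of the normalization step described above, here is a minimal Python sketch that reduces raw URLs to bare hosts and company names to comparable keys. It is not the project's actual code: the function names and the `LEGAL_SUFFIXES` list are hypothetical, and a real pipeline would need a far larger, country-aware suffix list.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny list of legal-entity suffixes (assumption).
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh", "plc", "sa"}


def normalize_domain(raw: str) -> str:
    """Reduce a URL or host string to a bare lowercase host without 'www.'."""
    text = raw.strip().lower()
    # urlsplit only fills in the host when a scheme or '//' is present.
    if "//" not in text:
        text = "//" + text
    host = urlsplit(text).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.rstrip(".")


def normalize_company(raw: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With keys like these, records for the same company from different sources (for example a hiring dataset and a domain list) can be joined on the normalized value rather than on the raw string.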
What I Did
- Mined and normalized company/domain datasets from Tranco-scale web data and public sources
- Built compressed browser-usable indexes containing 200,000+ domains or companies for instant lookup
- Used public company datasets, H1B hiring data, Kaggle/open datasets, and other firmographic sources to enrich company records
- Explored GitHub Archive mining since 2012 to identify developer, organization, and technology signals
- Explored arXiv author and publication mining to identify STEM, cyber, AI, and technical expert signals
- Mined or processed LinkedIn sitemap/profile/company URL patterns for discovery and enrichment workflows
- Built global first-name and last-name inference across 100+ countries to improve contact/person matching
- Modeled email naming conventions by company and domain to predict likely address patterns
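The email naming convention modeling in the last item can be sketched roughly as follows. This is an assumption-laden illustration, not the engine's method: the `PATTERNS` table, `infer_pattern`, and `predict_email` are hypothetical names, and the pattern set is far smaller than what real-world conventions require.

```python
from collections import Counter

# Hypothetical pattern templates mapping (first, last) to an email local part.
PATTERNS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "firstlast": lambda f, l: f"{f}{l}",
    "flast": lambda f, l: f"{f[0]}{l}",
    "first": lambda f, l: f,
}


def infer_pattern(known):
    """Return the most common pattern among (first, last, email) samples.

    All samples are assumed to belong to the same domain. Returns None
    when no sample matches any known pattern.
    """
    counts = Counter()
    for first, last, email in known:
        f, l = first.lower(), last.lower()
        if not f or not l:
            continue  # skip incomplete names
        local = email.split("@", 1)[0].lower()
        for name, build in PATTERNS.items():
            if build(f, l) == local:
                counts[name] += 1
                break
    return counts.most_common(1)[0][0] if counts else None


def predict_email(first, last, domain, pattern):
    """Build a candidate address for a new person using an inferred pattern."""
    local = PATTERNS[pattern](first.lower(), last.lower())
    return f"{local}@{domain}"
```

A usage example under the same assumptions:

```python
samples = [
    ("Ada", "Lovelace", "ada.lovelace@example.com"),
    ("Alan", "Turing", "alan.turing@example.com"),
    ("Grace", "Hopper", "ghopper@example.com"),
]
pattern = infer_pattern(samples)
candidate = predict_email("Linus", "Torvalds", "example.com", pattern)
```

The result is only a likely address; any real workflow would treat it as a candidate to verify, not as a confirmed contact.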
The Outcome
Created a data-infrastructure case study separate from the outreach/product UX layer.
Research Outcomes
Signal Quality
Improved the trustworthiness of operational security signals
Operational Clarity
Translated complex security data into clearer operating views
Stakeholder Visibility
Made technical risk and status easier to explain
Operational Impact
Turned raw telemetry into actionable security intelligence
Capabilities Demonstrated
Public-Safe Evidence
Shareable insights without sensitive data
Security Analytics
Signal investigation and event analysis
IAM / Access Control
Identity telemetry and access insights
SIEM Alert Debugging
Noise reduction and signal validation
Dashboard Development
Operational and executive views
Executive Reporting
Security data translated for leadership
Telemetry Normalization
Consistent and trusted data
Operational Reporting
Actionable views for security operations
Key Deliverables
- Browser-native contact intelligence architecture
- Company/domain mining pipeline
- Tranco-scale domain processing workflow
- Compressed domain/company browser index
- Public company dataset enrichment workflow
- H1B hiring-data enrichment workflow
- GitHub Archive mining workflow
- arXiv author/publication mining workflow
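The source does not say how the compressed domain/company index is encoded. One common way to shrink a sorted string list while keeping lookups fast is front coding with block heads, sketched below in Python for brevity (a browser build would use JavaScript or TypeScript). `BLOCK`, `build_index`, and `contains` are hypothetical names, and this technique is an assumption rather than the project's confirmed design.

```python
from bisect import bisect_right

BLOCK = 4  # hypothetical block size: every BLOCK-th key is stored in full


def build_index(domains):
    """Front-code a sorted, de-duplicated domain list into blocks.

    Each block keeps its first key in full (the head); later keys are
    stored as (shared_prefix_length, remaining_suffix) relative to the
    previous key in the block.
    """
    keys = sorted(set(domains))
    heads, blocks = [], []
    for i in range(0, len(keys), BLOCK):
        chunk = keys[i:i + BLOCK]
        heads.append(chunk[0])
        encoded, prev = [], chunk[0]
        for key in chunk[1:]:
            n = 0
            while n < min(len(prev), len(key)) and prev[n] == key[n]:
                n += 1
            encoded.append((n, key[n:]))
            prev = key
        blocks.append(encoded)
    return heads, blocks


def contains(index, domain):
    """Binary-search the block heads, then decode one block linearly."""
    heads, blocks = index
    pos = bisect_right(heads, domain) - 1
    if pos < 0:
        return False
    current = heads[pos]
    if current == domain:
        return True
    for shared, suffix in blocks[pos]:
        current = current[:shared] + suffix
        if current == domain:
            return True
    return False
```

Because domains share suffixes more often than prefixes, sorting on reversed labels (for example `com.example`) would likely improve the compression ratio; that variation is left out here to keep the sketch short.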
Tools & Technologies
Consulting Translation
The reusable pattern is not Disney-specific: normalize fragmented security telemetry, debug low-signal alert behavior, build trusted operating views, and give leadership evidence they can act on without exposing sensitive systems.