Quick Answers
Why reduce device model cardinality? Raw device strings have 5,000+ unique values, but most appear rarely. Models trained on sparse data overfit to noise. Keeping only the top 500 devices that cover 90%+ of traffic eliminates sparsity while preserving predictive signal.
Does this lose information? Minimally. Rare devices appear so infrequently (often under 100 times in 200 million impressions) that no model can learn reliable patterns from them. Grouping them into an "other" category prevents overfitting while that category itself becomes large enough to learn meaningful patterns.
How does HypeLab decide which devices to keep? We sort by frequency in training data and keep the minimum set covering 90%+ of traffic. Everything else maps to "other." The threshold is chosen empirically based on coverage analysis, not arbitrary cutoffs.
Machine learning models see features as matrices. When a categorical feature has 5,000 unique values but most appear rarely, the matrix becomes sparse and the model struggles. HypeLab's raw device model data contains over 5,000 unique strings, but our prediction model uses only 500. This 10x reduction in cardinality improves prediction accuracy for crypto advertisers running campaigns across Web3 apps like Phantom, StepN, and Axie Infinity, while keeping inference fast enough for real-time bidding. For advertisers looking to reach crypto audiences effectively, understanding how ad tech optimizations like this work reveals why some platforms deliver better ROI than others.
Why Does High Device Cardinality Hurt Ad Prediction?
Device model strings create a data sparsity problem that directly impacts ad performance. User agents report identifiers like "iPhone14,3" or "SM-G998B" or "Pixel 7 Pro," and HypeLab's training data contains over 5,000 unique device strings collected from millions of impressions across DeFi protocols, NFT marketplaces, and blockchain games.
The distribution is highly skewed. A handful of popular devices (recent iPhones, Samsung Galaxy flagships, Google Pixel phones) account for most traffic. The long tail contains thousands of devices that appear rarely: old phones, regional variants, obscure manufacturers, and corrupted device strings from unusual browser configurations.
If we used all 5,000 device models as a categorical feature, several problems would emerge:
- Sparse learning: A device that appears 50 times in 200 million data points provides almost no learnable signal. The model cannot reliably estimate CTR for that device.
- Overfitting risk: Rare devices with a few accidental clicks appear to have high CTR. The model might predict high value for these devices based on noise.
- Memory and speed: High-cardinality categorical features increase model size and slow down inference, which matters for real-time bidding where latency kills revenue.
- Maintenance burden: New device models appear constantly. A model tied to specific device strings needs constant updates.
How Does HypeLab Analyze Device Coverage?
The solution to sparsity is coverage analysis, a technique that identifies the minimum set of device models covering the vast majority of traffic. This approach is common in ad tech, but HypeLab applies it specifically to Web3 traffic patterns where device distributions differ from traditional web audiences.
The analysis process follows four steps:
- Count occurrences of each device model in training data
- Sort by frequency (descending)
- Compute cumulative coverage as each device model is added
- Find the cutoff where cumulative coverage reaches the target (90%)
For HypeLab's data, this analysis reveals that the top 500 device models cover approximately 92% of all impressions. Adding more device models provides diminishing returns, with device 501 adding perhaps 0.01% coverage and device 1000 adding 0.001%.
The numbers: 500 device models cover 92% of traffic. The remaining 4,500+ device models combined cover only 8%. Many of those 4,500 appear fewer than 100 times each in 200 million data points. For advertisers running campaigns on Uniswap, Aave, or OpenSea, this means predictions are based on robust data, not statistical noise.
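The four-step coverage analysis can be sketched in a few lines of Python. The `build_device_vocab` helper and the toy impression counts below are illustrative, not HypeLab's actual implementation or data:

```python
from collections import Counter

def build_device_vocab(device_strings, target_coverage=0.90):
    """Return the smallest set of device models whose cumulative
    frequency reaches the target share of all impressions."""
    counts = Counter(device_strings)
    total = sum(counts.values())
    vocab = set()
    covered = 0
    # Most common first, so each added device contributes the
    # largest remaining share of traffic.
    for device, n in counts.most_common():
        vocab.add(device)
        covered += n
        if covered / total >= target_coverage:
            break
    return vocab

# Toy example: three common devices dominate; two tail devices are rare.
impressions = (["iPhone14,3"] * 50 + ["SM-G998B"] * 30
               + ["Pixel 7 Pro"] * 15 + ["OldPhone-A"] * 3
               + ["OldPhone-B"] * 2)
vocab = build_device_vocab(impressions, target_coverage=0.90)
# The two rare tail devices fall outside the 90% vocabulary.
```

On real data the same function runs over hundreds of millions of impressions, but the logic is identical: sort, accumulate, cut at the target.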
What Happens to Rare Device Models?
Device models outside the top 500 map to a single "other" category. This is not information loss but noise reduction. The "other" category is large enough (8% of traffic, or about 16 million data points in training) that the model can learn meaningful patterns for it.
What does the model learn about "other" devices? Essentially an average. Users on rare devices behave roughly like typical users, with some adjustment for the characteristics of users who choose obscure devices, whether more technical users, those with older hardware, or users from regions with different device availability.
This approach outperforms the alternatives:
- Keeping all 5,000: Learning noise for rare devices, creating a bloated model with slow inference
- Dropping rare devices: Excluding 8% of traffic from device-based learning entirely
The "other" bucket preserves coverage while eliminating sparsity, a technique that benefits both crypto advertisers seeking accurate predictions and Web3 publishers monetizing diverse global audiences.
Why Did HypeLab Choose 90% Coverage?
The 90% threshold is chosen empirically through experimentation, not set arbitrarily. HypeLab tested different thresholds and evaluated the impact on model accuracy for predicting click-through rates across Web3 ad inventory.
| Coverage | Device Models | Result |
|---|---|---|
| 80% | ~200 | Too aggressive. Loses signal from moderately common devices with distinct behavior patterns. |
| 90% | ~500 | Optimal balance. Captures signal from all common devices while keeping cardinality manageable. |
| 95% | ~1,500 | Diminishing returns. The additional ~1,000 device models add noise without improving validation accuracy. |
| 99% | ~4,000 | Overfitting. Training accuracy improves slightly but validation accuracy degrades. |
The 90% threshold represents the optimal point for HypeLab's traffic distribution. Other blockchain advertising networks with different traffic patterns might find different optimal thresholds, but the principle remains the same: maximize coverage while minimizing sparsity. Many ad networks skip this optimization entirely, leading to bloated models that overfit on rare device data and make unreliable predictions for advertisers.
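The diminishing-returns pattern in the table can be reproduced on a hypothetical Zipf-like traffic distribution (device at rank r receiving roughly 1/r of impressions). The `devices_needed` helper is an illustrative sketch; the exact counts will not match HypeLab's real traffic:

```python
def devices_needed(counts_desc, total, target):
    """Number of top devices needed to reach a coverage target,
    given per-device impression counts sorted in descending order."""
    covered = 0
    for i, n in enumerate(counts_desc, start=1):
        covered += n
        if covered / total >= target:
            return i
    return len(counts_desc)

# Hypothetical Zipf-like traffic over 5,000 devices.
counts = [round(1_000_000 / r) for r in range(1, 5001)]
total = sum(counts)
needed = {t: devices_needed(counts, total, t)
          for t in (0.80, 0.90, 0.95, 0.99)}
# Each extra point of coverage costs steeply more device models.
```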
Where Else Does HypeLab Apply Cardinality Reduction?
Device model is not the only high-cardinality feature that benefits from this technique. HypeLab applies the same coverage-based reduction across multiple categorical features:
- Browser strings: Hundreds of browser/version combinations reduce to the top browsers (Chrome, Safari, Firefox, Edge) plus "other." Version numbers are usually irrelevant for ad prediction.
- OS versions: Dozens of OS versions reduce to major versions (iOS 17, iOS 16, Android 14, Android 13) plus "other." Minor version differences rarely affect clicking behavior.
- Geographic regions: Country-level geography rather than city or region, reducing cardinality while maintaining meaningful signal about user location and economic context.
In each case, the process is the same: measure coverage, find the minimum set that covers 90%+, bucket the rest. This consistency across features helps HypeLab deliver reliable predictions whether advertisers are targeting DeFi users on Arbitrum, NFT collectors on Ethereum, or gamers on Polygon.
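As a sketch of the OS-version reduction described above, a small normalizer might collapse full version strings to major versions before the coverage analysis runs. The regex and the "other" fallback here are assumptions for illustration, not HypeLab's actual parser:

```python
import re

def major_os_version(os_string):
    """Collapse a full version string like 'iOS 17.4.1' to its
    major version 'iOS 17'; unparseable strings fall back to 'other'."""
    m = re.match(r"([A-Za-z ]+?)\s*(\d+)", os_string)
    return f"{m.group(1).strip()} {m.group(2)}" if m else "other"

major_os_version("iOS 17.4.1")   # -> "iOS 17"
major_os_version("Android 14")   # -> "Android 14"
major_os_version("???")          # -> "other"
```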
How Is Cardinality Reduction Implemented in Production?
The cardinality reduction happens in HypeLab's feature preprocessing pipeline, before model training begins:
- Build vocabulary: From historical training data, compute frequency counts for each categorical value
- Determine cutoffs: Apply coverage analysis to find top-N values for each feature
- Create mapping: Build a lookup table mapping known values to indices and unknown values to "other"
- Apply at inference: Use the same mapping at prediction time, ensuring new/unseen values map to "other"
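Steps 3 and 4 reduce to a simple lookup. The `build_mapping` and `encode` helpers below are hypothetical, showing only the known-to-index and unknown-to-"other" behavior, not HypeLab's production code:

```python
OTHER = "other"

def build_mapping(vocab):
    """Assign each in-vocabulary value a stable index, reserving one
    extra index for the shared 'other' bucket."""
    mapping = {value: i for i, value in enumerate(sorted(vocab))}
    mapping[OTHER] = len(mapping)
    return mapping

def encode(value, mapping):
    """Apply the same mapping at inference time; unseen values
    (new devices, corrupted strings) fall into 'other'."""
    return mapping.get(value, mapping[OTHER])

vocab = {"iPhone14,3", "SM-G998B", "Pixel 7 Pro"}
mapping = build_mapping(vocab)
known = encode("iPhone14,3", mapping)       # gets its own index
unseen = encode("ObscurePhone-X", mapping)  # shares the "other" index
```

Because the same mapping object is used for both training and serving, a brand-new device string cannot crash inference; it simply encodes as "other" until the next vocabulary refresh.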
The vocabulary refreshes every two weeks when models are retrained, accounting for new devices entering the market and shifting popularity of existing devices. This cadence balances freshness against stability, ensuring that advertisers on the HypeLab platform benefit from up-to-date predictions without experiencing erratic behavior from constant model changes.
What Happens When New Devices Launch?
When Apple releases a new iPhone or Samsung launches a new Galaxy, the device will not be in HypeLab's vocabulary for the first few weeks (assuming the vocabulary was built before launch). During this period, the new device maps to "other."
This is acceptable for three reasons:
- New devices start rare: Even a popular new iPhone takes time to reach significant traffic share. Initially, "other" category behavior is appropriate.
- Vocabulary refreshes: Within 2 weeks (HypeLab's retraining cycle), the new device will accumulate enough traffic to potentially enter the top 500 if popular.
- Gradual transition: The model smoothly transitions from treating the device as "other" to treating it as its own category as data accumulates.
How Does Cardinality Reduction Improve Model Quality?
Cardinality reduction is not just about efficiency. It directly impacts the quality of predictions that determine which ads win HypeLab's real-time bidding auctions:
- Better generalization: By preventing the model from memorizing rare categories, HypeLab forces it to learn patterns that transfer to new data.
- More stable predictions: Predictions for rare devices come from the robust "other" category rather than noisy single-device estimates.
- Faster iteration: Smaller feature spaces mean faster training, enabling more experimentation and more frequent model updates.
- Interpretability: With 500 device categories instead of 5,000, HypeLab's team can actually analyze device-level patterns in model behavior.
The bottom line: Better predictions mean better ad matching. Advertisers get higher conversion rates, publishers get higher eCPMs, and users see more relevant ads. Feature engineering like cardinality reduction is invisible to end users but drives measurable performance improvements.
How Does Device Engineering Affect Publisher Revenue?
Publishers working with HypeLab, including Web3 apps, blockchain games, and DeFi dashboards, should understand how device information affects their ad revenue:
- Common devices get specific treatment: Traffic from popular devices (recent iPhones, common Android phones) receives device-specific predictions that account for how those users behave.
- Rare devices get reasonable defaults: Traffic from unusual devices is not penalized. It receives predictions based on the "other" category, which reflects average behavior across diverse devices.
- Device is one signal among many: Even for common devices, device model is just one of 25 features. Placement quality, user history, and category matching typically matter more than specific device model.
For publishers with globally diverse audiences, like wallet apps serving users across emerging markets, this approach ensures that traffic from less common devices still monetizes effectively rather than being undervalued due to data sparsity.
What Should Crypto Advertisers Know About Device Targeting?
For crypto advertisers running campaigns for DeFi protocols, NFT projects, or blockchain games, HypeLab's device feature engineering means:
- No over-targeting of device niches: Advertisers cannot accidentally over-bid on users of rare devices based on noisy historical data. The model treats rare devices conservatively.
- Platform-level patterns work: The model learns that iOS users behave differently than Android users, that mobile behaves differently than desktop. These broad patterns are captured even though specific device models are bucketed.
- Focus on what matters: Device model is rarely the most important factor in ad prediction. Category matching, placement quality, and user engagement signals typically dominate. Device model provides marginal additional signal.
This means advertisers can trust that their budget is allocated based on robust signals rather than statistical artifacts, whether they are promoting a new L2 chain, a play-to-earn game, or a crypto exchange.
What Is the General Principle Behind This Approach?
Feature engineering at scale requires accepting that you cannot model everything. The principle behind cardinality reduction is straightforward: capture 90% of the signal with 10% of the complexity. The long tail of rare categories adds noise, not signal. Bucketing it allows the model to focus on learnable patterns.
This principle applies beyond device models. Any high-cardinality categorical feature benefits from the same treatment: measure coverage, find the minimum set for 90%+ coverage, bucket the rest. It is one of the most impactful preprocessing decisions in applied machine learning, and it is why HypeLab's predictions remain accurate and fast even as traffic patterns evolve across the Web3 ecosystem.
Ready to Advertise on a Smarter Ad Platform?
HypeLab is the Web3 ad network that applies rigorous feature engineering to deliver accurate, efficient predictions for crypto advertisers and publishers. Cardinality reduction is just one example of how HypeLab optimizes the entire ad-serving pipeline.
- 500 device categories: Covering 90%+ of traffic with no sparsity problems
- Bi-weekly model updates: Vocabulary refreshes capture new devices as they gain popularity
- Robust predictions: Rare devices get reasonable defaults, not noisy overestimates
- Premium Web3 inventory: Reach users on apps like Phantom, StepN, Axie Infinity, and more
- Dual payment rails: Pay with crypto or credit card, no minimum budget required
Launch your first Web3 ad campaign today and see how intelligent feature engineering translates to better performance for your crypto project. Setup takes minutes, not days, and there is no minimum spend to get started.
Frequently Asked Questions
- Why does HypeLab reduce device model cardinality? Raw device model strings have over 5,000 unique values, but most appear rarely. A model that tries to learn from 5,000 categories will overfit to noise from rare devices. By keeping only the top 500 that cover 90%+ of traffic, we reduce sparsity while maintaining predictive signal for common devices.
- How does HypeLab decide which devices to keep? We sort device models by frequency in training data and keep the minimum set that covers 90%+ of traffic. Everything else maps to an "other" category. The threshold is chosen empirically based on coverage analysis, not arbitrary cutoffs.
- Does this lose information? Minimally. The rare device models we exclude appear so infrequently (often fewer than 100 times in 200 million data points) that the model cannot learn reliable patterns from them anyway. Grouping them into "other" prevents overfitting while the "other" category itself becomes large enough to learn meaningful patterns.