Bobbie-Model
At its core, Bobbie-Model is a 7-billion-parameter dense transformer developed by an independent research collective. Unlike models that aim to brute-force performance through massive parameter counts or MoE sparsity, Bobbie optimizes for the "sweet spot" of the compute/performance curve: it runs comfortably on a single 24 GB GPU (RTX 3090/4090 or A10G).

In this post, we'll strip down the architecture, analyze its training data strategy, and run benchmarks against comparable 7B models.
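A quick back-of-the-envelope check shows why the single-GPU claim is plausible (this assumes bf16 weights; it is our own arithmetic, not a figure published by the Bobbie team):

```python
# Rough memory estimate for serving a 7B dense model in bf16 on a 24 GB card.
PARAMS = 7e9
BYTES_PER_PARAM_BF16 = 2  # bf16 stores each weight in 2 bytes

weights_gb = PARAMS * BYTES_PER_PARAM_BF16 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")  # ~14 GB, leaving ~10 GB for KV cache and activations
```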
| Stage | Dataset | Tokens | Purpose |
|-------|---------|--------|---------|
| 1 | RedPajama (v2) | 1.2T | Base language modeling |
| 2 | SlimPajama + CodeAlpaca | 400B | Code & reasoning |
| 3 | Synthetic multi-turn chat | 50B | Instruction following |

They explicitly filtered out any data containing eval benchmark examples (MMLU, GSM8K, HumanEval) using 13-gram overlap detection. This means Bobbie's benchmarks are likely not contaminated.
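For readers who want to apply the same kind of filtering to their own corpora, here is a minimal sketch of 13-gram overlap detection. The helper names and the way eval examples are gathered are illustrative assumptions, not the collective's actual pipeline:

```python
from typing import Iterable, Set

def ngrams(text: str, n: int = 13) -> Set[tuple]:
    """Return the set of lowercase word n-grams in a document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_benchmark_index(eval_examples: Iterable[str], n: int = 13) -> Set[tuple]:
    """Collect every 13-gram that appears in any eval example (MMLU, GSM8K, HumanEval, ...)."""
    index: Set[tuple] = set()
    for example in eval_examples:
        index |= ngrams(example, n)
    return index

def is_contaminated(document: str, benchmark_index: Set[tuple], n: int = 13) -> bool:
    """A training document is dropped if it shares any 13-gram with an eval example."""
    return not ngrams(document, n).isdisjoint(benchmark_index)

# Usage: filter a raw corpus before pretraining.
# benchmark_index = build_benchmark_index(all_eval_prompts_and_answers)
# clean_corpus = [doc for doc in raw_corpus if not is_contaminated(doc, benchmark_index)]
```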
4. Performance Benchmarks

We ran Bobbie-7B-Instruct against Llama-3-8B-Instruct and Mistral-7B-v0.3 on an RTX 4090.

| Benchmark | Bobbie-7B | Llama-3-8B | Mistral-7B |
|-----------|-----------|------------|------------|
| MMLU (5-shot) | 64.2 | 66.7 | 63.9 |
| GSM8K (8-shot) | 52.8 | 54.9 | 50.3 |
| HumanEval (pass@1) | 32.5 | 34.2 | 31.8 |
| | 82.3 | 67.1 | 71.4 |
| Inference tokens/sec | 98 | 72 | 88 |
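If you want to reproduce the throughput measurement on your own hardware, a minimal timing loop along these lines should be close to what we did. The checkpoint id below is a placeholder (we are not assuming a particular published repo), and it uses the standard Hugging Face `transformers` generate API:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bobbie-7b-instruct"  # placeholder; substitute the actual checkpoint path or repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Explain the difference between a dense transformer and a mixture-of-experts model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up pass so one-time CUDA initialization does not skew the timing.
model.generate(**inputs, max_new_tokens=32)

torch.cuda.synchronize()
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```

Single-prompt greedy decoding like this measures decode throughput only; batched or speculative setups will report different numbers.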