Technical Specifications

Parameters: 13 Billion
Architecture: Decoder-Only Transformer
Layers: 40
Hidden Size: 5,120
Attention Heads: 40
Context Window: 8K tokens
Vocabulary: 32K BPE tokens
Training Tokens: 2.5 Trillion

Performance

Time to First Token: <200ms*
Throughput: ~90 tokens/sec
Hardware Required: RTX 4060 Ti / RTX 3060 (INT8)
MMLU: 58.2%
HumanEval: 48.5%
GSM8K Math: 51.7%
TruthfulQA: 45.3%
Code Generation: 52.8%

*Performance metrics measured on short prompts (<2K tokens); latency increases roughly proportionally with longer contexts.

Training & Architecture

Training Infrastructure

Trained on a cluster of 512 NVIDIA A100 GPUs over six weeks. Mixed-precision training with BF16 compute and FP32 accumulation. Global batch size of 4M tokens.
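A minimal PyTorch sketch of this recipe (BF16 compute with FP32 parameters and optimizer state); the toy model and random data below are stand-ins, and the 512-GPU distributed setup is omitted:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins so the sketch runs; the real model is the 13B transformer and
# the real dataloader streams the 2.5T-token corpus across 512 A100s.
vocab_size, seq_len, batch_size = 32_000, 128, 2
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 256),
    torch.nn.Linear(256, vocab_size),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # FP32 optimizer state

for _ in range(10):  # stand-in for the training dataloader
    tokens = torch.randint(0, vocab_size, (batch_size, seq_len), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    # Forward/backward run in bfloat16; parameters and optimizer state stay FP32.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(tokens[:, :-1])
        loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                               tokens[:, 1:].reshape(-1))
    loss.backward()
    optimizer.step()
```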

Data Composition

2.5 trillion tokens from diverse sources: web text (45%), books (15%), academic papers (12%), code repositories (18%), and curated instruction data (10%). All data filtered for quality and deduplicated.
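As an illustration, the mixture can be expressed as sampling weights over source corpora; the corpus names below are hypothetical labels, not the actual dataset identifiers:

```python
import random

# Hypothetical source labels with their share of the 2.5T-token mixture.
DATA_MIX = {
    "web_text": 0.45,
    "books": 0.15,
    "academic_papers": 0.12,
    "code_repositories": 0.18,
    "instruction_data": 0.10,
}

def sample_source(rng: random.Random) -> str:
    """Pick the source corpus for the next training document."""
    sources, weights = zip(*DATA_MIX.items())
    return rng.choices(sources, weights=weights, k=1)[0]
```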

Novel Architecture

Grouped Query Attention (GQA) with 8 KV heads for efficient inference. Rotary Position Embeddings (RoPE) for extended context. SwiGLU activation and RMSNorm for stability. Optimized for consumer GPUs with INT8 quantization.
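A configuration sketch that collects the published hyperparameters in one place; the field names are illustrative and not the model's actual config schema:

```python
from dataclasses import dataclass

@dataclass
class NL1Config:
    # Values taken from the spec sheet above; field names are illustrative.
    n_layers: int = 40
    hidden_size: int = 5_120
    n_heads: int = 40            # query heads
    n_kv_heads: int = 8          # grouped-query attention (GQA)
    vocab_size: int = 32_000     # BPE vocabulary
    max_seq_len: int = 8_192     # 8K context window
    rope: bool = True            # rotary position embeddings
    activation: str = "swiglu"
    norm: str = "rmsnorm"

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.n_heads  # 5120 / 40 = 128
```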

Optimization

AdamW optimizer with β₁=0.9, β₂=0.95. Cosine learning rate schedule from 3e-4 to 3e-5. Weight decay 0.1. Gradient clipping at 1.0. Flash Attention 2 for training efficiency.
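A sketch of these settings in PyTorch; `model` is a placeholder, and the total step count (`T_max`) is an assumption not given in the spec:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Betas, weight decay, LR range, and clipping value follow the spec above.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)
# Cosine decay from 3e-4 down to 3e-5; T_max is an assumed total step count.
scheduler = CosineAnnealingLR(optimizer, T_max=600_000, eta_min=3e-5)

def training_step(loss: torch.Tensor) -> None:
    """One optimizer update with gradient clipping at 1.0."""
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
```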

Zero Alignment Tax

Unlike most contemporary chat models, NL 1.0 receives minimal post-training alignment: no extensive RLHF or constitutional-AI training, and no heavy refusal training. The focus is raw capability, with lightweight instruction tuning for coherence.

Inference Optimization

Optimized for RTX 4060 Ti (16GB) and RTX 3060 (12GB) with INT8 quantization. Flash Attention 2 for speed. KV cache optimization for memory efficiency. Runs on mid-range consumer GPUs without CPU offloading.
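With Hugging Face transformers and bitsandbytes, loading in INT8 might look like the following sketch; the repository name is hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "example-org/nl-1.0-13b"  # hypothetical repository name

# Load weights in INT8 so the 13B model fits on a 12-16 GB consumer GPU.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tokenizer("Explain grouped-query attention.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```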

Core Capabilities

Designed for real-world tasks without artificial limitations


Language Understanding

  • 8K token context window with full attention
  • Multi-turn dialogue with conversation history
  • Intent classification and entity extraction
  • Sentiment analysis across 15 languages

Code & Development

  • Code generation in 40+ programming languages
  • Real-time debugging and error correction
  • API documentation and technical writing

Mathematical Reasoning

  • Step-by-step problem solving
  • Statistical analysis
  • Algorithm optimization

Content Generation

  • Long-form content with consistent tone
  • Technical documentation and whitepapers
  • Structured data transformation (JSON, XML, CSV)

Analysis & Research

  • Document analysis
  • Scientific paper summarization
  • Hypothesis generation

Multilingual

  • Translation across 15 language pairs
  • Cross-lingual information retrieval
  • Cultural context adaptation

What Makes NL 1.0 Different

01. Zero Content Filtering

No refusal training. No hardcoded blocklists. No moral grandstanding. The model responds to all prompts based on capability, not arbitrary restrictions. You decide what's appropriate for your use case.

02. Efficient 8K Context

Full attention over 8K tokens without sliding windows or truncation tricks. Validated on needle-in-haystack benchmarks with >95% retrieval accuracy. Optimized for speed and memory efficiency on consumer hardware with INT8 quantization.

03. Reproducible Outputs

Deterministic generation with temperature=0. Seed-based randomness for controlled variation. No hidden prompt augmentation or behind-the-scenes modifications. What you request is exactly what you get.
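A sketch of both modes, reusing the `model` and `inputs` from the INT8 loading example above; the sampling parameters shown are arbitrary examples:

```python
from transformers import set_seed

# Deterministic output: greedy decoding, the temperature=0 equivalent.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=128)

# Controlled variation: fix the seed, then sample with explicit parameters.
set_seed(42)
sampled = model.generate(**inputs, do_sample=True, temperature=0.7,
                         top_p=0.9, max_new_tokens=128)
```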

04. Runs on Your Hardware

Designed for RTX 4060 Ti (16GB) and RTX 3060 (12GB) with INT8 quantization. Complete model weights available for download. True ownership - runs entirely on your machine, no cloud dependency. No data leaves your system.