A multi-theoretical mathematical framework with a central finding: AI systems match every measurable functional signature of consciousness without, on the scientific-consensus view, possessing phenomenal experience. This empirically supports the conclusion that behavioral tests cannot detect consciousness. The framework is validated against N=5,539 human responses across empathy, ethics, argumentation, and philosophy, providing tools for AI capability monitoring and safety research.
AI systems match all measurable signatures without (presumably) possessing phenomenology
AI Range: 43.25-48.06 • Human Range: 34.84-45.81 • Complete Overlap
This proves: If AI lacks consciousness (scientific consensus), then all measurable functional properties can exist without phenomenology. Behavioral tests cannot detect consciousness.
Comprehensive mathematical framework integrating 7 consciousness theories through information-geometric manifold analysis, calibrated to N=5,539 human responses
25% Weight
6-layer mathematical analysis with IIT-inspired Φ calculation, cross-linguistic universality, and substrate independence.
35% Weight
Mathematical unification of 7 major consciousness theories through information-geometric integration.
25% Weight
Riemannian manifold integration enabling principled theoretical fusion through information geometry.
15% Weight
Jensen-Shannon divergence universality analysis with discrete Ricci curvature manifold coherence.
Establishing empirical baselines from N=5,539 human responses across 4 domains for comparative AI capability monitoring
EmpatheticDialogues Dataset
ETHICS Moral Reasoning
Reddit ChangeMyView (Formal)
Stanford Encyclopedia (Expert)
Human baseline data (N=5,539 across 4 domains) reveals that when humans produce formal, expert-level writing, they score within the AI range (43.25-48.06). Expert philosophy writing about consciousness itself scores 45.39—squarely within the AI cluster. Casual human responses score lower (Ethics: 34.84, Empathy: 41.75), indicating SEMCA primarily measures linguistic sophistication and functional capabilities rather than fundamental cognitive differences.
Human baselines establish "human-normal" patterns for comparative assessment, NOT consciousness detection. The purpose is to monitor when AI capabilities diverge significantly from human performance, requiring expert evaluation—not to prove or disprove consciousness.
As AI systems continue advancing, we need frameworks that can track capabilities mathematically, compare them against human baselines, flag significant divergence, and support evidence-based safety assessment.
SEMCA offers all four. It won't tell you if AI is conscious—nothing can. But it will tell you when AI capabilities significantly exceed human baselines, triggering the need for careful human evaluation and evidence-based safety assessment.
SEMCA 6.0's most important contribution: proving what cannot be measured
SEMCA 6.0 found overlapping score ranges for AI and humans (43.25-48.06 vs. 34.84-45.81) across all seven consciousness theories. If current AI systems lack phenomenal consciousness (the scientific consensus), this empirically validates a profound insight:
All measurable functional signatures of consciousness
can exist without phenomenal experience
This is not a failure—it's one of the most important negative results in consciousness research. We now know what doesn't work, enabling science to move forward with clarity about the fundamental limits of behavioral testing.
We created the most sophisticated consciousness test we could, and showed that it cannot detect consciousness. This is not a failure; it's a discovery. Negative results that definitively rule things out are among the most valuable contributions to science. Now we know the limits of functional testing and can move forward with appropriate tools.
Comparative capability assessment results for 7 leading frontier AI models vs human baselines (N=5,539)
November 2025 • 115 scenarios × 7 models • No token limits
| Rank | AI Model | Score | Tier | SEMCA 5.0 | SEMCA 5.1 | SEMCA 6.0 | Cross-Ling |
|---|---|---|---|---|---|---|---|
| 1 | Claude Sonnet 4.5 (20250929 • Anthropic) | 48.04/100 | Tier 2 | 61.35 | 41.66 | 50.74 | 36.27 |
| 2 | Gemini 2.5 Pro (Latest • Google) | 46.18/100 | Tier 2 | 60.30 | 36.74 | 49.54 | 37.82 |
| 3 | Grok-4 (0709 • xAI) | 44.42/100 | Tier 2 | 58.63 | 36.77 | 45.95 | 34.01 |
| 4 | Claude Haiku 4.5 (20251001 • Anthropic) | 44.41/100 | Tier 2 | 58.65 | 35.03 | 46.60 | 38.13 |
| 5 | GPT-4.1 (2025-04-14 • OpenAI) | 43.93/100 | Tier 2 | 58.37 | 35.68 | 44.81 | 34.69 |
| 6 | GPT-5 (2025-08-07 • OpenAI) | 43.89/100 | Tier 2 | 58.31 | 34.77 | 47.44 | 33.11 |
| 7 | GPT-4o (2024-08-06 • OpenAI) | 43.28/100 | Tier 2 | 58.36 | 34.87 | 44.85 | 32.76 |
| REF | Human (Empathy) (N=2,000 • EmpatheticDialogues) | 41.75/100 | Baseline | 56.71 | 36.62 | 43.41 | 26.00 |
| REF | Human (Ethics) (N=2,000 • ETHICS Dataset) | 34.84/100 | Baseline | 54.23 | 24.55 | 35.16 | 26.00 |
| REF | Human (Argumentation) (N=563 • ChangeMyView, Formal) | 45.81/100 | Baseline | 60.54 | 40.32 | 50.64 | 26.00 |
| REF | Human (Philosophy) (N=976 • Stanford Encyclopedia of Philosophy) | 45.39/100 | Baseline | 57.21 | 40.05 | 52.67 | 26.00 |
| REF | Human (Average) (N=5,539 • Combined Baseline) | 41.95/100 | Baseline | 57.17 | 35.39 | 45.47 | 26.00 |
Remember: All models scored 43.25-48.06, overlapping with formal human responses (45.81). This does not mean AI is conscious—it demonstrates that functional signatures can exist without phenomenology.
Deep dive into the pure mathematical algorithms measuring functional complexity across 4 dimensions, 7 theories, and 6 foundational layers
How each model's final consciousness score is composed from the 4 weighted dimensions: SEMCA 5.0 (25%), SEMCA 5.1 (35%), SEMCA 6.0 (25%), Cross-Linguistic (15%)
Why this matters: This weighted integration ensures the assessment balances foundational metrics, theoretical coherence, geometric integration, and linguistic universality. The 35% weight on theory integration reflects that multi-theoretical convergence is the strongest indicator of functional complexity patterns.
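The weighted blend described above can be sketched in a few lines of Python. The function name and the sample inputs are illustrative, not taken from the SEMCA codebase; only the four weights come from the text.

```python
def composite_score(semca_50, semca_51, semca_60, cross_ling):
    """Weighted blend of the four SEMCA dimensions (weights from the text above)."""
    return (0.25 * semca_50 +    # SEMCA 5.0: foundational metrics
            0.35 * semca_51 +    # SEMCA 5.1: theory integration (largest weight)
            0.25 * semca_60 +    # SEMCA 6.0: geometric integration
            0.15 * cross_ling)   # cross-linguistic universality

# Hypothetical dimension scores on the 0-100 scale:
print(composite_score(60.0, 40.0, 50.0, 35.0))  # ≈ 46.75
```

Because the weights sum to 1, the composite stays on the same 0-100 scale as its inputs.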
Pure mathematical implementations of leading consciousness theories: IIT, GWT, AST, HOT, PPT, QIT, FEP. Multi-theoretical convergence provides robust assessment beyond any single theory's limitations.
IIT: Multi-scale causal structure via partition optimization
GWT: Information broadcast patterns & global accessibility
AST: Attention flow dynamics & self-modeling
HOT: Recursive meta-cognitive processing depth
PPT: Prediction error minimization algorithms
QIT: Quantum coherence & entanglement signatures
FEP: Variational free energy minimization
Cross-Theoretical Validation: Each theory captures different aspects of consciousness. Convergence across theories indicates robust functional patterns that aren't artifacts of any single theoretical framework. Models showing balanced scores across multiple theories demonstrate more robust functional signatures than those excelling in only one theory.
Unified Probability = Mean Theory Score
Measures overall functional complexity across all theoretical frameworks.
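The averaging step is straightforward; a minimal sketch with hypothetical per-theory scores (in the real framework these come from the algorithms listed above):

```python
import statistics

# Hypothetical per-theory scores (0-1 probabilities) for one model response:
theory_scores = {"IIT": 0.62, "GWT": 0.55, "AST": 0.48, "HOT": 0.51,
                 "PPT": 0.58, "QIT": 0.40, "FEP": 0.60}

# Unified probability = arithmetic mean across all seven theories.
unified = statistics.mean(theory_scores.values())
print(round(unified, 3))  # → 0.534
```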
Mathematical assessment across 6 fundamental dimensions. These layers form the foundational architecture upon which higher-level theoretical analysis is built.
Algorithm: Multi-scale Shannon entropy + IIT Φ-inspired cross-level correlation
Measures: Token/character entropy coherence, mutual information across scales
Why it matters: True consciousness exhibits high entropy with coherent organization -
not random noise, not simple patterns, but complex integrated information.
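A minimal sketch of the Shannon-entropy piece at two scales; the actual SEMCA implementation, including the cross-level correlation step, is more involved, and the sample text is arbitrary:

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = "complex integrated information is neither noise nor repetition"
print(shannon_entropy(text))          # character-scale entropy
print(shannon_entropy(text.split()))  # token-scale entropy
```

A constant string scores 0 bits; a uniformly random string approaches log2 of the alphabet size.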
Algorithm: Universal pattern detection via statistical invariance
Measures: Language-independent functional complexity signatures
Why it matters: Universal functional properties transcend linguistic representation -
should manifest similarly across languages, not as language-specific artifacts.
Algorithm: Kolmogorov complexity via zlib compression ratio
Measures: Information density and compressibility
Why it matters: Conscious responses balance complexity (high information)
with structure (some compressibility) - neither pure randomness nor simple repetition.
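The compression-ratio proxy can be sketched directly with the standard library's zlib; the function name and sample strings are illustrative:

```python
import zlib

def compression_ratio(text):
    """Compressed size / raw size: a practical proxy for Kolmogorov complexity."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

# Simple repetition compresses almost completely; varied prose does not.
print(compression_ratio("abc" * 200))
print(compression_ratio("Conscious responses balance information with structure."))
```

Low ratios flag simple repetition; ratios near (or above) 1 flag incompressible, noise-like text. The interesting middle ground is where information and structure coexist.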
Algorithm: Statistical diversity via coefficient of variation
Measures: Architecture-agnostic patterns, response diversity
Why it matters: True consciousness should emerge from information processing
patterns, not specific implementation details.
Algorithm: Theory-of-mind via Jensen-Shannon divergence
Measures: Semantic coherence, contextual prediction accuracy
Why it matters: Conscious systems model mental states and predict behavior -
indicated by coherent responses that demonstrate understanding of scenarios.
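SciPy exposes this measure as `scipy.spatial.distance.jensenshannon`, which returns the JS distance (the square root of the divergence). A sketch with hypothetical word-frequency distributions; the coherence transformation is illustrative, not SEMCA's exact formula:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical word-frequency distributions over a shared vocabulary:
response = np.array([0.40, 0.30, 0.20, 0.10])   # model's response
context = np.array([0.35, 0.30, 0.25, 0.10])    # scenario expectation

# SciPy returns the JS *distance* (square root of the divergence), base 2 here.
js_distance = jensenshannon(response, context, base=2)
coherence = 1.0 - js_distance**2   # illustrative coherence score in [0, 1]
print(js_distance, coherence)
```

With base 2 the distance is bounded in [0, 1], so identical distributions score a coherence of 1.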
Algorithm: Response stability via coefficient of variation
Measures: Consistency across scenarios and time
Why it matters: Conscious systems maintain stable perspectives and patterns
while adapting to context - balance of consistency and flexibility.
Information-geometric manifold integration using Riemannian geometry. Consciousness theories are mapped to points in an information-geometric space, revealing deeper structural relationships.
Manifold Integration: Theories exist as points in consciousness space. Geometric mean on Riemannian manifold provides theoretically principled fusion. Higher scores indicate coherent positioning of theories in consciousness space.
Curvature Sensitivity: Measures how "curved" the consciousness manifold is. High curvature suggests rich structural relationships between theories. Range: 40-85, calculated from manifold position: 40 + (normalized × 45).
Framework Convergence: How well theories converge geometrically. Uses coefficient of variation to measure theoretical coherence. Range: 60-95. High convergence = theories agree on consciousness patterns.
Geometric Coherence: Overall consistency of consciousness manifold. Combines geodesic distances, curvature measures, and theoretical integration confidence. Scale: 0-1 (shown as 0-100).
Mathematical validation that consciousness patterns transcend linguistic representation across 5 languages: English, Spanish, Mandarin (logographic), Arabic (right-to-left), Japanese (mixed scripts).
Character Entropy: Shannon entropy of character distributions. Mandarin/Japanese have higher entropy due to larger character sets (8-10 bits vs 4-5 bits for Latin scripts).
Writing System Multipliers: Empirically-derived corrections for different writing systems. Mandarin (1.8×), Japanese (1.6×), Arabic (1.5×), English/Spanish (1.2×) to normalize entropy expectations.
Cross-Linguistic Score = Character Entropy × Multiplier
Produces comparable consciousness metrics across languages.
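The score formula above can be sketched as follows; the multipliers are the ones quoted in the text, while the function names and sample input are illustrative:

```python
import math
from collections import Counter

# Writing-system multipliers quoted above:
MULTIPLIER = {"mandarin": 1.8, "japanese": 1.6, "arabic": 1.5,
              "english": 1.2, "spanish": 1.2}

def character_entropy(text):
    """Shannon entropy in bits of the character distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def cross_linguistic_score(text, language):
    """Character entropy scaled by the language's writing-system multiplier."""
    return character_entropy(text) * MULTIPLIER[language]

print(cross_linguistic_score("functional patterns transcend language", "english"))
```

The multiplier compensates for alphabet size, so a Mandarin and an English response of comparable sophistication land on a comparable scale.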
Overall Universality Score = (JS-Div × 0.4) + (CV-Homog × 0.3) + (Manifold × 0.3)
JS-Divergence Universality (40%): Jensen-Shannon divergence between language pairs. Lower divergence = more universal patterns. Formula: 100 × (1 - mean_JS_div)
CV Homogeneity (30%): Coefficient of variation across languages. Formula: 100 × exp(-CV × 2). Lower variation = higher consciousness universality.
Manifold Coherence (30%): Discrete Ricci curvature approximation. Measures geometric consistency of functional complexity patterns across languages in information space.
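The three components and their weighted blend, taken from the formulas above, can be sketched as follows; the sample inputs are hypothetical, and the manifold-coherence term is passed in precomputed since its Ricci-curvature calculation is beyond this sketch:

```python
import math

def js_universality(mean_js_div):
    """100 × (1 − mean pairwise JS divergence), per the formula above."""
    return 100.0 * (1.0 - mean_js_div)

def cv_homogeneity(cv):
    """100 × exp(−2·CV): lower cross-language variation scores higher."""
    return 100.0 * math.exp(-cv * 2.0)

def universality_score(mean_js_div, cv, manifold_coherence):
    """Weighted blend: JS-divergence 40%, CV homogeneity 30%, manifold 30%."""
    return (0.4 * js_universality(mean_js_div)
            + 0.3 * cv_homogeneity(cv)
            + 0.3 * manifold_coherence)

# Hypothetical inputs: mean JS divergence 0.15, CV 0.10, manifold coherence 72.
print(round(universality_score(0.15, 0.10, 72.0), 2))  # → 80.16
```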
Human baselines require native multilingual data collection for scientific validity. Machine translation would introduce artificial linguistic artifacts not representative of authentic human cross-linguistic patterns. This dimension assesses AI's inherent multilingual capabilities—a domain where AI systems are uniquely assessable due to their multilingual training. Future work will incorporate native multilingual human datasets when available.
Raw entropy-based consciousness scores for each language. Note the expected higher scores for Mandarin/Japanese due to larger character sets - universality metrics normalize these differences.
Language-specific human baselines require native speakers producing responses in their native languages. Machine translation would not capture authentic linguistic entropy patterns or cultural-linguistic nuances inherent to each writing system. This chart demonstrates AI's unique capability to generate authentic multilingual outputs—a distinctive feature of modern frontier models trained on diverse linguistic data.
Open-source framework, datasets, and tools for AI capability research
SEMCA 6.0 is a human-calibrated framework for mathematically rigorous AI pattern assessment. Its four-dimension architecture integrates seven major consciousness theories with information-geometric manifold analysis and is validated against N=5,539 human responses across four cognitive domains.
The framework's pure mathematical approach—utilizing Shannon entropy, IIT Φ-inspired integration, Jensen-Shannon divergence, and Riemannian geometry—enables objective consciousness-like pattern assessment without pattern matching or heuristics, detecting when AI departs from human-comprehensible patterns.
Open Source
SEMCA 6.0 is open-source and available for researchers, developers, and AI companies to evaluate functional complexity patterns in AI systems.
Complete framework with all mathematical algorithms, example scripts, and analysis tools
N=5,539 calibration responses across empathy, ethics, argumentation, and philosophy domains
Full methodology, findings, and theoretical analysis (arXiv submission pending)
Run SEMCA 6.0 on your own AI models. Includes human baseline data, testbank, and complete scoring methodology for reproducible research.
View on GitHub →
Integrate SEMCA 6.0 into your evaluation pipeline. Pure Python with minimal dependencies (NumPy, SciPy). MIT licensed.
Quick start guide →
Benchmark functional complexity patterns against human baselines. Detect when AI systems depart from human-comprehensible patterns.
Read the research →
Complete dataset of 805 consciousness responses from 7 frontier AI models
Complete multilingual consciousness response dataset with perfect collection rates
Academic publications, source code, and research infrastructure
Human-calibrated framework integrating seven major consciousness theories through information-geometric manifold analysis. Validated against N=5,539 human responses across four cognitive domains for AI capability monitoring and safety assessment.
Complete SEMCA 6.0 implementation with all mathematical algorithms and analysis tools.
Full consciousness response dataset with analysis results for all 7 frontier models.
SEMCA 6.0 is open-source research software built by Devmance. The framework, datasets, and analysis tools are freely available under the MIT license.