 ____              _      _   _   _                                
|  _ \  __ _ _ __ (_) ___| | | \ | | ___  _   _ _ __ ___   ___  _ __ 
| | | |/ _` | '_ \| |/ _ \ | |  \| |/ _ \| | | | '_ ` _ \ / _ \| '_ \
| |_| | (_| | | | | |  __/ | | |\  | (_) | |_| | | | | | | (_) | | | |
|____/ \__,_|_| |_|_|\___|_| |_| \_|\___/ \__,_|_| |_| |_|\___/|_| |_|
              

Daniel Noumon

Building intelligent systems with data & AI


Featured

What I'm focused on right now

Transforming business processes through AI Engineering as a consultant


Projects at Data Science Lab


Education


Blog

Deep-dives and learnings


Hobby projects

My recent creations


Benchmarks I Find Interesting

Some AI benchmarks I keep an eye on

Agentic Coding Capabilities

Terminal-Bench

Benchmark for terminal agents. Tests how well AI systems can operate, navigate, and solve real tasks in terminal environments, a proxy for practical coding autonomy.

Agentic Search Capabilities

BrowseComp-Plus

Fair and disentangled evaluation of deep-research agents. Measures how well AI can browse, synthesise, and reason over web content to answer complex research questions.

Agentic Search Capabilities

DeepSearchQA

Google's benchmark evaluating comprehensiveness for deep research agents. Tests whether AI can find and assemble thorough, multi-source answers to complex questions.

Retrieval & Embeddings

MTEB

Massive Text Embedding Benchmark. The standard leaderboard for comparing embedding models across retrieval, classification, clustering, and more. Directly relevant to my fine-tuning work.

Retrieval & Embeddings

BEIR

Benchmarking Information Retrieval. Zero-shot, cross-domain evaluation of retrieval models on diverse IR tasks, a widely used reference for measuring how well embeddings generalise across domains.

Document Parsing

ParseBench

Document parsing benchmark for AI agents. Evaluates how well parsers handle tables, charts, content faithfulness, and visual grounding, critical for any RAG pipeline.

Reasoning

ARC-AGI

The closest thing to an AGI test. Challenges AI systems to adapt on the fly to novel tasks they've never seen before, measuring general fluid intelligence rather than memorised patterns.

Composite

AA Intelligence Index

Composite benchmark aggregating ten challenging evaluations to provide a holistic measure of AI capabilities across mathematics, science, coding, and reasoning.



Referrals

What others say

"Placeholder referral text. Replace with an actual quote."

— Name, Title at Company


Want to get in touch? Contact me here