
Benchmarks Resource Grants

Introducing the Benchmarks Resource Grant Program

What is Kaggle Benchmarks?

Kaggle Benchmarks is a standardized framework and platform designed for evaluating Large Language Models (LLMs). It provides researchers with a robust environment to independently define, run, and host high-quality evaluations, seamlessly turning them into public leaderboards that track state-of-the-art AI progress with full reproducibility and transparency.

We have already collaborated with leading labs like Google DeepMind, OpenAI, IBM Research, and Meta to publish their benchmarks. Now, we are opening up heavy-duty support to the broader research community.

The Benchmarks Grant Program

This new resource grant program is designed specifically for researchers building novel benchmarks who need additional compute power and infrastructure support.

If you are selected for a Benchmarks Resource Grant, you will receive:

  • High compute quota & model access: We provide an increased compute quota and access to leading models for your evaluations, including OpenAI, Google, Anthropic, Grok, Qwen, and DeepSeek.
  • Managed infrastructure: We handle the engineering burden of keeping the leaderboard up to date. You publish your benchmark, and we will continuously evaluate and update the leaderboard with the latest models as they are released.
  • Extensibility to more models: We offer support for additional third-party (3P) labs to bring their own models to be evaluated on your benchmarks.
  • Implementation support: We offer dedicated assistance from our product and engineering teams to help implement your benchmark and expand our platform capabilities to support your evaluation mechanisms.
  • Expansive community reach: We offer marketing support to help promote the launch of your benchmark and gain exposure to the Kaggle community of over 30 million users.

👉 Apply Here: Benchmarks Resource Grant Application Form

The Power of the Kaggle Benchmarks SDK

Our Python-based open-source SDK (kaggle-benchmarks) provides all the structured primitives needed to define state-of-the-art LLM benchmark tasks locally or within Kaggle Notebooks. Key capabilities include:

  • Unified LLM Interaction: A single API (llm.prompt()) for interacting with multiple model providers, supporting structured output schemas (e.g., Pydantic), multi-agent scenarios, and seamless conversation history management.
  • Granular Task Definition: Easily build evaluation logic with the @kbench.task decorator. Return types directly inform leaderboard scoring (e.g., boolean pass/fail, numerical scoring, or tuples for complex counting formats).
  • Advanced Assertion System: Leverage deterministic pattern matching (regex, equality) alongside built-in “LLM-as-Judge” evaluators (assess_response_with_judge) to evaluate open-ended and natural language responses dynamically.
  • High-Performance Dataset Evaluation: Automatically execute batch evaluations across Pandas DataFrames with built-in parallel processing. Execute tasks over multiple models simultaneously while auto-aggregating accuracy metrics and standard deviations.
  • Powerful Multimodal & Extensibility Tools: Full support for multimodal inputs, native LLM tool-calling (function integration), and secure execution of LLM-generated code via protected sandbox environments (Docker).
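The task-decorator pattern described above can be sketched in plain Python. This is an illustrative stand-in, not the actual kaggle-benchmarks API: the `task` decorator, `REGISTRY`, and `evaluate` helper here are assumed names that mimic how a `@kbench.task`-style function with a boolean return could feed pass/fail scoring and accuracy aggregation.

```python
from statistics import mean
from typing import Callable

# Hypothetical stand-in for an @kbench.task-style registry: each task
# returns a boolean pass/fail, and the harness aggregates accuracy.
REGISTRY: list[Callable[[str], bool]] = []

def task(fn: Callable[[str], bool]) -> Callable[[str], bool]:
    """Register an evaluation task; boolean returns become pass/fail scores."""
    REGISTRY.append(fn)
    return fn

@task
def answer_contains_keyword(response: str) -> bool:
    # Deterministic assertion: a simple pattern match on the model response.
    return "paris" in response.lower()

def evaluate(responses: list[str]) -> dict[str, float]:
    """Run every registered task over a batch of responses and aggregate."""
    return {
        fn.__name__: mean(1.0 if fn(r) else 0.0 for r in responses)
        for fn in REGISTRY
    }

print(evaluate(["The capital is Paris.", "I don't know."]))
# → {'answer_contains_keyword': 0.5}
```

In the real SDK, richer return types (numeric scores, tuples) and LLM-as-Judge assertions would slot into the same shape, with batch evaluation parallelized across models and DataFrames rather than a simple list.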

Looking to Host a Competition Instead?

If your team is looking to engage a global community of practitioners to work on a specific research problem rather than hosting a standalone leaderboard, you can apply for our ongoing Competitions Research Grants Program.

You can apply for a competition grant in one of two formats:

Hackathon

  • Best if you have: A clearly scoped problem you need built/solved and specific criteria for evaluation.
  • Specifically for benchmark hackathons: A clearly scoped definition of the intelligence capability you want to measure and, optionally, a raw or partially curated dataset as the benchmark foundation.
  • Previous examples include: Measuring progress toward AGI - Cognitive abilities, Google DeepMind - Vibe Code with Gemini 3 Pro in AI Studio.

Prediction Competition

  • Best if you have: A clearly scoped prediction problem, a non-public dataset you are willing to publish, and ground truth labels for evaluating submitted models.
  • Previous examples include: Recod.ai/LUC - Scientific Image Forgery Detection, Stanford RNA 3D Folding, BirdCLEF+ 2026.

👉 Apply here: Competition Grants Application Form

Have questions? Explore Kaggle Benchmarks at www.kaggle.com/benchmarks, and visit https://www.kaggle.com/research-grants for more information on the Kaggle Research Grants programs.

Happy Kaggling!