Google Releases Gemini-SQL2: Gemini 3.1 Professional Textual content-to-SQL Scores 80.04% on BIRD Single-Mannequin Leaderboard

Google Analysis staff has introduced the launch of Gemini-SQL2 on X. They described this technique as a breakthrough text-to-SQL functionality powered by Gemini 3.1 Professional. Gemini-SQL2 posted 80.04% execution accuracy on the BIRD Textual content-to-SQL Leaderboard (Single Mannequin). Google’s chart locations it above its personal Gemini-SQL, the prior prime entry. The metric measures whether or not generated SQL runs and returns appropriate outcomes, not whether or not it appears to be like legitimate.

https://x.com/GoogleResearch/standing/2065475343205740911

Gemini-SQL2

Gemini-SQL2 is a text-to-SQL functionality, not a standalone basis mannequin launch. It interprets pure language questions into what Google calls ‘execution-ready SQL queries.’ The potential is constructed on Gemini 3.1 Professional.

Per the announcement on X, “information subtlety & complicated enterprise contexts make producing correct SQL from pure language notoriously onerous.” The X Put up additionally acknowledged that “improved SQL understanding can elevate pure language expertise throughout Google’s information companies.” That factors towards integration targets like BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already ship Gemini-based SQL era. Google has not but confirmed which merchandise will obtain Gemini-SQL2.

Benchmarks

BIRD (BIg Bench for LaRge-scale Database Grounded Textual content-to-SQL Analysis) is an business commonplace for this job. It incorporates 12,751 question-SQL pairs throughout 95 databases spanning 37 skilled domains, totaling 33.4GB. The databases embrace soiled values and require exterior information grounding, in contrast to older benchmarks comparable to Spider.

BIRD measures execution accuracy (EX): the generated SQL should run and return outcomes matching the gold question. Google acknowledged this instantly. “Per the BIRD benchmark, which measures execution-verified accuracy, GeminiSQL-2’s SQL doesn’t simply look proper, it additionally runs efficiently.”

The Single Educated Mannequin Monitor restricts the preprocessing, retrieval, and agentic frameworks that ensembles use to spice up scores. It measures the mannequin’s core text-to-SQL potential. Google Cloud’s prior file on this monitor, reported November 15, 2025, was 76.13. Google benchmarks human efficiency at 92.96, leaving a 12.92-point hole from 80.04.

How the Leaderboard Stacks Up

Google’s chart, on X submit, exhibits Gemini-SQL2 forward of eight named rivals, together with a number of unlabeled factors. Solely 80.04% is acknowledged as textual content. The values under are learn from the chart’s place and are approximate; dates replicate every level’s horizontal placement.

System	Group	BIRD Execution Accuracy (Single Mannequin)	Chart Date
Gemini-SQL2	Google	80.04% (acknowledged)	Jun 2026
Gemini-SQL	Google	~77.2%	Mar 2026
Q-SQL	AWS	~76.5%	Dec 2025
Databricks RLVR 32B	Databricks	~75.7%	Jul 2025
SiriusAI-Text2SQL-32B-v2	Tencent	~75.0%	Dec 2025
Arctic-Text2SQL-R1-32B	Snowflake	~73.9%	Jun 2025
GPT-5.5-xhigh	OpenAI	~72.5%	Apr 2026
SQLWeaver-32B	Alibaba	~71.7%	Could 2026
Claude Opus 4.6	Anthropic	~70.1%	Feb 2026

Two patterns are seen. Google now holds the highest two named positions, Gemini-SQL2 and Gemini-SQL. A number of specialised 32B SQL fashions additionally sit above some basic frontier fashions on this chart.

Use Circumstances with Examples

Self-service analytics: A income supervisor asks for month-to-month recurring income by area, for accounts that churned inside 90 days of improve. This wants joins, window logic, and date arithmetic. Execution-verified era catches SQL that runs however returns improper rows.
Knowledge engineering drafts: Devs can draft BigQuery transformations from English, then overview somewhat than write from scratch. Google’s November 2025 work recognized schema understanding because the onerous half. Larger BIRD scores replicate higher dealing with of ambiguous columns and messy values.
Embedded “ask your information” options: SaaS groups including natural-language question interfaces nonetheless want human overview at 80% accuracy. One in 5 queries may be improper. The rating units expectations, not a removing of overview.

Gemini-SQL2 Launch: Group Reception Dashboard

Verified public engagement on Google Analysis’s announcement posts • first ~3 hours • Jun 12, 2026

BIRD Single-Mannequin Leaderboard • Execution Accuracy

Platform Engagement Breakdown

X / Twitter (most important submit)

Views144.4K

Likes2,800

Reposts267

Bookmarks1,300

Replies64

Engagement fee3.1%

LinkedIn (most important submit)

Reactions349+

Feedback12

Reposts27

Reception sign

Bookmark-plus-like to answer ratio on X. A excessive save fee with few replies sometimes alerts approval over controversy. Remark-level sentiment not but measurable; replies nonetheless loading at seize time.

Implementation Sample

Google has not printed a Gemini-SQL2 mannequin string or API but. The schema-grounded sample under works with present Gemini fashions by way of the google-genai SDK. Swap the mannequin string when Gemini-SQL2 ships.

from google import genai

shopper = genai.Consumer()  # reads GEMINI_API_KEY from atmosphere

schema = """
CREATE TABLE orders (
  order_id INTEGER, buyer TEXT, area TEXT,
  quantity REAL, standing TEXT, created_at DATE
);
"""

query = "Complete paid order quantity by area in 2026, highest first."

immediate = f"""You're a text-to-SQL system.
Schema:{schema}
Query: {query}
Return just one executable SQLite question. No rationalization."""

resp = shopper.fashions.generate_content(
    mannequin="gemini-3.1-pro-preview",  # the bottom mannequin named within the announcement; swap when a Gemini-SQL2 ID ships
    contents=immediate,
)
print(resp.textual content)

Manufacturing techniques ought to add execution verification. Run the returned SQL, catch errors, and retry with the error message appended. That loop mirrors what BIRD’s execution accuracy metric rewards.

Key Takeaways

Google studies Gemini-SQL2 at 80.04% execution accuracy on the BIRD single-model leaderboard.
The potential is powered by Gemini 3.1 Professional and targets “execution-ready SQL,” not simply believable SQL.
On Google’s chart, Gemini-SQL2 and Gemini-SQL maintain the highest two named positions; human efficiency is 92.96.
No API, mannequin card, technical report, or product integration particulars have been printed but.

MARKTECHPOST Visible Explainer

Textual content-to-SQL Playground

The duty Gemini-SQL2 simply scored 80.04% on (BIRD benchmark, single mannequin). Choose a query, examine the generated SQL, then run it on a reside in-browser dataset.

1 • Ask in pure language

2 • Generated SQL

Choose a query above to generate SQL.

CREATE TABLE orders (
  order_id INTEGER, buyer TEXT, area TEXT,
  quantity REAL, standing TEXT, created_at DATE
);  -- 12 pattern rows loaded on this browser

Execution accuracy means the SQL should run AND return the appropriate rows.

Take a look at the Particulars right here. Additionally, be at liberty to comply with us on Twitter and don’t neglect to affix our 150k+ML SubReddit and Subscribe to our Publication. Wait! are you on telegram? now you may be a part of us on telegram as properly.

Have to companion with us for selling your GitHub Repo OR Hugging Face Web page OR Product Launch OR Webinar and so forth.? Join with us