Google Analysis staff has introduced the launch of Gemini-SQL2 on X. They described this technique as a breakthrough text-to-SQL functionality powered by Gemini 3.1 Professional. Gemini-SQL2 posted 80.04% execution accuracy on the BIRD Textual content-to-SQL Leaderboard (Single Mannequin). Google’s chart locations it above its personal Gemini-SQL, the prior prime entry. The metric measures whether or not generated SQL runs and returns appropriate outcomes, not whether or not it appears to be like legitimate.


Gemini-SQL2
Gemini-SQL2 is a text-to-SQL functionality, not a standalone basis mannequin launch. It interprets pure language questions into what Google calls ‘execution-ready SQL queries.’ The potential is constructed on Gemini 3.1 Professional.
Per the announcement on X, “information subtlety & complicated enterprise contexts make producing correct SQL from pure language notoriously onerous.” The X Put up additionally acknowledged that “improved SQL understanding can elevate pure language expertise throughout Google’s information companies.” That factors towards integration targets like BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already ship Gemini-based SQL era. Google has not but confirmed which merchandise will obtain Gemini-SQL2.
Benchmarks
BIRD (BIg Bench for LaRge-scale Database Grounded Textual content-to-SQL Analysis) is an business commonplace for this job. It incorporates 12,751 question-SQL pairs throughout 95 databases spanning 37 skilled domains, totaling 33.4GB. The databases embrace soiled values and require exterior information grounding, in contrast to older benchmarks comparable to Spider.
BIRD measures execution accuracy (EX): the generated SQL should run and return outcomes matching the gold question. Google acknowledged this instantly. “Per the BIRD benchmark, which measures execution-verified accuracy, GeminiSQL-2’s SQL doesn’t simply look proper, it additionally runs efficiently.”
The Single Educated Mannequin Monitor restricts the preprocessing, retrieval, and agentic frameworks that ensembles use to spice up scores. It measures the mannequin’s core text-to-SQL potential. Google Cloud’s prior file on this monitor, reported November 15, 2025, was 76.13. Google benchmarks human efficiency at 92.96, leaving a 12.92-point hole from 80.04.
How the Leaderboard Stacks Up
Google’s chart, on X submit, exhibits Gemini-SQL2 forward of eight named rivals, together with a number of unlabeled factors. Solely 80.04% is acknowledged as textual content. The values under are learn from the chart’s place and are approximate; dates replicate every level’s horizontal placement.
| System | Group | BIRD Execution Accuracy (Single Mannequin) | Chart Date |
|---|---|---|---|
| Gemini-SQL2 | 80.04% (acknowledged) | Jun 2026 | |
| Gemini-SQL | ~77.2% | Mar 2026 | |
| Q-SQL | AWS | ~76.5% | Dec 2025 |
| Databricks RLVR 32B | Databricks | ~75.7% | Jul 2025 |
| SiriusAI-Text2SQL-32B-v2 | Tencent | ~75.0% | Dec 2025 |
| Arctic-Text2SQL-R1-32B | Snowflake | ~73.9% | Jun 2025 |
| GPT-5.5-xhigh | OpenAI | ~72.5% | Apr 2026 |
| SQLWeaver-32B | Alibaba | ~71.7% | Could 2026 |
| Claude Opus 4.6 | Anthropic | ~70.1% | Feb 2026 |
Two patterns are seen. Google now holds the highest two named positions, Gemini-SQL2 and Gemini-SQL. A number of specialised 32B SQL fashions additionally sit above some basic frontier fashions on this chart.
Use Circumstances with Examples
- Self-service analytics: A income supervisor asks for month-to-month recurring income by area, for accounts that churned inside 90 days of improve. This wants joins, window logic, and date arithmetic. Execution-verified era catches SQL that runs however returns improper rows.
- Knowledge engineering drafts: Devs can draft BigQuery transformations from English, then overview somewhat than write from scratch. Google’s November 2025 work recognized schema understanding because the onerous half. Larger BIRD scores replicate higher dealing with of ambiguous columns and messy values.
- Embedded “ask your information” options: SaaS groups including natural-language question interfaces nonetheless want human overview at 80% accuracy. One in 5 queries may be improper. The rating units expectations, not a removing of overview.
Gemini-SQL2 Launch: Group Reception Dashboard
Verified public engagement on Google Analysis’s announcement posts • first ~3 hours • Jun 12, 2026
BIRD Single-Mannequin Leaderboard • Execution Accuracy
Platform Engagement Breakdown
X / Twitter (most important submit)
Views144.4K
Likes2,800
Reposts267
Bookmarks1,300
Replies64
Engagement fee3.1%
LinkedIn (most important submit)
Reactions349+
Feedback12
Reposts27
Reception sign
Bookmark-plus-like to answer ratio on X. A excessive save fee with few replies sometimes alerts approval over controversy. Remark-level sentiment not but measurable; replies nonetheless loading at seize time.
Implementation Sample
Google has not printed a Gemini-SQL2 mannequin string or API but. The schema-grounded sample under works with present Gemini fashions by way of the google-genai SDK. Swap the mannequin string when Gemini-SQL2 ships.
from google import genai
shopper = genai.Consumer() # reads GEMINI_API_KEY from atmosphere
schema = """
CREATE TABLE orders (
order_id INTEGER, buyer TEXT, area TEXT,
quantity REAL, standing TEXT, created_at DATE
);
"""
query = "Complete paid order quantity by area in 2026, highest first."
immediate = f"""You're a text-to-SQL system.
Schema:{schema}
Query: {query}
Return just one executable SQLite question. No rationalization."""
resp = shopper.fashions.generate_content(
mannequin="gemini-3.1-pro-preview", # the bottom mannequin named within the announcement; swap when a Gemini-SQL2 ID ships
contents=immediate,
)
print(resp.textual content)
Manufacturing techniques ought to add execution verification. Run the returned SQL, catch errors, and retry with the error message appended. That loop mirrors what BIRD’s execution accuracy metric rewards.
Key Takeaways
- Google studies Gemini-SQL2 at 80.04% execution accuracy on the BIRD single-model leaderboard.
- The potential is powered by Gemini 3.1 Professional and targets “execution-ready SQL,” not simply believable SQL.
- On Google’s chart, Gemini-SQL2 and Gemini-SQL maintain the highest two named positions; human efficiency is 92.96.
- No API, mannequin card, technical report, or product integration particulars have been printed but.
MARKTECHPOST Visible Explainer
Textual content-to-SQL Playground
The duty Gemini-SQL2 simply scored 80.04% on (BIRD benchmark, single mannequin). Choose a query, examine the generated SQL, then run it on a reside in-browser dataset.
1 • Ask in pure language
2 • Generated SQL
Choose a query above to generate SQL.
CREATE TABLE orders ( order_id INTEGER, buyer TEXT, area TEXT, quantity REAL, standing TEXT, created_at DATE ); -- 12 pattern rows loaded on this browser
Execution accuracy means the SQL should run AND return the appropriate rows.
Take a look at the Particulars right here. Additionally, be at liberty to comply with us on Twitter and don’t neglect to affix our 150k+ML SubReddit and Subscribe to our Publication. Wait! are you on telegram? now you may be a part of us on telegram as properly.
Have to companion with us for selling your GitHub Repo OR Hugging Face Web page OR Product Launch OR Webinar and so forth.? Join with us






![How creators and entrepreneurs are utilizing AI to hurry up & succeed [data]](https://blog.aimactgrow.com/wp-content/uploads/2025/06/Untitled20design-Apr-07-2023-08-24-35-4586-PM-120x86.png)


