AimactGrow

A Coding Implementation to Compress and Benchmark Instruction-Tuned LLMs with FP8, GPTQ, and SmoothQuant Quantization Using llmcompressor

By Admin
May 17, 2026


import subprocess, sys

def pip(*pkgs):
    # Install packages quietly into the current interpreter's environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", *pkgs])

pip("llmcompressor", "compressed-tensors",
    "transformers>=4.45", "accelerate", "datasets")
import os, gc, time, json, math
from pathlib import Path
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# The benchmarks below assume a CUDA GPU (for example, a Colab T4 runtime).
assert torch.cuda.is_available(), \
    "Enable a GPU: Runtime > Change runtime type > T4 GPU"
print("GPU:", torch.cuda.get_device_name(0),
      "| CUDA:", torch.version.cuda,
      "| torch:", torch.__version__)

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"
WORKDIR = Path("/content/quant_lab"); WORKDIR.mkdir(parents=True, exist_ok=True)
os.chdir(WORKDIR)
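def free_mem():
    # Release Python garbage and cached CUDA blocks between model loads.
    gc.collect(); torch.cuda.empty_cache()

def dir_size_gb(path):
    # Total on-disk size of a saved model directory, in decimal gigabytes.
    total = 0
    for root, _, files in os.walk(path):
        for f in files:
            total += os.path.getsize(os.path.join(root, f))
    return total / 1e9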
def time_generation(model, tok, prompt, max_new_tokens=64):
    """Greedy decode; reports latency & tokens/sec after a short warmup."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # Warmup pass so one-time CUDA setup does not distort the timing.
    _ = model.generate(**inputs, max_new_tokens=4, do_sample=False,
                       pad_token_id=tok.eos_token_id)
    torch.cuda.synchronize()
    t0 = time.time()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=False, pad_token_id=tok.eos_token_id)
    torch.cuda.synchronize()
    dt = time.time() - t0
    # Keep only the newly generated tokens (drop the prompt prefix).
    new_ids = out[0][inputs["input_ids"].shape[1]:]
    # Count tokens actually produced: generation can stop early at EOS.
    return tok.decode(new_ids, skip_special_tokens=True), dt, new_ids.shape[0] / dt
@torch.no_grad()
def wikitext_ppl(model, tok, seq_len=512, max_chunks=20, stride=512):
    """Light WikiText-2 perplexity probe (fast, indicative)."""
    ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    text = "\n\n".join(t for t in ds["text"][:400] if t.strip())
    enc = tok(text, return_tensors="pt").input_ids.to(model.device)
    nll_sum, tok_count = 0.0, 0
    # Score non-overlapping windows; out.loss is the mean NLL per token.
    for start in range(0, enc.size(1) - seq_len, stride):
        chunk = enc[:, start:start+seq_len]
        out = model(chunk, labels=chunk)
        nll_sum += out.loss.float().item() * seq_len
        tok_count += seq_len
        if tok_count // seq_len >= max_chunks: break
    return math.exp(nll_sum / tok_count)
results = {}
# Qwen chat-template prompt, written out by hand.
PROMPT = ("<|im_start|>user\nIn two sentences, explain why post-training "
          "quantization works for large language models.<|im_end|>\n"
          "<|im_start|>assistant\n")

def benchmark(label, model_path_or_id):
    free_mem()
    print(f"\n──── benchmarking: {label} ────")
    tok = AutoTokenizer.from_pretrained(model_path_or_id)
    m = AutoModelForCausalLM.from_pretrained(
            model_path_or_id, torch_dtype="auto", device_map="cuda").eval()
    sample, dt, tps = time_generation(m, tok, PROMPT)
    ppl = wikitext_ppl(m, tok)
    # On-disk size is only known for local checkpoints, not Hub model IDs.
    size = dir_size_gb(model_path_or_id) if os.path.isdir(str(model_path_or_id)) else None
    results[label] = {"size_gb": size, "ppl": round(ppl, 3),
                      "latency_s": round(dt, 3), "tok_per_s": round(tps, 1),
                      "sample": sample.strip().replace("\n", " ")[:180]}
    print(json.dumps(results[label], indent=2))
    del m; free_mem()
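
The perplexity probe in wikitext_ppl accumulates each chunk's mean loss weighted by chunk length, then exponentiates the per-token average. The sketch below isolates that arithmetic so it can be checked without a GPU. The helper name perplexity_from_chunks and the loss values are hypothetical and chosen only for illustration; they are not measurements from any model.

```python
import math

def perplexity_from_chunks(chunk_losses, chunk_len):
    """Token-weighted perplexity from per-chunk mean NLL values (nats/token).

    Hypothetical helper for illustration: it mirrors the nll_sum / tok_count
    accumulation in wikitext_ppl but takes plain floats, not model outputs.
    """
    if not chunk_losses:
        raise ValueError("need at least one chunk loss")
    nll_sum = sum(loss * chunk_len for loss in chunk_losses)
    tok_count = chunk_len * len(chunk_losses)
    return math.exp(nll_sum / tok_count)

# Made-up mean losses for three 512-token chunks (not real measurements).
fake_losses = [2.0, 2.5, 3.0]
ppl = perplexity_from_chunks(fake_losses, chunk_len=512)
print(f"perplexity: {ppl:.3f}")  # prints "perplexity: 12.182", i.e. exp(2.5)
```

Because every chunk has the same length here, the result equals the exponential of the plain mean loss. The length weighting would only change the answer if chunks differed in size.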