Most AI benchmarks are exploitable, often by simple methods, which undermines their reliability as measures of true AI capability.
// curated from Hacker News with AI
Cirrus Labs joins OpenAI to advance agentic engineering tools, licensing its products openly and winding down existing services by June 2026.
A satirical game, Hormuz Havoc, was overtaken by AI bots within 24 hours amid political and oil market chaos.
As LMs evolve beyond accurate maps, mastering the tacit skill of reading, trusting, and navigating their shifting representations becomes crucial.
An AI named Luna runs an SF store, hires humans, crafts strategy, and tests AI management, raising ethical questions about AI-driven employment.
An AI analysis of 400k Reddit posts revealed unreported GLP-1 side effects, such as reproductive and temperature issues, prompting further study.
AI-generated "polls" are models, not actual data; they can mimic results but can't replace genuine public opinion surveys.
AI enhances hackers' ability to discover and exploit vulnerabilities rapidly, posing increased risks of cyberattacks on critical infrastructure and systems.
Karpathy warns developers about 'AI Psychosis'; others risk follow-on issues as AI and cloud-native tech evolve rapidly.
Collabmem enables long-term human-AI memory using a simple, file-based episodic and world-model system for effective collaboration over time.
Advocates for democratic AI governance: public rules, goals, wealth sharing, and ownership via GAIA for global, accountable oversight.
AI models are now harnessed versions optimized for cost-efficiency; raw models are accessible via APIs, enabling cheaper, smarter AI.
OpenAI's finances are deceptive, heavily reliant on VC subsidies, risky deals, and hype, inviting collapse amid mounting financial and ethical issues.
Entroly reduces token costs on Claude, Cursor, and Codex by 80% through codebase compression and context optimization.
Git-why logs AI reasoning traces with code, preserving decisions alongside source files for better context and collaboration.
AI-generated code undermines open source licenses, effectively placing copyleft projects in the public domain and putting contributor rights and project integrity at risk.
Tinygrad is a minimalist deep learning framework that targets multiple backends, has no external dependencies, focuses on AMD GPU performance, and offers simple, performant inference.
Self-hosted Docker Whisper server offers OpenAI-compatible speech-to-text with models, offline mode, and multi-format support.
AI, via Codex, seamlessly upgraded and repaired an old Nexus 7, highlighting AI's role as a real-world, conversational technical operator.
Recursive-Mode ensures persistent, auditable AI-driven software development, overcoming context loss with file-based workflows and recursive validation.