Welcome to the Hacker Podcast blog, where we distill the most intriguing tech discussions and breakthroughs from around the web into your daily dose of digital insights!
Why I Wrote the BEAM Book
Erik Stenman, a veteran who kept Klarna's core system humming for a decade, recently shared the compelling story behind his deep-dive book on the BEAM, the virtual machine powering Erlang and Elixir. His motivation? A stubborn curiosity to truly understand the BEAM's inner workings, driven by a painful lesson learned when a seemingly minor 15-millisecond pause during peak shopping times triggered a major incident. The book aims to arm engineers with the knowledge to proactively tackle such critical issues.
The journey to publication was a marathon of challenges: battling document formats, wrestling with publishing systems, and enduring two canceled deals. Yet, Erik persisted, fueled by his desire for a reliable manual, invaluable community feedback on GitHub, and seeing his work referenced in conference talks. His decade-long endeavor taught him the power of persistence over perfection, the importance of focused work, leveraging community, managing scope, and the magic of a real deadline.
The community deeply resonated with Erik's drive to understand the underlying logic, echoing the sentiment that "teaching is the best way to learn." The struggles of technical book publishing also sparked a lively exchange. Many shared similar experiences with publishers favoring mainstream topics over niche, deep dives. While traditional publishers offer marketing and crucial editing, self-publishing is increasingly viable for specialized content, granting authors more control but demanding they handle production and marketing.
The 15ms pause anecdote particularly captivated readers. Some expressed surprise at such a small delay causing chaos, while others clarified that in systems tuned for microsecond responses and millions of requests, a 15ms pause per call or in a critical shared resource can rapidly lead to massive message queue backlogs, collapsing throughput. Insights from WhatsApp were shared, detailing strategies like dropping old messages and tweaking garbage collection to manage such scenarios.
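To see why such a small pause matters, a rough back-of-envelope calculation helps; the request rate below is a hypothetical chosen for illustration, not a figure from the discussion:

```python
# Back-of-envelope: how a 15 ms stall backs up a hot shared resource.
# The arrival rate is a hypothetical chosen for illustration.
arrival_rate = 100_000   # requests per second hitting one shared process
pause_s = 0.015          # the 15 ms pause from the anecdote

# Messages that pile up while the process is stalled once:
backlog = arrival_rate * pause_s
print(f"queued during one pause: {backlog:,.0f}")          # 1,500

# If every call pays the 15 ms, a single serial consumer tops out at:
print(f"max serial throughput: {1 / pause_s:.0f} req/s")   # 67
```

When the queue grows faster than it drains, latency compounds, which is how a 15 ms hiccup snowballs into a backlog-driven outage.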
A recurring theme was the perceived under-appreciation of BEAM (Erlang/Elixir) for high-concurrency projects. Advocates hailed it as underrated, citing its strengths in concurrency, fault tolerance, and built-in features like message passing and supervision trees, pointing to WhatsApp's success. However, others countered that modern hardware and mainstream languages can handle significant concurrency on a single machine, reducing the need for BEAM's unique strengths for many projects. They also noted the BEAM ecosystem's perceived lesser investment in developer ergonomics and the significant mindset shift required for the "OTP way."
This led to a fascinating comparison between BEAM's integrated capabilities and modern infrastructure like Kubernetes. Some argued Kubernetes offers similar self-healing and scaling but is language-agnostic. Others countered that BEAM provides many features within the runtime, simplifying the stack and reducing reliance on external components, making it ideal for smaller teams tackling large problems with less infrastructure. The debate highlighted the trade-offs: BEAM's integrated, opinionated approach versus Kubernetes' flexible, composable, but potentially more complex ecosystem.
DiffX – Next-Generation Extensible Diff Format
Developers are intimately familiar with diff files, but a new proposal, DiffX, aims to revolutionize this ubiquitous format. The core argument is that while Unified Diffs are everywhere, they're fundamentally limited and lack standardization for modern workflows. Key problems include the absence of standardized information for file encodings, revisions, or filenames, and missing features for binary patches or arbitrary metadata. This forces tools to implement complex, SCM-specific parsing logic.
DiffX proposes a solution that's fully backwards-compatible with existing Unified Diff tools while adding structure and extensibility. It embeds structured data using special comment lines (such as #diffx: and #..meta:) to define sections for the overall diff, individual changes, and files, allowing for structured metadata (often in JSON). The goal is consistent parsing across SCMs and richer, programmatically modifiable information.
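To make that structure concrete, here's a minimal sketch of how such section headers could be parsed. The sample input is illustrative (simplified from the format described above, with an arbitrary length value), and the helper is hypothetical rather than part of any DiffX tooling:

```python
# Illustrative parser for DiffX-style headers of the form
# "#<dots><name>: key=value, key=value". The sample is a simplified
# sketch, not verbatim output from the DiffX spec.
SAMPLE = """\
#diffx: encoding=utf-8, version=1.0
#.change:
#..file:
#..meta: format=json, length=49
{
    "path": "src/main.c",
    "op": "modify"
}
"""

def parse_header(line):
    """Split '#..meta: format=json' into (depth, name, options)."""
    body = line[1:]                                # drop the leading '#'
    depth = len(body) - len(body.lstrip("."))      # dot count = nesting depth
    name, _, opts = body[depth:].partition(":")
    options = dict(
        pair.strip().split("=", 1)
        for pair in opts.split(",")
        if "=" in pair
    )
    return depth, name, options

for line in SAMPLE.splitlines():
    if line.startswith("#"):
        print(parse_header(line))
```

The dot depth is what gives tools a single, SCM-agnostic parsing rule, which is exactly the design goal (and the design complaint) debated below.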
The community immediately questioned DiffX's necessity and design. Many felt the problems described weren't widespread for typical users sticking to modern SCMs like Git. Criticisms focused on the proposed format's complexity, particularly the hierarchical dot notation (., .., ...) and the mix of simple key-value headers with embedded JSON. Concerns were raised about maintainability and parseability with standard text-processing tools, and the length-delimited JSON was also seen as fragile.
The author, chipx86, actively engaged, explaining that DiffX was born from two decades of pain points building a code review tool that interfaces with over a dozen inconsistent SCMs. For such heterogeneous environments, handling diverse metadata, binary files, multi-commit patches, and encoding issues is critical, even if not a daily concern for individual git diff users. The design choices, including the hierarchical structure, were defended as the result of experimentation, aiming for simple parsing rules for tools while allowing metadata flexibility.
Some suggested simply transmitting the two full files (original and modified) instead of a complex diff format. The counter-argument highlighted the efficiency benefits of diffs, especially for communication with Large Language Models or over limited bandwidth. A side discussion also touched on the practical challenges of file encoding and filename normalization across different operating systems and SCMs, lending weight to DiffX's attempt to standardize encoding information.
Merlin Bird ID
Merlin Bird ID, a free, instant bird identification tool from the Cornell Lab of Ornithology, is enchanting users with its "magic." Its standout features include Sound ID, which listens to ambient bird sounds and provides real-time suggestions even offline, and Photo ID, allowing users to identify birds from pictures. The app also offers a step-by-step Bird ID Wizard, a personal Life List, and an "Explore Birds Near You" feature powered by vast datasets.
The community's sentiment is overwhelmingly positive, with many describing it as a "shining example" of beneficial technology. The Sound ID feature receives particular acclaim for its accuracy, even in noisy or remote environments, and its ability to identify multiple species simultaneously. Users are impressed by how the app has transformed their relationship with nature, encouraging engagement with the real world.
While generally accurate, discussions touched on instances of false positives or difficulty distinguishing similar species or mimics. Some users reported bugs like non-functional buttons or lost results, particularly on certain Android devices or after recent iOS updates, though many others reported smooth experiences. Feature requests included cataloging individual birds or an API. Related projects like eBird and BirdNet were also mentioned.
A significant ethical point raised was the potential harm of using the app's built-in bird sound playback feature, as it can disturb birds, especially territorial ones or those with active nests. Users cautioned against using playback and emphasized listening only. The app's free, non-profit model, funded partly by grants, was widely appreciated, with hopes it avoids commercialization.
Precious Plastic is in trouble
Precious Plastic, the open-source project providing blueprints for small-scale plastic recycling, is facing significant challenges despite its global impact. Since its last major release in 2020, which fostered a network of over 1100 organizations recycling 1.4 million kilograms of plastic, the core team has struggled to sustain itself. Problems include losing their workspace due to contamination, failing to find a sustainable business model that wouldn't compete with their community, a costly lawsuit, underestimating software complexity for their community platform, and the dynamic of some community members building businesses without contributing back financially. The project's original design, focused on giving everything away for free, didn't prioritize the core organization's financial sustainability, exemplified by a recent €100K donation being given entirely to the community fund.
Currently, Precious Plastic has just three full-time staff, €30K in quarterly running costs, and only six months of funds left. The founder presented a stark choice: let the project die or push for a Version 5 requiring significant resources and a fundamental redesign for organizational sustainability.
The community expressed sympathy for the mission but was critical of the organization's management. Many questioned the decision to give away the €100K donation, seeing it as poor financial judgment. Critics also sought a clearer roadmap for "Version 5" and suggested a need for restructuring or more experienced staff, with some characterizing the project as a "lifestyle charity" lacking practical execution.
A broader discussion emerged about the effectiveness of small-scale, distributed recycling versus large-scale industrial solutions. Some argued the focus should be on reducing plastic production at the source and forcing industry accountability, viewing small-scale efforts as potentially a distraction or "greenwashing." Alternative waste management strategies like modern incineration or chemical depolymerization were mentioned. Despite criticisms, many acknowledged Precious Plastic's positive impact in building a global community and enabling local recycling initiatives.
Binary Wordle
The Hacker News community is buzzing about "Binary Wordle," a clever web game that applies the familiar Wordle format to guessing a five-digit binary sequence. You get green for correct digits in the right spot, yellow for correct digits in the wrong spot, and grey for digits not in the sequence.
The discussion quickly homed in on the game's inherent simplicity. The dominant perspective is that it's trivially easy, solvable in a maximum of two guesses. The strategy is straightforward: make any initial guess (e.g., 00000). For any position that isn't green, the correct digit must be the other binary value, so simply flip the bits in all non-green positions for your second guess, guaranteeing a win. This led to humorous comments about "impressive" two-guess wins and the classic "There are 10 types of people in this world..." binary joke.
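Here's that strategy as a quick sketch, with a stand-in scoring function in place of the game's UI:

```python
import random

# The thread's two-guess strategy: open with all zeros, then flip every
# position that didn't come back green. A local scorer stands in for the
# game's feedback; in binary, any non-green position means "flip it".
def feedback(guess, answer):
    return ["G" if g == a else "-" for g, a in zip(guess, answer)]

answer = [random.randint(0, 1) for _ in range(5)]

first = [0] * 5                         # any opener works
marks = feedback(first, answer)

# Flip every non-green bit for a guaranteed win on guess two.
second = [b if m == "G" else 1 - b for b, m in zip(first, marks)]

assert second == answer
print(f"answer={answer}  marks after 00000: {marks}  second={second}")
```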
Beyond the core joke, users proposed alternative names like "Digitle," "Baudotle," or "Bytle." There was a lighthearted tangent on Landauer's principle and the thermodynamics of computation, musing if guessing 0s uses less electricity than 1s. Some suggested ways to make the game more challenging, proposing longer binary strings, different bases like hexadecimal ("Hexle"), or number-based puzzles with different feedback mechanisms. The game's similarity to Mastermind was also noted, and one user humorously predicted a lawsuit from the New York Times.
Ask HN: Has anybody built search on top of Anna's Archive?
An "Ask HN" post recently sparked a discussion about building a full-text search engine on top of Anna's Archive, aiming to combine the functionality of Google Books and Sci-Hub for this vast dataset. The original poster questioned the feasibility and cost.
The community quickly weighed in on the technical challenges. The sheer scale of Anna's Archive, estimated at around 1 petabyte of raw data, is a significant hurdle. Converting diverse formats (PDFs, EPUBs) into clean plaintext is a massive, time-consuming task, potentially yielding 10-20 terabytes of text. Reliably extracting text, especially from scanned documents, is difficult, with even advanced tooling potentially only achieving 98% accuracy, leaving "garbage" data that could impact search quality. Indexing this much data efficiently requires careful selection of a full-text search database like Lucene or Tantivy. Innovative but speculative approaches like client-side search using WASM SQLite were also suggested.
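For a sense of what the indexing layer involves at toy scale, here's a minimal full-text search sketch using SQLite's FTS5 module, in the spirit of the WASM SQLite suggestion (Lucene or Tantivy are the heavier-duty options). The documents are invented:

```python
import sqlite3

# Toy full-text index with SQLite's FTS5 module (included in most
# CPython builds). The documents are invented; a real index over
# extracted book text works the same way, just at vastly larger scale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE books USING fts5(title, body)")
conn.executemany(
    "INSERT INTO books (title, body) VALUES (?, ?)",
    [
        ("On Distributed Systems", "Consensus protocols tolerate partial failure."),
        ("Applied Ornithology", "Sulfur-crested cockatoos show social learning."),
    ],
)

# MATCH runs the full-text query; bm25() scores relevance (lower is better).
for title, score in conn.execute(
    "SELECT title, bm25(books) FROM books WHERE books MATCH ? ORDER BY bm25(books)",
    ("consensus",),
):
    print(title, round(score, 2))
```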
Beyond the technical, the discussion heavily focused on the significant legal risks. Simply indexing copyrighted material, even without hosting it, could be deemed illegal, drawing parallels to cases like The Pirate Bay. The potential for lawsuits is high, making it a non-monetizable project with substantial personal risk.
A recurring theme was the connection to Large Language Models. Many assumed, and cited reports suggesting, that major AI companies have already downloaded and used data from Anna's Archive (and similar sources) to train their models. This sparked debate about double standards, where large corporations might face different legal outcomes or simply absorb penalties as a cost of doing business, compared to individuals or smaller projects. Some argued that LLMs, while trained on this data, don't provide the same precise, grounded search capability that a dedicated full-text index would offer.
Cloud Run GPUs, now GA, makes running AI workloads easier for everyone
Google Cloud has announced the general availability of GPU support for Cloud Run, their serverless container platform, aiming to make AI and machine learning workloads more accessible and cost-effective. Key features include pay-per-second billing, scale-to-zero capabilities, rapid startup (GPU instances start in under 5 seconds, with a 19-second time-to-first-token for a Gemma 3:4b model), full streaming support, and NVIDIA L4 GPU availability without quota requests. GPU support is also coming to Cloud Run jobs for batch processing.
The Hacker News community offered a mixed reception, primarily focusing on cost and comparisons to alternatives. A significant point of discussion was the cost-effectiveness of Cloud Run GPUs versus traditional VMs. While Google touts pay-per-second and scale-to-zero, some users found the instance-based billing expensive for services receiving even infrequent requests, as it can keep an instance alive and cost more than a VM. A Google VP acknowledged Cloud Run GPUs are currently ideal for "bursty workloads" or new apps with sparse traffic, while VMs might be more cost-efficient for predictable, higher utilization. This highlighted the need for clearer cost modeling and safeguards.
The lack of hard spending caps on major cloud providers was a recurring concern, especially for individuals and small projects, fearing "runaway billing." While Cloud Run allows setting a maximum number of instances, users felt this wasn't as robust as a hard dollar limit. Some speculated that major clouds prioritize availability for large enterprises over hard cost limits for smaller customers. Alternative providers like Modal and Vast.ai were mentioned for offering better cost control, including prepaid models or explicit hard caps.
Comparisons were drawn between Cloud Run and AWS services. While some found Cloud Run's developer experience superior, others reported reliability issues, leading them to consider Kubernetes for more control. The general difficulty in obtaining high-end GPUs (like A100s and H100s) on major clouds was also brought up, with users suggesting enterprises reserve most supply, driving startups to smaller, specialized GPU cloud providers offering better availability and pricing.
Cockatoos have learned to operate drinking fountains in Australia
A new study out of Australia highlights the remarkable intelligence of sulfur-crested cockatoos, specifically their newfound ability to operate human drinking fountains. Researchers observed a flock in western Sydney that has learned a complex sequence: using their feet to twist and hold the fountain's handle to release water. This unprecedented behavior appears to be a developing cultural tradition within this specific population.
The article details their impressive dexterity, leveraging strong feet and sharp beaks to manipulate twist handles. While not always successful, their method involves placing feet on the handle, applying weight to twist it clockwise, and holding it to keep water flowing. Researchers ponder why birds opt for fountains over natural water sources, suggesting a preference for cleaner water or elevated perches for predator detection. This skill hasn't spread as widely as their previously documented dumpster-diving behavior, possibly due to variations in fountain designs.
The Hacker News community was clearly entertained and impressed. Many shared anecdotes painting cockatoos and other Australian/New Zealand birds as highly intelligent, mischievous, and often destructive pranksters, described as "flying bolt cutters" or "eternal toddlers with can-opener mouths." Stories abounded of birds deliberately messing with people, stealing food, opening zippers, and even running a "protection racket."
The discussion expanded to broader animal intelligence, highlighting the cognitive abilities of Kea parrots, rats, and cephalopods like octopuses (sparking a brief debate on the correct plural form). Commenters reflected on avian cognition, noting birds' high neuron packing density and different evolutionary paths for intelligence. While some argued cockatoo intelligence is roughly equivalent to a 3-year-old, others were convinced there's much more complexity to understand. The "why" behind fountain use also got speculation, including the idea that it might simply be a fun, mentally stimulating activity for these clever birds. The cultural transmission of skills among animal populations was seen as further evidence of advanced social learning.
Mapping latitude and longitude to country, state, or city
This week, we're diving into the challenge of reverse geocoding – figuring out a user's location (like state or city) from their latitude and longitude coordinates. Austin Z. Henley's article highlights a common pain point: the significant costs of using APIs like Google Maps for this seemingly simple task. Driven by a startup paying thousands annually just for state lookups, Austin set out to build a client-side solution.
His core idea involves using official geographic border data, specifically the US Census Bureau's shapefiles, which contain precise vector data for state borders. The challenge? This raw data is incredibly detailed and large, impractical for client-side shipping. The key was simplifying these complex polygons using the Douglas-Peucker algorithm, dramatically reducing vertices while maintaining high accuracy. For example, Texas's 62,855 vertices could be reduced to 756 with 99.9% accuracy, shrinking the data from 21 MB to just 260 KB in a minified JavaScript library, coord2state.
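For reference, here is a compact sketch of Douglas-Peucker itself; the test polyline and tolerance are arbitrary, and coord2state's actual parameters aren't reproduced:

```python
import math

# Classic Douglas-Peucker simplification, the algorithm the article uses
# to shrink state borders. Sketch only: the polyline and epsilon below
# are arbitrary, and coord2state's tuning isn't reproduced here.
def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Twice the triangle area (a, b, p) divided by the base length |ab|.
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    if len(points) < 3:
        return points
    # Find the vertex farthest from the chord between the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]   # everything between is negligible
    # Keep the farthest vertex and recurse on both halves.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right             # drop the duplicated split point

jagged = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(jagged, epsilon=1.0))
```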
The community offered a rich discussion on data encoding and size optimization. Suggestions included using more compact binary formats like Float16Array or Base64 instead of JSON, and reducing coordinate precision. Alternative data structures and algorithms for geospatial lookups were popular, such as using a bitmap where each pixel's color maps to a region, or spatial indexing structures like k-d trees or quad trees to narrow down the search space. Commenters also pointed out other simplification algorithms like Visvalingam-Whyatt.
Handling edge cases and accuracy was a significant point. Simplification can create small gaps or misclassify points near complex borders. Commenters highlighted that this is common in GIS and can be addressed using topology-aware simplification methods (like those in TopoJSON or PostGIS) that ensure shared borders remain connected. Hybrid approaches were suggested: use simplified client-side data for most points, and fall back to a more accurate (potentially paid) API lookup only for points near borders, balancing cost and accuracy. Alternative, less expensive reverse geocoding services like Nominatim or OpenCageData were also mentioned.
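Once simplified borders are on the client, the lookup itself reduces to a point-in-polygon test. The article doesn't show coord2state's internals, so the even-odd ray-casting sketch below is the textbook approach, not its actual code:

```python
# Even-odd ray casting: shoot a ray to the right of (lon, lat) and count
# how many polygon edges it crosses; an odd count means the point is
# inside. Textbook approach, not coord2state's actual code. Points near
# a simplified border are exactly where a hybrid API fallback helps.
def point_in_polygon(lon, lat, polygon):
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):                  # edge spans the ray
            x_cross = (x2 - x1) * (lat - y1) / (y2 - y1) + x1
            if lon < x_cross:                         # crossing is to the right
                inside = not inside
    return inside

# Toy "state": a unit square standing in for a simplified border polygon.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_polygon(0.5, 0.5, square))   # True
print(point_in_polygon(1.5, 0.5, square))   # False
```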
FFmpeg merges WebRTC support
FFmpeg, the ubiquitous multimedia framework, has just merged support for WebRTC, specifically implementing the WebRTC-HTTP Ingestion Protocol (WHIP). This significant addition allows FFmpeg to act as a source, pushing media streams directly to a WebRTC server or gateway. This is a game-changer for open-source broadcasting and self-hosted streaming.
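As a sketch of what this unlocks, a WHIP push might look like the following; the ingest URL is a placeholder, and the codec flags (and the muxer being exposed as whip) are illustrative assumptions to check against the FFmpeg documentation:

```python
import subprocess

# Push a local file to a WHIP ingest endpoint via the newly merged
# muxer. The URL is a placeholder; codec and bitrate flags are
# illustrative choices, not requirements of the merge.
subprocess.run(
    [
        "ffmpeg",
        "-re", "-i", "input.mp4",         # read input at its native frame rate
        "-c:v", "libx264", "-b:v", "2M",  # H.264 video for broad WebRTC support
        "-c:a", "libopus",                # Opus is the standard WebRTC audio codec
        "-f", "whip",                     # select the WHIP muxer
        "https://example.com/whip/endpoint",
    ],
    check=True,
)
```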
The community expressed widespread excitement, with a key contributor highlighting that with WHIP support now in FFmpeg, GStreamer, and OBS, a ubiquitous protocol for video broadcasting is finally emerging. This could make setting up streaming infrastructure significantly easier and cheaper, potentially allowing services like Twitch to be hosted without massive cloud budgets. The argument is that with efficient codecs like AV1 or H.265 and features like Simulcast, the primary cost becomes bandwidth, which is increasingly affordable and could support thousands of viewers per server.
Practical use cases explored ranged from recording video meetings to integrating with media servers like Jellyfin or enabling remote gaming setups. However, the discussion also delved into technical specifics and potential concerns. Some pointed out that this merge is specifically for WHIP (ingestion) and doesn't include the full WebRTC stack like SCTP data channels or WHEP (egress/playback), though plans for WHEP were mentioned. The complexity of the full WebRTC protocol led to discussions about alternative future protocols like Media-over-QUIC (MoQ).
A notable thread addressed security. Given WebRTC's history of vulnerabilities, some users expressed concern about adding it to FFmpeg. Contributors clarified that this specific implementation is much smaller and focused than a full browser WebRTC stack. They emphasized that standard security practices for handling untrusted input with FFmpeg, such as running it in isolated environments like Docker containers or VMs, remain crucial regardless of the WebRTC addition.