Correcting Slide Geometry — perspective-corrector as a Single-Responsibility Tool


Conferences, internal meetings, lectures ── in any setting where a presentation is given, a photograph of the slide will almost certainly come back trapezoidally distorted. Holding a phone at an angle from the back of a hall guarantees that the rectangular screen is imaged as a skewed, non-rectangular quadrilateral. Perspective projection makes this distortion unavoidable as a matter of physics.

Correcting that distortion is an everyday problem ── but the details involve a surprising number of design judgements. One implementation, perspective-corrector [2], is one of the four tools surveyed in Standardising Records. This article walks through the seven design decisions behind it and reads each as a concrete instance of the single-responsibility principle articulated in Separating Tools from the Orchestrator.

Decision 1 ── Stay Single-Responsibility

The first judgement was about what not to do. Capabilities adjacent to slide-photo correction are plentiful:

  • OCR (text extraction from images)
  • General document scanning (business cards, receipts, whiteboards)
  • File organisation (tagging, search)
  • Cloud sync
  • Sophisticated automatic colour correction (HDR, exposure auto-fix)

Commercial alternatives ── Office Lens, CamScanner, Adobe Scan ── bundle all of these into a document-scanning suite and, in doing so, end up doing none of them particularly well.

perspective-corrector made the opposite call: scope the tool exclusively to trapezoidal correction. Each new feature request is tested against Doug McIlroy’s single-responsibility proposition [3]: what would be lost if this feature were a separate tool, and what would be lost if it were left out? OCR can be delegated to a separate tool, file organisation to the OS file manager, and cloud sync to Dropbox or iCloud. Once those decisions hold, the tool’s responsibility shrinks to distorted image → undistorted image, expressible in a single line.
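That one-line contract can be sketched as a single function. This is a minimal illustration, not the tool's code. It assumes NumPy, four corners given in top-left, top-right, bottom-right, bottom-left order, and a 1920 × 1080 output, and it uses a hand-rolled homography with nearest-neighbour sampling. The names `homography` and `correct` are hypothetical. An OpenCV-based implementation would more likely call `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the same two steps.

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 matrix H (with h33 = 1) that maps each src point onto its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def correct(image, corners, out_w=1920, out_h=1080):
    """Distorted image -> undistorted image, given the slide corners (TL, TR, BR, BL)."""
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    # Inverse mapping: for every output pixel, find where it came from in the photo
    h_inv = homography(rect, corners)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = h_inv @ pts
    # Nearest-neighbour sampling, clamped to the source image bounds
    col = np.clip(np.rint(sx / sw), 0, image.shape[1] - 1).astype(int)
    row = np.clip(np.rint(sy / sw), 0, image.shape[0] - 1).astype(int)
    return image[row, col].reshape(out_h, out_w, *image.shape[2:])
```

Everything the tool does beyond this function is about obtaining good `corners` and writing the result out.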

This judgement keeps complexity minimal and lets the don’t-write discipline from The Discipline of Reuse translate directly into the implementation: trust other tools that already exist; write only the part that must be written.

Decision 2 ── HEIC Native Input

Photographs taken with an iPhone are, by default, saved in HEIC. Most desktop tools accept only JPEG / PNG and force the user to convert beforehand. That conversion step generates three layers of friction:

  1. Manual operation overhead
  2. Quality loss (HEIC → JPEG)
  3. Storage duplication

The composite effect is that “fixing the distortion” is preceded by “fixing the file format” ── attention spent on bookkeeping rather than the actual task.

perspective-corrector adds pillow-heif as a dependency and reads HEIC directly. Photos coming straight off an iPhone can be dragged and dropped into the app and corrected without intermediate conversion. The decision is small ── remove input-format friction ── but its effect on the user’s cognitive load is non-trivial.
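A minimal sketch of HEIC-native input follows. It assumes Pillow and pillow-heif are installed. `register_heif_opener()` is pillow-heif's documented way of teaching `PIL.Image.open` to read HEIC/HEIF files. The helper names and the exact extension list are hypothetical and are not taken from the tool.

```python
from functools import lru_cache
from pathlib import Path

# Hypothetical input whitelist: JPEG/PNG plus the iPhone's default HEIC/HEIF.
HEIF_EXTS = {".heic", ".heif"}
SUPPORTED_EXTS = {".jpg", ".jpeg", ".png"} | HEIF_EXTS

def is_supported(path):
    """True if the file can be opened directly, with no prior format conversion."""
    return Path(path).suffix.lower() in SUPPORTED_EXTS

@lru_cache(maxsize=None)
def _enable_heif():
    # pillow-heif registers a HEIF plugin with Pillow; once per process is enough.
    from pillow_heif import register_heif_opener
    register_heif_opener()

def open_image(path):
    """Open JPEG, PNG or HEIC uniformly as an RGB Pillow image."""
    from PIL import Image
    if Path(path).suffix.lower() in HEIF_EXTS:
        _enable_heif()
    with Image.open(path) as im:
        return im.convert("RGB")
```

Because the HEIF plugin is registered lazily, the rest of the pipeline never needs to know which format a photo arrived in.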

Decision 3 ── Lossless PNG and 300 dpi A4 PDF

Output format selection separates into two cases.

Per-image output ── 1920 × 1080 PNG (lossless):

  • 16:9 is the de facto standard slide aspect ratio
  • 1920 × 1080 has enough resolution for web display, slide-deck reconstruction, and archival reading
  • Lossless compression preserves quality through any downstream re-editing (colour grading, annotation, extraction)
  • PNG over JPEG: slides contain text, and JPEG’s DCT-based compression introduces block artefacts that degrade text legibility

Aggregated output ── 300 dpi A4 landscape PDF:

  • 300 dpi is the print-quality baseline; multiple slides arranged on A4 print legibly
  • A4 landscape is the standard handout format in most of the world, which keeps the output internationally portable
  • A multi-page PDF combines slides into a single file, reducing file-management cognitive load downstream

By emitting both formats simultaneously, the tool covers the two principal downstream uses ── re-edit as image and print / distribute ── without forcing the user to choose at correction time.
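The A4 numbers are easy to check. A4 is 210 × 297 mm, and at 300 dpi (25.4 mm per inch) the landscape page comes to about 3508 × 2480 px, so a 1920 × 1080 slide has to be scaled up to fill it. The sketch below does that arithmetic and writes both outputs with Pillow. It assumes one slide per page, centred on a white page. The article does not specify the layout, and the tool may tile several slides per page. Pillow's PDF writer also makes no attempt at PDF/A conformance. All function names are hypothetical.

```python
MM_PER_INCH = 25.4
A4_MM = (297, 210)  # landscape: the long side is horizontal

def a4_landscape_px(dpi=300):
    """Pixel size of an A4 landscape page at the given resolution."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)

def fit(slide_w, slide_h, page_w, page_h, margin=0):
    """Largest size that preserves the slide's aspect ratio inside the page minus margins."""
    scale = min((page_w - 2 * margin) / slide_w, (page_h - 2 * margin) / slide_h)
    return round(slide_w * scale), round(slide_h * scale)

def save_outputs(slides, png_paths, pdf_path, dpi=300):
    """Write each slide as a lossless PNG, then all slides as one multi-page PDF."""
    from PIL import Image
    for slide, path in zip(slides, png_paths):
        slide.save(path, format="PNG")
    page_w, page_h = a4_landscape_px(dpi)
    pages = []
    for slide in slides:
        w, h = fit(*slide.size, page_w, page_h)
        page = Image.new("RGB", (page_w, page_h), "white")
        page.paste(slide.convert("RGB").resize((w, h)), ((page_w - w) // 2, (page_h - h) // 2))
        pages.append(page)
    pages[0].save(pdf_path, format="PDF", save_all=True,
                  append_images=pages[1:], resolution=float(dpi))
```

For example, `a4_landscape_px()` gives `(3508, 2480)`, and `fit(1920, 1080, 3508, 2480)` gives `(3508, 1973)`.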

Decision 4 ── Auto-Detection With Manual Fallback

The core of distortion correction is identifying the four corners of the slide accurately. Full automation is the ideal, but in practice it breaks down in several common cases:

  • Slide boundary blends into wall colour or backdrop curtains
  • Reflections or projector light bleed make the boundary ambiguous
  • Audience members or other objects intrude into the foreground of the shot

perspective-corrector chose a two-stage architecture: try automatic detection first, then fall back to manual fine-tuning when it fails.

  • Automatic: Canny edge detection [4] + contour approximation estimates the four corners
  • Manual: clicks specify corners; drags refine them
  • Magnifier: a localised zoom around the cursor allows pixel-level positioning
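The automatic stage can be sketched with OpenCV's Canny detector, contour extraction and polygon approximation. This is an assumed reconstruction, not the tool's code. The blur kernel, Canny thresholds, 2 % approximation tolerance and minimum-area ratio are illustrative values. Returning `None` is what hands control to the manual stage. The corner-ordering helper uses the common sum/difference heuristic, which can misorder a quadrilateral rotated by roughly 45°.

```python
import numpy as np

def order_corners(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=float).reshape(4, 2)
    s = pts.sum(axis=1)        # x + y: smallest at top-left, largest at bottom-right
    d = pts[:, 1] - pts[:, 0]  # y - x: smallest at top-right, largest at bottom-left
    return np.array([pts[s.argmin()], pts[d.argmin()], pts[s.argmax()], pts[d.argmax()]])

def detect_corners(bgr, canny_lo=50, canny_hi=150, min_area_ratio=0.2):
    """Ordered corners of the largest convex 4-sided contour, or None (-> manual fallback)."""
    import cv2  # OpenCV is only needed for the detection step itself
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), canny_lo, canny_hi)
    # [-2] selects the contour list under both the OpenCV 3 and OpenCV 4 return conventions
    contours = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    min_area = min_area_ratio * gray.shape[0] * gray.shape[1]
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        if cv2.contourArea(c) < min_area:
            break  # remaining contours are too small to be the slide
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4 and cv2.isContourConvex(approx):
            return order_corners(approx)
    return None
```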

This is less a matter of “giving up on automation” than a recognition that “automation plus manual refinement” fits realistic operating conditions best. The decision is motivated by the friction profile of actual use rather than by algorithmic elegance.
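The manual stage needs only a few small pieces of logic. The sketch below assumes NumPy and corners held as a 4 × 2 array. It shows three hypothetical helpers. The first is a fallback that supplies an inset rectangle when detection returns nothing. The second is a click hit-test that decides which corner a drag moves. The third is a magnifier that crops a window around the cursor and enlarges it by nearest-neighbour repetition, so individual pixels stay visible. The 10 % inset, 15 px hit radius and zoom factors are assumed values.

```python
import numpy as np

def initial_corners(detected, width, height, inset=0.1):
    """Use detected corners if available; otherwise an inset rectangle for the user to drag."""
    if detected is not None:
        return np.asarray(detected, dtype=float)
    dx, dy = width * inset, height * inset
    return np.array([[dx, dy], [width - dx, dy], [width - dx, height - dy], [dx, height - dy]])

def hit_corner(corners, x, y, radius=15.0):
    """Index of the corner within `radius` px of a click, or None if the click misses all."""
    dist = np.hypot(corners[:, 0] - x, corners[:, 1] - y)
    i = int(dist.argmin())
    return i if dist[i] <= radius else None

def magnify(image, x, y, half=20, zoom=4):
    """Crop a (2*half+1)-pixel square around (x, y), edge-padded, and enlarge it by `zoom`."""
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    # After padding, original pixel (x, y) sits at (x + half, y + half), the window centre
    window = padded[y:y + 2 * half + 1, x:x + 2 * half + 1]
    return window.repeat(zoom, axis=0).repeat(zoom, axis=1)
```

Edge padding keeps the magnifier usable when a corner lies at the very border of the photo.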

Decision 5 ── Batch Processing as a First-Class Concern

A single conference session typically yields 30–60 slide photographs. A design that requires per-image manual operation does not scale to that load.

perspective-corrector treats batch processing as a core feature, not an afterthought.

  • Open a folder; the contained images list automatically
  • Specify four corners per image (auto + manual refinement)
  • A single “process all” button runs correction across the batch and emits PNG / PDF outputs

The interface is designed around the batch case from the start, rather than retrofitting batch processing onto a single-image flow. The distinction matters: batch-first design surfaces shared-state UI patterns (a selectable list, per-image status, parallel progress) that single-image-first designs can only scaffold awkwardly.
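The batch-first structure can be sketched as a list of per-image records, which the UI renders, and a single “process all” entry point. Everything here is hypothetical: the `BatchItem` fields, the status strings and the `correct_one` callback, which would wrap the user's corner choices and output writing. The loop is sequential for clarity. The parallel progress the article mentions would need a worker pool or a background thread on top of it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".heif"}

@dataclass
class BatchItem:
    path: Path
    corners: Optional[list] = None  # set by auto-detection, then manual refinement
    status: str = "pending"         # pending -> done | skipped | failed: <reason>

def load_folder(folder):
    """Opening a folder lists its images automatically, in a stable (sorted) order."""
    return [BatchItem(p) for p in sorted(Path(folder).iterdir())
            if p.is_file() and p.suffix.lower() in IMAGE_EXTS]

def process_all(items, correct_one, on_progress=None):
    """'Process all': correct every item that has corners and record per-item status."""
    for n, item in enumerate(items, start=1):
        if item.corners is None:
            item.status = "skipped"  # still waiting for the user to confirm corners
        else:
            try:
                correct_one(item)
                item.status = "done"
            except Exception as exc:  # one bad photo must not abort a 60-image batch
                item.status = f"failed: {exc}"
        if on_progress is not None:
            on_progress(n, len(items))
    return items
```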

Decision 6 ── Cross-Platform Binary Distribution

Distribution-channel choice depends on the technical-skill distribution of the target audience.

  • pip-only install → developer audience, technical barrier present
  • Standalone binary distribution → general user, low barrier

The intended users for perspective-corrector include researchers, instructors, and students who do not necessarily have Python tooling installed. pip alone would exclude them.

PyInstaller produces standalone executables, which are then packaged per platform:

  • macOS: DMG (Apple Silicon / Intel)
  • Windows: EXE (32-bit / 64-bit)

Dependencies including ffmpeg / ffprobe are bundled inside the binary; no additional installation is required. The pip channel is offered in parallel for developers, who handle the tool as a regular Python package. Splitting distribution into two channels lets each user cohort take the path that fits their tooling.
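A build script for the binary channel might look like the following sketch, which rests on several assumptions. The entry script `main.py` and the app name are hypothetical. `--collect-all pillow_heif` is a precaution in case PyInstaller's hooks miss pillow-heif's compiled extension. The DMG step uses macOS's `hdiutil`, because PyInstaller itself stops at the `.app` bundle. `PyInstaller.__main__.run` is PyInstaller's documented programmatic entry point.

```python
import subprocess
import sys

APP_NAME = "perspective-corrector"  # hypothetical bundle name
ENTRY = "main.py"                    # hypothetical entry-point script

def pyinstaller_args(platform=sys.platform):
    """PyInstaller command-line options for one target platform."""
    args = [ENTRY, "--name", APP_NAME, "--windowed", "--noconfirm",
            # collect pillow-heif's submodules, data files and binaries explicitly
            "--collect-all", "pillow_heif"]
    if platform == "win32":
        args.append("--onefile")  # a single .exe is the simplest hand-off on Windows
    return args

def dmg_command(app=APP_NAME):
    """hdiutil invocation that wraps dist/<app>.app into a compressed DMG."""
    return ["hdiutil", "create", "-volname", app, "-srcfolder", f"dist/{app}.app",
            "-ov", "-format", "UDZO", f"{app}.dmg"]

def build():
    """Run PyInstaller for the current platform, then package a DMG on macOS."""
    import PyInstaller.__main__  # imported lazily: only needed when actually building
    PyInstaller.__main__.run(pyinstaller_args())
    if sys.platform == "darwin":
        subprocess.run(dmg_command(), check=True)
```

Each platform runs `build()` on its own machine or CI runner, since PyInstaller does not cross-compile.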

Decision 7 ── Projector Tint Correction

Slide photographs taken at conferences carry a distinctive colour cast from projector light. Filament lamps add yellow; LED projectors add blue; each light source’s spectral characteristics imprint themselves on the captured image.

perspective-corrector provides post-processing controls for this:

  • Automatic white balance (colour-temperature estimation)
  • Manual contrast / brightness / saturation adjustment
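The article does not say how the tool estimates colour temperature. The sketch below therefore substitutes the simplest named technique for automatic white balance, the gray-world assumption, in which each channel's mean is scaled to the common mean. It adds plain brightness, contrast and saturation controls. It assumes an 8-bit RGB NumPy array, and the parameter conventions are hypothetical.

```python
import numpy as np

def gray_world(rgb):
    """Auto white balance under the gray-world assumption: equalise the channel means."""
    img = rgb.astype(float)
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / np.maximum(means, 1e-6)  # guard against an all-black channel
    return np.clip(np.rint(img * gains), 0, 255).astype(np.uint8)

def adjust(rgb, brightness=0.0, contrast=1.0, saturation=1.0):
    """brightness: additive offset; contrast: scale about mid-grey 128; saturation: blend with luma."""
    img = rgb.astype(float)
    img = (img - 128.0) * contrast + 128.0 + brightness
    luma = img @ np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights
    img = luma[..., None] + (img - luma[..., None]) * saturation
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)
```

Gray-world works reasonably on slides with mostly neutral backgrounds but can over-correct a slide whose design is dominated by one colour, which is one reason to keep the manual controls alongside it.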

Performing this colour correction in-tool means the user does not have to open a separate image editor (Photoshop, GIMP) afterwards. This sits in tension with the single-responsibility principle from Decision 1; the tension is resolved by scoping the colour correction narrowly to conference-slide photos rather than offering it as a general-purpose image-editing capability. The tool does not aspire to become a general-purpose colour grader; it solves projector tint on slide photographs and stops there.


Viewed against the broader dlab corpus, perspective-corrector is a concrete instance of several previously articulated principles.

The stance from Relative, Not Deterministic ── not fixing on “the correct procedure”, but re-confirming meaning as the situation requires ── shows up here in image correction. The choice to give up full automation in favour of a two-stage auto-detect + manual fine-tune flow exists because the correct position of a slide’s four corners is relative to shooting conditions (blending into wall textures, reflections, an audience member in frame): no deterministic algorithm yields the correct answer in every case. Auto-detection (Verification) narrows the gap first; the human then re-confirms whether these really are the right four corners (Validation). The V&V asymmetry is built directly into the UI [1].

The single-responsibility rigour from Separating Tools from the Orchestrator ── the tool stays confined to distorted image → undistorted image and refuses to grow into an OCR engine, a document-scanning suite, or a cloud-sync platform. This is what makes it composable as a component inside an orchestrator like media-scribe-workflow.

The don’t-write design from The Discipline of Reuse ── automatic four-corner detection rides on OpenCV’s Canny edges and contour approximation; HEIC support rides on pillow-heif; binary distribution rides on PyInstaller. The code specific to this tool is the connective tissue that lets these existing components address the specific problem of slide photography.

The inter-stage standardisation from Standardising Records ── outputs are PNG (the image-processing standard) and PDF compatible with ISO 19005, i.e. PDF/A (the long-term archival standard). Downstream consumers ── media-scribe-workflow, any document toolchain ── receive these without format-conversion overhead.

Each individual judgement is small; their accumulation is what produces a UNIX-philosophy answer to the specific problem of slide-photo correction.

In the Udemy-course context, the value is in following the chain of judgements, not in teaching the underlying technologies in isolation. Teaching “how to use the Canny edge detector” is largely interchangeable across thousands of computer-vision tutorials. Teaching “how the decision between full automation and manual fallback was made, and why” is transferable design judgement that applies far beyond image rectification ── and that’s what the case-study format is meant to surface.

References


  1. For the operational definitions and the theoretical treatment of mediation / différance / the V&V asymmetry, see the footnote in the foundational essay Relative, Not Deterministic, and the author’s Zenodo preprint series (Letter version DOI: 10.5281/zenodo.20096463). 

  2. mashi727/perspective-corrector, GitHub. https://github.com/mashi727/perspective-corrector ── The PySide6 + OpenCV slide-photo trapezoidal-correction app discussed in this article. 

  3. McIlroy, M. D., E. N. Pinson, and B. A. Tague. “UNIX Time-Sharing System: Foreword.” The Bell System Technical Journal, vol. 57, no. 6, July–August 1978, pp. 1899–1904. 

  4. Canny, John. “A Computational Approach to Edge Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, 1986, pp. 679–698. ── The paper introducing Canny edge detection, which the auto-corner-detection in this tool relies on. Adopted as a standard edge-detection approach across image-processing applications since 1986.