Standardising the Record — Making Meeting Captures Reusable and Interoperable

Meeting records, as actually kept by most knowledge workers, are scattered across formats. Audio in a recorder app, notes in a notebook, slides as photos on a phone, handouts as email attachments. Each artifact lives in its own silo, indexed by no shared scheme, retrievable across meetings only with manual effort. Five years on, almost none of it can be queried as a coherent corpus.

The condition is best diagnosed not as an effort problem but as a structural problem. Increasing time spent per meeting does not connect the records to each other if the formats remain incompatible. Cross-search is impossible; sharing requires re-formatting; long-term reference dies of attrition.

This article articulates one direction out of that condition ── standardising the record ── covering what standardisation buys, what its component pieces look like, and a concrete implementation as a four-tool open-source toolkit (media-scribe-workflow and its companions). The argument applies just as well to academic conferences, organisational meetings, lectures, and interview transcripts.

What Standardisation Buys

Standardisation in engineering has always served two purposes ── reusability and interoperability. USB defines one connector and protocol that lets hosts and devices connect. HTTP defines one vocabulary that lets servers worldwide be addressed. JSON gives one notation that lets data cross language boundaries. The pattern is consistent: define the common interface, and independently built components become composable.

For meeting records, the current fragmentation produces familiar consequences:

  • Cross-search is unavailable ── individual records sit in incompatible file formats, so spanning queries across multiple meetings is structurally impossible
  • Sharing demands re-formatting ── adapting to the recipient’s tools introduces a re-keying step every time
  • Long-term reference collapses ── after several years, the editing environment that produced the records often no longer exists
  • Metadata is missing ── when, where, who, about what is not captured in any consistent schema, foreclosing downstream automation

Records that are standardised, by contrast, can be characterised by four properties:

  1. Common interfaces ── input, processing, and output stages all communicate through documented formats
  2. Consistent metadata ── the record carries its own when / who / which meeting identifiers
  3. Long-term readability ── formats that remain openable, queryable, and regenerable after years
  4. Reusable assets ── parts of the record (utterances, slides, figures) can be embedded into derived documents without manual conversion

Standardisation by Inter-Stage Interfaces

The substantive form of “standardising the record” is found at the boundaries between stages, where data passes from one tool to the next. Each tool conforms to documented input and output formats; under that constraint, components become individually replaceable. This is the actual content of a standardised pipeline.

A meeting-record pipeline decomposes into four stages:

Stage | Input | Output | Standard format
1. Audio → transcript | audio file (mp4 / wav) | timestamped subtitles | SRT (de facto since the early 2000s)
2. Slide photo → corrected presentation | photo (HEIC / JPEG) | rectified image | PNG / PDF (1920×1080 / 300 dpi A4)
3. SRT + photos → structured | subtitles + images + metadata | LaTeX source | LuaLaTeX (typesetting standard)
4. LaTeX → distribution | LaTeX source | PDF + chaptered video | PDF/A + YouTube chapter notation

What sits at each boundary is a long-stable standard, formal or de facto. SRT has been a default subtitle format since the early 2000s [5]. LuaLaTeX belongs to the LaTeX family, a de facto standard for academic typesetting, and is widely used for Japanese-language documents. PDF/A is standardised as ISO 19005 [6] for long-term preservation. YouTube chapter notation is the de facto standard for chaptered video.

The deeper design principle is that the individual tools do not invent standards: they respect existing standards and connect to them. This is what a standardised pipeline looks like operationally. In the vocabulary of the foundational essay Relative, Not Deterministic, laying a standard format between stages is the condition that makes Verification ── mechanical difference detection between two adjacent stages ── possible [1]. With the format uniform, “does the transcription’s output match the correction’s input?” becomes a machine check. Standardisation thus has a double payoff ── it makes each stage relatively swappable and makes inter-stage consistency machine-checkable.
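As a rough illustration of what such a machine check can look like, the sketch below parses SRT text and reports structural problems before the file is handed to the next stage. It is not taken from media-scribe-workflow; the names Cue, parse_srt, and check_cues are hypothetical, and the three checks (sequence numbering, positive duration, no overlap) are assumptions about what a downstream stage might require.

```python
import re
from dataclasses import dataclass

# SRT timecode range, e.g. "00:01:02,500 --> 00:01:05,000"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

SAMPLE = """1
00:00:00,000 --> 00:00:04,200
Welcome, everyone.

2
00:00:04,200 --> 00:00:09,000
First item: the budget.
"""


@dataclass
class Cue:
    index: int      # sequence number as written in the file
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(source: str) -> list[Cue]:
    """Parse SRT text into cues; raise ValueError on a malformed block."""
    cues = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            raise ValueError(f"incomplete block: {block!r}")
        match = TIMING.fullmatch(lines[1].strip())
        if not lines[0].strip().isdigit() or match is None:
            raise ValueError(f"malformed block: {block!r}")
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


def check_cues(cues: list[Cue]) -> list[str]:
    """Return human-readable problems; an empty list means the file passes."""
    problems = []
    for position, cue in enumerate(cues, start=1):
        if cue.index != position:
            problems.append(f"cue {position}: sequence number is {cue.index}")
        if cue.end_ms <= cue.start_ms:
            problems.append(f"cue {position}: end is not after start")
        if position > 1 and cue.start_ms < cues[position - 2].end_ms:
            problems.append(f"cue {position}: overlaps the previous cue")
    return problems
```

Calling check_cues(parse_srt(SAMPLE)) returns an empty list for the sample above. Because the check depends only on the SRT format, it stays valid if the transcription tool in front of it is replaced.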

Output Format Is a Choice, Not a Lock-In

The fourth-stage output above is given as LuaTeX-to-PDF, but this is one option among several. The essence of standardisation is that the inter-stage interfaces are uniform, not that the final output is fixed. With standardised input (SRT + corrected images + chapter metadata), swapping the third-stage template is enough to retarget the output:

Output format | Best fit for | Strength
LuaTeX → PDF/A | Academic publication, print, long-term archival | Mathematical typesetting, ISO compliance
Markdown | Web publication, GitHub, lightweight sharing, AI ingestion | Portability, editability, text search
JSON | API integration, machine processing, database ingestion, search index | Explicit structure, machine readability, downstream automation

A single standardised input thus supports simultaneous distribution through three channels:

  • Official conference proceedings → LuaTeX PDF
  • Internal wiki / GitHub Pages → Markdown
  • Searchable knowledge base → JSON

media-scribe-workflow’s msw-report currently ships with a LaTeX template only, but the pipeline architecture admits Markdown / JSON templates as additions to the same toolkit. Output choice is properly a function of the audience and the preservation requirements, not a property of the pipeline.
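To make the idea of retargeting concrete, here is a minimal sketch of one in-memory record rendered through two interchangeable output functions. It is illustrative only and is not msw-report's template mechanism: MeetingRecord, Utterance, their field names, and the RENDERERS table are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Utterance:
    start: str      # SRT-style timecode, e.g. "00:00:04,200"
    speaker: str
    text: str


@dataclass
class MeetingRecord:
    title: str
    date: str       # ISO 8601 date, e.g. "2026-01-15"
    utterances: list[Utterance] = field(default_factory=list)


def to_markdown(record: MeetingRecord) -> str:
    """Render the record as a Markdown page for a wiki or repository."""
    lines = [f"# {record.title}", "", f"Date: {record.date}", ""]
    for u in record.utterances:
        lines.append(f"- [{u.start}] **{u.speaker}**: {u.text}")
    return "\n".join(lines) + "\n"


def to_json(record: MeetingRecord) -> str:
    """Render the record as JSON for indexing or API use."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


# Adding an output channel means adding one entry here;
# the record structure itself stays untouched.
RENDERERS = {"markdown": to_markdown, "json": to_json}
```

A LaTeX renderer would be a third entry in the same table. The input structure does not change when the output channel does, which is the property the section argues for.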

The Toolkit

The four-stage pipeline is implemented as four single-purpose tools that interoperate through the standard formats above.

1. media-scribe-workflow [2] is the pipeline core. Whisper-generated SRT goes in; Claude-driven contextual analysis and a LuaTeX-formatted structured report come out. The CLI splits into msw-config / msw-report / msw-compile so that each stage is independently invocable, while msw-pipeline runs them end-to-end.

2. perspective-corrector [3] handles slide-photo rectification. Slides photographed from the back of a hall are trapezoidally distorted; this tool combines automatic four-corner detection (Canny edges + contour approximation) with manual fine-tuning to produce 1920×1080 PNGs or 300 dpi A4 PDFs. HEIC (iOS native) is handled directly, so phone photos can flow into the pipeline unconverted.

3. video-chapter-editor is the chapter-editing GUI bundled with media-scribe-workflow. A waveform display sits alongside the video preview; clicking sets chapter boundaries; dragging refines them. ffmpeg-based encoding can leverage GPU hardware acceleration (VideoToolbox / NVENC / QSV / AMF), which keeps long meeting recordings within practical processing windows on local hardware.

4. luatex-docker-remote [4] isolates the LuaLaTeX execution environment in a remote Docker container. It is the engine that msw-compile calls, and it exists so that the LaTeX toolchain does not need to be installed on the writer’s local machine. The companion essay Running LuaTeX in Docker, Remotely (currently in Japanese) covers it in detail.
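The slide rectification in tool 2 rests on a projective mapping from the four detected corners to the corners of the output rectangle. The sketch below shows only that mapping step, in pure Python and with the corners assumed to be already known. It does not reproduce perspective-corrector's code or its corner detection, and the function names are hypothetical. Resampling the pixels is left to an imaging library.

```python
def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points.

    Each correspondence (x, y) -> (u, v) contributes two linear equations
    in the eight unknown entries; the ninth entry is fixed to 1.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Corners of a slide as photographed (top-left, top-right, bottom-right,
# bottom-left) and the 1920x1080 rectangle they should land on.
PHOTO_CORNERS = [(120, 80), (880, 40), (960, 620), (60, 560)]
TARGET_CORNERS = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
```

With H = homography(PHOTO_CORNERS, TARGET_CORNERS), each photographed corner maps onto its target corner up to floating-point error. Manual fine-tuning in a GUI amounts to editing the source corners and recomputing H.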

Porcelain and Plumbing ── Inheriting Git’s Decomposition

The README of media-scribe-workflow adopts a Git design vocabulary explicitly: porcelain and plumbing [7].

In Git’s traditional reading, commands divide into two layers:

  • Porcelain: high-level commands polished for the user (git commit, git push)
  • Plumbing: low-level commands the porcelain calls beneath the surface (git update-ref, git hash-object)

Porcelain is the finished surface meant for human use; plumbing is the structural connection meant for machine composition. The split lets Git provide both an ergonomic user experience and a set of parts composable from scripts, without compromising either.

media-scribe-workflow inherits the same decomposition:

  • Porcelain: video-chapter-editor (PySide6-based GUI), rehearsal-download / rehearsal-finalize integration commands
  • Plumbing: msw-config / msw-report / msw-compile, vce-encode / vce-split, yt-srt / video-trim / video-chapters, and the other single-purpose CLIs

Both routes traverse the same standardised formats, so the user can complete tasks via the GUI or automate with a Makefile, and end up with the same artifacts.

End-to-End Workflow

In practice the flow looks like this:

  1. Attend the meeting ── record audio, photograph slides (anywhere in the room), optionally record video
  2. Pull files locally ── audio + photos + (optional) video
  3. Rectify slides with perspective-corrector ── HEIC straight from the phone, four-corner auto-detect with drag refinement, batch PDF export
  4. Transcribe with Whisper ── local GPU or remote Whisper server; SRT comes back
  5. Structure with msw-pipeline ── SRT + rectified slides + metadata → Claude organises speakers / topics / structure → LaTeX is generated → LuaTeX produces the PDF
  6. (Optional) Chapter the video with video-chapter-editor ── output a chaptered video for YouTube or a custom player

The artifacts ── PDF + chaptered video + chapter list (YouTube-compatible) ── are all in standardised formats. To cite a specific moment from a meeting later, full-text search the PDF and jump to the corresponding chapter in the video; the cross-reference works without manual bookkeeping.
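The chapter list and the jump from a timestamp to its chapter can be sketched as follows. This is an illustration, not video-chapter-editor's code. The function names are hypothetical, and the validation rules (first chapter at 0:00, at least three chapters, each at least 10 seconds long) reflect YouTube's published requirements as understood at the time of writing and should be treated as an assumption.

```python
from bisect import bisect_right


def format_timestamp(seconds: int) -> str:
    """Format seconds as M:SS, or H:MM:SS once the hour mark is passed."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def youtube_chapters(chapters: list[tuple[int, str]]) -> str:
    """Render (start_seconds, title) pairs as a chapter list for a description."""
    starts = [start for start, _ in chapters]
    if len(chapters) < 3 or starts[0] != 0:
        raise ValueError("need at least three chapters, the first at 0:00")
    if any(b - a < 10 for a, b in zip(starts, starts[1:])):
        raise ValueError("chapters must be ascending and at least 10 s long")
    return "\n".join(f"{format_timestamp(s)} {title}" for s, title in chapters)


def chapter_at(chapters: list[tuple[int, str]], seconds: int) -> str:
    """Return the title of the chapter that contains the given moment."""
    if seconds < 0:
        raise ValueError("time must be non-negative")
    starts = [start for start, _ in chapters]
    return chapters[bisect_right(starts, seconds) - 1][1]
```

For chapters [(0, "Opening"), (95, "Budget"), (3725, "Q&A")], youtube_chapters yields the three lines "0:00 Opening", "1:35 Budget", and "1:02:05 Q&A", and chapter_at(chapters, 100) returns "Budget". Feeding chapter_at the start time of a subtitle cue found by text search gives the video chapter to jump to.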


Read against the broader dlab corpus, this standardisation argument extends a longer lineage on knowledge production (the JA companion essays are linked below; English versions are forthcoming).

Umesao Tadao’s The Technique of Intellectual Production (1969) located the heart of intellectual production in the technique of separating thinking from procedural work and then reintegrating the two. The Kyoto University card method ── physical index cards ── was a 1969 standardised format for that separation. That is an instance of what the foundational essay Relative, Not Deterministic calls mediation ── a tool or medium prescribing what gets shown and what becomes easy to do ── and is the same situation Andy Clark and David Chalmers (1998) formulated from the cognitive-science side in their extended mind paper [1].

Standardising meeting records is the contemporary extension of this same lineage:

  • Umesao’s cards = a standard format that decouples thought from organisation in time (1969 physical media)
  • Standardising meeting records = a standard format that decouples capture / processing / presentation / reference across formats, stages, and people (2026 digital media)

Different eras and different media, but a common claim ── the record is a mediator of thought, and the act of arranging that mediation is itself the technique of intellectual production. The dlab essays 思考と道具の境界 — 書くことは考えることか (Writing as Thinking) and LuaTeX を Docker でリモート実行する (Running LuaTeX in Docker) approach the same proposition from adjacent angles.

Meetings are not, ordinarily, published. But when meeting records are standardised in a reusable, interoperable form, they become long-lived assets ── for the writer’s personal library, for an organisation’s knowledge base, and for letters to one’s future self (room for that future self to articulate retroactively “what it was I actually wanted to hear in that meeting”). Standardisation is the precondition for that survival, and is, in the end, a question of stance rather than a question of technology ── the stance declared in Relative, Not Deterministic: fix the format boundary, keep the tools relative.

References


  1. For the operational definitions and the theoretical treatment of mediation / différance / the V&V asymmetry, see the footnote in the foundational essay Relative, Not Deterministic, and the author’s Zenodo preprint series (Letter version DOI: 10.5281/zenodo.20096463). 

  2. mashi727/media-scribe-workflow, GitHub. https://github.com/mashi727/media-scribe-workflow 

  3. mashi727/perspective-corrector, GitHub. https://github.com/mashi727/perspective-corrector 

  4. mashi727/luatex-docker-remote, GitHub. https://github.com/mashi727/luatex-docker-remote 

  5. SubRip Subtitle (.srt) is the format produced by SubRip (Zuggy, ca. 2000) and widely adopted as the default for time-coded captions. Each entry is a triple of sequence number, timecode range (HH:MM:SS,mmm --> HH:MM:SS,mmm), and subtitle text. Modern speech-recognition tools including Whisper can output SRT directly.

  6. ISO 19005-1:2005 Document management — Electronic document file format for long-term preservation — Part 1: Use of PDF 1.4 (PDF/A-1). International Organization for Standardization, 2005. 

  7. Chacon, Scott, and Ben Straub. Pro Git, 2nd ed., Apress, 2014, Chapter 10 “Git Internals”. https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain