• 1 Post
  • 16 Comments
Joined 1 year ago
cake
Cake day: June 14th, 2023

help-circle










  • Your code looked alright. Working in C is a risky chore. You’re early in your journey and I think it’s good to get a taste of many of the traditional techniques before turning towards newer fancier algorithms.

    “Haskell and OCaml”, hm? The two concepts you need to internalize are katamorphisms and paramorphisms. You may have heard of “recursion schemes”; these two schemes are the ones available on any tree. Fundamentally, most of a compiler is tree-to-tree transformations (nanopasses), and they are expressible as one of two forms:

    • Katamorphism: The leaves of each node are transformed before the branch (bottom-up)
    • Paramorphism: The leaves are transformed after/during the branch transformation (top-down)

    If you look again at my AST builder builder, you’ll see .run() and .walk() methods, implementing the recursion for any katamorphism or paramorphism respectively. In Haskell, these are called Traversable types. This is a pun in Monte, where .run() is also the default method; the syntax makes it easy to perform a katamorphism by passing a tree-traversing object directly to the AST.

    Your types are alright, but you’ll want to pass a generic parameter through them, turning them into a valid Functor in Haskell or a generic module in OCaml. This is a basic defense against the “AST Decoration Problem”, AKA the “AST Typing Problem”. As you add features to your compiler, you’ll realize why these are necessary.





  • [HTML and Markdown] are not grammatically Type 2 (Chomsky-wise, Context-Free); rather, they are Type 3 (Chomsky-wise, Regular).

    This is at least half-wrong, in that HTML is clearly not regular. The proof is simple: HTML structurally embeds a language of balanced parentheses (a Dyck language), and such languages are context-free and not regular. I don’t know Markdown well and there are several flavors; it’s quite possible that some flavors are regular. However, traditional Markdown embeds HTML, so if HTML is not regular than neither is Markdown.

    I once did a syntax-directed translation of Markdown to HTML in AWK!

    Sure. The original Markdown implementation was in Perl and operated similarly. However, note that this doesn’t imply that either language is regular, only that a translation is possible assuming the input is valid Markdown. Punting on recognition means that invalid parse trees have undefined translations, presumably at least sometimes generating invalid HTML.


  • I’ve only skimmed the paper, so let me know if I’ve missed something, ideally with a page number. Also, it’s late and I’m tired, so I’m not hyperlinking anything; sorry.

    I’m not sure what a “full semantic analysis” entails, but always keep Rice’s theorem in mind: there aren’t any interesting semantic analyses available for Turing-complete systems.

    Python is a descendant of Smalltalk. Like several of its cousins, particularly the famous ECMAScript, Python doesn’t have types or classes in the Smalltalk sense, but prototypes which form a class-like hierarchy. From the static-analysis point of view, whether a type is created or instantiated is a matter of Rice’s theorem.

    The ability to invoke type() at runtime is not lazy. Python is eager and strict; even generators are eager and strict, although they can cause stack frames to become “stale”; whether a stale stack frame is cleaned up is also a matter of Rice’s theorem.

    None of this prevents compilation of Python. The RPython toolchain first imports an application, evaluating all calls to type() and pre-building all classes; then, it statically analyzes all of the Python objects in memory and decompiles their bytecode to determine their behaviors. The resulting executable behaves as if it were started from a snapshot of the Python heap.

    Yes, CPython sucks. Use PyPy instead; also, use cffi to wrap C libraries.