The sins of the past – adding Cyrillic glyphs without renaming fonts

The URW Base35 fonts are a great set of fonts, available for free as in free software. They have been part of various distributions and systems for a long time. Big thanks to URW for their work. But these fonts don’t have Cyrillic or Greek glyphs. If things had stayed that way, the world would be easy: people would simply need to use different fonts for these languages. Along came someone who did the unthinkable, namely adding Cyrillic and Greek glyphs to the fonts (so far nothing bad), but then NOT renaming the fonts. Here we see one point of the stupidity of GPL and absolute freedom. Because what we now have is that several TeX engines (in particular XeTeX and LuaTeX), which use fontconfig to search for fonts, suddenly pick up these changed fonts that fake their identity when producing documents, and what comes out is this, complete rubbish:

[Image: broken-fonts]

And now we are suffering huge pain from that. Look at the bug reports that are coming in:

  • 796120 xdvipdfmx broken
  • 789391 developers reference fonts broken
  • 787759 fonts broken in dblatex

Just to name a few. And there is a simple way to circumvent this: Don’t install gsfonts. That guarantees that fontconfig finds the real, original URW fonts within the TeX Live tree first.

I have now spent many hours tracking down these problems and finding the reason, and at the end of the day it is always gsfonts with its broken fonts with added Cyrillic glyphs. I honestly don’t care about the history. There are now many fonts with Cyrillic and Greek glyphs, so there is no need to fake fonts and incorrectly take over font names.

This should be a lesson to all the GPL zealots who require absolute freedom for each and every thing. Unfortunately things don’t work like that. Using AND RENAMING is ok (the Knuth license, as I would call it), but anything else is just a source of much pain.

End for today, I have to go to work now. Real work instead of fighting sins of the past.

Nothing to enjoy here.

Additional information: Just to let you know, before starting a flame war, I have already contacted the upstream developers, that is TeX Live, and explained the situation to them. I don’t see much chance of a fix, since the problem is with fonts that have no upstream and no support, which are probably only used in Debian (I haven’t seen them anywhere else apart from some mentions in RH), and which are not officially supported or distributed. It really needs a nice developer to look into why this breakage appeared. Let us hope. And instead of flaming, anyone here is invited to dig into the code him/herself and search for changes.

Additional information 2015-08-22: Just to back up my complaints and counteract several of the comments: I am quoting from an email of a colleague on the list where we are discussing the problem:

However, the fonts extended by Valek Filippov are quite problematic. The Type1 spec clearly requires that there may not exist two different fonts with the same /FontName. The modified fonts shipped with Ghostscript have the same /FontName as the original fonts donated by URW and not even the /UniqueID was changed. IMO they are broken because they don’t comply with the Type1 specification.

I hope that convinces even the last doubters.

28 Responses

  1. Sami Liedes says:

    Not that I care too much about TeX or fonts, but this reasoning seems quite backwards to me.

    So *TeX breaks because someone *added* glyphs to a font, without breaking any of the existing ones? It seems to me that what broke here is *TeX, not the fonts. In a sane world, adding support for previously unsupported languages/glyphs is a backwards compatible change and a service to the users of the font, which is hindered if you rename the font because then you won’t get the benefit by merely upgrading. Adding glyphs is not “faking” or “taking over” fonts; it’s upgrading them.

    • Ok, to help your understanding: Assume you have a project written in C, and it compiles and the generated program does what it should. Then some package claiming to provide a compatible header file is installed, and suddenly your program does not compile, or worse segfaults.

      Now you have changed nothing in your compiler, and nothing in your project. Would you call the compiler or your project broken?

      Anyway, there are other sins in not renaming: The rest of the world uses the URW++ fonts as they are and should be. Now if you don’t embed the fonts because they are ubiquitous anyway, your document will be hosed on a different computer.

      There are hundreds of reasons why not renaming it is simply and plainly wrong.

      Hope that helped your missing understanding.

      • Karellen says:

        Sorry, I don’t think your analogy is that great, because I’m still as confused as the initial commenter.

        The new glyphs have been *added* to the font, right? They don’t replace any existing glyphs, so all the old glyphs in the font should still work as before, shouldn’t they?

        So the only issue is if the new glyphs are bad. As far as I can tell, they can be bad in one of two ways – either the new glyphs claim to be glyphs for the wrong character (e.g. a cyrillic glyph claims that it’s actually for a latin character), or the glyphs are just of really crappy quality.

        From what I can tell, it looks like the glyphs are claiming to be for the wrong characters. But in that case, wouldn’t they be for the wrong character no matter what name you gave the font? So wouldn’t that font be broken in any document, no matter what name it had?

        • It is not an analogy. It is exactly like that. Remove URWcyr and use original fonts, all is fine. Use URWcyr fonts and it is broken.
          There is one variable, the rest does not change. So where would you search?

          • Karellen says:

            The analogy is fine for making me believe that the new glyphs break things. I get that. I understand that part. I never had a problem with that.

            I’m having trouble understanding *why* adding new glyphs to a font breaks things, or why how the font is named makes any difference.

            To stretch your analogy, I’m imagining the old original project ships urw.h, which contains a bunch of prototypes for latin functions like, e.g.:

            int latin_a(int);
            int latin_b(int);
            int latin_c(int);

            Then someone else has come along and added a bunch of prototypes for new Cyrillic functions. Now, these should not affect any of the latin prototypes *at all*. Because all the latin prototypes are still there. However, if it adds a bunch of new prototypes like:

            float cyrillic_a(float);
            float cyrillic_be(float);
            float cyrillic_ve(float);

            Now, if the standard is that cyrillic functions, as are provided by other headers (font files), should all be of type “int(*)(int)”, then, yes, those headers are wrong, and your program will fail to compile, or crash.

            However, your program will fail to compile, or crash, if you use that header file, no matter what the header file is called. It doesn’t matter if it’s called “urw.h” or “my_new_urw.h”, it can never be right according to how things are supposed to work. But that’s only if you *use* the cyrillic functions/characters – if you stick to the latin functions/characters, everything should be fine, shouldn’t it?

            Similarly, if the header file is correct, but the implementation of cyrillic_a() actually outputs a really horrible rendering of a cyrillic_a(), or worse, a cyrillic_ve() instead, then again, things will look worse, but they’ll look worse no matter what you call your new library. If the glyphs are wrong, they’re just wrong, and need fixing. Changing the name of the font won’t fix the font being wrong. I mean, it’ll help your program in that you won’t be using the wrong font, but it won’t fix the font.

            Does that explain my confusion better?

          • mirabilos says:

            The analogy is more like this:

            The original header has…

            #define NGLYPHS 256
            struct glyph glyphs[NGLYPHS];

            The new header and library has #define NGLYPHS 512.

            Now imagine the glyphs are in arbitrary order, or even some “good” one (hash table, lookup by glyph name), but the lookup in the compiled program goes by NGLYPHS, so the program with the new library doesn’t find half the glyphs, but where it does find them, it has different ones suddenly.

  2. Sami Liedes says:

    Yeah, I guess I can understand and even agree that it leads to problems with only referencing fonts. Still it’s quite surprising that something renders incorrectly when you add glyphs.

    If we talk about your example of a C program, you can write a program which calls a function with unsupported parameters and expects to get EINVAL or something similar, and which suddenly behaves differently when someone implements sensible semantics for those previously unsupported parameter values. That’s the level of crazy this seems to me. I cannot imagine a sane reason for rendering of Latin text to break when Cyrillic glyphs are added to a font.

    • Fonts are not just data, they contain vital information in tables and small programs in PostScript, which is a Turing-complete programming language, for each glyph. The probably incomplete changelog (README.tweaks) of the gsfonts mentions several changes.

      I agree that dvipdfmx could be more resilient to these changes than it obviously is, and I will bring it up with the developers, but still, my whole point is that the fonts should have been renamed. That is all. I really appreciate the work they did in adding the Cyrillic glyphs. That is great. But please, for whoever’s sake, rename the fonts.

  3. I read your blog via Planet Debian so attributing the cause of this problem to the GPL struck me as really odd.

    I don’t see why this could be associated with the GPL or “GPL zealots”.

    If you distribute a modified version you have to clearly indicate that it’s a different work from the original: (from #5 GPLv3):

    > You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:
    > a) The work must carry prominent notices stating that you modified it, and giving a relevant date.

    Could you elaborate on why the GPL would be the root of the problem?

    • Well, yes, but nobody requires you to *change the name of the fonts*. Even if the fonts are distributed as URWcyr, the actual font names, both the file names and the PostScript FontName property, are unchanged. And this is what creates the problem.

      GPL allows you this. Knuth devised a trip-trap test to ensure that what is called “TeX” behaves *absolutely* like what he provides. And that is a good idea, as we see.

      That is the reason why I consider GPL in many cases a failure.

  4. Hi, Norbert!

    I’m on the side of the fence with those who don’t understand how adding *new* glyphs would break the font, provided those glyphs were added correctly.

    Taking your C program example: suppose someone added a bunch of function declarations to that compatible header file obeying the rules (C does not have modules/namespaces/etc so let’s assume all the existing functions have a common prefix, say, `foo_`). Should your recompiled program cease to run? If it does, it’s a bug in the program.

    To look at it from a different angle: what happens if someone decides to add glyphs for another five incompatible (as in having no common glyphs) alphabets to the same font? Should we now have six differently named fonts? But wait, what about permutations? We now have a possibility to have “URW + alphabet 1”, “URW + alphabet 2”, … “URW + alphabets 1 and 2” and so on… you get the idea.

    So I think renaming a font which has new glyphs added *correctly* is wrong. If some program breaks due to this, it has to be fixed.

    On a side note: I think in most cases (PDF for external circulation) font embedding (subsetting) has to be used.

    • See my recent update of the post, it is simply against the specifications. And what defines *correctly* in your post? I can only think of the specifications of Type1 fonts. And they failed. But even if you do not want to rely on the specifications, we see that the change created incompatible and wrong output. This is IMNSHO *incorrect*.

      • Sami Liedes says:

        Continuing the C program analogy, that a program behaves incorrectly after a library update can still be a bug in the program if it does crazy things. In C we would call this unspecified or undefined behavior. And breaking on adding glyphs seems just like that to me. It’s hard to imagine any sane program doing that.

        • mirabilos says:

          Nope, Sami, a C library can change its ABI (see above for the number of entries in a global data object, for example). In that case, it is no longer compatible with the original program.

          In DLLs, we have versions (e.g. Linux/ELF soversion) for that. In the font world, there’s font names and identifiers. They didn’t change that. That’s akin to a C library “forgetting” a soname change.

          • Sami Liedes says:

            Keeping the same soname is OK as long as your ABI stays backwards compatible. That doesn’t mean that a badly behaving program would necessarily get the same results. However it specifically does mean that you usually can add symbols to the .so and get away with it, as long as the old ones behave according to the old spec. This case seems, on the facts described here (if accurate), very closely analogous to that: Only adding functionality in a way that should not break any well-behaved programs.

  5. smcv says:

    > There are hundreds of reasons why not renaming it is simply and plainly wrong.

    I do agree that it’s inadvisable to make incompatible changes without also changing the machine-readable “name”; for fonts AIUI that’s the same as the user-visible name, for libraries it’s the SONAME, for Python it’s the module name that you “import”, for Debian packages it’s the binary package name and so on.

    However, you seem to be advocating copyright licenses that make that inadvisable activity illegal, and I do not agree that it is appropriate to use copyright licenses in that way. The entertainment industry has successfully lobbied for copyright infringement to be treated as a serious crime in many countries, and I don’t think using the threat of copyright infringement to control users’ and developers’ behaviour is something the free software community should be advocating.

    Developers should avoid incompatible changes because it’s the right thing to do, not because someone will sue them.

    • I never said that this is about copyright. There are two points: One is the change without the required (according to the specifications, see update of the post) changes to the FontName and the UniqueID. This is the core of the problem here. On a larger scale, what I am complaining about is that the GPL allows for this kind of rubbish. And I consider this a problem.

  6. Robert says:

    This isn’t really a case against any of the arguments that “GPL zealots” present. It’s just a case where the GPL isn’t an appropriate license to use.

    Feels more like you’re trying to use this as an excuse to take a swipe against the GPL or its proponents.

    • You might be right, maybe I should have said “Debian legal zealots” which describes some of my feelings better 😉 Still, the problem with changing without renaming applies not only to fonts, but to all kinds of GPL-covered software, and in most cases I consider it bad.

  7. karlberry says:

    seems to me the problem is with valek for incorrectly creating the fonts, and debian for foolishly distributing them, not the gpl. the gpl gives you all the rope you want, but that doesn’t make it bad. requiring renaming has problems of its own. for instance, when a latex package maintainer disappears. if renaming were required, no one could take it over under the same name -> all existing documents using it have to change to use a new version. this happened recently with biblatex, and it was surely a great boon that we could keep the name “biblatex” and “biblatex.sty”. anyway …

    • Hi Karl, I mentioned that the renaming should be necessary for status-maintained, which means that for biblatex that is all fine. Assume texinfo is cloned and released with different semantics and different behaviour as texinfo-8.0.0?

      • Sami Liedes says:

        I think you are conflating “it would be nice if” with “it should be legally mandatory”. People should be nice to each other. Still, there should not be a law for punishing people for merely not being nice. People should not make bad changes to libraries. Still, that should be enforced by social norms and choosing whose software you install and run, not by threatening people with prison.

  8. Anonymous says:

    p.s. also, using system font access for fonts available in the distribution seems foolish to me.

    p.p.s. also, if xetex and xdvipdfmx are still finding different system fonts for a given name, that’s surely a bug, but it’s not clear to me that’s what’s happening.

  9. Sami Liedes says:

    So, you are against GPL allowing this, but also against it forbidding this? I’m not sure you make a lot of sense. Licenses are a legal instrument. The only thing they do is affect legal rights.
