EmelleTV: Talking with Louis Roché about OCaml and Ahrefs

Published in Ahrefs · 60 min read · Jun 29, 2023

Transcript

David: [00:00:00] Hello, my name is David. I run EmelleTV. It’s a talk show about OCaml, ReScript, and Reason. I often bring guests from the community to meet them and ask a lot of questions about the language or what they’re working on, and of course to have fun with the Hindley–Milner type system. That’s part of the show. I work for Ahrefs, it’s actually this company.

Today I’m interviewing a coworker, so it’s going to be a little bit of a branded stream. Hope you understand. It’s a lovely company. Apply if you’re looking for a job working on OCaml or Reason. Aside from that, I maintain styled-ppx and an implementation of React on the server, but that’s enough about me, and I’m going to introduce our guest, Louis. Hello, Louis. How are you?

Louis: [00:00:59] Good, and you?

David: [00:01:01] Good. Very good. You obviously work at Ahrefs. At what team do you work?

Louis: [00:01:10] This is recently changing, but I have been in the back-end forever, since like seven years ago and I still have some of the projects that I had when I joined. This was stable. I’m in this new team called middle-end. Ahrefs is not very good with naming. We say that it’s the hardest thing in computer science. We have front-end that is actually full stack, then we have a back-end, which is more like data, and now we have middle-end, which is somewhere in the middle. I’m supposed to lead this new middle-end team.

David: [00:01:48] Nice.

Louis: [00:01:49] We’ll see how it goes.

David: [00:01:49] Nice. Today I think we’re going to answer a few questions about Ahrefs, which I think has been a mysterious company if you look at it from the outside. When I joined, I think you helped me understand a lot of things that I didn’t know about Ahrefs. I might just fire the same questions that I did, just recorded so everybody can understand them. Aside from Ahrefs, who are you and can you present a little about yourself?

Louis: [00:02:21] This is a tough question. Who am I? I grew up in France. I’m French, still I’m French, but I’ve been living in Singapore for seven years, with Ahrefs for seven years. I’ve been working in OCaml my whole life basically because my first job was in OCaml, and Ahrefs which is my second job, is in OCaml too. I cannot say that it’s better than the rest because I never tried the rest. I’ve been involved with OCaml, like the OCaml meetup in Paris for some time.

I’m on the online community. I’m part of the OCaml Code of Conduct committee, which is an effort that was started last year, I think. So far we don’t have a lot of work, so that’s good. I’m one of five doing this. Then outside of that, I’m a pretty normal person. I’m 31 years old, and that’s about it, I would say.

David: [00:03:35] That’s the whole idea. You have been writing OCaml for a long, long time. That’s fair to say.

Louis: [00:03:40] I’ve been writing OCaml since I was 16 or 17, which is when I wrote my first line, like 14 years ago, something like this. There was no Merlin at the time.

David: [00:03:52] There was no LSP. The first question is, 15 years, this is a long time, but how do you see the evolution of the entire language? Would you split it in chunks? How have you seen the progress of the language?

Louis: [00:04:08] It’s hard to say. When I joined the OCaml world, it was because of people who nowadays are fairly important like Gabriel Scherer, who’s working at Inria, I think, and he’s one of the main maintainers of OCaml. He was on this French forum, pushing people very hard to try OCaml, and I got convinced. I started just writing a few lines here and there, and then I just stuck to that for a long time. I’ve been mostly a user like this. My usage has extended over time, but I’ve never been called a contributor. My point of view is more as a user.

The biggest difference is the size of the community, I would say. It’s much more dynamic than it used to be. We used to install packages, like OCaml libraries, using Debian packages. It was apt install something. There was no opam, there was no Merlin, there was no VS Code, right?

David: [00:05:25] Right.

Louis: [00:05:25] LSP didn’t exist, and there were way fewer libraries. I think you can see today how it was in the past because we have 10 different libraries to do HTTP 1.1. We have 10 different standard libraries, and it’s a legacy of what OCaml was in the past. We used to have all those smart people, but they had no way to collaborate. There was no opam, there was no way to share your work. Everyone was smart enough to rewrite —

David: [00:06:04] Build your own library for http.

Louis: [00:06:05] Yes.

David: [00:06:06] Right. That’s always interesting for me, how OCaml got so many different things that are hard to create, like standard libraries. I think recently, Containers reached 10 years, so it’s like what? [chuckles] It doesn’t make much sense. If you look at now, it doesn’t make much sense, but if you look at historically, it does make sense.

Louis: [00:06:35] When you have no choice, you do it. You don’t know that it’s harder, you just see “Oh, I can do it,” and you do it.

David: [00:06:42] Yes. You need to think it’s easy. You need to start a standard library or http library saying, “Oh, it’s easy.” Then, you start a little bit, and eventually, you create something. Last week — Oh, sorry, go on.

Louis: [00:06:57] No, go ahead.

David: [00:06:59] Last week, I tweeted that OCaml suffers a lot from the Python Paradox. The Python Paradox, I think somebody wrote about it, I don’t remember the name of the guy, but somebody wrote it in 2004: when you use Python at a company, you get the smartest people, the ones who want to innovate, and you have the pioneers of the language. Then, by default, you hire people that are in love with software, so eventually they create good software, or they are willing to suffer getting out of their comfort zone to create software. I believe OCaml is in the same spot. Do you see it that way? Is that true? Do you agree?

Louis: [00:07:49] I don’t know if that’s true. Partially, it’s a strategy of Ahrefs, so I have to say it’s true.

David: [00:07:55] [laughs]

Louis: [00:07:58] Yes, I think it’s partially true, but it’s not completely correct. For OCaml, at least it’s a bit different from Python, because OCaml has this strong academic influence, so a lot of people are actually researchers. There is a big benefit that they have, not free time, but they manage their time, they manage what they work on, and they decide what is important. They got all this time to actually write this complicated code many times because actually, it’s part of their job to just redo the same thing in better ways. It’s normal, you have to explore a subject.

It’s okay to do it multiple times. It’s a combination of those people are working in the right place, they have the right time, and they have the correct background. A lot of people were working on subjects that allowed them to do it. Some, it’s because they were very strong in writing languages. Some, it’s because they have this strong Unix background, I would say.

David: [00:09:05] Right. It’s very unixy. The start of OCaml is very unixy. That’s true.

Louis: [00:09:14] Xavier Leroy wrote LinuxThreads, I think, which was used in Linux forever, so there is this background. Probably, it’s a bit different in Python because it grew more in the industry rather than in an academic setup. For sure, if you try to target people who live in a niche, you find people with different interests from the main programming community, I would say. At the same time, I think some of the best meetups or conferences I’ve been to were Java meetups.

David: [00:09:59] All right.

Louis: [00:10:00] They know that their language is boring. The language has been the same for 20 or 30 years. There are some changes but they don’t really care about the language too much. It’s a huge, huge community. Basically, everything already exists. There is no big bragging, everyone can do everything. There are 10 versions of everything, whatever. The benefit is that they are super open-minded.

Oh, something is new. Something is different. Okay. Let’s see that. You go to that meetup and they will not talk only about all the fancy new feature in Java. It’s like, “Oh, I saw this new git tool. Oh, it’s funny. Okay, let’s try to use this.” A new way to do web development. Okay, let’s study the subject. It’s not about Java itself because the language is not interesting enough. It’s about other programming stuff. It’s very fun to attend.

David: [00:11:03] That’s a good one. Actually, many people are now starting to hear about OCaml for the first time, after they left or got disappointed with the Rust policy drama. These people got into OCaml because some influencer wanted to bash on Rust. They started exploring all the languages and of course OCaml was one of those, aside from Zig or whatever low-level programming languages you compare it with. Do you see those influencers moving people to actually try the language, become users and do serious stuff, or is it more like vain marketing?

Louis: [00:11:56] I’m not a big Twitch person. I don’t know [crosstalk] modern influencers.

David: [00:12:02] That’s true.

Louis: [00:12:03] I think it exists in two forms. In the past it existed in two forms. You had influencers, you had Rob Pike and, who’s the other person? The two people who are behind Go. They do not make a good language but they are influencers. They were like, “Oh, yes. We did UTF-8 and Plan 9 in the past and we work at Google. Oh, it’s going to be amazing.” No, it’s a crappy language, but they are influencers. They move people.

David: [00:12:32] [laughs] Okay. Would you say that Go is crap?

Louis: [00:12:38] Go is a language. I haven’t used Go too extensively.

David: [00:12:42] This is recorded. This is not a beer in Singapore. This is recorded. You can obviously bash Go, that’s part of the game.

Louis: [00:12:50] Let’s say Go is not the most modern language there is.

David: [00:12:54] Right. Thank you. This is just for the headline. We don’t want the headlines- because Ahrefs is going to be like — No, I’m joking. [crosstalk] Yes, go on. Sorry.

Louis: [00:13:10] On the same topic of influencers, we saw it with Reason. When Reason comes, it’s not just a random person creating Reason. It’s Jordan and he comes with a React background, and he comes with followers. He is not doing videos online but it’s the same idea. I think yes, it definitely has an influence, and OCaml grew a lot when Reason started.

David: [00:13:38] Yes, that’s true.

Louis: [00:13:39] I definitely think it has an influence.

David: [00:13:43] That’s true. From the community, how have you seen the Reason creation and adoption from your point of view? You can bash Reason if you want.

[00:13:43] [crosstalk]

Louis: [00:13:56] At that time I think the OCaml community was one IRC channel. It was a bit different from now. I think what I was not super convinced by when the Reason syntax arrived was the original claim by Jordan, I think, that he was making a better syntax. I was not super convinced that the syntax was better. It was developed independently from OCaml.

By experience I already knew at the time that if you start to fork or develop on your side and don’t integrate fairly quickly with upstream it’s actually never going to be integrated with upstream.

David: [00:14:53] Right.

Louis: [00:14:54] I don’t know why exactly, but it has happened a few times. Then there is the question of BuckleScript, because if you write Reason it’s two sides. There is the syntax, which I only partially understood too because I was not a web developer, I’m still not a web developer. I did not know about JSX. I did not know how powerful it was and I think React was not as big at the time too, but I think JSX is a nice idea and there are a lot of things in the syntax that are nice, like parentheses around arguments is a problem but it has some benefits, too.

David: [00:15:42] Yes, I think some trade-offs from OCaml, or at least some edge cases in the OCaml syntax, got resolved in Reason just by adding more, like the parentheses or the braces or the semicolons. Those can remove some problems from the syntax. Not problems, but just the edge cases that come from the cleanness of OCaml.

Louis: [00:16:10] Yes. Even sometimes it’s not an edge case but it’s nice to see very clearly, for example, when you apply a function, where are the arguments? Where does it start, where does it end? There are benefits, obviously, like the OCaml syntax or Haskell syntax is lighter, we will say, and that has some benefits. The other one is nice, too.

David: [00:16:36] Yes, definitely and you mentioned BuckleScript?

Louis: [00:16:40] Yes. BuckleScript, they didn’t have —

David: [00:16:42] That was not so well received.

Louis: [00:16:47] Yes, I think because there was already js_of_ocaml, so again, it was like, “Yes, I do something different.” I think Bob developed it fully inside Bloomberg at the time. Basically, he came out and said, “Oh, yes, I have a new project and it’s working already.” He didn’t start to develop it in public. The community was much smaller, too, so every time you split efforts like this, it’s kind of costly. People will say, “Yes, we will try to collaborate. We’ll try to make the two projects work together,” or whatever and it never works. Never works. I don’t think I understood all the trade-offs. I’m happy that I invited Bob to the OCaml meetup in Paris which, in retrospect, was a good thing to do.

David: [00:17:50] Yes, for the record, Louis was running the OCaml Paris Meetup, I think. Yes, go on with the story.

Louis: [00:18:00] Yes, so when I moved to Singapore, I still organized one meetup, even though I was in Singapore, and I invited Bob to present BuckleScript. At the time, it was a bit controversial, because many people were a bit unhappy with what he was doing, but I’m happy that I did it. I didn’t understand what I was doing exactly but in the end, I think it was the right thing to do. Even if the project died later on, you have to give such projects a chance.

David: [00:18:35] Yes, I think I wouldn’t say that BuckleScript died. It’s more like BuckleScript has been working for seven years, I think.

Louis: [00:18:42] Yes, no, even if it was a failure, which it was not, but maybe like six months later, it could have died and disappeared. Yes, I think when people have a drastically different approach, usually they have a reason. It’s worth listening. A lot of what Bob defended, I’m not sure I completely agree with it. He wants a very stable compiler, for example. He said, in Bloomberg, they are using GCC 3 or 4, I don’t remember, since forever.

So they don’t need to upgrade the compiler, the GCC compiler, for example. He thought the same idea can apply to OCaml, we don’t need to follow the upstream compiler all the time.

David: [00:19:27] Right. Yes.

Louis: [00:19:29] Most companies actually they don’t want to change compiler version. They want something stable. They want no surprise, which has some value, or the stability has some value.

David: [00:19:40] Yes, that’s true but I think when he mentioned the compatibility with the compiler, I think it’s mostly that OCaml has been very stable since, what, 6 or 7 years ago, I think. I think there were some small changes or some additional features, but nothing really broke, mostly just the syntax. Then he complained about the parsing, like the AST modifications, those were present, those were changing between versions. He wanted to avoid that, because BuckleScript is a fork of the OCaml compiler, now embedded into ReScript.

Yes, he was complaining about the AST transformations because every version changes a lot. There are migrations. You could write some logic to migrate from one to another. It’s painful if you maintain a fork of this, you might suffer a lot from updating from one compiler to another.

Louis: [00:20:41] Yes, and I think for him, even as an end user, the stability has some value. It’s interesting for him to have a stable compiler and even for his target, the people he’s targeting, the stability has some value too.

David: [00:21:00] Earlier you mentioned that people were installing or sharing libraries through Debian packages, which maybe, I’m as old as you, but maybe I’m too young to have seen how those package managers could work with apt-get. What’s the position of the tooling? Right now I think we are in a state where we have two bigger players, opam and dune, as package manager and build infrastructure, we’d call it, I don’t know. Now dune is exploring installing packages. How do you see the tooling these recent years?

Louis: [00:21:48] It’s amazing. It’s completely incredible. Then people will have different opinions on whether opam is perfect or whatever. If you compare to what it was before, it’s incredible. I think even if you compare to other languages, it’s a fairly solid experience now. Opam is working well. You just need to learn the UI, but it’s working fairly well. Dune is relatively fast, easy to use. The LSP is pretty magical. Merlin is a very solid tool. It was one of the first, I would say, cases of a small language with a tool as powerful as Merlin.

It’s not only powerful, it’s avant-garde. It understood already that you had to be able to do error recovery and that you had to change the way you parse files to be able to work with something that is half broken.

David: [00:22:57] Yes, that’s true.

Louis: [00:23:00] The people behind Merlin are super smart. In a way, it’s not a surprise.

David: [00:23:05] Right. You actually contributed to the LSP and dune, to both projects, I saw your contributions.

Louis: [00:23:13] Yes, I have commits on many small- it’s mostly small contributions, but I have commits on everything, I think, at some point.

David: [00:23:19] Right.

Louis: [00:23:21] LSP, I participated in putting some ppx deriving stuff and I wrote a bunch of commands. I implemented some Merlin behavior inside LSP. If you hover multiple times on the same value, the type will be more and more verbose. I took this behavior back to LSP. Dune I have mostly bug fixes, probably small documentation, nothing big.

David: [00:23:58] One of the things that you mentioned as well, I think we talked about this before. When OCaml was very young, all the features that got added into the language were PhD projects, where a student is very passionate, or maybe his advisor is an OCaml fan, and he explores the theory with the language in academia. Then he works on a paper and eventually it gets released as part of the language. That was the time when maybe Jane Street had not even started using OCaml seriously. Do you see that now? Do you see those academic features getting into the language? Do you think it’s a weird mix now or do they compose well together? How do you see the language after these contributions?

Louis: [00:25:04] I was looking today at the OCaml changelog because I was wondering when the release of OCaml 4 was, and that was 11 years ago. OCaml 4, before OCaml Multicore, is the last time there was a big change, which was GADTs. In the meantime there were mostly small changes. I don’t think the language changed much. If we look at what the big features were, we could say the objects in OCaml, GADTs, OCaml Multicore.

They all were developed by people in a research setup and somehow it seems to work. I’m not a maintainer on OCaml. I think it works also because they don’t have a lot of energy to integrate a lot of new features, they are very, very picky on what they actually accept in the compiler. Only the most solid implementations will get in.

David: [00:26:36] Yes, that’s true. I think the quality is something that every core team member talks about all the time: all these things would be amazing to do, but our quality bar is very high. Yes, you need to work on it much more for us to even look at it. Yes, that’s true.

Louis: [00:26:54] Then there are things that do not compose super, super well. There are parts of the module language and parts of the object language that do not compose very well. You can make the compiler more or less blow up or the compilation time will become crazy. Actually, those are parts that I don’t know very well. I very seldom combine first-class modules and objects.

David: [00:27:21] Objects, yes. That’s something I haven’t done yet. I think the only experience with that combination might be ppxlib maybe, because you have the traversers. Yes, you use them. You instantiate the traverser. You don’t do anything with internal states of anything. Good point. One thing that maybe it’s worth saying is that right now you work at Ahrefs for seven years.

At the beginning, when Ahrefs picked OCaml, or when Igor, our CTO, came with OCaml on his back, there were not many companies using OCaml. Now we have Tezos, Tarides, Ahrefs of course, LexiFi, Bloomberg, BeSport. Many companies, and some of them even have their own forks of OCaml that they are experimenting with and deploying or whatever. Seven years ago, do you think it was a risky decision? The second question is how can you convince your boss about using OCaml?

Louis: [00:28:35] For sure, I think picking OCaml at the time was a risky choice because who do you hire? It’s like there were five OCaml developers. In Paris, you can find people. In Paris, you can find students. You go to the OCaml meetup and socialize and you can more or less build a company, which is what the previous company I was in called Cryptosense was doing. This is how BeSport came to life. BeSport just picked a few people around Vincent Balat and then you meet people. You steal one or two person from the OCaml meetup and you tell them, “Oh, join my company,” and now you have enough people to push a project forward. How do you do this from another country? Even today, I think it’s not an easy choice.

David: [00:29:35] Somehow risky, yes. That’s true.

Louis: [00:29:38] Today, you can hire, but even if you have, I don’t know, 2,000 packages on opam, the tooling is still- the libraries are not, there are not libraries for everything like you have in some other languages.

David: [00:30:00] Right. It’s big enough, but it’s not populated with everything.

Louis: [00:30:07] I don’t know if we have full support of gRPC. I’m not sure that we have complete support of HTTP/2 or HTTP/3. It’s not that small, but there are many things like this. I would say, today I would say it’s a risk. How would I convince my boss to move to OCaml? I would —

David: [00:30:34] Would you do it? Maybe you would not do it. Maybe you say, “It’s fine, we can do with whatever,” with Java you said that you enjoy the Java meetup, then you join your company writing Java. Would you be happy writing Java? Would you be fine? Or you would say “Oh, OCaml here makes sense, let’s try to change it.” How would you do it?

Louis: [00:30:57] I think if I was in a small company, it would definitely make sense to use OCaml. It’s interesting because in a small company you could say all the Java tooling has more value than in a big company, but at the same time you have less hands. You need to be more productive per person and you have less time for maintenance, and those are two things for which OCaml is very strong. You can write few lines of code that do many things, so it’s very expressive. At the same time, it’s solid enough that when you write your code, you can launch it in prod and you can leave it there for some time and hopefully nothing breaks.

The language is stable, the compiler is stable, so there will be no big surprises. I think that’s very valuable, and then you compare, what are the alternatives today? Rust is incredibly hard. It’s a very, very hard language to use. You can do fancy stuff, you have an incredible community but it’s a super hard language to use. You have what, Python, but then you are losing all the type safety. You have Go, which is a bit in between those. You have a fast Python I would say. Then you have Java. Java which has a huge community, and is a fast language.

In a way I think OCaml is closer to Java. It’s one easy language to use, solid, no surprises. The feature set is not incredible but it’s working well enough and you can do more or less what you want with it. You can do work in the back end, work in the front end, it’s approachable. To me it’s a replacement to Java. It’s a light Java.

David: [00:33:04] I’m mostly front-end. Now I’m doing some back-end stuff but my experience is mostly from front-end. You are experienced in back-end of course, and every time I talk with a back-end person who only writes OCaml, they mention the runtime. From the front-end, it’s a problem that I have never, ever thought about. I know that the problem exists because I studied computer science and all these things, but it’s something that in the front-end I never think about.

How could you describe to me that- I know a little bit about the memory representation and about the stack, the heap, how memory works, even big O notation. How can you describe the runtime of OCaml to someone who doesn’t know much about runtimes, so has nothing else to compare it with other than Node, for example? That’s my experience.

Louis: [00:34:08] Yes, I’m not an expert either, but it’s an interesting point, actually, because if you look at, for example, Real World OCaml, there is a whole chapter on the runtime. I think it’s important for the OCaml people because of their background. We have those unixy people, so they have experience with C before, and because in C you need to know the in-memory representation of everything you manipulate, they took that from C and brought it to OCaml. Those people, they like to know, when I have an integer, it’s going to be nowadays, 64 bits.

David: [00:34:56] 63 right? That’s the —

Louis: [00:34:59] Yes, one bit for the GC, and then we have 63 bits for the value.
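The 63-bit point can be checked from a running program. The snippet below is a minimal sketch using only standard library values (`Sys.word_size`, `Sys.int_size`, `max_int`, `min_int`); it assumes a native or bytecode build on a 64-bit machine, where the reported integer width should be one bit less than the machine word.

```ocaml
(* On a 64-bit system, one bit of every machine word is reserved as a tag
   so the GC can tell immediate integers from pointers. *)
let () =
  Printf.printf "word size: %d bits\n" Sys.word_size;
  Printf.printf "int size:  %d bits\n" Sys.int_size;
  (* ints are signed, so max_int is 2^(int_size - 1) - 1. *)
  Printf.printf "max_int:   %d\n" max_int;
  (* Arithmetic wraps around at that boundary rather than raising. *)
  Printf.printf "max_int + 1 = min_int: %b\n" (max_int + 1 = min_int)
```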

David: [00:35:05] Right. Yes, people love the runtime. I think it’s like those things when — You guys started talking about the front-end. For me, I love CSS. I can talk all the time about CSS, but if you never have experience with a language or with designing the UI, CSS means nothing. You understand what they are saying because the thing makes sense, but semantically it doesn’t. When you talk about the run time at the beginning, for me it felt like I have never, ever thought of this.

Louis: [00:35:41] I guess it’s two sides. There is the technical side, how it’s actually implemented: when you allocate a value, where do you put it in memory? What is the representation of that value in memory? For example, we said that ints are actually 63 bits, and when you allocate a value, you allocate by words in OCaml. You have one word, or, for example, if you allocate a value that is on the heap, you have potentially two words. You have one word, which is a pointer to the actual value, and then the value itself, which is a number of words afterwards.

You have the GC, so when is it triggered? Actually, the GC can run any time you allocate a value, which means that code that does not allocate will not trigger the GC. It means you can write code that is very fast because there will be no interruption, and I think that’s critical for companies like Jane Street. Then yes, the other side is the runtime from a user perspective. I see it two ways. I see one way that it’s like no one knows about the runtime because it’s very, very simple in OCaml. You don’t need to deal with the runtime very often.
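To make the word-based representation concrete: an OCaml value is a single word that is either an immediate (such as an int) or a pointer to a heap block made of a header word followed by its fields. The sketch below peeks at this with the standard `Obj` module, which is unsafe and meant for inspection only; `describe` is a helper name introduced here for illustration.

```ocaml
(* Obj is an unsafe, internal-facing module; it is used here only to peek
   at representations, not as something to rely on in application code. *)
let describe name v =
  let r = Obj.repr v in
  if Obj.is_int r then
    Printf.printf "%-10s immediate value, no heap block\n" name
  else
    Printf.printf "%-10s pointer to a block of %d field(s), tag %d\n"
      name (Obj.size r) (Obj.tag r)

let () =
  describe "42" 42;
  describe "'a'" 'a';
  describe "None" None;
  describe "(1, 2)" (1, 2);
  describe "Some 1" (Some 1);
  describe "[1; 2]" [1; 2]
```

Constant constructors such as `None` are represented like integers, while tuples, `Some _`, and list cells are blocks.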

You just know that you pass values by reference, so you don’t make many copies, and then the GC is fairly fast and will not stop for too long. That’s probably what 99% of the normal OCaml people know about the runtime. Then an interesting fact that comes with that is that the OCaml compiler is bad by modern standards, that it’s not doing any kind of optimization or very little optimization, yet the native code is fairly fast. The native code that is generated for an OCaml program is fairly fast. I think if you look at the benchmark it’s not too far away from C++, which is surprising, and it means that the language is pushing you to write code that by default is fairly efficient.
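The link between allocation and the GC can be observed with `Gc.minor_words`, which reports how many words have been allocated on the minor heap so far. This is a rough sketch, not a benchmark: `minor_words_during`, `sum_ints`, and `build_list` are names made up for the example, and the exact figures depend on the compiler version and on whether the code runs as bytecode or native code.

```ocaml
(* Words allocated on the minor heap while running [f]. *)
let minor_words_during f =
  let before = Gc.minor_words () in
  f ();
  Gc.minor_words () -. before

(* Sums integers in a loop: ints are immediate values, so nothing is boxed. *)
let sum_ints n =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + i
  done;
  !total

(* Builds a list: each cons cell is a heap block (header + 2 fields). *)
let build_list n = List.init n (fun i -> i)

let () =
  let n = 1_000 in
  let loop_words =
    minor_words_during (fun () -> ignore (Sys.opaque_identity (sum_ints n))) in
  let list_words =
    minor_words_during (fun () -> ignore (Sys.opaque_identity (build_list n))) in
  Printf.printf "int loop:   ~%.0f words allocated\n" loop_words;
  Printf.printf "list of %d: ~%.0f words allocated\n" n list_words
```

The integer loop should report close to nothing, while the list costs a few words per element, and it is only that second kind of code that gives the GC a reason to run.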

David: [00:38:06] That’s exactly my experience.

Louis: [00:38:09] The types that are offered and the functions, the APIs that are offered, somehow allow you to write code that is not too, too bad. I think it’s a miracle, but it’s an interesting one.

David: [00:38:23] Yes, that’s exactly my experience. At Ahrefs, the formula of the Coca-Cola of Ahrefs is like the crawler, the thing that navigates the internet and saves data. After that, we have storage and all of these pieces that are complex. What can you explain about the secret sauce of Ahrefs? What are they from the outside? Many people would never, ever write a crawler or a very dummy one, but for one that indexes 9 billion pages, 1 trillion? I don’t know the numbers but insane amount of numbers. What can you explain?

Louis: [00:39:10] I guess the first question is what is a crawler?

David: [00:39:13] Yes, yes, because you read the webpage, you scrape a webpage, that’s fairly simple. You can do it in any language, but then what do you extract from this page, and more importantly, how do you navigate to the next one? I think that these are the two main questions.

Louis: [00:39:35] What you extract depends. At Ahrefs, we care about the links. What Ahrefs is building is more or less a map of the internet. The crawler is running all the time. It’s downloading, I don’t know, like 4 million pages per minute or something like this. There is a counter. Every minute we crawl 5 million pages. We have been talking for 40 minutes. You can count how many pages we have downloaded in a period of time. We download those pages and then we extract the links. That is the main information we care about. This is not the only information.

First is how do you parse HTML and how broken is HTML on the internet? This is horrible. The internet is broken. You have to extract all the links in a page and then you have to store all those links. When you store links, because — What is a crawler exactly? Where does it start and where does it end? Is it only the part that is downloading the HTML? Or is it actually the parsing too, and it’s influencing how you are storing your data, because — Let’s say you download a page and you have 100 links in it, you do at least two things with those 100 links: you want to reuse them in your scheduler to decide what to crawl next.

You also want to update counters, because you want to update your map of the internet. You downloaded a page, you know that there are links and you want to update the map. How do you update the map, because you have 100 new links? What do you do? You update 100 small counters, 100 small integers. Then can you do it 5 million times per minute? Then can you do it in many directions, because it’s a graph.

David: [00:41:48] You would loop. If you don’t do it properly, you would loop forever.
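The two uses of extracted links described here, feeding the scheduler and updating counters, can be sketched in a few lines. This is a toy, in-memory illustration and not how the Ahrefs crawler works: the tables `inbound`, `seen`, and `frontier`, and the functions `bump`, `schedule`, and `process_page` are all hypothetical names, and links are assumed to be already extracted from the HTML.

```ocaml
(* Hypothetical in-memory stand-ins for what would be large distributed
   stores in a real crawler. *)
let inbound : (string, int) Hashtbl.t = Hashtbl.create 1024  (* url -> inbound links *)
let seen : (string, unit) Hashtbl.t = Hashtbl.create 1024    (* urls already known *)
let frontier : string Queue.t = Queue.create ()              (* urls to crawl next *)

(* Increment the counter stored under [key], starting from 0. *)
let bump tbl key =
  let n = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
  Hashtbl.replace tbl key (n + 1)

(* Enqueue a url only once, so cycles in the link graph cannot loop forever. *)
let schedule url =
  if not (Hashtbl.mem seen url) then begin
    Hashtbl.replace seen url ();
    Queue.push url frontier
  end

(* [links] are the outgoing links already extracted from the page at [url]. *)
let process_page ~url ~links =
  Hashtbl.replace seen url ();
  List.iter (fun link -> bump inbound link; schedule link) links

let () =
  process_page ~url:"a.example/" ~links:["b.example/"; "c.example/"];
  process_page ~url:"b.example/" ~links:["a.example/"; "c.example/"];
  Hashtbl.iter (Printf.printf "%s has %d inbound link(s)\n") inbound;
  Printf.printf "%d page(s) waiting in the frontier\n" (Queue.length frontier)
```

The `seen` table is what prevents the infinite loop: the link from `b.example/` back to `a.example/` still counts as an inbound link, but it is not scheduled again.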

Louis: [00:41:55] You have links between pages, but then you want to also count links between domains and you want to count the links inside the domain. Then how do you decide it’s an interesting link or not? Then when you index a link, what do you index? You need to index the link itself, but you want to index the text that is attached to the link, maybe the paragraph that is around that specific link. You could look where it is in the page. Is it visible or not? You have — It’s an open question.
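Counting links between domains separately from links inside a domain could look like the following sketch. Everything here is illustrative: `host_of` is a deliberately naive helper written for this example (real URLs need a proper parser), and `domain_links` and `internal_links` are hypothetical tables, not a description of the Ahrefs index.

```ocaml
(* Naive host extraction, for illustration only: drops an optional
   "scheme://" prefix and keeps everything up to the next '/'. *)
let host_of url =
  let len = String.length url in
  let rest =
    match String.index_opt url ':' with
    | Some i when i + 2 < len && url.[i + 1] = '/' && url.[i + 2] = '/' ->
        String.sub url (i + 3) (len - i - 3)
    | _ -> url
  in
  match String.index_opt rest '/' with
  | Some j -> String.sub rest 0 j
  | None -> rest

(* (source domain, target domain) -> number of links between them *)
let domain_links : (string * string, int) Hashtbl.t = Hashtbl.create 64
(* domain -> number of links that stay inside it *)
let internal_links : (string, int) Hashtbl.t = Hashtbl.create 64

let bump tbl key =
  let n = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
  Hashtbl.replace tbl key (n + 1)

(* Record one page-level link at the domain level. *)
let record_link ~source ~target =
  let src = host_of source and dst = host_of target in
  if src = dst then bump internal_links src
  else bump domain_links (src, dst)

let () =
  record_link ~source:"https://a.example/post" ~target:"https://a.example/about";
  record_link ~source:"https://a.example/post" ~target:"https://b.example/";
  record_link ~source:"https://a.example/home" ~target:"https://b.example/docs";
  Hashtbl.iter
    (fun (src, dst) n -> Printf.printf "%s -> %s: %d link(s)\n" src dst n)
    domain_links;
  Hashtbl.iter (Printf.printf "%s: %d internal link(s)\n") internal_links
```

Deciding whether a link is interesting, or indexing its anchor text and surrounding paragraph, would add more fields per link; the sketch only covers the counting.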

That’s an interesting question, because there is no one that can say, “I’m doing a crawler and this is the right way to do it.” Even big companies like Google, they make tradeoffs. They decide, “We do it in one way.” Then they gather information they can gather. They cannot download every page on the internet all the time. They cannot download and process stuff fast enough. There is more content that is created than content that can be downloaded.

David: [00:43:00] It’s interesting, because if you think about fixing a bug on the crawler, it’s usually when you have a database, you can run migrations or you can get out data. You can store data broken or whatever. You can fix it. If you have the history of internet, that’s another source of data. It is life. I don’t know the right metaphor, but it’s — If you have to fix a bug on the crawler, that means that you stored information wrongly. That can affect the next version of your map, because it’s not only a map, it’s a map and a timeline. You can just look it up. It’s an archive as well. I think internet archives, they don’t have a crawler. I think they don’t have a crawler.

The idea is that you can improve the crawler. Something that you didn’t look at before, you are going to look at now. I think at some point, at the beginning, we started indexing videos, I think. I don’t remember, some media. That, of course, blew up immensely and complicated everything.

Louis: [00:44:14] Yes. That’s an interesting question. It is true that the database is very big, so you cannot just migrate stuff. The big data page says we have 170 trillion rows in the database, so we cannot just push this to somewhere else.

David: [00:44:35] First of all, what technology is that running on?

Louis: [00:44:40] I think that’s a combination of different technologies. That will be a ClickHouse and then some internal database. Custom stuff.

David: [00:44:52] At Ahrefs, correct me if I’m wrong, we like to build our own things mostly. At other companies I have worked for, you would use Sentry for reporting, or PagerDuty for live crashes, or whatever tool, or a framework that runs your web server. I think we implemented all of this by ourselves. That sounds crazy from the outside, but when you join Ahrefs, if you ever join the company, you understand perfectly why it has been done like that. Yes, we have our own database. It’s scary.

Louis: [00:45:41] It’s not completely our own database, it’s more like a wrapper around an existing database. It’s partially because we have no choice. The problem is large enough that you don’t have a ready-made solution. Google was like this for a very long time. I think they used MySQL very extensively, and it’s just that they used it in a way that was working for them. They don’t have a giant MySQL database, but probably they just sharded the problem.

They have one small database per server and they have a smart way to send the tasks to the right server to retrieve the data they want. Because you have to build on top of something, we are a small company. The total number of employees, I don’t know, it’s 100 plus now, but the back-end team is still 15 people or something like this. We don’t have too many hands.

David: [00:46:43] Yes. That’s insane.

Louis: [00:46:47] You ask what you do when there is a bug in the crawler. It affects how you conceive the programs, because you know that something will run forever. The strategy becomes: I don’t want to fix bugs by hand. You have an auto-healing index. You crawl a page for the first time, and let’s say you make a mistake. The number of links you counted is off by one. You know it was like this for three days because you deployed and it was broken. Three days later, you notice it, and you cannot go back in time. It’s already too late.

Instead, what you do is that you fix your crawler. The way you store the data, you make sure that the next time you crawl the page, it overrides the previous version with something that is correct now. You have to have those auto healing processes, and you cannot attend to every small detail by hand, and the full rebuild of the index will be the last resort. Only if you have absolutely —

David: [00:48:01] Did that ever happen?

Louis: [00:48:06] It partially happened. Not everything, but there are things that were rebuilt once in a while. We were storing two things because we download pages, we download the HTML that we store, and then we have two counters. We have the counters we extract from the page. Let’s say you have one link that you see twice in a page. You have this link and the number two attached to it, and then you have diff. You store diff, let’s say, because you downloaded that page that belongs to the domain ahrefs.com. Now you see that that specific URL, for example, has three links that were not present before.

You store somewhere plus three, and later on you will aggregate all those plus three, plus one, minus one together. There are two different things. You have the absolute numbers and then you have those diffs. Once in a while, we have a bug that we didn’t compute the diff correctly. Then we will rebuild the diff from scratch. We will go back to those absolute numbers, process them altogether and then restore it. Then when it happens, it can take a month, but it hasn’t been done in a long time. It’s a long process.
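The split between absolute numbers and diffs can be sketched as follows. This is a simplified model under stated assumptions, not the actual Ahrefs pipeline: `snapshot`, `diff`, `aggregate`, and `rebuild_diffs` are hypothetical names, and a snapshot here is just the per-link count seen in one crawl of one page.

```ocaml
module StringMap = Map.Make (String)

(* Absolute numbers: how many times each link appears in one crawl of a page. *)
type snapshot = int StringMap.t

let snapshot_of_links links : snapshot =
  List.fold_left
    (fun acc link ->
      StringMap.update link
        (function None -> Some 1 | Some n -> Some (n + 1))
        acc)
    StringMap.empty links

(* Diff between two crawls: +n for links that appeared, -n for links that
   disappeared. Links whose count did not change are left out. *)
let diff ~(previous : snapshot) ~(current : snapshot) : int StringMap.t =
  StringMap.merge
    (fun _link before after ->
      let delta =
        (Option.value after ~default:0) - (Option.value before ~default:0)
      in
      if delta = 0 then None else Some delta)
    previous current

(* Aggregate many "+3, +1, -1" style diffs into running totals. *)
let aggregate diffs =
  List.fold_left
    (fun totals d ->
      StringMap.union
        (fun _link a b -> if a + b = 0 then None else Some (a + b))
        totals d)
    StringMap.empty diffs

(* The "rebuild from scratch" path: recompute every diff from the stored
   absolute numbers, starting from an empty page. *)
let rebuild_diffs snapshots =
  let rec go previous = function
    | [] -> []
    | current :: rest -> diff ~previous ~current :: go current rest
  in
  go StringMap.empty snapshots
```

The property that makes a rebuild possible is that aggregating the recomputed diffs gives back the latest absolute numbers, so a bug in the diff computation can be repaired as long as the snapshots themselves were stored correctly.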

David: [00:49:28] Right. That’s interesting.

Louis: [00:49:32] This is where OCaml is shining, too, because it’s very easy to have multiple versions of the same type, for example. If you store data, with a version number in the database, you have a variant.

David: [00:49:48] Yes, you treat it differently or?

Louis: [00:49:50] It’s fairly automatic and- yes.
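A minimal sketch of what "multiple versions of the same type" can look like, assuming made-up record layouts: the fields and the names `link_v1`, `link_v2`, `stored`, and `upgrade` are invented for illustration and are not taken from the Ahrefs codebase.

```ocaml
(* Hypothetical record layouts; the fields are invented for illustration. *)
type link_v1 = { url : string; count : int }

type link_v2 = {
  target : string;
  occurrences : int;
  anchor_text : string option; (* new in version 2 *)
}

(* What is read back from storage: the version number becomes a constructor. *)
type stored =
  | V1 of link_v1
  | V2 of link_v2

(* Bring any stored value up to the latest layout. *)
let upgrade : stored -> link_v2 = function
  | V1 { url; count } ->
      { target = url; occurrences = count; anchor_text = None }
  | V2 link -> link
```

If a `V3` constructor is added later, the compiler reports every `match` that does not handle it, which is a large part of why old rows can stay in the database without a migration.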

David: [00:49:59] Yes, as well we have diffing on HTML. One of the big features that we did, I think that was last year, that we have diffing for the content of the page, or the diffing of links, we have as well diffing of content. That’s very good.

Louis: [00:50:18] We have a lot of small funny features. We are one of the first companies, after Google, obviously, to render pages at scale. We have hundreds of servers running Chrome, and as much as possible, when we download the HTML of a page, which is the raw HTML, we then put it in Chrome, let it run for a while and then get the rendered version of that. Which is incredibly expensive time-wise, because it’s much harder work than just downloading the HTML. We couldn’t do this if there was no project like Chrome that is open source and usable for free. We are standing on the shoulders of giants for this.

David: [00:51:08] Imagine trying to create a web engine just to see a page from the server. That’s very good. What’s the favorite part of working at Ahrefs?

Louis: [00:51:26] To me, is the people. I’m not a SEO expert, and I’m not a SEO fan either. I didn’t join Ahrefs because I love to study the internet, that was not my goal, and I’m not a marketing person, so I don’t have a big use for SEO by myself. At first when I joined, the technical challenge was fun, but nowadays to me, the value is more the people. You get to meet smart people who work on complicated projects. I spend a lot of time dealing with interns, for example, which is super gratifying, I feel. I try to spend a good amount of time sharing with other people, working on the tooling or stuff like this. I really like that part, you can see the influence you have on other people when you make their life easier.

David: [00:52:32] That’s good. What’s your favorite part of OCaml?

Louis: [00:52:37] Of OCaml?

David: [00:52:38] Yes. You can say the people.

Louis: [00:52:41] When it compiles, it works. That’s the key point. Unlike Go, we have some types.

[00:52:41] [laughter]

Louis: [00:52:59] If we are a bit serious, maybe not the language itself, but the LSP is super, super good nowadays. It’s definitely a very good experience. We have to thank Tarides for all the work they are doing on the tooling over the past three years now, because it’s crazy.

David: [00:53:21] I think Tarides is carrying most of the boring work. Not boring work, but work that is always hidden. Work where you don’t get any fame, you only get the trash. When the tooling doesn’t work, you complain, and when the tooling works, you just don’t celebrate it. Tarides is behind it, for sure. You have been going to ICFP, the International Conference on Functional Programming, for quite some time. One of the verticals, or one of the parts of ICFP, is OCaml.

Last year we outgrew our neighbors, our language neighbors, I think it was Scala, maybe Haskell. I think we outgrew them. What do you think about the conference?

Louis: [00:54:20] Same thing, it’s amazing to go there and meet the people actually, because ICFP, it has multiple parts. The main track is a bit more academic. Even though some people who work at Ahrefs, they published there, but they were students when they did it. It’s a bit more academic. Then you have all the workshops that are a bit more approachable, at least for me, but you spend one week with smart people who are very excited by what they’re doing. This is the amazing part. Once a year, everyone is super happy to meet each other. This is very much a good experience, but it’s- about conferences, I think the ReasonML ones were very, very nice too.

Again, I’m a back-end person, but it was super cool to attend a conference on a different topic where people have different interests. It was the early days of the language, and that was super interesting: the people who attended were curious and wanted to see something new. They had different ideas. I think that was super good.

David: [00:55:36] Yes, those conferences were very good. I haven’t attended any, actually, but yes, I heard Javi saying amazing things about them. Almost everybody who attended said good things. Why is there no OCaml conference?

Louis: [00:56:00] This is a multifold answer, because it’s actually a question that was asked. There is this, how is it called, the OCaml Software Foundation, because if we can explain how OCaml, the management of OCaml is that —

David: [00:56:23] Please do. As core contributor of the code of conduct, please do.

Louis: [00:56:27] There is the core group of contributors for the language, and before that there was something called the OCaml Consortium, I think, where companies could pay a few thousand USD a year and it would give them a license to use OCaml not as an open source project. You could get the compiler and make changes to it and you didn’t have to publish the changes. It was also a way to just sponsor OCaml development. They took it to a different level. They created this OCaml Software Foundation that is pushing some efforts around OCaml.

One question was, do we want to have an OCaml conference or do we want to have OCaml- maybe not conference, but smaller events but that could happen more often. Where will we put those events in the world? You need people with time, you need people with money, and you need to find the right place for the right people to attend. I think no one has all those resources, including the mental space to build fancy ideas on what to put in a conference. I cannot provide a definitive answer because I’m not the one deciding on all those things, but I think it’s a combination of all those that makes the ICFP the place to be.

David: [00:58:05] Right, because this year it is in Seattle. Every year the location changes.

Louis: [00:58:13] Yes.

David: [00:58:16] Last year we released, well, I didn’t, but OCaml released multicore and effects, or effect handlers. We chatted a lot at work about this, and I think you said multicore was something that needed to happen, but you were not very excited about it. On the contrary, you said effects are a big deal. Thinking about the person who doesn’t know much about what effects are, could you do a short summary and then explain why they are exciting?

Louis: [00:59:01] I’m not a specialist in effects either, but to me, a parallel would be Rust. In Rust, you have, how is it called, the borrow checker. You have a way to know to whom a value belongs. It heavily affects how you write code, because you need to architect your code in a way that is safe. You know, for example, that a value can only be used by one bit of code at a time.

David: [00:59:45] Right. Otherwise, you would have crazy bugs. Data corruption.

Louis: [00:59:56] That would be C. Rust came in and provided this safety. It’s a bit the same idea in OCaml. You come with effects, and I think they have many usages that I do not completely understand, but some of them allow us to change the way we do concurrent or parallel computations, and they make it safe, like the borrow checker makes Rust code safe. This is definitely affecting the way you write code because now you have one more tool to express your ideas. I think this is definitely changing the way the language will be used.

While the multicore, it’s just in the background. It is happening, but this is not the tool. This is just a means, so it’s like, how do we do fast computation? Do we need to split stuff on different cores and how do you do it? Either you fork or you do multicore. In a way, it could be completely hidden behind a magic API and I would not know if it’s fork or multicore, and it would be fine to me as a user. If you have something like the borrow checker in Rust, this is actually a language feature, and this is something I see day to day and it is affecting how I can think and what I can express.

David: [01:01:36] Right. I see, I see. Yes, because right now Jane Street, the famous company that does the Wall Street and whatnot and pushes OCaml to the next level, has a team working on the OCaml compiler, and one of the big features that they want to work on, I think, they call locality, or local and global variables. I have no idea about those other than from watching Stephen Dolan’s presentation at the last ICFP, but would that allow the users of OCaml who do care about the memory layout or the owner of the variables to express those things when using multicore?

Louis: [01:02:34] I don’t know exactly. I mean, I have a light understanding of that. To me, I think the stuff they’re doing with local/global would be interesting even if there was no multicore, because we already had concurrency with Lwt or stuff like this. It has benefits because you control your allocations too. You can decide what is allocated on the stack versus what is allocated on the heap. It can have big performance implications.

This is an exciting feature, but this is maybe where you see my C background that when I was in uni, the first year was just writing some C code and we had to rewrite Bash. We had our own version of Bash. We spent two months writing in Bash or stuff like this. We had to deal with many of the small like — You are launching a bunch of processes together and you have to manage your memory or whatever, or we had to rewrite malloc, so we —

David: [01:03:53] Okay.

Louis: [01:03:56] I know little bits about memory management and how to deal with pointers because I did those projects in the past. These local, global things seem appealing, but at the same time, it’s probably not critical. It’s not going to change the vast majority of the code that is written in general. For all my small personal projects, or even most of the code that is running at Ahrefs, performance is not key. I care more about the fact that the code is readable and stable rather than performance, I would say.

David: [01:04:42] Yes. Only somewhere else. Yes, I see your point. Usually the code, for example a web API, is way, way fast enough. There’s no point in optimizing the endpoints. I think we have 500 endpoints. Optimizing them one by one, or optimizing 10% of them, would not change absolutely anything.

Louis: [01:05:10] We spend so much time doing queries to different databases, or HTTP queries, to gather whatever we need to gather before answering a request. This is so expensive compared to what we do most of the time.

David: [01:05:28] Right. That’s true. That’s true. Why do you think Ahrefs is such a different company? When I have experience with the — It’s because of the culture maybe? Here we don’t have real management, we don’t have product owner. We don’t have many things that when you come from working on the SaaS companies that are from the culture of US maybe, or some Europe companies. In Ahrefs we don’t have anything like that. Can you say that it’s good or bad and why?

Louis: [01:06:07] It’s good and bad. There are definitely some downsides. Why it’s like this, is also because the company is young and small too. It’s what? 10 years old so it takes time. Every time you want to make a change company-wide, it probably takes two years to actually make change happen.

David: [01:06:33] Okay.

Louis: [01:06:35] This is not the only company with a structure that is not well defined.

David: [01:06:44] Right.

Louis: [01:06:44] What happens is that there is a structure, it’s just that people don’t have the title. When you have been in the company for a long time, you own some bits of code and there are people who are experts on a subject. Then there are people that you trust for something and people that you trust for something else. Even though there is no direct management, there are people taking decisions, so who is taking the decision, right?

David: [01:07:16] Right.

Louis: [01:07:17] It’s good to be flexible and it allows more or less anyone at some point to take a decision if they want to and if they dare to. The downside is that sometimes you don’t know if you can take the decision or not, and you don’t know who you should talk to and then there are some hidden politics because some products, some features there, they belong to someone. You don’t want to offend that person so you can’t go and touch this or stuff like this.

David: [01:07:51] Well, I need to interrupt here. I think you have been at Ahrefs way too long to realize what politics means, because at Ahrefs there’s literally zero politics. Or not politics, but battles, or discussions for the sake of discussion: it’s nearly zero. I think that’s one of the things where at the beginning I was like, “Are we not talking about this?” and somebody said, “No need to.” It’s a culture of being very direct and very technically focused.

I think when you work in a company where you can go weeks without knowing what to do, or spend months working on so many processes that are close to useless from your point of view, or maybe very beneficial, as an individual contributor you feel like you are losing your time. At Ahrefs I don’t think I have ever felt that I’m losing time because of the company. It’s the other way around: oh, my peer is asking me to implement something that needs to be done and I haven’t finished yet. That’s more the feeling of the work, right?

Louis: [01:09:09] I guess it’s not politics looking for power because there is no power to gain.

David: [01:09:16] Exactly.

Louis: [01:09:17] What do you want to own? There is nothing to own. You can try, but there is nothing to win at the end.

David: [01:09:26] Getting to the last questions now: have you been following a little what Javi and Antonio, and a little bit myself, have been working on in Melange? What’s your opinion about Melange?

Louis: [01:09:43] I know what Melange is.

David: [01:09:46] Definitely.

Louis: [01:09:47] Okay. What do I think about Melange? Again, it’s a question that is hiding other questions. Let’s say technically, for example, this is pretty impressive. What the four of you have been able to do in a few months is amazing. Because just to give some context, it’s like moving — Okay, Melange was not super, super alive, six months ago. The project was moving but slowly. There was no Dune support, there was not much stuff happening. Then six months later, you have the whole Ahrefs front-end, which is like, hundreds of thousands of lines of code that are written by what? 30 people maybe now.

It’s completely moved to Melange. This is amazing. I’m able to compile all this code in one command. I go in the repo, I do “make dev” and everything works.

David: [01:10:56] These and many more advances, but yes, that’s the part that it’s funny.

Louis: [01:11:04] Yes, it’s amazing. It automatically works and it didn’t break the experience of anyone, so it’s compatible with what was BuckleScript or ReScript beforehand. It’s compatible with native code at the same time. It’s amazing. What do I think about the project? Another side of the question would be, was it the right thing to do to fork ReScript? Or, is it the right way to do it? Is it good to have a fork of the compiler inside of Melange to achieve that project? I don’t have a strong opinion on it.

I don’t have enough experience. After all those years of seeing Reason and BuckleScript evolving, I believe that the experience of the end user, so the developer that is using these tools, is more important than the technical implementation. Is it the best way to do it? I don’t know. Does it give a good end user experience? Yes, then that was the right thing to do.

David: [01:12:21] On those tools, you would always prioritize the developer experience, rather than technical merits? How would you choose —

Louis: [01:12:33] As a user or as a developer of those tools?

David: [01:12:36] As a developer of those tools.

Louis: [01:12:39] As a developer of those tools, given the target, given what I see of how you build a community and it’s like the early days of Melange, I would prioritize user experience. I think, for example, all the efforts that have been put into making Dune work, I think the target was the user experience at the end. Because we couldn’t make it work another way. If we didn’t have this, I’m not sure that we would have moved to Melange, for example.

David: [01:13:19] I see.

Louis: [01:13:20] The downside was that, for example, this is not the fastest implementation there is. I think there are many different calls to the Melange compiler, which is not the fastest way to do it, but the UI is good so we still use it.

David: [01:13:44] What I would like to ask about is the future, knowing that at the start BuckleScript got born even though there was already js_of_ocaml. Now, I think, eight or nine years have passed, ReScript went its own path, and Melange is trying again to be part of the OCaml to JavaScript compilation, or Reason to JavaScript compilation. How do you see the future? Because eventually, nobody wants to have two ways to compile to JavaScript.

Louis: [01:14:16] I’m not sure that’s true. Why would people not want many ways to do the same thing? It’s like if you look at other languages, actually, many of them have different ways to do the same thing. Why not OCaml? As long as the projects don’t die, it’s not like Melange is attacking jsoo or jsoo is attacking Melange. It’s like, people don’t hate each other. They are not fighting for users; I think the targets are a bit different.

David: [01:14:53] You would want different ways of compiling to JavaScript? Because of the sane competition? That’s true-

Louis: [01:15:03] To me it’s not the sane competition. It’s more that I think they target different audiences. They try to do different things. One example would be during one of the Reason conferences: we wanted to do a workshop and we wanted to show atdgen, which is a tool we’re using at Ahrefs a lot to parse and write JSON. It’s protobuf but for JSON.

David: [01:15:34] Yes, it would give us type safety from front end all the way down. Sorry, backend all the way down.

Louis: [01:15:43] Yes.

David: [01:15:43] Sorry, go on.

Louis: [01:15:46] As with protobuf or with GraphQL, you have a definition: you have a file with type definitions, and from the definition you derive OCaml code or Python code or TypeScript code. It supports multiple languages. To do so you need an atdgen binary. At the Reason conference, you have people using Linux, Windows, Mac, different versions, whatever, so how do you give a binary that everyone can use? In two minutes, I just went into the atdgen repo and I enabled js_of_ocaml compilation inside Dune, and now my binary is actually a JS file that I can run in Node.js.

David: [01:16:34] Right.

Louis: [01:16:35] I don’t think that Melange aims to do that. Because then —

David: [01:16:41] I think that’s the magic. Yes, I agree.

Louis: [01:16:44] In Melange you will have one file per module or something like this, which means I will need to run through webpack or something like this later on.

David: [01:16:51] Yes, you could, but you would face a few problems. Marshal, for example, which is the encoding/decoding to bytes, doesn’t work in Melange.

Louis: [01:17:01] Well, it doesn’t work in js_of_ocaml I think too.

David: [01:17:05] Yes, but I think you can stub it, right? I think you can —

Louis: [01:17:10] But I would say most of the time you actually don’t care because it’s corner cases. It’s just that the UI they provide is good enough for OCaml people, and Melange provides the nice FFI, for example, to interact with JavaScript code. The way it outputs code is closer to the JavaScript way too, I would say, so it’s easier to make webpack or other tools like this work together.

David: [01:17:39] Yes, I agree that those are different targets. My point of view was more like, okay, with js_of_ocaml the crazy thing is that you have an entire project in OCaml, you add one line in Dune saying “compile to Node.js, please,” and then you have a single file that is compiled to JavaScript. That’s insane. If you have, for example, a compiler whose parser is written with Menhir, a parser generator for OCaml, you can compile it to JavaScript in one line. Or any library, even drivers, anything that you can imagine. That’s the value proposition that gets people to try js_of_ocaml very fast. But on the other hand, the documentation is very bad.

It’s the classic OCaml project where you need to understand 50% of the project to even start. I invested a lot of time trying js_of_ocaml, and even tried to write bindings to React and succeeded, but I did not succeed in convincing people at Ahrefs, the front end of Ahrefs, to try js_of_ocaml. For me, if that technology is not good enough for prime time, or not good enough to convince my team, then there’s no way to convince anyone else.

On the opposite side, Melange brings together a low barrier to entry and good documentation. At some point it gets complex, but the experience, I think, is much better. But yes, you don’t have a one-liner. You need to fiddle a bit to build the integration with your front end or your pipeline, but once it is done, it works. But yes, you would never do that with atd. The experience with atdgen is not going to happen in Melange.

Louis: [01:19:41] It’s funny how you say it and it’s true that it’s easier. Many things in Melange are easier to experiment with and at the same time it’s more complicated. For example, in js_of_ocaml you have a clear separation between OCaml types and JavaScript types. String that is an OCaml string is a different type from JavaScript string.

David: [01:20:10] You have like a wrapper, right?

Louis: [01:20:12] It’s very explicit and it’s good for the OCaml person, because then you know when this is a part of the language you are comfortable with, and then when it starts with JS, it’s okay, be careful because you don’t know what you are doing. This is easy in js_of_ocaml because it’s very explicit. You see JS dot and then you know: now I have a JavaScript value. In Melange, your string is what? You have to deal with the encoding. What is the actual encoding of a string in OCaml?

David: [01:20:57] Do you remember that I said that every time I talk with a backend person, they always mention the runtime? This is exactly that moment. You always think about the runtime.

Louis: [01:21:06] Actually, I’m not sure.

David: [01:21:08] It’s not the runtime itself, but the encoding. In Melange, for example, all the types that you have in the language have the same representation as a JavaScript value. For example, a string is a string, an integer is a number, a float is a number, and so on and so forth. A variant is an object; a record is an object. Melange maps perfectly, or as well as possible, to JavaScript values. It’s cool that you said that every time you work with js_of_ocaml, once you see JS dot whatever, this is the namespace and you know that you’re dealing with things that come from the client.

For example, that’s a barrier for people that tried ReScript or tried Melange in the first place, because they don’t understand: why do I need a wrapper? Why do I need a generic for a type that I already have? Because the mentality is, why do I need to care about the runtime?

Louis: [01:22:17] Yes, basically you pay a cost but at a different time. In js_of_ocaml, you pay the cost very early, because as soon as you write code you need to make the difference between the two worlds. In Melange you will only pay the cost if you write FFI and you need to care about the representation. If there is a string with something weird in it and you don’t know the encoding of the string, for example, then you need to be careful.

The experience by default is much easier. It’s just that when you are dealing with the boundaries then things can be a bit more implicit and probably you need to know the language better to do things the right way. It’s easier and it’s actually more complicated in some bits.
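The "pay the cost early" side of this tradeoff can be modelled with an abstract type. The sketch below is a self-contained model of the idea, not the js_of_ocaml API: the module `Js_string` and its functions are hypothetical stand-ins (in js_of_ocaml itself the conversions are `Js.string` and `Js.to_string`), and the model ignores the real encoding difference between OCaml byte strings and JavaScript UTF-16 strings.

```ocaml
(* A self-contained MODEL of the explicit OCaml/JavaScript string boundary.
   [Js_string] is a hypothetical stand-in, not a js_of_ocaml module. *)
module Js_string : sig
  type t (* abstract: the type checker will not confuse it with [string] *)
  val of_ocaml : string -> t
  val to_ocaml : t -> string
  val concat : t -> t -> t
end = struct
  type t = string
  let of_ocaml s = s
  let to_ocaml s = s
  let concat a b = a ^ b
end

(* Values on the "JavaScript side" must be built through the conversion. *)
let greeting =
  Js_string.concat (Js_string.of_ocaml "hello, ") (Js_string.of_ocaml "world")

(* Writing [greeting ^ "!"] here would be rejected at compile time, because
   [Js_string.t] is not [string]. Crossing back has to be explicit: *)
let printable = Js_string.to_ocaml greeting ^ "!"
```

In Melange, by contrast, an OCaml `string` is already the JavaScript string at runtime, so none of these conversions appear in everyday code, and the question of what the bytes mean only surfaces at the FFI boundary.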

David: [01:23:08] Yes, I think if you look now, js_of_ocaml and Melange are clearly delineated. You can draw a line between the tradeoffs. Each side is very clear. Now I’m comfortable saying that both are balanced for the users, even ReScript. I have a feeling the three projects are each in the right column. You can classify them perfectly now. If you get into, “Oh, I want to try this ML, or OCaml, or whatever language as a whole,” you can choose clearly one or another, based on your team or your decision.

Louis: [01:23:57] Yes. Actually, you said you couldn’t sell js_of_ocaml to Ahrefs, but we can probably talk a bit about what the discussion was and what happened, because Javi and you actually tried to do something so that it could happen. You worked on the React bindings and then you tried to show that it could work. In a way, I think that js_of_ocaml could fit what we do because we don’t depend on a lot of external code. One very interesting thing in Melange is that the FFI is very good. It’s easy and convenient to interface with other existing JavaScript libraries.

At Ahrefs, we have bindings to what, React, and then maybe one or two libraries to deal with timestamps and charts. We don’t have millions of bindings. We have maybe five big libraries we have bindings for, and then a bunch of smaller stuff. We don’t bind to so many things and we don’t need an FFI that is amazing. It’s not a priority. js_of_ocaml could have worked.

David: [01:25:14] Could have worked. I agree.

Louis: [01:25:16] The fact that even in this perfect setup for js_of_ocaml it fails is interesting. You find the right company with many OCaml people, many people who understand js_of_ocaml, and you don’t need one of the best features of Melange and still, this is not actually the tool that won at the end.

David: [01:25:42] Yes, that’s true. I think that’s exactly what you said: it could work. Theoretically, if you look at it from the outside, or even if you look at it from far away from the front end, it makes a lot of sense. When we were working on this, I was on this middle-end team, before it was called middle-end, and most of my assumptions were like, “It’s going to work perfectly.”

Because of what you said, right? Then when we try to — How can we write React, we are married to React. I think we like the model of components. We like the model of data; we like the composition. We’re not going to change React. Let’s bind it to React, so we create the same PPX and the same library to React. I think that was how Javi started and then we end up finishing.

I did the Emotion bindings, so I know the CSS, everything worked and we felt like js_of_ocaml was very mature, but there were a few problems that you could not solve easily at the time. At the time js_of_ocaml didn’t have Unicode support. Now they have some Unicode support or the parsing, I haven’t followed that closely, but you would need another library to get the Unicode support that in Melange, or BuckleScript at the time, was native. That was an issue. The other issue, or the biggest issue that you can’t bypass, is that with js_of_ocaml you compile everything into one file, one gigantic file.

Incremental migrations were very hard or very difficult to iterate over time. You could migrate parts of the app, but then you would need to compile everything in both, have two duplicated apps. It was definitely not — The migration plan was impossible. We could try. I think we tried in one of the small apps, I think we could try wordcount, is one of the verticals we have at Ahrefs, with js_of_ocaml, and once we were trying those, we find the wrapper, it was very hard to sell.

The wrapper is like the Js.t, as we call it in ReScript; in js_of_ocaml I think it’s JS.object. It’s unsafe. You have Js.Unsafe. There are many, yes, many constructions with which you can interact with JavaScript, differently from what we do with the bindings. That part was — With these three things that I said, Rusty, who is, let’s say, the only Tech Lead at Ahrefs, like the only person that — He’s the CTO of the frontend, how I call it.

He was the person who we would need to convince to migrate the frontend. He was definitely not on board with the idea. I think that’s the main reason. He would chat with other people and people would say, “Yes, fine, if Javi and David are happy, then we are all happy,” but even though we migrated one small app, the experience was worse. The user experience of iterating over React components was worse, or even the data was worse, because you had this wrapper.

Louis: [01:29:23] They were too honest in the way they named functions in the API. For example, all those unsafe functions, they exist in every FFI, it’s just not called unsafe, but because it’s called unsafe, people are like, “They’re not going to use this, you are not supposed to use it.” Yes, you’re supposed to use it. Just be careful when you do it.

David: [01:29:47] I think you told me that anecdote, that somebody asked Xavier Leroy, the creator, the author of OCaml, “What do you think about Obj.magic?” Right? Obj.magic is the function in OCaml that lets you, like, unsafely coerce any value, right? You can lie to the typechecker and say, “Trust me, this is whatever, an array and it’s a list or whatever.” His answer was, it is like when you are walking in the street, would you inject — How is that called? I don’t remember the thing, but would you become a junkie?

You get a syringe, I don’t know how to say in English, but would you inject some random thing on the street? That’s not part of the language. I think you explained the anecdote to me, or maybe it was Javi. How do you see the purity of OCaml? Do you think that OCaml is very pure or has some pragmatism on safety? Because of course it’s type safe, of course, when the compiler compiles it, it works, but you can bypass it from time to time. What’s your opinion?
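To make the Obj.magic remark concrete, here is a minimal sketch. It assumes the standard native or bytecode compiler, where characters and booleans are stored as immediate integers; the names `char_as_int` and `bool_as_int` are only illustrative. Nothing is converted at runtime: the typechecker is simply told to look away.

```ocaml
(* Obj.magic : 'a -> 'b
   It typechecks at any type and performs no conversion at all. *)

(* This happens to be harmless only because a char is represented
   as an immediate integer by the standard compilers. *)
let char_as_int : int = Obj.magic 'a'

(* Same story for booleans, which are also immediate integers. *)
let bool_as_int : int = Obj.magic true

let () = Printf.printf "%d %d\n" char_as_int bool_as_int

(* Coercing between values with different memory layouts, for example
   a list into an array, also typechecks, but using the result is
   undefined behavior and can crash the program. It is deliberately
   not executed here. *)
```

The point of the sketch is that the compiler accepts all of these equally; whether the result is meaningful depends entirely on the runtime representation.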

Louis: [01:31:10] I don’t think it’s pure in any way, shape, or form.

David: [01:31:17] You can write pure code, right?

Louis: [01:31:20] Yes. You can write pure code. But for example, you have exceptions that are very pervasive, that are everywhere, and you don’t have any way to know if a function can raise an exception or not.
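A small sketch of that point, using only the standard library (the function names `first_even` and `first_even_opt` are illustrative): the type of the first function says nothing about the exception it can raise, while the second one makes the failure case visible in its return type.

```ocaml
(* The signature is int list -> int: nothing tells the caller that
   List.find raises Not_found when no element matches. *)
let first_even (xs : int list) : int =
  List.find (fun n -> n mod 2 = 0) xs

(* The option-returning variant puts the failure case in the type. *)
let first_even_opt (xs : int list) : int option =
  List.find_opt (fun n -> n mod 2 = 0) xs

let () =
  (* The caller has to know, from documentation or experience,
     that an exception handler is needed here. *)
  match first_even [ 1; 3 ] with
  | n -> Printf.printf "found %d\n" n
  | exception Not_found -> print_endline "no even number"
```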

David: [01:31:37] Right.

Louis: [01:31:38] Okay, it depends what program you write, but basic things, you run your program, like a CLI that is running — I don’t know, downloading something and you press control C like you want to stop your program. There is — It’s a signal, and in OCaml it’ll raise an exception that you need to — You can catch and you can do something with it, right? At any point in time, the user of your CLI can come and interrupt the program, right? Which means at any point in your program, you need to be able to deal with this interruption.
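As a rough sketch of the Ctrl-C case: the standard library can turn the interrupt signal into the `Sys.Break` exception via `Sys.catch_break`. The helper `run_interruptible` and the `cleaned_up` flag are hypothetical names for illustration, and the interruption is simulated by raising `Sys.Break` by hand rather than by pressing Ctrl-C.

```ocaml
(* Ask the runtime to turn Ctrl-C (SIGINT) into the Sys.Break exception. *)
let () = Sys.catch_break true

(* Records whether the cleanup code ran. *)
let cleaned_up = ref false

(* Run [work], making sure cleanup happens even if it is interrupted. *)
let run_interruptible work =
  match Fun.protect ~finally:(fun () -> cleaned_up := true) work with
  | () -> `Completed
  | exception Sys.Break -> `Interrupted

let () =
  (* Simulate the user pressing Ctrl-C in the middle of the work. *)
  match run_interruptible (fun () -> raise Sys.Break) with
  | `Interrupted -> print_endline "interrupted, cleanup done"
  | `Completed -> print_endline "completed"
```

Because the exception can arrive at any point, every resource-holding section of a program needs this kind of protection, which is the burden being described.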

David: [01:32:18] Right.

Louis: [01:32:19] It’s like, as soon as you have these, where is the purity, what is — You have no good way to protect yourself against all these issues. At the same time, I’m probably biased because I have been using the language for long. It provides you with what is good enough. There were some improvements because, I don’t know if you remember, but at some point, the strings were mutable in OCaml.
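For readers who did not live through that change, a minimal sketch of how it looks in current OCaml: in-place mutation lives in the `Bytes` module, and a `string` is immutable. The variable names are illustrative.

```ocaml
let () =
  (* Bytes is the mutable byte sequence: it can be updated in place. *)
  let b = Bytes.of_string "ocaml" in
  Bytes.set b 0 'O';

  (* Converting to a string makes an immutable copy. *)
  let s = Bytes.to_string b in
  assert (s = "Ocaml");

  (* Later changes to the bytes do not affect the string. *)
  Bytes.set b 1 'C';
  assert (s = "Ocaml");
  assert (Bytes.to_string b = "OCaml");

  (* Strings can still be read by index, just not written. *)
  assert (s.[0] = 'O');
  print_endline s
```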

David: [01:32:51] Yes.

Louis: [01:32:53] By default, the strings were actually what is called Bytes nowadays. It has been a big change. People had to fight to turn Bytes into string, because it was breaking code, obviously. There was more mutability. It was not as pure as it is nowadays. I would say that the balance is not too bad. Could it be more? Probably. There are some things that we can’t really express in OCaml, like ownership of a value.

Like you open connection to a database, you have a handler or something like this that you want to use only at one point in time and you don’t want to share. You have no way to express it. Then you can’t really protect yourself against the steal. The code can take that value, put it in a global reference, and it can be suddenly reused elsewhere. This is where, for example, the local/global stuff —

David: [01:34:06] Yes, solve exactly that issue.

Louis: [01:34:09] Yes. This kind of issue. Is it a problem? Yes. Is it a problem that we face at work? Yes. For example, we see like — We have one problem where people can open the connection to a DB using one of those — A common pattern in OCaml is to do like “with_db” for example. Then you pass a continuation, you pass a function, and then this with_db function will create a DB handler and pass it to your function later on.

Inside your function, you can do one more with_db. This is something that you probably want to forbid because you don’t want to open connection after connection after connection when there is already one available.
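The pattern and both problems can be sketched as follows. This is a toy under stated assumptions: the `db` record, the `opened` counter, and this `with_db` are hypothetical stand-ins, not the code used at Ahrefs. It shows that nothing in the types prevents nesting, and nothing prevents the handle from escaping into a global reference.

```ocaml
(* Hypothetical stand-in for a real database handle. *)
type db = { id : int; mutable closed : bool }

(* How many connections have been opened so far. *)
let opened = ref 0

(* Open a connection, pass it to [f], and always close it afterwards. *)
let with_db f =
  incr opened;
  let db = { id = !opened; closed = false } in
  Fun.protect ~finally:(fun () -> db.closed <- true) (fun () -> f db)

(* Nothing in the types forbids nesting: this opens a second
   connection while the first one is still available. *)
let nested_ids =
  with_db (fun outer -> with_db (fun inner -> (outer.id, inner.id)))

(* Nothing forbids the handle from escaping either: it can be stored
   in a global reference and reused after it has been closed. *)
let leaked : db option ref = ref None
let () = with_db (fun db -> leaked := Some db)

let () =
  assert (nested_ids = (1, 2));
  match !leaked with
  | Some db -> assert db.closed (* the escaped handle is already closed *)
  | None -> assert false
```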

David: [01:34:57] Right.

Louis: [01:35:00] For now, how do you fix this? This is an actual problem and you have no good solution. But maybe you write different code than I do. You write code that is in the browser or just behind the browser so maybe you have different views. You have to deal with more mutability than I do, for example the whole DOM, before React. Yes, but before React no one assumed that anything was immutable in a browser. Everything could be changed at any point in time.

David: [01:35:43] Yes. That’s why in the browser many APIs were pushing for observables, right? You accept mutability in all your values and then you say any value can change at any time and you need to subscribe to — Listen to the changes or not and that’s the trend of — I think that was one of the biggest inclusions of ES4 that didn’t get published and they tried with ES5 and it didn’t get into the language either, this observables concept. I think they come from RxJS and they come from reactive programming from, I don’t know, 30 years ago, where sometimes reactive is very useful. Before React, I would say that not many people did — Immutability was not part of their fashion of writing code.

We are very far from those problems. We do mutability in a few places, for example, we have a global theme, a CSS theme, right? You can have a dark or light theme. We interact with the browser directly. We opt out from React to do that because the performance is better. You can load that at the beginning, you can then load React later. But that is just very self-contained, right? You will never want to write your data reactive.

Maybe you want, but for example, for Ahrefs it doesn’t make any sense because our data is, you have tabular data that never changes during your session, right? It’s not live data. It’s like you open a report and the report is the moment in time that you request. There’s no live thing. Nothing is very reactive in nature so yes, for us it just makes perfect sense.

Louis: [01:37:41] You would be happy with more purity?

David: [01:37:45] Would we be happy with more purity? No, I think —

Louis: [01:37:49] Would you wish to have a language that is closer to Haskell that is like —

David: [01:37:54] No, I don’t think so. I think no, because purity makes — Purity in some places makes your life so much better, right? But often you want the tools to be pure so like libraries that you create or you consume need to be pure but your application needs to do all sort of things, right? Your application or when you are a product engineer, you want to just ship fast and if something gets you in your way and you store it globally and deal with it later or store it globally and be safe and then forget about it.

Needing to do things perfectly and draw the line and architect things, that slows you down insanely. I think the line — OCaml is very well positioned, where you can opt out, make your life easy and then move back and run fast. But yes, I want my tools to be pure, or the libraries that I’m using or even — I know I’m working on styled-ppx, so making your styles type safe. I think that’s something that I’ve been pushing but yes, you want that tool to be type safe.

You don’t want to do all the things in your app perfectly, to demand it perfectly, mostly because on the web, everything is changing all the time. On the backend it’s a little bit different but on the web, iterations are just much more common than in the backend.

Louis: [01:39:31] I find it interesting that everyone is pushing for immutable stuff. At least my impression is that in the front end React maybe didn’t create this trend but made it popular. My understanding is that you deal with the DOM as an immutable object. You never manipulate the DOM directly anymore. You do it through React. You have an immutable object, more or less, which goes against many things that happened, historically in a browser, the way the DOM is implemented is completely not like this. It has some interesting benefits.

You can have any extension in your browser that are changing part of your page. I’m using one daily. I’m using Dashlane to store my passwords. It does stuff for — If there is an input field, it’s creating a popup and I can click, input my password in that specific field.

To do this, it has to inject HTML in the page actually. But it breaks. Some apps are crashing because of this. Some apps, some websites, they’re not crashing. They will just see that there is a change coming from my extension and they will just discard it and rerender without my stuff.

It goes against many things that happened for 20 years, more or less. You have to make both of those worlds still somehow work together.

David: [01:41:18] I think React did — I think not even React, Meta did that all the time. They pushed for a solution that is way better in some areas but destroys previous effort insanely. For example, with GraphQL, I think the same happened. They say, “You’re going to have one endpoint. It’s going to be through POST.” You would call this endpoint all the time, which goes completely against REST, what we were doing before.

Of course, with all the tradeoffs, if you go 10 times — 10 years back and you say to a person, “No, we call the same endpoint all the time,” they will say, “You guys are stupid.”

Louis: [01:41:58] RESTful was a trend.

David: [01:42:00] Exactly. The RESTful, you will need to add the link to go to the next resource. Of course, all the people would scream at you. Similarly, it happens now with server components. I don’t know if you are following the thing. Again, they are pushing for a new concept that they have mined in their business and works well. The rest of the people are like, “No, that’s just insanity.” I think React, the first concept is you — The insert and update are the same operations.

There’s no create the DOM and then update the DOM. It’s always like, “Do the thing.” It’s just rerender. Just because they push for that approach and they delay, of course, the first load is going to be slower. You don’t have serialization. Then later updates are going to be faster.

Louis: [01:42:58] Actually, even the later updates are slower, because you need React to do this diff between the two versions and to only update the relevant part of the DOM. React is doing what the browser was actually doing. You are duplicating the work and you are doing it in JavaScript, which is slow. The browser was doing it in very optimized C++ code.
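To give an idea of what "doing a diff between the two versions" means, here is a toy sketch. It is not React's reconciliation algorithm; the `vnode` and `patch` types and the `diff` function are hypothetical and compare children purely by position. The idea it illustrates is the one described: walk the old and new trees together and emit changes only where they differ.

```ocaml
(* A toy virtual tree: a text node or an element with children. *)
type vnode =
  | Text of string
  | Element of string * vnode list

(* A patch targets a path: the child indexes to follow from the root. *)
type patch =
  | Replace of int list * vnode (* replace the node at this path *)
  | Remove of int list          (* remove the node at this path *)
  | Append of int list * vnode  (* add a child to the node at this path *)

(* Compare two trees and collect only the changes needed to go from
   the old one to the new one. [path] is kept in reverse order. *)
let rec diff path old_node new_node =
  match old_node, new_node with
  | Text a, Text b when a = b -> []
  | Element (t1, c1), Element (t2, c2) when t1 = t2 ->
    diff_children path 0 c1 c2
  | _ -> [ Replace (List.rev path, new_node) ]

and diff_children path i olds news =
  match olds, news with
  | [], [] -> []
  | o :: os, n :: ns ->
    diff (i :: path) o n @ diff_children path (i + 1) os ns
  | [], n :: ns ->
    Append (List.rev path, n) :: diff_children path (i + 1) [] ns
  | _ :: os, [] ->
    Remove (List.rev (i :: path)) :: diff_children path (i + 1) os []

let () =
  let li t = Element ("li", [ Text t ]) in
  let old_tree = Element ("ul", [ li "a"; li "b" ]) in
  let new_tree = Element ("ul", [ li "a"; li "c" ]) in
  (* Only the text of the second item differs, so one patch is produced. *)
  Printf.printf "%d patch(es)\n" (List.length (diff [] old_tree new_tree))
```

Even in this small form, the work is proportional to the size of the tree rather than to the size of the change, which is the cost being pointed out.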

David: [01:43:26] Fair point, but the DOM is fast enough. I think I read a lot of anti-React articles saying that the DOM is fast enough. I think that’s — These benchmarks are nonsense. Those benchmarks — Even the benchmarks that — I would call them micro-benchmarks. Even micro-benchmarks like — I don’t know, work faster, implement some charting library mutating the DOM and using whatever, vanilla JavaScript, or using React. Of course, you can outperform React, but at what cost?

The cost of creating two charts, two components of a chart, the API nicely, blah blah blah blah rather than mutating the DOM all the time. That’s super expensive. When you’re working, for example, for Ahrefs, I think we have — I don’t know, 2 million lines of code in Reason. I don’t know, 5,000 components. Some insane number of components. If you do that in a mutable way, or maybe not mutable but maybe just going to the DOM and listening to the DOM and hoping that everybody is a good citizen, you would slow down development so hard that it doesn’t make any sense.

I think React drew this line where the virtual DOM is like, if you know how to create the structures, know how to trigger the renders, I think it can be as good as the DOM, of course. In general terms, I think it’s good enough. I think the balance is very OCamly. [laughs] You know that Jordan was behind it, when — Jordan is an OCamler that knows this balance, that —

Louis: [01:45:23] It’s funny, because you could say that for example Facebook has enough resources to make it work, right? They could have decided it is going to be more troublesome for our developers, but we are going to offer a faster experience for the customers, the people actually visiting the website. But even they, with their infinite resources, decided to go in a different direction. Is it because it’s a tech company? I feel that some big tech companies, they run the way they do because they are led by technical people, so they can make technical choices instead of business choices sometimes.

David: [01:46:08] React is that, exactly that case. I think I saw the documentary about React, and it’s more that the idea just spread everywhere. I think they had that competing library; I don’t know how, I think it’s called Jacks or something. I don’t know the name, but they had a competing library that was written with PHP and XML, and the whole stack at Meta back in the day. The idea of React spread everywhere. The point that you made before, that developer experience leads everything in the early days, I think that just applied exponentially, because at some point they released it publicly and everybody hated JSX. Then after one year it was the most famous library used. Then from then on, the monopoly went on for, I don’t know, six, four years, I don’t know. So many years that now everybody has the component model, the hooks, the state. It spread the idea everywhere. It’s interesting, but I think it’s just like [unintelligible 01:47:11], so.

Louis: [01:47:13] How smart do you need to be to be Jordan and be correct that React was the right thing to do, and to do it properly, and then that Reason was the right thing to do, and to do it properly?

David: [01:47:25] How smart do you need to be? I don’t know. Uncountable, I would say. It’s hard to quantify smartness, but I think it’s even harder when you look at what Jordan has been doing. I think Jordan is the kind of person who, when you speak with him, talks, or says things that don’t make sense at the beginning. It’s his way of thinking. He’s thinking three, five years ahead, and when he explains the idea to you, you are like, “I got the sense that I didn’t understand anything.” After a few months you start saying, “Oh, right, it made sense.” I think everybody on the React team only says how brilliant Jordan is. So, yes.

Louis: [01:48:18] It’s interesting that you interviewed Rudi, the author of Dune, and he said that even he, for example, originally when he saw the Reason syntax, he was like, “meh,” like, “What is this thing?” Like, “Yet again?” He said, “Yes, in the end it was right to make it more approachable.” This was from a user experience perspective. It’s a clear benefit.

David: [01:48:44] Yes. I love that interview, because Rudi said exactly that, like, “Oh, at the beginning Reason felt like a toy, but then we were doing tooling for OCaml, and the language and a lot of work on the actual language. Reason was the thing that brought more people to OCaml than we ever did. Even though it’s not a competing language, even though the person that created it has so much power with the frontend people, even so, the number of people that got into the OCaml community was bigger than from any other effort that we made.” [laughs] Which for me is super funny at the end of the day, because most of us came from JavaScript and ended up doing OCaml and mixing everything. That was the idea.

Louis: [01:49:40] It’s funny how you need a little bit of luck for all those things to work. You need, for example, Rudi to decide early on, “Okay, I don’t really trust this thing, but I’m still a good citizen, so I will add the support inside Dune.”

David: [01:49:56] Yes. That’s very noble.

Louis: [01:49:58] It’s like if you don’t have those people who are able to compute this is maybe good or maybe bad, and are able to balance their opinion, versus the community thing. They have to do it early enough, at the right time. It’s interesting that somehow —

David: [01:50:24] Of course, not everything Jordan says is correct, I think, in the sense that, of course, he created stuff that was definitely not in the shape or in the area of success of React, of course. I think he created React and then React Native. Of course, they both are insanely successful. Reason I would call successful as well, but he also pushed the idea of Esy. Esy is the package manager that is still somehow used, and some people love it. Even myself, I have a lot of respect for Esy, and use it from time to time. With it you can consume JavaScript libraries, npm packages, as well as opam packages.

This project has been suffering for long, that it’s definitely not the right solution, or at least, it didn’t create these ideas, this sudden idea to the rest of people to continue pushing for it.

Louis: [01:51:27] This is one of those tools where I think the technical implementation was good, but the UI was not great. The output is just not nice, for example. You run it, and then it displays some-

David: [01:51:44] You mean, the actual UI, the CLI?

Louis: [01:51:46] Yes, because opam, which is not the most fancy tool ever, but still when you opam install, it has some colors. It doesn’t display one line production. It’s some kind of somehow clean output that Esy doesn’t have. You had to learn this weird JSON syntax to put your package, and it outputs ugly text after that. The idea was very good, but the UI was not completely working.

David: [01:52:24] I think Esy has some, people call it state-of-the-art ideas, the end goal of a package manager, what you want to do, or what we want to use. The efforts of maintaining the overrides or being on top of all the libraries or even compiles Esy, there’s a lot of maintenance that needs to get done. At some point, we had a team of, I think, six, seven people working on it, and that experience was very good, but from when Reason got a little bit lost, this team, those people —

Of course, when the blockchain companies started hiring all of them to work for Web3 and paying them insane amounts of money, then the project got a little bit carried over, got a little bit less maintenance, and is now more or less in a stale mode where you can use it, but it’s not as good as opam.

Louis: [01:53:35] This is exactly why there is no Rust code inside Ahrefs, because all the developers got stolen by the blockchain companies.

David: [01:53:45] That’s fair. That’s fair. I think, Louis, we are running out of time. For me it’s daytime, I can do stuff, but for you it’s definitely night time. I could talk with you for hours and hours, but I think the show is reaching the point to finish. It was a pleasure to have you, of course.

Louis: [01:54:07] It’s a pleasure.

David: [01:54:09] If somebody attends ICFP, please go to Louis, I think he’s the party manager, and as well, a very interesting person to talk to. Please, bother him. I think he’ll be in Seattle in September.

Louis: [01:54:26] Yes, Seattle, September 4th to 9th, I think, something like that. It will be online too, this year. I think it will be online and for free. All the talks, at least all the ML, or maybe not ML, but the OCaml workshop will be online for free. There is no need to travel all the way to Seattle to see the talks, at least to see the OCaml workshop.

David: [01:54:52] Makes sense. In this era of internet, I think that makes sense. Cool. Thanks everybody for being here. You’re having a little bit of a late day, but that was perfect. Thanks, Louis, for spending time with us.

Louis: [01:55:07] Thanks for having me. That was fun.

David: [01:55:10] See you guys.

[01:55:13] [END OF AUDIO]

Written by Louis Roché