What the hack is PNPM?

An excellent replacement for NPM/Yarn

What is a package manager?

Before we begin, I'm assuming that most of us are already familiar with NPM and Yarn. If you aren't, allow me to explain in simple terms. NPM and Yarn are package managers. You might be asking yourself,

hmm..., What does that mean?

Let me break it down. As the name implies, a package is a bundle or container (meanwhile in my mind 🤔: my Amazon package delivery). And yes, you guessed it right. It can be compared to a wrapper that encases your code.

For instance, let's say we create a utility (like a currency converter) and want to make it accessible to the public so that others can use it. So we bundle it up, and that's our package.

Now we need to host it somewhere so that people can use it. That's where the manager comes into the picture. The manager takes care of our packages and the packages built by others.

So NPM and Yarn manage our packages (in simple words, code) so that others can use them. This saves other developers the time they would otherwise spend building the same thing themselves. NPM and Yarn are specifically for JS packages. There are package managers for other languages too, for example Composer for PHP. Now that we are familiar with what package managers are, let's move on to our main topic.

Note: This is just a short overview of package managers, so that you have an understanding of what they are. There are additional details and rules at play. This NPM official guide, npm.github.io/how-npm-works-docs/theory-and.., contains comprehensive information about packages.

What is PNPM?

PNPM is a JS package manager. However, it has several notable advantages over Yarn and NPM: it's fast, efficient, and uses minimal disk space.

What problem is it trying to solve, and why should we use it?

Now you might be thinking,

hmm..., I get it, PNPM is a package manager, but we already have two package managers, NPM and Yarn, so why another one, why should we use it?

Let me explain the issues we are experiencing with NPM. Prior to NPM@v3, we had two significant issues with v2.

  1. Windows long path issue. Long paths are caused by a package hierarchy tree that is too deep. Windows traditionally limits file paths to 260 characters (MAX_PATH). (This issue is covered in detail in this blog post npm-distribution-path-length-problems).
  2. Packages are copied several times when more than one dependency needs them.

The "flat node_modules" architecture introduced in NPM@v3 partially addresses these issues, but performance is still a problem.
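To make the long path problem concrete, here is a minimal sketch that compares the path of a deeply nested package with the path of the same package in a flat layout. The project root, the package names, and the chain depth are made up for illustration, and the limit assumes the classic Windows MAX_PATH value of 260 characters.

```javascript
const path = require('node:path');

// Classic Windows MAX_PATH limit, in characters (assumed for this sketch).
const MAX_PATH = 260;

// Build the path that NPM@v2-style nesting would produce for a dependency chain:
// each package sits inside the node_modules folder of the package that needs it.
function nestedPath(projectRoot, chain) {
  return chain.reduce(
    (dir, name) => path.win32.join(dir, 'node_modules', name),
    projectRoot
  );
}

// In a flat layout every package sits directly under the top-level node_modules.
function flatPath(projectRoot, name) {
  return path.win32.join(projectRoot, 'node_modules', name);
}

// Hypothetical chain of 12 packages, each one depending on the next.
const chain = Array.from({ length: 12 }, (_, i) => `some-package-${i}`);
const root = 'C:\\projects\\practices\\app';

const nested = nestedPath(root, chain);
const flat = flatPath(root, chain[chain.length - 1]);

console.log(nested.length > MAX_PATH); // true: the nested path is too long
console.log(flat.length < MAX_PATH);   // true: the flat path is fine
```

Real dependency chains and package names vary a lot, so treat the numbers here as an illustration of the trend, not a measurement.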

image.png

Yes, Yarn is indeed a good substitute for NPM. It has certain performance and speed improvements, but it still uses the same flat model. The problems with the flat node_modules pattern are:

  1. Modules can access packages they don't depend on.
  2. Some of the packages have to be copied inside one project's node_modules folder.
  3. The dependency tree flattening algorithm is somewhat complicated.
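The first problem in the list can be sketched in a few lines. The helper below is hypothetical (it is not part of NPM or PNPM): it simply compares what a project declares in package.json with what ends up at the top level of node_modules, using an abridged, hand-written list of folder names.

```javascript
// Return the packages that project code could require() even though
// package.json never declared them.
function findUndeclaredPackages(declared, topLevelNodeModules) {
  const declaredSet = new Set(declared);
  return topLevelNodeModules.filter((name) => !declaredSet.has(name));
}

// Hypothetical project that declares only express.
const declared = ['express'];

// Flat layout: express's own dependencies are hoisted next to it (abridged list).
const flatLayout = ['express', 'debug', 'cookie', 'qs'];

// PNPM layout: only the declared package is linked at the top level.
const pnpmLayout = ['express'];

console.log(findUndeclaredPackages(declared, flatLayout)); // [ 'debug', 'cookie', 'qs' ]
console.log(findUndeclaredPackages(declared, pnpmLayout)); // []
```

If our code starts using one of those undeclared packages and express later drops it, our app breaks without any change to our own package.json.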

With the flat node_modules architecture we can run into serious issues that may affect thousands of users. Check out this article by the creator of PNPM, How PNPM can help you to avoid silly bugs.

Now we are familiar with the problems. Let's understand how PNPM tries to solve them.

How PNPM comes into the picture

In contrast to NPM and Yarn, PNPM does not use the flat node_modules architecture. Its approach, which we might call a partially flat node_modules, is quite interesting. So, let's say we create a Node application using express with both NPM and PNPM.

npm install express
pnpm add express

Now let's examine the node_modules folder in both. With PNPM, we get image.png

With NPM, we get image.png

As we can see, even though we only installed express in our app, NPM places all of express's dependencies directly inside our node_modules (flat model). With PNPM, however, all that appears inside node_modules is express and a subdirectory called .pnpm/. That makes sense, since express is the only thing we installed. But you might be thinking

what is this .pnpm/ folder?

To manage dependencies, PNPM uses symbolic links and hard links. We need to become familiar with them before we can understand what .pnpm/ means. If we look inside the node_modules of the PNPM project, we can see that express is just a symbolic link.

hmmm, what is a symbolic link?

We can think of it as a pointer that leads us to the source's true location. It's not an actual copy. For more clarity, we can think of it as a shortcut. When Node encounters require('express'), it follows the link and resolves it to the real path.
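Here is a small sketch of that idea using Node's fs module. Everything lives in a throwaway temp folder, and the "express" here is a one-line stub written for the demo, not the real package.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Work in a temp directory; realpath it first so the comparison below is
// stable on systems where the temp directory is itself a symlink.
const tmp = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'symlink-demo-')));

// The "real" location of the package (a stub, not the real express).
const realDir = path.join(tmp, 'store', 'express');
fs.mkdirSync(realDir, { recursive: true });
fs.writeFileSync(path.join(realDir, 'index.js'), "module.exports = 'hello from express stub';\n");

// The shortcut: node_modules/express -> store/express.
const linkDir = path.join(tmp, 'node_modules', 'express');
fs.mkdirSync(path.dirname(linkDir), { recursive: true });
// The 'junction' type only matters on Windows and is ignored elsewhere.
fs.symlinkSync(realDir, linkDir, 'junction');

const isLink = fs.lstatSync(linkDir).isSymbolicLink();
const resolved = fs.realpathSync(linkDir);
const loaded = require(linkDir);

console.log(isLink);               // true
console.log(resolved === realDir); // true
console.log(loaded);               // hello from express stub
```

The folder under node_modules holds no files of its own. Node just follows the link to the real folder and loads the code from there.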

Okay, got it.

Let's have a look inside express. image.png

Wait... what! no node_modules? where are the dependencies of express?

Here, node_modules/.pnpm/express@4.18.1/node_modules/express.

PNPM installs all the dependencies by following this folder pattern,

.pnpm/<name>@<version>/node_modules/<name>

Therefore, all dependencies are placed inside the .pnpm/ folder in a flat layout. But it's not the same as NPM@v3. Unlike the flat node_modules created by npm versions 3, 4, 5, or 6 or Yarn version 1, this structure maintains package isolation while also avoiding the long path problems caused by the nested node_modules of NPM@v2.

  • We call this .pnpm folder the virtual store directory. Boring right? let's call it local package storage.
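The folder pattern above is easy to express as a tiny helper. This is only a sketch for unscoped package names. The function name is made up, and scoped names such as @scope/pkg are deliberately left out.

```javascript
const path = require('node:path');

// Build the location of a package inside the local package storage, following
// the .pnpm/<name>@<version>/node_modules/<name> pattern.
// Assumption: the package name is unscoped (no @scope/ prefix).
function virtualStorePath(projectRoot, name, version) {
  return path.posix.join(
    projectRoot,
    'node_modules',
    '.pnpm',
    `${name}@${version}`,
    'node_modules',
    name
  );
}

console.log(virtualStorePath('app', 'express', '4.18.1'));
// app/node_modules/.pnpm/express@4.18.1/node_modules/express
```

Because the version is part of the folder name, two versions of the same package can sit side by side without clashing.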

Now let's take a look inside node_modules/.pnpm/express@4.18.1/node_modules/express.

image.png

What!, again no node_modules? 😤

Here comes another trick of PNPM. All the dependencies of express are installed one level up in the tree. So the dependencies of express are stored inside node_modules/.pnpm/express@4.18.1/node_modules/ instead of node_modules/.pnpm/express@4.18.1/node_modules/express/node_modules/. By doing so, we avoid the circular symbolic link trap, which could otherwise appear when packages depend on each other. What is it?

image.png

See, all the symlinks are available inside node_modules/.pnpm/express@4.18.1/node_modules/.

Hmm..., great. Now it makes sense.

All of these dependencies are symbolic links that point to their installed locations in the node_modules/.pnpm/ directory.

image.png

All dependencies are installed in a Flat model fashion inside .pnpm, remember?
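To see the whole layout working together, here is a minimal sketch that rebuilds it by hand in a temp folder and lets Node resolve it. The two packages, fake-express and fake-debug, and their versions are hypothetical stubs, and the code only imitates the folder structure. It is not how PNPM itself is implemented.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'pnpm-layout-')));
const store = path.join(root, 'node_modules', '.pnpm');

// Create .pnpm/<name>@<version>/node_modules/<name>/index.js with the given source.
function addPackage(name, version, source) {
  const dir = path.join(store, `${name}@${version}`, 'node_modules', name);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, 'index.js'), source);
  return dir;
}

// Create a directory link (the 'junction' type only matters on Windows).
function link(target, linkPath) {
  fs.mkdirSync(path.dirname(linkPath), { recursive: true });
  fs.symlinkSync(target, linkPath, 'junction');
}

// Two stub packages: fake-express depends on fake-debug.
const debugDir = addPackage('fake-debug', '2.0.0', "module.exports = 'debug stub';\n");
const expressDir = addPackage(
  'fake-express',
  '4.0.0',
  "module.exports = 'express stub using ' + require('fake-debug');\n"
);

// The dependency is linked as a sibling of fake-express, one level up from
// where a nested node_modules folder would have been.
link(debugDir, path.join(path.dirname(expressDir), 'fake-debug'));

// Only the direct dependency is linked at the top level of node_modules.
link(expressDir, path.join(root, 'node_modules', 'fake-express'));

// Node resolves the top-level link to its real path, then finds fake-debug
// in the parent node_modules folder of that real path.
const loaded = require(path.join(root, 'node_modules', 'fake-express'));
console.log(loaded); // express stub using debug stub

// The top level of node_modules contains only the direct dependency.
const topLevel = fs.readdirSync(path.join(root, 'node_modules')).sort();
console.log(topLevel); // [ '.pnpm', 'fake-express' ]
```

Notice that fake-debug never shows up at the top level, so our own code cannot require it by accident, yet fake-express still finds it.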

Fast? Taking less disk space 😀. How?

If we look closely while our packages are being installed, we may notice something similar to this:

image.png

What the hell is a content-addressable store?

Remember the virtual store directory, which we called the local package storage of our project for simplicity? In the same way, we have the content-addressable store, which we can refer to as global package storage.
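The name "content-addressable" means that a file is looked up by a hash of its content instead of by its name. Here is a tiny in-memory sketch of the idea. The class is hypothetical, and the choice of SHA-512 is an assumption made for illustration, not a description of PNPM's internals.

```javascript
const crypto = require('node:crypto');

// A tiny in-memory content-addressable store: the key ("address") of each
// file is a hash of its bytes, so identical content is stored only once.
class ContentStore {
  constructor() {
    this.files = new Map();
  }

  // Store the content and return its address.
  // SHA-512 is an assumption chosen for this sketch.
  add(content) {
    const address = crypto.createHash('sha512').update(content).digest('hex');
    if (!this.files.has(address)) {
      this.files.set(address, content);
    }
    return address;
  }

  get(address) {
    return this.files.get(address);
  }
}

const store = new ContentStore();
const a = store.add('module.exports = 1;\n'); // a file from one version of a package
const b = store.add('module.exports = 1;\n'); // the same file, unchanged in the next version
const c = store.add('module.exports = 2;\n'); // a file that did change

console.log(a === b);          // true
console.log(a === c);          // false
console.log(store.files.size); // 2
```

Identical content always lands at the same address, so a file that does not change between two versions of a package only needs to be kept once.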

Ok, got it. But how does it help improve speed and use less disk space?

PNPM automatically picks a folder to act as the global storage of packages. For instance, if we are working in C:/projects/practices/, it can create a .pnpm-store folder at the root of the drive, C:/ (the exact location depends on the PNPM version and configuration).

Now whenever we create a Node application inside C:/projects/practices/app and install a package, PNPM first looks for it in our global package storage C:/.pnpm-store/. If it is not found there, PNPM requests it from the NPM registry, puts it in global storage, and then makes a hard link to it in our local storage C:/projects/practices/app/node_modules/.pnpm/.

Sounds awesome right?

This technique speeds up installing a dependency. The next time any project needs that package, it will be hard linked from our global storage. No need to fetch it again.
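The lookup, fetch, and link flow can be sketched like this. It is a heavily simplified model: each "package" is a single file, the store is keyed by name and version instead of by content hash, and fetchFromRegistry is a made-up stand-in that only counts how often it is called.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'pnpm-store-demo-'));
const storeDir = path.join(tmp, '.pnpm-store'); // stand-in for the global storage
let registryRequests = 0;

// Hypothetical stand-in for downloading a package from the registry.
function fetchFromRegistry(name, version) {
  registryRequests += 1;
  return `// ${name}@${version}\nmodule.exports = {};\n`;
}

// Install one single-file package into a project.
function install(projectDir, name, version) {
  const id = `${name}@${version}`;
  const stored = path.join(storeDir, `${id}.js`);

  // 1. Look in the global storage first. 2. Fetch only when it is missing.
  if (!fs.existsSync(stored)) {
    fs.mkdirSync(storeDir, { recursive: true });
    fs.writeFileSync(stored, fetchFromRegistry(name, version));
  }

  // 3. Hard link the stored file into the project's local storage.
  const localDir = path.join(projectDir, 'node_modules', '.pnpm', id, 'node_modules', name);
  fs.mkdirSync(localDir, { recursive: true });
  const localFile = path.join(localDir, 'index.js');
  fs.linkSync(stored, localFile);
  return localFile;
}

const fileA = install(path.join(tmp, 'app-a'), 'axios', '1.0.0');
const fileB = install(path.join(tmp, 'app-b'), 'axios', '1.0.0');

console.log(registryRequests); // 1
console.log(fs.readFileSync(fileA, 'utf8') === fs.readFileSync(fileB, 'utf8')); // true
```

The second project gets the package without touching the network at all, which is where most of the speed-up comes from.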

The next thing we can see in the picture is the hard link. A hard link is not a copy: it is a second name for the same file data on disk. This means the package files in our global package storage also appear in our local package storage without being duplicated. If a file is modified in place, whether through the local package storage or the global one, the change is visible through both. There is only ever one copy stored on disk for each version of a module.

For instance, if we use NPM or Yarn and have 100 projects that need axios, we will have 100 copies of axios on disk. Using PNPM, we can save gigabytes of storage space!
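We can check how a hard link behaves with a few lines of Node. The file names below are made up, and the two files stand in for the entry in global storage and the entry in a project's node_modules.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'hardlink-demo-'));
const inStore = path.join(tmp, 'store-entry.js');     // stand-in for the file in global storage
const inProject = path.join(tmp, 'project-entry.js'); // stand-in for the file in node_modules

fs.writeFileSync(inStore, 'module.exports = 1;\n');
fs.linkSync(inStore, inProject); // a second name for the same data, not a copy

// Both names now point at the same data on disk.
const linkCount = fs.statSync(inStore).nlink;
console.log(linkCount); // 2

// Appending in place through one name is visible through the other.
fs.appendFileSync(inProject, '// edited\n');
const viaStore = fs.readFileSync(inStore, 'utf8');
console.log(viaStore.endsWith('// edited\n')); // true

// Removing one name leaves the data reachable through the other.
fs.unlinkSync(inProject);
const stillThere = fs.existsSync(inStore);
console.log(stillThere); // true
```

Since every project only adds another name for data that is already on disk, installing the same package a hundred times costs almost no extra space.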

Are you convinced enough to shift from NPM/Yarn to PNPM? You can check the benchmark here before getting started.

Let's wrap up this article by referring to the diagram 👇.

image.png

P.S: This is my first article ever. I'm excited and a little nervous. Please share your thoughts in the comments.

That's all for today. Thanks 🙇‍♂️ for reading the article till the end.

Bye!!! 🙋‍♂️
