Preben Thorø: Welcome to our little studio here. May I ask you to do a short introduction of yourself?
Bert Jan Schrijver: My name is Bert Jan Schrijver. I'm CTO at OpenValue. We're a Java consultancy company with offices in The Netherlands, Germany, Austria and Switzerland. I have a background as a Java developer and software architect. I've mainly done consulting for 10 to 15 years. I worked as a developer and architect on various projects. I'm also active with The Netherlands Java User Group, the NLJUG, where I'm mainly busy with the organization of a couple of conferences we do.
Why the command-line?
Preben Thorø: 10 to 15 years of professional experience, or even more. So, you're an old fox in the forest. The reason why I invited you here is that I know you can do magic with the command-line. I would like to hear a little bit more about that, but maybe before we dive into this, why command-line at all? I mean, this is the year 2021, do we need command-line tools?
Bert Jan Schrijver: It's an interesting question. When I started using the command-line, there was less choice in visual and point-and-click tools. I started using Linux when I was in university. So, this is around 1998, I think. I had a friend who was really enthusiastic about Linux, everything you could do with this. We're talking about Linux 2.0, maybe 2.2, and he introduced me to this world. Then I started experimenting with it and used it to run a server in my student house. After that, I kind of got interested in command-line utilities and what you could use them for. I was really amazed by the power and the simplicity of these tools. So typically command-line tools are very simple, right? They do one thing, and they do this one thing right. You can string lots of these tools together to build data pipelines. If you look back at what we have now in terms of big data pipelines, I think it is the next step of what you could do with command-line utilities in the late 1990s, so to say.
Preben Thorø: Is this something that is evolving or following the time, or is it similar to talking DevOps, and containers, and Kubernetes, and everything nowadays? Did that bring back the need for the command-line?
Bert Jan Schrijver: I mainly use the command-line for three things. The first thing is to be quick. If I need to start something or look something up quickly, typing a command in the terminal is faster than opening a webpage, clicking around, or opening a utility.
Get the weather forecast (https://wttr.in for a preview):
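A minimal sketch of such a lookup, assuming curl is installed and wttr.in is reachable; the function name weather is an illustrative choice rather than something from the interview:

```shell
# weather [location]: fetch a plain-text forecast from wttr.in.
# Without an argument, wttr.in guesses the location from your IP address.
weather() {
    curl -s "https://wttr.in/${1:-}"
}
```

Running weather, or weather Amsterdam, then prints the forecast straight into the terminal.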
Get the current (internet) IP address of the machine you’re connected to:
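One possible way to do this, assuming curl is installed and a public IP echo service such as ifconfig.me is reachable (the choice of service and the function name myip are assumptions, not taken from the interview):

```shell
# myip: print the public IP address that remote servers see for this machine.
myip() {
    curl -s https://ifconfig.me
}
```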
Secondly, if I need to do something with data, such as a log file I need to search, a big data file with a couple of thousand lines I need to search or filter, or a file I need to transform, I also reach for command-line utilities.
Compare 2 (big) logfiles, show only the differences, and use keys up/down/pgup/pgdown to page through the differences (useful for finding differences between the output of 2 builds):
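A small sketch of this, assuming the standard diff and less utilities; the function name logdiff and the PAGER override are illustrative additions:

```shell
# logdiff file1 file2: show only the lines that differ between two files
# and page through them with less (set PAGER=cat for non-interactive use).
logdiff() {
    diff "$1" "$2" | ${PAGER:-less}
}
```

For example, logdiff build1.log build2.log, where the two file names stand in for the output of two builds.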
Thirdly, I use it when I need to automate things. As a developer, you typically need to do repetitive stuff, like checking whether 10 machines are up, doing a redeployment, or similar. Using the command-line helps in terms of automating things and creating scripts. Then you have more time to be productive, because automation helps you spend less time on repetitive tasks.
Do a health check on 10 machines with name server
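A minimal sketch of such a check, assuming the machines answer ICMP ping; the function name check_hosts and the host names used below are hypothetical:

```shell
# check_hosts host...: ping each host once and report whether it answered.
check_hosts() {
    for host in "$@"; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            echo "$host is up"
        else
            echo "$host is DOWN"
        fi
    done
}
```

In bash 4 or later, check_hosts server{01..10} would expand to ten hypothetical host names, server01 through server10.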
Which editor to use?
Bert Jan Schrijver: I like Vim because it's available everywhere. Whichever server I log in to, whether it's a Mac, or Linux box, or AIX box, or Unix box, it's always there. It doesn't matter what you use to edit something, but Vi or Vim is typically always there. I've stuck with it and I will keep using it. After using it for five years I found out how to exit it, and then you can become really productive ;-)
The command-line ecosystem
Preben Thorø: Talking about the command-line, we bump into a lot of words like POSIX, shell, bash, sed. Could you please help me map out the landscape here?
Bert Jan Schrijver: Yes. Let's look at Linux-based machines. A Linux machine starts up and the first thing it boots is the kernel. The kernel is the interface between the hardware and the operating system utilities. The kernel makes sure that your keyboard can interface with whatever software you run on there. Once the kernel has booted, the operating system software starts. For example, the network daemon that makes sure that you connect to the internet, or your GUI makes sure that you can use your mouse. Once you log into a system, a shell starts and a shell is a command interpreter.
POSIX is a standard for these kinds of command-line utilities, and bash is one of the most common shells. A shell has a couple of built-in commands: if you want to change the directory, you type "cd". If you want information on a command, you can usually add "--help". Bash has lots of built-in commands, and bash can also hand over to other commands, for example, a command like sed, which is a stream editor, or grep, which you can use to search or filter. To string those all together: the kernel makes sure that it processes your keyboard input, then your shell makes sure that you can type commands, and those are handled either by the built-in commands in your shell or by commands that are installed as small binaries on your operating system.
Preben Thorø: Which means you could extend this by installing more binaries?
Bert Jan Schrijver: There's a lot of binaries available by default. If you take a stock Ubuntu system, there's lots of stuff going on there by default. But it's also fairly easy to extend, install other binaries, create binaries yourself and create scripts. Because the nice thing with shell scripts is that most shells have a built-in scripting language. With bash, you can create bash scripts. A script in itself is also again another program that you can use on the command-line. If you are typing the same five, six commands over and over again, you could put them in a script, and then the script becomes another command, and this command can read inputs and also produce output. You can also string multiple commands together. I think that's the real power of using the command-line: the ability to string multiple commands together and that the output of one command is the input of the second command.
Utilities and what to use them for — what about 3rd party extensions?
Preben Thorø: Do we have a set of favorite extensions?
Bert Jan Schrijver: You mean utilities to use?
Preben Thorø: Yeah.
Bert Jan Schrijver: Utilities I use a lot are cat, which is for printing files, and grep, for searching and filtering files. You can either make grep output specific lines or filter out specific lines. I use cut a lot to cut out specific columns. I use sed a lot, a stream editor that you can use to do string replacements. And then there are tools like rev, which can reverse a file, or reverse a string. Those basic commands allow you to do most of the data manipulations you would do on a single file.
Convert string “spaces” to “tabs” using rev and sed (not a real-world example):
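A toy pipeline along these lines, assuming rev (part of util-linux on Linux, also shipped with macOS) and sed are available; the exact commands are an illustrative guess at the idea, not quoted from the interview:

```shell
# Reverse the string, replace the reversed word, then reverse it back:
# "spaces" -> "secaps" -> "sbat" -> "tabs"
echo "spaces" | rev | sed 's/secaps/sbat/' | rev
```

The point is only to show how the output of each command feeds the next one.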
Let's say that I want to email a group of people. I have an email somewhere that contains all the email addresses, but they're mixed in with names and other stuff. I can take this email, put it in the command-line and then start to filter. I put every word that's in this file on a single line, and then filter out the email addresses by looking for the @. Then I start filtering out duplicates, maybe sorting them, and use this to take the email addresses for another mail I'm going to send. If you would do this by hand, it would, first of all, be a boring task, and automating is a lot more fun to do, but secondly, you probably would be a lot slower.
Getting a list of unique email addresses from an email (in mail.txt):
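The steps described above can be sketched as follows, assuming addresses are separated from other text by spaces and at most wrapped in characters like < > , ; (the function name unique_emails is hypothetical):

```shell
# unique_emails file: print the sorted, de-duplicated email addresses in a file.
unique_emails() {
    tr -s ' ' '\n' < "$1" |   # put every word on its own line
        grep '@' |            # keep only words that contain an @
        tr -d '<>,;' |        # strip common surrounding punctuation
        sort -u               # sort and drop duplicates
}
```

Calling unique_emails mail.txt prints one address per line. A stricter pattern would be needed for messy input; this follows the simple approach from the conversation.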
If you have any kind of data manipulation you need to do, and it's not in the order of magnitude of gigabytes, then typically using command-line utilities can be as quick as, or probably even quicker than writing a Java program to do so, or even using a big data solution like Spark. That would probably be a bit overkill because with most data-sets we work with, you don't need distributed computing, I'd say.
Preben Thorø: What about third-party extensions?
Bert Jan Schrijver: I like to work mainly with standard command-line utilities. If you're creating a script with specific tools that are running on my machine, for example, that helps my team automatically do package replication, or deploy to a Kubernetes cluster, then I need to write installation instructions for them to run it on their machine, and they might not work cross-platform. Some of the tools that you get on a Linux box work a bit differently than how they work on a Mac. There are some cross-platform considerations you need to take into account, and the more external tools you use, the more installation work, or burden you put on the others who want to use your scripts or your commands.
What about Windows?
Preben Thorø: There are small differences in the dialects between Linux and Mac. What about Windows? What do Windows users do?
Bert Jan Schrijver: There used to be not so many options. Previously, you either had no alternative on Windows, or you could use Cygwin, which was somewhat like a bash shell you could run on Windows. With modern Windows versions, you have WSL, the Windows Subsystem for Linux, which actually works quite well. You just get a bash shell, and you can use most of the same scripts and commands that you use on a Linux box. If not, you can always run a Linux VM or a Docker container on your machine to make sure that you have a stable Linux environment on your Windows machine.
Get a list of all the docker port mappings (from docker-proxy) from the process list:
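A rough sketch, assuming docker-proxy processes are started with -host-port, -container-ip and -container-port arguments in that order (check the process list on your own system, since this may differ between Docker versions); the function name docker_ports is hypothetical:

```shell
# docker_ports: read a process list on stdin and print one line per
# docker-proxy process in the form "hostport -> container-ip:container-port".
docker_ports() {
    sed -nE 's/.*docker-proxy.* -host-port ([0-9]+) .*-container-ip ([0-9.]+) .*-container-port ([0-9]+).*/\1 -> \2:\3/p'
}
```

It can be fed from the process list, for example with ps -eo args | docker_ports.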
Learn more about command-lines
Preben Thorø: If I want to learn more about this, what would you suggest?
Bert Jan Schrijver: It's an interesting question because there's a lot to learn. It depends on how you learn. If you like to read, there are lots of good tutorials. You can just Google the Linux command-line. I did a talk at GOTO Amsterdam a while ago, that's up on YouTube. It is called Mastering the Linux command-line. You can watch this talk and you get a lot in 45 minutes. On my GitHub, there's a repository called Mastering Linux command-line, which has lots of interesting commands and some tutorials. What's always worked best for me is just to get started. Find a Linux shell somewhere, be it in a VM or Linux box, and just start exploring commands because most of these commands are fairly well documented themselves.
You can type a command followed by "--help" to get the info. You can find the manual page by typing "man" followed by the command name. Exploring new commands this way helps you learn. Talking to colleagues about the use case also helps: "Hey, I'm trying to do this, how would you approach it?" Because typically there are three, four, five, or more ways to do something.
Preben Thorø: I have a feeling that those that really mastered the command-line are experienced people like you and our generation of software developers. Is this something you feel we should teach at the universities?
Bert Jan Schrijver: Yes, I think so. You become a lot more productive and also a lot more powerful. Let's say that you're running your Java application somewhere on a Linux box, and something is wrong with this box. Then it helps if you have basic skills to do troubleshooting. For example, look at the process list: do any processes use a lot of CPU or memory? Look at the virtual machine stats. Is there lots of I/O going on, or I/O wait? What is the system busy with? If the process is hanging, you can use tools like strace or btrace to dive into the Unix process and see whether it is waiting for some I/O or for some file that's locked or blocked.
Attach to the running process started with the “java” command and see what it’s doing:
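A minimal sketch, assuming a Linux machine with strace and pgrep installed and permission to trace the process (often this means running as root); the function name trace_java is hypothetical:

```shell
# trace_java: attach strace to the oldest process named exactly "java"
# and follow its threads and child processes (-f) to see its system calls.
trace_java() {
    strace -f -p "$(pgrep -o -x java)"
}
```

Press Ctrl-C to detach; the traced process keeps running.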
Apart from making you more productive in scripting and data manipulation, it can help you better understand what your application is doing.
It can also help you get to places where you otherwise wouldn't be able to get to. You're running a headless end-to-end test on a machine, and it doesn't work for some reason, and you want to find out why. By using the Linux command-line, you might be able to set up a virtual X server on a machine and make sure that the tests run on this virtual server. Then you open a VNC server and you are able to jump there. You can use SSH to jump to the server, and then on your machine, you can visibly see what the headless end-to-end tests are doing on this remote machine, somewhere in a Docker container, somewhere in the cloud, or somewhere else. It can help you find problems that otherwise would be hard to debug because your test is running somewhere in some container, on some cloud environment, and you have no idea what's going on there.
Preben Thorø: Thank you. This has been very inspiring. Thanks for dropping by.