A Winternship Experience at Shopify— April 29, 2018
Do things. Tell people.
Shopify taught me this value.
Let me tell you about my internship journey at Shopify.
It all started back at Horizons, my summer internship in 2017. I was introduced to the idea of DevOps by a software engineer who used to work extensively on infrastructure-related problems. It was intriguing to learn how being part of an infrastructure team allows you to solve problems at scale. It is all about building a platform that is reliable, resilient, and maintainable in the long run.
Production engineering is a relatively new field and it requires a significant amount of technical knowledge in specific areas of Computer Science. Very few companies are willing to hire interns for their Production Engineering teams. Fortunately, Shopify is one of them. Also, it was nominated as the best place to work in Canada. Since the work strongly aligned with my interests, I thought it would be a good idea to give it a try as my first engineering internship.
I was lucky enough to be in Waterloo when Shopify hosted a session for students to learn more about internships and the work engineers at Shopify do.
That talk was a life-changing moment. I have never regretted attending it. The vibe in that hall was just amazing. I could feel that people were really passionate about the things they did, and the environment somehow resonated with my personality and experience.
Here are some of the production engineering teams that were introduced during that talk. Some of them have changed over time.
- Datastores - MySQL, Memcached, Redis
- Datadelivery - Elasticsearch, Kafka, Zookeeper
- Optimize developers’ workflows and shipping pipeline
- Create uniform dev tool
- Webscale: Performance gurus, flash sales, load testing
- Pods: Shopify’s sharding abstraction, resiliency
- Cloudplatform: Move Docker containers to managed Kubernetes cluster
I was sold right after the talk by Hormoz.
Never in my life have I seen a technical talk this interesting. It left me with a lot of questions and I just wanted to know more. Most of the talks that I had been to were about algorithm interviews, the company’s product, and its vision. This one was very different, and I found a lot of value in it.
Initially, I was interested in the Webscale team, but after talking to Hormoz, it seemed that team did not hire interns. I then leaned towards Pods and Cloudplatform.
I started reading up about the company and watched their Engineering Talks published on YouTube because I wanted to know more about the technical projects that engineers at Shopify do and why a specific design decision was made. One of the concepts that really fascinates me is Shopify’s sharding abstraction, pods. It looks really easy from a high-level point of view, but when it comes to implementation, it is challenging.
The more I read and watched, the more excited I became.
Pods was the obvious choice for me.
Those who passed the initial life story interview were invited to a technical interview, which was divided into two parts. For the first part, interviewees were asked to bring a project that they had worked on and present it to the devs, including the code.
I brought Ceroku, a Heroku clone. I experimented with git servers and built wrappers around git-receive-pack in order to have a custom authentication system when users do git push. This custom git server will then trigger the build process for the application through pre-receive git hooks.
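To make the hook step concrete, here is a minimal sketch of what a pre-receive hook could look like in Go. It is not Ceroku's actual code. Git feeds a pre-receive hook one line per updated ref on stdin, in the form "<old-sha> <new-sha> <ref-name>", and a non-zero exit status rejects the push. The names RefUpdate, ParseRefUpdate, and ShouldBuild, as well as the rule of building only on pushes to master, are assumptions made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// RefUpdate is one line of pre-receive input: "<old-sha> <new-sha> <ref-name>".
type RefUpdate struct {
	OldSHA, NewSHA, Ref string
}

// ParseRefUpdate splits a single pre-receive input line into its three fields.
func ParseRefUpdate(line string) (RefUpdate, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return RefUpdate{}, fmt.Errorf("malformed ref update line: %q", line)
	}
	return RefUpdate{OldSHA: fields[0], NewSHA: fields[1], Ref: fields[2]}, nil
}

// ShouldBuild is a hypothetical policy: only pushes to master trigger a build.
func ShouldBuild(u RefUpdate) bool {
	return u.Ref == "refs/heads/master"
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		u, err := ParseRefUpdate(sc.Text())
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1) // a non-zero exit status rejects the whole push
		}
		if ShouldBuild(u) {
			// Whatever the hook prints is relayed to the user running git push.
			fmt.Printf("-----> would build %s at %s\n", u.Ref, u.NewSHA)
		}
	}
}
```

A real hook would start the slug builder at the point where this sketch only prints a message.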
There are two parts of the build process: the slug builder and the slug runner. A slug is essentially a gzipped file that contains all the application files and the binaries needed to run the application. The slug builder will be triggered first and this creates a slug file which can be run by the slug runner as docker containers. I then used traefik, a modern HTTP reverse proxy and load balancer, to route requests for a specific subdomain to its corresponding docker container.
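As a rough illustration of the slug idea, the sketch below packs a set of files into a gzipped tar archive in memory and lists it back. This is only an approximation under assumptions: the original says a slug is a gzipped file but does not say it is a tar archive, and BuildSlug and ListSlug are hypothetical helper names, not Ceroku's real functions.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sort"
)

// BuildSlug packs files (path -> contents) into a gzipped tar archive.
func BuildSlug(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)

	// Sort the paths so the archive layout does not depend on map iteration order.
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	for _, p := range paths {
		data := files[p]
		hdr := &tar.Header{Name: p, Typeflag: tar.TypeReg, Mode: 0644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close the tar writer first, then the gzip writer, so both trailers are flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// ListSlug returns the file names stored in a slug, in archive order.
func ListSlug(slug []byte) ([]string, error) {
	gz, err := gzip.NewReader(bytes.NewReader(slug))
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	slug, err := BuildSlug(map[string][]byte{
		"app/main.rb": []byte("puts 'hello'\n"),
		"Procfile":    []byte("web: ruby app/main.rb\n"),
	})
	if err != nil {
		panic(err)
	}
	names, err := ListSlug(slug)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(slug), "bytes:", names)
}
```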
Ceroku is one of my favorite projects.
For the second part of the interview, interviewees were required to work through a hypothetical problem with the developers. This was one of the hardest interviews that I have ever had. It had concepts related to Operating Systems, Networking, etc. - the usual Production Engineering stack. Considering the fact that I have not taken any of those classes formally in school, it was a very challenging one for me. I didn’t expect anything in return after the interview.
I did it! A few weeks later, I received a call from my recruiter and was told that I got the offer. Without hesitation, I accepted right away. I knew this was the work that I wanted to do.
“Nice to chat with you just now and congratulations on accepting your internship offer to Shopify for Winter 2018 !”
The excitement that followed that call led to a couple of sleepless nights.
A few weeks before the start of my internship, I was given a Production Engineering resource guide to help me get up to speed. Preparation beforehand was optional, and many learn on the job itself. This is a great way to keep interns in the loop, and it feels like Shopify really values its interns and wants them to grow. That is what I was looking for.
Life as an Intern
All Shopify interns were housed at 1Eleven. We were given options of one bedroom studios of varying sizes to choose from. This location is within 5 to 10 minutes of walking distance from the office. Not to mention that it has a great view of the City Hall and the Parliament as well!
On our very first day, we were given a work laptop, which had to be returned at the end of the internship. We went through an onboarding process to set up our tools and get to know more about the company’s vision and culture.
One of my favorite tools at Shopify is dev. Dev is a command-line tool that interacts with various projects: it handles cloning, bootstrapping, running tests, booting servers, and shell integration to activate certain dependencies when entering a project directory. It is really convenient that developers can just run dev up in their terminals and the entire development environment for that project is completely set up on their machines.
Interns at Shopify are paired with a mentor when they join. In my first 1-1 with my mentor, we talked about potential projects, future plans, and my internship goals. That was a great start, as it helps to ensure that interns get the most out of their time at Shopify.
In the first few weeks, I was mainly working on small tasks just to get myself familiarized with the codebase and to obtain more context about Shopify’s infrastructure. My first task was to work on the monitoring, alerting, and logging system for Shopify’s shop mover system (i.e. moving shops from one pod to another).
I got the opportunity to explore Shopify’s sharding abstraction, pods. Since I had never taken a concurrency course before, trying to understand exclusive locks was challenging. But it was fun! Through my first task, I also took on the typical responsibilities of a site reliability engineer, from reading p95 graphs to exploring live production logs.
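For readers who have not met p95 before: it is the value that 95% of the samples fall at or below. The sketch below computes it with the nearest-rank method, which is only one of several percentile definitions, and monitoring systems may interpolate or estimate instead. The function name Percentile and the latency numbers are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the p-th percentile (0 < p <= 100) of samples
// using the nearest-rank method. It returns NaN for an empty input.
func Percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// Nearest rank: the smallest rank covering at least p percent of the samples.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical request latencies in milliseconds: 19 fast requests and 1 slow one.
	latencies := make([]float64, 0, 20)
	for i := 0; i < 19; i++ {
		latencies = append(latencies, 10)
	}
	latencies = append(latencies, 500)

	fmt.Println("p50:", Percentile(latencies, 50))
	fmt.Println("p95:", Percentile(latencies, 95))
	fmt.Println("max:", Percentile(latencies, 100))
}
```

With 20 samples, a single slow request shows up in the maximum but not in the p95, while two slow requests out of 20 would move the p95. That is part of why p95 graphs are useful for spotting problems that affect more than a stray request.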
I then started asking myself these questions:
- How do you know what to monitor?
- When do you know you should alert?
- What caused that spike in that graph?
- Why is there a sudden increase in the number of exceptions?
- Are we behind SLOs?
- What is the root cause of this problem?
TIP: Reading graphs is about anomaly detection and reasoning about a particular area in the graph.
I came to realize that building successful monitoring and alerting systems is challenging. Logs are the most essential component in a production system. Without those, engineers will have a hard time diagnosing problems.
In an R&D-based team, it is very important to trace problems down to their root cause. We do not want to apply a patch to a high-level abstraction, which may solve the problem temporarily, but not in the long run. Everything is about reasoning, and we should not do something just for the sake of making it work unless time is a concern.
Besides that, I explored MySQL internals a little and tried to understand how database replication works. I learned how binary logs are used in MySQL replication. With that background, I leveraged MySQL binary log streaming to build a real-time database corruption detector in Go. This is where I learned Go properly. I had written a simple key-value store in Go previously, but I did not understand channels and goroutines. Now I know how buffered and unbuffered channels work in Go. A buffered channel can behave much like a counting semaphore, while an unbuffered channel makes the sender and receiver synchronize on every value.
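Here is a small sketch of the semaphore comparison: a buffered channel with capacity limit lets at most limit goroutines hold a slot at the same time. Sending into the channel acquires a slot and blocks once the buffer is full, and receiving from it releases the slot. The function name MaxConcurrent and the simulated work are assumptions for illustration and have nothing to do with the corruption detector itself.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// MaxConcurrent starts n goroutines but lets at most limit of them work
// at the same time. It returns the highest concurrency it observed.
func MaxConcurrent(n, limit int) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore

	var (
		mu      sync.Mutex
		running int
		peak    int
		wg      sync.WaitGroup
	)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()

			sem <- struct{}{}        // acquire: blocks while all slots are taken
			defer func() { <-sem }() // release the slot when done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // stand-in for real work

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}

	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", MaxConcurrent(20, 3))
}
```

The peak can never exceed the channel capacity, which is the property a counting semaphore gives you.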
In the middle of my internship, there was some internal restructuring and I joined the service communication team. Since this team is relatively new, I was able to be part of the RFC process of the service communication layer (RPC) project. There, I learned how tech reviews work when working with developers across multiple teams.
I started working on a service graph builder to visualize service-to-service communication and track how the graph changes over time. I also took on more responsibility of the distributed tracing project because the service graph relies on the tracing data.
I continued working on the existing tracing instrumentation and deployed tracing proxies as Kubernetes DaemonSets so that hundreds of Shopify’s services can send tracing data (spans) to the proxy, which will then go through a trace processing pipeline. The trace processing pipeline receives data from a message queue, partitions them, and builds traces from spans using circular buffers. With the traces, the service graph can be generated.
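To give a feel for the last two steps, here is a simplified sketch of a circular buffer of spans and of deriving service-to-service edges from the spans it holds. It is not Shopify's pipeline: the Span fields, the SpanRing and ServiceEdges names, and the service names are all assumptions for illustration, and the message queue and partitioning steps are left out.

```go
package main

import "fmt"

// Span is a hypothetical, heavily simplified span.
// ParentID is empty for the root span of a trace.
type Span struct {
	TraceID, SpanID, ParentID string
	Service                   string
}

// SpanRing is a fixed-size circular buffer. Once full, each new span
// overwrites the oldest one. The capacity must be greater than zero.
type SpanRing struct {
	buf  []Span
	next int // index where the next span will be written
	size int // number of valid spans currently stored
}

func NewSpanRing(capacity int) *SpanRing {
	return &SpanRing{buf: make([]Span, capacity)}
}

func (r *SpanRing) Push(s Span) {
	r.buf[r.next] = s
	r.next = (r.next + 1) % len(r.buf)
	if r.size < len(r.buf) {
		r.size++
	}
}

// Snapshot returns the stored spans, oldest first.
func (r *SpanRing) Snapshot() []Span {
	out := make([]Span, 0, r.size)
	start := (r.next - r.size + len(r.buf)) % len(r.buf)
	for i := 0; i < r.size; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}

// Edge is a call from one service to another.
type Edge struct{ From, To string }

// ServiceEdges counts parent -> child span links that cross a service boundary.
func ServiceEdges(spans []Span) map[Edge]int {
	type key struct{ trace, span string }
	serviceOf := make(map[key]string, len(spans))
	for _, s := range spans {
		serviceOf[key{s.TraceID, s.SpanID}] = s.Service
	}
	edges := make(map[Edge]int)
	for _, s := range spans {
		parent, ok := serviceOf[key{s.TraceID, s.ParentID}]
		if ok && parent != s.Service {
			edges[Edge{From: parent, To: s.Service}]++
		}
	}
	return edges
}

func main() {
	ring := NewSpanRing(3)
	ring.Push(Span{"t1", "a", "", "web"})
	ring.Push(Span{"t1", "b", "a", "checkout"})
	ring.Push(Span{"t1", "c", "b", "payments"})
	ring.Push(Span{"t1", "d", "b", "inventory"}) // evicts span "a"

	for edge, count := range ServiceEdges(ring.Snapshot()) {
		fmt.Printf("%s -> %s (%d)\n", edge.From, edge.To, count)
	}
}
```

The trade-off shows up in the example: a circular buffer keeps memory bounded, but once the root span is overwritten, the edge from web to checkout can no longer be derived.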
At Shopify, we embrace a postmortem culture and learn from our failures. Whenever there is an incident, we carry out root cause analysis (RCA) to prevent the problem from occurring again in the future. I was given an opportunity to be part of the on-call and emergency support of Shopify’s core infrastructure during work hours.
Holding the pager is a huge responsibility, especially when it comes to Shopify’s core infrastructure, because more than 500,000 merchants rely on Shopify for their businesses to operate. When things break or don’t happen the way they should, the on-call engineer is paged and has to solve the problem while keeping merchant impact to a minimum. It was an awesome learning opportunity, and it was my first on-call experience. I also experienced first-hand how production engineers at Shopify do on-call support and incident management.
Honestly, where else can I get this opportunity?
From what I’ve heard from friends in different companies, everyone tends to wrap things up in the last few weeks of their internships. I decided to try something different. There was an opportunity for my team to give a Production Engineering talk.
I know that my long-term goal is to speak at tech conferences. But I had not given any tech talks before. I knew that I needed to do something, because I had always stayed in my comfort zone.
I decided to volunteer and give that tech talk.
It was my first tech talk. I talked about what I had been working on, particularly the service graph, and gave a brief overview of what my team was working on. I was pretty nervous at first, but now that I have gone through it, I would not mind doing it another time. In fact, I am looking forward to more of these talks in the future.
Special Events at Shopify
Intern Retreat: Did I mention that we went on a winter excursion during the first month of our internship? We slept in cabins. We sat around bonfires. We networked. We went tobogganing, skiing, snowshoeing, and skating. Most importantly, we had fun. This was my first classic Canadian winter experience. I have only experienced two winters in Canada, and this was the best one!
Shopify Summit: Winter is usually the ideal time if you want to be part of the Shopify Summit. It is a yearly event and it is a chance for everyone to think big, share ideas, and connect with others across the company. Think of it like Apple Special Events where they announce new products, but for Shopifolks only. I got to learn about the journey of our merchants, and some of them were very inspiring. It made me rethink commerce: it is not just about making a profit, but also about empowering people to make their first sale.
Production Engineering Summit: Since Production Engineering is a very technical field, it would not be ideal to dive deep into its technical details during the Shopify Summit, so it was split into a separate one-day event. There were multiple talks: team structure, internal infrastructure, daily hacks for our day-to-day operations, lightning talks on Haskell and Ruby, etc. We ended the day with archery tag and an escape room, followed by a team dinner.
Life outside of Shopify
My goals this term were pretty simple: to learn how to skate and to explore cooking.
I had only been on a skating rink twice, and even then I had trouble just walking on the ice. I told myself it was time to change that. I wanted to be able to skate on the world’s largest naturally frozen ice rink, the Rideau Canal.
I invested in a pair of skates and enrolled in a skating class. After a few weeks of practice, I was ready to skate on the Rideau Canal. I skated over 4 km in total along the canal, back and forth. That, to me, was an achievement that I am proud of.
Skating on the Rideau Canal was nothing like skating on an ordinary skating rink. There were no sidewalks. The ice wasn’t smooth and there were bumps everywhere. Some parts of the ice were thin. Falling is easy if one is not careful enough, no matter how experienced that person is.
Four months ago, I couldn’t even stand properly on a skating rink, even with the sidewalk. Today, I can confidently say that I can be on the rink alone, without any sidewalks, and I think this will be my new hobby.
I was also very interested in cooking and I love Italian dishes. I attended a couple of cooking and pastry classes. Coming from an Asian background, learning how to cook French or Italian dishes was an entirely different world. I learned different cutting techniques and the terminologies used in the kitchen.
Did you know that there are more than 40,000 varieties of rice? I didn’t. All I knew was jasmine rice and Japanese rice. I was exposed to different varieties such as sticky, basmati, long grain, black, bomba, and carnaroli. There are so many different types of salt as well: table, kosher, finishing, etc.!
Some of my favorite dishes from the classes were beurre blanc (literally translated from French as “white butter”) on fish, risotto, madeleines, pâte à choux, and Haskap clafoutis.
During my free time, I also looked into git internals to understand how it works under the hood. I wrote a small Ruby implementation of git init and git add. I also attempted to compile my own Linux kernel, but that failed miserably because I had no idea what I was doing.
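One piece of git internals that fits in a few lines is how git names a file's contents. In a SHA-1 repository, the object ID of a blob is the SHA-1 of the header "blob <size>" followed by a NUL byte and then the contents; git add then zlib-compresses that data and stores it under .git/objects. The sketch below covers only the hashing step, and it is written in Go rather than the Ruby of my original implementation. BlobID is a hypothetical helper name.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// BlobID returns the object ID that git (in a SHA-1 repository)
// assigns to a blob with the given contents.
func BlobID(content []byte) string {
	h := sha1.New()
	// Header: object type, a space, the content length in bytes, then a NUL byte.
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// Matches: echo "hello world" | git hash-object --stdin
	fmt.Println(BlobID([]byte("hello world\n"))) // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
}
```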
I think Production Engineering is one of the most interesting fields in the Computer Science industry. It encompasses the disciplines of site reliability engineering, infrastructure engineering, and developer productivity. However, it is not for everyone. Some people prefer building products and working on user-facing applications directly. If you love Operating Systems, Distributed Systems, Networking, or Concurrency, this might be for you.
Here are some of the challenges that my team works on:
- Evolving our sharding abstraction to allow other applications around the organization to take advantage of the architecture that’s allowed Shopify Core to scale.
- Building resiliency tooling to automatically generate resiliency matrices, and improving Toxiproxy to make creating resilient applications a breeze
- Moving shop data between shards with minimum disruption for the customer to improve data locality, resiliency, and performance for our merchants
- Designing the RPC layer that makes talking between our hundreds of internal applications a joy, setting up a service mesh to provide circuit breakers to everyone, and enabling Chaos Engineering
- Failing over shards between datacenters without losing requests
- Building the tools to make refactoring data at scale easier: traversing billions of records across hundreds of Pods without a hitch
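Circuit breakers come up in the list above, so here is a minimal sketch of the pattern for anyone who has not seen it. After a number of consecutive failures the breaker "opens" and rejects calls immediately, and after a cooldown it lets a trial call through. This is a generic illustration and not how Shopify's service mesh implements it; Breaker, NewBreaker, ErrOpen, and ErrDownstream are made-up names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	// ErrOpen is returned when the breaker rejects a call without trying it.
	ErrOpen = errors.New("circuit open")
	// ErrDownstream is a hypothetical error standing in for a failing dependency.
	ErrDownstream = errors.New("downstream unavailable")
)

// Breaker opens after maxFailures consecutive failures and stays open for cooldown.
type Breaker struct {
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open.
func (b *Breaker) Call(fn func() error) error {
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		return ErrOpen // fail fast and give the dependency room to recover
	}
	// Either the breaker is closed, or the cooldown has passed and this is a trial call.
	if err := fn(); err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0 // any success closes the breaker again
	return nil
}

func main() {
	b := NewBreaker(2, time.Minute)
	failing := func() error { return ErrDownstream }
	for i := 0; i < 3; i++ {
		// The third call is rejected without reaching the dependency.
		fmt.Println(b.Call(failing))
	}
}
```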
Shopify is one of those companies that care deeply about the growth of their interns. Weekly check-ins with my mentor were very helpful, as they helped me solidify my thoughts every week and addressed any concerns I had. Looking back, I could not have imagined doing everything I did as part of my first engineering internship. It was a superb learning experience! Until now, I had been involved in teaching. Being a tutor or teaching assistant is definitely rewarding, but I think being a production engineer is more exciting, at least for now.
At Shopify, opportunities for growth and making impacts are everywhere.
It is up to everyone to make full use of them.
To me, every single day at Shopify is a chance to make an impact.
Thanks to everyone who made my experience at Shopify a memorable one.
Until next time!
If you are ambitious and passionate about building scalable, maintainable and resilient applications, my team is hiring full-time engineers: Service Patterns.
If you are still in school and interested in being part of Shopify, do check out the careers page for internship postings. I believe they should be up in the next few weeks. Highly recommended!
I would love to try out the violin. It looks fun to draw the bow across the strings or pluck them.
These are the things that I will be exploring for the rest of 2018: git, Linux kernel, performance tools (flame graphs, strace, rbtrace, eBPF), resiliency patterns, and competitive programming.
Off to Mozilla! 🦊