Thursday, 2 May 2013

Working at Spotify


Last week I attended a talk at Universidad Carlos III that was supposed to be about what it is like to work at Spotify. Although I think the speaker was more interested in headhunting than in explaining the work there, I took some interesting notes, which I have supplemented with information from Spotify's blogs to write this entry.

The company

Their product streams music with very low latency, so users can play audio files as if they were stored on their local disk. This sets it apart from similar applications, but the product has many other interesting features:

Personal recommendations. It includes a radio that plays by genre and according to your listening history. There is also a recommendations panel that mainly shows the latest news about your favourite artists.

Social connectivity. All your activity can be shared on Facebook and other social networks. In fact, Spotify could be considered a music social network.

Playlist management.

Availability on mobile devices, basically smartphones and tablets.

A wide catalogue of songs, although not all popular music is available, owing to the terms of Spotify's contracts with the record labels.

So this is Spotify: “Find all your favourite songs and share with all your friends, wherever you are, on whatever you have”. They currently have about 6 million paying users (Unlimited or Premium subscribers) and more than 20 million users in total, and they keep expanding into new countries.

Applying Agile Methodologies
 
Constantly changing infrastructure and software requirements are reason enough to adopt agile methodologies, which favour frequent releases so that the team can respond to change and identify new user requirements.

The incremental nature of agile methodologies is easy to see in the continuous evolution of Spotify's functionality: from the initial product, a simple music player with playlist management, to more complex additions such as the radio, personal recommendations and accurate search. This is a strong benefit, because the knowledge a developer acquires during one increment is reused in the following ones. It is worth pointing out that iterations through the classical phases (inception, elaboration, construction and transition) can run concurrently here (think about why!). Another point is that each phase has a time box: if the planned scope cannot be completed within it, the scope is reduced. However, the limits of the time box must always be respected!
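To make the time-box rule concrete, here is a minimal Python sketch, assuming that work items carry a priority and an effort estimate in days. The `Item` type, the `fit_to_time_box` function and the sample items are hypothetical, not Spotify's tooling. The time box stays fixed and only the scope shrinks: items are taken in priority order, and any item that no longer fits in the remaining time is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical unit of planned work with an effort estimate in days."""
    name: str
    priority: int          # lower number = more important
    estimate_days: float


def fit_to_time_box(items, time_box_days):
    """Keep the most important items that fit; the time box itself never grows.

    Items are considered in priority order and skipped if they would overrun
    the remaining time. Returns (kept, dropped).
    """
    kept, dropped = [], []
    remaining = time_box_days
    for item in sorted(items, key=lambda i: i.priority):
        if item.estimate_days <= remaining:
            kept.append(item)
            remaining -= item.estimate_days
        else:
            dropped.append(item)  # scope is cut, the deadline is not moved
    return kept, dropped


if __name__ == "__main__":
    backlog = [
        Item("playlist sharing", 1, 4),
        Item("radio by genre", 2, 5),
        Item("search ranking tweaks", 3, 3),
    ]
    kept, dropped = fit_to_time_box(backlog, time_box_days=8)
    print("kept:", [i.name for i in kept])
    print("dropped:", [i.name for i in dropped])
```

With an 8-day box, the 5-day radio item is dropped, while the cheaper, lower-priority search item still fits. A stricter variant would stop at the first item that does not fit; which behaviour is preferable is a team decision.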

Spotify is a very representative example of Agile being an appropriate method for software development in fast-growing startups. Spotify has even created its own Agile manifesto, because the original one (you can read its twelve principles here: http://agilemanifesto.org/principles.html) is too general for some purposes. The Agile à la Spotify manifesto has five principles that represent the company's culture and align all its developers:

Continuous improvement. Employees are willing to try new things without fear, and coaches never blame a creative person. This is summarized in the slogan “Think it, build it, ship it, tweak it”.

Iterative development. Developers are encouraged to work in small steps and release often. If you are a user, you will certainly have noticed this, because the Spotify updater announces new releases almost every day. Sometimes it is a little annoying but, of course, it follows one of the main values of the Agile manifesto: “Customer collaboration”.

Simplicity. Paradoxically, this requires complex collaboration and interaction between developers. Like the most elegant mathematical solutions, which are only reached after a difficult process of abstraction, simplicity in software needs several iterations and, often, more than one pair of eyes on the code.

Trust. People trust each other in order to bring out the best in one another. When you break something, simply learn from it and start thinking again; nobody is blamed.

Servant leadership. Managers do not impose their own criteria; instead, they prioritize work and let others think freely. A manager cannot always be beside each person, so people have to develop the skills needed to solve different problems on their own.

The specific approach they use is a mixture of Scrum and Kanban. The former is an agile framework for software development, while the latter is a general system for just-in-time production: produce only what clients are expecting to pay for. Neither Scrum nor Kanban rigidly dictates the way of working; both can be adapted to the real situation of any organization.

Following the Scrum framework, Spotify developers work in autonomous, self-organizing, cross-functional teams. Each team focuses on a specific area of the overall system, for which it frequently has to release a working increment. The sprint backlog contains the user requirements for the increment being developed in the current sprint, and no changes to it are allowed while the sprint is running. Sprints normally last between 1 and 4 weeks; after each one, a sprint planning meeting is held to determine the backlog for the next sprint.
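The two rules in this paragraph, a bounded sprint length and a backlog that cannot change once the sprint has started, can be sketched in Python as follows. `Sprint`, `plan_next` and `SprintBacklogFrozen` are hypothetical names, and requests that arrive mid-sprint are assumed to wait in a product backlog until the next planning meeting.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


class SprintBacklogFrozen(Exception):
    """Raised when someone tries to change the backlog of a running sprint."""


@dataclass
class Sprint:
    """Hypothetical model of a sprint: a fixed length and a frozen backlog."""
    start: date
    weeks: int
    backlog: list = field(default_factory=list)
    started: bool = False

    def __post_init__(self):
        # "Normally 1 to 4 weeks": treated here as a hard rule for simplicity.
        if not 1 <= self.weeks <= 4:
            raise ValueError("a sprint should last between 1 and 4 weeks")

    @property
    def end(self) -> date:
        return self.start + timedelta(weeks=self.weeks)

    def add(self, requirement: str) -> None:
        # Changes are only accepted during planning, before the sprint starts.
        if self.started:
            raise SprintBacklogFrozen(f"cannot add {requirement!r} mid-sprint")
        self.backlog.append(requirement)

    def begin(self) -> None:
        self.started = True


def plan_next(previous: Sprint, product_backlog: list, weeks: int = 2) -> Sprint:
    """Sprint planning: the next sprint starts when the previous one ends."""
    nxt = Sprint(start=previous.end, weeks=weeks)
    for requirement in product_backlog:
        nxt.add(requirement)
    return nxt


if __name__ == "__main__":
    product_backlog = []
    sprint = Sprint(start=date(2013, 5, 6), weeks=2)
    sprint.add("radio by genre")
    sprint.begin()
    try:
        sprint.add("share to Facebook")
    except SprintBacklogFrozen:
        product_backlog.append("share to Facebook")  # waits for next planning
    nxt = plan_next(sprint, product_backlog)
    print(nxt.start, nxt.backlog)
```

The late request is not lost; it simply moves to the next sprint, which starts on the day the current one ends.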

Development teams

A developer can play several roles within a team, very similar to those defined by the Scrum framework. At Spotify the teams are called squads, and they have the conventional features: 3 to 10 people from different disciplines. The figure below is an informal UML diagram that summarizes the relations between the organizational units.



Each squad is assigned to a particular part of the system: the search engine, the playlist manager, the social features, recommendations, backend infrastructure and so on. Every squad has a coach, whose goal is to remove impediments affecting the current increment. The coach protects the squad and keeps it focused on the Scrum process. The coach must never have people-management responsibilities, in contrast with the classic project manager, who usually does.



Besides the roles mentioned above, there is the product owner, who always keeps the product and the customer in mind. The product owner must ensure that the team delivers value to the business, while the rest of the team focuses on the user requirements. Remember that the user and the customer are not necessarily the same. Squads are the smallest unit in this organization, but they can join other squads to form tribes, with an established maximum of ten squads per tribe. This rule is based on the Dunbar number, a conclusion from social research stating that people cannot maintain stable social relationships with more than about 100 people. Squads are grouped in this way to reach more ambitious objectives, which require more people with different skills. Tribes sometimes hold informal meetings where they present what they have learnt, what they have delivered, and so on.
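The sizing rules above can be written down as a small validation sketch in Python. The class names and sample data are hypothetical; the limits are the ones given in the text (3 to 10 people per squad, at most ten squads per tribe), and the module-level check shows the arithmetic behind the Dunbar argument: 10 squads × 10 people = 100 people at most.

```python
from dataclasses import dataclass, field

MIN_SQUAD, MAX_SQUAD = 3, 10   # people per squad, as described in the text
MAX_SQUADS_PER_TRIBE = 10      # squads per tribe, as described in the text
DUNBAR_LIMIT = 100             # the rough figure quoted in the text

# The two size limits together keep any valid tribe within the Dunbar limit.
assert MAX_SQUAD * MAX_SQUADS_PER_TRIBE <= DUNBAR_LIMIT


@dataclass
class Squad:
    name: str
    members: list


@dataclass
class Tribe:
    name: str
    squads: list = field(default_factory=list)

    def headcount(self) -> int:
        return sum(len(s.members) for s in self.squads)

    def problems(self) -> list:
        """Return human-readable rule violations; an empty list means valid."""
        issues = [
            f"squad {s.name} has {len(s.members)} people"
            for s in self.squads
            if not MIN_SQUAD <= len(s.members) <= MAX_SQUAD
        ]
        if len(self.squads) > MAX_SQUADS_PER_TRIBE:
            issues.append(f"tribe {self.name} has {len(self.squads)} squads")
        return issues


if __name__ == "__main__":
    search = Squad("search", ["ana", "bo", "chen", "dev"])
    tiny = Squad("tiny", ["eve", "fay"])
    tribe = Tribe("discovery", [search, tiny])
    print(tribe.headcount(), tribe.problems())
```

Because the per-squad and per-tribe limits already cap the headcount at 100, no separate Dunbar check is needed at runtime.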

There is yet another unit, which includes people from different tribes: the chapter. As you may remember, continuous improvement was one of the most important principles of the Spotify Agile manifesto. This improvement requires the knowledge acquired by the members of one tribe to be shared efficiently with members of other tribes. For instance, suppose the testers of tribe A solve a big problem that they think is likely to reappear in other areas. It would be very useful to share this knowledge with testers from other tribes, and that group of testers forms a chapter. Chapter members share code, tools, how-tos…

 
The figure above shows what is known as the matrix organization. Horizontally, each squad member reports to a chapter lead, while vertically the lead is the product owner. The former is a technical mentor who develops technical skills and cares about the quality of what is being implemented. The latter is the watchdog of the time box, who only cares that increments are delivered as planned in the sprint planning meeting. These two leads have opposing objectives, but this is a healthy tension that keeps both forces in balance.
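A rough way to picture the matrix is to give every person two coordinates: a squad (vertical, answering to the product owner) and a chapter (horizontal, answering to the chapter lead). The Python sketch below does exactly that; all names and the dictionary-based lookups are hypothetical illustrations, not a Spotify system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    squad: str     # vertical axis: what to build (product owner)
    chapter: str   # horizontal axis: how to build it well (chapter lead)


def leads_of(member, product_owners, chapter_leads):
    """Return (product owner, chapter lead) for one member of the matrix."""
    return product_owners[member.squad], chapter_leads[member.chapter]


def grid(members):
    """Arrange members as a squad x chapter matrix: {(squad, chapter): [names]}."""
    cells = {}
    for m in members:
        cells.setdefault((m.squad, m.chapter), []).append(m.name)
    return cells


if __name__ == "__main__":
    people = [
        Member("ana", squad="search", chapter="testing"),
        Member("bo", squad="radio", chapter="testing"),
        Member("chen", squad="search", chapter="backend"),
    ]
    product_owners = {"search": "Paula", "radio": "Pere"}
    chapter_leads = {"testing": "Tomas", "backend": "Berta"}
    print(leads_of(people[0], product_owners, chapter_leads))
    print(grid(people))
```

Reading a column of the grid gives a squad; reading a row gives a chapter. Here Ana and Bo work in different squads but share the testing chapter, so both report to the same chapter lead.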

The working environment

 
Developers work in open lounge areas without walls or other obstacles that would hinder fluid interaction. The members of a squad share the same room and the other squads sit very close by, so everybody feels free to walk into other squads' spaces and share whatever they think the others will find interesting.

There are many spaces with psychedelic furniture that encourage informal gatherings, where technical topics, imagination and other matters mix in conversation. Of course, there are also formal rooms for business meetings and work planning.

In short, unlike the environment in many Spanish companies, this one promotes interaction and encourages creativity, social relationships and effective breaks. Nobody has to fight for a new chair or keyboard, and the leaders are simply servant leaders who do not impose their criteria; they only remove impediments.

Architecture

As I said before, Spotify has more than 20 million users requesting audio files and storage for playlists and social information. It needs an architecture that scales as fast as the number of users and their requirements grow, which is why scalability is considered the cornerstone. The first step towards achieving this is to build modular, independent services, which also brings fault tolerance: if one service fails, the others keep running.
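The fault-isolation idea can be illustrated with a small Python sketch in which a client view is assembled from several independent services and a failing one is simply left out. The three service functions are hypothetical in-process stand-ins (one of them always fails, to simulate an outage); real services would be separate processes reached over the network, where timeouts and retries also matter.

```python
from concurrent.futures import ThreadPoolExecutor


def search_service(query):
    return [f"track matching {query!r}"]


def recommendations_service(query):
    raise ConnectionError("recommendations backend is down")  # simulated outage


def social_service(query):
    return ["2 friends played this"]


def call_isolated(service, query, fallback):
    """Call one service; a failure is contained and replaced by a fallback."""
    try:
        return service(query)
    except Exception:  # in a real system: log it and report it to monitoring
        return fallback


def build_view(query):
    """Query all services in parallel; each one succeeds or fails on its own."""
    services = {
        "search": search_service,
        "recommendations": recommendations_service,
        "social": social_service,
    }
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = {
            name: pool.submit(call_isolated, fn, query, [])
            for name, fn in services.items()
        }
        return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    print(build_view("daft punk"))
```

Here the recommendations panel comes back empty, but search and social results are still shown: one service failing does not take the whole client view down with it.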

Spotify developers are allowed to modify the code of any service, even one that does not belong to their squad, because this prevents squads from being blocked for a long time by dependencies on other squads. How can the architecture support this? Companies often have a well-defined role, such as the database administrator, that becomes a bottleneck when several squads need to deploy their services. Spotify aims for a self-service infrastructure, which means that squads can start developing, testing and deploying on their own. Since squads are cross-functional, they must be able to decide things such as the best storage model (their own datacentres or a public cloud); the required storage capacity, which means they have to monitor the servers and define test loads for them; and the side effects on other services. Ultimately, squads are responsible for both the backend and the frontend aspects of what they develop. (A frontend developer is responsible for the software and/or hardware “from the Apache [or other HTTP or application server] to the user interface”.)

There is not much documentation online about the interactions that take place between the moment the user clicks the play button and the moment the song is delivered to the client. What is known is that 80% of play requests are served from the local cache or from other Spotify peers. Every user has a local cache that takes up about 10% of the local disk space, and Spotify does not advertise that its users are taking part in a peer-to-peer (P2P) network, in which every client is both a client and a server at the same time. Moreover, for every request the Spotify servers always start by transferring the first chunks of the song, which reduces latency enormously. Meanwhile, the song is looked up among other peers, and only if this succeeds does the Spotify server stop its transfer. When this happens, network traffic to the Spotify servers is reduced.
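The hybrid delivery described above can be sketched in Python as a function that decides, chunk by chunk, where each piece of a song comes from. It is a deterministic model built on assumed details that the post does not document: a song is a fixed number of chunks, a peer is only used if it holds the whole song, and the peer search is assumed to finish once the server has sent the first `server_head` chunks. `Sources`, `fetch` and the sample IDs are hypothetical; this is not Spotify's actual protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Sources:
    """Hypothetical chunk sources; a song is made of chunks 0..n-1."""
    local_cache: dict = field(default_factory=dict)  # song_id -> set of chunk indices
    peers: dict = field(default_factory=dict)        # peer -> {song_id: set of chunks}


def fetch(song_id, n_chunks, sources, server_head=2):
    """Return [(chunk, source), ...] in playback order."""
    all_chunks = set(range(n_chunks))
    cached = sources.local_cache.get(song_id, set())

    # 1. Fully cached songs never touch the network.
    if cached >= all_chunks:
        return [(i, "cache") for i in range(n_chunks)]

    # 2. Look for a peer holding the whole song (assumed to complete only
    #    after the server has already sent the first `server_head` chunks).
    peer = next(
        (name for name, songs in sources.peers.items()
         if songs.get(song_id, set()) >= all_chunks),
        None,
    )

    plan = []
    for i in range(n_chunks):
        if i in cached:
            plan.append((i, "cache"))
        elif i < server_head or peer is None:
            plan.append((i, "server"))  # start-up chunks, or no peer found
        else:
            plan.append((i, peer))      # server transfer stopped; peer takes over
    return plan


if __name__ == "__main__":
    src = Sources(
        local_cache={"song-a": {0, 1, 2, 3}},
        peers={"peer-7": {"song-b": {0, 1, 2, 3}}},
    )
    for song in ("song-a", "song-b", "song-c"):
        plan = fetch(song, 4, src)
        from_server = sum(1 for _, source in plan if source == "server")
        print(song, plan, "server chunks:", from_server)
```

Counting the `"server"` entries shows the traffic saving: a cached song costs the server nothing, a song held by a peer costs only the first two chunks, and only a song that no peer holds is streamed entirely from the server.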

Technologies
 
If you want to apply for a Spotify job, it would be very helpful to know about the following technologies:

Operating system: Debian
Installed on all developers' machines. (I am not sure about the servers, but they run Linux.)

Client: Chromium
The Spotify client is essentially a Chromium browser customized with lots of JavaScript code.

Distributed version control: Git
All developers have access to other squads' code, which they call code transparency.

Clusters: Apache Hadoop
Services have to run on multiple nodes, with intensive data access and distributed file systems; the Hadoop framework makes this possible. MapReduce jobs are written in Python.
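As an illustration of what a MapReduce job written in Python can look like, here is a play-count sketch in the style of Hadoop Streaming, Hadoop's utility for running any executable as the mapper or reducer. The log format (`user_id<TAB>track_id`), the function names and the local `run_locally` driver are assumptions for illustration, not Spotify's code. Under real Hadoop Streaming the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines to stdout, with the framework sorting and shuffling in between.

```python
from itertools import groupby


def mapper(lines):
    """Map step: for each 'user_id<TAB>track_id' log line, emit (track_id, 1)."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        _user, track = parts
        yield track, 1


def reducer(pairs):
    """Reduce step: pairs arrive sorted by key, as Hadoop guarantees; sum per track."""
    for track, group in groupby(pairs, key=lambda kv: kv[0]):
        yield track, sum(count for _, count in group)


def run_locally(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    log = ["u1\ttrackA", "u2\ttrackB", "u3\ttrackA", "garbage line"]
    print(run_locally(log))  # {'trackA': 2, 'trackB': 1}
```

The `sorted` call stands in for Hadoop's shuffle phase: the reducer relies on all values for one key arriving together, which is exactly what `groupby` needs.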

Backend services
Messaging, other communication processes and cluster configuration are implemented in Java, C, C++ or Python.

MVC framework: Django.

Object-relational databases: PostgreSQL.

NoSQL databases: Apache Cassandra and MongoDB.

Those without knowledge of the technologies above or professional experience can try the Spotify puzzles as a way to show what they could bring to the company (https://www.spotify.com/es/jobs/tech/).

More information

Certainly, not a lot: