Many successful companies have been founded on the principle of collecting, understanding and monetizing data, but getting to that position is a tedious task. It takes a meticulous approach to building the product and executing on it, and with a bit of luck, you might end up sitting on mountains of data. Clever people can then mine that data for more and more value, creating a moat around the product.
This has been slow and expensive in the past, and to an extent still is. In addition to having a great product, one needs to invest in the data infrastructure behind it and in the talent to build and maintain the data operation. It’s like climbing a mountain when you’re just learning how to walk – it takes time and iteration.
Luckily, today there are ways to accelerate this process. Cloud native tools and services can be used to create highly scalable systems that make it easy to implement industry best practices at any company. A modern data platform that consolidates business critical data into a single source of truth, coupled with machine learning applications and operations, presents a competitive edge for companies of any size – from a pre-seed hustle to a large enterprise.
A data platform is centered on a data warehouse, which holds company data and consolidates external sources into one place. This makes it possible to enrich the company data, and easy for data scientists to mix and match sources while creating experiments and algorithms that improve business outcomes, enabling data driven products.
But there is still a cost to all this. Specialist expertise is required at every step of becoming data driven, from the technical plumbing and application development to a business-level understanding of machine learning. Hiring is tough, employment costs are high, and retaining data talent requires interesting problems to solve every day – something a startup might struggle with. Expertise can also be contracted, but the costs may be prohibitive for an early stage company.
The end result looks like this – I’ll save further details for another time, but suffice it to say that it’s a cloud native, modern and scalable platform, built on tools that are widely used in the industry. So no rocket science or magic, but a pragmatic approach that is standard enough for new people to onboard and become productive quickly.
«A modern data platform that consolidates business critical data into a single source of truth, coupled with machine learning applications and operations, presents a competitive edge for companies of any size – from a pre-seed hustle to a large enterprise.»
When creating data infrastructure, security and compliance are always key topics. Snowflake and AWS have a great feature for our purposes: an account hierarchy. This enables strict separation of data between entities – each company’s data is theirs and nobody else has access to it, despite the shared infrastructure that helps save costs. This way we are fully in line with GDPR and have a strong security baseline as well. Companies can also publish and share curated data sets with other users of the platform if they so wish – this technology stack makes it very easy. Sharing insights in a closely knit ecosystem lets companies benefit from data they themselves might not have, and opens up new monetization opportunities with low investment.
But as with any technical infrastructure, it needs constant support: data engineers creating and managing the infra, plugging in new data sources and optimizing the transformations into the warehouse; scientists researching the data and creating experiments hand in hand with the product teams; and business analysts who co-create scenarios with the portfolio companies. This, we believe, is the “unfair advantage” Baloise can help companies with, through our ecosystem.
Initially, we at the Mobility Ecosystem of Baloise worked on proof of concept projects with selected companies, while building out the basic shared infrastructure and best practices, to keep our focus on delivering tangible outcomes as early as possible.
We did manage to validate the need for the data platform – our second project on the platform shipped tangible results in less than half the time the first project took, thanks to the infrastructure already being set up and a team that knew how to work with it. Within the project timeline, we were able to address more scenarios than we had estimated ourselves! One such use case was optimizing EV charging for shared fleets.
Perhaps the most interesting application was optimizing the supply of shared fleets – basically, placing cars in locations where they are likely to be needed, at a price that is profitable yet converts well, is the basic problem everyone is tackling in their own way. Some still do it manually, based on experience. We wanted a data driven approach, and with an added complexity: what if the vehicles are electric?
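To make the supply side concrete, here is a minimal sketch of demand-proportional fleet positioning. The zone names and demand numbers are made up for illustration; a real system would plug in the demand forecast and pricing model, not a hard-coded dictionary.

```python
# Toy sketch: allocate a fleet across zones proportionally to forecast
# demand, using largest-remainder rounding so counts sum to the fleet size.
# Zone names and demand figures are illustrative only.

def position_fleet(fleet_size, zone_demand):
    """zone_demand: {zone: forecast bookings}. Returns {zone: vehicles}."""
    total = sum(zone_demand.values())
    raw = {z: fleet_size * d / total for z, d in zone_demand.items()}
    alloc = {z: int(r) for z, r in raw.items()}
    # Hand leftover vehicles to the zones with the largest remainders.
    leftover = fleet_size - sum(alloc.values())
    for z in sorted(raw, key=lambda z: raw[z] - alloc[z], reverse=True)[:leftover]:
        alloc[z] += 1
    return alloc

demand = {"station": 42, "airport": 23, "suburb": 10}  # forecast bookings
print(position_fleet(20, demand))  # → {'station': 11, 'airport': 6, 'suburb': 3}
```

The largest-remainder step matters: naive rounding can leave vehicles unassigned or over-assign the fleet.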
Refueling a combustion engine car takes maybe five minutes and can be done while repositioning the vehicle, but charging an EV from an AC charger takes 3–4 hours. Clearly, charging cannot wait until, say, Monday morning when rush hour starts – the full fleet needs to be in operation by then. This is a perfect problem to optimize with data.
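The scheduling core of that problem can be sketched with a simple greedy: every EV must be fully charged by a deadline, but only a few chargers are free at a time and each charge takes hours. This is not the production model – all car names, charge times and the deadline are assumptions for illustration – but it shows why the deadline bites.

```python
# Toy EV charging scheduler: assign cars to a limited set of AC chargers
# so the whole fleet is charged by a deadline (hour 0 = now).
# Greedy: longest charges first, each taking the earliest free charger.
import heapq

def schedule_charging(charge_hours, n_chargers, deadline):
    """charge_hours: {car: hours needed}. Returns {car: (start, end)};
    raises ValueError if the fleet cannot be ready by `deadline`."""
    free_at = [0.0] * n_chargers   # time each charger becomes free
    heapq.heapify(free_at)
    plan = {}
    for car, hours in sorted(charge_hours.items(), key=lambda kv: -kv[1]):
        start = heapq.heappop(free_at)
        end = start + hours
        if end > deadline:
            raise ValueError(f"{car} cannot be ready by hour {deadline}")
        plan[car] = (start, end)
        heapq.heappush(free_at, end)
    return plan

fleet = {"ev1": 4, "ev2": 3, "ev3": 4, "ev4": 2}   # hours of charge needed
print(schedule_charging(fleet, n_chargers=2, deadline=8))
```

With two chargers and an 8-hour window this fleet fits; drop the window to 6 hours and the scheduler correctly reports that it cannot be done, which is exactly the signal a repositioning plan needs.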
As an outcome, we created a model that optimizes fleet positioning as well as charging according to a demand forecast that includes several external regressors such as weather and the calendar – holidays and the like are a key factor in predicting fleet demand, and demand must be balanced with an optimized supply model. Maybe more about that later, though.
We also created a sample application with public data to showcase the platform’s capabilities. Here we forecast booking activity for the Citibike scheme in NYC/NJ using their public data set, addressing weekday seasonality and taking weather and holidays into account. The data is available to anyone at https://ride.citibikenyc.com/system-data. For this demonstrator, we make heavy use of Snowflake features, including Snowpark to host our Jupyter notebook, and the excellent Prophet library from Meta for the actual forecasting.
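The actual demonstrator uses Prophet, which handles trend, seasonality and holiday effects properly; as a rough intuition for what the forecast captures, here is a toy seasonal-naive baseline in plain Python. All numbers are made up, and treating holidays as Sundays is a simplifying assumption, not what Prophet does internally.

```python
# Toy seasonal-naive baseline: forecast rides as the historical mean for
# the same weekday, treating holidays like Sundays. Illustrative only –
# the real demo uses Prophet with weather and holiday regressors.
from statistics import mean

def seasonal_naive(history, horizon, holidays=frozenset()):
    """history: list of (weekday, rides), oldest first, weekday 0=Mon.
    holidays: set of forecast steps (1-based) that are public holidays.
    Returns a list of `horizon` daily forecasts."""
    by_weekday = {}
    for weekday, rides in history:
        by_weekday.setdefault(weekday, []).append(rides)
    last_weekday = history[-1][0]
    forecast = []
    for step in range(1, horizon + 1):
        day = (last_weekday + step) % 7
        profile = 6 if step in holidays else day   # 6 = Sunday profile
        forecast.append(mean(by_weekday[profile]))
    return forecast

# Two weeks of fake data: busy weekdays, quiet weekends.
history = [(d % 7, 1000 if d % 7 < 5 else 400) for d in range(14)]
print(seasonal_naive(history, horizon=3, holidays={2}))  # → [1000, 400, 1000]
```

Even this crude baseline shows the shape of the problem – weekly repetition plus calendar exceptions – which is what the external regressors in the real model refine.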
What started as a quick validation of a hypothesis turned out to be very timely for the portfolio teams. Thanks to the recent hype around ML models, everyone understands the importance of data now, and Baloise’s Mobility Ecosystem has been building for this in the long term already – as evidenced not just by the data platform services but also by investments into companies such as Vianova, Tronity and Stratos, where data is core to their business.
Not every company is in the business of data, but most stand to benefit from it, which means the high costs of a data operation aren’t easy to justify. Hence, the further “unfair advantage” Baloise can bring is boosting the ecosystem with data skills that are flexibly available to the portfolio teams. Ecosystem members can draw on a central pool of talent who maintain the infrastructure and are thus already familiar with the setups, enabling them to take on business needs quickly and easily – essentially a data helpdesk, solving business problems with tangible outcomes, cost effectively.
«Thanks to the recent hype around ML models, everyone understands the importance of data now and Baloise’s Mobility Ecosystem has been building for this in the long term already.»
Over time, as ecosystem companies mature and grow their own data skills, it would be valuable to form a shared data community. In such a community, companies could draw on talent from each other – e.g. a data engineer from company A and a scientist from company B could collaborate. This would help talent acquisition as well as retention, as people would have interesting new problems to solve outside their own business too – a win-win for the companies.
We’re not there yet, but with the data platform shared, the data blueprints and support available, and early adopters already using it, it’s been a great start so far.