Estimates are complicated or engineers are naive?
Estimating software development tasks accurately has proven to be a challenging endeavour for engineers, leading to a historical pattern of overcommitment and under-delivery.

A deep dive into building an o11y platform for frontend performance, errors, and user sessions - from MVP to production.
It's been a while since I published something here, but I have been busy helping build an o11y platform for the last 2+ years. I am proud of what we have accomplished considering where we stood when I joined. From a questionable UX to actually delivering value, it has been a hell of a ride :rocket: But today we are here to talk about frontend o11y, which I was fully focused on for almost a quarter, if not more. So let's get the show rolling...
It all started when a customer told us they wanted frontend o11y, and one thing about working at Last9 is shipping MVPs at an unreal speed. I can't even recall the number of MVPs I have shipped in a few weeks, all of this even before the AI era. Now we have a different problem: we are shipping so much that staying on top of everything is sort of impossible. Every Monday I open the app and end up seeing new stuff in there. So it was obvious that we were going to gear up to build the frontend o11y part. What surprised me was that I would end up owning and shaping so much of it.
The frontend space gets more complex every year: blazing fast, responsive, and good UX is table stakes today. You may have a good product, but if it is not smooth and fast I may end up somewhere else. The latest example of this is Cursor. I chose to move away not because I hate the product, but because I hate it when the app I spend 8+ hours a day in messes up layouts and stutters while switching panes. With such high expectations from users (especially for an o11y platform where devs are the users), you cannot compromise on the quality of your frontends.

So we started with performance. Having done a number of performance audits for various orgs, I had a rough idea of how I approach performance issues. First you need to understand which pages in your app might even have issues, so that is where we started. We built an SDK that collects web vitals on each page, and we show a summary of all Views with the top 10 paths. As you can see in the image above, this is the zoomed-out view, meant not for debugging but for telling you what needs debugging: your top 10 paths, traffic patterns, and a rough overall picture of the performance real users are experiencing.
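
To make the "top 10 paths" roll-up concrete, here is a minimal sketch of how such a summary could be computed from raw web vitals samples. It is not the actual SDK or backend: the `VitalSample` shape, the `topPaths` name, and the choice of p75 LCP as the headline number are all assumptions made for illustration.

```typescript
// Hypothetical sample shape; the real SDK payload is not shown in this post.
interface VitalSample {
  path: string; // page path, e.g. "/logs"
  metric: "LCP" | "INP" | "CLS";
  value: number; // milliseconds for LCP/INP, unitless for CLS
}

interface PathSummary {
  path: string;
  views: number; // number of LCP samples, used here as a proxy for traffic
  p75Lcp: number; // 75th percentile LCP in milliseconds
}

// Nearest-rank percentile, computed on a sorted copy of the input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)] ?? NaN;
}

// Group LCP samples by path, then keep the `limit` busiest paths.
function topPaths(samples: VitalSample[], limit = 10): PathSummary[] {
  const lcpByPath: Record<string, number[]> = {};
  for (const s of samples) {
    if (s.metric !== "LCP") continue;
    const bucket = lcpByPath[s.path] ?? [];
    bucket.push(s.value);
    lcpByPath[s.path] = bucket;
  }
  return Object.keys(lcpByPath)
    .map((path) => {
      const values = lcpByPath[path] ?? [];
      return { path, views: values.length, p75Lcp: percentile(values, 75) };
    })
    .sort((a, b) => b.views - a.views)
    .slice(0, limit);
}
```

Sorting by traffic first and showing a percentile next to it matches the intent of the zoomed-out view: it points at the busy pages that are slow for real users, without trying to explain why.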

Later on we added filters so I can zoom in and out to my liking. Say I just want to look at performance for all the logs pages: I add a filter for paths containing /logs, and then group by, let's say, Browser to see whether we are performing poorly on a certain browser, e.g. Firefox. We followed that up with a geo map giving us regional insights into whether performance is poor in particular regions. We did something similar for errors as well.
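
The filter-then-group flow above can be sketched in a few lines. This is only an illustration of the idea, not how the platform implements it: the `ViewRecord` fields, the `groupViews` function, and the use of a plain average are assumptions.

```typescript
// Hypothetical record for a single page view; field names are assumptions.
interface ViewRecord {
  path: string;
  browser: string;
  country: string;
  lcpMs: number;
}

interface GroupStats {
  views: number;
  avgLcpMs: number;
}

type Dimension = "browser" | "country";

// Keep views whose path contains `pathContains`, then aggregate by one dimension.
function groupViews(
  records: ViewRecord[],
  pathContains: string,
  by: Dimension
): Record<string, GroupStats> {
  const totals: Record<string, { views: number; sum: number }> = {};
  for (const r of records) {
    if (r.path.indexOf(pathContains) === -1) continue; // the "contains /logs" filter
    const key = r[by];
    const t = totals[key] ?? { views: 0, sum: 0 };
    t.views += 1;
    t.sum += r.lcpMs;
    totals[key] = t;
  }
  const out: Record<string, GroupStats> = {};
  for (const key of Object.keys(totals)) {
    const t = totals[key];
    if (!t) continue;
    out[key] = { views: t.views, avgLcpMs: t.sum / t.views };
  }
  return out;
}
```

Calling `groupViews(records, "/logs", "browser")` would give one row per browser for the logs pages, and switching the last argument to `"country"` gives the kind of regional breakdown a geo map could be drawn from.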

Lastly we had user sessions. This has been one of the best deep-dive features, and it helped us replace Sentry in-house as part of our ongoing attempts to improve this as much as we can. If you are not using the platform yourself, don't expect others to use it :)

Sessions give you a lot of information, and they are particularly useful when a bug report comes in: I can go and find that session, see all the interactions to reproduce the error, and go through the queries, API calls, and assets fetched to drill down to the root cause. There is support for custom events as well, so I can send important business metrics to track. Use trace/logs-to-metrics to build custom business dashboards on top of these, etc. A lot is happening here; if any of this interests you, reach out to me for a demo.

Let's talk about a View inside a session; this is where things get really interesting. If you are also instrumenting your backend, DB, and all other services, you can correlate them here. Check out the trace details below, where you can see the whole picture: Frontend -> Proxy API -> Backend -> DB, you get the point. All errors/exceptions from all services show up in one place. This has been a game changer, but it is still a v0 for us. We need to tie up all the loose ends here: in the AI era no one is opening documentation, dashboards, etc. (sorry Tailwind, we love you, it will all work out). I have since moved on to AI applications and am handing this over to the lovely frontend team I helped put together here. I know it is in good hands :handshake:
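
For readers wondering how a frontend request can be tied to backend and DB spans at all, one common mechanism is the W3C Trace Context `traceparent` header, which OpenTelemetry-style instrumentation typically propagates. The post does not say this is exactly what our SDK does, so treat the following as a sketch under that assumption; the helper names are hypothetical, and `Math.random` stands in for a proper random source.

```typescript
interface TraceContext {
  traceId: string; // 16 bytes as 32 lowercase hex characters
  spanId: string; // 8 bytes as 16 lowercase hex characters
  sampled: boolean;
}

// Assumption: Math.random keeps the sketch dependency-free;
// a real SDK would use crypto.getRandomValues instead.
function randomHex(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    const byte = Math.floor(Math.random() * 256);
    out += (byte < 16 ? "0" : "") + byte.toString(16);
  }
  return out;
}

function newTraceContext(sampled = true): TraceContext {
  return { traceId: randomHex(16), spanId: randomHex(8), sampled };
}

// Header layout: version "00", trace id, parent span id, trace flags.
function toTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// What a receiving service would do to join the same trace.
function parseTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!m) return null;
  const traceId = m[1];
  const spanId = m[2];
  const flags = m[3];
  // All-zero ids are invalid in the Trace Context format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

On the frontend, the string from `toTraceparent` would be attached as a `traceparent` request header on outgoing API calls. Each service that parses it and keeps the same `traceId` ends up in the same trace, which is what makes a Frontend -> Proxy API -> Backend -> DB view possible.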
Coming to agentic debugging: I have really stopped running queries on dashboards and started using our MCP / AI assistant to get insights. For example, when debugging errors for one of our customers, instead of looking up traces I just asked the assistant on my secondary monitor while running deployments on the other. After a while I glanced to the left and saw exactly what issue they had run into.

There are numerous such examples: when debugging our SDK, I use MCP to fetch logs and traces and figure it all out inside my terminal sessions. Speaking of which, about 80% of the SDK that produces all the telemetry for the dashboards you see above was built from my phone. I never actively stepped into an IDE to write code myself; so much of it was just using my brain to do two things
Then I realised I almost never needed to be on my Mac except to test things out. The entire flow of [ideation -> technical plan -> coding it out -> reviewing -> refactoring -> testing it all out] was possible from my phone, except the testing part. Which is just crazy: it means I am out in the wild running Codex cloud agents and reviewing code while on the treadmill at the gym. The way devs do things has changed so much in the past year, maybe the last 6 months, that the traditional way won't work a few years from now. And I am here to jump on this wagon and build/contribute to what the future of software development looks like.