If you’ve ever tried to train a machine learning model or just wondered why your computer fans start screaming when you open too many Chrome tabs, you’ve probably run into the alphabet soup of processors: CPU, GPU, and TPU. They all “process” things, but they do it in ways that are fundamentally different. Choosing the […]

A checklist is a type of job aid used to reduce failure by compensating for potential limits of human memory and attention. It helps to ensure consistency and completeness in carrying out a task. Checklists are useful for applying methodology. The Front-End Checklist is an exhaustive list of all elements you need to have/test before […]

Site Reliability Engineering (SRE) is what happens when you ask a software engineer to design an operations team. An SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software […]

netdata is a system for distributed real-time performance and health monitoring. It provides unparalleled real-time insight into everything happening on the system it runs on (including applications such as web and database servers), using modern interactive web dashboards. netdata is fast and efficient, designed to run permanently on all systems (physical & virtual servers, containers, […]
