Matt Pursley's CV

Matt Pursley, RHCE, PSM
Technical Team and Project Leader
Platform, DevOps, Site Reliability Engineer

mpursley@gmail.com

772-226-0559
Resume
Resume PDF
Resume Code
GitHub: Open-source Projects, Personal Projects

Skills

	Technical Team and Project Leader | Platform, DevOps, Site Reliability Engineer

	• Project & Team Leadership: Lead and mentor teams of 3-5 engineers, driving project deliverables, timelines, and budgets. Present project status, risks, and resource utilization reports to Senior ICs, Directors, and C-Level executives. Consistently deliver projects within budget and on schedule.
	• Cloud Infrastructure Leadership: Design, plan, and deploy product solutions on AWS, GCP, and Azure, utilizing Kubernetes for microservices. Act as Team Lead for infrastructure initiatives supporting billions of requests/transactions across millions of global Users.
	• Resource Planning, Cost Optimization, & Team Oversight: Lead migration and consolidation efforts to improve infrastructure observability and efficiency. Develop and implement resource planning strategies to ensure efficient utilization of vendor and cloud resources, minimizing waste and maximizing cost savings across multiple projects. Analyze vendor and cloud billing data to identify opportunities for migration and optimization.
	• Cross-Functional Team Leadership and Collaboration: Facilitate collaboration between Engineering, QA, Product, and Security teams to ensure applications meet SLOs/SLAs and budgetary constraints. Serve as the primary point of contact for infrastructure, observability, and production readiness reviews and related issues.
	• DevOps, Infrastructure as Code & Observability: Implement Infrastructure as Code (IaC) using Terraform and Helm. Deploy and manage monitoring, logging, and alerting systems using Grafana, Prometheus, ELK stack, and other CNCF/open-source technologies.
	• Backend Development & Collaboration: Directly manage and contribute to the development and updates of segmented parts of backend infrastructure and applications. Collaborate with Subject Matter Expert (SME) Engineers, QA, and Project Managers to ensure timely delivery. Improve backend processing efficiency through resource monitoring and optimization.
	• AI/LLM Integration and Automation: GitOps and AI agent management, review, auditing, orchestration, and monitoring (OpenAI, Anthropic Claude Code, Gemini CLI, Ollama, etc.). Integrate AI into incident management (Incident.io, FireHydrant, Grafana ML/Sift) and infrastructure/Kubernetes operations (Gemini Antigravity/CLI, HolmesGPT, Google kubectl-ai).
	• Data-Driven Decision Making & Reporting: Take ownership of establishing and tracking SLI, SLA, and SLO metrics to ensure products deliver value to internal and external customers. Utilize data analysis to inform resource planning, cost optimization strategies, and provide regular progress reports to stakeholders, including senior management.
	• Security Best Practices & Compliance: Implement and maintain security best practices for microservice architectures including least privilege access, defense-in-depth strategies, authorization, secrets management, and encryption to accommodate compliance frameworks such as SOC 2, PCI DSS, PSD2, and SCA.
	• Cloud Security & DDoS Mitigation: Leverage Cloud-based Web Security systems (e.g. Cloudflare, Akamai) for DDoS protection, service rate limiting, and WAF (Web Application Firewall) configurations. Utilize AWS IAM, security groups, and Well-Architected Framework best practices to secure cloud infrastructure.
	• Incident Management & Response Leadership: Own, develop, and maintain the Incident Management process, while also operating as First Responder, Tech Engineer, and Incident Manager, to ensure rapid issue resolution and minimize service disruption. Delegate tasks and coordinate response efforts during critical incidents.
	• Post-Mortem Analysis & Process Improvement: Lead blameless post-mortem analyses, implement action items, and foster a culture of continuous learning to reduce the recurrence of similar incidents.
	• Project Management & Executive Reporting: Manage project deliverables, timelines, and budgets. Generate and present project deliverable reports and timelines to Senior ICs, Directors and C-Level Teams, including budget and resource utilization reports. Effectively communicate project status, risks, and dependencies.
	• Mentorship & Team Development: Lead and mentor a team of engineers, helping them to be more productive and achieve Company and Personal goals. Focus on developing individual skills and fostering a collaborative team environment.
	• Talent Acquisition & Development: Lead the recruitment, interviewing, hiring, and onboarding of high-performing engineers. Improve onboarding processes and documentation to reduce team training and ramp-up time. Mentor team members to achieve company and personal goals.

Systems Platforms	Scripting & Coding	Monitoring & Alerting
• Amazon EKS, Google GKE	• Shell Script	• Grafana, Datadog, New Relic, Loki, Elasticsearch
• Ubuntu, CentOS, Fedora	• Python	• Prometheus, Alerts, Exporters
• macOS	• Golang	• CI/CD & GitOps: Terraform, GitHub Actions, ArgoCD, GitLab CI, Kubernetes
• Windows, WSL	• JavaScript, TypeScript	• Atlassian (Jira, Confluence, etc.)

Work Experience


Jan 2026 - Present	Stealth Startup
	B2B technology consulting focused on automation, scalability, and serverless architectures on major cloud providers, delivered through GitOps-based CI/CD pipelines.
	Principal Platform/DevOps/Site Reliability Engineer
	Internal Ventures & R&D
	• Cloud Platform and Game Server Ops: Develop GitOps-based automated solutions for dedicated game servers.
	• CI/CD Pipeline: Develop and support fully automated CI/CD pipelines for web, desktop, and mobile applications, which are built and deployed to public users through Google Play, Apple macOS/iOS, Steam, and Epic Games app stores.
	• AI Integration: Integrate Nvidia Omniverse digital-twin objects using Unreal Engine 5 and LLM APIs for next-gen interactive environments.

	Strategic Direct Placements (Full-Time Embedded)
Feb 2025 - Jan 2026	UltraViolet Cyber, https://uvcyber.com
	Leading tech-enabled managed security services provider that unifies offensive (red team) and defensive (blue team) cybersecurity operations into a single, comprehensive platform.
	Staff Platform/DevOps/Site Reliability Engineer
	Infrastructure Orchestration
	• Lead the design of critical infrastructure updates using Pulumi, Terraform, ArgoCD/Rollouts to drive GitOps-based continuous delivery across various AWS EKS Kubernetes clusters
	• Manage the full Change Request lifecycle including strategic presentation and approval from the Change Advisory Board (CAB).
	Observability Architecture
	• Architect and maintain a “single pane of glass” observability ecosystem using Grafana, Prometheus, and Cribl to unify data streams for enhanced operational visibility.
	• Integrate logs, metrics, and alerts from AWS CloudWatch, PagerDuty, and Cribl to ensure comprehensive system monitoring.
	• Ingest and visualize event data from SentinelOne and Torq, alongside operational metrics to correlate security incidents, events, and data drops.

Mar 2023 - Jan 2025	Zepz Inc, https://zepzpay.com
	British based global digital cross-border payments platform that enables international money transfers.
	(Sr. Site Reliability Engineer)
	Enterprise Observability Migration
	• Developed, deployed, and refined infrastructure monitoring configurations across Datadog, New Relic, and AWS CloudWatch to support massive scale and reduce alert fatigue.
	• Spearheaded the strategic consolidation of fragmented monitoring tools (Datadog, New Relic, AWS CloudWatch, etc.) into a unified Grafana Cloud ecosystem, significantly reducing operational overhead.
	Synthetic Monitoring Strategy
	• Directed the migration and enhancement of remote synthetic checks from legacy vendors (Site24x7, Pingdom) into a centralized Grafana dashboard.
	Cross-Functional Enablement
	• Partnered with DBA and DevOps leadership to modernize monitoring architectures for mission-critical databases and cloud infrastructure.
	Incident Resolution Engineering
	• Engineered and refined incident response protocols, reducing MTTR for company-wide outages and ensuring high availability (multiple nines of uptime) for business services.

	Key Consulting Engagements (Contract)
Jan 2023 - Mar 2023	Worlds Gaming
	Indie Game Dev backed by Polychain, Polygon, Circle, and Sui
	(Technical Consultant)
	• Cloud & Pixel Streaming: Architected a Proof-of-Concept (POC) for high-fidelity game streaming using Nvidia GPU instances across multiple cloud providers (AWS, CoreWeave, GeForce Now).
	• Kubernetes Orchestration: Integrated auto-scaling infrastructure for dedicated game servers using Kubernetes clusters and UnrealContainers.
	• LiveOps & Data: Optimized user data pipelines, implemented build monitoring, and managed live technical operations for game match demos at the ETHDenver Conference.

2021 - 2023	Improbable Worlds, https://www.improbable.io
	British multinational company focused on technology to support large-scale games, the metaverse, and virtual worlds/events
	Sr. LiveOps, DevOps and Site Reliability Engineer
	Infrastructure and Application Deployment and Management
	• Worked closely with the LiveOps, DevOps and Dev Teams to build, test and deploy scalable application and infrastructure Tech stacks based on Terraform, Dockerized Applications, Helm Charts, Kubernetes Clusters, etc.
	• Planned, Developed, Tested and Deployed large-scale system upgrades, with additional HA redundancy and monitoring to reduce the risk of Customer or User impact.
	• Developed and upgraded dashboards and alerts using SLOs, SLAs, KPI metrics and logs.

2019 - 2021	Sage Intacct, https://www.sageintacct.com
	British-based software company focused on financial services and management
	Sr. SRE, Site Reliability Engineer
	Infrastructure and Application Monitoring and Alerting
	• Worked closely with the SRE, Cloud Ops and DBA teams to build, test and deploy scalable application and infrastructure Tech stacks based on custom Python, Bash and Config Management Tools.
	• Completed a deep-dive review of the existing metrics collection, storage, and visualization infrastructure (ELK Stack, Ansible/Chef, Nagios/Zabbix, PagerDuty, etc.).
	• Completed a full evaluation and scoring of several modern, industry-standard alternatives against existing requirements and goals for a complete revamp or replacement.
	• Collected feedback from stakeholder teams and individuals on the scoring of viable alternatives.
	• Architected a full project plan to deploy and migrate to a newly developed metrics collection and storage solution, while carefully scaling back and retiring the legacy system.

2012 - 2019	Sony Interactive Entertainment, Playstation, https://www.playstation.com
	Playstation Now, a global video game streaming platform
	SRE, Site Reliability Engineer
	Infrastructure and Application Monitoring and Alerting
	• Worked directly with Onsite DC and “Remote Hands” Engineers to deploy thousands of new servers and network hardware to dozens of datacenters and POPs in countries around the world
	• Defined and updated KPIs, SLOs, SLIs, SLAs, metrics and alerting
	• Designed and developed solutions to collect, search, and visualize logs and events, and to fire alerts and notifications to the appropriate teams based on application errors, logs, and KPI and SLA breaches, using internal and open-source tools and technologies such as Elasticsearch, Kibana, Prometheus, Grafana, Ansible, Ceph, Opsgenie, Kubernetes, GitLab CI/CD, Fluentd, Rsyslog, SNMP, etc.
	Automation and Hands-on Operations
	• Configured and maintained Amazon Web Services (AWS) and Google Cloud Platform (GCP) cloud computing environments
	• Performed operational tasks to mitigate major (business or customer impacting) incidents, or unblock Team members, where automation is not yet in place.
	• Developed operational tooling, for “one off” updates and playbook automation
	• Improved automation for systems inventory updates and configuration management
	• Optimized and improved SDLC and CI/CD pipelines, processes, and infrastructure
	Solutions Architecture
	• Performed requirements gathering and resource planning for new projects
	• Researched and evaluated industry standard solutions
	• Evaluated and compared on-site, private, and public cloud service options and offerings, including feasibility, compatibility, security, compliance, and TCO evaluations
	• Maintained up-to-date understanding of all mission critical infrastructure, service architecture and updates
	• Documented, communicated and advocated for SRE best practices throughout the company
	Technical Lead and Project Management
	• Managed project timelines, deliverables and resource planning
	• Led architecture and design sessions for cross team projects
	• Provided cross-team architectural consulting, production readiness review and validation
	SRE Team Building
	• Proactively helped to build and scale out an effective global SRE Team
	• Reviewed, interviewed and screened potential SRE candidates
	• Trained and mentored team members
	• Developed, maintained and updated candidate screening and interview procedures and processes
	• Updated and maintained “New SRE” startup and training materials
	Incident Management Process and Reporting
	• Participated in the on-call rotation
	• Developed and maintained incident management and review processes
	• Developed and communicated RCA and issue mitigation plans
	• Refined and improved KPIs, SLOs, SLIs, SLAs, metrics and alerting, based on incidents and discovered observability gaps
	• Performed and reported RCA and Postmortem findings
	• Troubleshot and break-fixed discovered issues, or escalated them to the relevant teams or engineers

2010 - 2012	Digital Domain, https://digitaldomain.com - Vancouver, BC and Port St Lucie, FL
	Sr. Systems Admin and Engineer
	On-screen Credits: The Legend of Tembo, Jack the Giant Slayer, Transformers 3, Tron Legacy, Thor
	https://www.imdb.com/name/nm1250137/
	Systems and Infrastructure:
	• Duplicated, set up, and integrated Linux environments for a new 200-seat and then a new 500-seat VFX studio, which included 200+ HP workstations, 1,000+ HP high-density blade servers, 100+ TB of Isilon or NetApp enterprise-class storage, and a high-performance Brocade switching environment.
	• Set up, configured, and maintained OS and software installation and configuration management systems (Red Hat Kickstart, Onesis, Puppet, CFEngine, etc.).
	• Worked with sister companies in the US and Canada to integrate the VFX pipeline and software synchronization, including CentOS Linux operating system updates and changes, site-specific software package installations and deployments, etc.
	• Worked with Linux Kickstart, Onesis, and Puppet to set up fully automated bare-metal installs of CentOS Linux, including custom packages, connections to shared storage, and the custom CG pipeline and toolset.
	• Developed scripts and procedures to bind CentOS Linux and Mac OS X workstations and servers to Windows Server 2008 via LDAP with Kerberos authentication.
	• Acted as Lead Support for all Render Queueing and Job Management, including automation and scripting.
	• Handled large scale file system sorting, cleanup, transfers, and digital delivery packaging.
	• Configured Symantec NetBackup to run daily, weekly, and monthly backups, as well as final show archiving, removals, and restorations.
	• Used VMware ESXi Server to deploy, maintain, and balance several key server VMs.
	• Set up and maintained monitoring and alerting systems for all storage, networking, servers, and workstations for the studios.
	• Acted as Level 2 and 3 technical support for all Linux and Unix based issues with all Workstations and Servers.
	• Provided detailed documentation and training for Level 1 and Level 2 Technical Support to handle commonly occurring issues.

2007 - 2009	Keystone Pictures
	Visual Effects, Lead Technical Director, Technical Supervisor
	On-screen Credits: The “Buddies” Series (Space Buddies, Santa Buddies, Adventure Buddies, etc.)
	https://www.imdb.com/name/nm1250137
	Systems and Infrastructure:
	• Worked with several hardware and software vendors to install and configure 100 SGI Linux 1U render nodes, 25 Mac Pro workstations, and a 40 TB SGI RAID storage server, connected through a new HP ProCurve Gigabit network.
	• Developed a clone-able dual-boot MacOSX and Fedora Core Linux system install for the studio’s 25 MacPro Workstations.
	• Developed a clone-able Fedora Core Linux based system install for the Studio’s 100 Render nodes, using Render Management through PipelineFX’s Qube.
	• Managed and Supported the Studio’s Render-farm with 100 Linux 1U RenderNodes and 25 MacPro workstations.
	Render and Color Pipeline:
	• Worked with the CG Supervisor to help develop an AOV-based render workflow for workstations and the render farm using Mental Ray 3.6.
	• Developed a LUT in Shake 4.1 to translate from the Panavision Genesis camera's 10-bit log format to linear, and back.
	Character Lighting and Fluid FX:
	• “Finaled” the Lighting and Rendering of 65 animated face replacement shots.
	• “Finaled” all in-house Fluid FX using Maya 2009 and Houdini Master 10. Including dust, smoke, clouds, rocket thrusters, etc.

2000 - 2007	American Museum of Natural History, https://www.amnh.org
	Rose Center for Earth and Space and Science Bulletins Departments
	Technical Director/Unix Systems Administrator
	Systems and Infrastructure:
	• Began working with Engineering and Productions, a group of about 15 visual FX artists, system administrators, video engineers, and production staff responsible for developing, maintaining, and upgrading all computers, video systems, and video content for the Digital Dome and Space Shows. This included two SGI Onyx2 Reality Monster supercomputers, several SGI Octane and O2 workstations, Linux and Windows graphics workstations, and 7xHDTV and 4xHDTV projector theaters.
	• Worked with Systems Admins and Video Engineers to Design, Create and Test a 7 Node Linux Graphics Cluster for Interactive 3D and Digital Dailies playback in the Hayden Planetarium in preparation for the upcoming full dome show. This system was based on non-proprietary, commodity-based hardware (Dual AMD64, Nvidia Quadro FX 4400, etc) and software (Linux, PiranhaHD).
	Full Dome Visual FX and Animation:
	• Worked with the Art Director to design, model, and animate “Feather Dream”, which comprises 2 minutes of the 36-minute planetarium music show entitled “Sonic Vision”. Also created several background elements and transitions between other shots within the show. “Feather Dream” was created using Maya 6 and Shake 3.5.
	• Created two 2.5-minute quarterly news animation sequences for the Science Bulletins Department at AMNH using Partiview, Uniview, Maya 7, Shake 3.5, and PiranhaHD. The sequences were recorded to HDCAM and then encoded to HDTV MPEG-2 for playback to visitors within AMNH and for distribution via the Internet to a network of museums and educational institutions around the world.

1999 - 2000	New York Institute of Technology, https://www.nyit.edu
	Advanced Computer Graphics Department
	SGI/Unix Systems Admin
	Systems and Infrastructure:
	• Maintained and supported graphics software and hardware for the Computer Graphics Labs in Manhattan, NY, including Silicon Graphics (Unix) workstations, Avid Video Editor, Softimage3D, Alias Wavefront, etc.

Personal Projects and Research


Oct 2022 - Present	OpenWorldGame.io, https://github.com/OpenWorldGame-Io
	Co-Founder, Lead Game Dev/Contributor
	• “Open World” is an open “sandbox” project that leverages Epic Games’ free, source-available Unreal Engine 5.x and its Lyra starter project/game, along with a customized code and content delivery backend, to provide a free-to-play open space environment that players/users can use to chat, communicate, and show and share ideas.
	• The project also works to incorporate infrastructure, metrics, and other types of data visualization, e.g. https://github.com/mpursley/UnrealEngine-Example_BluePrints_Using_Rest_APIs.
	• It includes modern updates and customizations to scale the active/concurrent user/player count beyond what a single dedicated backend game server supports. More information on this project will be released as prototype, alpha, and beta releases are developed and deployed.

Education


2018	• PSM (Professional Scrum Master), Scrum.org
2005	• RHCE (Red Hat Certified Engineer), Red Hat, Inc.
1996 - 1998	• Digital Arts/3D Animation, The Art Institute of Vancouver