All Systems Operational

Mechanic: Operational
99.95% uptime over the past 90 days
System metrics:
Round trip time (RTT)
Run boot time
Stores receiving delayed Shopify events
Delayed Shopify events
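
These metric values are rendered live on the status page itself. If the page is hosted on Atlassian Statuspage (an assumption based on its layout, not something this page confirms), the overall status is also readable via the standard public endpoint; a minimal sketch in Python:

```python
# Minimal sketch: read a Statuspage-style overall status programmatically.
# The /api/v2/status.json path is the standard Atlassian Statuspage endpoint;
# whether status.mechanic.dev exposes it is an assumption, not confirmed here.
import json
import urllib.request

STATUS_URL = "https://status.mechanic.dev/api/v2/status.json"

def fetch_overall_status(url: str = STATUS_URL) -> str:
    """Return the page's overall status description, e.g. 'All Systems Operational'."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return payload["status"]["description"]

if __name__ == "__main__":
    print(fetch_overall_status())
```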
Past Incidents
Nov 7, 2024

No incidents reported today.

Nov 6, 2024
Resolved - Mail has been unpaused! All mail is being processed now. We are working closely with the mail provider to ensure we can avoid this in the future. We are also working on a backup mail provider right now. Truly sorry for the inconvenience. Thank you for your patience and support ❤️❤️
Nov 6, 19:44 UTC
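
(For the curious: here is roughly what a primary/backup send path could look like. This is an illustrative sketch only; `send_via_postmark` and `send_via_backup` are made-up placeholders, not Mechanic's actual mail code or Postmark's real API.)

```python
# Hypothetical sketch of primary/backup email delivery with failover.
# send_via_postmark / send_via_backup are illustrative placeholders only.
from typing import Callable

class DeliveryError(Exception):
    """Raised when a provider cannot accept the message."""

def send_with_failover(message: dict,
                       providers: list[Callable[[dict], None]]) -> str:
    """Try each provider in order; return the name of the one that succeeded."""
    last_error = None
    for provider in providers:
        try:
            provider(message)
            return provider.__name__
        except DeliveryError as err:
            last_error = err  # provider paused/unavailable; try the next one
    raise DeliveryError(f"all providers failed: {last_error}")

def send_via_postmark(message: dict) -> None:
    raise DeliveryError("deliveries paused")  # simulate a paused account

def send_via_backup(message: dict) -> None:
    print(f"backup provider accepted message to {message['to']}")

if __name__ == "__main__":
    winner = send_with_failover({"to": "merchant@example.com", "subject": "hi"},
                                [send_via_postmark, send_via_backup])
    print("delivered via", winner)
```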
Update - We are working with the vendor to unpause the email, which we expect to happen shortly. We are also investigating adding a second email vendor to prevent this issue from recurring. Thank you so much for your support, and we apologize for the inconvenience.
Nov 6, 17:40 UTC
Identified - Postmark (Mechanic's email provider) has informed us that deliveries have been paused. We're on it! We expect this to be fixed very soon.

⭐️ All emails are enqueued on Postmark's end, and will be delivered when Postmark hits the resume button on their end.

👁️ Postmark's responding to an abusive user, and the platform itself is temporarily caught in the cross-fire. We're designing away the *possibility* of this kind of abuse, so that it can't happen again. :) Postmark's responsiveness here is an important part of what lets them keep email deliverability reliable, and I'm glad to be working with them on this. They're an excellent platform partner.

I'm sorry for the trouble. We're on it. 🤲

If you've got questions, we're here! team@usemechanic.com and https://slack.mechanic.dev/

Nov 6, 15:51 UTC
Nov 5, 2024
Resolved - Our email provider has resumed email sending. All emails will be processed in the order they were received. Thank you for your patience.
Nov 5, 16:46 UTC
Identified - We just received a heads up from Postmark (Mechanic's email provider) that deliveries have been paused. We're on it!

⭐️ All emails are enqueued on Postmark's end, and will be delivered when Postmark hits the resume button on their end.

👁️ Attentive observers will note that this is the second time this has happened in a single two-day span. Postmark's responding to an abusive user, and the platform itself is temporarily caught in the cross-fire. We're designing away the *possibility* of this kind of abuse, so that it can't happen again. :) Postmark's responsiveness here is an important part of what lets them keep email deliverability reliable, and I'm glad to be working with them on this. They're an excellent platform partner.

I'm sorry for the trouble. We're on it. 🤲

If you've got questions, we're here! team@usemechanic.com and https://slack.mechanic.dev/

All the best,

=Isaac

Nov 5, 16:36 UTC
Nov 4, 2024
Resolved - Our email provider has resumed email sending. All emails will be processed in the order they were received. Thank you so much for your patience.
Nov 4, 18:12 UTC
Identified - We just received a heads up from Postmark (Mechanic's email provider) that deliveries have been paused. We're on it!

⭐️ All emails are enqueued on Postmark's end, and will be delivered when Postmark hits the resume button on their end.

Nov 4, 14:48 UTC
Nov 3, 2024

No incidents reported.

Nov 2, 2024

No incidents reported.

Nov 1, 2024

No incidents reported.

Oct 31, 2024

No incidents reported.

Oct 30, 2024
Resolved - This incident has been resolved. :) Thanks for tuning in. 🌞 🌱 🐉
Oct 30, 20:15 UTC
Update - This incident has been resolved. :) Thanks for tuning in. 🌞 🌱 🐉
Oct 30, 19:32 UTC
Monitoring - Performance has been back to normal for an hour. I'm moving this incident to the "monitoring" state, and monitor it we shall. <3
Oct 30, 18:36 UTC
Update - We're seeing periodic returns to normalcy, though it's not time to call this issue closed yet.

... because, y'all, this is an incredibly specific networking issue. 😂 Amazing. Both the Lightward/Mechanic team and the Fly.io team are actively riffing on this problem, testing different solutions.

Fly.io says:

> We're indeed seeing some flapping on the upstream link into AWS for the route to the particular subnet your Redis Cloud instance is on, which is something happening within AWS's own network we don't have much insight or control over. We're trying some things on our end to see if we can isolate and work around the flaky route, but if that doesn't work our (more time-consuming) last resort would be to followup further upstream with either Equinix (who provides the link into AWS) or AWS directly.

We'll get through this, and I'm gonna see if we can't get Mechanic running even faster than before this happened. ;)

=Isaac

Oct 30, 17:48 UTC
Update - Our service provider has confirmed the incident. Because y'all are often a technical crowd, here's what they said:

> We tracked the slower route to us-east-1 (~5-7 ms) down to one particular route into our upstream network that about half of our hosts in iad use - the other half is configured with a different upstream that's significantly faster (~0.5-0.7ms).

:) Mechanic is an efficient system. Those extra milliseconds count.

Fly is getting this resolved with their upstream networking provider. In the meantime, this information opens the door for a new mitigation strategy, and we're on it. 🤩

Oct 30, 00:08 UTC
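
(For the curious, that kind of millisecond-level difference is straightforward to observe with a simple round-trip probe. A rough sketch, assuming the `redis` Python client and a placeholder connection URL rather than Mechanic's actual Redis Cloud instance:)

```python
# Rough sketch: time Redis PING round trips to spot a slow network path.
# Requires the `redis` package; REDIS_URL is a placeholder, not Mechanic's
# actual Redis Cloud instance.
import statistics
import time

import redis

REDIS_URL = "redis://localhost:6379/0"

def ping_latencies(url: str = REDIS_URL, samples: int = 50) -> list[float]:
    """Return per-PING round-trip times in milliseconds."""
    client = redis.Redis.from_url(url, socket_timeout=5)
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        client.ping()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

if __name__ == "__main__":
    ms = ping_latencies()
    print(f"median={statistics.median(ms):.2f}ms max={max(ms):.2f}ms")
```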
Update - State of play: our query time for external services is up across multiple vendors, and none of the vendors reflect intrinsic issues on their end. It really looks like an infra thing somewhere in there. We’re actively comparing notes with Fly.io (our primary compute provider). I think (based on recorded facts) and feel (based on vibes) that the issue is in their realm, whether or not it’s “theirs” per se.

We’re digging. It’s gonna emerge or disappear, one of the two. These things always do. :)

=Isaac

Oct 29, 15:16 UTC
Update - Narrowing in on it, tightening the scope and optimizing as we go. Actively working on it. :) I anticipate getting this resolved tomorrow (October 28).
Oct 28, 02:55 UTC
Update - Still working on this. The adjustment we made helped, but not as much as I expected it to. We're getting in touch with our infrastructure provider - I really want to get that RTT number back under 2 seconds.

If you've got questions, I'll be in Slack! https://slack.mechanic.dev/

=Isaac

Oct 26, 20:21 UTC
Identified - Isaac here! We're aware of an issue causing Mechanic runs to be performed a *hair* more slowly than before. Typically, platform RTT (time between an event arriving and a resulting action being dispatched, assuming a ~instantly-performing task run) hovers around 2 seconds. Since October 24 ~8am UTC, platform RTT has been about 10 seconds. (This data is all published in real time at https://status.mechanic.dev/.)

We've identified the component responsible for the slowdown, and are in the process of adjusting our infrastructure accordingly.

I'm really sorry y'all. We're on it!

Oct 25, 22:14 UTC
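
(As a reference for how the RTT figure in this incident is defined: it's the gap between an event arriving and the resulting action being dispatched, assuming a near-instant task run. A minimal illustration of that arithmetic; the timestamps and field names below are made up for the example, not Mechanic's schema:)

```python
# Illustrative sketch: compute platform RTT from event/action timestamps.
# The example values are made up; only the arithmetic is the point.
from datetime import datetime, timezone

def platform_rtt_seconds(event_received_at: datetime,
                         action_dispatched_at: datetime) -> float:
    """RTT = time from event arrival to dispatch of the resulting action."""
    return (action_dispatched_at - event_received_at).total_seconds()

if __name__ == "__main__":
    received = datetime(2024, 10, 24, 8, 0, 0, tzinfo=timezone.utc)
    dispatched = datetime(2024, 10, 24, 8, 0, 10, tzinfo=timezone.utc)
    # ~10s here, versus the ~2s baseline described in the incident.
    print(f"platform RTT: {platform_rtt_seconds(received, dispatched):.1f}s")
```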
Oct 29, 2024

No new incidents opened; updates to the RTT incident above were posted on this day.

Oct 28, 2024

No new incidents opened; updates to the RTT incident above were posted on this day.

Oct 27, 2024

No incidents reported.

Oct 26, 2024

No new incidents opened; updates to the RTT incident above were posted on this day.

Oct 25, 2024

No new incidents opened; the RTT incident above was first identified on this day.

Oct 24, 2024

No incidents reported.