Stages Power Meter In-Depth Review Update

It’s been a touch over four months since I first published the Stages Power Meter In-Depth Review.  That review was interesting to me in that a lot of people took very different things away from it.

As a result of that review, Stages has made a number of updates to their power meter firmware, including addressing specific items that were raised as concerns during the review.  Based on that, I continued to ride with it.  Every single ride for months.  They provided iterations of new firmware updates, and I updated.  Rinse and repeat.

Except, it wasn’t just riding with a single power meter.  No, it was riding with 3-4 power meters concurrently.  And 4-7 head units concurrently.  One of the Slowtwitch editors recently noted something along the lines of ‘The fun factor of these rides was approximately zero’.  Which is pretty true here as well.  Aside from it being a cold and rainy winter, there’s far more complexity in ensuring that every setting and start/stop time is exactly the same when you have so many head units and power meters running concurrently.

I’m reasonably confident that outside of Stages themselves, I probably have the largest and most complete data set of a single rider against as many additional power meters as one can technically attach to their bike.  I do note ‘single rider’ because again – this is just me.  It’s not as though they gave me 10 crank arms to test with and assign to random people.  And quite frankly, I wouldn’t want that.  That’d be a nightmare.  And it’d be useless without the same painstakingly strict test protocols that I go through.  Protocols that no sane person wants to deal with every.single.ride.

If you’re just finding this page without having read the original review, I encourage you to start there to get a grasp on how the Stages Power Meter works, see the unboxing shots, and get all the usual background information.

A look at the testing methodology:

If there’s anything I’ve learned (or can note to others), it’s just how difficult it is to accurately test power meters.  Going out for a ride with two power meters isn’t a test of a power meter.  It doesn’t tell you who is right or wrong.  It just gives you two power plots.  It can show you potential abnormalities, but not absolutes.  It cannot be used to perform a full comparison review.  You must have a 3rd unit to provide perspective.  Speaking of that 2nd (or 3rd) unit, the assumption that the Quarq/PowerTap/SRM/Power2Max is always correct is fundamentally flawed.  How do you know?  What calibration procedures have you done?  And have you done them correctly?  Even then, as I’ll show you below, it’s easy to make some of those units go askew in certain conditions.  Knowing those conditions is critical.
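To make that concrete, here’s a tiny sketch (in Python, with made-up numbers) of the perspective a third unit buys you: with two streams you only see a gap, while with three you can at least flag which unit is the odd one out.

```python
# Made-up one-second snapshot from three meters.  With only two values you
# can't assign blame for a gap; with three you can flag the odd one out.
readings = {"A": 250, "B": 252, "C": 238}

median = sorted(readings.values())[1]
outlier = max(readings, key=lambda name: abs(readings[name] - median))
print(outlier, readings[outlier] - median)  # C -12 (C reads 12w low)
```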

The same goes for data collection.  Each head unit records data differently, and finding ones that record data the same way is critical to testing.  One of the tools added to my bag for these tests was the WASP unit.  The WASP allows me to collect power meter data from an unlimited number of ANT+ power meters (or other ANT+ accessories) concurrently.

Further, not only does it collect that data concurrently with a timecode, it also collects at a higher rate than a typical Garmin (or other head unit).  Normally a Garmin will pick one of the 1-8 broadcasts per second and record that, whereas the WASP will collect every sample each second and record the average of those.
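As a rough sketch of that difference (the timings and wattages below are made up, and this isn’t the WASP’s actual code), here’s picking one broadcast per second versus averaging them all:

```python
from collections import defaultdict

# Made-up ANT+ broadcasts: (seconds, watts), arriving 4 times per second.
samples = [(0.00, 251), (0.25, 249), (0.50, 262), (0.75, 247),
           (1.00, 255), (1.25, 258), (1.50, 243), (1.75, 251)]

per_second = defaultdict(list)
for t, watts in samples:
    per_second[int(t)].append(watts)

# Garmin-style: record whichever single broadcast got picked that second.
picked = {sec: vals[0] for sec, vals in per_second.items()}
# WASP-style: record the average of every broadcast within that second.
averaged = {sec: sum(vals) / len(vals) for sec, vals in per_second.items()}

print(picked)    # {0: 251, 1: 255}
print(averaged)  # {0: 252.25, 1: 251.75}
```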

You can see a screenshot of what this data looks like below:

Which isn’t to say I just used the WASP.  Nope, almost all of my rides had between 4 and 7 head units recording concurrently.

This meant that for every single ride I followed a set procedure for collecting the data, which included:

1) Validation that each power meter was paired to the correctly labeled head unit (validation of ANT+ ID against known PM ANT+ ID; see the sketch after this list)
2) Validation that each power meter did a manual calibration prior to the start of the ride
3) Validation that each head unit was recording at the same settings (1s recording, cadence and power zeros included)
4) Validation that all were using an external speed sensor for indoor rides, and that all circumferences were set identically
5) Starting all head units at exactly the same time (creative use of fingers)
6) After the start of the ride, validate that all sensors were correctly transmitting
7) At approximately 10-15 minutes into the ride, stop by the side of the road and manually calibrate all units
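To make step 1 concrete, here’s a minimal sketch of that pairing validation (the ANT+ IDs below are invented for illustration):

```python
# Invented ANT+ IDs: each head unit must be paired to the power meter
# whose label it carries, or that ride's files can't be trusted.
known_ids = {"Stages": 12345, "Quarq": 23456, "PowerTap": 34567}
paired    = {"Stages": 12345, "Quarq": 23456, "PowerTap": 34512}

for label, expected in known_ids.items():
    actual = paired[label]
    if actual != expected:
        print(f"{label}: paired to {actual}, expected {expected} -> fix it")
    else:
        print(f"{label}: OK")
```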

The calibration procedure involved stepping off the saddle while staying over the top tube, putting the cranks in the 12/6 position, and then manually calibrating each unit.

Post ride, all of the data would be collected into a single folder and then labeled by power meter and head unit.

While this sounds somewhat simple, doing all seven steps 4-7 times (for each head unit/power meter combination) really adds up.

And that’s all before I even start analyzing the data.  Which usually takes hours per ride.  There is no application out there today that can cleanly generate all the charts and data plots you see in this review.  That’s all done with Excel, painstakingly.  A simple 90 minute ride has over 20,000 power meter data points alone to correlate and analyze.
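For the curious, the core of that Excel work amounts to something like this pandas sketch (the file and column names are hypothetical, not my actual export format):

```python
import pandas as pd

# Hypothetical per-second exports from two head units, indexed by time.
quarq  = pd.read_csv("quarq.csv",  index_col="time")["watts"]
stages = pd.read_csv("stages.csv", index_col="time")["watts"]

# Align the two streams on their shared seconds, then take the difference.
both = pd.concat({"quarq": quarq, "stages": stages}, axis=1).dropna()
both["diff"] = both["quarq"] - both["stages"]

# The difference charts in this review use a 10-second smoothing like this.
print(both["diff"].rolling(10, min_periods=1).mean().describe())
```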

Ultimately though, I have a lot of good data to work with.  Clean data, more correctly.  There were certainly (many) rides where things went wrong somewhere in steps 1-7, meaning that ride got tossed out.  It could be something as simple as a battery dying, a unit getting inadvertently stopped without me realizing it, or some form of ANT+ interference.  Any of those meant the ride was excluded from this review.

All data shown in this review is prior to the firmware update from approximately two weeks ago.  All raw data for this review is available at the end of the review for anyone to download and analyze should they wish.

Some random thoughts before we get started:

Before we dive into the analysis, I want to cover some ground on a few topics briefly.  Mostly as a way to ‘catch-up’ folks on various areas of note relevant to this review.

On my pedaling: It’s been funny how some attempted to identify issues with my riding style in the original review that somehow impacted the tests.  Some said I was left-leg heavy (thus impacting things).  Some said I was right-leg heavy (more issues).  Some said I stopped and started my bicycle the wrong way.  Or pedaled the wrong way.

I say this in the nicest possible way: None of that matters.  Really, it doesn’t.  It’s trying to find fault where fault doesn’t lie.  Either the product works with a random cyclist (me), or it doesn’t.  Whether it works with a different random cyclist (you) is certainly debatable.  I lack the concentration to somehow pedal a certain way for hours on end.  Perhaps a professional tour rider does, but for me, I’m just gonna keep on pedaling the same way I have since I had training wheels on.  Which, based on what I can tell, is probably the same way you pedal.  And at the end of the day, it’s all about whether the unit works across the board – yes or no.

On studies of how people pedal: There have certainly been some interesting studies on how people pedal.  I’ve looked at a LOT of studies on this topic.  But there are some key issues that folks like to talk around.  First is that most of these studies are 20-30 years old.  That doesn’t mean they aren’t useful.  But it does call into question the accuracy of the data collection methods on left/right power meters.  Keep in mind that it was only last year that we finally got a left/right power meter that works outdoors.  Most of the studies are indoor-based, and it’s well proven that power meters act differently indoors than outdoors.  Even the more recent ones are very small in their data sets – literally in some cases just a few rides.

Again, I’m not saying to ignore those studies.  But I am saying to take them with a boulder-sized grain of salt.

On ‘second/update’ reviews: This is the only time I’ve ever completed a ‘second review’ on a product.  Historically when a company prematurely releases a product, they have to live with the reviews published to the internet based on that premature release.  Ask Motorola how that worked out for the Motoactv.  Or Garmin.  I often go back and make minor changes or updates based on new features or changed functionality, but not wholesale new reviews.  Power meter reviews are actually the most complex reviews I publish.  They are incredibly tough to get ‘right’, and a lot of data collection and analysis goes into them.  Thus, when I publish a second review for a product, it means at least another 2-3 products in The Queue get pushed out further and delayed.  That’s the only way it works in a time-constrained system.

On data collection: One aspect some wondered about was whether the Edge 510/810 units used in some of the original tests impacted the end results, as those units had an issue that resulted in some power drops.  Out of curiosity, I looked more closely at this and went and actually ‘nulled’ those drops (they were very predictable timeline-wise in that particular firmware version).  However, that still didn’t resolve the core issues brought up in the review around variability.  Nulling out the Edge issues only moved things about one half of one percent in most cases (on average a drop occurred once every 2 minutes).  So while it did have an impact, it was sorta like dumping a glass of water into a flooded house.
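As a rough sketch of what that ‘nulling’ looked like conceptually (made-up data, not my actual processing script): blank the predictable drop points and fill them from the neighboring seconds.

```python
import pandas as pd

# Made-up data with a 2-second recording drop mid-effort.  This only works
# because the Edge drops were predictable in time; a real 0w while
# coasting must be left alone.
watts = pd.Series([251.0, 248.0, 0.0, 0.0, 253.0, 249.0])

cleaned = watts.replace(0.0, float("nan")).interpolate()
print(cleaned.round(1).tolist())  # [251.0, 248.0, 249.7, 251.3, 253.0, 249.0]
```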

On differences between power meters: I see a lot of talk about the holy grail of never switching between power meters because it means your data will be offset.  That’s true.  There’s a fundamental difference in power measurement location between using a crank based power meter and one on a trainer or wheel.  No doubt.  But I’d argue that in the scope of power meter technology today – it doesn’t matter.  I’d argue that most folks don’t calibrate, and even those that do wouldn’t necessarily know when the data is right or wrong, or when a power mis-calibration has occurred.  ‘In power meter we trust’.  Can you, out on the open road, tell the difference between 5w higher or lower for 4 seconds?  How about 10w on an hour long climb?  And if you can (which you might), can you tell me where and when that variation started to occur?  And can you do it over the course of multiple years and ensure that every single ride was calibrated perfectly?  As you’ll see below – that’s the real question, and not just for the Stages, but for any power meter.

The Tests and Results: Indoor Rides

Let’s dig into a handful of rides.  These are rides where all data recording aspects went as planned, thus enabling us to really dig into the data.  As with the previous review, any obvious ANT+ transmission errors (i.e. interference) were nulled so as not to impact any specific power meter.  This is not the same as spikes or drops however; in the event of those, they were and are specifically called out.  ANT+ interference errors are easily seen because they tend to affect all data channels (i.e. heart rate included).

These two rides were done indoors on trainers that have the capacity to both generate resistance and measure power.  That’s key because it gives us even more data points to work with in some cases (what the resistance ‘should’ be).

90 Minute Indoor Trainer Ride:

This indoor ride was completed on the CompuTrainer (CT), with three additional power meters: the PowerTap, Stages, and Quarq.  Per the calibration procedure, each was manually calibrated (or given a roll-down in the case of the CT) prior to the start of the ride.  Then at the 20 minute marker, all were manually calibrated again.  The workout itself was as follows:

A) 10-Minute warm-up
B) Some high-cadence work for 10 minutes
C) (Then Calibration)
D) Building for 15 minutes
E) Then 3 minutes easy
F) 3 x (10 minute intervals with 2 minutes easy in between)
G) 4 x short 30s sprints
H) 5 minute cooldown

With that in mind, let’s look at the overall stacked graph below.  This means that the numbers are simply stacked on top of each other.  It doesn’t mean that the Quarq is measuring higher.  I did this simply because it makes this graph easier to read.

As you can see, the numbers ‘tracked’ quite closely across all units.  But as I discussed in the first review, creating a comparative graph isn’t as useful because it tends to ‘skip’ over details, such as the exact variability between units.

Next let’s look at the difference between the power meters in watts.  This is somewhat complex to display on a single chart for all units at once, so it’s in multiple charts instead.  The titles specify which power meters are being compared.  The vertical axis shows wattage, and the vast majority of the ride my average wattage is between 230 and 280w (to give context on percentage).

Note that all charts are sized with a min/max vertical axis of –80w and +80w.

Next is against the CompuTrainer itself.  Note that the CompuTrainer has a specified warm-up period of generally between 10 and 20 minutes.  That’s why you see the divergence there for those first 20 minutes.  It’s not the Stages causing that divergence.  Once I completed the secondary calibration on the CompuTrainer, it snapped right into place.

For completeness, here’s the Quarq vs PowerTap numbers.  As you can see, any two power meters will differ.

So what do you see above?  Well, in all the charts the vast majority of the time you see the difference being less than 20w.  You’ll see some spiking towards the end, but that’s in the 500w+ intervals that I was doing, and thus the difference is likely due to lag more than anything else.  But remember, we’re not looking at any difference itself as being bad, but rather the variation of the difference.  Each power meter measures power in different places.  As a result, the PowerTap will generally show less wattage than the Quarq, for example.  So we’re looking to have more of a steady line – wherever that may be (high or low).

In looking closely, you see that in general the variation was lowest when comparing the PowerTap to the Quarq, and the Stages to the CompuTrainer.

But if we step back and look at this graph from the standpoint of a coach, focusing in particular on the three main interval sets – it’s clear that you can easily discern what the athlete is doing, and their output level.

If I look at just the first 10-minute interval for example, here are the averages:

Quarq: 287w
Stages: 278w
PowerTap: 288w
CompuTrainer: 270w

And the second interval:

Quarq: 282w
Stages: 274w
PowerTap: 282w
CompuTrainer: 266w

And the third interval:

Quarq: 281w
Stages: 269w
PowerTap: 280w
CompuTrainer: 266w

As you can see, any coach could easily use any of those numbers to give an athlete perspective and feedback on how this ride went.  In my case, all three intervals were set to essentially the same values at the start, with a slight fade of 10w over the course of the interval (where I backed off the wattage to keep within a HR zone).

What about one of those sprints at the end?  Well, here’s what one of those looks like (averages including the build/fade):

Quarq: 487w
Stages: 458w
PowerTap: 457w
CompuTrainer: 441w

As you can see, there’s a bit more variation, but not much.  But which one is right?  That’s the tough part.  How do you quantify exactly which one is correct?  The Stages and PowerTap were only 1w apart.

Now let’s look at total ride averages.  As noted once before – that’s the absolute easiest bar to meet.  I can put up a $99 PowerCal strap and get pretty close to spot-on averages (within a couple watts).  But nonetheless, here they are:

We see that the Quarq is the highest, which is logical – it’s measuring power closest to my legs.  And the PowerTap and CompuTrainer are lowest, also logical given their place later in the equation (due to drivetrain loss).  We see the Stages sits below the Quarq, and in this case slightly below the PowerTap as well.  For reference, the difference between the Quarq and the Stages is 4%, whereas the Stages and the PowerTap is 1.5%.  And the Stages and the CompuTrainer is less than 1%.
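For reference, those percentages are simple relative differences.  Using the first interval’s averages from earlier as a worked example:

```python
def pct_diff(a, b):
    """Relative difference of b against a, in percent."""
    return abs(a - b) / a * 100

# First 10-minute interval from above: Quarq 287w vs Stages 278w.
print(round(pct_diff(287, 278), 1))  # 3.1 (%)
```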

80 Minute Indoor Trainer Ride:

Ok, next up, another indoor trainer ride.  The structure was fairly similar to the first one:

A) 10-Minute warm-up
B) (Then Calibration)
C) Some high-cadence work for 10 minutes
D) Building for 15 minutes
E) Then 3 minutes easy
F) 3 x (8 minute intervals with 2 minutes easy in between)
G) 4 x short 30s sprints
H) 5 minute cooldown

With that in mind, let’s look at the overall stacked graph below.  Again remember that the stacked graph simply shows all of them on top of each other, thus there will naturally be gaps.  It’s used to easily see the differences.

So let’s dive into those differences.  Like above, I’ve done ‘difference’ charts pitting the Stages up against each one.  Here’s the Quarq vs Stages – difference in watts.  In order to keep them in line with the earlier charts, the scale was kept at +/-80w.  In the below example it bumped just a touch higher in those intervals, at 94w.

So before we move onto the others, you’ll see that in general it’s within 20w the entire time.  Again remember that delays in transmission and recording can cause some of the variability.  The spikes you see at the end are due to the quick sprints I was doing.  Because these were only 20-second sprints at a high intensity (500w+), that delay can easily produce differences like the ones you see.

Here’s it plotted against the KICKR (via ANT+):

And then here’s the Quarq and KICKR plotted.  Remember all these graphs are smoothed at 10s (the underlying data is).

You’re probably looking at the above and seeing a lot of variability with the KICKR.  And that’s true.  Remember that the KICKR measures power based on changes to speed.  It’s doing it differently than a design based purely on strain gauges.  What you see above is that during the portions of the workout where I’m shifting speed/cadence/power significantly (the high cadence portions & the sprints), we see variability due to data lag.  But in the main sets we see the values very close (less than 10 watts).
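As a toy illustration of what ‘power from changes to speed’ means physically (this is not Wahoo’s firmware, and the flywheel inertia below is invented): the power flowing into a spinning flywheel is P = I·ω·(dω/dt), which is why rapid speed shifts produce the lag and wobble noted above.

```python
import math

# Toy model of the inertial component of speed-based power measurement.
# Real trainers also account for the brake/drag load; this is just the
# flywheel term, with a made-up inertia value.
I = 0.10                  # flywheel inertia, kg*m^2 (invented)
rpm = [3000, 3030, 3060]  # flywheel speed, sampled once per second

omega = [r * 2 * math.pi / 60 for r in rpm]  # convert to rad/s
for t in range(1, len(omega)):
    dw_dt = omega[t] - omega[t - 1]          # rad/s per second
    power = I * omega[t] * dw_dt             # P = I * w * dw/dt
    print(f"t={t}s  inertial power ~ {power:.0f}w")
```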

Here’s the average/max/NP for the ride:

As we can see, the average and NP numbers were very close.  The max watts on the KICKR was a bit lower, but that makes sense because it wouldn’t likely have felt a 1s spike during a sprint as high as the Quarq or Stages.  And at 753w, the difference between the Quarq and Stages is exactly 2.5%.  Well within the published margin of error for either unit.

The Tests and Results: Outdoor Rides

Now we get to the fun stuff – outdoors!  While I have lots of rides in Paris, I’m actually using two particular rides below for a reason.  First is that I have the WASP data, which makes it easier and cleaner to visualize.  But second is that unlike my Paris rides, which are full of stops due to traffic/etc, these are more or less nonstop.  That makes it easier to both visualize the data and spot any differences.  With the stops and starts of traffic, it can become very difficult to separate out drops/spikes from simply stopping and starting rapidly.

Las Vegas Desert Ride:

This was a ride I did while in Las Vegas in mid-April.  First up is the stacked graph.  Now, this can be really busy looking – because it’s far more variable outside than inside.  The route itself is more or less never-ending rollers.  So I’m constantly shifting power according to terrain.  Note, you can click on any of these to expand a bit.

So, let’s smooth things out a bit with a 10-second average:

Again, remember these are stacked, and thus not the actual difference between the units – but rather the relative differences in how they track.

Now let’s look at the differences between each one.  As with before, these are all smoothed at 10s.

Now for the Stages vs PowerTap:

And finally, Quarq vs PowerTap:

Now, the challenge here continues to be the variance in outdoor data when comparing rides side by side.  So I applied a 1-minute (60-second) smoothing to it:

So within this, we can clearly see how they tracked.  In most cases they aligned quite well.  We see that in general the Quarq tends to ‘rise’ above the rest from a max standpoint, either because it’s measuring further up the drivetrain (likely), or because it catches some of the short bursts a bit better.  We see that the Stages pretty much just slides in between the Quarq and the PowerTap and tracks well against both.  The only cases where we see differentiation seem to come from the PowerTap on some of the descents – reporting a bit lower power than the rest.

Finally, here’s the totals across all three units:

As you can see, all within the same ballpark.  But again, getting ride total averages in the same ballpark is pretty easy in the grand scheme of power meters.  What I do appreciate though is that you can start to see a pattern developing between the Stages, Quarq and PowerTap.  We see that the Quarq tends to show the highest numbers (Avg/NP), with the Stages slightly below it, and then the PowerTap below that.  This likely means that my left leg is just a tiny bit weaker than my right leg, as the Stages is only measuring the left leg.  The difference between the PowerTap and Quarq makes sense and is in line with expectations, likely due to drivetrain loss.

Mountain Ride:

This ride was done shortly after the Vegas ride.  By now I’d travelled to Los Angeles, and this ride started right at the base of the nearby Angeles National Forest (basically a mountain range) and then headed up into it.  The weather down low and on the climb was miserable (pouring rain, cold), but up top it was beautiful.

I really wanted to include this ride because it shows just how massive the impacts of weather and calibration can be on data – data where, unless you had multiple power meters on your bike, you’d likely never realize there was an error.

First up, let’s look at the stacked graph.  Quite frankly, this is a mess to try and decipher – so let’s just move on.

So let’s go ahead and apply a 1-minute smoothing to it.  This creates a rolling average of the last 60 seconds of data.

Wow, lots of interesting stuff in there.  But before we dig in, let me give you the elevation profile of the ride that goes along with this.  This is set to display as ‘time’, because that’s the same as above (seconds).  I specifically moved the elevation points to the right side of the graph, so that it basically aligns visually to what you see above.  From the numbers on the right side, the mountain just goes back down (I start/end in the same place).

With that in mind, what you see is that there was no place for any auto-zero type technologies to kick in on either the Quarq or the PowerTap.  In the case of the PowerTap, that happens while coasting.  And on the Quarq, when I backpedal.  Since I was literally climbing for nearly an hour straight – the only way to trigger one would have been to stop and get off my bike.

So I did….

First calibration: You’ll see a manual calibration I did (I marked it on the chart two screenshots above), about 15 minutes up the hill, where I literally pulled off to the side and manually calibrated.  In doing so, all three PMs started to align again.

But wait, that didn’t last terribly long.  Look below.  In yellow highlighter I’ve marked the two points where I did a calibration or auto-zero.  As I continued to climb, you see the power meters start to drift apart.  The Stages stays relatively constant, but the Quarq drops off significantly – upwards of 50w+.  And the PowerTap starts to drift downwards as well, about 10-15w.

As soon as I pulled over to a random viewpoint and did an auto-zero coast, they both snapped right back in place.

Why were they drifting?  Well likely because of this:

This is the temperature chart for my ride.  You can see a 15*F+ shift.  Keep in mind that the Edge 800 temperature gauge (which is what this is from) has the updating speed of a turtle.  It would literally take 5-10 minutes to register the drift from 72*F to 0*F in a freezer.  So in reality, the temperature shift is likely closer to 20*F+.  Here’s what it looked like outside (it’s pouring):

So how do I know that the PowerTap and Quarq were drifting, and that it wasn’t just the Stages?  Well, some of it comes from knowing yourself.  In my case my heart rate stayed pretty constant across that timespan.  And while heart rate isn’t always a great indicator of power, it does help provide context.  I certainly wouldn’t have lost 50-70w in wattage over the course of just an hour climb.

Next is that the Stages contains temperature compensation, whereas the Quarq doesn’t.  Also, once the auto-zero was done on the Quarq and PowerTap, everything instantly aligned back to where it should have been.  Keep in mind there is no manually triggered auto-zero on the Stages (it happens continuously), so nothing changed there at that time.
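To illustrate the concept (this is not Stages’ actual algorithm; every number below is invented): a strain gauge’s zero offset drifts with temperature, so compensating firmware continuously adjusts the zero point it subtracts, rather than waiting for a manual re-zero.

```python
ZERO_AT_20C = 512.0  # raw zero offset at 20C (invented units)
DRIFT_PER_C = 0.8    # offset drift per degree C (invented; varies by unit)
TRUE_TORQUE = 18.0   # the torque actually being applied (held constant)

def zero_at(temp_c):
    return ZERO_AT_20C + DRIFT_PER_C * (temp_c - 20.0)

for temp in (20.0, 10.0, 2.0):            # roughly the drop on this climb
    raw = zero_at(temp) + TRUE_TORQUE     # what the gauge reports
    stale = raw - ZERO_AT_20C             # firmware that never re-zeros
    fixed = raw - zero_at(temp)           # temperature-compensated
    print(f"{temp:>4}C  uncompensated={stale:.1f}  compensated={fixed:.1f}")
```

In this toy model the uncompensated reading sags as the temperature falls (much like the Quarq did on this climb), while the compensated one holds steady.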

We also see some of this same drifting in reverse (plus a bit of other funkiness) happening to the Quarq on the descents on the way back down.

So, as we look at the ride totals, you’re going to see data different than ‘the norm’.  Because the Quarq and PowerTap were measuring low during the climb, these numbers will be lower for average and normalized power.  Of course, that doesn’t impact max power, which is across the entire ride.  In this case, we do see a fair bit of variation in maximum power – more so than I would have expected, with them each offset about 100w (200w range in total).  The challenge with max power though is that it can be one split-second packet that determines it.

So where does this leave us?  Well, the Stages appears to have a fairly solid temperature compensation system built into it.  The PowerTap didn’t drift as significantly as the Quarq, though we certainly saw some drift there.

Now, when we look at the middle portion of the ride where the temperature was fairly constant, we see that all three units tracked very well against each other:

Excluding the climbing/descending aspects, you could have easily used the middle data from any of those power meters.  It’s only when you include the climbing/descents that you reduce the viable units to use for this particular cold and rainy day.

Cadence items of note:

I wanted to briefly cover cadence, though I thought it was pretty well covered in the original review.  As you may remember, cadence within the Stages Power Meter does not depend on a cadence magnet, but instead uses an internal accelerometer.  This means that there is no magnet installation required, nor any other sensor required on your bike.  It just does its thing inside the pod attached to your crank arm.
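As a toy model of how accelerometer-based cadence can work (again, not Stages’ actual algorithm): as the crank arm rotates, the component of gravity along the arm sweeps through one full sine cycle per revolution, so counting those cycles yields cadence.

```python
import math

# Simulate the gravity component seen by an accelerometer on a crank arm
# spinning at 90 RPM, sampled at 50 Hz for 10 seconds.
cadence_rpm, rate_hz, secs = 90, 50, 10
n = rate_hz * secs
signal = [math.sin(2 * math.pi * (cadence_rpm / 60) * (i / rate_hz))
          for i in range(n)]

# One rising zero-crossing per crank revolution; average the spacing.
crossings = [i / rate_hz for i in range(1, n)
             if signal[i - 1] < 0 <= signal[i]]
period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
print(round(60 / period))  # 90 (RPM)
```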

Now in the original review people seemed to continually look at the graphs and think that I said there were cadence issues with the unit.  Despite clarifying this numerous times, there was still confusion there.  What was said at the time was that below 60RPM we saw some impacts on torque (and thus power), but we didn’t see any issues with the cadence itself.

I tested the cadence range down to 30RPM, and up to just under 200RPM – against a known good.  In this case that ‘known good’ was a traditional magnet-based cadence sensor.  (Fun testing aside, it’s actually interesting to see the Stages PM drop off at precisely 30RPM.  31RPM is good, 30RPM gone.)

Taking a look at an indoor plot first, this is the Stages cadence vs the Bontrager magnet cadence sensor.  The graph is the 10-second running average plot, with the variation shown in RPM.  Really do take note of the scale here though.

As you can see the average difference was between 0 and 2RPM.  But again, that’s because there’s going to be some reaction time delay there from an electronics standpoint – so even just a single second delay would show up here (delay caused by transmission or recording).  Said differently: They look basically spot on.

Now, here’s an outdoor ride (the Vegas one):

In this case you see more variability because stops and starts become a factor, and the relevant data time slices are just 1-2 seconds.  So from a post-ride data analysis standpoint, it’s actually relatively difficult to see what’s going on.  To illustrate, I went ahead and looked at a few of those areas where there’s divergence.

Now, you may be asking ‘Why don’t you just slide the entire data plot a few seconds?’.  Well, when I did that, it skewed the power off.  Meaning that while the power aligns fairly well from a timecode standpoint, the cadence does have a slight delay in it.  Not enough that you’d notice it out on the ride, but enough that you notice it when you stop pedaling altogether (which is the case above).
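For the curious, that ‘slide the plot’ experiment is easy to sketch: try a few candidate shifts on the cadence stream alone and keep whichever one lines the two up best (the data below is invented):

```python
# Invented per-second cadence around a stop: the magnet sensor drops to 0
# instantly, while the other stream trails it by about a second.
magnet = [90, 90, 88, 0, 0, 0, 85, 90]
accel  = [90, 90, 90, 88, 0, 0, 0, 85]

def misfit(lag):
    # Sum of squared differences once 'accel' is shifted left by 'lag'.
    return sum((a - b) ** 2 for a, b in zip(magnet, accel[lag:]))

best = min(range(3), key=misfit)
print(best)  # 1 -> the cadence stream trails by roughly one second
```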

It’s one of those things that’s much easier to see when displayed on a head unit, because you can look at both units at once and see that even though one might be delayed .5 to 2 seconds, it’s showing effectively the same thing.  For example, if I stop pedaling for an intersection, one unit might take 1 second to go from 90RPM to 0RPM, whereas the other might take 2 seconds.  Thus on a graph it would look like there’s a ~90RPM gap, when there’s not.  They’re both measuring it correctly, it’s just that there are some internal communication and recording differences.

Again, I’m simply not seeing any issues with cadence on a road bike (nor was I seeing issues before) – either real-time indoors, outdoors, or in analysis afterwards.  I don’t have a mountain bike, so I can’t speak to those circumstances.  I do however have plenty of cobbles in Europe – and saw no issues there.

Pacing and Wattage Stability:

One of the core areas of concern with the previous firmware was the instability of the displayed wattage.  While power meter users will note that wattage on power meters fluctuates second to second, the initial Stages firmware introduced too much variability in my opinion – even while using smoothing options.

The best way to exemplify this is to simply show it.  So I went out and captured some simple steady-state riding down the street.  Nothing complex here, just riding on mostly flat ground.  There’s no fundamental difference between riding on flat ground or up a mountain from a strain gauge standpoint; it’s all just ‘effort’.

Here’s the video clip of steady-state riding.  In case it’s not clear, there’s three head units, each labeled with Stages (left), Quarq (right), and PowerTap (top):

As you can see, I included instant power (top), 3-second (3s) power (middle), and 10-second (10s) power (bottom) on the display.  All three ebb and flow pretty much together.

Comparing Bluetooth Smart and ANT+ from the same unit:

After publishing this update earlier this morning a few of you asked about the Bluetooth Smart aspects, specifically focusing on comparing the ANT+ data coming from the Stages unit to the Bluetooth Smart (aka BLE) channel.  As background, the Stages Power Meter is the first power meter to offer dual-broadcasting of data across both ANT+ and Bluetooth Smart, ultimately letting the user decide what devices they’re going to connect to the unit.

On the ANT+ side you have all the traditional power meter head units (i.e. Garmin, Timex, CycleOps, etc…).  Whereas on the Bluetooth Smart side you have cell phone based applications, today limited to the iPhone 4s and higher, as well as newer iPad/iPod devices with Bluetooth 4.0 (which is required for Bluetooth Smart).  There is not yet compatibility on either Android or Windows Phone.  I dove into the Bluetooth Smart aspects in more detail in the original review.

But I didn’t spend too much time either in the original review or in the update looking at comparative data from the Stages Power Meter when analyzing both data channels at once (ANT+ & BLE).  So since I had a longish trainer ride today, it seemed like the perfect opportunity to give it a shot.

The setup for this was relatively simple: I had an Edge 800 recording the Stages ANT+ power stream, and then I had an iPhone 4s with the Wahoo Fitness App recording the Bluetooth Smart stream.  I use the Wahoo Fitness app because I feel it’s the most complete app out there for data recording and analysis.  It doesn’t have all the ‘community’ features of some other apps, but when it comes to getting reliable data out in any format on earth – it rocks.

This then gave me a slew of files.  Oh, and for fun, I was also recording the PowerTap and Quarq concurrently – but we’ll ignore those for this test (I have, however, included them in the updated set of raw data files at the end of the review).

After getting all the data consolidated I started by throwing it into a 1-second chart:

Interestingly, you do see some variations there between the two plots – which I’ll get to in just a few moments.  One track is a bit more ‘tapered’, while the other is more volatile.  But does it have an impact on segment averages?

Let’s first look at all the segments of my workout from today, comparing the average wattage and cadence of each segment along the way – some as short as 2 minutes, some as long as 15 minutes – plus the overall averages and maxes.

Again, near-perfect comparisons don’t tend to be exciting.  But the above is pretty astounding.  It’d be difficult to achieve that even with two Garmin Edge 500’s side by side recording the same power meter.  I would expect that because it was a trainer ride, there’s slightly less variability than an outdoor ride – so outdoors you might get a hair more variation.

So why are there itty-bitty sub-1% variations (more like sub-.5%)?  Well, the Bluetooth Smart channel in this case is updating more frequently.  It’s not that ANT+ can’t do that (in fact, I do it with the WASP units all the time, as shown in this review).  It’s just that the Garmin doesn’t record higher than once per second.  Which means it may miss some stuff.  Hence the higher max value on the Bluetooth Smart side – it likely had a split-second sample where I peaked higher than the Garmin even saw.  This also means that you see a touch smoother track on the Bluetooth Smart side, as it’s not just picking one semi-random packet out of the air, but rather grabbing a bunch and averaging those for the ‘1-second’ data point viewable to us.

While this one test shouldn’t be considered the end-all-be-all of accuracy, I did want to include it for those who were curious.  It seems to me that the data is pretty darn solid though.

(Again note that this test above is on firmware prior to the latest Bluetooth Smart update, which may address any of the tiny little variances I saw.)

Final Thoughts:

Back in my first review of the Stages Power Meter, I concluded with the following statement.

“At present, based on me (and only me) it would be difficult for me to swap out my existing power meter with the Stages power meter.  There’s just too much variance and fluctuations in power.  Do I think that Stages can get there though?  Yes, I do.  But I think it’s going to take time, and likely more software work.” – January 2013

Based on what I’ve seen, they’ve done that work (and put in that time) – into the software.  The physical unit I have has not changed since the original review.  It’s the same unit I’ve had since the very beginning.  They’ve just updated the software within it.  And they took a lot of feedback from the original review and addressed issues of concern we had.

For me, I have no issues in using any of the power meters I’ve used in this review – including the Stages.  I do in fact pick different ones from time to time, and the data is generally similar enough that there’s no discernible difference.  Further, in some situations (such as nonstop climbing with shifts in temperature), the Stages simply performed better than the two other units.  This is likely due to its automatic temperature compensation algorithms.

As for Stages being left-only and doubling the power, for me (and again, just me), I’m just not seeing any issues there.  It’s possible that others have larger discrepancies, or that those discrepancies could vary. But in my case it seems pretty consistent across a wide variation of rides and riding conditions.

I think probably the biggest takeaway here is that no particular power meter is perfect.  Anyone who says that there is, is sadly mistaken.

Given all that, here’s the updated Pros and Cons table:

Pros:

– Cheapest direct force power meter on the market today
– Easy to install.  Silly easy.
– Tons of crank compatibility options
– Accelerometer based cadence measurement works really well
– Utilizes standard CR2032 user-replaceable battery
– Automatically compensates for temperature changes
– Lightweight – 20g

Cons:

– Left leg dependent, simply doubles left leg power
– Total power could be highly impacted by your left/right distribution (but I didn’t see this)
– No method of end-user calibration validation (for advanced users)
– Doesn’t support Rotor cranks/arms as of today, or carbon crank arms

Thanks for reading!  And as always, feel free to post comments or questions in the comments section below, I’ll be happy to try and answer them as quickly as possible.  At the end of the day keep in mind I’m just like any other regular triathlete out there.  I write these reviews because I’m inherently a curious person with a technology background (my day job), and thus I try and be as complete as I can.  This isn’t my full time job.  But, if I’ve missed something or if you spot something that doesn’t quite jibe – just let me know and I’ll be happy to get it all sorted out.  And lastly, if you felt this review was useful – I always appreciate feedback in the comments below.  Thanks!

Finally, I’ve written up a ton of helpful guides around using most of the major fitness devices, which you may find useful in getting started with the devices.  These guides are all listed in the ‘How-to’ section.  Enjoy!

Note: Raw data files used in this review are available here.  Notes are contained within each folder.