But even the mere change in amplitude is an issue.
Well, everything is, or can be, an issue. The real question is: will you settle for it all being unknowable, or is there value in knowing at least something about what the amp does?
For instance, in any car, there are hundreds or thousands of tradeoffs and interactions happening when you push the gas pedal. Most people don't treat that as unknowable magic; they see a simple fact: "I push the gas pedal further, the car goes faster" (up to some limit).
And the use of a dummy load to measure output power is fine, but is another form of compromise. It does not measure the performance of the amp as actually used into a dynamic speaker load.
We can stipulate this fact; a speaker load will behave differently than a resistor load. The answer to understanding an amplifier's capability is to have a simple starting point, then layer on the additional factors which will alter the answer.
The simple starting point is a resistor load and a sine test frequency, at some maximum level of distortion measured at the resistor. To get the ball rolling, assume distortion is zero when power output is measured, and the load resistor matches the OT tap's specified load impedance.
One can increase the test signal strength until distortion appears, and then back it down just to the point of maximum voltage output and zero distortion. Now you measure the RMS voltage output and apply the math (power = RMS voltage squared, divided by the load resistance) to get the maximum clean output power. You could also look at the size of your test signal and say, "For the point where this signal is injected, it represents the largest signal which can pass cleanly to the final load." That might be important if something in the amp circuit could distort before the output tubes. It might also be useful to apply the test signal directly at the output tube grids, if you have a signal source which is clean and can generate those large voltage swings.
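As a rough sketch of "apply the math" for a resistor load: the function names and the 12.65 V / 8Ω figures below are made up for illustration, not readings from any particular amp.

```python
import math


def rms_from_peak_to_peak(v_pp):
    """Convert a peak-to-peak scope reading of a clean sine to RMS volts."""
    return v_pp / (2 * math.sqrt(2))


def clean_output_power(v_rms, load_ohms):
    """Average power in watts into a resistor, from the RMS voltage across it."""
    if load_ohms <= 0:
        raise ValueError("load resistance must be positive")
    return v_rms ** 2 / load_ohms


# Hypothetical reading: 12.65 V RMS across an 8 ohm dummy load.
power_watts = clean_output_power(12.65, 8.0)
print(f"{power_watts:.1f} W")
```

If your meter or scope only gives you peak-to-peak volts, run the reading through rms_from_peak_to_peak first; that conversion is only valid while the waveform is still a clean sine.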
Now if you look at a plot of speaker impedance (the faint line on the graph linked), you'll notice that the rated 8Ω is the lowest impedance in the usable frequency range; when you move away from ~300Hz where the speaker is truly 8Ω, the impedance rises. This may have an effect on the amplifier's output power. But to know if it will, you have to know something about the amplifier. Knowing nothing else, we should assume power transfer will be less than perfect because the load isn't matched, so clean output power is lower than our first measurement with a resistor. If nothing else, you will know clean output power cannot be any higher than the value measured with the resistor.
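To put a number on "less than perfect because the load isn't matched", here is a minimal sketch using the textbook model of an ideal voltage source in series with a fixed source resistance. That model is an assumption on my part; a real tube output stage is messier. The 20 V and 8Ω source figures are placeholders.

```python
def load_power(v_source_rms, source_ohms, load_ohms):
    """Power in watts delivered to the load in a simple series-resistance model."""
    current = v_source_rms / (source_ohms + load_ohms)
    return current ** 2 * load_ohms


# Assumed model values: 20 V RMS source with 8 ohm source resistance.
for load in (4, 8, 16, 32):
    print(f"{load:>2} ohm load: {load_power(20.0, 8.0, load):.2f} W")
```

In this model the delivered power peaks when the load equals the source resistance and falls off on either side, which is the sense in which the resistor measurement is an upper bound.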
If you used a sine wave for the resistor measurement, you also know something else: peak power output is double the measured RMS output power (take this on faith for a sine wave, or I can bore you with the math derivation). If the sine wave were so distorted that it became a square wave of roughly the same peak voltage, the RMS power output would rise to equal that peak power output. So you just found something else useful from using a sine and a resistor load: worst-case speaker heating is double your maximum measured clean RMS power output, and you can rate the speaker accordingly if you want to prevent it from blowing.
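If you'd rather check the factor of two numerically than take it on faith, this sketch samples one cycle of a sine and of a square wave with the same peak voltage and compares peak and average power into a resistor. The 10 V peak, 8Ω load, and sample count are arbitrary illustration values, and the "same peak voltage" square wave is an idealization of a hard-clipped amp.

```python
import math


def peak_and_average_power(samples, load_ohms):
    """Return (peak, average) instantaneous power for voltage samples into a resistor."""
    powers = [v * v / load_ohms for v in samples]
    return max(powers), sum(powers) / len(powers)


N = 1000        # samples across exactly one cycle
V_PEAK = 10.0   # arbitrary peak voltage, same for both waveforms
LOAD = 8.0      # resistor load in ohms

sine = [V_PEAK * math.sin(2 * math.pi * n / N) for n in range(N)]
square = [V_PEAK if n < N // 2 else -V_PEAK for n in range(N)]

sine_peak, sine_avg = peak_and_average_power(sine, LOAD)
square_peak, square_avg = peak_and_average_power(square, LOAD)

print(f"sine:   peak/average = {sine_peak / sine_avg:.2f}")
print(f"square: peak/average = {square_peak / square_avg:.2f}")
print(f"square average / sine average = {square_avg / sine_avg:.2f}")
```

The sine's average power comes out at half its peak, while the square wave's average equals its peak, so the fully squared-off signal heats the load twice as much as the clean sine did.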