The reason to focus on the cartridge and the speaker first is that they are the "least perfect" components.
Both will deviate from a neutral frequency response by amounts that are well within the audible range - sometimes on the order of 20 dB or more (particularly speakers!)
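To put a figure like 20 dB in perspective, here is a quick sketch using the standard decibel definitions (amplitude ratio = 10^(dB/20), power ratio = 10^(dB/10)). The function names are just mine for illustration, nothing here comes from a measurement tool.

```python
# Illustrative helpers (hypothetical names) based on the standard dB definitions.

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a sound-pressure (or voltage) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a power ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    # Print a small table of level differences and the ratios they represent.
    for db in (1, 3, 6, 10, 20):
        amp = db_to_amplitude_ratio(db)
        power = db_to_power_ratio(db)
        print(f"{db:>2} dB -> amplitude x{amp:.2f}, power x{power:.1f}")
```

So a 20 dB swing works out to a factor of 10 in amplitude and a factor of 100 in power, which is why deviations of that size are so hard to miss.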
Here are some typical frequency response plots for a speaker generally considered to be among the most neutral... (one of the best)

Here is the frequency response plot for the same speaker "in room":

(speaker plots borrowed from
http://www.regonaudio.com/Digital%20Roo ... ction.html - an excellent article!)
Those rises and drops in the chart are clearly audible "in room" - and will dominate the sound of a system...
Look at the peak on the in-room chart at 3.5kHz followed by the trough at around 5kHz....
Similarly with cartridges...
Here is a V15V fitted with an SAS stylus... not even close to a flat frequency response - although the variations on this cartridge will be far less obvious than the ones on the speaker shown above.

So my personal opinion is that you should start by choosing those components that are "least perfect" - because these are the components that will have an immediately noticeable and audible signature that can overwhelm the system.
Then from there, you proceed to the rest of the system, and dial things in.
Going the other way around, you choose something that has a subtle/minor impact on the sound, and then add to it something that has a "gross" impact... it seems silly to me!
The starting point (IMO) should be neutrality - attempting to achieve a system that reproduces the original recording without adding or detracting.
Once you get as close as you can to that goal, then you add the "salt and pepper": adjust EQ, tweak the room, and work on making it suit your personal taste.
The alternative is what I call "audio roulette" (similar to Russian roulette) - with the results being just as random.
bye for now
David