Extract the maps, record a playthrough. Merge the two.
Record the character's position at various points. (They didn't have a computer-controlled, rail-mounted camera, so they couldn't get that part perfect; you'll note that the camera doesn't follow Mario perfectly.)
Then you find a suitable location, calculate the length of the maps, and note where your character was at different points in time. Use that data to move the camera from point to point, following that schedule.
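To make the camera schedule concrete, here's a rough sketch of how you could turn the recorded positions into a "where should the camera be at time t" lookup. It assumes the positions are horizontal map pixels logged with timestamps, and that you've worked out a metres-per-pixel scale from the map length and your location. The function name, the keyframe format and the numbers are all made up for illustration; it just interpolates linearly between the recorded points.

```python
from bisect import bisect_right

def camera_position(keyframes, t, metres_per_pixel):
    """Return the real-world camera offset (metres) at time t.

    keyframes: list of (time_seconds, map_x_pixels), with strictly
    increasing times. This format is an assumption for the sketch.
    """
    times = [k[0] for k in keyframes]
    # Before the first or after the last keyframe, hold the end position.
    if t <= times[0]:
        return keyframes[0][1] * metres_per_pixel
    if t >= times[-1]:
        return keyframes[-1][1] * metres_per_pixel
    # Find the two keyframes surrounding t and interpolate between them.
    i = bisect_right(times, t)
    (t0, x0), (t1, x1) = keyframes[i - 1], keyframes[i]
    frac = (t - t0) / (t1 - t0)
    return (x0 + frac * (x1 - x0)) * metres_per_pixel

# Hypothetical schedule: character at pixel 0, 100 and 300 at 0 s, 2 s, 4 s.
schedule = [(0.0, 0), (2.0, 100), (4.0, 300)]
for second in (0, 1, 2, 3, 4):
    print(second, camera_position(schedule, second, 0.01))
```

With a hand-pushed camera you'd only be hitting these positions approximately, which is why the result isn't perfect.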
The final part is merging the two using fairly simple shape recognition software: it recognizes the shape onto which you're supposed to project the digital image, which is pretty simple in this case. Tweak until you're satisfied.
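For the merging step, here's a sketch of the projection half only. It assumes the shape recognition has already given you the four corners of the target surface in the video frame (that detection isn't shown), and it uses a standard four-point perspective transform to map game-image pixels onto that quadrilateral. I don't know what software was actually used; the function names and coordinates below are hypothetical.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner layout")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """Return the 9 coefficients mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]

def warp_point(h, x, y):
    """Map one game-image pixel to its place in the video frame."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

# Hypothetical data: a 256x240 game frame and a detected wall quadrilateral.
game_corners = [(0, 0), (256, 0), (256, 240), (0, 240)]
wall_corners = [(310, 120), (820, 95), (845, 560), (300, 540)]
h = homography(game_corners, wall_corners)
print(warp_point(h, 128, 120))  # where the centre of the game image lands
```

You'd recompute this for every video frame as the detected corners move, which is where most of the tweaking comes in.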
Simple in theory, but requires a lot of work.