Within SR, Maxwell's equations take their usual form in Galilean coordinates of Minkowski space. Minkowski space is one of the two possibilities following from the special principle of relativity (Galilei's principle of inertia). In modern terms this can be formulated as postulating the existence of inertial reference frames and that, for an inertial observer, space is described as a Euclidean affine manifold. Under these assumptions the possible symmetry groups are the Galilei group or the Poincaré group, leading to Galilei-Newton spacetime (a fiber bundle) or Minkowski spacetime (a pseudo-Euclidean affine space with signature (+---) or equivalently (+++-)), respectively. This implies the existence of particularly simple coordinates, which for Minkowski spacetime are the Galilean coordinates based on the choice of a Minkowski-orthonormal basis. That is, however, a pretty abstract mathematical approach.
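To make "Galilean coordinates based on a Minkowski-orthonormal basis" concrete, here is a minimal sketch in the (+---) signature. The symbols ##X##, ##e_{\mu}##, and ##\eta_{\mu \nu}## are notation introduced here for illustration: ##X## is the vector from an arbitrarily chosen origin to a spacetime point, and ##e_{\mu}## (##\mu \in \{0,1,2,3\}##) are the basis vectors.

$$X = x^{\mu} e_{\mu}, \qquad e_{\mu} \cdot e_{\nu} = \eta_{\mu \nu} = \mathrm{diag}(1,-1,-1,-1), \qquad (x^{\mu}) = (c t, x^1, x^2, x^3),$$

$$X \cdot X = \eta_{\mu \nu} x^{\mu} x^{\nu} = c^2 t^2 - (x^1)^2 - (x^2)^2 - (x^3)^2.$$

The Galilean coordinates are then simply the components ##x^{\mu}## with respect to such a basis, in the same way that Cartesian coordinates are components with respect to an orthonormal basis in Euclidean space.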
As is characteristic of his early work, Einstein was after an operational physical construction of the spacetime description. That is why he took as postulates the special principle of relativity and, from the Maxwell equations, only the one piece that is relevant for this construction: the demand that the speed of light, as measured by an inertial observer, must be independent of the motion of the light source relative to the observer. This follows from the assumption that the special principle of relativity should also hold for electromagnetic phenomena.
From this he derived the Galilean coordinates by his choice of clock synchronization using light signals. The insight is that one should synchronize clocks that are at rest relative to each other, using just one reference clock of one inertial observer. Assuming the above symmetry principles (particularly the isotropy of Euclidean space with respect to the inertial observer), one uses the fact that the two-way speed of light is ##c## and then assumes (by "convention") that the one-way speeds back and forth between the observer's reference clock and each of the other distant clocks are also ##c##, independent of the direction (isotropy) and the distance (homogeneity) of the other clock. He could then show that this clock-synchronization procedure is transitive, i.e., that any two clocks at rest relative to each other within the one inertial frame are then also synchronized with each other. From this the usual Lorentz transformations between different Galilean spacetime coordinates follow, and in these coordinates the Maxwell equations are form invariant. That was the aim of the paper, as given in its famous first sentence: to eliminate the asymmetries implied by the then-standard interpretation of the Maxwell equations as distinguishing a preferred reference frame, defined as the rest frame of "the aether", which had quite odd properties to begin with anyway.
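The synchronization convention can be written compactly. As a sketch, with symbols introduced here for illustration only: a light signal leaves the reference clock A at its reading ##t_1##, is reflected at a distant clock B at rest at distance ##L## from A, and returns to A at the reading ##t_2##. The two-way speed is measured with clock A alone, and the convention then fixes the reading ##t_B## that clock B must show at the reflection event:

$$\frac{2 L}{t_2 - t_1} = c, \qquad t_B = \frac{t_1 + t_2}{2} = t_1 + \frac{L}{c}.$$

For two inertial frames in the standard configuration (the primed frame moving with velocity ##v## along the common ##x## axis, origins coinciding at ##t = t' = 0##), the resulting transformation between the two sets of Galilean coordinates is the Lorentz boost

$$t' = \gamma \left(t - \frac{v x}{c^2}\right), \qquad x' = \gamma (x - v t), \qquad y' = y, \qquad z' = z, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}},$$

which leaves ##c^2 t^2 - x^2 - y^2 - z^2## unchanged, so a light signal moving with speed ##c## in one frame also moves with speed ##c## in the other.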
The conclusion then was that it was not the until-then sacrosanct Newtonian mechanics that had to be preserved. Rather, the Maxwell equations had to be form invariant with respect to changes between inertial frames, which led to the Lorentz (Poincaré) invariance of the new spacetime model rather than the Galilei invariance of the Newtonian spacetime model, and this implied that the mechanical laws had to be adapted to the new spacetime model. This part is the weak point of the famous paper, because at this point Einstein didn't find the simplest interpretation and thus introduced the notion of relativistic (velocity-dependent, not only speed-dependent!) masses, which obscured the mechanics tremendously. This was "repaired" pretty quickly by Planck, who gave an elegant derivation using the action principle (in its (1+3)-dimensional form), leading to the correct interpretation of relativistic momentum, with the (Newtonian) mass being within SR what we now call the invariant mass.
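The action-principle argument can be sketched as follows for a free particle. This is written in modern notation, which is not necessarily the form used in Planck's paper; ##m## denotes the invariant mass and ##\vec{v} = \mathrm{d} \vec{x}/\mathrm{d} t## the velocity in the inertial frame.

$$A = \int \mathrm{d} t \, L, \qquad L = -m c^2 \sqrt{1 - \frac{\vec{v}^2}{c^2}},$$

$$\vec{p} = \frac{\partial L}{\partial \vec{v}} = \frac{m \vec{v}}{\sqrt{1 - \vec{v}^2/c^2}}, \qquad E = \vec{p} \cdot \vec{v} - L = \frac{m c^2}{\sqrt{1 - \vec{v}^2/c^2}}, \qquad E^2 - \vec{p}^2 c^2 = m^2 c^4.$$

Here a single scalar ##m## suffices: all the velocity dependence sits in the relation between ##\vec{p}## and ##\vec{v}##, and no direction-dependent masses are needed. For ##|\vec{v}| \ll c## one recovers ##\vec{p} \simeq m \vec{v}## and ##E \simeq m c^2 + m \vec{v}^2/2##.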
Of course the full understanding of the mathematical structure then came in 1908 with Minkowski's famous talk about the four-dimensional spacetime description.
In analogy to Euclidean analytical geometry, Galilean coordinates in Minkowski space are of course just a preferred choice, in the sense that the dynamical laws compatible with the symmetry properties of Minkowski space take their simplest form when expressed in these coordinates. You can of course choose any other coordinates you like, and since in Minkowski space there is no longer any remnant of an absolute time, it's natural that you can use any diffeomorphism between Galilean coordinates and four arbitrary parameters as "generalized spacetime coordinates", leading to a description of non-inertial reference frames. However, these generalized spacetime coordinates do not necessarily have a direct physical meaning; they just parametrize the location of spacetime points (and usually cover only a part of Minkowski space).
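A standard illustration of such generalized coordinates, added here as an example and restricted to one spatial dimension, are the Rindler coordinates ##(\eta, \xi)## with ##\xi > 0##, defined in terms of the Galilean coordinates ##(t, x)## by

$$c t = \xi \sinh \eta, \qquad x = \xi \cosh \eta,$$

$$\mathrm{d} s^2 = c^2 \mathrm{d} t^2 - \mathrm{d} x^2 = \xi^2 \mathrm{d} \eta^2 - \mathrm{d} \xi^2.$$

They cover only the wedge ##x > |c t|## of Minkowski space. The worldlines ##\xi = \text{const}## describe uniformly accelerated observers with proper acceleration ##c^2/\xi##, while the parameter ##\eta## by itself is just a dimensionless label and becomes a proper time only after multiplication by ##\xi/c## along such a worldline.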
From this it is only a small step to make inertial reference frames, and thus the Poincaré group, a local symmetry, which leads directly to the spacetime description of general relativity.