Legacy Database Reverse Engineering and Methodology

AI Thread Summary
Reverse-engineering a legacy database can be effectively achieved using tools like ERwin or Sparx Enterprise Architect to create conceptual and logical models from the existing physical database. This process is crucial for documentation and reconstruction. The Waterfall methodology is suggested as more suitable than Agile/Scrum for this task due to the complexity involved in reconstructing logical schemas from physical databases. Analyzing queries that extract data can also provide insights into the database structure. Additionally, external source code related to the database can aid in understanding non-core schema objects. Companies like Auctor, which specialize in modernizing legacy systems, may utilize tools such as Evolveware for this purpose. The choice of methodology may depend on factors like project size, platform changes, and customization needs.
WWGD
Science Advisor
Homework Helper
Messages
7,713
Reaction score
12,855
Hi All,
Say we have a legacy database that is not well-documented and we want to reverse-engineer it.
I am looking for an effective way of doing it. Only way I can think is using something like ERwin to
produce a Conceptual, Logical model for the database from which we can reconstruct the (Physical)
database. Right? Since this process would be well-documented and reasonably straightforward, would it be reasonable to use a Waterfall to see it through? I don't think Agile/Scrum would work well. Right?
 
Computer science news on Phys.org
It’s not easy to reconstruct a logical database schema from the physical database. You might have more success studying the queries used to extract data.

I once had a patent idea that extracted the star schemas from a database using table meta info, naming conventions of columns and tables and statistics of columns to decide a fact table (many rows many keys) from dimension tables (few rows few keys) heuristic. It was for doing data mining and cube analysis.

But physical schemas can have a lot of added info for distributed databases which just isn't a part of the logical one. Erwin may be able import the schema and then you’d have to reorganize it into something conceptually better.
 
  • Like
Likes QuantumQuest, jim mcnamara and WWGD
@jedishrfu has some great points. IMO DB objects not in core schemas are more than painful, they can derail a lot of efforts at a total schema. I would rely on whatever external source code exists - especially code that inserts into or reports from those oddball tables.
 
Sparx Enterprise Architect is also a good tool for this purpose, its mostly comparable to ERwin. Different people have different preferences for tools that do things like this, my preference is for Sparx so that's mostly what I use.

Waterfall vs Agile/Scrum I would think would depend size and scope, platform changes, special customization, and expected changes to things like user interfaces or application interfaces.

I've also worked with a company called Auctor that specializes in reverse engineering, assessing and modernizing complex "legacy systems". I believe they use a tool called "evolveware".

Kyle
 
Last edited:
  • Like
Likes WWGD
This week, I saw a documentary done by the French called Les sacrifiés de l'IA, which was presented by a Canadian show Enquête. If you understand French I recommend it. Very eye-opening. I found a similar documentary in English called The Human Cost of AI: Data workers in the Global South. There is also an interview with Milagros Miceli (appearing in both documentaries) on Youtube: I also found a powerpoint presentation by the economist Uma Rani (appearing in the French documentary), AI...
Back
Top