Protocol for PKs (Primary Keys) with Null Entries?

AI Thread Summary
The discussion centers around the challenge of establishing a primary key (PK) for a table containing employee dependents, particularly when dealing with null entries and potential duplicates. Suggestions include using a composite key of employee number, dependent name, and relation, but concerns arise about the uniqueness of names. An alternative proposed is to implement an auto-increment numeric PK for the dependents table to avoid complications with naming conventions. The conversation also touches on the need for a clear schema design, suggesting separate tables for employees and dependents, linked by foreign keys. Ultimately, the consensus is to avoid complex naming strategies and instead utilize straightforward database design principles.
WWGD
Science Advisor
Homework Helper
Messages
7,678
Reaction score
12,344
Hi all :I have the table Emp (EmpNum PK, EmpLastNM, EmpEducation, EmpDependent, EmpEducation,...EMpHireDate) which I want to normalize and then I want to do an ERD for it.

I must break this into tables one for which , it seems, I cannot have a non-null PK:

Now EmpDependent is both multi-valued and has some nulls. My question is, what can I do about setting up a PK for empdependent given the null?
I have an idea of using the triple (Emp_Num, Emp_DepName, Emp_DepRelation) as the PK; an employee may have more than one dependent by the same name, but (given dependents are nuclear family) , not likely two dependents with both same name and relation.

STILL, there is an employee without dependents, meaning this triple will contain some nulls. How does one usually deal with this?
 
Technology news on Phys.org
You wouldn't enter rows in this table for employees with no dependents. There is no point in doing so; then your PK is valid.
 
Employee John Smith has a son Peter Smith. He divorces, gets full custody of Peter, and marries again with Carole. Alas, Carole was married before to Brian Smith, and they also had a son Peter Smith, for whom Carole has full custody. Now the nuclear family of John and Carole has two Peter Smith's in it.

Quite unlikely, sure, but I wouldn't structure a database with a PK that could conceivably become non-unique.
Why not add a PK (autonumber) to the EmpDependent table?

Another potential problem with the structure are two employees married to each other. Are their children dependents of both? Maybe you need three tables: employees, dependents, employees_dependents_relation.
 
Last edited:
Samy_A said:
Employee John Smith has a son Peter Smith. He divorces, gets full custody of Peter, and marries again with Carole. Alas, Carole was married before to Brian Smith, and they also had a son Peter Smith, for whom Carole has full custody. Now the nuclear family of John and Carole has two Peter Smith's in it.

Quite unlikely, sure, but I wouldn't structure a database with a PK that could conceivably become non-unique.
Why not add a PK (autonumber) to the EmpDependent table?

Another potential problem with the structure are two employees married to each other. Are their children dependents of both? Maybe you need three tables: employees, dependents, employees_dependents_relation.

Thanks, that is precisely the key I had in mind: {emp_num, emp_dependentname, emp_relation}. But you're right to consider guarding against very common names: john smith, bob jones, etc. Still, the may be a way around it: we can use Pete Smith for one, Peter Smith for the other and so on. This I think can be made to work for most English names , who have informal as well as formal names :Charles, Charlie, Chuck, Pete, Peter, Bob, Robert. I think something similar may be the case for names in other languages. A problem from this may be the one may want to enter the person's actual legal name. And you don't want to introduce the SSN, since you want to keep it private. But I think the issue of wanting the person's legal name may not really matter that much, as this is kind of secondary and could be obtained in a different way.
 
PAllen said:
You wouldn't enter rows in this table for employees with no dependents. There is no point in doing so; then your PK is valid.
I wonder if using the employee's name itself as a dependent would also work for these cases, understood (maybe written some way in the data dictionary) to mean the employee has no dependents. But this may slow the search and take up a lot of memory . So maybe your advice is best and we could rename the table: EmployeesWithDependents to make your point clear.
 
WWGD said:
Thanks, that is precisely the key I had in mind: {emp_num, emp_dependentname, emp_relation}.
But you're right to consider guarding against very common names: john smith, bob jones, etc. Still, the may be a way around it: we can use Pete Smith for one, Peter Smith for the other and so on. This I think can be made to work for most English names , who have informal as well as formal names :Charles, Charlie, Chuck, Pete, Peter, Bob, Robert. I think something similar may be the case for names in other languages. A problem from this may be the one may want to enter the person's actual legal name. And you don't want to introduce the SSN, since you want to keep it private. But I think the issue of wanting the person's legal name may not really matter that much, as this is kind of secondary and could be obtained in a different way.
I think we are talking about different things here.
The combination of fields {emp_num, emp_dependentname, emp_relation} as primary key doesn't look like a good idea. Once you have to start with tricks (enter Charles as Charlie if need be, ...) it will get messy sooner rather than later.
If we ignore the problem of dependents of two employees I mentioned at the end of my previous post, why not simply add a numeric primary key to your EmpDependent table? That way you don't have to do any tricks with common names: every dependent will have his own key.

WWGD said:
I wonder if using the employee's name itself as a dependent would also work for these cases, understood (maybe written some way in the data dictionary) to mean the employee has no dependents. But this may slow the search and take up a lot of memory . So maybe your advice is best and we could rename the table: EmployeesWithDependents to make your point clear.
I don't understand this. I thought the second table was a table with dependents. The name EmployeesWithDependents suggests that this is a table of Employees. @PAllen advice is correct: employees with no dependent simply will have no related records in the EmpDependent table. No need for any special infrastucture. If you need a list of employees without dependents, an outer join between the two tables, with a NULL criterium on the EmpDependent PK will provide that.
 
Simply put, this is all that you need to do.
Samy_A said:
...simply add a numeric primary key to your EmpDependent table
PAllen said:
You wouldn't enter rows in this table for employees with no dependents.
 
OK, thanks to all; as it must be clear to you, I am pretty new at this, so I lack a context in which to make reasonable/sound decisions at this point.
 
Just picked up on this and in relation to your other recent posts I have a couple of questions:
how are you learning this stuff - are you following a course or book?
what is your aim and what tools are you using - SqlServer is great if you are developing an enterprise application and have access to all the tools in a full .NET development environment, but if you just want to learn about RDBMSs there may be better ways

As a guide, your learning should lead you to designing a schema something like this:
  • There is an employee entity so we need an employees table
    [*]Whilst it looks like we could use EmployeeNumber as the Primary Key for this table there are advantages to having a key that is independent of any property value so we use an auto-increment key which is by convention called id.
    [*]An Employee can have more than one Dependent, so we need a dependents table to represent this entity
    [*]Again the Dependents table has an auto-increment key called id.
    [*]The Dependents table has a column fk_employee which creates a many-to-one relationship to the Employees table.
 
  • #10
MrAnchovy said:
Just picked up on this and in relation to your other recent posts I have a couple of questions:
how are you learning this stuff - are you following a course or book?
what is your aim and what tools are you using - SqlServer is great if you are developing an enterprise application and have access to all the tools in a full .NET development environment, but if you just want to learn about RDBMSs there may be better ways

As a guide, your learning should lead you to designing a schema something like this:
  • There is an employee entity so we need an employees table
    [*]Whilst it looks like we could use EmployeeNumber as the Primary Key for this table there are advantages to having a key that is independent of any property value so we use an auto-increment key which is by convention called id.
    [*]An Employee can have more than one Dependent, so we need a dependents table to represent this entity
    [*]Again the Dependents table has an auto-increment key called id.
    [*]The Dependents table has a column fk_employee which creates a many-to-one relationship to the Employees table.


I am taking a class actually. We have done some setting up of ERDs given different types of data tables. Schools did not provide us with software, only class was based on MSSQL2014 , so I used the free version offered by minisoft (which I had trouble installing, BTW).

So I am trying to learn how to work with it, but I also find it interesting and I want to learn it for its own sake.
 
Last edited:
Back
Top