Sunday, March 25, 2012

Do I need recursion ?

Hi there, Any tips on my problem would be most welcome...

right, the scenario.

A web Blog

blogger1 posts a blog_entry, e.g I love the simpsons
blogger2 comments on that blog_entry, e.g No, I hate the simpsons
Blogger3 comments on that comment, ie, How can you hate the simpsons.

so you can comment on a comment on a comment etc.. lool.

right, i have got two tables... Blog_entry & comment. i need to be able to search for a blog_entry + all the comments on that blog_entry..

at the moment i can search for the comments on the blog_entry using the FK in the comment table.

blog_entry
INSERT INTO Blog_Entry VALUES(0001,'I love the Simpsons');

comments
INSERT INTO Comment VALUES(0001,' No I hate the simpsons, cID 1000);

but i need to be able to search for the comments on comments

INSERT INTO Comment VALUES(0001,' How can you hate the simpsons, cID1000);

hopefully you can see the problem here, with only one comments table how can i get the search for the comment on the comment.. theres nothing linking them...
I could make a sub comments table, and use a FK (as with the blog_entry & first comment)

but then I would have to make another sub sub table to be able to get those comments on the first sub table... this would go on and on for each comment on comment.

you can see the cID1000 (comment PK) I can't use this to get the comment because its duplicating the PK...

So, I need to be able to search for the comments on comments eg. I need to be able to search for blogger2, and any comments that were made on his comments.

Someone I know mention using recursion to get the comment on comment info, is this right ?

Hehe, I hope you understand what im asking here the is my first exploration of SQL, so any tips, hints, would be most welcome.

Thanks loads

PS: if there anything you dont understand about what I have written, or what im asking please say so

Spence.you could actually do it with just one table

when you add a comment, add also the id of the original entry

thus when you add a comment on a comment, the original entry id is available, so you can add it to the comment on the comment too

then to retrieve all comments, and comments on comments, for a specific entry, it's just a simple WHERE originalentry=5

so recursion is not required|||you could actually do it with just one table
when you add a comment, add also the id of the original entry
thus when you add a comment on a comment, the original entry id is available, so you can add it to the comment on the comment too
then to retrieve all comments, and comments on comments, for a specific entry, it's just a simple WHERE originalentry=5
so recursion is not required

Hi rudy, thank you very much for your input here.

using one table, I take it you mean just use the Blog_entry tbl, as the attributes in the blog_entry & comment tbls are the same.. the only difference is the actual text...

but, If i were to add to original entryId in the new comment tuple (But in the same table) it would break the PK constraint, there would be duplicate blog_entryID's.. So i need to keep both the comments and the Blog_entry tables seperate..

To able to make a comment on a comment, dont i need something to identify each individual comment ? if so, then im back to square one...

sheesh, this is confusing...|||yikes! (can't read it, not sure i need to, though)

yes you can do it in one table, but you don't have to
create table entries
( id integer not null primary key auto_increment
, name varchar(255) not null
, unique index entriespk (name)
, thread integer null
, foreign key threadstarter (thread) references entries (id)
, replyto integer null
, foreign key replytothread (replyto) references entries (id)
, entry text
);
entries that start a new thread have both thread and replyto NULL

when you reply to an entry, you link to it via replyto, and via thread to the id of the original entry which started the thread

(so replies to the original new entry have the same id in thread and replyto)|||yikes! (can't read it, not sure i need to, though)

yes you can do it in one table, but you don't have to
create table entries
( id integer not null primary key auto_increment
, name varchar(255) not null
, unique index entriespk (name)
, thread integer null
, foreign key threadstarter (thread) references entries (id)
, replyto integer null
, foreign key replytothread (replyto) references entries (id)
, entry text
);
entries that start a new thread have both thread and replyto NULL

when you reply to an entry, you link to it via replyto, and via thread to the id of the original entry which started the thread

(so replies to the original new entry have the same id in thread and replyto)

LoooL rudy,

Cheers for the help m8..

Sweet stuff!

Have a good day.|||The problem is an hierarchical one, each comment has exactly one parent (except the original thread), and 0 to n childs. The standard approach to store such facts is a recursive one, as shown by Rudy. Such a structure, however, isn't to query using standard SQL.

Usual solutions for this are
* recursive T-SQL stored procedure
* bridging

Acutally, the solution presented here, is a bridge solution, however, just a bridge to the original thread. To be able to sort on (sub)threads and to format the solution (e.g. by using indents) youwill need to have the complete path. This is, of cource, redundant, but not a problem because such a path will not change,so there is no danger for inconsistency.

If you can estimate the maximum deepth of a discussion, you can make a structure acoordingly. If you can't, you may also consider to store your information without the path, but dynamically create a temp table with your actual path length and put the data in it for presentation.

I hope I made myself sufficiently clear.|||the solution presented here did not (yet) include queries, so i don't understand why it's a "bridge" solution -- it is a classic hierarchical solution capable of supporting true recursion

further, one does not need to store the path (and i have not allowed for it), so the redundancy only comes into the design if you want it to come into the design, and i don't

further, i am not going to argue whether the thread column (which points from every comment in a thread, no matter at what level, to the thread starter) is redundant, because it isn't

but i agree whole-heartedly about knowing the maximum depth -- this allows you to write a query consisting of N self-joins, which means that no matter where you are in the hierarchy, you can get all nodes below the node you're on, with indentation, in one single query, without a temp table

recursion and/or temp tables are only required if you do not know the maximum depth in the entire hierarchy|||but i agree whole-heartedly about knowing the maximum depth -- this allows you to write a query consisting of N self-joins, which means that no matter where you are in the hierarchy, you can get all nodes below the node you're on, with indentation, in one single query, without a temp table

recursion and/or temp tables are only required if you do not know the maximum depth in the entire hierarchy
I agree on this as a third option.

further, i am not going to argue whether the thread column (which points from every comment in a thread, no matter at what level, to the thread starter) is redundant, because it isn't
Why not? You can get it by recursively selecting the parent of a comment, or not? However, redundancy isn't the problem here, I would even add the entire path by having 1st level comment, 2nd level comment, 3rd le... etc. The problem is the presentation of hierarchical data.|||just because you can get the thread starter recursively does not make it redundant

also, you can achieve indentation without storing the path, by using the N-level self-join|||just because you can get the thread starter recursively does not make it redundant
Then, with all respect, you have your own, wrong definiton of redundancy, see Definition Redundancy (http://www.hyperdictionary.com/search.aspx?define=redundancy)

But, again, I don't have a problem with redundancy here, since a discussion branche will not be moved, probably. My point is, however, to make either the whole path redundant, or nothing and solve it in this case with other means (N views if you know the maximum deepth, or with a recursive T-SQL procedure). Just to have a reference to the root thread will not help at all.

Please note, that my proposal to store the whole path, is something similiar to your N-View approach with the difference, that the complexitity of your execution plan multiplies with every additional level, while the complexity of my execution plan is almost liniar growing.|||heh, that's pretty amusing, citing a dictionary as an authoritative reference for a relational database concept

the reference to the root thread does so help, if you wanted to quickly list all the ids and titles within a thread without regard to indentation

the complexity of my self-join-N-times does not grow at all, unless you change N, which is assumed to be the maximum depth, and if you change the maximum depth, then it's not a maximum, is it

i do realize that storing the path has certain benefits, i just do not happen to value them over my method

for more on paths, read this thread (http://www.sitepoint.com/forums/showthread.php?t=84833)

in fact, one of the participants in that thread has a tutorial on his method here:

Multi-Threaded Message Board Solution (Hierarchical Data without the Adjacency Model) (http://morgankelsey.com/code/multi-threaded_board/)|||heh, that's pretty amusing, citing a dictionary as an authoritative reference for a relational database concept
I hope that - beside the entainment - you got my point: if the same fact (root of a comment) is stored twice (both as [Thread] and [ReplyTo]), it is redundant!
the reference to the root thread does so help, if you wanted to quickly list all the ids and titles within a thread without regard to indentation
... and structure. You can, of cource, sort by time of ID to list all comments in chronological order, but you can't help the user to detect the mutual relationships, except by printing IDs and let the user puzzle by himself.
the complexity of my self-join-N-times does not grow at all, unless you change N, which is assumed to be the maximum depth, and if you change the maximum depth, then it's not a maximum, is it
Please see my previous comment. My point was that the complexity of your preferred solution depends on N.
i do realize that storing the path has certain benefits, i just do not happen to value them over my method
Okay, we don't need to agree in our recommendations, as long as the poster gets a picture of the possiblities and can choose the one fitting to his needs.
for more on paths, read this thread (http://www.sitepoint.com/forums/showthread.php?t=84833)

in fact, one of the participants in that thread has a tutorial on his method here:

Multi-Threaded Message Board Solution (Hierarchical Data without the Adjacency Model) (http://morgankelsey.com/code/multi-threaded_board/)
I took a look, and storing the comment level is a manner to get at least the indentation. To present the thread with its comment-on-comment rel;ationships, however, this method still needs recursion, while my proposal of storing the whole path is non-recursive.

I guess we made our solutions sufficiently clear for the poster; I'm curious of Spencers reaction.|||if the same fact (root of a comment) is stored twice (both as [Thread] and [ReplyTo]), it is redundant!Thread and ReplyTo aren't the same fact!!

:) :) :)|||Either you or me are making a considerable thinking mistake. Please see the following example, just getting three fields: ID, Thread, ReplyTo

1,null,null
2,1,1
3,1,2

In the second row, Thread=ReplyTo
In the third row: Thread = ReplyTo of ReplyTo

In both cases, Thread can be derived from ReplyTo, so it is stored twice, which is called redundant storage.|||please apply 1NF, 2NF, 3NF rules to this example: (ID, Thread, ReplyTo)

i'd be interested to know when it violates an actual normalization rule, not a doktorblue "it looks redundant to me" normalization rule|||From the normalization point of view, your model isn't in the 3rd nf:

A table is in 3NF if: It is in 2NF and it contains no transitive dependencies.

The Thread depends transitively on the ReplyTo. You can remove the Thread column without removing the Thread information.|||you misunderstand transitivity, it does not work across rows, it is a rule for columns within a row|||Do either of you see the humor or irony in trading post after post on the subject of redundancy? ;)|||I don't anymore ... :cool:|||i see the irony, but not the humour

:)

okay, another recursion example

- each employee row contains the foreign key of his/her supervisor
- each employee must work in the same department as his/her supervisor

where would you store the information about which department an employee is in?|||There is just one employee table, containing employees and their supervisiors, right?! I would store the department information in a associative entity DepEmp(DepID, EmpID), containing just the employess which are the bosses of their department. However, I'm not sure whether this will work nice, so I'd probably add some redundancy here by

* repeating the department for every employee
* or adding a reference to every department boss to every employee

What had you in mind?|||thank you, doktorblue, you have just copied my design for the threads

* repeating the [department]thread for every [employee]post
* or adding a reference to every [department boss]thread starter to every [employee]post

i'm sure this will "work nice" as you say, even though you yourself don't like it because it's "redundant"|||Do you realize that you are admitting that your design contains redundancy?

Regarding my "work nice" comment: the final structure heavely depends on the usage. If you are frequently using the entire suipervisor hierarchy, you will get another design than if you are using only the direct supervisor and department information. If you don't use regularly department information, I would not even consider to duplicate the data.|||no, i do not realize it, because as i said earlier, there is no redundancy

but it is very nice to see you adopt the same design

:) :) :)|||So, according to your logica, the total order amount on order level isn't redundant with the sum of the amount of indivudual order lines? Its not on the same row, is it? I guess 1st class informatica students can tell you better.

Or, according to your same logica, the following table isn't inconsistent, because inconsistency can only happen if there is redundant data

ID, ReplyTo, Thread
1, null, null
2, 1, 1
3, 1, 2

Comment 3, however, belongs to both thread 1 and 2, which can't be.

I hope for you, or better for your clients, that they don't have any complex problems with redundancy.|||inconsistency can only happen if there is redundant dataonly? inconsistency can happen without redundancy, too

if the database cannot enforce the consistency rules, then the application developer had better assume this responsibility

are you suggesting i am not a good enough developer to cover for the weaknesses of current database products?

;)

you know what, doktorblue? i like you

i like you a lot

i like the way you think

you are going on my buddy list

:cool:|||you know what, doktorblue? i like you

i like you a lot

i like the way you think

you are going on my buddy list
:cool:
Don't get personal, will you?!

:D|||What!? That's it?

C'mon, please start fighting again. It's so refreshing for us Yanks to watch an international conflict in which we AREN'T involved...|||Ha Rudy, there is somebody asking for troubles! Let's get him!

:D:D|||the blindman is actually one o' the good guys

he has helped me a hunnert times

:cool:|||Ha! Bullcr@.p! You haven't needed help. But nice of you to say, though. I like posting on this board and I've learned a lot from reading everybody's posts, especially yours.|||Oh heck, I can usually get Rudy pretty riled up. Maybe we can get this show back on the road if I can find the appropriate taunt...

Heavens, it was just getting fun, watching the "mud cannons" getting loaded with huge quantities of various forms of "organic fertilizer" on both sides of the Atlantic!

-PatP|||Oh, its easy to get Rudy riled. You know those Canadians have no sense of humor. That's why all their decent comics come down here to the U S of A.

Now, as far as jibes at people from the Netherlands...I'm drawing a complete blank. What do they do over there, anyway?|||Now, as far as jibes at people from the Netherlands...I'm drawing a complete blank. What do they do over there, anyway?Diamonds, drugs, sex, and flowers for the most part. Delft does gorgeous porcelin.

I don't want to get them riled up personally though, I want to keep them on the database track... That's lots more fun to watch. I was thinking about throwing in a few comments about the advantages of using Excel as a database manager to avoid all those problems that people have with constraints and datatypes.

-PatP|||Why, Excel is a great database application. You don't have to worry about a lot of indexes and constraints and crap, which makes it much more flexible for enterprise application development than a rigid system like SQL Server.

Plus, it's got pivot tables that will do dynamic crosstabs. SQL Server can't even touch that!

;)|||Yeah, with those kind of obvious benefits, how can you go wrong?

-PatP|||Diamonds, drugs, sex, and flowers for the most part. Delft does gorgeous porcelin.-PatP
Fox News told you Americans also recently about massive killings of babies in clinics, and you forgot the clogs and cheese.|||Hey, I may be American, but I DON'T watch Fox News!

I rode my bike through the Netherlands about 20 years ago. I remember you also have a lot of wind. A LOT of wind, and it was all headwind.|||Fox News told you Americans also recently about massive killings of babies in clinics, and you forgot the clogs and cheese.Yeah, but who doesn't like killing babies?

Don't the clogs and cheese just kind of grow there, like mushrooms and bicycles?

Actually, I have some really interesting memories of a bar called "The Three Flies" in Amsterdam. Someday I need to go back to see if it is still there!

-PatP

No comments:

Post a Comment