lists.zerezo.com



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

***BOGO*** Re: trouble with group by and similar tables



Eben, all,


I'm very sorry this got out of sight for so long  :-(
When I first read your mail, I had no idea, and then it was hidden.
Just now, I check it again and detect what may explain this.

I quote nearly the full mail leading to this, so that you readers need not scan your archives.

Eben wrote:
Hi Joerg,

Joerg Bruehe wrote:
Hi Eben, all !


Eben schrieb:
I have the following tables:

table1
-------
id1
some_field

table2
-------
id
id1
score

table3
-------
id
id1
score

I then have the following query:
SELECT table1.id,SUM(table2.score) as table2_score
FROM table1, table2
WHERE table1.some_field = 'value'
AND table2.id1 = table1.id
GROUP BY table1.id

This works fine and returns each record in table1, grouped by id, with the sum of scores from table2. However, when I do this query:

SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score) as table3_score
FROM table1, table2, table3
WHERE table1.some_field = 'value'
AND table2.id1 = table1.id
AND table3.id1 = table1.id
GROUP BY table1.id

The sum'd score values go crazy, reflecting #s that aren't logical. Is the issue that table2 and table3 are identical table structures, or that I simply don't understand how the group by is really working here...?

Any advice is appreciated,

Sounds weird, and I have no obvious explanation / don't see anything you obviously did wrong.

[[...]]


Could you describe a bit more exact what you mean by "go crazy" ?

An expected resultset from the first query would look something like:

id1   table2_score
----------------------
1      20

However, if I then add in the join for table3, nevermind adding in the SUM statement in the SELECT, it becomes:

id1   table2_score
----------------------
1      1500


[[...]]


I propose to check that a join between table1 and table3 works correctly, like that between table1 and table2 does. (I would like to know whether it is a problem with table3 by itself, or with the three-table join.)

The individual joins work fine, so if I just join table2 or table3, I get back the expected numbers, it's only when I have both tables joined in the statement that it becomes a problem.

Ok, I think I now know what happens to you - let me show it.

I created tables like you wrote:

mysql> use test
Database changed
mysql> create table table1 ( id int, some_field char(4) );
Query OK, 0 rows affected (0.27 sec)

mysql> create table table2 ( id int, id1 int, score int );
Query OK, 0 rows affected (0.27 sec)

mysql> create table table3 ( id int, id1 int, score int );
Query OK, 0 rows affected (0.28 sec)


Let us assume some data:

mysql> select * from table1 ;
+------+------------+
| id   | some_field |
+------+------------+
|    1 | abc        |
|    2 | xyz        |
+------+------------+
2 rows in set (0.00 sec)

mysql> select * from table2 ;
+------+------+-------+
| id   | id1  | score |
+------+------+-------+
|   21 |    1 |     1 |
|   22 |    1 |     2 |
|   23 |    1 |     3 |
|   24 |    1 |     4 |
|   25 |    2 |     5 |
+------+------+-------+
5 rows in set (0.00 sec)

mysql> select * from table3 ;
+------+------+-------+
| id   | id1  | score |
+------+------+-------+
|   31 |    1 |     1 |
|   32 |    1 |     2 |
|   33 |    1 |     3 |
|   34 |    1 |     4 |
|   35 |    2 |     5 |
+------+------+-------+
5 rows in set (0.00 sec)


Now, we do the sum on the individual tables:

mysql> SELECT table1.id,SUM(table2.score) as table2_score
    -> FROM table1, table2
    -> WHERE table1.some_field = 'abc'
    -> AND table2.id1 = table1.id
    -> GROUP BY table1.id ;
+------+--------------+
| id   | table2_score |
+------+--------------+
|    1 |           10 |
+------+--------------+
1 row in set (0.00 sec)

mysql> SELECT table1.id,SUM(table3.score) as table3_score
    -> FROM table1, table3
    -> WHERE table1.some_field = 'abc'
    -> AND table3.id1 = table1.id
    -> GROUP BY table1.id ;
+------+--------------+
| id   | table3_score |
+------+--------------+
|    1 |           10 |
+------+--------------+
1 row in set (0.01 sec)

Correct values: The rows with "score=5" are not included in the sum, because their "id1=2" leads to "table1.some_field='xyz'".


Now, join all three tables:

mysql> SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score) as table3_score
    -> FROM table1, table2, table3
    -> WHERE table1.some_field = 'abc'
    -> AND table2.id1 = table1.id
    -> AND table3.id1 = table1.id
    -> GROUP BY table1.id ;
+------+--------------+--------------+
| id   | table2_score | table3_score |
+------+--------------+--------------+
|    1 |           40 |           40 |
+------+--------------+--------------+
1 row in set (0.00 sec)

That is what you experience (well, your factor is higher than 4).


Why does it happen ?
Let's look at the base data, without grouping and summing:

mysql> SELECT table1.id, table2.score
    -> FROM table1, table2
    -> WHERE table1.some_field = 'abc'
    -> AND table2.id1 = table1.id;
+------+-------+
| id   | score |
+------+-------+
|    1 |     1 |
|    1 |     2 |
|    1 |     3 |
|    1 |     4 |
+------+-------+
4 rows in set (0.00 sec)

Now, for the three tables:

mysql> SELECT table1.id, table2.score, table3.score
    -> FROM table1, table2, table3
    -> WHERE table1.some_field = 'abc'
    -> AND table2.id1 = table1.id
    -> AND table3.id1 = table1.id;
+------+-------+-------+
| id   | score | score |
+------+-------+-------+
|    1 |     1 |     1 |
|    1 |     2 |     1 |
|    1 |     3 |     1 |
|    1 |     4 |     1 |
|    1 |     1 |     2 |
|    1 |     2 |     2 |
|    1 |     3 |     2 |
|    1 |     4 |     2 |
|    1 |     1 |     3 |
|    1 |     2 |     3 |
|    1 |     3 |     3 |
|    1 |     4 |     3 |
|    1 |     1 |     4 |
|    1 |     2 |     4 |
|    1 |     3 |     4 |
|    1 |     4 |     4 |
+------+-------+-------+
16 rows in set (0.00 sec)

The sum=40 is correct, because there are multiple rows in both table2 and table3 matching the same value of table1.id, and the join forms the cartesian product.


Coming back to your original questions:
>>> Is the issue that table2 and table3 are identical table structures,
>>> or that I simply don't understand how the group by is really working
>>> here...?

1) It is independent of the table structures - whether table2 and table3
   look similar (in staructure and/or in data) does not matter.

2) It is *not* an issue of "group by", it is an issue of joining.
   It happens, because table2 and/or table3 (in your case: both) have
   multiple matching lines.

Rather than writing text, let me show it:

I combine the four relevant rows of table2 into one, same "score" sum:

mysql> select * from table2;
+------+------+-------+
| id   | id1  | score |
+------+------+-------+
|   21 |    1 |    10 |
|   25 |    2 |     5 |
+------+------+-------+
2 rows in set (0.00 sec)

mysql> SELECT table1.id,SUM(table2.score) as table2_score
    -> FROM table1, table2
    -> WHERE table1.some_field = 'abc'
    -> AND table2.id1 = table1.id
    -> GROUP BY table1.id;
+------+--------------+
| id   | table2_score |
+------+--------------+
|    1 |           10 |
+------+--------------+
1 row in set (0.00 sec)

Now, again the three table join:

mysql> SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score) as table3_score
> FROM table1, table2, table3
> WHERE table1.some_field = 'abc'
> AND table2.id1 = table1.id
> AND table3.id1 = table1.id
> GROUP BY table1.id;
+------+--------------+--------------+
| id   | table2_score | table3_score |
+------+--------------+--------------+
|    1 |           40 |           10 |
+------+--------------+--------------+
1 row in set (0.00 sec)

Now only 1 matching row is found in table2, and the sum(score) for table3 is correct - because its rows occur only once, not 4 times (number of matching rows in table2 in the other example).


I hope that answers your question(s).


The summary is: "works as specified by SQL".


Jörg

--
Joerg Bruehe,  MySQL Build Team,  joerg@xxxxxxxxx
Sun Microsystems GmbH,   Sonnenallee 1,   D-85551 Kirchheim-Heimstetten
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering     Muenchen: HRB161028


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=royale@xxxxxxxxxx