lists.zerezo.com
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
***BOGO*** Re: trouble with group by and similar tables
- Date: Mon, 30 Jun 2008 15:04:42 +0200
- From: Joerg Bruehe <joerg@xxxxxxxxx>
- Subject: ***BOGO*** Re: trouble with group by and similar tables
Eben, all,
I'm very sorry this got out of sight for so long :-(
When I first read your mail, I had no idea, and then it was hidden.
Just now, I check it again and detect what may explain this.
I quote nearly the full mail leading to this, so that you readers need
not scan your archives.
Eben wrote:
Hi Joerg,
Joerg Bruehe wrote:
Hi Eben, all !
Eben schrieb:
I have the following tables:
table1
-------
id1
some_field
table2
-------
id
id1
score
table3
-------
id
id1
score
I then have the following query:
SELECT table1.id,SUM(table2.score) as table2_score
FROM table1, table2
WHERE table1.some_field = 'value'
AND table2.id1 = table1.id
GROUP BY table1.id
This works fine and returns each record in table1, grouped by id,
with the sum of scores from table2. However, when I do this query:
SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score)
as table3_score
FROM table1, table2, table3
WHERE table1.some_field = 'value'
AND table2.id1 = table1.id
AND table3.id1 = table1.id
GROUP BY table1.id
The sum'd score values go crazy, reflecting #s that aren't logical.
Is the issue that table2 and table3 are identical table structures,
or that I simply don't understand how the group by is really working
here...?
Any advice is appreciated,
Sounds weird, and I have no obvious explanation / don't see anything
you obviously did wrong.
[[...]]
Could you describe a bit more exact what you mean by "go crazy" ?
An expected resultset from the first query would look something like:
id1 table2_score
----------------------
1 20
However, if I then add in the join for table3, nevermind adding in the
SUM statement in the SELECT, it becomes:
id1 table2_score
----------------------
1 1500
[[...]]
I propose to check that a join between table1 and table3 works
correctly, like that between table1 and table2 does.
(I would like to know whether it is a problem with table3 by itself,
or with the three-table join.)
The individual joins work fine, so if I just join table2 or table3, I
get back the expected numbers, it's only when I have both tables joined
in the statement that it becomes a problem.
Ok, I think I now know what happens to you - let me show it.
I created tables like you wrote:
mysql> use test
Database changed
mysql> create table table1 ( id int, some_field char(4) );
Query OK, 0 rows affected (0.27 sec)
mysql> create table table2 ( id int, id1 int, score int );
Query OK, 0 rows affected (0.27 sec)
mysql> create table table3 ( id int, id1 int, score int );
Query OK, 0 rows affected (0.28 sec)
Let us assume some data:
mysql> select * from table1 ;
+------+------------+
| id | some_field |
+------+------------+
| 1 | abc |
| 2 | xyz |
+------+------------+
2 rows in set (0.00 sec)
mysql> select * from table2 ;
+------+------+-------+
| id | id1 | score |
+------+------+-------+
| 21 | 1 | 1 |
| 22 | 1 | 2 |
| 23 | 1 | 3 |
| 24 | 1 | 4 |
| 25 | 2 | 5 |
+------+------+-------+
5 rows in set (0.00 sec)
mysql> select * from table3 ;
+------+------+-------+
| id | id1 | score |
+------+------+-------+
| 31 | 1 | 1 |
| 32 | 1 | 2 |
| 33 | 1 | 3 |
| 34 | 1 | 4 |
| 35 | 2 | 5 |
+------+------+-------+
5 rows in set (0.00 sec)
Now, we do the sum on the individual tables:
mysql> SELECT table1.id,SUM(table2.score) as table2_score
-> FROM table1, table2
-> WHERE table1.some_field = 'abc'
-> AND table2.id1 = table1.id
-> GROUP BY table1.id ;
+------+--------------+
| id | table2_score |
+------+--------------+
| 1 | 10 |
+------+--------------+
1 row in set (0.00 sec)
mysql> SELECT table1.id,SUM(table3.score) as table3_score
-> FROM table1, table3
-> WHERE table1.some_field = 'abc'
-> AND table3.id1 = table1.id
-> GROUP BY table1.id ;
+------+--------------+
| id | table3_score |
+------+--------------+
| 1 | 10 |
+------+--------------+
1 row in set (0.01 sec)
Correct values: The rows with "score=5" are not included in the sum,
because their "id1=2" leads to "table1.some_field='xyz'".
Now, join all three tables:
mysql> SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score) as table3_score
-> FROM table1, table2, table3
-> WHERE table1.some_field = 'abc'
-> AND table2.id1 = table1.id
-> AND table3.id1 = table1.id
-> GROUP BY table1.id ;
+------+--------------+--------------+
| id | table2_score | table3_score |
+------+--------------+--------------+
| 1 | 40 | 40 |
+------+--------------+--------------+
1 row in set (0.00 sec)
That is what you experience (well, your factor is higher than 4).
Why does it happen ?
Let's look at the base data, without grouping and summing:
mysql> SELECT table1.id, table2.score
-> FROM table1, table2
-> WHERE table1.some_field = 'abc'
-> AND table2.id1 = table1.id;
+------+-------+
| id | score |
+------+-------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
+------+-------+
4 rows in set (0.00 sec)
Now, for the three tables:
mysql> SELECT table1.id, table2.score, table3.score
-> FROM table1, table2, table3
-> WHERE table1.some_field = 'abc'
-> AND table2.id1 = table1.id
-> AND table3.id1 = table1.id;
+------+-------+-------+
| id | score | score |
+------+-------+-------+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 1 | 4 | 1 |
| 1 | 1 | 2 |
| 1 | 2 | 2 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 3 |
| 1 | 4 | 3 |
| 1 | 1 | 4 |
| 1 | 2 | 4 |
| 1 | 3 | 4 |
| 1 | 4 | 4 |
+------+-------+-------+
16 rows in set (0.00 sec)
The sum=40 is correct, because there are multiple rows in both table2
and table3 matching the same value of table1.id, and the join forms the
cartesian product.
Coming back to your original questions:
>>> Is the issue that table2 and table3 are identical table structures,
>>> or that I simply don't understand how the group by is really working
>>> here...?
1) It is independent of the table structures - whether table2 and table3
look similar (in staructure and/or in data) does not matter.
2) It is *not* an issue of "group by", it is an issue of joining.
It happens, because table2 and/or table3 (in your case: both) have
multiple matching lines.
Rather than writing text, let me show it:
I combine the four relevant rows of table2 into one, same "score" sum:
mysql> select * from table2;
+------+------+-------+
| id | id1 | score |
+------+------+-------+
| 21 | 1 | 10 |
| 25 | 2 | 5 |
+------+------+-------+
2 rows in set (0.00 sec)
mysql> SELECT table1.id,SUM(table2.score) as table2_score
-> FROM table1, table2
-> WHERE table1.some_field = 'abc'
-> AND table2.id1 = table1.id
-> GROUP BY table1.id;
+------+--------------+
| id | table2_score |
+------+--------------+
| 1 | 10 |
+------+--------------+
1 row in set (0.00 sec)
Now, again the three table join:
mysql> SELECT table1.id,SUM(table2.score) as table2_score, SUM(table3.score) as table3_score
> FROM table1, table2, table3
> WHERE table1.some_field = 'abc'
> AND table2.id1 = table1.id
> AND table3.id1 = table1.id
> GROUP BY table1.id;
+------+--------------+--------------+
| id | table2_score | table3_score |
+------+--------------+--------------+
| 1 | 40 | 10 |
+------+--------------+--------------+
1 row in set (0.00 sec)
Now only 1 matching row is found in table2, and the sum(score) for
table3 is correct - because its rows occur only once, not 4 times
(number of matching rows in table2 in the other example).
I hope that answers your question(s).
The summary is: "works as specified by SQL".
Jörg
--
Joerg Bruehe, MySQL Build Team, joerg@xxxxxxxxx
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB161028
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=royale@xxxxxxxxxx