Why your database says paging sucks!
SELECT SQL_CALC_FOUND_ROWS gb.*,
u.username,
u.uid,
u.geschlecht,
u.mitfoto,
[... some more fields...]
FROM member_gold_guestbook gb
LEFT JOIN users u ON u.uid=gb.uid_from
[... some more left joins...]
WHERE gb.uid_to='22152'
AND visible='1'
LIMIT 0,10;
That's not that bad at all, but when you go to page 300 your database server will hat you for this. The database server has not only to calculate the 10 items you want to show but also all 3000 previous items.
Sure you may argue nobody will ever go to page 300. Somebody will not, but "googlebot" and his evil brothers will. And the bad thing is that you can, as long as you need paging, nothing do against it. There are just a few tricks that may reduce your server load a bit.
Optimize it!
A fast query that uses good indexes is just faster, no matter where you use it. But on paging where you might calculate thousands of lines it really does matter a lot.Avoid ORDER BY and GROUP BY because you have to do this (depending on index use) on every line in your table, the limit can't help you here.
Cache it! But still the query has to be run once, especially for fulltext searches, even the first run of a query might put a lot of load on your servers.
YAGNI (You aint gone need it!)
Nobody will ever see page 300, so why you don't set an upper limit for you paging. Your user will not mind and I'm sure the bots will find another way to your stuff. If you really care about SEO, you probably can do a separate list for bots that perform with less operations on the database (remove unneeded joins, don't sort it).Don't join tables you don't need, sure that might sound obvious, but I'm seen this too many times to not mention this here.
Count is evil
Never, never use SQL_CALC_FOUND_ROWS it's just equally slow then "SELECT COUNT(*) FROM table", but you have to do it on every page. The count(*) variant you can cache at last, so you don't have to do it on every page.And there's even another way to avoid the count on paging. It's the way Facebook does paging on some places. Facebook don't give you the usually list of pages from 1 to n there, were you can click at any page. They just give you the page before, the current page and the next page, if there's one. On the application side it's very easy to find out if you have a page befor the current, when you on the second page there's one before (so no surprise here). But what's about the next page if you don't want to make a count. Easy stuff, let me predict you display 10 items per page, so query 11 items instead of 10 per page. This one extra item will cost you nearly nothing and now you can count the returned rows in your application. If you have more than ten rows you have a next page and you can happily throw number eleven away.
Peter Zaitsev in Zurich
Peter Zaitsev, of MySQL Performance Blog fame, is at the end of this week in our offices in Adliswil. He will teach us how we can tweak our MySQL cluster and answer all our questions about MySQL.
To let you profit from his flight to Zurich as well, we decided to let him give a short public speak in Zurich sponsored by tilllate. The namics guys are kindly enough to host us in there office in Zurich. This will make your journey a bit shorter, then the one to Adliswil. The talk is scheduled to Wednesday 21th November 2007 19:30 at Namics. It’s free, just register in the Webtuesday-Wiki. More Information at techblog.tilllate.com. See you there!
Denormalsieren in MySQL
In MySQL gibt es mit dem INSERT...SELECT-Befehl ein mächtiges Werkzeug um Daten aus verschiedenen Tabellen in eine Neue einzufügen. Vor dem Einfügen können die Daten natürlich durch alle in SELECT verfügbaren Möglichkeiten manipuliert werden.
Leider ist es momentan in MySQL nicht möglich auf eine Tabelle im SELECT-Teil zuzugreifen und in die gleiche Tabelle zu schreiben. Deshalb gelingt das unten stehendes Beispiel nicht. (Nicht überprüft.)
INSERT INTO pagelinks (pl_to)Dieses Problem kann allerdings relativ leicht umgangen werden. Man erstellte einfach eine neue Tabelle die ein Abbild jener ist die ergänzt werden soll. Der SQL-Code einer Datenbank kann mit "mysqldump -u benutzer -p datenbank -d tabelle" ausgelesen werden. Der Code muss danach angepasst werden (Tabellennamen ersetzen) und in die Datenbank eingespielt werden. Mit dem INSERT...SELECT-Befehl können danach die neue Tabelle wieder mit Daten aufgefüllt werden. Daraus ergibt sich dann in meinem Fall (mit der neuen Tabelle idlinks und da ich nicht ganz alle Daten brauche).
SELECT page.page_id
FROM page, pagelinks
WHERE page.page_namespace = pagelinks.pl_namespace
AND pagelinks.pl_title = page.page_title;
INSERT INTO idlinks (il_from,il_to)Nun haben wir immer noch ein Problem. Bei vielen Daten bricht die Operation irgendwann mit der Fehlermeldung "ERROR 1206 (HY000): The total number of locks exceeds the lock table size" ab. Dies liegt daran das der InnoDB-Buffer-Pool zu klein ist. Der Puffer kann in der Datei my.cnf wie folgt angepasst werde. (Neustarten der Datenbank nicht vergessen.)
SELECT pagelinks.il_from, page.page_id
FROM page, pagelinks
WHERE page.page_namespace = pagelinks.pl_namespace
AND pagelinks.pl_title = page.page_title;
[mysqld]
# Set buffer pool size to 50-80% of your computer's memory
innodb_buffer_pool_size=70M
innodb_additional_mem_pool_size=10M
Danach sollte diese Operation auch recht zügig ablaufen. Denormalisieren von Daten in MySQL ist nicht ganz eifach, mit ein wenig "pröblen" kommt man aber, wie dieser Artikel zeigt, auf gute Resulate.





