2008-10-09 00:00 -!- flips(~phillips@phunq.net) has joined #tux3
2008-10-09 00:40 reading dleaf.c
2008-10-09 00:40 will try to run some tests today..
2008-10-09 00:40 hoping to find at least one bug :D
2008-10-09 00:41 to make myself useful here
2008-10-09 00:48 pranith, I spotted a couple of bugs
2008-10-09 00:48 will put in fixes pretty soon
2008-10-09 00:49 stupid things
2008-10-09 00:49 changed the interface to dleaf_dump
2008-10-09 00:49 will change it back I think
2008-10-09 00:49 hmm
2008-10-09 00:49 ohk
2008-10-09 00:49 just running the tests is already useful
2008-10-09 00:49 hmm
2008-10-09 00:49 stuff like tux3 mkfs
2008-10-09 00:50 ?
2008-10-09 00:50 make tux3?
2008-10-09 00:50 and echo foo | tux3 write testdev testfile
2008-10-09 00:50 you do make tux3 && ./tux3 mkfs testdev
2008-10-09 00:50 for example
2008-10-09 00:50 we need some docs around now
2008-10-09 00:50 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 00:51 sry, dc
2008-10-09 00:52 flips, u back?
2008-10-09 02:44 -!- pgquiles(~pgquiles@166.Red-88-16-39.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 03:11 -!- kbingham(~kbingham@cvs.mpc-ogw.co.uk) has joined #tux3
2008-10-09 03:22 back now
2008-10-09 03:22 next move is to get some sleep
2008-10-09 04:24 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 04:24 hey all
2008-10-09 04:55 -!- Bobby_(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 04:55 -!- Bobby_(~Bobby@122.162.67.161) has left #tux3
2008-10-09 05:03 -!- data(~data@echo489.server4you.de) has joined #tux3
2008-10-09 07:34 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 07:49 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 08:32 -!- pgquiles(~pgquiles@16.Red-83-41-239.dynamicIP.rima-tde.net) has joined #tux3
2008-10-09 08:34 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-09 09:04 -!- pranith(~Bobby@122.162.67.161) has joined #tux3
2008-10-09 09:15 hello
2008-10-09 09:23 -!- RazvanM(~RazvanM@dazzler.isi.jhu.edu) has joined #tux3
2008-10-09 09:23 ACTION is back :P
2008-10-09 09:28 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-09 09:40 hey pranith
2008-10-09 10:04 morning tim_dimm
2008-10-09 10:04 morning flips
2008-10-09 10:04 got your sleep?
2008-10-09 10:04 some of it
2008-10-09 10:04 you?
2008-10-09 10:04 feeling rejuvenated?
2008-10-09 10:04 hah
2008-10-09 10:04 you are hilarious dude
2008-10-09 10:05 ;-)
2008-10-09 10:27 -!- less(~less@145-116-238-192.uilenstede.casema.nl) has joined #tux3
2008-10-09 11:15 there we go, tux3 command should function again
2008-10-09 11:16 dleaf_dump interface was changed and commented out of the ops, caused seg fault
2008-10-09 11:16 now put back as it was
2008-10-09 11:16 got to fix the valgrind issues, none of which seem serious
2008-10-09 11:17 then there is a real bug in the new extents stuff that makes tux3 read seg fault
2008-10-09 11:17 tux3 write seems to work ok
2008-10-09 11:17 then time to write a mail on atomic commit
2008-10-09 11:18 and there is tux3 u tonight
2008-10-09 11:18 -!- ChanServ changed mode/#tux3 -> +o flips
2008-10-09 11:19 -!- flips changed topic to "http://tux3.org ~ Tux3 U, right here Tuesdays and Thursdays at 8 pm Pacific Time ~ Next session: friends of grab_cache_page"
2008-10-09 11:19 -!- flips changed mode/#tux3 -> -o flips
2008-10-09 12:43 shapor, found the valgrind issue, it was real
2008-10-09 12:53 -!- ajonat(~ajonat@190.48.94.249) has joined #tux3
2008-10-09 12:58 make tests compile without valgrind errors now
2008-10-09 12:59 bunch of little things, real bugs
2008-10-09 12:59 -!- alaine(~alaine@kevbroadley.demon.co.uk) has joined #tux3
2008-10-09 12:59 someone that got owned on msn .. haha makes me ROFL http://www.tibix.eu/include/index.php
2008-10-09 13:01 I wonder if it is actually good to have make mkfs do its thing in /tmp
2008-10-09 13:02 makes for more typing running tests
2008-10-09 13:02 but does keep the local source free of big loopback volumes
2008-10-09 13:03 good call on the autokill
2008-10-09 13:43 i like bitbucket's diffs
2008-10-09 13:43 http://www.bitbucket.org/shapor/tux3/changeset/1b6cf87c7234/
2008-10-09 13:43 make tests runs successfully now
2008-10-09 13:44 -!- FelipeS(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-09 14:03 tux3 read has a segfault in the new extents code
2008-10-09 14:03 true, bitbucket diffs are nice
2008-10-09 14:05 for some reason the bitbucket mirror is lagged 77 minutes
2008-10-09 14:05 oh
2008-10-09 14:05 sorry
2008-10-09 14:05 because I used your url ;)
2008-10-09 14:24 this has xattr bugs: make inode && ./inode foodev
2008-10-09 14:24 there should be no xattrs but the inode table listing thinks there are
2008-10-09 14:24 something stupid
2008-10-09 15:03 shapor: trac does a similar output and I think they are using some kind of package for it
2008-10-09 15:03 but it does look nice
2008-10-09 17:14 Tonight on tux3 u, provided I can stay awake that long, we will be looking at the relationship between buffers and pages in the page cache
2008-10-09 17:21 2.6.27 is out with lockless page cache
2008-10-09 17:21 changes at the heart of tux3-u :)
2008-10-09 17:27 hey flips
2008-10-09 17:28 shapor: it's been in -rt forever
2008-10-09 17:28 and is a major pain in the ass regarding scalability
2008-10-09 17:33 bh: oh?
2008-10-09 17:34 howso
2008-10-09 19:13 -!- FelipeS_(~Felipe@r77h15.res.gatech.edu) has joined #tux3
2008-10-09 19:55 -!- RazvanM(~RazvanM@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-09 20:00 who's here tonight?
2008-10-09 20:00 ACTION is
2008-10-09 20:00 Ral will also be.
2008-10-09 20:00 saw shapor not too long ago
2008-10-09 20:01 let's start in two places this time
2008-10-09 20:01 http://lxr.linux.no/linux+v2.6.26.6/include/linux/mm_types.h#L36 <- struct page
2008-10-09 20:02 http://lxr.linux.no/linux+v2.6.26.6/include/linux/buffer_head.h#L60 <- struct buffer_head
2008-10-09 20:03 struct page is the thing we use as a handle for a physical page
2008-10-09 20:03 it has an object count so we know when to release the page
2008-10-09 20:03 _mapcount is something new I haven't really looked at
2008-10-09 20:04 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has joined #tux3
2008-10-09 20:04 it has a private field for whatever the owner, whoever alloced the page usually, wants to put there
2008-10-09 20:04 quick q: what is 'ptes' and 'mms'?
2008-10-09 20:04 in practice, that is usually a list of buffers attached to the page
2008-10-09 20:05 pte is a page table entry
2008-10-09 20:05 mm is a memory management context
2008-10-09 20:05 that is, an address space
2008-10-09 20:05 it's a struct mm
2008-10-09 20:05 each process has one, threads share one
2008-10-09 20:06 the struct page also used to have a lock
2008-10-09 20:06 seems to have gone missing now
2008-10-09 20:06 so we don't have one lock per page
2008-10-09 20:06 probably replaced by a hashed lock
2008-10-09 20:07 need to chase that down
2008-10-09 20:07 it has a very important field: index
2008-10-09 20:07 this is the position of the page within a page cache radix tree, if it is in one
2008-10-09 20:08 and that is the tie to vfs
2008-10-09 20:08 the page also has a pointer to the mapping it is in
2008-10-09 20:08 so mapping + index => retrieve the page
2008-10-09 20:09 and we can remove the page from a mapping because of those fields recorded in it
2008-10-09 20:10 there is also the lru link, which gives the vmm an idea of which page should be recovered when cache memory gets full
2008-10-09 20:10 over to buffer_head
2008-10-09 20:11 also has a flags and a count, though the buffer flags is named b_state for no particular reason
2008-10-09 20:11 has a pointer to the page the buffer head is attached to, on which the data belonging to the buffer is stored
2008-10-09 20:12 we figure out where on the page the buffer data is stored by looking at the low bits of the index, I think...
2008-10-09 20:12 we will come back to that and check it
2008-10-09 20:13 the buffer also points at a block device b_bdev, but this field is redundant now
2008-10-09 20:13 because we have buffer->page->mapping->... bdev
2008-10-09 20:14 there is an end_io function like the endio for a bio
2008-10-09 20:14 serves the same purpose, and is now also largely redundant
2008-10-09 20:15 assoc_buffers is a crude scheme for flushing file metadata along with data for primitive filesystems like ext2 that let the vfs do all their work for them
2008-10-09 20:16 we also don't see a lock in the buffer_head itself
2008-10-09 20:16 though for both pages and buffers, locking is a huge element of how they are used
2008-10-09 20:18 in the case of buffers, we spin on one of the state bits
2008-10-09 20:18 we will go find that code later also
2008-10-09 20:20 it's __lock_buffer
2008-10-09 20:20 defined somewhere lxr can't find
2008-10-09 20:21 (I'd thought we were meeting at 9pm today...)
2008-10-09 20:22 it should be in buffer.c
2008-10-09 20:22 because we did last time?
2008-10-09 20:22 should be
2008-10-09 20:22 http://lxr.linux.no/linux+v2.6.26.5/fs/buffer.c#L70
2008-10-09 20:23 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L70
2008-10-09 20:23 thanks
2008-10-09 20:23 ok, see it's a lock that spins on a bit
2008-10-09 20:23 another quick q: how are the page structures managed? Is there a huge array somewhere? Or some lists?
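
The idea of locking a buffer through one bit of its state word, with no separate lock field, can be sketched in userspace C11 roughly as follows. This is only an illustration under stated assumptions: `struct buffer`, the `BH_*` bit numbers and the three helper functions are hypothetical names made up for the sketch, not the kernel's definitions, and the busy-wait loop stands in for whatever waiting strategy a real implementation uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, loosely modeled on buffer state bits. */
enum { BH_UPTODATE, BH_DIRTY, BH_LOCK };

struct buffer {
    atomic_ulong state;   /* flags word; the BH_LOCK bit doubles as the lock */
};

/* Set the lock bit atomically; we own the lock only if it was clear before. */
static bool buffer_trylock(struct buffer *b)
{
    unsigned long mask = 1UL << BH_LOCK;
    return !(atomic_fetch_or(&b->state, mask) & mask);
}

/* Retry until the bit is ours. A real implementation would normally
 * sleep or yield here rather than burn CPU. */
static void buffer_lock(struct buffer *b)
{
    while (!buffer_trylock(b))
        ;
}

/* Clear the lock bit, leaving the other state bits untouched. */
static void buffer_unlock(struct buffer *b)
{
    unsigned long mask = 1UL << BH_LOCK;
    unsigned long old = atomic_fetch_and(&b->state, ~mask);
    assert(old & mask);   /* unlocking a buffer that was not locked is a bug */
    (void)old;
}
```

Because the lock lives in the same word as the other flags, taking or dropping it never disturbs bits such as dirty or uptodate, and the structure needs no extra field for it.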
2008-10-09 20:23 not very efficient
2008-10-09 20:23 a huge array
2008-10-09 20:23 very simple/crude
2008-10-09 20:24 and sometimes a big problem because of the size of that array
2008-10-09 20:24 let's have a look at lock_page for comparison
2008-10-09 20:25 searching...
2008-10-09 20:25 yeah coming up empty
2008-10-09 20:25 been worked on lately
2008-10-09 20:25 http://lxr.linux.no/linux+v2.6.26.5/include/linux/pagemap.h#L167
2008-10-09 20:25 :)
2008-10-09 20:25 2.6.27 has no index
2008-10-09 20:26 right
2008-10-09 20:26 I should have mentioned
2008-10-09 20:26 2.6.26.6
2008-10-09 20:26 oh... I'm still on .5 :P
2008-10-09 20:26 http://lxr.linux.no/linux+v2.6.26.6/include/linux/pagemap.h#L167 :D
2008-10-09 20:26 http://lxr.linux.no/linux+v2.6.26.6/mm/filemap.c#L599 <- we see that the page lock is another bit spin lock
2008-10-09 20:27 does might_sleep do something or is that a statement for code testing?
2008-10-09 20:27 the closer you look at buffer_heads and struct pages, the more they are quite similar to each other
2008-10-09 20:27 might_sleep will generate a kprint warning if it is called under a spinlock
2008-10-09 20:28 if you have that debug option turned on
2008-10-09 20:28 ok, now we have some slight familiarity with those two, let's go look at a place where they are used together
2008-10-09 20:28 like block_read_full_page
2008-10-09 20:30 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2093 <- block_read_full_page
2008-10-09 20:31 the page may or may not have a list of buffers attached to it
2008-10-09 20:31 if the blocksize is same as page size, the list will have one buffer
2008-10-09 20:31 otherwise some binary number of buffers
2008-10-09 20:32 the first thing _full_page does is put a list of buffer (heads) on the page if it has none
2008-10-09 20:33 then it loops over the buffer list (again usually one buffer) to find any buffers not uptodate
2008-10-09 20:33 if it just put the buffers on the page, it should already know of course
2008-10-09 20:34 any buffer that is not up to date, it makes a call into the filesystem, get_block
2008-10-09 20:34 which is a callback passed to it by the filesystem
2008-10-09 20:34 because block_read_full_page is always called from filesystem code
2008-10-09 20:35 it is just a library helper to make it easy to do IO on a page
2008-10-09 20:35 easy, but pretty sloppy and executing way too much code
2008-10-09 20:36 which can be masked by a slow disk, but not entirely
2008-10-09 20:37 see further down, there is some coupling of the buffer flags and the page flags in that if all buffers are up to date, the page is set up to date as well
2008-10-09 20:37 does this mean we still have a page cache, even if the file system block size is not page sized? and this is what converts from sub-page sized buffers to the page used by the page cache?
2008-10-09 20:37 this is where we handle the impedance mismatch between page size and block size, if that answers your question
2008-10-09 20:38 the page cache is indexed by pages, but filesystems like ext3 treat it as if it was indexed by buffers
2008-10-09 20:38 (perhaps)
2008-10-09 20:38 see ext3_bread
2008-10-09 20:39 this mismatch is a huge source of complexity in vfs and mm, and a nasty source of bugs
2008-10-09 20:39 by the time we've done the tux3 kernel port, everybody will know exactly what I'm talking about
2008-10-09 20:40 ok, buffer locking is a little counterintuitive
2008-10-09 20:40 we will keep a buffer locked while reading, but not while writing in general
2008-10-09 20:40 same with pages
2008-10-09 20:41 while reading from disk into the buffer, or from the buffer?
2008-10-09 20:41 disk into buffer
2008-10-09 20:41 always what is meant by "read" in here
2008-10-09 20:42 finally, the buffers that need reading are submitted via submit_bh
2008-10-09 20:42 doesn't it make sense to lock reads then? since that's when the memory content actually changes?
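
The flow walked through above (attach buffers if the page has none, loop over the ones that are not up to date, ask the filesystem for a mapping, read, then mark the page up to date once every buffer is) can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel code: the structures, `read_full_page`, `demo_get_block` and `demo_read_block` are hypothetical names, the "disk" is faked with memset, locking is left out, and a failed mapping is simply treated as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size */
#define MAX_BUFS  8       /* enough for 512-byte blocks */

struct buffer {
    bool uptodate;           /* data in memory matches disk */
    bool mapped;             /* blocknr is valid */
    unsigned long blocknr;   /* physical block, filled in by get_block */
};

struct page {
    unsigned long index;     /* logical page offset within the file */
    bool uptodate;
    unsigned nbufs;          /* 0 until buffers are attached */
    struct buffer bufs[MAX_BUFS];
    unsigned char data[PAGE_SIZE];
};

/* Filesystem callback: map a logical block to a physical one, 0 on success. */
typedef int (*get_block_t)(unsigned long logical, unsigned long *physical);

/* Hypothetical filesystem: logical block n lives at physical block n + 100. */
static unsigned demo_get_block_calls;
static int demo_get_block(unsigned long logical, unsigned long *physical)
{
    demo_get_block_calls++;
    *physical = logical + 100;
    return 0;
}

/* Stand-in for disk IO: fill the block with the low byte of its address. */
static void demo_read_block(unsigned long physical, void *dst, unsigned size)
{
    memset(dst, (int)(physical & 0xff), size);
}

static int read_full_page(struct page *page, unsigned blocksize, get_block_t get_block)
{
    assert(blocksize && PAGE_SIZE % blocksize == 0);
    assert(PAGE_SIZE / blocksize <= MAX_BUFS);

    /* Step 1: put buffers on the page if it has none. */
    if (!page->nbufs) {
        page->nbufs = PAGE_SIZE / blocksize;
        memset(page->bufs, 0, sizeof page->bufs);
    }
    assert(page->nbufs == PAGE_SIZE / blocksize);

    unsigned long first = page->index * page->nbufs;   /* first logical block */
    bool all_uptodate = true;

    /* Step 2: map and read every buffer that is not up to date. */
    for (unsigned i = 0; i < page->nbufs; i++) {
        struct buffer *b = &page->bufs[i];
        if (b->uptodate)
            continue;
        if (!b->mapped) {
            if (get_block(first + i, &b->blocknr)) {
                all_uptodate = false;
                continue;
            }
            b->mapped = true;
        }
        demo_read_block(b->blocknr, page->data + i * blocksize, blocksize);
        b->uptodate = true;
    }

    /* Step 3: the page is up to date only if all of its buffers are. */
    page->uptodate = all_uptodate;
    return all_uptodate ? 0 : -1;
}
```

With a 1024-byte block size the page carries four buffers and the callback runs four times on first read; a second call on the same page touches only buffers whose uptodate flag has been cleared, and skips the callback for buffers that are already mapped.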
2008-10-09 20:42 which is just a simple wrapper on submit_bio
2008-10-09 20:42 which is an old friend of ours
2008-10-09 20:42 yes, we lock reads, not writes
2008-10-09 20:43 this is a horribly inefficient code path we're looking at
2008-10-09 20:44 and doesn't actually get used much any more, though there are cases that still trigger it
2008-10-09 20:44 I don't know what they are exactly, but again we will have a good idea after doing the port
2008-10-09 20:44 let's look at block_write_full_page while we are in here
2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L2093
2008-10-09 20:45 sorry
2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1645
2008-10-09 20:45 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1645
2008-10-09 20:45 right
2008-10-09 20:46 starts the same way
2008-10-09 20:46 as often happens in disk io
2008-10-09 20:46 but unfortunately, kernel takes little advantage of such symmetry
2008-10-09 20:47 we take care of zeroing a partial page extending beyond end of file here
2008-10-09 20:47 "unmap_underlying_metadata" is a scary function to see here
2008-10-09 20:47 we'll leave that for another day
2008-10-09 20:48 see, we keep a state bit in the buffer to tell us whether we need to call the fs get_block method or not
2008-10-09 20:48 what is a non-blockdev mapping?
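
The step of zeroing the part of a page that extends beyond end of file comes down to a little arithmetic on the page index and the file size. A minimal userspace sketch, assuming a 4096-byte page; `zero_past_eof` is a hypothetical helper written for this illustration, and for a page lying wholly beyond end of file it simply zeroes everything, which may not match what the kernel does in that case.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size */

/* Zero the bytes of the page at `index` that lie at or beyond `file_size`.
 * Returns the number of bytes zeroed. */
static unsigned zero_past_eof(unsigned char *data, unsigned long index,
                              unsigned long long file_size)
{
    unsigned long long start = (unsigned long long)index * PAGE_SIZE;

    if (file_size >= start + PAGE_SIZE)
        return 0;   /* page lies entirely inside the file */

    /* Bytes of this page that still belong to the file. */
    unsigned valid = file_size > start ? (unsigned)(file_size - start) : 0;

    memset(data + valid, 0, PAGE_SIZE - valid);
    return PAGE_SIZE - valid;
}
```

For a 5000-byte file, page 1 holds 904 valid bytes, so the remaining 3192 bytes are cleared before the page goes to disk and stale memory contents never end up past end of file.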
2008-10-09 20:48 page cache I guess
2008-10-09 20:48 funny terminology
2008-10-09 20:50 a slight fib, it seems we keep the buffer locked all the way through the write here
2008-10-09 20:50 the page however gets unlocked
2008-10-09 20:50 it is probably unnecessary to keep the buffer locked
2008-10-09 20:51 "redirty_page_for_writeback" is another scary visitor to see here
2008-10-09 20:51 hacking around various subtle loopholes in the vm design
2008-10-09 20:51 now let's see, where is the get_block call
2008-10-09 20:52 http://lxr.linux.no/linux+v2.6.26.6/fs/buffer.c#L1696
2008-10-09 20:53 by the way, each buffer has a size field, which is now redundant
2008-10-09 20:53 still hanging around
2008-10-09 20:54 we now find out the size from the block device pointed to by the buffer->page->mapping->inode->sb
2008-10-09 20:54 something like that
2008-10-09 20:55 (large number of indirects...)
2008-10-09 20:55 ah, what we are doing in unmap_underlying_metadata is taking care of the lack of coherence between the inode page cache and the block device buffer cache
2008-10-09 20:56 there might have been a page sitting around in the buffer cache mapped to the same physical block
2008-10-09 20:56 see the bh->b_blocknr, that is what we call buffer->index in the tux3 userspace code
2008-10-09 20:57 but the usage is different here
2008-10-09 20:57 in kernel, this caches the _physical_ block the buffer is mapped to, even if the buffer is on a page in an inode page cache
2008-10-09 20:58 in the tux3 userspace code, the physical mapping is never cached
2008-10-09 20:58 and we use that field like kernel uses the page->index field, to know what the logical offset of the data is
2008-10-09 20:59 it turns out that caching the physical block pointer is pretty useless, almost always
2008-10-09 21:00 since nearly all writes will just write the buffer to the physical location once then address it out of cache after that
2008-10-09 21:00 it might save a get_block trip into the filesystem only for a rewrite
2008-10-09 21:01 the 9 o'clock horn just sounded
2008-10-09 21:01 this was a pretty dry one today, no?
2008-10-09 21:01 but important
2008-10-09 21:01 seemed pretty hard core
2008-10-09 21:01 this little corner of the kernel will be visited frequently by anybody doing filesystem work
2008-10-09 21:01 yup
2008-10-09 21:02 ok, on tuesday we're going to get much more hard core
2008-10-09 21:02 is a lot of the complication around here cruft? or is it actually needed for performance and/or edge cases?
2008-10-09 21:02 because we want to use this buffer+page mechanism in ways it was not necessarily designed for
2008-10-09 21:02 major cruft, yes
2008-10-09 21:03 I hope to make it obsolete
2008-10-09 21:03 in due course
2008-10-09 21:03 but we're going to have to work with it for now
2008-10-09 21:03 changing core kernel to merge tux3 isn't really wise
2008-10-09 21:04 questions?
2008-10-09 21:04 otherwise my little girl wants to try out the new video game
2008-10-09 21:04 ACTION doesn't have any :(
2008-10-09 21:04 aaa... what video game is it? :P
2008-10-09 21:04 razvanm, try to read through some more of the block_ functions in buffer.c
2008-10-09 21:04 ACTION doesn't really know how to ask intelligent questions...
2008-10-09 21:04 bioshock
2008-10-09 21:05 my first free time will be after 22 :(
2008-10-09 21:05 I heard about bioshock...
2008-10-09 21:05 oh right
2008-10-09 21:05 where "some" means a little bit
2008-10-09 21:05 it's important to have the layout of the code seeping into you in the background
2008-10-09 21:06 bok
2008-10-09 21:06 a few minutes of looking, then you can go away and let it seep
2008-10-09 21:06 :-)
2008-10-09 21:07 when I get into the nitty gritty of atomic commit, this buffer cache interface gets really important
2008-10-09 21:07 this is where most of the action happens
2008-10-09 21:07 sorry
2008-10-09 21:07 page cache - with buffers attached
2008-10-09 21:07 and we will make it act like a buffer cache, as tux3 uses in user space
2008-10-09 21:07 ok, I'm out
2008-10-09 21:07 have a nice evening
2008-10-09 21:14 -!- RalucaM(~ral@pool-151-196-118-156.balt.east.verizon.net) has left #tux3
2008-10-09 21:42 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-09 22:02 -!- tim_dimm(~timothyhu@cpe-76-90-98-247.socal.res.rr.com) has joined #tux3
2008-10-09 22:47 -!- MaZe(~MaZe@c-24-6-86-168.hsd1.ca.comcast.net) has joined #tux3
2008-10-09 23:01 hello
2008-10-09 23:01 anyone?
2008-10-09 23:48 hey
2008-10-09 23:48 hi pranith
2008-10-09 23:52 shapor: locking that data structure limited it to about 2.5 processor scalability
2008-10-09 23:52 it's in old OLS papers if you want to read about it